Musings of a Technical or Professional Nature




October 30, 2022

I’ve been inspired to start using for some microblogging, and not just about data science. In fact, probably mostly not about data science. Why not? I have the site, after all. Read more



A Trip to the Virtual Attic

June 06, 2020

When the world feels like it’s falling apart around you, it feels good to solve little problems that are completely under your control. And that’s what I’ve been doing this past week. This was originally posted at Multo. Read more

Back to Where I was Before (Almost)

May 30, 2020

Back in the good old days, was a static site that I maintained myself, in pure HTML. But that (to me) was so much of a hassle that I never did even the little bit of site maintenance that the website required. So I moved it to Read more

The vtreat package two ways

May 19, 2020

We recently did a couple of talks about our vtreat data treatment package: one for the Python version, and one for the R version. If you are fitting machine learning models on messy real-world data, then you might find vtreat useful. Do check out one of the introductory talks below. Read more
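vtreat's job is to turn messy columns, such as numeric fields with missing values or categorical fields with many levels, into clean numeric columns a model can use. Two steps it is known for are adding missing-value indicators and impact coding of categoricals. The pure-Python sketch below illustrates those two ideas only. It is a hypothetical stand-in, not vtreat's API or its exact procedure: the function names and column suffixes are made up. Note that naive impact coding computed on the training data itself overfits. vtreat uses cross-validated ("cross-frame") estimates to reduce that bias, and this sketch does not.

```python
import math
from typing import Dict, List, Optional

MISSING = "<missing>"  # hypothetical placeholder level for missing categories


def treat_numeric(name: str, values: List[Optional[float]]) -> Dict[str, List[float]]:
    """Replace None/NaN/inf with the mean of the good values, and add a
    0/1 indicator column recording which rows were replaced."""
    def is_bad(v: Optional[float]) -> bool:
        return v is None or math.isnan(v) or math.isinf(v)

    good = [v for v in values if not is_bad(v)]
    fill = sum(good) / len(good) if good else 0.0
    return {
        name + "_clean": [fill if is_bad(v) else float(v) for v in values],
        name + "_missing": [1.0 if is_bad(v) else 0.0 for v in values],
    }


def treat_categorical(name: str, levels: List[Optional[str]],
                      y: List[float]) -> Dict[str, List[float]]:
    """Naive impact coding: each level maps to (mean of y within that level)
    minus (grand mean of y). Missing levels are treated as their own level.
    Computed on the same rows it encodes, so it is optimistically biased."""
    grand_mean = sum(y) / len(y)
    keys = [MISSING if lev is None else lev for lev in levels]
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for key, target in zip(keys, y):
        totals[key] = totals.get(key, 0.0) + target
        counts[key] = counts.get(key, 0) + 1
    impact = {k: totals[k] / counts[k] - grand_mean for k in totals}
    return {name + "_impact": [impact[k] for k in keys]}


if __name__ == "__main__":
    x = [1.0, None, 3.0, float("nan")]
    color = ["red", "red", "blue", None]
    y = [1.0, 1.0, 0.0, 0.0]
    print(treat_numeric("x", x))
    print(treat_categorical("color", color, y))
```

Keeping the indicator column alongside the imputed value lets a model learn whether "missing" itself carries signal, rather than silently treating imputed rows like ordinary ones.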



Balancing Classes Before Training Classifiers - Addressing a Folk Theorem

February 27, 2015

We’ve been wanting to get more into training over at Win-Vector, but I don’t want to completely give up client work, because clients and their problems are often the inspiration for cool solutions – and good blog articles. Working on the video course for the last couple of months has given me some good ideas, too. Read more


Recent post on Win-Vector blog, plus some musings on Audience

December 21, 2014

I put a new post up on Win-Vector a couple of days ago called "The Geometry of Classifiers", a follow-up post to a recent paper by Fernandez-Delgado, et al. that investigates several classifiers against a body of data sets, mostly from the UCI Machine Learning Repository. Our article follows up the study with seven additional classifier implementations from scikit-learn and an interactive Shiny app to explore the results. Read more

Design, Problem Solving, and Good Taste

November 25, 2014

I ran across this essay recently on the role of design standards for scientific data visualization. The author, Jure Triglav, draws his inspiration from the creation and continued use of the NYCTA Graphics Standards, which were instituted in the late 1960s to unify the signage for the New York City subway system. Read more

A Moment’s Digression

October 21, 2014

I had a data nerd moment while reading a novel the other day. I got in an argument with the book. But I think the book started it. It's a frivolous discussion, probably, but sometimes those are the most fun. Read more


Big News! Practical Data Science with R is content complete!

December 19, 2013

It's been a while since I've posted here, but I have good news: the last appendix has gone to the editors. The book is now content complete. What a relief! We are hoping to release the book late in the first quarter of next year. In the meantime, you can still get early drafts of our chapters through Manning’s Early Access program, if you haven’t yet. Read more

Goldbach’s Celestial Atlas

July 29, 2013

Christian Goldbach was a Prussian mathematician, probably most famous for the Goldbach conjecture, one of the oldest unsolved problems in mathematics:

Every even integer greater than 2 can be expressed as the sum of two primes.
Read more
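The conjecture is easy to test for small numbers even though no proof is known. The sketch below, assuming plain trial division is fast enough for small inputs, searches for a prime pair summing to a given even number. The names `is_prime` and `goldbach_pair` are hypothetical illustrations, not from the original post.

```python
from typing import Optional, Tuple


def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def goldbach_pair(n: int) -> Optional[Tuple[int, int]]:
    """Return the pair (p, q) with the smallest prime p such that p <= q,
    q is prime, and p + q == n. Returns None if no pair exists, which
    would be a counterexample to the conjecture."""
    if n <= 2 or n % 2 != 0:
        raise ValueError("n must be an even integer greater than 2")
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None


if __name__ == "__main__":
    for n in range(4, 31, 2):
        print(n, goldbach_pair(n))
```

Finding a pair for every small even number is evidence, not proof: the conjecture has been checked by computer to very large bounds, but it remains unproven.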

Dragons of Probability

July 11, 2013

"No insults, please!" said Pugg. "For I am not your usual uncouth pirate, but refined and with a Ph.D. and therefore extremely high-strung."
-- from "The Sixth Sally, or How Trurl and Klapaucius Created a Demon of the Second Kind to Defeat the Pirate Pugg"
Read more

Mathematics versus Computer Science

June 26, 2013

…until the development of computers the possibility of dealing successfully with the complex itself was never really envisaged. Perhaps the most successful substitute for such a possibility, as well as the nearest approach to it, came in mathematics. … To find the simple in the complex, the finite in the infinite -- that is not a bad description of the aim and essence of mathematics.
Read more

Bon Mots from Professor Rota

March 19, 2013

As I've posted previously, we are writing a data science book. The preview of the first chapter of our book should come out in about a month or so. We are almost finished with the revisions to the first four chapters, and we've started refining the outline of the next three. Exciting! It happens that I've been rereading mathematician Gian-Carlo Rota's collection of essays, Indiscrete Thoughts, and I've found a few passages that really speak to me, now that I'm in book-writing mode. Enjoy. Read more

What’s Wrong with a Low(er)-Stress Job?

January 05, 2013

So there's this article that's been making the rounds called "The 10 Least Stressful Jobs of 2013"; perhaps you've read it. I don't normally bother with articles like that, but it came to my attention because some of my old graduate-school friends (who are professors) threw a mini-rant on social media over the fact that University Professor is the Number One least stressful job of the year, according to the article. And just now, I tripped over a blog post where a librarian takes umbrage over the fact that librarians are also on the list. Read more


On Balance

December 18, 2012

One of my favorite cheesy movies is a gem from 1984 called The Adventures of Buckaroo Banzai Across the 8th Dimension. For those who haven't seen it, Buckaroo Banzai is a brilliant young neurosurgeon and particle physicist who spends his days conducting cutting-edge research. At night, he and his research colleagues -- all engineers and scientists and doctors -- rock New Jersey as a band called the Hong Kong Cavaliers. In between the brilliant science and the rock-star night life, the Cavaliers find time to save the world from an alien invasion led by none other than John Lithgow. Read more

Good News: We’re Writing a Book!

December 06, 2012

I’m happy to announce that John Mount and I have just signed a contract with Manning Publications to write a book on Data Science. We have both talked about doing this for quite a while, and we are excited that we finally have the opportunity. Read more

I Write, Therefore I Think

October 11, 2012

I came across an interesting article in The Atlantic a little while back that discussed the connection between writing and thinking. New Dorp, a Staten Island high school in a poor and working-class neighborhood, was able to improve student performance once the school realized that its students couldn't write. These underperforming students often could read and could do math. The majority of them were well-behaved, and seemed to want to learn. Yet they couldn't pass standard proficiency tests, and couldn't graduate. All because they couldn't form complex sentences. Read more

On Being a Data Scientist

September 19, 2012

When people asked me what it means to be a data scientist, I used to answer, "it means you don't have to hold my hand." By which I meant that as a data scientist (a consulting data scientist), I can handle the data collection, the data cleaning and wrangling, the analysis, and the final presentation of results (both technical and for the business audience) with a minimal amount of assistance from my clients or their people. Not no assistance, of course, but little enough that I'm not interfering too much with their day-to-day job. Read more

On Writing Technical Articles for the Nonspecialist

September 04, 2012

I came across a post from Emily Willingham the other day: "Is a PhD required for Good Science Writing?". As a science writer with a science PhD, her answer is: it is not required, and it can often be an impediment. I saw a similar sentiment echoed once by Lee Gutkind, the founder and editor of the journal Creative Nonfiction. I don't remember exactly what he wrote, but it was something to the effect that scientists are exactly the wrong people to produce literary, accessible writing about matters scientific. Read more