Categories
Data Science Statistics Writing

A Couple Recent Win-Vector Posts

I’ve been neglecting to announce my Win-Vector posts here, but I’ve not stopped writing them. Here are the two most recent: “Wanted: A Perfect Scatterplot (with Marginals)”, in which I explore how to make what Matlab calls a “scatterhist”: a scatterplot with marginal distribution plots on the sides. My version optionally adds the best […]

Categories
Data Science Statistics Writing

Two New Articles

Two new articles, one on the Win-Vector blog, plus a guest post on the Fliptop blog: “Random Test/Train Split is not Always Enough” discusses the potential limitations of a randomized test/train split when your training data and future data are not truly exchangeable, due to time-dependent effects, serial correlation, concept changes, or data grouping. Don’t […]
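
To make the point about non-exchangeable data concrete, here is a minimal Python sketch of two alternatives to a purely random split: holding out the most recent rows, and holding out whole groups. It is not taken from the article itself. The function names, the `day` and `customer` fields, and the 25% test fraction are all hypothetical choices made for illustration.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, int]


def time_ordered_split(rows: List[Row], time_key: str,
                       test_fraction: float = 0.25) -> Tuple[List[Row], List[Row]]:
    """Train on the earliest rows, test on the latest ones.

    This mimics deployment, where the model only ever sees data
    that is newer than anything it was trained on.
    """
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = int(round(len(ordered) * (1 - test_fraction)))
    return ordered[:cut], ordered[cut:]


def grouped_split(rows: List[Row], group_key: str,
                  test_fraction: float = 0.25,
                  seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Hold out entire groups so no group appears in both train and test."""
    groups = sorted({r[group_key] for r in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, int(round(len(groups) * test_fraction)))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if r[group_key] not in test_groups]
    test = [r for r in rows if r[group_key] in test_groups]
    return train, test


# Hypothetical toy data: 12 daily observations spread over 4 customers.
rows = [{"day": d, "customer": d % 4, "y": d * d} for d in range(12)]

train_t, test_t = time_ordered_split(rows, "day")
train_g, test_g = grouped_split(rows, "customer")
```

With the time-ordered split, every test row is later than every training row. With the grouped split, no customer contributes rows to both sides. Which of the two is appropriate (if either) depends on how the future data is expected to differ from the training data.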

Categories
Data Science Musings Statistics Writing

Recent post on Win-Vector blog, plus some musings on Audience

I put a new post up on Win-Vector a couple of days ago called “The Geometry of Classifiers”, a follow-up post to a recent paper by Fernandez-Delgado, et al. that investigates several classifiers against a body of data sets, mostly from the UCI Machine Learning Repository. Our article follows up the study with seven […]

Categories
Data Science Statistics Writing

New article up on Win-Vector — Vtreat: a package for variable treatment

We are writing an R package to implement some of the data treatment practices that we discuss in Chapters 4 and 6 of Practical Data Science with R. There’s an article describing the package up on the Win-Vector blog: When you apply machine learning algorithms on a regular basis, on a wide variety of data […]

Categories
Musings Writing

Follow me via RSS!

I went back to using RSS to follow blogs and other websites recently; I don’t know why I ever stopped. My email doesn’t get clogged by notifications anymore, and I don’t lose blog updates in the ever-flowing stream of Twitter or Facebook or the WordPress reader. I can follow any blog on any platform as […]