I have a new article up on Win-Vector discussing differential privacy and recent results on applying it to enable reuse of holdout data in machine learning. Differential privacy was originally developed to facilitate secure analysis over sensitive data, with mixed success. It's back in the news again now, with exciting results from […]
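As background for readers new to the idea: the classic way to make a counting query differentially private is to add Laplace noise scaled to the query's sensitivity. The sketch below is illustrative only, not the mechanism from the holdout-reuse paper; the function names and the inverse-CDF sampling trick are my own choices.

```python
import math
import random

def laplace_noise(scale):
    # Sample from Laplace(0, scale) via the inverse-CDF transform:
    # draw U uniform on (-1/2, 1/2), return -scale * sign(U) * ln(1 - 2|U|).
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon):
    # A counting query has sensitivity 1 (adding or removing one record
    # changes the count by at most 1), so Laplace(1/epsilon) noise gives
    # epsilon-differential privacy for this single query.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

Smaller epsilon means more noise and stronger privacy; repeated queries consume privacy budget, which is exactly the accounting problem the holdout-reuse results address.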

# Tag: machine learning

I’ve been neglecting to announce my Win-Vector posts here — but I’ve not stopped writing them. Here are the two most recent: Wanted: A Perfect Scatterplot (with Marginals) In which I explore how to make what Matlab calls a “scatterhist”: a scatterplot, with marginal distribution plots on the sides. My version optionally adds the best […]

We’ve been wanting to get more into training over at Win-Vector, but I don’t want to completely give up client work, because clients and their problems are often the inspiration for cool solutions — and good blog articles. Working on the video course for the last couple of months has given me some good ideas, […]

## New Data Science Video Course

John Mount and I are proud to announce our new data science video course, Introduction to Data Science, now available through Udemy! The course comprises 28 lectures, totaling over five hours. We cover the use of common predictive modeling algorithms in R, including linear and logistic regression, random forest, and gradient boosting. We also […]

## Two New Articles

Two new articles, one on the Win-Vector blog and a guest post on the Fliptop blog: Random Test/Train Split is not Always Enough discusses the potential limitations of a randomized test/train split when your training data and future data are not truly exchangeable, due to time-dependent effects, serial correlation, concept changes, or data-grouping. Don’t […]