Data Science Statistics

The vtreat package two ways

We recently gave two talks on our vtreat data treatment package: one on the Python version and one on the R version. If you fit machine learning models on messy real-world data, you might find vtreat useful. Do check out one of the introductory talks below. Preparing Messy Data for Supervised […]

Data Science Musings Statistics Writing

Recent post on Win-Vector blog, plus some musings on Audience

I put a new post up on Win-Vector a couple of days ago called “The Geometry of Classifiers”, a follow-up to a recent paper by Fernandez-Delgado et al. that evaluates several classifiers on a body of data sets, mostly from the UCI Machine Learning Repository. Our article follows up the study with seven […]