Categories: Data Science, Statistics

The vtreat package two ways

We recently did a couple of talks about our vtreat data treatment package: one for the Python version, and one for the R version. If you are fitting machine learning models on messy real-world data, then you might find vtreat useful. Do check out one of the introductory talks below. Preparing Messy Data for Supervised […]

Categories: Data Science, Statistics

vtreat library up on CRAN

Our R variable treatment library vtreat has been accepted by CRAN! The purpose of the vtreat library is to reliably prepare data for supervised machine learning. We try to leave as much as possible to the machine learning algorithms themselves, but cover most of the truly necessary, yet typically ignored, precautions. The library is designed to […]

Categories: Data Science, Statistics, Writing

New article up on Win-Vector: vtreat, a package for variable treatment

We are writing an R package to implement some of the data treatment practices that we discuss in Chapters 4 and 6 of Practical Data Science with R. An article describing the package is up on the Win-Vector blog: When you apply machine learning algorithms on a regular basis, on a wide variety of data […]