Categories
Data Science Statistics

New Win-Vector Package replyr: for easier dplyr

Using dplyr with a specific data frame, where all the columns are known, is an effective and pleasant way to execute declarative (SQL-like) operations on data frames and data-frame-like objects in R. It also has the advantage of working not only on local data, but also on dplyr-supported remote data stores, such as SQL databases or Spark. […]

Categories
Data Science Statistics

Principal Components Regression: A Three-Part Series and Upcoming Talk

Well, since the last time I posted here, the Y-Aware PCR series has grown to three parts! I’m pleased with how it came out. The three parts are as follows: Part 1: A review of standard “x-only” PCR, with a worked example. I also show some issues that can arise with the standard approach. Part […]
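As a rough illustration of what "x-only" means, here is a minimal Python/NumPy sketch. It is not the code from the series; the function names (`x_only_pcr`, `predict`, `r_squared`) and the synthetic two-column data are hypothetical. The principal directions are computed from X alone, and y is then regressed on the top-k component scores. The data are deliberately chosen so the highest-variance column is pure noise while a low-variance column drives y, which shows one way the standard approach can go wrong: keeping only the first component throws the signal away.

```python
import numpy as np

def x_only_pcr(X, y, k):
    """Standard ("x-only") PCR: pick components from X, then regress y on them."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Principal directions come from the SVD of the centered X; y plays no role here.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k].T            # shape (p, k): top-k principal directions
    scores = Xc @ components         # projections of each row onto those directions
    # Ordinary least squares of y on the component scores, with an intercept.
    design = np.column_stack([np.ones(len(y)), scores])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return mu, components, beta[1:], beta[0]

def predict(model, X):
    mu, components, coef, intercept = model
    return intercept + (X - mu) @ components @ coef

def r_squared(y, pred):
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

# Hypothetical synthetic data: a high-variance noise column and a
# low-variance column that actually determines y.
rng = np.random.default_rng(0)
n = 500
noise = rng.normal(scale=10.0, size=n)
signal = rng.normal(scale=1.0, size=n)
X = np.column_stack([noise, signal])
y = 3.0 * signal + rng.normal(scale=0.1, size=n)

model_1 = x_only_pcr(X, y, k=1)   # keeps only the noise-dominated direction
model_2 = x_only_pcr(X, y, k=2)   # keeps every direction (plain regression)
print(round(r_squared(y, predict(model_1, X)), 3))
print(round(r_squared(y, predict(model_2, X)), 3))
```

With one component the fit explains almost none of y's variance, while keeping both components recovers it; the component choice was driven by the variance of X, not by its relevance to y.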

Categories
Data Science Statistics

Principal Components Regression: A Two-Part Series

I’m kicking off a two-part series on Principal Components Regression on the Win-Vector blog today. The first article demonstrates some of the pitfalls of using standard Principal Components Analysis in a predictive modeling context. John Mount has posted an introduction to my first article on the Revolutions blog, explaining our motivation in developing this series. […]

Categories
Data Science

Starting Strong in 2016

We had a busy January here at Win-Vector, and it shows no sign of abating. John and I had the pleasure of attending the first Shiny Developers Conference, held by RStudio and hosted at Stanford University (a fellow attendee has posted a review of the conference). The event energized us to resharpen our […]

Categories
Data Science Statistics

vtreat library up on CRAN

Our R variable treatment library vtreat has been accepted by CRAN! The purpose of the vtreat library is to reliably prepare data for supervised machine learning. We try to leave as much as possible to the machine learning algorithms themselves, but cover most of the truly necessary, typically ignored precautions. The library is designed to […]
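One example of the kind of precaution meant here is handling missing numeric values so a downstream model can still use the row and can also learn whether "missing" itself is informative. The sketch below is in Python and is not vtreat's API or its exact procedure; the function name `treat_numeric` is hypothetical. It fills missing entries with the mean of the observed values and adds a 0/1 indicator recording where the value was missing.

```python
import math

def treat_numeric(values):
    """Fill missing entries (None or NaN) with the mean of the observed values,
    and return a parallel 0/1 indicator marking which entries were missing.

    Assumes at least one observed (non-missing) value.
    """
    def is_missing(v):
        return v is None or math.isnan(v)

    observed = [v for v in values if not is_missing(v)]
    fill = sum(observed) / len(observed)
    cleaned, flags = [], []
    for v in values:
        missing = is_missing(v)
        cleaned.append(fill if missing else v)
        flags.append(1 if missing else 0)
    return cleaned, flags

cleaned, flags = treat_numeric([1.0, None, 3.0, float("nan")])
print(cleaned)  # [1.0, 2.0, 3.0, 2.0]
print(flags)    # [0, 1, 0, 1]
```

Keeping the indicator alongside the filled-in column means the mean imputation does not silently erase the information that a value was absent.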