Categories
Data Science Science Statistics

A Trunkful of Win-Vector R Packages

If you follow the Win-Vector blog, you know that we have developed a number of R packages that encapsulate our data science working process and philosophy. The biggest package, of course, is our data preparation package, vtreat, which implements many of the data treatment principles that I describe in my white paper.

Categories
Data Science Statistics

New Win-Vector Package replyr: for easier dplyr

Using dplyr with a specific data frame, where all the columns are known, is an effective and pleasant way to execute declarative (SQL-like) operations on data frames and data-frame-like objects in R. It also has the advantage of working not only on local data, but also on dplyr-supported remote data stores, like SQL databases or Spark. […]

Categories
Data Science Statistics

Upcoming Talks

I will be speaking at the Women Who Code Silicon Valley meetup on Thursday, October 27. The talk is called “Improving Prediction using Nested Models and Simulated Out-of-Sample Data.” In this talk I will discuss nested predictive models. These are models that predict an outcome or dependent variable (called y) using additional submodels that have […]

Categories
Data Science Musings Statistics

Practical Data Science with R now in Chinese Translation!

Our publisher, Manning, has kindly sent us complimentary copies of the new Simplified Chinese translation of Practical Data Science with R. We can’t read it, of course, but it’s cool (and a bit intimidating) to see what our work looks like in another language and character set. Here are a couple of peeks inside, just […]

Categories
Data Science Statistics

Principal Components Regression: A Three-Part Series and Upcoming Talk

Well, since the last time I posted here, the Y-Aware PCR series has grown to three parts! I’m pleased with how it came out. The three parts are as follows: Part 1: A review of standard “x-only” PCR, with a worked example. I also show some issues that can arise with the standard approach. Part […]