Principal Components Regression: A Two-Part Series
May 17, 2016
I’m kicking off a two-part series on Principal Components Regression on the Win-Vector blog today. The first article demonstrates some of the pitfalls of using standard Principal Components Analysis in a predictive modeling context. John Mount has posted an introduction to my first article on the Revolutions blog, explaining our motivation in developing this series.
The second article will demonstrate some y-aware approaches that alleviate the issues we point out in Part 1.
In principal components regression (PCR), we use principal components analysis (PCA) to decompose the independent (x) variables into an orthogonal basis (the principal components), and select a subset of those components as the variables to predict y. PCR and PCA are useful techniques for dimensionality reduction when modeling, and are especially useful when the independent variables are highly collinear.
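As a rough illustration of the procedure just described, here is a minimal PCR sketch in Python/NumPy. It assumes PCA is computed by an SVD of the centered (but unscaled) x variables, and that y is fit by ordinary least squares on the kept component scores. The function names `pcr_fit` and `pcr_predict` and the synthetic data are hypothetical, chosen for illustration; this is not the code used in the series.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal components regression: regress y on the top-k PCs of X."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                       # PCA assumes centered inputs
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                        # p x k: loadings of the k kept components
    Z = Xc @ V_k                          # n x k: component scores (orthogonal columns)
    Z1 = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(Z1, y, rcond=None)  # least squares on the scores
    return x_mean, V_k, coef

def pcr_predict(model, X):
    x_mean, V_k, coef = model
    Z = (X - x_mean) @ V_k                # project new data onto the kept components
    return coef[0] + Z @ coef[1:]

# Hypothetical data: two nearly collinear inputs that both track a latent signal t
rng = np.random.default_rng(0)
n = 500
t = rng.normal(size=n)
X = np.column_stack([t + 0.05 * rng.normal(size=n),
                     t + 0.05 * rng.normal(size=n)])
y = 3.0 * t + 0.1 * rng.normal(size=n)

model = pcr_fit(X, y, k=1)                # one component stands in for two collinear inputs
pred = pcr_predict(model, X)
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"in-sample R^2 with k=1: {r2:.3f}")
```

Because the two inputs are nearly identical, a single component captures almost all of their shared variation, and the regression on that one score avoids the unstable coefficients that collinear inputs can produce in ordinary least squares.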
Generally, one selects the principal components with the highest variance (that is, the components with the largest singular values), because the subspace they span captures most of the variation in the data, and so gives a smaller space that we believe retains most of the data's important structure. Note, however, that standard PCA is an "x-only" decomposition: it never looks at y. As Jolliffe (1982) shows through examples from the literature, lower-variance components can sometimes be critical for predicting y, and conversely, high-variance components are sometimes unimportant.
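The following small synthetic example sketches how this can happen. It is not one of Jolliffe's examples; the data, the scales, and the helper names `explained_variance_ratio` and `pcr_r2` are hypothetical. One input has large variance but is unrelated to y, while a second, low-variance input drives y. The fraction of variance carried by each component is its squared singular value divided by the sum of squared singular values, so the variance-based ranking puts the irrelevant input first.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total x-variance carried by each principal component."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, largest first
    return s ** 2 / np.sum(s ** 2)

def pcr_r2(X, y, k):
    """In-sample R^2 of least squares on the top-k principal component scores."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = np.column_stack([np.ones(len(y)), Xc @ Vt[:k].T])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n = 1000
x_big = 10.0 * rng.normal(size=n)    # high-variance input, irrelevant to y
x_small = 1.0 * rng.normal(size=n)   # low-variance input that actually drives y
X = np.column_stack([x_big, x_small])
y = 2.0 * x_small + 0.1 * rng.normal(size=n)

ratio = explained_variance_ratio(X)  # first component dominates the x-variance
r2_top1 = pcr_r2(X, y, k=1)          # keep only the highest-variance component
r2_top2 = pcr_r2(X, y, k=2)          # keep both components
print(f"variance ratios: {ratio.round(3)}")
print(f"k=1: R^2 = {r2_top1:.3f}; k=2: R^2 = {r2_top2:.3f}")
```

Keeping only the top component retains roughly 99% of the x-variance yet explains almost none of y; the discarded low-variance component holds nearly all of the predictive signal. Because the selection step never consults y, nothing in x-only PCA warns us about this.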