Two New Articles
January 17, 2015
Two new articles, one on the Win-Vector blog, plus a guest post on the Fliptop blog:
Random Test/Train Split is not Always Enough discusses the potential limitations of a randomized test/train split when your training data and future data are not truly exchangeable, due to time-dependent effects, serial correlation, concept changes, or data grouping.
Don’t Use Black-Box Testing to Select a Predictive Lead Scoring Vendor is a commissioned piece for one of our clients, hosted on their blog. It is related to the first post: if you are running an evaluation of a potential vendor’s decision system, then that test should reflect the environment in which the decision system will be deployed. In particular, if your data has any of the non-exchangeability properties that we discuss above, then your evaluation setup should reflect that.
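To make the splitting concern concrete, here is a minimal sketch of two alternatives to a purely random split: one that holds out the most recent rows, and one that holds out whole groups. It is an illustration under stated assumptions, not the procedure from either article. The function names, the "day" and "customer" fields, and the toy rows are all hypothetical, and only the Python standard library is used.

```python
import random


def time_ordered_split(rows, time_key, test_fraction=0.25):
    """Train on the earliest rows and test on the latest ones."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = int(round(len(ordered) * (1 - test_fraction)))
    return ordered[:cut], ordered[cut:]


def grouped_split(rows, group_key, test_fraction=0.25, seed=0):
    """Hold out entire groups so no group appears in both train and test."""
    groups = sorted({r[group_key] for r in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, int(round(len(groups) * test_fraction)))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if r[group_key] not in test_groups]
    test = [r for r in rows if r[group_key] in test_groups]
    return train, test


# Hypothetical toy data: 8 observations over 8 days from 4 customers.
rows = [{"day": d, "customer": "c" + str(d % 4)} for d in range(8)]

# Time-ordered: every test row is at least as late as every training row.
train, test = time_ordered_split(rows, "day")

# Grouped: each customer lands entirely in train or entirely in test.
g_train, g_test = grouped_split(rows, "customer")
```

Which split is appropriate depends on how the model will be used: a time-ordered holdout mimics scoring future data, while a grouped holdout mimics scoring customers the model has never seen.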