Two New Articles

We have two new articles: one on the Win-Vector blog, and a guest post on the Fliptop blog.

Random Test/Train Split is not Always Enough discusses the limitations of a randomized test/train split when your training data and future data are not truly exchangeable, whether because of time-dependent effects, serial correlation, concept changes, or data grouping.
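To make the distinction concrete, here is a minimal sketch of two alternatives to a uniform random split: a split by time order and a split by group. It is not code from the article. The function names, the `day` and `customer` fields, and the toy data are all hypothetical, chosen only for illustration.

```python
import random

def random_split(rows, test_fraction=0.25, seed=0):
    """Uniform random split: reasonable when rows are exchangeable."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def time_split(rows, time_key, test_fraction=0.25):
    """Train on the earliest rows and test on the latest ones."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    n_test = int(round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

def group_split(rows, group_key, test_fraction=0.25, seed=0):
    """Assign whole groups to train or test so no group appears in both."""
    groups = sorted({r[group_key] for r in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, int(round(len(groups) * test_fraction)))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if r[group_key] not in test_groups]
    test = [r for r in rows if r[group_key] in test_groups]
    return train, test

# Hypothetical toy data: 20 daily rows spread over 5 customers.
rows = [{"day": d, "customer": d % 5, "y": d * 0.1} for d in range(20)]

train_t, test_t = time_split(rows, "day")        # test rows are all later than train rows
train_g, test_g = group_split(rows, "customer")  # no customer is in both sets
```

With a time split, the test set plays the role of "future" data; with a group split, the test set plays the role of customers the model has never seen. Which one is appropriate depends on how the model will actually be used.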

Don’t Use Black-Box Testing to Select a Predictive Lead Scoring Vendor is a commissioned piece for one of our clients, hosted on their blog. It is related to the first post: if you are evaluating a potential vendor’s decision system, the test should reflect the environment in which that system will be deployed. In particular, if your data has any of the non-exchangeability properties discussed above, your evaluation setup should account for them.

About nzumel
I dance. I'm a data scientist. I'm a dancing data scientist. In my spare time, I like to read folklore (and research about folklore), ghost stories, and random cognitive science papers, and sometimes to blog about it all.

