A Couple Recent Win-Vector Posts

I’ve been neglecting to announce my Win-Vector posts here — but I’ve not stopped writing them. Here are the two most recent:

Wanted: A Perfect Scatterplot (with Marginals)

In which I explore how to make what Matlab calls a “scatterhist”: a scatterplot with marginal distribution plots on the sides. My version optionally adds the best linear fit to the scatterplot.


I also show how to do it with ggMarginal(), from the ggExtra package.

Working with Sessionized Data 1: Evaluating Hazard Models

This is the start of a mini-series of posts, discussing the analysis of sessionized log data.


Log data is a very thin data form where different facts about different individuals are written across many different rows. Converting log data into a ready-for-analysis form is called sessionizing. We are going to share a short series of articles showing important aspects of sessionizing and modeling log data. Each article will touch on one aspect of the problem in a simplified and idealized setting. In this article we will discuss the importance of dealing with time and of picking a business-appropriate goal when evaluating predictive models.
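To make the idea of sessionizing concrete, here is a minimal sketch in Python. It is not the code from the Win-Vector article. The log layout, the event names, the `sessionize` function, and the `cutoff_day` parameter are all made up for illustration. It collapses thin log rows into one row per individual, and it respects time by building features only from events on or before a cutoff and the outcome only from events after it.

```python
from collections import defaultdict

# Hypothetical log rows: (user_id, day, event). One fact per row.
LOG = [
    ("u1", 1, "visit"),
    ("u2", 1, "visit"),
    ("u1", 2, "search"),
    ("u1", 5, "purchase"),
    ("u2", 6, "visit"),
]


def sessionize(log, cutoff_day):
    """Collapse log rows into one analysis-ready row per user.

    Features use only events on or before cutoff_day; the outcome
    (did the user purchase later?) uses only events after it.
    """
    rows = defaultdict(
        lambda: {"n_events": 0, "last_day": None, "purchased_later": False}
    )
    # Process events in time order so last_day ends up as the latest one.
    for user, day, event in sorted(log, key=lambda r: r[1]):
        row = rows[user]
        if day <= cutoff_day:
            row["n_events"] += 1
            row["last_day"] = day
        elif event == "purchase":
            row["purchased_later"] = True
    return dict(rows)


table = sessionize(LOG, cutoff_day=3)
for user, row in sorted(table.items()):
    print(user, row)
```

Keeping the features and the outcome on opposite sides of the cutoff is one simple way to avoid letting information from the future leak into a model's inputs.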

Click on the links to read.


