I’ve been neglecting to announce my Win-Vector posts here, but I’ve not stopped writing them. Here are the two most recent:
In which I explore how to make what Matlab calls a “scatterhist”: a scatterplot with marginal distribution plots on the sides. My version optionally adds the best linear fit to the scatterplot:
I also show how to do it with
ggMarginal(), from the
This is the start of a mini-series of posts, discussing the analysis of sessionized log data.
Log data is a very thin data form where different facts about different individuals are written across many different rows. Converting log data into a ready-for-analysis form is called sessionizing. We are going to share a short series of articles showing important aspects of sessionizing and modeling log data. Each article will touch on one aspect of the problem in a simplified and idealized setting. In this article we will discuss the importance of dealing with time and of picking a business-appropriate goal when evaluating predictive models.
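To make the idea of sessionizing concrete, here is a minimal sketch in Python (standard library only). It is not the code from the linked article: the log rows, the field names, and the cutoff date are all made up for illustration. It collapses thin log rows into one wide row per user, and it builds features only from events before a cutoff date while taking the outcome only from events on or after it, which is one simple way to respect time.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log rows (user_id, date, event); invented data, not from the article.
LOG_ROWS = [
    ("u1", "2015-03-01", "visit"),
    ("u1", "2015-03-03", "visit"),
    ("u1", "2015-03-09", "purchase"),
    ("u2", "2015-03-02", "visit"),
    ("u2", "2015-03-04", "purchase"),
    ("u3", "2015-03-08", "visit"),
]


def sessionize(rows, cutoff):
    """Collapse thin log rows into one wide row per user.

    Features count only events strictly before `cutoff`; the outcome
    looks only at events on or after it, so features never see the future.
    """
    cutoff_date = datetime.strptime(cutoff, "%Y-%m-%d")
    by_user = defaultdict(
        lambda: {"visits_before": 0, "purchases_before": 0, "purchased_after": False}
    )
    for user_id, stamp, event in rows:
        when = datetime.strptime(stamp, "%Y-%m-%d")
        row = by_user[user_id]  # creates the user's row on first sight
        if when < cutoff_date:
            if event == "visit":
                row["visits_before"] += 1
            elif event == "purchase":
                row["purchases_before"] += 1
        elif event == "purchase":
            row["purchased_after"] = True
    return dict(by_user)


# Assumed cutoff date, chosen only to split the toy data above.
SESSIONS = sessionize(LOG_ROWS, cutoff="2015-03-05")
for user_id, row in sorted(SESSIONS.items()):
    print(user_id, row)
```

Each user ends up with exactly one row, which is the shape most modeling tools expect. Moving the cutoff changes both the features and the outcome, so the choice of cutoff is part of defining the prediction goal.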
Click on the links to read.