John Oliver on Scientific Studies

An excellent rant from John Oliver on the way science stories are handled in the media, on the need for some healthy skepticism, and on the need to track down the sources for the studies yourself, to the extent that this is possible.

Also, I love the “TODD Talks” skit at the end.

Upcoming Webinar: Data Preparation with R

I’m happy to announce my upcoming webinar, sponsored by Microsoft Data Science:

Data Preparation with R
Thursday, March 17, 2016 10:00 A.M. – 11:00 A.M. (Pacific time)

Data quality is the single most important factor in the success of your data science project. Preparing data for analysis is one of the most important, laborious, and yet neglected aspects of data science. Many of the routine steps can be automated in a principled manner. This webinar will lay out the statistical fundamentals of preparing data. Our speaker, Nina Zumel, principal consultant and co-founder of Win-Vector, LLC, will cover what goes wrong with data and how you can detect the problems and fix them.

Details and registration here. I’m looking forward to it!

Starting Strong in 2016

Image: “We Can Do It!”

We had a busy January here at Win-Vector, and it shows no sign of abating. John and I had the pleasure of attending the first Shiny Developers Conference, held by RStudio and hosted at Stanford University (see here for a review of the conference, by a fellow attendee). The event energized us to resharpen our Shiny skills, and I’ve put together a little gallery of the Shiny apps that we’ve developed and featured on the Win-Vector blog. It’s a small gallery at the moment, but I expect it will grow.

In addition, I gave a repeat presentation of the Differential Privacy talk that I gave to the Bay Area Women in Data Science and Machine Learning Meetup last December, and am gearing up for a planned webinar on Prepping Data for Analysis in R (the webinar has not yet been announced by the hosts — more details soon).

And I’ve managed to slip in a couple of Win-Vector blog posts, too:

Using PostgreSQL in R: A quick how-to

Finding the K in K-means by Parametric Bootstrap (with Shiny app!)

We are also looking forward to giving a presentation at the ODSC San Francisco Meetup on March 31, and participating in the R Day all-day tutorial at Strata/Hadoop World Santa Clara on March 29.

2016 is shaping up to be a good year.


Image: World War II era poster by J. Howard Miller. Source: Wikipedia

Upcoming Appearances

We have two public appearances coming up in the next few weeks:

Workshop at ODSC, San Francisco – November 14

John and I will be giving a two-hour workshop called Preparing Data for Analysis using R: Basic through Advanced Techniques. We will cover key issues in this important but often neglected aspect of data science, what can go wrong, and how to fix it. This is part of the Open Data Science Conference (ODSC) at the Marriott Waterfront in Burlingame, California, November 14-15. If you are attending this conference, we look forward to seeing you there!

You can find an abstract for the workshop, along with links to software and code you can download ahead of time, here.

An Introduction to Differential Privacy as Applied to Machine Learning: Women in ML/DS – December 2

I will give a talk to the Bay Area Women in Machine Learning & Data Science Meetup group, on applying differential privacy for reusable hold-out sets in machine learning. The talk will also cover the use of differential privacy in effects coding (what we’ve been calling “impact coding”) to reduce the bias that can arise from the use of nested models. Information about the talk, and the meetup group, can be found here.

I’m looking forward to these upcoming appearances, and I hope you can make one or both of them.

Popular Articles on Win-Vector

John has just put up an article on the Win-Vector blog, highlighting some of our popular series of articles, as well as our more popular posts. If you like the articles that I point to on this blog, check out some of the other posts written by John, too.

As readers have surely noticed, the Win-Vector LLC blog isn’t a stream of short notes, but instead a collection of long technical articles. It is the only way we can properly treat topics of consequence.

What not everybody may have noticed is that a number of these articles are organized into series for deeper comprehension.

Our series include:

Check out the original article for more details about these series, and for a pointer to our page of popular posts.

We’ve also updated the company website, so please do visit that, too.