Talks and Presentations

Preparing Messy Data For Supervised Learning (Python)

December 29, 2019

Talk, PyData Los Angeles 2019, Los Angeles, California

An introduction to the principles of the vtreat package for fitting machine learning models on messy real-world data, and to its Python implementation.

Practical Data Science with R

September 03, 2019

Talk, Bay Area R Users Group (BARUG), Oakland, California

A preview of our then about-to-be-released second edition of Practical Data Science with R. We discussed the direction that the R community had taken since our first edition, and how this affected the second edition.

Myths of Data Science: Things You Should and Should Not Believe

November 02, 2017

Talk, ODSC West 2017, San Francisco Bay Area

In this talk, we go back to fundamentals and look closely at some usually unexamined assumptions about statistics and machine learning. We debunk “myths” that arise in common data science tasks, and offer potential fixes for the issues they cause.

Statistically Validate Models with R

March 31, 2016

Talk, San Francisco Data Science ODSC Meetup, San Francisco, California

John Mount and I demonstrate methods to reliably evaluate machine learning models using R and R graphics.

Validating Models in R

March 29, 2016

Talk, R Day, Strata + Hadoop World 2016, San Jose, California

John Mount and I demonstrate a number of techniques, R packages, and code for validating predictive models. Part of the Strata + Hadoop World R Day.

An Introduction to Differential Privacy as Applied to Machine Learning

December 02, 2015

Talk, Bay Area Women in Machine Learning and Data Science Meetup, San Francisco, California

A brief introduction to the ideas behind differential privacy, and a review of how differential privacy can be used to enable safer re-use of holdout data in machine learning.

Prepping Data for Analysis Using R

November 18, 2015

Workshop, ODSC West 2015, San Francisco Bay Area

This workshop (co-presented with John Mount) lays out the fundamentals of preparing data and provides interactive demonstrations in the open-source R analysis environment.