Advanced Data Preparation for Supervised Machine Learning
Talk, Why R? Webinar, Online
An introduction to the principles of the vtreat package for fitting machine learning models on messy real-world data, and to its R implementation.
Talk, PyData Los Angeles 2019, Los Angeles, California
An introduction to the principles of the vtreat package for fitting machine learning models on messy real-world data, and to its Python implementation.
Talk, Bay Area R Users Group (BARUG), Oakland, California
A preview of our then-forthcoming second edition of Practical Data Science with R. We discussed the direction the R community had taken since our first edition, and how that shaped the second edition.
Talk, ODSC West 2017, San Francisco Bay Area
In this talk, we go back to fundamentals and look closely at some usually unexamined assumptions about statistics and machine learning. We debunk "myths" that arise in common data science tasks, and offer potential fixes for the problems they can cause.
Talk, Women Who Code Silicon Valley, Palo Alto, California
A discussion of nested predictive models and how to properly fit them.
Talk, San Francisco Data Science ODSC Meetup, San Francisco, California
John Mount and I demonstrate methods to reliably evaluate machine learning models using R and R graphics.
Talk, R Day, Strata + Hadoop World 2016, San Jose, California
John Mount and I demonstrate a number of techniques, R packages, and code for validating predictive models.
Talk, Bay Area Women in Machine Learning and Data Science Meetup, San Francisco, California
A brief introduction to the ideas behind differential privacy, and a review of how differential privacy can be used to enable safer re-use of holdout data in machine learning.
Workshop, ODSC West 2015, San Francisco Bay Area
This workshop (co-presented with John Mount) lays out the fundamentals of preparing data and provides interactive demonstrations in the open source R analysis environment.