Posts by Category

Back to Where I was Before (Almost)

Published:

Back in the good old days, ninazumel.com was a static site that I maintained myself, in pure HTML. But that (to me) was so much of a hassle that I never did even the little bit of site maintenance that the website required. So I moved it to wordpress.com. Read more

Upcoming Talk, USF Data Science Seminar

Published:

John Mount and I will be giving a talk for the online University of San Francisco Seminar Series in Data Science: Read more

The vtreat package two ways

Published:

We recently did a couple of talks about our vtreat data treatment package: one for the Python version, and one for the R version. If you are fitting machine learning models on messy real-world data, then you might find vtreat useful. Do check out one of the introductory talks below. Read more

WVPlots and Color Controls

Published:

I've put a new release of the WVPlots package up on CRAN. This release adds consistent palette and/or other color controls to most of the functions in the package. Read more

Popular Articles on Win-Vector

Published:

John has just put up an article on the Win-Vector blog, highlighting some of our popular series of articles, as well as our more popular posts. If you like the articles that I point to on this blog, check out some of the other posts written by John, too. Read more

New on Win-Vector: A Simpler Explanation of Differential Privacy

Published:

I have a new article up on Win-Vector, discussing differential privacy and recent results on applying differential privacy to enable reuse of holdout data in machine learning. Read more

New on Win-Vector: Checking your Data for Signal

Published:

I have a new article up on the Win-Vector Blog, on checking your input variables for signal: Read more

New on Win-Vector: Variable Selection for Sessionized Data

Published:

I’ve just put up the next installment of the new “Working with Sessionized Data” series on Win-Vector. Read more

Balancing Classes Before Training Classifiers - Addressing a Folk Theorem

Published:

We’ve been wanting to get more into training over at Win-Vector, but I don’t want to completely give up client work, because clients and their problems are often the inspiration for cool solutions – and good blog articles. Working on the video course for the last couple of months has given me some good ideas, too. Read more

Recent post on Win-Vector blog, plus some musings on Audience

Published:

I put a new post up on Win-Vector a couple of days ago called "The Geometry of Classifiers", a follow-up post to a recent paper by Fernandez-Delgado, et al. that investigates several classifiers against a body of data sets, mostly from the UCI Machine Learning Repository. Our article follows up the study with seven additional classifier implementations from scikit-learn and an interactive Shiny app to explore the results. Read more

Design, Problem Solving, and Good Taste

Published:

I ran across this essay recently on the role of design standards for scientific data visualization. The author, Jure Triglav, draws his inspiration from the creation and continued use of the NYCTA Graphics Standards, which were instituted in the late 1960s to unify the signage for the New York City subway system. Read more

A Moment’s Digression

Published:

I had a data nerd moment while reading a novel the other day. I got in an argument with the book. But I think the book started it. It's a frivolous discussion, probably, but sometimes those are the most fun. Read more

New Article on Bandit Formulations for A/B Testing

Published:

I have a new article up on the Win-Vector blog: Bandit Formulations for A/B Tests: Some Intuition. The article discusses the bandit problem formulation as an alternative to significance-based formulations for A/B tests. Read more

Big News! Practical Data Science with R is content complete!

Published:

It's been a while since I've posted here, but I have good news: the last appendix has gone to the editors. The book is now content complete. What a relief! We are hoping to release the book late in the first quarter of next year. In the meantime, you can still get early drafts of our chapters through Manning’s Early Access program, if you haven’t yet. The link is here. Read more

Book Update, and Thoughts on Topical versus Archival Blogging

Published:

We are sending substantive drafts of the first four chapters of our data science book out for review. Manning, our publisher, hopes to launch the book in their Early Access Program (MEAP) by early May. Crossing our fingers! Read more

Bon Mots from Professor Rota

Published:

As I've posted previously, we are writing a data science book. The preview of the first chapter of our book should come out in about a month or so. We are almost finished with the revisions to the first four chapters, and we've started refining the outline of the next three. Exciting! It happens that I've been rereading mathematician Gian-Carlo Rota's collection of essays, Indiscrete Thoughts, and I've found a few passages that really speak to me, now that I'm in book-writing mode. Enjoy. Read more

On Balance

Published:

One of my favorite cheesy movies is a gem from 1984 called The Adventures of Buckaroo Banzai Across the 8th Dimension. For those who haven't seen it, Buckaroo Banzai is a brilliant young neurosurgeon and particle physicist who spends his days conducting cutting-edge research. At night, he and his research colleagues -- all engineers and scientists and doctors -- rock New Jersey as a band called the Hong Kong Cavaliers. In between the brilliant science and the rock-star night life, the Cavaliers find time to save the world from an alien invasion led by none other than John Lithgow. Read more

Good News: We’re Writing a Book!

Published:

I’m happy to announce that John Mount and I have just signed a contract with Manning Publications to write a book on Data Science. We have both talked about doing this for quite a while, and we are excited that we finally have the opportunity. Read more

On Being a Data Scientist

Published:

When people ask me what it means to be a data scientist, I used to answer, "it means you don't have to hold my hand." By which I meant that as a data scientist (a consulting data scientist), I can handle the data collection, the data cleaning and wrangling, the analysis, and the final presentation of results (both technical and for the business audience) with a minimal amount of assistance from my clients or their people. Not no assistance, of course, but little enough that I'm not interfering too much with their day-to-day job. Read more

Goldbach’s Celestial Atlas

Published:

Christian Goldbach, Prussian mathematician. Probably most famous for the Goldbach conjecture, one of the oldest unsolved problems in mathematics:

Every even integer greater than 2 can be expressed as the sum of two primes.

Mathematics versus Computer Science

Published:

…until the development of computers the possibility of dealing successfully with the complex itself was never really envisaged. Perhaps the most successful substitute for such a possibility, as well as the nearest approach to it, came in mathematics. … To find the simple in the complex, the finite in the infinite -- that is not a bad description of the aim and essence of mathematics.

Bon Mots from Professor Rota

Published:

As I've posted previously, we are writing a data science book. The preview of the first chapter of our book should come out in about a month or so. We are almost finished with the revisions to the first four chapters, and we've started refining the outline of the next three. Exciting! It happens that I've been rereading mathematician Gian-Carlo Rota's collection of essays, Indiscrete Thoughts, and I've found a few passages that really speak to me, now that I'm in book-writing mode. Enjoy. Read more

A Trip to the Virtual Attic

Published:

When the world feels like it’s falling apart around you, it feels good to solve little problems that are completely under your control. And that’s what I’ve been doing this past week. This was originally posted at Multo. Read more

On Persistence and Sincerity

Published:

We’re in the middle of marketing efforts here at Win-Vector, and I’ve just spent a few hours going through the Win-Vector blog so I could update our Popular Articles page (I have to do that for Multo someday, too). Read more

Recent post on Win-Vector blog, plus some musings on Audience

Published:

I put a new post up on Win-Vector a couple of days ago called "The Geometry of Classifiers", a follow-up post to a recent paper by Fernandez-Delgado, et al. that investigates several classifiers against a body of data sets, mostly from the UCI Machine Learning Repository. Our article follows up the study with seven additional classifier implementations from scikit-learn and an interactive Shiny app to explore the results. Read more

Design, Problem Solving, and Good Taste

Published:

I ran across this essay recently on the role of design standards for scientific data visualization. The author, Jure Triglav, draws his inspiration from the creation and continued use of the NYCTA Graphics Standards, which were instituted in the late 1960s to unify the signage for the New York City subway system. Read more

A Moment’s Digression

Published:

I had a data nerd moment while reading a novel the other day. I got in an argument with the book. But I think the book started it. It's a frivolous discussion, probably, but sometimes those are the most fun. Read more

Popularity and Social Networks: Life is still like high school

Published:

I remember setting up the Multo blog a few years ago: my first blog explicitly meant for public consumption. On the "Follow" widget -- the button that allows readers to follow a blog via email notifications -- there is an option to show the count of the blog's followers. Read more

Dragons of Probability

Published:

"No insults, please!" said Pugg. "For I am not your usual uncouth pirate, but refined and with a Ph.D. and therefore extremely high-strung."
-- from "The Sixth Sally, or how Trurl and Klapaucius Created a Demon of the Second Kind to Defeat the Pirate Pugg"

Mathematics versus Computer Science

Published:

…until the development of computers the possibility of dealing successfully with the complex itself was never really envisaged. Perhaps the most successful substitute for such a possibility, as well as the nearest approach to it, came in mathematics. … To find the simple in the complex, the finite in the infinite -- that is not a bad description of the aim and essence of mathematics.

Book Update, and Thoughts on Topical versus Archival Blogging

Published:

We are sending substantive drafts of the first four chapters of our data science book out for review. Manning, our publisher, hopes to launch the book in their Early Access Program (MEAP) by early May. Crossing our fingers! Read more

Bon Mots from Professor Rota

Published:

As I've posted previously, we are writing a data science book. The preview of the first chapter of our book should come out in about a month or so. We are almost finished with the revisions to the first four chapters, and we've started refining the outline of the next three. Exciting! It happens that I've been rereading mathematician Gian-Carlo Rota's collection of essays, Indiscrete Thoughts, and I've found a few passages that really speak to me, now that I'm in book-writing mode. Enjoy. Read more

What’s Wrong with a Low(er)-Stress Job?

Published:

So there's this article that's been making the rounds called "The 10 Least Stressful Jobs of 2013"; perhaps you've read it. I don't normally bother with articles like that, but it came to my attention because some of my old graduate-school friends (who are professors) threw a mini-rant on social media over the fact that University Professor is the Number One least stressful job of the year, according to the article. And just now, I tripped over a blog post where a librarian takes umbrage over the fact that they are also on the list. Read more

On Balance

Published:

One of my favorite cheesy movies is a gem from 1984 called The Adventures of Buckaroo Banzai Across the 8th Dimension. For those who haven't seen it, Buckaroo Banzai is a brilliant young neurosurgeon and particle physicist who spends his days conducting cutting-edge research. At night, he and his research colleagues -- all engineers and scientists and doctors -- rock New Jersey as a band called the Hong Kong Cavaliers. In between the brilliant science and the rock-star night life, the Cavaliers find time to save the world from an alien invasion led by none other than John Lithgow. Read more

I Write, Therefore I Think

Published:

I came across an interesting article in The Atlantic a little while back that discussed the connection between writing and thinking. New Dorp, a Staten Island high school in a poor and working-class neighborhood, was able to improve student performance when they realized that their students couldn’t write. These underperforming students often could read and could do math. The majority of them were well-behaved, and seemed to want to learn. Yet they couldn't pass standard proficiency tests, and couldn't graduate. All because they couldn't form complex sentences. Read more

On Being a Data Scientist

Published:

When people ask me what it means to be a data scientist, I used to answer, "it means you don't have to hold my hand." By which I meant that as a data scientist (a consulting data scientist), I can handle the data collection, the data cleaning and wrangling, the analysis, and the final presentation of results (both technical and for the business audience) with a minimal amount of assistance from my clients or their people. Not no assistance, of course, but little enough that I'm not interfering too much with their day-to-day job. Read more

Design, Problem Solving, and Good Taste

Published:

I ran across this essay recently on the role of design standards for scientific data visualization. The author, Jure Triglav, draws his inspiration from the creation and continued use of the NYCTA Graphics Standards, which were instituted in the late 1960s to unify the signage for the New York City subway system. Read more

What is Verification by Multiplicity?

Published:

There's been a buzz the last few days about the 715 new planets that NASA has verified, using data from the Kepler Space Telescope. This discovery doubles the number of known planets, and turned up four new planets that could possibly support life. Read more

Popularity and Social Networks: Life is still like high school

Published:

I remember setting up the Multo blog a few years ago: my first blog explicitly meant for public consumption. On the "Follow" widget -- the button that allows readers to follow a blog via email notifications -- there is an option to show the count of the blog's followers. Read more

Goldbach’s Celestial Atlas

Published:

Christian Goldbach, Prussian mathematician. Probably most famous for the Goldbach conjecture, one of the oldest unsolved problems in mathematics:

Every even integer greater than 2 can be expressed as the sum of two primes.

Dragons of Probability

Published:

"No insults, please!" said Pugg. "For I am not your usual uncouth pirate, but refined and with a Ph.D. and therefore extremely high-strung."
-- from "The Sixth Sally, or how Trurl and Klapaucius Created a Demon of the Second Kind to Defeat the Pirate Pugg"

On Writing Technical Articles for the Nonspecialist

Published:

I came across a post from Emily Willingham the other day: "Is a PhD required for Good Science Writing?". As a science writer with a science PhD, her answer is: it is not required, and it can often be an impediment. I saw a similar sentiment echoed once by Lee Gutkind, the founder and editor of the journal Creative Nonfiction. I don't remember exactly what he wrote, but it was something to the effect that scientists are exactly the wrong people to produce literary, accessible writing about matters scientific. Read more

The vtreat package two ways

Published:

We recently did a couple of talks about our vtreat data treatment package: one for the Python version, and one for the R version. If you are fitting machine learning models on messy real-world data, then you might find vtreat useful. Do check out one of the introductory talks below. Read more

WVPlots and Color Controls

Published:

I've put a new release of the WVPlots package up on CRAN. This release adds consistent palette and/or other color controls to most of the functions in the package. Read more

Popular Articles on Win-Vector

Published:

John has just put up an article on the Win-Vector blog, highlighting some of our popular series of articles, as well as our more popular posts. If you like the articles that I point to on this blog, check out some of the other posts written by John, too. Read more

New on Win-Vector: A Simpler Explanation of Differential Privacy

Published:

I have a new article up on Win-Vector, discussing differential privacy and recent results on applying differential privacy to enable reuse of holdout data in machine learning. Read more

New on Win-Vector: Checking your Data for Signal

Published:

I have a new article up on the Win-Vector Blog, on checking your input variables for signal: Read more

New on Win-Vector: Variable Selection for Sessionized Data

Published:

I’ve just put up the next installment of the new “Working with Sessionized Data” series on Win-Vector. Read more

Balancing Classes Before Training Classifiers - Addressing a Folk Theorem

Published:

We’ve been wanting to get more into training over at Win-Vector, but I don’t want to completely give up client work, because clients and their problems are often the inspiration for cool solutions – and good blog articles. Working on the video course for the last couple of months has given me some good ideas, too. Read more

Recent post on Win-Vector blog, plus some musings on Audience

Published:

I put a new post up on Win-Vector a couple of days ago called "The Geometry of Classifiers", a follow-up post to a recent paper by Fernandez-Delgado, et al. that investigates several classifiers against a body of data sets, mostly from the UCI Machine Learning Repository. Our article follows up the study with seven additional classifier implementations from scikit-learn and an interactive Shiny app to explore the results. Read more

Design, Problem Solving, and Good Taste

Published:

I ran across this essay recently on the role of design standards for scientific data visualization. The author, Jure Triglav, draws his inspiration from the creation and continued use of the NYCTA Graphics Standards, which were instituted in the late 1960s to unify the signage for the New York City subway system. Read more

A Moment’s Digression

Published:

I had a data nerd moment while reading a novel the other day. I got in an argument with the book. But I think the book started it. It's a frivolous discussion, probably, but sometimes those are the most fun. Read more

New Article on Bandit Formulations for A/B Testing

Published:

I have a new article up on the Win-Vector blog: Bandit Formulations for A/B Tests: Some Intuition. The article discusses the bandit problem formulation as an alternative to significance-based formulations for A/B tests. Read more

What is Verification by Multiplicity?

Published:

There's been a buzz the last few days about the 715 new planets that NASA has verified, using data from the Kepler Space Telescope. This discovery doubles the number of known planets, and turned up four new planets that could possibly support life. Read more

Big News! Practical Data Science with R is content complete!

Published:

It's been a while since I've posted here, but I have good news: the last appendix has gone to the editors. The book is now content complete. What a relief! We are hoping to release the book late in the first quarter of next year. In the meantime, you can still get early drafts of our chapters through Manning’s Early Access program, if you haven’t yet. The link is here. Read more

Dragons of Probability

Published:

"No insults, please!" said Pugg. "For I am not your usual uncouth pirate, but refined and with a Ph.D. and therefore extremely high-strung."
-- from "The Sixth Sally, or how Trurl and Klapaucius Created a Demon of the Second Kind to Defeat the Pirate Pugg"

Good News: We’re Writing a Book!

Published:

I’m happy to announce that John Mount and I have just signed a contract with Manning Publications to write a book on Data Science. We have both talked about doing this for quite a while, and we are excited that we finally have the opportunity. Read more

On Being a Data Scientist

Published:

When people ask me what it means to be a data scientist, I used to answer, "it means you don't have to hold my hand." By which I meant that as a data scientist (a consulting data scientist), I can handle the data collection, the data cleaning and wrangling, the analysis, and the final presentation of results (both technical and for the business audience) with a minimal amount of assistance from my clients or their people. Not no assistance, of course, but little enough that I'm not interfering too much with their day-to-day job. Read more

On Writing Technical Articles for the Nonspecialist

Published:

I came across a post from Emily Willingham the other day: "Is a PhD required for Good Science Writing?". As a science writer with a science PhD, her answer is: it is not required, and it can often be an impediment. I saw a similar sentiment echoed once by Lee Gutkind, the founder and editor of the journal Creative Nonfiction. I don't remember exactly what he wrote, but it was something to the effect that scientists are exactly the wrong people to produce literary, accessible writing about matters scientific. Read more

On Persistence and Sincerity

Published:

We’re in the middle of marketing efforts here at Win-Vector, and I’ve just spent a few hours going through the Win-Vector blog so I could update our Popular Articles page (I have to do that for Multo someday, too). Read more

New on Win-Vector: Checking your Data for Signal

Published:

I have a new article up on the Win-Vector Blog, on checking your input variables for signal: Read more

New on Win-Vector: Variable Selection for Sessionized Data

Published:

I’ve just put up the next installment of the new “Working with Sessionized Data” series on Win-Vector. Read more

Recent post on Win-Vector blog, plus some musings on Audience

Published:

I put a new post up on Win-Vector a couple of days ago called "The Geometry of Classifiers", a follow-up post to a recent paper by Fernandez-Delgado, et al. that investigates several classifiers against a body of data sets, mostly from the UCI Machine Learning Repository. Our article follows up the study with seven additional classifier implementations from scikit-learn and an interactive Shiny app to explore the results. Read more

Popularity and Social Networks: Life is still like high school

Published:

I remember setting up the Multo blog a few years ago: my first blog explicitly meant for public consumption. On the "Follow" widget -- the button that allows readers to follow a blog via email notifications -- there is an option to show the count of the blog's followers. Read more

Big News! Practical Data Science with R is content complete!

Published:

It's been a while since I've posted here, but I have good news: the last appendix has gone to the editors. The book is now content complete. What a relief! We are hoping to release the book late in the first quarter of next year. In the meantime, you can still get early drafts of our chapters through Manning’s Early Access program, if you haven’t yet. The link is here. Read more

Book Update, and Thoughts on Topical versus Archival Blogging

Published:

We are sending substantive drafts of the first four chapters of our data science book out for review. Manning, our publisher, hopes to launch the book in their Early Access Program (MEAP) by early May. Crossing our fingers! Read more

Bon Mots from Professor Rota

Published:

As I've posted previously, we are writing a data science book. The preview of the first chapter of our book should come out in about a month or so. We are almost finished with the revisions to the first four chapters, and we've started refining the outline of the next three. Exciting! It happens that I've been rereading mathematician Gian-Carlo Rota's collection of essays, Indiscrete Thoughts, and I've found a few passages that really speak to me, now that I'm in book-writing mode. Enjoy. Read more

On Balance

Published:

One of my favorite cheesy movies is a gem from 1984 called The Adventures of Buckaroo Banzai Across the 8th Dimension. For those who haven't seen it, Buckaroo Banzai is a brilliant young neurosurgeon and particle physicist who spends his days conducting cutting-edge research. At night, he and his research colleagues -- all engineers and scientists and doctors -- rock New Jersey as a band called the Hong Kong Cavaliers. In between the brilliant science and the rock-star night life, the Cavaliers find time to save the world from an alien invasion led by none other than John Lithgow. Read more

Good News: We’re Writing a Book!

Published:

I’m happy to announce that John Mount and I have just signed a contract with Manning Publications to write a book on Data Science. We have both talked about doing this for quite a while, and we are excited that we finally have the opportunity. Read more