There's been a buzz the last few days about the 715 new planets that NASA has verified, using data from the Kepler Space Telescope. This discovery doubles the number of known planets, and turns up four new planets that could possibly support life.
I've put a new release of the WVPlots package up on CRAN. This release adds consistent palette and/or other color controls to most of the functions in the package.
I had a data nerd moment while reading a novel the other day. I got in an argument with the book. But I think the book started it. It's a frivolous discussion, probably, but sometimes those are the most fun.
As I've posted previously, we are writing a data science book. The preview of the first chapter of our book should come out in about a month or so. We are almost finished with the revisions to the first four chapters, and we've started refining the outline of the next three. Exciting! It happens that I've been rereading mathematician Gian-Carlo Rota's collection of essays, Indiscrete Thoughts, and I've found a few passages that really speak to me, now that I'm in book-writing mode. Enjoy.
…until the development of computers the possibility of dealing successfully with the complex itself was never really envisaged. Perhaps the most successful substitute for such a possibility, as well as the nearest approach to it, came in mathematics. … To find the simple in the complex, the finite in the infinite -- that is not a bad description of the aim and essence of mathematics.
It's been a while since I've posted here, but I have good news: the last appendix has gone to the editors. The book is now content complete. What a relief! We are hoping to release the book late in the first quarter of next year. In the meantime, you can still get early drafts of our chapters through Manning’s Early Access program, if you haven’t yet. The link is here.
We recently did a couple of talks about our vtreat data treatment package: one for the Python version, and one for the R version. If you are fitting machine learning models on messy real-world data, then you might find vtreat useful. Do check out one of the introductory talks below.
I put a new post up on Win-Vector a couple of days ago called "The Geometry of Classifiers", a follow-up post to a recent paper by Fernandez-Delgado, et al. that investigates several classifiers against a body of data sets, mostly from the UCI Machine Learning Repository. Our article follows up the study with seven additional classifier implementations from scikit-learn and an interactive Shiny app to explore the results.
We’ve been wanting to get more into training over at Win-Vector, but I don’t want to completely give up client work, because clients and their problems are often the inspiration for cool solutions – and good blog articles. Working on the video course for the last couple of months has given me some good ideas, too.
I’m happy to announce that John Mount and I have just signed a contract with Manning Publications to write a book on Data Science. We have both talked about doing this for quite a while, and we are excited that we finally have the opportunity.
I came across an interesting article in The Atlantic a little while back that discussed the connection between writing and thinking. New Dorp, a Staten Island high school in a poor and working-class neighborhood, was able to improve student performance when they realized that their students couldn’t write. These underperforming students often could read and could do math. The majority of them were well-behaved, and seemed to want to learn. Yet they couldn't pass standard proficiency tests, and couldn't graduate. All because they couldn't form complex sentences.
John has just put up an article on the Win-Vector blog, highlighting some of our popular series of articles, as well as our more popular posts. If you like the articles that I point to on this blog, check out some of the other posts written by John, too.
I remember setting up the Multo blog a few years ago: my first blog explicitly meant for public consumption. On the "Follow" widget -- the button that allows readers to follow a blog via email notifications -- there is an option to show the count of the blog's followers.
We are sending substantive drafts of the first four chapters of our data science book out for review. Manning, our publisher, hopes to launch the book in their Early Access Program (MEAP) by early May. Crossing our fingers!
I came across a post from Emily Willingham the other day: "Is a PhD required for Good Science Writing?". As a science writer with a science PhD, her answer is: it is not required, and it can often be an impediment. I saw a similar sentiment echoed once by Lee Gutkind, the founder and editor of the journal Creative Nonfiction. I don't remember exactly what he wrote, but it was something to the effect that scientists are exactly the wrong people to produce literary, accessible writing about matters scientific.
We’re in the middle of marketing efforts here at Win-Vector, and I’ve just spent a few hours going through the Win-Vector blog so I could update our Popular Articles page (I have to do that for Multo someday, too).
I ran across this essay recently on the role of design standards for scientific data visualization. The author, Jure Triglav, draws his inspiration from the creation and continued use of the NYCTA Graphics Standards, which were instituted in the late 1960s to unify the signage for the New York City subway system.
One of my favorite cheesy movies is a gem from 1984 called The Adventures of Buckaroo Banzai Across the 8th Dimension. For those who haven't seen it, Buckaroo Banzai is a brilliant young neurosurgeon and particle physicist who spends his days conducting cutting-edge research. At night, he and his research colleagues -- all engineers and scientists and doctors -- rock New Jersey as a band called the Hong Kong Cavaliers. In between the brilliant science and the rock-star night life, the Cavaliers find time to save the world from an alien invasion led by none other than John Lithgow.
I have a new article up on Win-Vector, discussing differential privacy and recent results on applying differential privacy to enable reuse of holdout data in machine learning.
When the world feels like it’s falling apart around you, it feels good to solve little problems that are completely under your control. And that’s what I’ve been doing this past week. This was originally posted at Multo.
Back in the good old days, ninazumel.com was a static site that I maintained myself, in pure HTML. But that (to me) was so much of a hassle that I never did even the little bit of site maintenance that the website required. So I moved it to wordpress.com.
So there's this article that's been making the rounds called "The 10 Least Stressful Jobs of 2013"; perhaps you've read it. I don't normally bother with articles like that, but it came to my attention because some of my old graduate-school friends (who are professors) threw a mini-rant on social media over the fact that University Professor is the Number One least stressful job of the year, according to the article. And just now, I tripped over a blog post where a librarian takes umbrage over the fact that they are also on the list.
When people asked me what it means to be a data scientist, I used to answer, "it means you don't have to hold my hand." By which I meant that as a data scientist (a consulting data scientist), I can handle the data collection, the data cleaning and wrangling, the analysis, and the final presentation of results (both technical and for the business audience) with a minimal amount of assistance from my clients or their people. Not no assistance, of course, but little enough that I'm not interfering too much with their day-to-day job.