Practical Data Science with R now in Chinese Translation!

Our publisher, Manning, has kindly sent us complimentary copies of the new Simplified Chinese translation of Practical Data Science with R.


We can’t read it, of course, but it’s cool (and a bit intimidating) to see what our work looks like in another language and character set. Here are a couple of peeks inside, just for fun.

[Two photos of the book's interior]

I wonder if Manning is planning any other translated editions? I’ll keep you posted.

John Oliver on Scientific Studies

An excellent rant from John Oliver on the way science stories are handled in the media, on the need for some healthy skepticism, and on the need to track down the sources for the studies yourself, to the extent that this is possible.

Also, I love the “TODD Talks” skit at the end.

On Persistence and Sincerity

…propaganda, Boris Artzybasheff. Image: James Vaughan, some rights reserved.

We’re in the middle of marketing efforts here at Win-Vector, and I’ve just spent a few hours going through the Win-Vector blog so I could update our Popular Articles page (I have to do that for Multo, someday, too).

As I went through the blog, I had a number of thoughts:

  • Wow, this is a lot of posts.
  • Wow, we write about a lot of topics.
  • Wow, this is some really great stuff!

I can’t take credit for all that. The Win-Vector blog is John’s baby; he started it way back in July of 2007, and as it’s his only blog, it’s his primary mode of expression (Facebook for cooking, Win-Vector for the techy stuff). He writes more of the posts than I do. But the blog has been good for some of my hobby horses, too.[1]

The excuse for the Win-Vector blog is that it’s “marketing” for the company. And it is; we promote ourselves sometimes: our company, our book, our video courses. But mostly it’s here because we wanted a place to talk about what we care about, and a place to share things we thought would help other people.


Recent post on Win-Vector blog, plus some musings on Audience

 


I put a new post up on Win-Vector a couple of days ago called “The Geometry of Classifiers”, a follow-up post to a recent paper by Fernandez-Delgado, et al. that investigates several classifiers against a body of data sets, mostly from the UCI Machine Learning Repository. Our article follows up the study with seven additional classifier implementations from scikit-learn and an interactive Shiny app to explore the results.

As you might guess, we did our little study not only because we were interested in the questions of classifier performance and classifier similarity, but because we wanted an excuse to play with scikit-learn and Shiny. We’re proud of the results (the app is cool!), but we didn’t consider this an especially ground-breaking post. Much to our surprise, this article got over 2000 views the day we posted it (a huge number, for us), up to nearly 3000 as I write this. It’s already our eighth most popular post of this year (an earlier post by John on the Fernandez-Delgado paper, a comment about some of their data treatment, is also doing quite well: #2 for the month and #21 for the year).


Design, Problem Solving, and Good Taste


Image: A Case for Spaceships (Jure Triglav)

I ran across this essay recently on the role of design standards for scientific data visualization. The author, Jure Triglav, draws his inspiration from the creation and continued use of the NYCTA Graphics Standards, which were instituted in the late 1960s to unify the signage for the New York City subway system. As the author puts it, the Graphics Standards Manual is “a timeless example of great design elegantly solving a real problem.” Thanks to the unified iconography, a traveler on the New York subway knows exactly what to look for to navigate the subway system, no matter which station they may be in. And the iconography is beautiful, too.


Unimark, the design company that designed the Graphics Standards.
Aren’t they a hip, mod looking group? And I’m jealous of those lab coats.
Image: A Case for Spaceships (Jure Triglav)

What works to clarify subway travel will work to clarify the morass of graphs and charts that pass for scientific visualization, Triglav argues. And we should start with the work of the Joint Committee on Standards for Graphical Presentation, a group of statisticians, engineers, scientists, and mathematicians who first adopted a set of standards in 1914, revised in 1936, 1938, and 1960.

I agree with him — mostly.
