December 6, 2016
Using dplyr with a specific data frame, where all the columns are known, is an effective and pleasant way to execute declarative (SQL-like) operations on data frames and data-frame-like objects in R. It also has the advantage of working not only on local data, but also on dplyr-supported remote data stores, like SQL databases or Spark.
However, once we no longer know the column names, the pleasure quickly fades. The currently recommended way to handle dplyr's non-standard evaluation is via the lazyeval package. This is not pretty. I never want to write anything like the following, ever again.
# target is a moving target, so to speak
target = "column_I_want"
library(lazyeval)
# return all the rows where target column is NA
dframe %>%
  filter_(interp(~ is.na(col), col = as.name(target)))
This example is fairly simple, but the more complex the dplyr expression, and the more columns involved, the more unwieldy the lazyeval solution becomes.
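To make that concrete, here is a minimal sketch of the same pattern with two parameterized columns instead of one. The data frame `d` and the variables `col1` and `col2` are made up for illustration; `filter_` and `interp` are the same calls used above. Newer dplyr releases may emit a deprecation warning for `filter_`.

```r
library(dplyr)
library(lazyeval)

# Hypothetical example data: one NA in each column
d <- data.frame(
  x = c(1, NA, 3, 4),
  y = c("a", "b", NA, "d"),
  stringsAsFactors = FALSE
)

# Column names known only at run time (illustrative values)
col1 <- "x"
col2 <- "y"

# Keep the rows where either parameterized column is NA.
# Every column needs its own placeholder (a, b) and its own as.name() binding.
res <- d %>%
  filter_(interp(~ is.na(a) | is.na(b),
                 a = as.name(col1),
                 b = as.name(col2)))

print(res)
```

Each additional column adds another placeholder in the formula and another binding in the `interp` call, which is where the bookkeeping starts to pile up.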
The difficulty of parameterizing dplyr expressions is part of the motivation for Win-Vector’s new package, replyr. I’ve just posted an article to the Win-Vector blog on the function replyr::let, which lets us parameterize dplyr expressions without