Hadley Wickham has once again1 made R ridiculously better. Not only is dplyr
incredibly fast, but the new syntax allows for some really complex operations to be expressed in a ridiculously beautiful way.
Consider a data set, course
, with a student identifier, sid
, a course identifier, courseno
, a quarter, quarter
, and a grade on a scale of 0 to 4, gpa
. What if I wanted to know the number of a courses a student has failed over the entire year, as defined by having an overall grade of less than a 1.0?
In dplyr:
course %.%
group_by(sid, courseno) %.%
summarise(gpa = mean(gpa)) %.%
filter(gpa <= 1.0) %.%
summarise(fails = n())
I refuse to even sully this post with the way I would have solved this problem in the past.
-
Seriously, how many of the packages he has managed/written are indispensable to using R today? It is no exaggeration to say that the world would have many more Stata, SPSS, and SAS users if not for Hadleyverse. ↩︎