I’ve been using the tidy eval framework introduced with dplyr 0.7 for about two months now, and it’s time for an update to my original post on tidy eval. My goal is not to explain tidy eval to you, but rather to show you some simple examples that you can easily generalize from.
library(tidyverse)
starwars
# A tibble: 87 × 14
name height mass hair_…¹ skin_…² eye_c…³ birth…⁴ sex gender homew…⁵
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
1 Luke Skywa… 172 77 blond fair blue 19 male mascu… Tatooi…
2 C-3PO 167 75 <NA> gold yellow 112 none mascu… Tatooi…
3 R2-D2 96 32 <NA> white,… red 33 none mascu… Naboo
4 Darth Vader 202 136 none white yellow 41.9 male mascu… Tatooi…
5 Leia Organa 150 49 brown light brown 19 fema… femin… Aldera…
6 Owen Lars 178 120 brown,… light blue 52 male mascu… Tatooi…
7 Beru White… 165 75 brown light blue 47 fema… femin… Tatooi…
8 R5-D4 97 32 <NA> white,… red NA none mascu… Tatooi…
9 Biggs Dark… 183 84 black light brown 24 male mascu… Tatooi…
10 Obi-Wan Ke… 182 77 auburn… fair blue-g… 57 male mascu… Stewjon
# … with 77 more rows, 4 more variables: species <chr>, films <list>,
# vehicles <list>, starships <list>, and abbreviated variable names
# ¹hair_color, ²skin_color, ³eye_color, ⁴birth_year, ⁵homeworld
Using strings to refer to column names
To refer to columns in a data frame with strings, we need to convert those strings into symbol objects with rlang::sym and rlang::syms. We then use the created symbol objects in dplyr functions with the prefixes !! and !!!. This is because dplyr verbs expect input that looks like code. Using the sym/syms functions we can convert strings into objects that look like code.
mass <- rlang::sym("mass")                        # create a single symbol
groups <- rlang::syms(c("homeworld", "species"))  # create a list of symbols
starwars %>%
  group_by(!!!groups) %>%             # use list of symbols with !!!
  summarize(avg_mass = mean(!!mass))  # use single symbol with !!
# A tibble: 58 × 3
# Groups: homeworld [49]
homeworld species avg_mass
<chr> <chr> <dbl>
1 Alderaan Human NA
2 Aleen Minor Aleena 15
3 Bespin Human 79
4 Bestine IV Human 110
5 Cato Neimoidia Neimodian 90
6 Cerea Cerean 82
7 Champala Chagrian NA
8 Chandrila Human NA
9 Concord Dawn Human 79
10 Corellia Human 78.5
# … with 48 more rows
# A tibble: 58 × 3
# Groups: homeworld [49]
homeworld species summarized_mean
<chr> <chr> <dbl>
1 Alderaan Human NA
2 Aleen Minor Aleena 15
3 Bespin Human 79
4 Bestine IV Human 110
5 Cato Neimoidia Neimodian 90
6 Cerea Cerean 82
7 Champala Chagrian NA
8 Chandrila Human NA
9 Concord Dawn Human 79
10 Corellia Human 78.5
# … with 48 more rows
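The same pattern carries over to a function whose arguments are column names given as strings. Here is a minimal sketch, not taken from the examples above: `mean_by_strings` and the `toy` data frame are hypothetical names made up for illustration, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper: column names arrive as character strings
mean_by_strings <- function(df, value, groups) {
  value <- rlang::sym(value)     # one string -> one symbol
  groups <- rlang::syms(groups)  # character vector -> list of symbols

  df %>%
    group_by(!!!groups) %>%                            # splice the symbols with !!!
    summarize(avg = mean(!!value), .groups = "drop")   # unquote the symbol with !!
}

# Made-up data, only for illustration
toy <- data.frame(
  planet = c("A", "A", "B"),
  kind   = c("x", "x", "y"),
  mass   = c(10, 20, 30)
)

res <- mean_by_strings(toy, "mass", c("planet", "kind"))
print(res)  # two rows: avg is 15 for (A, x) and 30 for (B, y)
```

Because the strings are converted inside the function, callers can build the column names programmatically, for example from `names(df)`.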
Details about unquoting
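!! and !!! are syntactic sugar on top of the functions UQ() and UQS(), respectively. It used to be that !! and !!! had low operator precedence (in PEMDAS terms, they were applied pretty much last), so an unquoted name usually had to be wrapped in parentheses. Now we can use them more intuitively:
homeworld <- rlang::sym("homeworld")
filter(starwars, !!homeworld == "Alderaan")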
# A tibble: 3 × 14
name height mass hair_…¹ skin_…² eye_c…³ birth…⁴ sex gender homew…⁵
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
1 Leia Organa 150 49 brown light brown 19 fema… femin… Aldera…
2 Bail Presto… 191 NA black tan brown 67 male mascu… Aldera…
3 Raymus Anti… 188 79 brown light brown NA male mascu… Aldera…
# … with 4 more variables: species <chr>, films <list>, vehicles <list>,
# starships <list>, and abbreviated variable names ¹hair_color, ²skin_color,
# ³eye_color, ⁴birth_year, ⁵homeworld
We can also use UQ() and UQS() directly to be explicit about what we’re unquoting, although more recent versions of rlang deprecate these functions in favor of !! and !!!.
filter(starwars, UQ(homeworld) == "Alderaan")
# A tibble: 3 × 14
name height mass hair_…¹ skin_…² eye_c…³ birth…⁴ sex gender homew…⁵
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
1 Leia Organa 150 49 brown light brown 19 fema… femin… Aldera…
2 Bail Presto… 191 NA black tan brown 67 male mascu… Aldera…
3 Raymus Anti… 188 79 brown light brown NA male mascu… Aldera…
# … with 4 more variables: species <chr>, films <list>, vehicles <list>,
# starships <list>, and abbreviated variable names ¹hair_color, ²skin_color,
# ³eye_color, ⁴birth_year, ⁵homeworld
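To see what the tighter precedence means in practice, it can help to look at the expression that unquoting builds before any dplyr verb runs it. This is a minimal sketch that assumes `homeworld` holds a symbol created with `rlang::sym()`; the two-element list passed to `eval_tidy()` is made-up data for illustration.

```r
library(rlang)

homeworld <- sym("homeworld")

# `!!` applies only to `homeworld`, not to the whole comparison
e <- expr(!!homeworld == "Alderaan")
print(e)  # homeworld == "Alderaan"

# Evaluate the expression against a small, made-up data mask
hits <- eval_tidy(e, data = list(homeworld = c("Alderaan", "Naboo")))
print(hits)  # TRUE FALSE
```

`expr()` is used here only as an inspection tool: it shows the code that a verb such as `filter()` ends up evaluating.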
Creating non-standard functions
Sometimes it is nice to write functions that accept non-standard inputs, the way dplyr verbs do. For example, we might want to write a function with the same effect as
starwars %>%
  group_by(homeworld, species) %>%
  summarize(avg_mass = mean(mass))
# A tibble: 58 × 3
# Groups: homeworld [49]
homeworld species avg_mass
<chr> <chr> <dbl>
1 Alderaan Human NA
2 Aleen Minor Aleena 15
3 Bespin Human 79
4 Bestine IV Human 110
5 Cato Neimoidia Neimodian 90
6 Cerea Cerean 82
7 Champala Chagrian NA
8 Chandrila Human NA
9 Concord Dawn Human 79
10 Corellia Human 78.5
# … with 48 more rows
To do this, we need to capture our input in quosures, using quo and quos when programming interactively.
groups <- quos(homeworld, species)  # capture a list of variables as raw input
mass <- quo(mass)                   # capture a single variable as raw input
starwars %>%
  group_by(!!!groups) %>%            # use !!! to access variables from `quos`
  summarize(avg_mass = sum(!!mass))  # use !! to access the variable in `quo`
# A tibble: 58 × 3
# Groups: homeworld [49]
homeworld species avg_mass
<chr> <chr> <dbl>
1 Alderaan Human NA
2 Aleen Minor Aleena 15
3 Bespin Human 79
4 Bestine IV Human 110
5 Cato Neimoidia Neimodian 90
6 Cerea Cerean 82
7 Champala Chagrian NA
8 Chandrila Human NA
9 Concord Dawn Human 79
10 Corellia Human 157
# … with 48 more rows
There’s some nice symmetry here in that we unwrap both rlang::sym and quo with !! and both rlang::syms and quos with !!!.
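One way to check this symmetry is to build the same call both ways and compare the resulting code. The sketch below uses `rlang::quo_squash()` to flatten nested quosures into plain expressions; `df` is just a placeholder name that is never evaluated.

```r
library(rlang)

# Single variable: a symbol built from a string vs. a quosure of bare input
from_sym <- quo_squash(quo(mean(!!sym("mass"))))
from_quo <- quo_squash(quo(mean(!!quo(mass))))
print(from_sym)  # mean(mass)
print(from_quo)  # mean(mass)

# Several variables: a list of symbols vs. a list of quosures
from_syms <- quo_squash(quo(group_by(df, !!!syms(c("homeworld", "species")))))
from_quos <- quo_squash(quo(group_by(df, !!!quos(homeworld, species))))
print(from_syms)  # group_by(df, homeworld, species)
print(from_quos)  # group_by(df, homeworld, species)
```

The difference that remains is that a quosure also carries the environment it was captured in, which matters once the captured input is more than a bare column name.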
We might be interested in using this behavior in a function. To do this we replace calls to quo with calls to enquo.
summarize_by <- function(df, to_summarize, ...) {
  to_summarize <- enquo(to_summarize)  # enquo captures a single argument
  groups <- quos(...)                  # quos captures multiple arguments
  df %>%
    group_by(!!!groups) %>%                  # unwrap quos with !!!
    summarize(summ = sum(!!to_summarize))    # unwrap enquo with !!
}
Now our function accepts non-standard (unquoted) input. Note that quos can capture an arbitrary number of arguments passed through ..., as we do here. So both of the following calls are valid:
summarize_by(starwars, mass, homeworld)
# A tibble: 49 × 2
homeworld summ
<chr> <dbl>
1 Alderaan NA
2 Aleen Minor 15
3 Bespin 79
4 Bestine IV 110
5 Cato Neimoidia 90
6 Cerea 82
7 Champala NA
8 Chandrila NA
9 Concord Dawn 79
10 Corellia 157
# … with 39 more rows
summarize_by(starwars, mass, homeworld, species)
# A tibble: 58 × 3
# Groups: homeworld [49]
homeworld species summ
<chr> <chr> <dbl>
1 Alderaan Human NA
2 Aleen Minor Aleena 15
3 Bespin Human 79
4 Bestine IV Human 110
5 Cato Neimoidia Neimodian 90
6 Cerea Cerean 82
7 Champala Chagrian NA
8 Chandrila Human NA
9 Concord Dawn Human 79
10 Corellia Human 157
# … with 48 more rows
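As a quick sanity check, summarize_by can be compared against the equivalent hand-written pipeline. The sketch below repeats the function so that it runs on its own; the `toy` data frame is made up for illustration. It also shows that `enquo` captures whole expressions, not just bare column names.

```r
library(dplyr)

# The function from above, repeated so this snippet is self-contained
summarize_by <- function(df, to_summarize, ...) {
  to_summarize <- enquo(to_summarize)
  groups <- quos(...)

  df %>%
    group_by(!!!groups) %>%
    summarize(summ = sum(!!to_summarize))
}

# Made-up data, only for illustration
toy <- data.frame(
  planet = c("A", "A", "B"),
  kind   = c("x", "y", "y"),
  mass   = c(10, 20, 30)
)

by_fun  <- summarize_by(toy, mass, planet)
by_hand <- toy %>% group_by(planet) %>% summarize(summ = sum(mass))
print(all.equal(by_fun, by_hand))  # TRUE

# enquo captures an expression just as well as a bare column name
doubled <- summarize_by(toy, mass * 2, planet)
print(doubled$summ)  # 60 60
```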