[r] Relative frequencies / proportions with dplyr

For the sake of completeness of this popular question, since version 1.0.0 of dplyr, parameter .groups controls the grouping structure of the summarise function after group_by summarise help.

With .groups = "drop_last", summarise drops the last level of grouping. This was the only result obtained before version 1.0.0.

library(dplyr)
library(scales)

original <- mtcars %>%
  group_by (am, gear) %>%
  summarise (n=n()) %>%
  mutate(rel.freq =  scales::percent(n/sum(n), accuracy = 0.1))
#> `summarise()` regrouping output by 'am' (override with `.groups` argument)

original
#> # A tibble: 4 x 4
#> # Groups:   am [2]
#>      am  gear     n rel.freq
#>   <dbl> <dbl> <int> <chr>   
#> 1     0     3    15 78.9%   
#> 2     0     4     4 21.1%   
#> 3     1     4     8 61.5%   
#> 4     1     5     5 38.5%

new_drop_last <- mtcars %>%
  group_by (am, gear) %>%
  summarise (n=n(), .groups = "drop_last") %>%
  mutate(rel.freq =  scales::percent(n/sum(n), accuracy = 0.1))

dplyr::all_equal(original, new_drop_last)
#> [1] TRUE

With .groups = "drop", all levels of grouping are dropped. The result is turned into an independent tibble with no trace of the previous group_by

# .groups = "drop"
new_drop <- mtcars %>%
  group_by (am, gear) %>%
  summarise (n=n(), .groups = "drop") %>%
  mutate(rel.freq =  scales::percent(n/sum(n), accuracy = 0.1))

new_drop
#> # A tibble: 4 x 4
#>      am  gear     n rel.freq
#>   <dbl> <dbl> <int> <chr>   
#> 1     0     3    15 46.9%   
#> 2     0     4     4 12.5%   
#> 3     1     4     8 25.0%   
#> 4     1     5     5 15.6%

If .groups = "keep", same grouping structure as .data (mtcars, in this case). summarise does not peel off any variable used in the group_by.

Finally, with .groups = "rowwise", each row is it's own group. It is equivalent to "keep" in this situation

# .groups = "keep"
new_keep <- mtcars %>%
  group_by (am, gear) %>%
  summarise (n=n(), .groups = "keep") %>%
  mutate(rel.freq =  scales::percent(n/sum(n), accuracy = 0.1))

new_keep
#> # A tibble: 4 x 4
#> # Groups:   am, gear [4]
#>      am  gear     n rel.freq
#>   <dbl> <dbl> <int> <chr>   
#> 1     0     3    15 100.0%  
#> 2     0     4     4 100.0%  
#> 3     1     4     8 100.0%  
#> 4     1     5     5 100.0%

# .groups = "rowwise"
new_rowwise <- mtcars %>%
  group_by (am, gear) %>%
  summarise (n=n(), .groups = "rowwise") %>%
  mutate(rel.freq =  scales::percent(n/sum(n), accuracy = 0.1))

dplyr::all_equal(new_keep, new_rowwise)
#> [1] TRUE

Another point that can be of interest is that sometimes, after applying group_by and summarise, a summary line can help.

# create a subtotal line to help readability
subtotal_am <- mtcars %>%
  group_by (am) %>% 
  summarise (n=n()) %>%
  mutate(gear = NA, rel.freq = 1)
#> `summarise()` ungrouping output (override with `.groups` argument)

mtcars %>% group_by (am, gear) %>%
  summarise (n=n()) %>% 
  mutate(rel.freq = n/sum(n)) %>%
  bind_rows(subtotal_am) %>%
  arrange(am, gear) %>%
  mutate(rel.freq =  scales::percent(rel.freq, accuracy = 0.1))
#> `summarise()` regrouping output by 'am' (override with `.groups` argument)
#> # A tibble: 6 x 4
#> # Groups:   am [2]
#>      am  gear     n rel.freq
#>   <dbl> <dbl> <int> <chr>   
#> 1     0     3    15 78.9%   
#> 2     0     4     4 21.1%   
#> 3     0    NA    19 100.0%  
#> 4     1     4     8 61.5%   
#> 5     1     5     5 38.5%   
#> 6     1    NA    13 100.0%

Created on 2020-11-09 by the reprex package (v0.3.0)

Hope you find this answer useful.

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to group-by

SELECT list is not in GROUP BY clause and contains nonaggregated column .... incompatible with sql_mode=only_full_group_by Count unique values using pandas groupby Pandas group-by and sum Count unique values with pandas per groups Group dataframe and get sum AND count? Error related to only_full_group_by when executing a query in MySql Pandas sum by groupby, but exclude certain columns Using DISTINCT along with GROUP BY in SQL Server Python Pandas : group by in group by and average? How do I create a new column from the output of pandas groupby().sum()?

Examples related to dplyr

R dplyr: Drop multiple columns How to specify "does not contain" in dplyr filter Select first and last row from grouped data Error: could not find function "%>%" Sum across multiple columns with dplyr Removing NA observations with dplyr::filter() Changing factor levels with dplyr mutate Change value of variable with dplyr dplyr change many data types What does %>% function mean in R?

Examples related to frequency

How to count how many values per level in a given factor? Relative frequencies / proportions with dplyr Count the frequency that a value occurs in a dataframe column Count frequency of words in a list and sort by frequency Frequency table for a single variable Getting the count of unique values in a column in bash How to find most common elements of a list? How to count the frequency of the elements in an unordered list? Item frequency count in Python