[r] Relative frequencies / proportions with dplyr

Suppose I want to calculate the proportion of different values within each group. For example, using the mtcars data, how do I calculate the relative frequency of the number of gears by am (automatic/manual) in one go with dplyr?

library(dplyr)
data(mtcars)
mtcars <- tbl_df(mtcars)

# count frequency
mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n())

# am gear  n
#  0    3 15 
#  0    4  4 
#  1    4  8  
#  1    5  5 

What I would like to achieve:

am gear  n  rel.freq
 0    3 15 0.7894737
 0    4  4 0.2105263
 1    4  8 0.6153846
 1    5  5 0.3846154

This question is related to the tags: r, group-by, dplyr, frequency

The answers are below.


I wrote a small function for this repeating task:

count_pct <- function(df) {
  return(
    df %>%
      tally %>%                          # count rows per group
      mutate(n_pct = 100 * n / sum(n))   # percentage within the remaining grouping
  )
}

I can then use it like:

mtcars %>% 
  group_by(cyl) %>% 
  count_pct

It returns:

# A tibble: 3 x 3
    cyl     n n_pct
  <dbl> <int> <dbl>
1     4    11  34.4
2     6     7  21.9
3     8    14  43.8
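
Applied to the question's grouping (a usage sketch I am adding, not shown in the original answer), tally() peels off the last grouping variable, so the percentages end up computed within am:

mtcars %>%
  group_by(am, gear) %>%
  count_pct

# n_pct is the percentage of each gear within am,
# e.g. roughly 78.9 and 21.1 for am == 0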

Here is a general function implementing Henrik's solution, written for dplyr 0.7.1 (using enquo() and !!).

freq_table <- function(x, 
                       group_var, 
                       prop_var) {
  group_var <- enquo(group_var)
  prop_var  <- enquo(prop_var)
  x %>% 
    group_by(!!group_var, !!prop_var) %>% 
    summarise(n = n()) %>% 
    mutate(freq = n / sum(n)) %>% 
    ungroup()
}
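
For example (a call I am adding as an illustration, not part of the original answer), the question's table could be reproduced with:

freq_table(mtcars, am, gear)
# returns n and freq for each gear within each am group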

Despite the many answers, here is one more approach, which uses prop.table() in combination with dplyr or data.table.

library("dplyr")
mtcars %>%
    group_by(am, gear) %>%
    summarise(n = n()) %>%
    mutate(freq = prop.table(n))

library("data.table")
cars_dt <- as.data.table(mtcars)
cars_dt[, .(n = .N), keyby = .(am, gear)][, freq := prop.table(n) , by = "am"]

@Henrik's answer is better for usability, as this approach turns the column into character rather than numeric, but it matches what you asked for...

mtcars %>%
  group_by (am, gear) %>%
  summarise (n=n()) %>%
  mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%"))

##   am gear  n rel.freq
## 1  0    3 15      79%
## 2  0    4  4      21%
## 3  1    4  8      62%
## 4  1    5  5      38%

EDIT: Because Spacedman asked for it :-)

as.rel_freq <- function(x, rel_freq_col = "rel.freq", ...) {
    class(x) <- c("rel_freq", class(x))
    attributes(x)[["rel_freq_col"]] <- rel_freq_col
    x
}

print.rel_freq <- function(x, ...) {
    freq_col <- attributes(x)[["rel_freq_col"]]
    x[[freq_col]] <- paste0(round(100 * x[[freq_col]], 0), "%")  # format proportions as percentages for display
    class(x) <- class(x)[!class(x) %in% "rel_freq"]              # drop the class to avoid recursing into this method
    print(x)
}

mtcars %>%
  group_by (am, gear) %>%
  summarise (n=n()) %>%
  mutate(rel.freq = n/sum(n)) %>%
  as.rel_freq()

## Source: local data frame [4 x 4]
## Groups: am
## 
##   am gear  n rel.freq
## 1  0    3 15      79%
## 2  0    4  4      21%
## 3  1    4  8      62%
## 4  1    5  5      38%

You can use the count() function, which however behaves differently depending on the version of dplyr:

  • dplyr 0.7.1: returns an ungrouped table; you need to group again by am

  • dplyr < 0.7.1: returns a grouped table, so no need to group again, although you might want to ungroup() for later manipulations

dplyr 0.7.1

mtcars %>%
  count(am, gear) %>%
  group_by(am) %>%
  mutate(freq = n / sum(n))

dplyr < 0.7.1

mtcars %>%
  count(am, gear) %>%
  mutate(freq = n / sum(n))

This results in a grouped table; if you want to use it for further analysis, it might be useful to remove the grouping with ungroup(), as sketched below.
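
A minimal sketch of that step (my addition, assuming dplyr < 0.7.1, where count() returns a grouped table):

mtcars %>%
  count(am, gear) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()  # drop the grouping before further manipulation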


This answer is based upon Matifou's answer.

First I modified it, using the scipen option, to ensure that the freq column is not returned in scientific notation.

Then I multiply the result by 100 to get a percentage rather than a decimal, which makes the freq column easier to read.

getOption("scipen") 
options("scipen"=10) 
mtcars %>%
count(am, gear) %>% 
mutate(freq = (n / sum(n)) * 100)

Here is a base R answer using aggregate and ave:

df1 <- with(mtcars, aggregate(list(n = mpg), list(am = am, gear = gear), length))
df1$prop <- with(df1, n/ave(n, am, FUN = sum))
#Also with prop.table
#df1$prop <- with(df1, ave(n, am, FUN = prop.table))
df1

#  am gear  n      prop
#1  0    3 15 0.7894737
#2  0    4  4 0.2105263
#3  1    4  8 0.6153846
#4  1    5  5 0.3846154 

We can also use prop.table(), but the output displays differently (a wide contingency table instead of the long format above).

prop.table(table(mtcars$am, mtcars$gear), 1)
   
#            3         4         5
#  0 0.7894737 0.2105263 0.0000000
#  1 0.0000000 0.6153846 0.3846154
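
If you prefer the long layout from the question, the wide table can be converted with as.data.frame() — a sketch I am adding, not part of the original answer:

freqs <- as.data.frame(prop.table(table(mtcars$am, mtcars$gear), 1))
names(freqs) <- c("am", "gear", "rel.freq")
subset(freqs, rel.freq > 0)  # drop the zero-frequency combinations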

For the sake of completeness of this popular question: since version 1.0.0 of dplyr, the .groups argument controls the grouping structure that summarise() returns after a group_by() (see the summarise help page).

With .groups = "drop_last", summarise drops the last level of grouping. This was the only result obtained before version 1.0.0.

library(dplyr)
library(scales)

original <- mtcars %>%
  group_by (am, gear) %>%
  summarise (n=n()) %>%
  mutate(rel.freq =  scales::percent(n/sum(n), accuracy = 0.1))
#> `summarise()` regrouping output by 'am' (override with `.groups` argument)

original
#> # A tibble: 4 x 4
#> # Groups:   am [2]
#>      am  gear     n rel.freq
#>   <dbl> <dbl> <int> <chr>   
#> 1     0     3    15 78.9%   
#> 2     0     4     4 21.1%   
#> 3     1     4     8 61.5%   
#> 4     1     5     5 38.5%

new_drop_last <- mtcars %>%
  group_by (am, gear) %>%
  summarise (n=n(), .groups = "drop_last") %>%
  mutate(rel.freq =  scales::percent(n/sum(n), accuracy = 0.1))

dplyr::all_equal(original, new_drop_last)
#> [1] TRUE

With .groups = "drop", all levels of grouping are dropped. The result is turned into an independent tibble with no trace of the previous group_by

# .groups = "drop"
new_drop <- mtcars %>%
  group_by (am, gear) %>%
  summarise (n=n(), .groups = "drop") %>%
  mutate(rel.freq =  scales::percent(n/sum(n), accuracy = 0.1))

new_drop
#> # A tibble: 4 x 4
#>      am  gear     n rel.freq
#>   <dbl> <dbl> <int> <chr>   
#> 1     0     3    15 46.9%   
#> 2     0     4     4 12.5%   
#> 3     1     4     8 25.0%   
#> 4     1     5     5 15.6%

If .groups = "keep", same grouping structure as .data (mtcars, in this case). summarise does not peel off any variable used in the group_by.

Finally, with .groups = "rowwise", each row is it's own group. It is equivalent to "keep" in this situation

# .groups = "keep"
new_keep <- mtcars %>%
  group_by (am, gear) %>%
  summarise (n=n(), .groups = "keep") %>%
  mutate(rel.freq =  scales::percent(n/sum(n), accuracy = 0.1))

new_keep
#> # A tibble: 4 x 4
#> # Groups:   am, gear [4]
#>      am  gear     n rel.freq
#>   <dbl> <dbl> <int> <chr>   
#> 1     0     3    15 100.0%  
#> 2     0     4     4 100.0%  
#> 3     1     4     8 100.0%  
#> 4     1     5     5 100.0%

# .groups = "rowwise"
new_rowwise <- mtcars %>%
  group_by (am, gear) %>%
  summarise (n=n(), .groups = "rowwise") %>%
  mutate(rel.freq =  scales::percent(n/sum(n), accuracy = 0.1))

dplyr::all_equal(new_keep, new_rowwise)
#> [1] TRUE

Another point that can be of interest: sometimes, after applying group_by() and summarise(), a subtotal line helps readability.

# create a subtotal line to help readability
subtotal_am <- mtcars %>%
  group_by (am) %>% 
  summarise (n=n()) %>%
  mutate(gear = NA, rel.freq = 1)
#> `summarise()` ungrouping output (override with `.groups` argument)

mtcars %>% group_by (am, gear) %>%
  summarise (n=n()) %>% 
  mutate(rel.freq = n/sum(n)) %>%
  bind_rows(subtotal_am) %>%
  arrange(am, gear) %>%
  mutate(rel.freq =  scales::percent(rel.freq, accuracy = 0.1))
#> `summarise()` regrouping output by 'am' (override with `.groups` argument)
#> # A tibble: 6 x 4
#> # Groups:   am [2]
#>      am  gear     n rel.freq
#>   <dbl> <dbl> <int> <chr>   
#> 1     0     3    15 78.9%   
#> 2     0     4     4 21.1%   
#> 3     0    NA    19 100.0%  
#> 4     1     4     8 61.5%   
#> 5     1     5     5 38.5%   
#> 6     1    NA    13 100.0%

Created on 2020-11-09 by the reprex package (v0.3.0)

Hope you find this answer useful.

