[r] Summarizing multiple columns with dplyr?

The other answers are all good, but I'd add one more to show how working in a "tidy" format simplifies things. Right now the data frame is in "wide" format, meaning the variables "a" through "d" each occupy their own column. To get to a "tidy" (or long) format, you can use gather() from the tidyr package, which stacks columns "a" through "d" into rows: a key column holds the variable name and a value column holds the measurement. Then you use group_by() and summarize() to get the mean of each variable within each group. If you want to present the result in wide format, just tack on a call to spread().


library(tidyverse)

# Create reproducible df
set.seed(101)
df <- tibble(a   = sample(1:5, 10, replace = TRUE),
             b   = sample(1:5, 10, replace = TRUE),
             c   = sample(1:5, 10, replace = TRUE),
             d   = sample(1:5, 10, replace = TRUE),
             grp = sample(1:3, 10, replace = TRUE))

# Gather to long (tidy) format, average by group and variable,
# then spread back to wide for display
df %>%
    gather(key = variable, value = value, a:d) %>%
    group_by(grp, variable) %>%
    summarize(mean = mean(value)) %>%
    spread(variable, mean)
#> Source: local data frame [3 x 5]
#> Groups: grp [3]
#> 
#>     grp        a     b        c        d
#> * <int>    <dbl> <dbl>    <dbl>    <dbl>
#> 1     1 3.000000   3.5 3.250000 3.250000
#> 2     2 1.666667   4.0 4.666667 2.666667
#> 3     3 3.333333   3.0 2.333333 2.333333
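
For what it's worth, gather() and spread() are superseded in tidyr 1.0.0 and later by pivot_longer() and pivot_wider(). Below is a minimal sketch of the same pipeline with the newer functions, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 (for the .groups argument). The data frame `toy` and its values are made up for illustration so the result is easy to check by hand; it is not the `df` from above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two value columns and one grouping column
toy <- tibble(a   = c(1, 3, 5, 7),
              b   = c(2, 4, 6, 8),
              grp = c(1, 1, 2, 2))

result <- toy %>%
    # Stack columns a and b into (variable, value) pairs: long format
    pivot_longer(cols = a:b, names_to = "variable", values_to = "value") %>%
    # One mean per group/variable combination
    group_by(grp, variable) %>%
    summarize(mean = mean(value), .groups = "drop") %>%
    # Back to wide format: one column per original variable
    pivot_wider(names_from = variable, values_from = mean)

result
```

Here group 1 ends up with a = 2 and b = 3, and group 2 with a = 6 and b = 7. Dropping the final pivot_wider() call leaves the summary in long format, which is usually the more convenient shape for plotting with ggplot2.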
