I encounter this problem often, and the easiest way to do this is to use the apply()
function within a mutate
command.
library(tidyverse)
df=data.frame(
x1=c(1,0,0,NA,0,1,1,NA,0,1),
x2=c(1,1,NA,1,1,0,NA,NA,0,1),
x3=c(0,1,0,1,1,0,NA,NA,0,1),
x4=c(1,0,NA,1,0,0,NA,0,0,1),
x5=c(1,1,NA,1,1,1,NA,1,0,1))
df %>%
mutate(sum = select(., x1:x5) %>% apply(1, sum, na.rm=TRUE))
Here you could use whatever you want to select the columns using the standard dplyr
tricks (e.g. starts_with()
or contains()
). By doing all the work within a single mutate
command, this action can occur anywhere within a dplyr
stream of processing steps. Finally, by using the apply()
function, you have the flexibility to use whatever summary you need, including your own purpose built summarization function.
Alternatively, if the idea of using a non-tidyverse function is unappealing, then you could gather up the columns, summarize them and finally join the result back to the original data frame.
df <- df %>% mutate( id = 1:n() ) # Need some ID column for this to work
df <- df %>%
group_by(id) %>%
gather('Key', 'value', starts_with('x')) %>%
summarise( Key.Sum = sum(value) ) %>%
left_join( df, . )
Here I used the starts_with()
function to select the columns and calculated the sum and you can do whatever you want with NA
values. The downside to this approach is that while it is pretty flexible, it doesn't really fit into a dplyr
stream of data cleaning steps.