Summarizing count and conditional aggregate functions on the same factor

Question

Quick and short of it is I m having problems summarizing count and aggregate functions with conditions on the same factor   Suppose I have this dataframe    library dplyr   df   tbl df data frame      company c  Acme    Meca    Emca    Acme    Meca    Emca         year c  2011    2010    2009    2011    2010    2013         product c  Wrench    Hammer    Sonic Screwdriver    Fairy Dust                   Kindness    Helping Hand         price c  5 67    7 12    12 99    10 99   NA  FALSE      which creates this dataframe  in essence       company year  product             price 1    Acme  2011  Wrench              5 67 2    Meca  2010  Hammer              7 12 3    Emca  2009  Sonic Screwdriver   12 99 4    Acme  2011  Fairy Dust          10 99 5    Meca  2010  Kindness            NA                                          n    Emca  2013  Helping Hand        FALSE   Let s say I want to df  lt - group by df  company  year  product  and then get the following info all in one collection  i e  dataframe     Count of each price listing  including NA  False  Count of each with  NA  condition Average price excluding NA and False Max price  summarize df  count   n     satisfies first item obviously    I m having issues trying to get the others   I think I need to use pipe operators  If so  can anyone provide some guidance   This is what I ve tried and it is blatantly wrong  but I m not sure where to go next    summarize df             total count   n               count   filter df  is na price               avg price   filter df   is na price   price    FALSE              max price   max filter df   is na price   price    FALSE     And yes  I have reviewed documentation and I m sure the answers are there  but they might be too advanced for my understanding  Thanks in advance

User · Accepted Answer

Assuming that your original dataset is similar to the one you created (i.e. with NA as character. You could specify na.strings while reading the data using read.table. But, I guess NAs would be detected automatically.

The price column is factor which needs to be converted to numeric class. When you use as.numeric, all the non-numeric elements (i.e. "NA", FALSE) gets coerced to NA) with a warning.

library(dplyr)
df %>%
     mutate(price=as.numeric(as.character(price))) %>%  
     group_by(company, year, product) %>%
     summarise(total.count=n(), 
               count=sum(is.na(price)), 
               avg.price=mean(price,na.rm=TRUE),
               max.price=max(price, na.rm=TRUE))

data

I am using the same dataset (except the ... row) that was showed.

df = tbl_df(data.frame(company=c("Acme", "Meca", "Emca", "Acme", "Meca","Emca"),
 year=c("2011", "2010", "2009", "2011", "2010", "2013"), product=c("Wrench", "Hammer",
 "Sonic Screwdriver", "Fairy Dust", "Kindness", "Helping Hand"), price=c("5.67",
 "7.12", "12.99", "10.99", "NA",FALSE)))

[r] Summarizing count and conditional aggregate functions on the same factor

data

Examples related to r

Examples related to dplyr