[r] Count number of rows by group using dplyr

I am using the mtcars dataset. I want to find the number of records for a particular combination of data. Something very similar to the count(*) group by clause in SQL. ddply() from plyr is working for me

library(plyr)
ddply(mtcars, .(cyl,gear),nrow)

has output

  cyl gear V1
1   4    3  1
2   4    4  8
3   4    5  2
4   6    3  2
5   6    4  4
6   6    5  1
7   8    3 12
8   8    5  2

Using this code

library(dplyr)
g <- group_by(mtcars, cyl, gear)
summarise(g, length(gear))

has output

  length(cyl)
1          32

I found various functions to pass in to summarise() but none seem to work for me. One function I found is sum(G), which returned

Error in eval(expr, envir, enclos) : object 'G' not found

Tried using n(), which returned

Error in n() : This function should not be called directly

What am I doing wrong? How can I get group_by() / summarise() to work for me?

This question is related to r dplyr count plyr

The answer is


I think what you are looking for is as follows.

cars_by_cylinders_gears <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(count = n())

This is using the dplyr package. This is essentially the longhand version of the count () solution provided by docendo discimus.


another approach is to use the double colons:

mtcars %>% 
  dplyr::group_by(cyl, gear) %>%
  dplyr::summarise(length(gear))

There's a special function n() in dplyr to count rows (potentially within groups):

library(dplyr)
mtcars %>% 
  group_by(cyl, gear) %>% 
  summarise(n = n())
#Source: local data frame [8 x 3]
#Groups: cyl [?]
#
#    cyl  gear     n
#  (dbl) (dbl) (int)
#1     4     3     1
#2     4     4     8
#3     4     5     2
#4     6     3     2
#5     6     4     4
#6     6     5     1
#7     8     3    12
#8     8     5     2

But dplyr also offers a handy count function which does exactly the same with less typing:

count(mtcars, cyl, gear)          # or mtcars %>% count(cyl, gear)
#Source: local data frame [8 x 3]
#Groups: cyl [?]
#
#    cyl  gear     n
#  (dbl) (dbl) (int)
#1     4     3     1
#2     4     4     8
#3     4     5     2
#4     6     3     2
#5     6     4     4
#6     6     5     1
#7     8     3    12
#8     8     5     2

Another option, not necesarily more elegant, but does not require to refer to a specific column:

mtcars %>% 
  group_by(cyl, gear) %>%
  do(data.frame(nrow=nrow(.)))

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to dplyr

R dplyr: Drop multiple columns How to specify "does not contain" in dplyr filter Select first and last row from grouped data Error: could not find function "%>%" Sum across multiple columns with dplyr Removing NA observations with dplyr::filter() Changing factor levels with dplyr mutate Change value of variable with dplyr dplyr change many data types What does %>% function mean in R?

Examples related to count

Count the Number of Tables in a SQL Server Database SQL count rows in a table How to count the occurrence of certain item in an ndarray? Laravel Eloquent - distinct() and count() not working properly together How to count items in JSON data Powershell: count members of a AD group How to count how many values per level in a given factor? Count number of rows by group using dplyr C++ - how to find the length of an integer JPA COUNT with composite primary key query not working

Examples related to plyr

Change value of variable with dplyr How to select the rows with maximum values in each group with dplyr? Count number of rows by group using dplyr Aggregate a dataframe on a given column and display another column