I have a data frame in R like this:
ID MONTH-YEAR VALUE
110 JAN. 2012 1000
111 JAN. 2012 2000
. .
. .
121 FEB. 2012 3000
131 FEB. 2012 4000
. .
. .
So, for each month of each year there are n rows, and they can appear in any order (that is, the rows for a given month are not necessarily contiguous). I want to count how many rows there are for each MONTH-YEAR, i.e. how many rows for JAN. 2012, how many for FEB. 2012, and so on. Something like this:
MONTH-YEAR NUMBER OF ROWS
JAN. 2012 10
FEB. 2012 13
MAR. 2012 6
APR. 2012 9
I tried to do this:
n_row <- nrow(dat1_frame %.% group_by(MONTH-YEAR))
but it does not produce the desired output. How can I do that?
Here is another way of using aggregate to count rows by group:
my.data <- read.table(text = '
month.year my.cov
Jan.2000 apple
Jan.2000 pear
Jan.2000 peach
Jan.2001 apple
Jan.2001 peach
Feb.2002 pear
', header = TRUE, stringsAsFactors = FALSE, na.strings = NA)
rows.per.group <- aggregate(rep(1, length(my.data$month.year)),
by=list(my.data$month.year), sum)
rows.per.group
# Group.1 x
# 1 Feb.2002 1
# 2 Jan.2000 3
# 3 Jan.2001 2
library(plyr)
ddply(data, .(`MONTH-YEAR`), nrow)
This will give you the answer if "MONTH-YEAR" is a column of your data frame. Note the backticks: without them, R parses MONTH-YEAR as MONTH minus YEAR. First, try unique(data$`MONTH-YEAR`) to see which distinct values the column holds.
Then the simple split-apply-combine call above will return what you are looking for.
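The same split-apply-combine idea can be sketched in base R, without plyr. The data frame dat below is made up for illustration, and check.names = FALSE is used only so that the non-syntactic column name survives:

```r
# Hypothetical example data; the column name is kept non-syntactic on purpose.
dat <- data.frame(
  ID = c(110, 111, 121, 131, 141),
  `MONTH-YEAR` = c("JAN.2012", "JAN.2012", "FEB.2012", "FEB.2012", "MAR.2012"),
  VALUE = c(1000, 2000, 3000, 4000, 5000),
  check.names = FALSE
)

# Split: one data frame per distinct MONTH-YEAR value.
pieces <- split(dat, dat$`MONTH-YEAR`)

# Apply + combine: count the rows of each piece into a named vector.
rows_per_group <- sapply(pieces, nrow)
rows_per_group
```

The result is a named vector with one count per MONTH-YEAR value; the counts add up to nrow(dat).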
Suppose we have a df_data data frame as below
> df_data
ID MONTH-YEAR VALUE
1 110 JAN.2012 1000
2 111 JAN.2012 2000
3 121 FEB.2012 3000
4 131 FEB.2012 4000
5 141 MAR.2012 5000
To count number of rows in df_data grouped by MONTH-YEAR column, you can use:
> summary(df_data$`MONTH-YEAR`)
FEB.2012 JAN.2012 MAR.2012
2 2 1
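The summary function builds a table from its factor argument and returns the counts as a named vector. This only works when the column is a factor; for a character column, summary reports the length, class and mode instead.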
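A small sketch of that difference, using a made-up vector month_year rather than the real data. It also shows table(), which counts values for both factor and character input:

```r
# Hypothetical values standing in for the MONTH-YEAR column.
month_year <- c("JAN.2012", "JAN.2012", "FEB.2012", "FEB.2012", "MAR.2012")

# On a factor, summary() returns one count per level.
counts_factor <- summary(factor(month_year))

# On a character vector, summary() only describes the vector
# (Length, Class, Mode), so no per-group counts come back.
char_summary <- summary(month_year)

# table() counts the values either way.
counts_table <- table(month_year)

# Turn the table into a data frame with a count column named "n".
counts_df <- as.data.frame(counts_table, responseName = "n")
counts_df
```

If the column is stored as character, wrapping it in factor() or switching to table() avoids the surprise.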
Try using the count function in dplyr:
library(dplyr)
dat1_frame %>%
count(MONTH.YEAR)
I am not sure how you got MONTH-YEAR as a variable name. My R version does not allow for such a variable name, so I replaced it with MONTH.YEAR.
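One likely explanation, offered as a guess rather than a diagnosis: base R's readers rename non-syntactic column names by default (check.names = TRUE), which is how MONTH-YEAR usually becomes MONTH.YEAR. The CSV text below is invented for illustration:

```r
# Hypothetical CSV input with a non-syntactic header.
raw <- "ID,MONTH-YEAR,VALUE
110,JAN. 2012,1000
111,JAN. 2012,2000
121,FEB. 2012,3000"

# Default: check.names = TRUE runs the headers through make.names().
fixed <- read.csv(text = raw)
names(fixed)  # "ID" "MONTH.YEAR" "VALUE"

# check.names = FALSE keeps the header exactly as written.
kept <- read.csv(text = raw, check.names = FALSE)
names(kept)   # "ID" "MONTH-YEAR" "VALUE"

# A non-syntactic name must be quoted with backticks when used.
counts <- table(kept$`MONTH-YEAR`)
counts
```

Either form works for counting; the dotted name is simply easier to type because it needs no backticks.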
As a side note, the mistake in your code was that dat1_frame %.% group_by(MONTH-YEAR) without a summarise function returns the original rows unchanged, so nrow just reports the total number of rows. Instead, you want to use
dat1_frame %>%
group_by(MONTH.YEAR) %>%
summarise(count=n())
Using the example data set that Ananda dummied up, here's an example using aggregate(), which is part of core R. aggregate() just needs something to count as a function of the different values of MONTH-YEAR. In this case, I used VALUE as the thing to count:
aggregate(cbind(count = VALUE) ~ MONTH.YEAR,
data = mydf,
FUN = function(x){NROW(x)})
which gives you:
MONTH.YEAR count
1 FEB. 2012 2
2 JAN. 2012 2
3 MAR. 2012 1
The count() function in plyr does what you want:
library(plyr)
count(mydf, "MONTH.YEAR")
Just for completeness, the data.table solution:
library(data.table)
mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L),
MONTH.YEAR = c("JAN. 2012", "JAN. 2012",
"FEB. 2012", "FEB. 2012",
"MAR. 2012"),
VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)),
.Names = c("ID", "MONTH.YEAR", "VALUE"),
class = "data.frame", row.names = c(NA, -5L))
setDT(mydf)
mydf[, .(`Number of rows` = .N), by = MONTH.YEAR]
MONTH.YEAR Number of rows
1: JAN. 2012 2
2: FEB. 2012 2
3: MAR. 2012 1
Source: Stackoverflow.com