[r] Count number of occurences for each unique value

Let's say I have:

v = rep(c(1,2, 2, 2), 25)

Now, I want to count the number of times each unique value appears. unique(v) returns what the unique values are, but not how many they are.

> unique(v)
[1] 1 2

I want something that gives me

length(v[v==1])
[1] 25
length(v[v==2])
[1] 75

but as a more general one-liner :) Something close (but not quite) like this:

#<doesn't work right> length(v[v==unique(v)])

This question is related to r count unique

The answer is


You can try also a tidyverse

library(tidyverse) 
dummyData %>% 
    as.tibble() %>% 
    count(value)
# A tibble: 2 x 2
  value     n
  <dbl> <int>
1     1    25
2     2    75

count_unique_words <-function(wlist) {
ucountlist = list()
unamelist = c()
for (i in wlist)
{
if (is.element(i, unamelist))
    ucountlist[[i]] <- ucountlist[[i]] +1
else
    {
    listlen <- length(ucountlist)
    ucountlist[[i]] <- 1
    unamelist <- c(unamelist, i)
    }
}
ucountlist
}

expt_counts <- count_unique_words(population)
for(i in names(expt_counts))
    cat(i, expt_counts[[i]], "\n")

I know there are many other answers, but here is another way to do it using the sort and rle functions. The function rle stands for Run Length Encoding. It can be used for counts of runs of numbers (see the R man docs on rle), but can also be applied here.

test.data = rep(c(1, 2, 2, 2), 25)
rle(sort(test.data))
## Run Length Encoding
##   lengths: int [1:2] 25 75
##   values : num [1:2] 1 2

If you capture the result, you can access the lengths and values as follows:

## rle returns a list with two items.
result.counts <- rle(sort(test.data))
result.counts$lengths
## [1] 25 75
result.counts$values
## [1] 1 2

It is a one-line approach by using aggregate.

> aggregate(data.frame(count = v), list(value = v), length)

  value count
1     1    25
2     2    75

If you want to run unique on a data.frame (e.g., train.data), and also get the counts (which can be used as the weight in classifiers), you can do the following:

unique.count = function(train.data, all.numeric=FALSE) {                                                                                                                                                                                                 
  # first convert each row in the data.frame to a string                                                                                                                                                                              
  train.data.str = apply(train.data, 1, function(x) paste(x, collapse=','))                                                                                                                                                           
  # use table to index and count the strings                                                                                                                                                                                          
  train.data.str.t = table(train.data.str)                                                                                                                                                                                            
  # get the unique data string from the row.names                                                                                                                                                                                     
  train.data.str.uniq = row.names(train.data.str.t)                                                                                                                                                                                   
  weight = as.numeric(train.data.str.t)                                                                                                                                                                                               
  # convert the unique data string to data.frame
  if (all.numeric) {
    train.data.uniq = as.data.frame(t(apply(cbind(train.data.str.uniq), 1, 
      function(x) as.numeric(unlist(strsplit(x, split=","))))))                                                                                                    
  } else {
    train.data.uniq = as.data.frame(t(apply(cbind(train.data.str.uniq), 1, 
      function(x) unlist(strsplit(x, split=",")))))                                                                                                    
  }
  names(train.data.uniq) = names(train.data)                                                                                                                                                                                          
  list(data=train.data.uniq, weight=weight)                                                                                                                                                                                           
}  

This works for me. Take your vector v

length(summary(as.factor(v),maxsum=50000))

Comment: set maxsum to be large enough to capture the number of unique values

or with the magrittr package

v %>% as.factor %>% summary(maxsum=50000) %>% length


length(unique(df$col)) is the most simple way I can see.


To get an un-dimensioned integer vector that contains the count of unique values, use c().

dummyData = rep(c(1, 2, 2, 2), 25) # Chase's reproducible data
c(table(dummyData)) # get un-dimensioned integer vector
 1  2 
25 75

str(c(table(dummyData)) ) # confirm structure
 Named int [1:2] 25 75
 - attr(*, "names")= chr [1:2] "1" "2"

This may be useful if you need to feed the counts of unique values into another function, and is shorter and more idiomatic than the t(as.data.frame(table(dummyData))[,2] posted in a comment to Chase's answer. Thanks to Ricardo Saporta who pointed this out to me here.


table() function is a good way to go, as Chase suggested. If you are analyzing a large dataset, an alternative way is to use .N function in datatable package.

Make sure you installed the data table package by

install.packages("data.table")

Code:

# Import the data.table package
library(data.table)

# Generate a data table object, which draws a number 10^7 times  
# from 1 to 10 with replacement
DT<-data.table(x=sample(1:10,1E7,TRUE))

# Count Frequency of each factor level
DT[,.N,by=x]

If you have multiple factors (= a multi-dimensional data frame), you can use the dplyr package to count unique values in each combination of factors:

library("dplyr")
data %>% group_by(factor1, factor2) %>% summarize(count=n())

It uses the pipe operator %>% to chain method calls on the data frame data.


If you need to have the number of unique values as an additional column in the data frame containing your values (a column which may represent sample size for example), plyr provides a neat way:

data_frame <- data.frame(v = rep(c(1,2, 2, 2), 25))

library("plyr")
data_frame <- ddply(data_frame, .(v), transform, n = length(v))

Also making the values categorical and calling summary() would work.

> v = rep(as.factor(c(1,2, 2, 2)), 25)
> summary(v)
 1  2 
25 75 

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to count

Count the Number of Tables in a SQL Server Database SQL count rows in a table How to count the occurrence of certain item in an ndarray? Laravel Eloquent - distinct() and count() not working properly together How to count items in JSON data Powershell: count members of a AD group How to count how many values per level in a given factor? Count number of rows by group using dplyr C++ - how to find the length of an integer JPA COUNT with composite primary key query not working

Examples related to unique

Count unique values with pandas per groups Find the unique values in a column and then sort them How can I check if the array of objects have duplicate property values? Firebase: how to generate a unique numeric ID for key? pandas unique values multiple columns Select unique values with 'select' function in 'dplyr' library Generate 'n' unique random numbers within a range SQL - select distinct only on one column Can I use VARCHAR as the PRIMARY KEY? Count unique values in a column in Excel