[python] numpy: most efficient frequency counts for unique values in an array

In numpy / scipy, is there an efficient way to get frequency counts for unique values in an array?

Something along these lines:

x = array( [1,1,1,2,2,2,5,25,1,1] )
y = freq_count( x )
print y

>> [[1, 5], [2,3], [5,1], [25,1]]

( For you, R users out there, I'm basically looking for the table() function )

This question is related to python arrays performance numpy

The answer is


As of Numpy 1.9, the easiest and fastest method is to simply use numpy.unique, which now has a return_counts keyword argument:

import numpy as np

x = np.array([1,1,1,2,2,2,5,25,1,1])
unique, counts = np.unique(x, return_counts=True)

print np.asarray((unique, counts)).T

Which gives:

 [[ 1  5]
  [ 2  3]
  [ 5  1]
  [25  1]]

A quick comparison with scipy.stats.itemfreq:

In [4]: x = np.random.random_integers(0,100,1e6)

In [5]: %timeit unique, counts = np.unique(x, return_counts=True)
10 loops, best of 3: 31.5 ms per loop

In [6]: %timeit scipy.stats.itemfreq(x)
10 loops, best of 3: 170 ms per loop

Similar questions with python tag:

Similar questions with arrays tag:

Similar questions with performance tag:

Similar questions with numpy tag: