[python] Find the item with maximum occurrences in a list

In Python, I have a list:

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]  

I want to identify the item that occurred the highest number of times. I am able to solve it but I need the fastest way to do so. I know there is a nice Pythonic answer to this.

This question is related to python list max counting

The answer is


from collections import Counter
most_common,num_most_common = Counter(L).most_common(1)[0] # 4, 6 times

For older Python versions (< 2.7), you can use this recipe to create the Counter class.


In your question, you asked for the fastest way to do it. As has been demonstrated repeatedly, particularly with Python, intuition is not a reliable guide: you need to measure.

Here's a simple test of several different implementations:

import sys
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter
from timeit import timeit

L = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]

def max_occurrences_1a(seq=L):
    "dict iteritems"
    c = dict()
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.iteritems(), key=itemgetter(1))

def max_occurrences_1b(seq=L):
    "dict items"
    c = dict()
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.items(), key=itemgetter(1))

def max_occurrences_2(seq=L):
    "defaultdict iteritems"
    c = defaultdict(int)
    for item in seq:
        c[item] += 1
    return max(c.iteritems(), key=itemgetter(1))

def max_occurrences_3a(seq=L):
    "sort groupby generator expression"
    return max(((k, sum(1 for i in g)) for k, g in groupby(sorted(seq))), key=itemgetter(1))

def max_occurrences_3b(seq=L):
    "sort groupby list comprehension"
    return max([(k, sum(1 for i in g)) for k, g in groupby(sorted(seq))], key=itemgetter(1))

def max_occurrences_4(seq=L):
    "counter"
    return Counter(L).most_common(1)[0]

versions = [max_occurrences_1a, max_occurrences_1b, max_occurrences_2, max_occurrences_3a, max_occurrences_3b, max_occurrences_4]

print sys.version, "\n"

for vers in versions:
    print vers.__doc__, vers(), timeit(vers, number=20000)

The results on my machine:

2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] 

dict iteritems (4, 6) 0.202214956284
dict items (4, 6) 0.208412885666
defaultdict iteritems (4, 6) 0.221301078796
sort groupby generator expression (4, 6) 0.383440971375
sort groupby list comprehension (4, 6) 0.402786016464
counter (4, 6) 0.564319133759

So it appears that the Counter solution is not the fastest. And, in this case at least, groupby is faster. defaultdict is good but you pay a little bit for its convenience; it's slightly faster to use a regular dict with a get.

What happens if the list is much bigger? Adding L *= 10000 to the test above and reducing the repeat count to 200:

dict iteritems (4, 60000) 10.3451900482
dict items (4, 60000) 10.2988479137
defaultdict iteritems (4, 60000) 5.52838587761
sort groupby generator expression (4, 60000) 11.9538850784
sort groupby list comprehension (4, 60000) 12.1327362061
counter (4, 60000) 14.7495789528

Now defaultdict is the clear winner. So perhaps the cost of the 'get' method and the loss of the inplace add adds up (an examination of the generated code is left as an exercise).

But with the modified test data, the number of unique item values did not change so presumably dict and defaultdict have an advantage there over the other implementations. So what happens if we use the bigger list but substantially increase the number of unique items? Replacing the initialization of L with:

LL = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]
L = []
for i in xrange(1,10001):
    L.extend(l * i for l in LL)

dict iteritems (2520, 13) 17.9935798645
dict items (2520, 13) 21.8974409103
defaultdict iteritems (2520, 13) 16.8289561272
sort groupby generator expression (2520, 13) 33.853593111
sort groupby list comprehension (2520, 13) 36.1303369999
counter (2520, 13) 22.626899004

So now Counter is clearly faster than the groupby solutions but still slower than the iteritems versions of dict and defaultdict.

The point of these examples isn't to produce an optimal solution. The point is that there often isn't one optimal general solution. Plus there are other performance criteria. The memory requirements will differ substantially among the solutions and, as the size of the input goes up, memory requirements may become the overriding factor in algorithm selection.

Bottom line: it all depends and you need to measure.


may something like this:

testList = [1, 2, 3, 4, 2, 2, 1, 4, 4] print(max(set(testList), key = testList.count))


Simple and best code:

def max_occ(lst,x):
    count=0
    for i in lst:
        if (i==x):
            count=count+1
    return count

lst=[1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
x=max(lst,key=lst.count)
print(x,"occurs ",max_occ(lst,x),"times")

Output: 4 occurs 6 times


If you're using Python 3.4 or above, you can use statistics.mode()

>>> import statistics
>>> L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67] 
>>> statistics.mode(L)
4

Note that this will throw a statistics.StatisticsError if the list is empty or if there is not exactly one most common value.


if you are using numpy in your solution for faster computation use this:

import numpy as np
x = np.array([2,5,77,77,77,77,77,77,77,9,0,3,3,3,3,3])
y = np.bincount(x,minlength = max(x))
y = np.argmax(y)   
print(y)  #outputs 77

I am surprised no-one has mentioned the simplest solution,max() with the key list.count:

max(lst,key=lst.count)

Example:

>>> lst = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
>>> max(lst,key=lst.count)
4

This works in Python 3 or 2, but note that it only returns the most frequent item and not also the frequency. Also, in the case of a draw (i.e. joint most frequent item) only a single item is returned.

Although the time complexity of using max() is worse than using Counter.most_common(1) as PM 2Ring comments, the approach benefits from a rapid C implementation and I find this approach is fastest for short lists but slower for larger ones (Python 3.6 timings shown in IPython 5.3):

In [1]: from collections import Counter
   ...: 
   ...: def f1(lst):
   ...:     return max(lst, key = lst.count)
   ...: 
   ...: def f2(lst):
   ...:     return Counter(lst).most_common(1)
   ...: 
   ...: lst0 = [1,2,3,4,3]
   ...: lst1 = lst0[:] * 100
   ...: 

In [2]: %timeit -n 10 f1(lst0)
10 loops, best of 3: 3.32 us per loop

In [3]: %timeit -n 10 f2(lst0)
10 loops, best of 3: 26 us per loop

In [4]: %timeit -n 10 f1(lst1)
10 loops, best of 3: 4.04 ms per loop

In [5]: %timeit -n 10 f2(lst1)
10 loops, best of 3: 75.6 us per loop

Perhaps the most_common() method


Following is the solution which I came up with if there are multiple characters in the string all having the highest frequency.

mystr = input("enter string: ")
#define dictionary to store characters and their frequencies
mydict = {}
#get the unique characters
unique_chars = sorted(set(mystr),key = mystr.index)
#store the characters and their respective frequencies in the dictionary
for c in unique_chars:
    ctr = 0
    for d in mystr:
        if d != " " and d == c:
            ctr = ctr + 1
    mydict[c] = ctr
print(mydict)
#store the maximum frequency
max_freq = max(mydict.values())
print("the highest frequency of occurence: ",max_freq)
#print all characters with highest frequency
print("the characters are:")
for k,v in mydict.items():
    if v == max_freq:
        print(k)

Input: "hello people"

Output:

{'o': 2, 'p': 2, 'h': 1, ' ': 0, 'e': 3, 'l': 3}

the highest frequency of occurence: 3

the characters are:

e

l

My (simply) code (three months studying Python):

def more_frequent_item(lst):
    new_lst = []
    times = 0
    for item in lst:
        count_num = lst.count(item)
        new_lst.append(count_num)
        times = max(new_lst)
    key = max(lst, key=lst.count)
    print("In the list: ")
    print(lst)
    print("The most frequent item is " + str(key) + ". Appears " + str(times) + " times in this list.")


more_frequent_item([1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67])

The output will be:

In the list: 
[1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
The most frequent item is 4. Appears 6 times in this list.

I obtained the best results with groupby from itertools module with this function using Python 3.5.2:

from itertools import groupby

a = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]

def occurrence():
    occurrence, num_times = 0, 0
    for key, values in groupby(a, lambda x : x):
        val = len(list(values))
        if val >= occurrence:
            occurrence, num_times =  key, val
    return occurrence, num_times

occurrence, num_times = occurrence()
print("%d occurred %d times which is the highest number of times" % (occurrence, num_times))

Output:

4 occurred 6 times which is the highest number of times

Test with timeit from timeit module.

I used this script for my test with number= 20000:

from itertools import groupby

def occurrence():
    a = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
    occurrence, num_times = 0, 0
    for key, values in groupby(a, lambda x : x):
        val = len(list(values))
        if val >= occurrence:
            occurrence, num_times =  key, val
    return occurrence, num_times

if __name__ == '__main__':
    from timeit import timeit
    print(timeit("occurrence()", setup = "from __main__ import occurrence",  number = 20000))

Output (The best one):

0.1893607140000313

I want to throw in another solution that looks nice and is fast for short lists.

def mc(seq=L):
    "max/count"
    max_element = max(seq, key=seq.count)
    return (max_element, seq.count(max_element))

You can benchmark this with the code provided by Ned Deily which will give you these results for the smallest test case:

3.5.2 (default, Nov  7 2016, 11:31:36) 
[GCC 6.2.1 20160830] 

dict iteritems (4, 6) 0.2069783889998289
dict items (4, 6) 0.20462976200065896
defaultdict iteritems (4, 6) 0.2095775119996688
sort groupby generator expression (4, 6) 0.4473949929997616
sort groupby list comprehension (4, 6) 0.4367636879997008
counter (4, 6) 0.3618192010007988
max/count (4, 6) 0.20328268999946886

But beware, it is inefficient and thus gets really slow for large lists!


A simple way without any libraries or sets

def mcount(l):
  n = []                  #To store count of each elements
  for x in l:
      count = 0
      for i in range(len(l)):
          if x == l[i]:
              count+=1
      n.append(count)
  a = max(n)              #largest in counts list
  for i in range(len(n)):
      if n[i] == a:
          return(l[i],a)  #element,frequency
  return                  #if something goes wrong

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to list

Convert List to Pandas Dataframe Column Python find elements in one list that are not in the other Sorting a list with stream.sorted() in Java Python Loop: List Index Out of Range How to combine two lists in R How do I multiply each element in a list by a number? Save a list to a .txt file The most efficient way to remove first N elements in a list? TypeError: list indices must be integers or slices, not str Parse JSON String into List<string>

Examples related to max

Min and max value of input in angular4 application numpy max vs amax vs maximum mongodb how to get max value from collections Python find min max and average of a list (array) Max length UITextField How to find the highest value of a column in a data frame in R? MAX function in where clause mysql Check if all values in list are greater than a certain number How do I get the max and min values from a set of numbers entered? SQL: Group by minimum value in one field while selecting distinct rows

Examples related to counting

How to count digits, letters, spaces for a string in Python? How do I count unique visitors to my site? Counting the number of True Booleans in a Python List Find the item with maximum occurrences in a list How to count the frequency of the elements in an unordered list? Item frequency count in Python