[python] Counting repeated characters in a string in Python

I want to count the number of times each character is repeated in a string. Is there any particular way to do it apart from comparing each character of the string from A-Z and incrementing a counter?

Update (in reference to Anthony's answer): Whatever you have suggested so far, I would have to write 26 times. Is there an easier way?



Python 2.7+ includes the collections.Counter class:

import collections
results = collections.Counter(the_string)
print(results)
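
A Counter is a dict subclass, so you can index it directly; missing characters give 0 instead of raising KeyError, and most_common returns counts in descending order. A quick example:

import collections

results = collections.Counter('ZDXMZKMXFDKXZFKZ')
print(results['Z'])            # 4
print(results['q'])            # 0 -- missing keys don't raise
print(results.most_common(2))  # [('Z', 4), ('X', 3)] -- ties may come out in either order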

Grand Performance Comparison

Scroll to the end for a TL;DR graph

Since I had "nothing better to do" (read: I had just a lot of work), I decided to do a little performance contest. I assembled the most sensible or interesting answers and ran some simple timeit benchmarks on them in CPython 3.5.1. I tested them with only one string, which is a typical input in my case:

>>> s = 'ZDXMZKMXFDKXZFKZ'
>>> len(s)
16

Be aware that results might vary for different inputs, be it different length of the string or different number of distinct characters, or different average number of occurrences per character.
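
For reference, all the numbers below come from calls like this one (a sketch of the harness; by default timeit runs the given snippet 1,000,000 times and returns the total time in seconds):

from timeit import timeit
from collections import Counter

s = 'ZDXMZKMXFDKXZFKZ'
print(timeit('Counter(s)', globals=locals()))  # total seconds for 1,000,000 runs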


Don't reinvent the wheel

Python has made it simple for us. The collections.Counter class does exactly what we want and a lot more. Its usage is by far the simplest of all the methods mentioned here.

taken from @oefe, nice find

>>> timeit('Counter(s)', globals=locals())
8.208566107001388

Counter goes the extra mile, which is why it takes so long.

¿Dictionary, comprende?

Let's try using a simple dict instead. First, let's do it declaratively, using dict comprehension.

I came up with this myself...

>>> timeit('{c: s.count(c) for c in s}', globals=locals())
4.551155784000002

This will go through s from beginning to end, and for each character it will count the number of its occurrences in s. Since s contains duplicate characters, the above method searches s several times for the same character. The result is naturally always the same. So let's count the number of occurrences just once for each character.

I came up with this myself, and so did @IrshadBhat

>>> timeit('{c: s.count(c) for c in set(s)}', globals=locals())
3.1484066140001232

Better. But we still have to search through the string to count the occurrences. One search for each distinct character. That means we're going to read the string more than once. We can do better than that! But for that, we have to get off our declarativist high horse and descend into an imperative mindset.

Exceptional code

AKA Gotta catch 'em all!

inspired by @anthony

>>> timeit('''
... d = {}
... for c in s:
...   try:
...     d[c] += 1
...   except KeyError:
...     d[c] = 1
... ''', globals=locals())
3.7060273620008957

Well, it was worth a try. If you dig into the Python source (I can't say with certainty because I have never really done that), you will probably find that when you do except ExceptionType, Python has to check whether the exception raised is actually of ExceptionType or some other type. Just for the heck of it, let's see how long it takes if we omit that check and catch all exceptions.

made by @anthony

>>> timeit('''
... d = {}
... for c in s:
...   try:
...     d[c] += 1
...   except:
...     d[c] = 1
... ''', globals=locals())
3.3506563019982423

It does save some time, so one might be tempted to use this as some sort of optimization.
Don't do that! Or actually do. Do it now:

INTERLUDE 1

import time
while True:
  try:
    time.sleep(1)
  except:
    print("You're trapped in your own trap!")

You see? It catches KeyboardInterrupt, besides other things. In fact, it catches all the exceptions there are. Including ones you might not have even heard about, like SystemExit.

INTERLUDE 2

import sys
try:
  print("Goodbye. I'm going to die soon.")
  sys.exit()
except:
  print('BACK FROM THE DEAD!!!')
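
If you really do need a broad catch, except Exception is the safer idiom: KeyboardInterrupt and SystemExit derive from BaseException, not Exception, so they still propagate:

import sys
try:
  print("Goodbye. I'm going to die soon.")
  sys.exit()
except Exception:
  print('This line never runs; SystemExit is not an Exception subclass.')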

Now back to counting letters and numbers and other characters.

Playing catch-up

Exceptions aren't the way to go. You have to try hard to catch up with them, and when you finally do, they just throw up on you and then raise their eyebrows like it's your fault. Luckily brave fellows have paved our way so we can do away with exceptions, at least in this little exercise.

The dict class has a nice method – get – which allows us to retrieve an item from a dictionary, just like d[k]. Except when the key k is not in the dictionary, it can return a default value. Let's use that method instead of fiddling with exceptions.

credit goes to @Usman

>>> timeit('''
... d = {}
... for c in s:
...   d[c] = d.get(c, 0) + 1
... ''', globals=locals())
3.2133633289995487

Almost as fast as the set-based dict comprehension. On larger inputs, this one would probably be even faster.

Use the right tool for the job

For any at least mildly knowledgeable Python programmer, the first thing that comes to mind is probably defaultdict. It does pretty much the same thing as the version above, except instead of a value, you give it a value factory. That might cause some overhead, because the value has to be "constructed" for each missing key individually. Let's see how it performs.

hope @AlexMartelli won't crucify me for from collections import defaultdict

>>> timeit('''
... dd = defaultdict(int)
... for c in s:
...   dd[c] += 1
... ''', globals=locals())
3.3430528169992613

Not that bad. I'd say the increase in execution time is a small tax to pay for the improved readability. However, we also favor performance, and we will not stop here. Let's take it further and prepopulate the dictionary with zeros. Then we won't have to check every time if the item is already there.

hats off to @sqram

>>> timeit('''
... d = dict.fromkeys(s, 0)
... for c in s:
...   d[c] += 1
... ''', globals=locals())
2.6081761489986093

That's good. Over three times as fast as Counter, yet still simple enough. Personally, this is my favorite in case you don't want to add new characters later. And even if you do, you can still do it. It's just less convenient than it would be in other versions:

d.update({ c: 0 for c in set(other_string) - d.keys() })


Practicality beats purity (except when it's not really practical)

Now a bit different kind of counter. @IdanK has come up with something interesting. Instead of using a hash table (a.k.a. dictionary a.k.a. dict), we can avoid the risk of hash collisions and consequent overhead of their resolution. We can also avoid the overhead of hashing the key, and the extra unoccupied table space. We can use a list. The ASCII values of characters will be indices and their counts will be values. As @IdanK has pointed out, this list gives us constant time access to a character's count. All we have to do is convert each character from str to int using the built-in function ord. That will give us an index into the list, which we will then use to increment the count of the character. So what we do is this: we initialize the list with zeros, do the job, and then convert the list into a dict. This dict will only contain those characters which have non-zero counts, in order to make it compliant with other versions.

As a side note, this technique is used in a linear-time sorting algorithm known as count sort or counting sort. It's very efficient, but the range of values being sorted is limited, since each value has to have its own counter. To sort a sequence of 32-bit integers, 4.3 billion counters would be needed.
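
For illustration, here's a minimal counting-sort sketch over small non-negative integers (my own example, not part of the benchmark):

def counting_sort(values, max_value):
  # One counter per possible value -- this is why the value range must be small.
  counts = [0] * (max_value + 1)
  for v in values:
    counts[v] += 1
  # Emit each value as many times as it was counted.
  return [v for v in range(max_value + 1) for _ in range(counts[v])]

print(counting_sort([5, 1, 4, 1, 5], 5))  # [1, 1, 4, 5, 5]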

>>> timeit('''
... counts = [0 for _ in range(256)]
... for c in s:
...   counts[ord(c)] += 1
... d = {chr(i): count for i,count in enumerate(counts) if count != 0}
... ''', globals=locals())
25.438595562001865

Ouch! Not cool! Let's try and see how long it takes when we omit building the dictionary.

>>> timeit('''
... counts = [0 for _ in range(256)]
... for c in s:
...   counts[ord(c)] += 1
... ''', globals=locals())
10.564866792999965

Still bad. But wait, what's [0 for _ in range(256)]? Can't we write it more simply? How about [0] * 256? That's cleaner. But will it perform better?

>>> timeit('''
... counts = [0] * 256
... for c in s:
...   counts[ord(c)] += 1
... ''', globals=locals())
3.290163638001104

Considerably. Now let's put the dictionary back in.

>>> timeit('''
... counts = [0] * 256
... for c in s:
...   counts[ord(c)] += 1
... d = {chr(i): count for i,count in enumerate(counts) if count != 0}
... ''', globals=locals())
18.000623562998953

Almost six times slower. Why does it take so long? Because when we enumerate(counts), we have to check every one of the 256 counts and see if it's zero. But we already know which counts are zero and which are not.

>>> timeit('''
... counts = [0] * 256
... for c in s:
...   counts[ord(c)] += 1
... d = {c: counts[ord(c)] for c in set(s)}
... ''', globals=locals())
5.826531438000529

It probably won't get much better than that, at least not for such a small input. Plus it's only usable for 8-bit extended-ASCII characters.

And the winner is...

>>> timeit('''
... d = {}
... for c in s:
...   if c in d:
...     d[c] += 1
...   else:
...     d[c] = 1
... ''', globals=locals())
1.8509794599995075

Yep. Even if you have to check every time whether c is in d, for this input it's the fastest way. No pre-population of d will make it faster (again, for this input). It's a lot more verbose than Counter or defaultdict, but also more efficient.


That's all folks

This little exercise teaches us a lesson: when optimizing, always measure performance, ideally with your expected inputs. Optimize for the common case. Don't presume something is actually more efficient just because its asymptotic complexity is lower. And last but not least, keep readability in mind. Try to find a compromise between "computer-friendly" and "human-friendly".



UPDATE

I have been informed by @MartijnPieters of the function collections._count_elements available in Python 3.

Help on built-in function _count_elements in module _collections:

_count_elements(...)
    _count_elements(mapping, iterable) -> None

    Count elements in the iterable, updating the mapping

This function is implemented in C, so it should be faster, but this extra performance comes at a price. The price is incompatibility with Python 2 and possibly even future versions, since we're using a private function.

From the documentation:

[...] a name prefixed with an underscore (e.g. _spam) should be treated as a non-public part of the API (whether it is a function, a method or a data member). It should be considered an implementation detail and subject to change without notice.

That said, if you still want to save those 620 nanoseconds per iteration:

>>> timeit('''
... d = {}
... _count_elements(d, s)
... ''', globals=locals())
1.229239897998923
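
If you do go down this road, it may be wise to guard the import with a pure-Python fallback (a sketch; the fallback mirrors the pure-Python definition inside the collections module itself):

try:
  from _collections import _count_elements  # CPython's private C helper
except ImportError:
  def _count_elements(mapping, iterable):
    for elem in iterable:
      mapping[elem] = mapping.get(elem, 0) + 1

d = {}
_count_elements(d, 'ZDXMZKMXFDKXZFKZ')
print(d)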



UPDATE 2: Large strings

I thought it might be a good idea to re-run the tests on some larger input, since a 16 character string is such a small input that all the possible solutions were quite comparably fast (1,000 iterations in under 30 milliseconds).

I decided to use the complete works of Shakespeare as a testing corpus, which turned out to be quite a challenge (since it's over 5 MiB in size). I just used the first 100,000 characters of it, and I had to limit the number of iterations from 1,000,000 to 1,000. Note that read() returns a bytes object here, so in Python 3 the per-character loops below iterate over integer byte values rather than one-character strings; the snippets that call ord(c) would need a str input (e.g. s.decode()) to run exactly as written.

import urllib.request
url = 'https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt'
s = urllib.request.urlopen(url).read(100_000)

collections.Counter was really slow on a small input, but the tables have turned

Counter(s)

=> 7.63926783799991

Naïve Θ(n²) time dictionary comprehension simply doesn't work

{c: s.count(c) for c in s}

=> 15347.603935000052s (tested on 10 iterations; adjusted for 1000)

Smart Θ(n) time dictionary comprehension works fine

{c: s.count(c) for c in set(s)}

=> 8.882608592999986

Exceptions are clumsy and slow

d = {}
for c in s:
  try:
    d[c] += 1
  except KeyError:
    d[c] = 1

=> 21.26615508399982

Omitting the exception type check doesn't save time (since the exception is only raised a few times)

d = {}
for c in s:
  try:
    d[c] += 1
  except:
    d[c] = 1

=> 21.943328911999743

dict.get looks nice but runs slow

d = {}
for c in s:
  d[c] = d.get(c, 0) + 1

=> 28.530086210000007

collections.defaultdict isn't very fast either

dd = defaultdict(int)
for c in s:
  dd[c] += 1

=> 19.43012963199999

dict.fromkeys requires reading the (very long) string twice

d = dict.fromkeys(s, 0)
for c in s:
  d[c] += 1

=> 22.70960557699999

Using list instead of dict is neither nice nor fast

counts = [0 for _ in range(256)]
for c in s:
  counts[ord(c)] += 1

d = {chr(i): count for i,count in enumerate(counts) if count != 0}

=> 26.535474792000002

Leaving out the final conversion to dict doesn't help

counts = [0 for _ in range(256)]
for c in s:
  counts[ord(c)] += 1

=> 26.27811567400005

It doesn't matter how you construct the list, since it's not the bottleneck

counts = [0] * 256
for c in s:
  counts[ord(c)] += 1

=> 25.863524940000048


The same goes for the version with the final conversion to dict

counts = [0] * 256
for c in s:
  counts[ord(c)] += 1

d = {chr(i): count for i,count in enumerate(counts) if count != 0}

=> 26.416733378000004

If you convert list to dict the "smart" way, it's even slower (since you iterate over the string twice)

counts = [0] * 256
for c in s:
  counts[ord(c)] += 1

d = {c: counts[ord(c)] for c in set(s)}

=> 29.492915620000076

The dict.__contains__ variant may be fast for small strings, but not so much for big ones

d = {}
for c in s:
  if c in d:
    d[c] += 1
  else:
    d[c] = 1

=> 23.773295123000025

collections._count_elements is about as fast as collections.Counter (which uses _count_elements internally)

d = {}
_count_elements(d, s)

=> 7.5814381919999505


Final verdict: Use collections.Counter unless you cannot or don't want to :)



Appendix: NumPy

The numpy package provides a method numpy.unique which accomplishes (almost) precisely what we want.

The way this method works is very different from all the above methods:

  • It first sorts a copy of the input using Quicksort, which is an O(n²) time operation in the worst case, albeit O(n log n) on average.

  • Then it creates a "mask" array containing True at indices where a run of the same values begins, viz. at indices where the value differs from the previous value. Repeated values produce False in the mask. Example: [5,5,5,8,9,9] produces a mask [True, False, False, True, True, False].

  • This mask is then used to extract the unique values from the sorted input - unique_chars in the code below. In our example, they would be [5, 8, 9].

  • Positions of the True values in the mask are taken into an array, and the length of the input is appended at the end of this array. For the above example, this array would be [0, 3, 4, 6].

  • For this array, differences between its elements are calculated, e.g. [3, 1, 2]. These are the respective counts of the elements in the sorted array - char_counts in the code below.

  • Finally, we create a dictionary by zipping unique_chars and char_counts: {5: 3, 8: 1, 9: 2}.
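
Here's a rough sketch of those steps on the example values above (for illustration only; this is not NumPy's actual implementation):

import numpy as np

arr = np.sort(np.array([9, 5, 8, 5, 9, 5]))             # [5 5 5 8 9 9]
mask = np.empty(arr.shape, dtype=bool)
mask[0] = True                                          # a run always starts at index 0
mask[1:] = arr[1:] != arr[:-1]                          # True where a new run begins
unique_chars = arr[mask]                                # [5 8 9]
boundaries = np.append(np.flatnonzero(mask), len(arr))  # [0 3 4 6]
char_counts = np.diff(boundaries)                       # [3 1 2]
print(dict(zip(unique_chars.tolist(), char_counts.tolist())))  # {5: 3, 8: 1, 9: 2}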


import numpy as np

def count_chars(s):
  # The following statement needs to be changed for different input types.
  # Our input `s` is actually of type `bytes`, so we use `np.frombuffer`.
  # For inputs of type `str`, encode to bytes first, e.g.
  #  `np.frombuffer(s.encode('latin-1'), dtype=np.uint8)`.
  #  (`np.fromstring` also works, but it is deprecated in newer NumPy versions.)
  arr = np.frombuffer(s, dtype=np.uint8)

  unique_chars, char_counts = np.unique(arr, return_counts=True)

  return dict(zip(unique_chars, char_counts))

For the test input (first 100,000 characters of the complete works of Shakespeare), this method performs better than any other tested here. But note that on a different input, this approach might yield worse performance than the other methods. Pre-sortedness of the input and number of repetitions per element are important factors affecting the performance.

count_chars(s)

=> 2.960809530000006


If you are thinking about using this method because it's over twice as fast as collections.Counter, consider this:

  • collections.Counter has linear time complexity. numpy.unique is linearithmic on average and quadratic in the worst case.

  • The speedup is not really that significant - you save ~4.7 milliseconds per iteration on an input of length 100,000.

  • Using numpy.unique obviously requires numpy.

That considered, it seems reasonable to use Counter unless you need to be really fast. And in that case, you'd better know what you're doing, or else you'll end up being slower with numpy than without it.



Appendix 2: A somewhat useful plot

I ran the 13 different methods above on prefixes of the complete works of Shakespeare and made an interactive plot. Note that in the plot, both prefixes and durations are displayed in logarithmic scale (the used prefixes are of exponentially increasing length). Click on the items in the legend to show/hide them in the plot.

Interactive plot (click to open!)


This is the shortest, most practical solution I can come up with without importing extra modules.

text = "hello cruel world. This is a sample text"
d = dict.fromkeys(text, 0)
for c in text: d[c] += 1

print(d['a']) would output 2

And it's also fast.


You can use a dictionary:

s = "asldaksldkalskdla"
dict = {}
for letter in s:
 if letter not in dict.keys():
  dict[letter] = 1
 else:
  dict[letter] += 1

print dict

This will print a dict of characters with their occurrence counts.

text = 'aabcdefghijklmnopqrstuvwxyz'
mydict = {}
for char in text:
    mydict[char] = mydict.get(char, 0) + 1
print(mydict)

Here is a solution that counts consecutive runs of each character (run-length encoding):

my_list = []
history = ""
history_count = 0
my_str = "happppyyyy"

for letter in my_str:
    if letter in history:
        # Same letter as the previous one: replace the old (letter, count) pair.
        my_list.remove((history, history_count))
        history = letter
        history_count += 1
    else:
        # New letter: start a fresh count.
        history = letter
        history_count = 1
    my_list.append((history, history_count))

print(my_list)  # [('h', 1), ('a', 1), ('p', 4), ('y', 4)]

You want to use a dict.

#!/usr/bin/env python

input = "this is a string"

d = {}

for c in input:
    try:
        d[c] += 1
    except:
        d[c] = 1

for k in d.keys():
    print "%s: %d" % (k, d[k])

If you're looking for the simplest way without the collections module, I guess this will be helpful:

>>> s = "asldaksldkalskdla"
>>> {i:s.count(i) for i in set(s)}
{'a': 4, 'd': 3, 'k': 3, 's': 3, 'l': 4}

or

>>> [(i,s.count(i)) for i in set(s)]
[('a', 4), ('k', 3), ('s', 3), ('l', 4), ('d', 3)]

import collections

d = collections.defaultdict(int)
for c in thestring:
    d[c] += 1

A collections.defaultdict is like a dict (subclasses it, actually), but when an entry is sought and not found, instead of reporting it doesn't have it, it makes it and inserts it by calling the supplied 0-argument callable. Most popular are defaultdict(int), for counting (or, equivalently, to make a multiset AKA bag data structure), and defaultdict(list), which does away forever with the need to use .setdefault(akey, []).append(avalue) and similar awkward idioms.

So once you've done this d is a dict-like container mapping every character to the number of times it appears, and you can emit it any way you like, of course. For example, most-popular character first:

for c in sorted(d, key=d.get, reverse=True):
  print('%s %6d' % (c, d[c]))
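
And for contrast, the defaultdict(list) idiom mentioned above (a quick sketch with made-up data):

from collections import defaultdict

groups = defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
  groups[word[0]].append(word)  # no setdefault needed; missing keys get a fresh []

print(dict(groups))  # {'a': ['apple', 'avocado'], 'b': ['banana']}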

The code below worked for me without importing any other Python modules.

def count_repeated_letter(string1):
    list1 = []

    for letter in string1:
        if string1.count(letter) >= 2:
            if letter not in list1:
                list1.append(letter)

    for item in list1:
        if item != " ":
            print(item, string1.count(item))


count_repeated_letter('letter has 1 e and 2 e and 1 t and two t')

Output:

e 4
t 5
a 4
1 2
n 3
d 3

inputString =  input("Enter a String:")
countedArray = {}

for char in inputString:
    if char in countedArray:  
        countedArray[char] += 1    
    else:
        countedArray[char] = 1

print(countedArray)

To count the occurrences of a single character in a string, use YOUR_VARIABLE.count('WHAT_YOU_WANT_TO_COUNT'):

variable = 'turkiye'
print(variable.count('u'))

output: 1


I can count the number of days I know Python on my two hands so forgive me if I answer something silly :)

Instead of using a dict, I thought why not use a list? I'm not sure how lists and dictionaries are implemented in Python so this would have to be measured to know what's faster.

If this was C++ I would just use a normal c-array/vector for constant time access (that would definitely be faster) but I don't know what the corresponding datatype is in Python (if there's one...):

count = [0 for i in range(26)]

for c in ''.join(s.lower().split()): # get rid of whitespaces and capital letters
    count[ord(c) - 97] += 1          # ord('a') == 97

It's also possible to make the list's size ord('z') + 1 and then get rid of the 97 subtraction everywhere, but if you optimize, why not all the way :)
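
That variant might look like this (the first 97 slots are simply wasted):

count = [0] * (ord('z') + 1)  # indices 0..96 stay unused

for c in ''.join(s.lower().split()):
    count[ord(c)] += 1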

EDIT: A commenter suggested that the join/split is not worth the possible gain of using a list, so I thought why not get rid of it:

count = [0 for i in range(26)]

for c in s:
    if c.isalpha(): count[ord(c.lower()) - 97] += 1  # note: assumes ASCII letters; e.g. 'é' would index out of range

If it is just a matter of counting the number of repetitions of a given character in a given string, try something like this:

word = "babulibobablingo"
letter = 'b'

if letter in word:
    print(word.count(letter))

counts = {}
for i in set(text):
    counts[i] = text.count(i)
print(counts)

If my string is:

text = "this is string!"

the above code will print:

{'!': 1, ' ': 2, 'g': 1, 'i': 3, 'h': 1, 'n': 1, 's': 3, 'r': 1, 't': 2}

s = 'today is sunday i would like to relax'
numberOfDuplicatedChar = len(s) - len(set(s))
print(numberOfDuplicatedChar)  # 21

# A set contains no duplicate elements, so the difference counts the repeated occurrences.