How do I use itertools.groupby()?

Question

I haven't been able to find an understandable explanation of how to actually use Python's itertools.groupby() function. What I'm trying to do is this:

- Take a list (in this case, the children of an objectified lxml element)
- Divide it into groups based on some criteria
- Then later iterate over each of these groups separately

I've reviewed the documentation, but I've had trouble applying it to anything beyond a simple list of numbers.

So, how do I use itertools.groupby()? Is there another technique I should be using? Pointers to good "prerequisite" reading would also be appreciated.

User · Answer

@CaptSolo, I tried your example, but it didn't work.

    from itertools import groupby

    [(c, len(list(cs))) for c, cs in groupby('Pedro Manoel')]

Output:

    [('P', 1), ('e', 1), ('d', 1), ('r', 1), ('o', 1), (' ', 1), ('M', 1), ('a', 1), ('n', 1), ('o', 1), ('e', 1), ('l', 1)]

As you can see, there are two o's and two e's, but they ended up in separate groups. That's when I realized you need to sort the list passed to the groupby function. So, the correct usage would be:

    name = list('Pedro Manoel')
    name.sort()
    [(c, len(list(cs))) for c, cs in groupby(name)]

Output:

    [(' ', 1), ('M', 1), ('P', 1), ('a', 1), ('d', 1), ('e', 2), ('l', 1), ('n', 1), ('o', 2), ('r', 1)]

Just remember: groupby only groups consecutive equal items, so if the list is not sorted, equal items that are not adjacent end up in separate groups.
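To make the difference concrete, here is a minimal sketch that puts both behaviours side by side. The helper names letter_counts and run_counts are made up for this illustration, and collections.Counter is used only as a cross-check, since it counts items without any sorting.

```python
from collections import Counter
from itertools import groupby


def letter_counts(text):
    """Count every character by sorting first, then grouping."""
    return [(char, len(list(run))) for char, run in groupby(sorted(text))]


def run_counts(text):
    """Count consecutive runs only; without sorting, repeats stay separate."""
    return [(char, len(list(run))) for char, run in groupby(text)]


name = "Pedro Manoel"
print(letter_counts(name))
print(run_counts(name))
# Cross-check: the sorted version agrees with Counter's totals.
print(dict(letter_counts(name)) == dict(Counter(name)))
```

If all you need is the totals, Counter is probably the simpler tool; groupby earns its keep when you also want the grouped items themselves.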

User · Answer

itertools.groupby is a tool for grouping items.

From the docs, we glean further what it might do:

    # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
    # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D

groupby objects yield key-group pairs where the group is a generator.

Features

A. Group consecutive items together
B. Group all occurrences of an item, given a sorted iterable
C. Specify how to group items with a key function *

Comparisons

    # Define a printer for comparing outputs
    >>> import itertools as it
    >>> def print_groupby(iterable, keyfunc=None):
    ...     for k, g in it.groupby(iterable, keyfunc):
    ...         print("key: '{}'--> group: {}".format(k, list(g)))

    # Feature A: group consecutive occurrences
    >>> print_groupby("BCAACACAADBBB")
    key: 'B'--> group: ['B']
    key: 'C'--> group: ['C']
    key: 'A'--> group: ['A', 'A']
    key: 'C'--> group: ['C']
    key: 'A'--> group: ['A']
    key: 'C'--> group: ['C']
    key: 'A'--> group: ['A', 'A']
    key: 'D'--> group: ['D']
    key: 'B'--> group: ['B', 'B', 'B']

    # Feature B: group all occurrences
    >>> print_groupby(sorted("BCAACACAADBBB"))
    key: 'A'--> group: ['A', 'A', 'A', 'A', 'A']
    key: 'B'--> group: ['B', 'B', 'B', 'B']
    key: 'C'--> group: ['C', 'C', 'C']
    key: 'D'--> group: ['D']

    # Feature C: group by a key function
    >>> # islower = lambda s: s.islower()          # equivalent
    >>> def islower(s):
    ...     """Return True if a string is lowercase, else False."""
    ...     return s.islower()
    >>> print_groupby(sorted("bCAaCacAADBbB"), keyfunc=islower)
    key: 'False'--> group: ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D']
    key: 'True'--> group: ['a', 'a', 'b', 'b', 'c']

Uses

- Anagrams (see notebook)
- Binning
- Group odd and even numbers
- Group a list by values
- Remove duplicate elements
- Find indices of repeated elements in an array
- Split an array into n-sized chunks
- Find corresponding elements between two lists
- Compression algorithm (see notebook) / Run Length Encoding
- Grouping letters by length, key function (see notebook)
- Consecutive values over a threshold (see notebook)
- Find ranges of numbers in a list or continuous items (see docs)
- Find all related longest sequences
- Take consecutive sequences that meet a condition (see related post)

Note: Several of the latter examples derive from Víctor Terrón's PyCon talk (Spanish), "Kung Fu at Dawn with Itertools". See also the groupby source code written in C.

* A function where all items are passed through and compared, influencing the result. Other objects with key functions include sorted(), max() and min().

Response

    # OP: Yes, you can use `groupby`, e.g.
    [do_something(list(g)) for _, g in groupby(lxml_elements, criteria_func)]
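As one illustration of the uses listed above, anagram grouping can be sketched roughly as follows. This is not taken from the notebook the answer refers to; group_anagrams and signature are hypothetical names, and the approach simply combines Feature B (sort first) with Feature C (a key function).

```python
from itertools import groupby


def group_anagrams(words):
    """Return lists of words that are built from the same letters."""
    def signature(word):
        # Two words are anagrams when their sorted letters match.
        return "".join(sorted(word))

    # Sort by the same key used for grouping so equal keys are adjacent.
    ordered = sorted(words, key=signature)
    return [list(group) for _, group in groupby(ordered, key=signature)]


words = ["listen", "google", "silent", "enlist", "elgoog"]
for group in group_anagrams(words):
    print(group)
```

Because sorted() is stable, words inside each group keep their original relative order.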

User · Answer

You can write your own groupby function:

    def groupby(data):
        kv = {}
        for k, v in data:
            if k not in kv:
                kv[k] = [v]
            else:
                kv[k].append(v)
        return kv

Run on IPython:

    In [10]: data = [('a', 1), ('b', 2), ('a', 2)]

    In [11]: groupby(data)
    Out[11]: {'a': [1, 2], 'b': [2]}
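The same idea can be generalised to arbitrary items and a key function. The following is a small sketch using collections.defaultdict; the name group_by_key is made up here to avoid shadowing itertools.groupby. Unlike itertools.groupby, it merges equal keys even when they are not adjacent, so no sorting is needed, at the cost of holding everything in memory.

```python
from collections import defaultdict


def group_by_key(items, keyfunc):
    """Collect items into a dict of lists, keyed by keyfunc(item)."""
    groups = defaultdict(list)
    for item in items:
        groups[keyfunc(item)].append(item)
    return dict(groups)


numbers = [3, 8, 5, 2, 7, 4]
print(group_by_key(numbers, lambda n: "even" if n % 2 == 0 else "odd"))
# {'odd': [3, 5, 7], 'even': [8, 2, 4]}
```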

User · Answer

Sorting and groupby

    from itertools import groupby

    val = [{'name': 'satyajit', 'address': 'btm', 'pin': 560076},
           {'name': 'Mukul', 'address': 'Silk board', 'pin': 560078},
           {'name': 'Preetam', 'address': 'btm', 'pin': 560076}]

    # Sort by the same key that is used for grouping.
    for pin, list_data in groupby(sorted(val, key=lambda k: k['pin']), lambda x: x['pin']):
        print(pin)
        for rec in list_data:
            print(rec)

Output:

    560076
    {'name': 'satyajit', 'address': 'btm', 'pin': 560076}
    {'name': 'Preetam', 'address': 'btm', 'pin': 560076}
    560078
    {'name': 'Mukul', 'address': 'Silk board', 'pin': 560078}
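A possible variation on the example above, assuming the same records: operator.itemgetter can serve as a single key function for both the sort and the grouping, which keeps the two from drifting apart. The names records, by_pin and names_by_pin are chosen for this sketch only.

```python
from itertools import groupby
from operator import itemgetter

records = [
    {"name": "satyajit", "address": "btm", "pin": 560076},
    {"name": "Mukul", "address": "Silk board", "pin": 560078},
    {"name": "Preetam", "address": "btm", "pin": 560076},
]

by_pin = itemgetter("pin")  # one key function for both sorting and grouping

names_by_pin = {
    pin: [rec["name"] for rec in group]
    for pin, group in groupby(sorted(records, key=by_pin), key=by_pin)
}
print(names_by_pin)
```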

User · Answer

A neat trick with groupby is run-length encoding in one line:

    [(c, len(list(cgen))) for c, cgen in groupby(some_string)]

This gives you a list of 2-tuples where the first element is the character and the second is the number of repetitions.

Edit: Note that this is what separates itertools.groupby from the SQL GROUP BY semantics: itertools doesn't (and in general can't) sort the iterator in advance, so groups with the same "key" aren't merged.
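For completeness, here is a small sketch that wraps the one-liner in a function and adds the inverse operation. The names rle_encode and rle_decode are invented for this example.

```python
from itertools import groupby


def rle_encode(text):
    """Return (character, run length) pairs for consecutive repeats."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs):
    """Rebuild the original string from (character, run length) pairs."""
    return "".join(char * count for char, count in pairs)


encoded = rle_encode("aaabccdd")
print(encoded)
print(rle_decode(encoded))
```

The round trip only works because groupby keeps separate runs separate; sorting the input first would lose the original order.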

User · Answer

This basic implementation helped me understand this function. Hope it helps others as well:

    from itertools import groupby

    arr = [(1, "A"), (1, "B"), (1, "C"), (2, "D"), (2, "E"), (3, "F")]

    for k, g in groupby(arr, lambda x: x[0]):
        print("--", k, "--")
        for tup in g:
            print(tup[1])  # tup[0] == k

Output:

    -- 1 --
    A
    B
    C
    -- 2 --
    D
    E
    -- 3 --
    F

User · Answer

One useful example that I came across may be helpful:

    from itertools import groupby

    # user input
    myinput = input()

    # creating empty list to store output
    myoutput = []

    for k, g in groupby(myinput):
        myoutput.append((len(list(g)), int(k)))

    print(*myoutput)

Sample input: 14445221

Sample output: (1, 1) (3, 4) (1, 5) (2, 2) (1, 1)

User · Answer

IMPORTANT NOTE: You have to sort your data first.

The part I didn't get is that in the example construction

    groups = []
    uniquekeys = []
    for k, g in groupby(data, keyfunc):
        groups.append(list(g))    # Store group iterator as a list
        uniquekeys.append(k)

k is the current grouping key, and g is an iterator that you can use to iterate over the group defined by that grouping key. In other words, the groupby iterator itself returns iterators.

Here's an example of that, using clearer variable names:

    from itertools import groupby

    things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]

    for key, group in groupby(things, lambda x: x[0]):
        for thing in group:
            print("A %s is a %s." % (thing[1], key))
        print("")

This will give you the output:

    A bear is a animal.
    A duck is a animal.

    A cactus is a plant.

    A speed boat is a vehicle.
    A school bus is a vehicle.

In this example, things is a list of tuples where the first item in each tuple is the group the second item belongs to.

The groupby() function takes two arguments: (1) the data to group and (2) the function to group it with.

Here, lambda x: x[0] tells groupby() to use the first item in each tuple as the grouping key.

In the above for statement, groupby returns three (key, group iterator) pairs, one for each unique key. You can use the returned iterator to iterate over each individual item in that group.

Here's a slightly different example with the same data, using a list comprehension:

    for key, group in groupby(things, lambda x: x[0]):
        listOfThings = " and ".join([thing[1] for thing in group])
        print(key + "s: " + listOfThings + ".")

This will give you the output:

    animals: bear and duck.
    plants: cactus.
    vehicles: speed boat and school bus.
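One consequence of the groups being iterators is that each group can be walked only once. The sketch below, which reuses a shortened version of the things list, tries to read each group twice; the variable names first_pass, second_pass and passes are invented for the illustration.

```python
from itertools import groupby

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus")]

passes = []
for key, group in groupby(things, lambda x: x[0]):
    first_pass = [name for _, name in group]
    second_pass = [name for _, name in group]  # group is already exhausted
    passes.append((key, first_pass, second_pass))
    print(key, first_pass, second_pass)
```

If a group is needed more than once, converting it with list(group) on the first pass is the usual way out.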

User · Answer

Another example:

    import itertools

    for key, igroup in itertools.groupby(range(12), lambda x: x // 5):
        print(key, list(igroup))

results in

    0 [0, 1, 2, 3, 4]
    1 [5, 6, 7, 8, 9]
    2 [10, 11]

Note that igroup is an iterator (a sub-iterator as the documentation calls it).

This is useful for chunking a generator:

    def chunker(items, chunk_size):
        '''Group items in chunks of chunk_size'''
        for _key, group in itertools.groupby(enumerate(items), lambda x: x[0] // chunk_size):
            yield (g[1] for g in group)

    with open('file.txt') as fobj:
        # chunk_size=100 is an arbitrary value chosen for illustration
        for chunk in chunker(fobj, chunk_size=100):
            process(chunk)

Another example of groupby, for when the keys are not sorted. In the following example, items in xx are grouped by values in yy. In this case, one set of zeros is output first, followed by a set of ones, followed again by a set of zeros.

    xx = range(10)
    yy = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
    for group in itertools.groupby(iter(xx), lambda x: yy[x]):
        print(group[0], list(group[1]))

Produces:

    0 [0, 1, 2]
    1 [3, 4, 5]
    0 [6, 7, 8, 9]
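A runnable variant of the chunking idea, sketched under the assumption that each chunk is small enough to hold in memory: it yields lists instead of generators, so a chunk stays valid after the next one has been requested. The name chunked is hypothetical.

```python
from itertools import groupby


def chunked(items, chunk_size):
    """Yield lists of up to chunk_size consecutive items."""
    indexed = enumerate(items)
    # Integer division of the running index gives 0, 0, ..., 1, 1, ... as keys.
    for _, group in groupby(indexed, key=lambda pair: pair[0] // chunk_size):
        yield [item for _, item in group]


print(list(chunked("abcdefg", 3)))
# [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```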

User · Answer

How do I use Python's itertools.groupby()?

You can use groupby to group things to iterate over. You give groupby an iterable and an optional key function (a callable) by which to check the items as they come out of the iterable, and it returns an iterator that yields two-tuples: the result of the key callable, and the matching items in another iterable. From the help:

    groupby(iterable[, keyfunc]) -> create an iterator which returns
    (key, sub-iterator) grouped by each value of key(value).

Here's an example of groupby using a coroutine to group by a count. It uses a key callable (in this case, coroutine.send) to emit the same count for n consecutive items, together with a grouped sub-iterator of elements:

    import itertools


    def grouper(iterable, n):
        def coroutine(n):
            yield  # queue up coroutine
            for i in itertools.count():
                for j in range(n):
                    yield i
        groups = coroutine(n)
        next(groups)  # queue up coroutine

        for c, objs in itertools.groupby(iterable, groups.send):
            yield c, list(objs)
        # or instead of materializing a list of objs, just:
        # return itertools.groupby(iterable, groups.send)

    list(grouper(range(10), 3))

prints

    [(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6, 7, 8]), (3, [9])]

User · Answer

The example in the Python docs is quite straightforward:

    groups = []
    uniquekeys = []
    for k, g in groupby(data, keyfunc):
        groups.append(list(g))      # Store group iterator as a list
        uniquekeys.append(k)

So in your case, data is a list of nodes, keyfunc is where the logic of your criteria function goes, and then groupby() groups the data.

You must be careful to sort the data by the criteria before you call groupby, or items with equal keys that are not adjacent will end up in separate groups. groupby actually just iterates through the data, and whenever the key changes it starts a new group.
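The "new group whenever the key changes" behaviour can be modelled in a few lines of plain Python. The sketch below is a simplified, eager stand-in written for illustration; it is not the actual implementation (which is lazy and lives in C), and consecutive_groups is a made-up name.

```python
def consecutive_groups(iterable, keyfunc=None):
    """Yield (key, items) for each run of consecutive items with equal keys."""
    if keyfunc is None:
        keyfunc = lambda item: item
    missing = object()  # sentinel meaning "no group started yet"
    current_key = missing
    current_items = []
    for item in iterable:
        key = keyfunc(item)
        if current_key is not missing and key != current_key:
            # The key changed: close the current group and start a new one.
            yield current_key, current_items
            current_items = []
        current_key = key
        current_items.append(item)
    if current_key is not missing:
        yield current_key, current_items


print(list(consecutive_groups("AABAC")))
```

Note how the second run of "A" becomes its own group, which is exactly why unsorted input produces repeated keys.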

User · Answer

I would like to give another example where groupby without sorting does not give the expected result. Adapted from the example by James Sulak:

    from itertools import groupby

    things = [("vehicle", "bear"), ("animal", "duck"), ("animal", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]

    for key, group in groupby(things, lambda x: x[0]):
        for thing in group:
            print("A %s is a %s." % (thing[1], key))
        print(" ")

The output is:

    A bear is a vehicle.

    A duck is a animal.
    A cactus is a animal.

    A speed boat is a vehicle.
    A school bus is a vehicle.

There are two groups with the key vehicle, whereas one might expect only one.

User · Answer

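WARNING:

The syntax list(groupby(...)) won't work the way that you intend. Each group iterator shares the underlying source with the groupby object, so advancing groupby to the next group invalidates the previous group. Using

    for x in list(groupby(range(10))):
        print(list(x[1]))

will produce:

    []
    []
    []
    []
    []
    []
    []
    []
    []
    [9]

(Newer Python 3 releases may print an empty list on the last line as well.)

Instead of list(groupby(...)), try [(k, list(g)) for k, g in groupby(...)], or if you use that syntax often,

    def groupbylist(*args, **kwargs):
        return [(k, list(g)) for k, g in groupby(*args, **kwargs)]

and get access to the groupby functionality while avoiding those pesky (for small data) iterators altogether.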
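A short sketch contrasting the two approaches on a small list. The variable names lazy and eager are invented here; the only point is that the keys survive list(groupby(...)) while the contents of earlier groups do not.

```python
from itertools import groupby

data = [1, 1, 2, 3, 3]

# The groupby object is fully advanced before any group is read.
lazy = list(groupby(data))
# Each group is copied into a list before groupby moves on.
eager = [(k, list(g)) for k, g in groupby(data)]

print([k for k, _ in lazy])   # the keys are still intact
print(list(lazy[0][1]))       # the first group's items are gone
print(eager)
```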
