[python] How do I use itertools.groupby()?

itertools.groupby is a tool for grouping items.

From the docs, we glean further what it might do:

# [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B

# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D

groupby objects yield key-group pairs where the group is a generator.

Features

  • A. Group consecutive items together
  • B. Group all occurrences of an item, given a sorted iterable
  • C. Specify how to group items with a key function *

Comparisons

# Define a printer for comparing outputs
>>> def print_groupby(iterable, keyfunc=None):
...    for k, g in it.groupby(iterable, keyfunc):
...        print("key: '{}'--> group: {}".format(k, list(g)))
# Feature A: group consecutive occurrences
>>> print_groupby("BCAACACAADBBB")
key: 'B'--> group: ['B']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A', 'A']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A', 'A']
key: 'D'--> group: ['D']
key: 'B'--> group: ['B', 'B', 'B']

# Feature B: group all occurrences
>>> print_groupby(sorted("BCAACACAADBBB"))
key: 'A'--> group: ['A', 'A', 'A', 'A', 'A']
key: 'B'--> group: ['B', 'B', 'B', 'B']
key: 'C'--> group: ['C', 'C', 'C']
key: 'D'--> group: ['D']

# Feature C: group by a key function
>>> # islower = lambda s: s.islower()                      # equivalent
>>> def islower(s):
...     """Return True if a string is lowercase, else False."""   
...     return s.islower()
>>> print_groupby(sorted("bCAaCacAADBbB"), keyfunc=islower)
key: 'False'--> group: ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D']
key: 'True'--> group: ['a', 'a', 'b', 'b', 'c']

Uses

Note: Several of the latter examples derive from Víctor Terrón's PyCon (talk) (Spanish), "Kung Fu at Dawn with Itertools". See also the groupby source code written in C.

* A function where all items are passed through and compared, influencing the result. Other objects with key functions include sorted(), max() and min().


Response

# OP: Yes, you can use `groupby`, e.g. 
[do_something(list(g)) for _, g in groupby(lxml_elements, criteria_func)]