[python] Transpose/Unzip Function (inverse of zip)?

I have a list of 2-item tuples and I'd like to convert them to 2 lists where the first contains the first item in each tuple and the second list holds the second item.

For example:

original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
# and I want to become...
result = (['a', 'b', 'c', 'd'], [1, 2, 3, 4])

Is there a builtin function that does that?

This question is related to python list matrix transpose

The answer is


You could also do

result = ([ a for a,b in original ], [ b for a,b in original ])

It should scale better. Especially if Python makes good on not expanding the list comprehensions unless needed.

(Incidentally, it makes a 2-tuple (pair) of lists, rather than a list of tuples, like zip does.)

If generators instead of actual lists are ok, this would do that:

result = (( a for a,b in original ), ( b for a,b in original ))

The generators don't munch through the list until you ask for each element, but on the other hand, they do keep references to the original list.


I like to use zip(*iterable) (which is the piece of code you're looking for) in my programs as so:

def unzip(iterable):
    return zip(*iterable)

I find unzip more readable.


While numpy arrays and pandas may be preferrable, this function imitates the behavior of zip(*args) when called as unzip(args).

Allows for generators, like the result from zip in Python 3, to be passed as args as it iterates through values.

def unzip(items, cls=list, ocls=tuple):
    """Zip function in reverse.

    :param items: Zipped-like iterable.
    :type  items: iterable

    :param cls: Container factory. Callable that returns iterable containers,
        with a callable append attribute, to store the unzipped items. Defaults
        to ``list``.
    :type  cls: callable, optional

    :param ocls: Outer container factory. Callable that returns iterable
        containers. with a callable append attribute, to store the inner
        containers (see ``cls``). Defaults to ``tuple``.
    :type  ocls: callable, optional

    :returns: Unzipped items in instances returned from ``cls``, in an instance
        returned from ``ocls``.
    """
    # iter() will return the same iterator passed to it whenever possible.
    items = iter(items)

    try:
        i = next(items)
    except StopIteration:
        return ocls()

    unzipped = ocls(cls([v]) for v in i)

    for i in items:
        for c, v in zip(unzipped, i):
            c.append(v)

    return unzipped

To use list cointainers, simply run unzip(zipped), as

unzip(zip(["a","b","c"],[1,2,3])) == (["a","b","c"],[1,2,3])

To use deques, or other any container sporting append, pass a factory function.

from collections import deque

unzip([("a",1),("b",2)], deque, list) == [deque(["a","b"]),deque([1,2])]

(Decorate cls and/or main_cls to micro manage container initialization, as briefly shown in the final assert statement above.)


Naive approach

def transpose_finite_iterable(iterable):
    return zip(*iterable)  # `itertools.izip` for Python 2 users

works fine for finite iterable (e.g. sequences like list/tuple/str) of (potentially infinite) iterables which can be illustrated like

| |a_00| |a_10| ... |a_n0| |
| |a_01| |a_11| ... |a_n1| |
| |... | |... | ... |... | |
| |a_0i| |a_1i| ... |a_ni| |
| |... | |... | ... |... | |

where

  • n in N,
  • a_ij corresponds to j-th element of i-th iterable,

and after applying transpose_finite_iterable we get

| |a_00| |a_01| ... |a_0i| ... |
| |a_10| |a_11| ... |a_1i| ... |
| |... | |... | ... |... | ... |
| |a_n0| |a_n1| ... |a_ni| ... |

Python example of such case where a_ij == j, n == 2

>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterable(iterable)
>>> next(result)
(0, 0)
>>> next(result)
(1, 1)

But we can't use transpose_finite_iterable again to return to structure of original iterable because result is an infinite iterable of finite iterables (tuples in our case):

>>> transpose_finite_iterable(result)
... hangs ...
Traceback (most recent call last):
  File "...", line 1, in ...
  File "...", line 2, in transpose_finite_iterable
MemoryError

So how can we deal with this case?

... and here comes the deque

After we take a look at docs of itertools.tee function, there is Python recipe that with some modification can help in our case

def transpose_finite_iterables(iterable):
    iterator = iter(iterable)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]

    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()

    return tuple(map(coordinate, queues))

let's check

>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterables(transpose_finite_iterable(iterable))
>>> result
(<generator object transpose_finite_iterables.<locals>.coordinate at ...>, <generator object transpose_finite_iterables.<locals>.coordinate at ...>)
>>> next(result[0])
0
>>> next(result[0])
1

Synthesis

Now we can define general function for working with iterables of iterables ones of which are finite and another ones are potentially infinite using functools.singledispatch decorator like

from collections import (abc,
                         deque)
from functools import singledispatch


@singledispatch
def transpose(object_):
    """
    Transposes given object.
    """
    raise TypeError('Unsupported object type: {type}.'
                    .format(type=type))


@transpose.register(abc.Iterable)
def transpose_finite_iterables(object_):
    """
    Transposes given iterable of finite iterables.
    """
    iterator = iter(object_)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]

    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()

    return tuple(map(coordinate, queues))


def transpose_finite_iterable(object_):
    """
    Transposes given finite iterable of iterables.
    """
    yield from zip(*object_)

try:
    transpose.register(abc.Collection, transpose_finite_iterable)
except AttributeError:
    # Python3.5-
    transpose.register(abc.Mapping, transpose_finite_iterable)
    transpose.register(abc.Sequence, transpose_finite_iterable)
    transpose.register(abc.Set, transpose_finite_iterable)

which can be considered as its own inverse (mathematicians call this kind of functions "involutions") in class of binary operators over finite non-empty iterables.


As a bonus of singledispatching we can handle numpy arrays like

import numpy as np
...
transpose.register(np.ndarray, np.transpose)

and then use it like

>>> array = np.arange(4).reshape((2,2))
>>> array
array([[0, 1],
       [2, 3]])
>>> transpose(array)
array([[0, 2],
       [1, 3]])

Note

Since transpose returns iterators and if someone wants to have a tuple of lists like in OP -- this can be made additionally with map built-in function like

>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple(map(list, transpose(original)))
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])

Advertisement

I've added generalized solution to lz package from 0.5.0 version which can be used like

>>> from lz.transposition import transpose
>>> list(map(tuple, transpose(zip(range(10), range(10, 20)))))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)]

P.S.

There is no solution (at least obvious) for handling potentially infinite iterable of potentially infinite iterables, but this case is less common though.


It's only another way to do it but it helped me a lot so I write it here:

Having this data structure:

X=[1,2,3,4]
Y=['a','b','c','d']
XY=zip(X,Y)

Resulting in:

In: XY
Out: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

The more pythonic way to unzip it and go back to the original is this one in my opinion:

x,y=zip(*XY)

But this return a tuple so if you need a list you can use:

x,y=(list(x),list(y))

>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple([list(tup) for tup in zip(*original)])
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])

Gives a tuple of lists as in the question.

list1, list2 = [list(tup) for tup in zip(*original)]

Unpacks the two lists.


If you have lists that are not the same length, you may not want to use zip as per Patricks answer. This works:

>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]

But with different length lists, zip truncates each item to the length of the shortest list:

>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e')]

You can use map with no function to fill empty results with None:

>>> map(None, *[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, None)]

zip() is marginally faster though.


Consider using more_itertools.unzip:

>>> from more_itertools import unzip
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> [list(x) for x in unzip(original)]
[['a', 'b', 'c', 'd'], [1, 2, 3, 4]]     

None of the previous answers efficiently provide the required output, which is a tuple of lists, rather than a list of tuples. For the former, you can use tuple with map. Here's the difference:

res1 = list(zip(*original))              # [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
res2 = tuple(map(list, zip(*original)))  # (['a', 'b', 'c', 'd'], [1, 2, 3, 4])

In addition, most of the previous solutions assume Python 2.7, where zip returns a list rather than an iterator.

For Python 3.x, you will need to pass the result to a function such as list or tuple to exhaust the iterator. For memory-efficient iterators, you can omit the outer list and tuple calls for the respective solutions.


While zip(*seq) is very useful, it may be unsuitable for very long sequences as it will create a tuple of values to be passed in. For example, I've been working with a coordinate system with over a million entries and find it signifcantly faster to create the sequences directly.

A generic approach would be something like this:

from collections import deque
seq = ((a1, b1, …), (a2, b2, …), …)
width = len(seq[0])
output = [deque(len(seq))] * width # preallocate memory
for element in seq:
    for s, item in zip(output, element):
        s.append(item)

But, depending on what you want to do with the result, the choice of collection can make a big difference. In my actual use case, using sets and no internal loop, is noticeably faster than all other approaches.

And, as others have noted, if you are doing this with datasets, it might make sense to use Numpy or Pandas collections instead.


Since it returns tuples (and can use tons of memory), the zip(*zipped) trick seems more clever than useful, to me.

Here's a function that will actually give you the inverse of zip.

def unzip(zipped):
    """Inverse of built-in zip function.
    Args:
        zipped: a list of tuples

    Returns:
        a tuple of lists

    Example:
        a = [1, 2, 3]
        b = [4, 5, 6]
        zipped = list(zip(a, b))

        assert zipped == [(1, 4), (2, 5), (3, 6)]

        unzipped = unzip(zipped)

        assert unzipped == ([1, 2, 3], [4, 5, 6])

    """

    unzipped = ()
    if len(zipped) == 0:
        return unzipped

    dim = len(zipped[0])

    for i in range(dim):
        unzipped = unzipped + ([tup[i] for tup in zipped], )

    return unzipped

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to list

Convert List to Pandas Dataframe Column Python find elements in one list that are not in the other Sorting a list with stream.sorted() in Java Python Loop: List Index Out of Range How to combine two lists in R How do I multiply each element in a list by a number? Save a list to a .txt file The most efficient way to remove first N elements in a list? TypeError: list indices must be integers or slices, not str Parse JSON String into List<string>

Examples related to matrix

How to get element-wise matrix multiplication (Hadamard product) in numpy? How can I plot a confusion matrix? Error: stray '\240' in program What does the error "arguments imply differing number of rows: x, y" mean? How to input matrix (2D list) in Python? Difference between numpy.array shape (R, 1) and (R,) Counting the number of non-NaN elements in a numpy ndarray in Python Inverse of a matrix using numpy How to create an empty matrix in R? numpy matrix vector multiplication

Examples related to transpose

Converting rows into columns and columns into rows using R Postgres - Transpose Rows to Columns Transposing a 2D-array in JavaScript What is the fastest way to transpose a matrix in C++? Excel VBA - Range.Copy transpose paste Transpose list of lists Transposing a 1D NumPy array An efficient way to transpose a file in Bash Transpose/Unzip Function (inverse of zip)?