If you are still thinking "what the?!", you would not be alone; it's actually not that complicated, really. Let me explain.
We want to turn the following list into a dictionary using the odd entries (counting from 1) as keys mapped to their consecutive even entries.
l = ["a", "b", "c", "d", "e"]
To create a dictionary we can use the built-in dict function. As per the manual's section on Mapping Types, the following forms are supported:
dict(one=1, two=2)
dict({'one': 1, 'two': 2})
dict(zip(('one', 'two'), (1, 2)))
dict([['two', 2], ['one', 1]])
The last option suggests that we supply a list of two-element lists, or (key, value) tuples, so we want to turn our sequential list into:
l = [["a", "b"], ["c", "d"], ["e",]]
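Just for illustration (and not the approach the rest of this post takes), a straightforward way to get that shape from the original flat list is to slice out every pair with a list comprehension:
>>> [l[i:i + 2] for i in range(0, len(l), 2)]
[['a', 'b'], ['c', 'd'], ['e']]
We will get there with zip instead, which is where things get interesting.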
We are also introduced to the zip function, one of the built-in functions, which the manual explains:
returns a list of tuples, where the i-th tuple contains the i-th element from each of the arguments
In other words, if we can turn our list into two lists, a, c, e and b, d, then zip will do the rest.
Slicing is something we see used with strings, and again further on in the list section, which mainly uses the short slice notation; this is what the long slice notation looks like and what we can accomplish with the step argument:
>>> l[::2]
['a', 'c', 'e']
>>> l[1::2]
['b', 'd']
>>> zip(['a', 'c', 'e'], ['b', 'd'])
[('a', 'b'), ('c', 'd')]
>>> dict(zip(l[::2], l[1::2]))
{'a': 'b', 'c': 'd'}
Even though this is the simplest way to understand the mechanics involved, there is a downside: each slice is a new list object, as can be seen with this cloning example:
>>> a = [1, 2, 3]
>>> b = a
>>> b
[1, 2, 3]
>>> b is a
True
>>> b = a[:]
>>> b
[1, 2, 3]
>>> b is a
False
Even though b looks like a, they are now two separate objects, and this is why we prefer to use the grouper recipe instead.
Although the grouper is explained as part of the itertools module, it works perfectly fine with the built-in functions too.
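For reference, the recipe in the itertools documentation looks roughly like this (the exact signature has shifted between Python versions, so treat it as a sketch):
>>> from itertools import izip_longest
>>> def grouper(n, iterable, fillvalue=None):
...     # collect the iterable into fixed-length chunks, padding with fillvalue
...     args = [iter(iterable)] * n
...     return izip_longest(fillvalue=fillvalue, *args)
...
>>> dict(grouper(2, l))
{'a': 'b', 'c': 'd', 'e': None}
We will get to the fillvalue business later; for now the interesting part is the args = [iter(iterable)] * n line.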
Some serious voodoo, right? =) But it is actually nothing more than a bit of syntactic sugar for spice; the core of the grouper recipe is the following expression.
*[iter(l)]*2
This more or less translates to two arguments referencing the same iterator, wrapped in a list, if that makes any sense. Let's break it down to shed some light.
>>> l*2
['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e']
>>> [l]*2
[['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e']]
>>> [iter(l)]*2
[<listiterator object at 0x100486450>, <listiterator object at 0x100486450>]
>>> zip([iter(l)]*2)
[(<listiterator object at 0x1004865d0>,), (<listiterator object at 0x1004865d0>,)]
>>> zip(*[iter(l)]*2)
[('a', 'b'), ('c', 'd')]
>>> dict(zip(*[iter(l)]*2))
{'a': 'b', 'c': 'd'}
As you can see, the addresses of the two iterators remain the same, so we are working with a single iterator; zip pulls a key from it, then a value, then a key, then a value, stepping the same iterator every time, and accomplishes what we did with the slices without creating new lists along the way.
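If you want to convince yourself, you can poke at the shared iterator by hand; advancing it through either reference moves the same underlying position:
>>> pair = [iter(l)] * 2
>>> pair[0] is pair[1]
True
>>> pair[0].next()
'a'
>>> pair[1].next()
'b'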
You would accomplish much the same with the following, which perhaps carries a smaller "what the?" factor.
>>> it = iter(l)
>>> dict(zip(it, it))
{'a': 'b', 'c': 'd'}
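If you find yourself doing this often, it reads nicely wrapped in a small helper (pairwise_dict is just a name made up for this sketch):
>>> def pairwise_dict(seq):
...     it = iter(seq)
...     return dict(zip(it, it))
...
>>> pairwise_dict(l)
{'a': 'b', 'c': 'd'}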
What about the missing key e? If you've noticed, it has been absent from all the examples so far, because zip stops at the shorter of its two arguments. So what are we to do?
Well, one solution might be adding an empty value to odd-length lists; you may choose to use append and an if statement, which would do the trick, albeit slightly boring, right?
>>> if len(l) % 2:
...     l.append("")
...
>>> l
['a', 'b', 'c', 'd', 'e', '']
>>> dict(zip(*[iter(l)]*2))
{'a': 'b', 'c': 'd', 'e': ''}
Now, before you shrug away to go type from itertools import izip_longest, you may be surprised to know it is not required; we can accomplish the same, even better IMHO, with the built-in functions alone. I prefer to use the map() function instead of izip_longest(): not only is the syntax shorter and the import unnecessary, it also assigns an actual None as the empty value when required, automagically.
>>> l = ["a", "b", "c", "d", "e"]
>>> l
['a', 'b', 'c', 'd', 'e']
>>> dict(map(None, *[iter(l)]*2))
{'a': 'b', 'c': 'd', 'e': None}
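For comparison, the izip_longest version mentioned above gives the same result, at the cost of the import (its fillvalue defaults to None):
>>> from itertools import izip_longest
>>> dict(izip_longest(*[iter(l)]*2))
{'a': 'b', 'c': 'd', 'e': None}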
Comparing the performance of the two methods, as pointed out by KursedMetal, it is clear that the itertools module far outperforms the map function on large volumes, as a benchmark against 10 million records shows.
$ time python -c 'dict(map(None, *[iter(range(10000000))]*2))'
real 0m3.755s
user 0m2.815s
sys 0m0.869s
$ time python -c 'from itertools import izip_longest; dict(izip_longest(*[iter(range(10000000))]*2, fillvalue=None))'
real 0m2.102s
user 0m1.451s
sys 0m0.539s
However, the cost of importing the module takes its toll on smaller datasets, with map returning much quicker up to around 100 thousand records, at which point they start arriving head to head.
$ time python -c 'dict(map(None, *[iter(range(100))]*2))'
real 0m0.046s
user 0m0.029s
sys 0m0.015s
$ time python -c 'from itertools import izip_longest; dict(izip_longest(*[iter(range(100))]*2, fillvalue=None))'
real 0m0.067s
user 0m0.042s
sys 0m0.021s
$ time python -c 'dict(map(None, *[iter(range(100000))]*2))'
real 0m0.074s
user 0m0.050s
sys 0m0.022s
$ time python -c 'from itertools import izip_longest; dict(izip_longest(*[iter(range(100000))]*2, fillvalue=None))'
real 0m0.075s
user 0m0.047s
sys 0m0.024s
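If you want to reproduce the comparison yourself, a timeit sketch along these lines measures just the dictionary construction, leaving out the interpreter startup and import cost that the shell timings above include (the sizes and repeat counts here are arbitrary):
import timeit

setup = 'from itertools import izip_longest; l = range(100000)'
print timeit.timeit('dict(map(None, *[iter(l)]*2))', setup=setup, number=10)
print timeit.timeit('dict(izip_longest(*[iter(l)]*2))', setup=setup, number=10)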
See, nothing to it! =)
nJoy!