[python] How does collections.defaultdict work?

I've read the examples in python docs, but still can't figure out what this method means. Can somebody help? Here are two examples from the python docs

>>> from collections import defaultdict

>>> s = 'mississippi'
>>> d = defaultdict(int)
>>> for k in s:
...     d[k] += 1
...
>>> d.items()
[('i', 4), ('p', 2), ('s', 4), ('m', 1)]

and

>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> d.items()
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

the parameters int and list are for what?

This question is related to python dictionary default-value defaultdict

The answer is


The defaultdict tool is a container in the collections class of Python. It's similar to the usual dictionary (dict) container, but it has one difference: The value fields' data type is specified upon initialization.

For example:

from collections import defaultdict

d = defaultdict(list)

d['python'].append("awesome")

d['something-else'].append("not relevant")

d['python'].append("language")

for i in d.items():

    print i

This prints:

('python', ['awesome', 'language'])
('something-else', ['not relevant'])

I think its best used in place of a switch case statement. Imagine if we have a switch case statement as below:

option = 1

switch(option) {
    case 1: print '1st option'
    case 2: print '2nd option'
    case 3: print '3rd option'
    default: return 'No such option'
}

There is no switch case statements available in python. We can achieve the same by using defaultdict.

from collections import defaultdict

def default_value(): return "Default Value"
dd = defaultdict(default_value)

dd[1] = '1st option'
dd[2] = '2nd option'
dd[3] = '3rd option'

print(dd[4])    
print(dd[5])    
print(dd[3])

It prints:

Default Value
Default Value
3rd option

In the above snippet dd has no keys 4 or 5 and hence it prints out a default value which we have configured in a helper function. This is quite nicer than a raw dictionary where a KeyError is thrown if key is not present. From this it is evident that defaultdict more like a switch case statement where we can avoid a complicated if-elif-elif-else blocks.

One more good example that impressed me a lot from this site is:

>>> from collections import defaultdict
>>> food_list = 'spam spam spam spam spam spam eggs spam'.split()
>>> food_count = defaultdict(int) # default value of int is 0
>>> for food in food_list:
...     food_count[food] += 1 # increment element's value by 1
...
defaultdict(<type 'int'>, {'eggs': 1, 'spam': 7})
>>>

If we try to access any items other than eggs and spam we will get a count of 0.


The standard dictionary includes the method setdefault() for retrieving a value and establishing a default if the value does not exist. By contrast, defaultdict lets the caller specify the default up front when the container is initialized.

import collections

def default_factory():
    return 'default value'

d = collections.defaultdict(default_factory, foo='bar')
print 'd:', d
print 'foo =>', d['foo']
print 'bar =>', d['bar']

This works well as long as it is appropriate for all keys to have the same default. It can be especially useful if the default is a type used for aggregating or accumulating values, such as a list, set, or even int. The standard library documentation includes several examples of using defaultdict this way.

$ python collections_defaultdict.py

d: defaultdict(<function default_factory at 0x100468c80>, {'foo': 'bar'})
foo => bar
bar => default value

There is a great explanation of defaultdicts here: http://ludovf.net/blog/python-collections-defaultdict/

Basically, the parameters int and list are functions that you pass. Remember that Python accepts function names as arguments. int returns 0 by default and list returns an empty list when called with parentheses.

In normal dictionaries, if in your example I try calling d[a], I will get an error (KeyError), since only keys m, s, i and p exist and key a has not been initialized. But in a defaultdict, it takes a function name as an argument, when you try to use a key that has not been initialized, it simply calls the function you passed in and assigns its return value as the value of the new key.


My own 2¢: you can also subclass defaultdict:

class MyDict(defaultdict):
    def __missing__(self, key):
        value = [None, None]
        self[key] = value
        return value

This could come in handy for very complex cases.


The behavior of defaultdict can be easily mimicked using dict.setdefault instead of d[key] in every call.

In other words, the code:

from collections import defaultdict

d = defaultdict(list)

print(d['key'])                        # empty list []
d['key'].append(1)                     # adding constant 1 to the list
print(d['key'])                        # list containing the constant [1]

is equivalent to:

d = dict()

print(d.setdefault('key', list()))     # empty list []
d.setdefault('key', list()).append(1)  # adding constant 1 to the list
print(d.setdefault('key', list()))     # list containing the constant [1]

The only difference is that, using defaultdict, the list constructor is called only once, and using dict.setdefault the list constructor is called more often (but the code may be rewriten to avoid this, if really needed).

Some may argue there is a performance consideration, but this topic is a minefield. This post shows there isn't a big performance gain in using defaultdict, for example.

IMO, defaultdict is a collection that adds more confusion than benefits to the code. Useless for me, but others may think different.


Without defaultdict, you can probably assign new values to unseen keys but you cannot modify it. For example:

import collections
d = collections.defaultdict(int)
for i in range(10):
  d[i] += i
print(d)
# Output: defaultdict(<class 'int'>, {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9})

import collections
d = {}
for i in range(10):
  d[i] += i
print(d)
# Output: Traceback (most recent call last): File "python", line 4, in <module> KeyError: 0

In short:

defaultdict(int) - the argument int indicates that the values will be int type.

defaultdict(list) - the argument list indicates that the values will be list type.


defaultdict means that if a key is not found in the dictionary, then instead of a KeyError being thrown, a new entry is created. The type of this new entry is given by the argument of defaultdict.

For example:

somedict = {}
print(somedict[3]) # KeyError

someddict = defaultdict(int)
print(someddict[3]) # print int(), thus 0

Well, defaultdict can also raise keyerror in the following case:

    from collections import defaultdict
    d = defaultdict()
    print(d[3]) #raises keyerror

Always remember to give argument to the defaultdict like defaultdict(int).


The documentation and the explanation are pretty much self-explanatory:

http://docs.python.org/library/collections.html#collections.defaultdict

The type function(int/str etc.) passed as an argument is used to initialize a default value for any given key where the key is not present in the dict.


Since the question is about "how it works", some readers may want to see more nuts and bolts. Specifically, the method in question is the __missing__(key) method. See: https://docs.python.org/2/library/collections.html#defaultdict-objects .

More concretely, this answer shows how to make use of __missing__(key) in a practical way: https://stackoverflow.com/a/17956989/1593924

To clarify what 'callable' means, here's an interactive session (from 2.7.6 but should work in v3 too):

>>> x = int
>>> x
<type 'int'>
>>> y = int(5)
>>> y
5
>>> z = x(5)
>>> z
5

>>> from collections import defaultdict
>>> dd = defaultdict(int)
>>> dd
defaultdict(<type 'int'>, {})
>>> dd = defaultdict(x)
>>> dd
defaultdict(<type 'int'>, {})
>>> dd['a']
0
>>> dd
defaultdict(<type 'int'>, {'a': 0})

That was the most typical use of defaultdict (except for the pointless use of the x variable). You can do the same thing with 0 as the explicit default value, but not with a simple value:

>>> dd2 = defaultdict(0)

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    dd2 = defaultdict(0)
TypeError: first argument must be callable

Instead, the following works because it passes in a simple function (it creates on the fly a nameless function which takes no arguments and always returns 0):

>>> dd2 = defaultdict(lambda: 0)
>>> dd2
defaultdict(<function <lambda> at 0x02C4C130>, {})
>>> dd2['a']
0
>>> dd2
defaultdict(<function <lambda> at 0x02C4C130>, {'a': 0})
>>> 

And with a different default value:

>>> dd3 = defaultdict(lambda: 1)
>>> dd3
defaultdict(<function <lambda> at 0x02C4C170>, {})
>>> dd3['a']
1
>>> dd3
defaultdict(<function <lambda> at 0x02C4C170>, {'a': 1})
>>> 

defaultdict

"The standard dictionary includes the method setdefault() for retrieving a value and establishing a default if the value does not exist. By contrast, defaultdict lets the caller specify the default(value to be returned) up front when the container is initialized."

as defined by Doug Hellmann in The Python Standard Library by Example

How to use defaultdict

Import defaultdict

>>> from collections import defaultdict

Initialize defaultdict

Initialize it by passing

callable as its first argument(mandatory)

>>> d_int = defaultdict(int)
>>> d_list = defaultdict(list)
>>> def foo():
...     return 'default value'
... 
>>> d_foo = defaultdict(foo)
>>> d_int
defaultdict(<type 'int'>, {})
>>> d_list
defaultdict(<type 'list'>, {})
>>> d_foo
defaultdict(<function foo at 0x7f34a0a69578>, {})

**kwargs as its second argument(optional)

>>> d_int = defaultdict(int, a=10, b=12, c=13)
>>> d_int
defaultdict(<type 'int'>, {'a': 10, 'c': 13, 'b': 12})

or

>>> kwargs = {'a':10,'b':12,'c':13}
>>> d_int = defaultdict(int, **kwargs)
>>> d_int
defaultdict(<type 'int'>, {'a': 10, 'c': 13, 'b': 12})

How does it works

As is a child class of standard dictionary, it can perform all the same functions.

But in case of passing an unknown key it returns the default value instead of error. For ex:

>>> d_int['a']
10
>>> d_int['d']
0
>>> d_int
defaultdict(<type 'int'>, {'a': 10, 'c': 13, 'b': 12, 'd': 0})

In case you want to change default value overwrite default_factory:

>>> d_int.default_factory = lambda: 1
>>> d_int['e']
1
>>> d_int
defaultdict(<function <lambda> at 0x7f34a0a91578>, {'a': 10, 'c': 13, 'b': 12, 'e': 1, 'd': 0})

or

>>> def foo():
...     return 2
>>> d_int.default_factory = foo
>>> d_int['f']
2
>>> d_int
defaultdict(<function foo at 0x7f34a0a0a140>, {'a': 10, 'c': 13, 'b': 12, 'e': 1, 'd': 0, 'f': 2})

Examples in the Question

Example 1

As int has been passed as default_factory, any unknown key will return 0 by default.

Now as the string is passed in the loop, it will increase the count of those alphabets in d.

>>> s = 'mississippi'
>>> d = defaultdict(int)
>>> d.default_factory
<type 'int'>
>>> for k in s:
...     d[k] += 1
>>> d.items()
[('i', 4), ('p', 2), ('s', 4), ('m', 1)]
>>> d
defaultdict(<type 'int'>, {'i': 4, 'p': 2, 's': 4, 'm': 1})

Example 2

As a list has been passed as default_factory, any unknown(non-existent) key will return [ ](ie. list) by default.

Now as the list of tuples is passed in the loop, it will append the value in the d[color]

>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> d.default_factory
<type 'list'>
>>> for k, v in s:
...     d[k].append(v)
>>> d.items()
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
>>> d
defaultdict(<type 'list'>, {'blue': [2, 4], 'red': [1], 'yellow': [1, 3]})

Dictionaries are a convenient way to store data for later retrieval by name (key). Keys must be unique, immutable objects, and are typically strings. The values in a dictionary can be anything. For many applications, the values are simple types such as integers and strings.

It gets more interesting when the values in a dictionary are collections (lists, dicts, etc.) In this case, the value (an empty list or dict) must be initialized the first time a given key is used. While this is relatively easy to do manually, the defaultdict type automates and simplifies these kinds of operations. A defaultdict works exactly like a normal dict, but it is initialized with a function (“default factory”) that takes no arguments and provides the default value for a nonexistent key.

A defaultdict will never raise a KeyError. Any key that does not exist gets the value returned by the default factory.

from collections import defaultdict
ice_cream = defaultdict(lambda: 'Vanilla')

ice_cream['Sarah'] = 'Chunky Monkey'
ice_cream['Abdul'] = 'Butter Pecan'

print(ice_cream['Sarah'])
>>>Chunky Monkey

print(ice_cream['Joe'])
>>>Vanilla

Here is another example on How using defaultdict, we can reduce complexity

from collections import defaultdict
# Time complexity O(n^2)
def delete_nth_naive(array, n):
    ans = []
    for num in array:
        if ans.count(num) < n:
            ans.append(num)
    return ans

# Time Complexity O(n), using hash tables.
def delete_nth(array,n):
    result = []
    counts = defaultdict(int)

    for i in array:
        if counts[i] < n:
            result.append(i)
            counts[i] += 1
    return result


x = [1,2,3,1,2,1,2,3]
print(delete_nth(x, n=2))
print(delete_nth_naive(x, n=2))

In conclusion, whenever you need a dictionary, and each element’s value should start with a default value, use a defaultdict.


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to dictionary

JS map return object python JSON object must be str, bytes or bytearray, not 'dict Python update a key in dict if it doesn't exist How to update the value of a key in a dictionary in Python? How to map an array of objects in React C# Dictionary get item by index Are dictionaries ordered in Python 3.6+? Split / Explode a column of dictionaries into separate columns with pandas Writing a dictionary to a text file? enumerate() for dictionary in python

Examples related to default-value

How to set a default value in react-select How to set default values in Go structs What is the default value for Guid? CURRENT_DATE/CURDATE() not working as default DATE value Entity Framework 6 Code first Default value Default values and initialization in Java Python argparse: default value or specified value Javascript Get Element by Id and set the value Why is the default value of the string type null instead of an empty string? SQL Column definition : default value and not null redundant?

Examples related to defaultdict

How does collections.defaultdict work?