I have a list of dictionaries and want each item to be sorted by a specific value.
Take into consideration the list:
[{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
When sorted by name
, it should become:
[{'name':'Bart', 'age':10}, {'name':'Homer', 'age':39}]
This question is related to
python
list
sorting
dictionary
data-structures
import operator
To sort the list of dictionaries by key='name':
list_of_dicts.sort(key=operator.itemgetter('name'))
To sort the list of dictionaries by key='age':
list_of_dicts.sort(key=operator.itemgetter('age'))
my_list = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
my_list.sort(lambda x,y : cmp(x['name'], y['name']))
my_list
will now be what you want.
Or better:
Since Python 2.4, there's a key
argument is both more efficient and neater:
my_list = sorted(my_list, key=lambda k: k['name'])
...the lambda is, IMO, easier to understand than operator.itemgetter
, but your mileage may vary.
If you want to sort the list by multiple keys, you can do the following:
my_list = [{'name':'Homer', 'age':39}, {'name':'Milhouse', 'age':10}, {'name':'Bart', 'age':10} ]
sortedlist = sorted(my_list , key=lambda elem: "%02d %s" % (elem['age'], elem['name']))
It is rather hackish, since it relies on converting the values into a single string representation for comparison, but it works as expected for numbers including negative ones (although you will need to format your string appropriately with zero paddings if you are using numbers).
a = [{'name':'Homer', 'age':39}, ...]
# This changes the list a
a.sort(key=lambda k : k['name'])
# This returns a new list (a is not modified)
sorted(a, key=lambda k : k['name'])
Sometimes we need to use lower()
. For example,
lists = [{'name':'Homer', 'age':39},
{'name':'Bart', 'age':10},
{'name':'abby', 'age':9}]
lists = sorted(lists, key=lambda k: k['name'])
print(lists)
# [{'name':'Bart', 'age':10}, {'name':'Homer', 'age':39}, {'name':'abby', 'age':9}]
lists = sorted(lists, key=lambda k: k['name'].lower())
print(lists)
# [ {'name':'abby', 'age':9}, {'name':'Bart', 'age':10}, {'name':'Homer', 'age':39}]
Here is the alternative general solution - it sorts elements of a dict by keys and values.
The advantage of it - no need to specify keys, and it would still work if some keys are missing in some of dictionaries.
def sort_key_func(item):
""" Helper function used to sort list of dicts
:param item: dict
:return: sorted list of tuples (k, v)
"""
pairs = []
for k, v in item.items():
pairs.append((k, v))
return sorted(pairs)
sorted(A, key=sort_key_func)
If you do not need the original list
of dictionaries
, you could modify it in-place with sort()
method using a custom key function.
Key function:
def get_name(d):
""" Return the value of a key in a dictionary. """
return d["name"]
The list
to be sorted:
data_one = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]
Sorting it in-place:
data_one.sort(key=get_name)
If you need the original list
, call the sorted()
function passing it the list
and the key function, then assign the returned sorted list
to a new variable:
data_two = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]
new_data = sorted(data_two, key=get_name)
Printing data_one
and new_data
.
>>> print(data_one)
[{'name': 'Bart', 'age': 10}, {'name': 'Homer', 'age': 39}]
>>> print(new_data)
[{'name': 'Bart', 'age': 10}, {'name': 'Homer', 'age': 39}]
import operator
a_list_of_dicts.sort(key=operator.itemgetter('name'))
'key' is used to sort by an arbitrary value and 'itemgetter' sets that value to each item's 'name' attribute.
Using the Schwartzian transform from Perl,
py = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
do
sort_on = "name"
decorated = [(dict_[sort_on], dict_) for dict_ in py]
decorated.sort()
result = [dict_ for (key, dict_) in decorated]
gives
>>> result
[{'age': 10, 'name': 'Bart'}, {'age': 39, 'name': 'Homer'}]
More on the Perl Schwartzian transform:
In computer science, the Schwartzian transform is a Perl programming idiom used to improve the efficiency of sorting a list of items. This idiom is appropriate for comparison-based sorting when the ordering is actually based on the ordering of a certain property (the key) of the elements, where computing that property is an intensive operation that should be performed a minimal number of times. The Schwartzian Transform is notable in that it does not use named temporary arrays.
If you do not need the original list
of dictionaries
, you could modify it in-place with sort()
method using a custom key function.
Key function:
def get_name(d):
""" Return the value of a key in a dictionary. """
return d["name"]
The list
to be sorted:
data_one = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]
Sorting it in-place:
data_one.sort(key=get_name)
If you need the original list
, call the sorted()
function passing it the list
and the key function, then assign the returned sorted list
to a new variable:
data_two = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age': 10}]
new_data = sorted(data_two, key=get_name)
Printing data_one
and new_data
.
>>> print(data_one)
[{'name': 'Bart', 'age': 10}, {'name': 'Homer', 'age': 39}]
>>> print(new_data)
[{'name': 'Bart', 'age': 10}, {'name': 'Homer', 'age': 39}]
my_list = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
my_list.sort(lambda x,y : cmp(x['name'], y['name']))
my_list
will now be what you want.
Or better:
Since Python 2.4, there's a key
argument is both more efficient and neater:
my_list = sorted(my_list, key=lambda k: k['name'])
...the lambda is, IMO, easier to understand than operator.itemgetter
, but your mileage may vary.
import operator
To sort the list of dictionaries by key='name':
list_of_dicts.sort(key=operator.itemgetter('name'))
To sort the list of dictionaries by key='age':
list_of_dicts.sort(key=operator.itemgetter('age'))
You have to implement your own comparison function that will compare the dictionaries by values of name keys. See Sorting Mini-HOW TO from PythonInfo Wiki
I have been a big fan of a filter with lambda. However, it is not best option if you consider time complexity.
sorted_list = sorted(list_to_sort, key= lambda x: x['name'])
# Returns list of values
list_to_sort.sort(key=operator.itemgetter('name'))
# Edits the list, and does not return a new list
# First option
python3.6 -m timeit -s "list_to_sort = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}, {'name':'Faaa', 'age':57}, {'name':'Errr', 'age':20}]" -s "sorted_l=[]" "sorted_l = sorted(list_to_sort, key=lambda e: e['name'])"
1000000 loops, best of 3: 0.736 µsec per loop
# Second option
python3.6 -m timeit -s "list_to_sort = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}, {'name':'Faaa', 'age':57}, {'name':'Errr', 'age':20}]" -s "sorted_l=[]" -s "import operator" "list_to_sort.sort(key=operator.itemgetter('name'))"
1000000 loops, best of 3: 0.438 µsec per loop
my_list = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
my_list.sort(lambda x,y : cmp(x['name'], y['name']))
my_list
will now be what you want.
Or better:
Since Python 2.4, there's a key
argument is both more efficient and neater:
my_list = sorted(my_list, key=lambda k: k['name'])
...the lambda is, IMO, easier to understand than operator.itemgetter
, but your mileage may vary.
I guess you've meant:
[{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
This would be sorted like this:
sorted(l,cmp=lambda x,y: cmp(x['name'],y['name']))
You have to implement your own comparison function that will compare the dictionaries by values of name keys. See Sorting Mini-HOW TO from PythonInfo Wiki
I guess you've meant:
[{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
This would be sorted like this:
sorted(l,cmp=lambda x,y: cmp(x['name'],y['name']))
Using the Schwartzian transform from Perl,
py = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
do
sort_on = "name"
decorated = [(dict_[sort_on], dict_) for dict_ in py]
decorated.sort()
result = [dict_ for (key, dict_) in decorated]
gives
>>> result
[{'age': 10, 'name': 'Bart'}, {'age': 39, 'name': 'Homer'}]
More on the Perl Schwartzian transform:
In computer science, the Schwartzian transform is a Perl programming idiom used to improve the efficiency of sorting a list of items. This idiom is appropriate for comparison-based sorting when the ordering is actually based on the ordering of a certain property (the key) of the elements, where computing that property is an intensive operation that should be performed a minimal number of times. The Schwartzian Transform is notable in that it does not use named temporary arrays.
import operator
To sort the list of dictionaries by key='name':
list_of_dicts.sort(key=operator.itemgetter('name'))
To sort the list of dictionaries by key='age':
list_of_dicts.sort(key=operator.itemgetter('age'))
If performance is a concern, I would use operator.itemgetter
instead of lambda
as built-in functions perform faster than hand-crafted functions. The itemgetter
function seems to perform approximately 20% faster than lambda
based on my testing.
From https://wiki.python.org/moin/PythonSpeed:
Likewise, the builtin functions run faster than hand-built equivalents. For example, map(operator.add, v1, v2) is faster than map(lambda x,y: x+y, v1, v2).
Here is a comparison of sorting speed using lambda
vs itemgetter
.
import random
import operator
# Create a list of 100 dicts with random 8-letter names and random ages from 0 to 100.
l = [{'name': ''.join(random.choices(string.ascii_lowercase, k=8)), 'age': random.randint(0, 100)} for i in range(100)]
# Test the performance with a lambda function sorting on name
%timeit sorted(l, key=lambda x: x['name'])
13 µs ± 388 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# Test the performance with itemgetter sorting on name
%timeit sorted(l, key=operator.itemgetter('name'))
10.7 µs ± 38.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# Check that each technique produces the same sort order
sorted(l, key=lambda x: x['name']) == sorted(l, key=operator.itemgetter('name'))
True
Both techniques sort the list in the same order (verified by execution of the final statement in the code block), but the first one is a little faster.
You could use a custom comparison function, or you could pass in a function that calculates a custom sort key. That's usually more efficient as the key is only calculated once per item, while the comparison function would be called many more times.
You could do it this way:
def mykey(adict): return adict['name']
x = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age':10}]
sorted(x, key=mykey)
But the standard library contains a generic routine for getting items of arbitrary objects: itemgetter
. So try this instead:
from operator import itemgetter
x = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age':10}]
sorted(x, key=itemgetter('name'))
a = [{'name':'Homer', 'age':39}, ...]
# This changes the list a
a.sort(key=lambda k : k['name'])
# This returns a new list (a is not modified)
sorted(a, key=lambda k : k['name'])
import operator
To sort the list of dictionaries by key='name':
list_of_dicts.sort(key=operator.itemgetter('name'))
To sort the list of dictionaries by key='age':
list_of_dicts.sort(key=operator.itemgetter('age'))
Sometimes we need to use lower()
. For example,
lists = [{'name':'Homer', 'age':39},
{'name':'Bart', 'age':10},
{'name':'abby', 'age':9}]
lists = sorted(lists, key=lambda k: k['name'])
print(lists)
# [{'name':'Bart', 'age':10}, {'name':'Homer', 'age':39}, {'name':'abby', 'age':9}]
lists = sorted(lists, key=lambda k: k['name'].lower())
print(lists)
# [ {'name':'abby', 'age':9}, {'name':'Bart', 'age':10}, {'name':'Homer', 'age':39}]
Let's say I have a dictionary D
with the elements below. To sort, just use the key argument in sorted
to pass a custom function as below:
D = {'eggs': 3, 'ham': 1, 'spam': 2}
def get_count(tuple):
return tuple[1]
sorted(D.items(), key = get_count, reverse=True)
# Or
sorted(D.items(), key = lambda x: x[1], reverse=True) # Avoiding get_count function call
Check this out.
Using the Pandas package is another method, though its runtime at large scale is much slower than the more traditional methods proposed by others:
import pandas as pd
listOfDicts = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
df = pd.DataFrame(listOfDicts)
df = df.sort_values('name')
sorted_listOfDicts = df.T.to_dict().values()
Here are some benchmark values for a tiny list and a large (100k+) list of dicts:
setup_large = "listOfDicts = [];\
[listOfDicts.extend(({'name':'Homer', 'age':39}, {'name':'Bart', 'age':10})) for _ in range(50000)];\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(listOfDicts);"
setup_small = "listOfDicts = [];\
listOfDicts.extend(({'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}));\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(listOfDicts);"
method1 = "newlist = sorted(listOfDicts, key=lambda k: k['name'])"
method2 = "newlist = sorted(listOfDicts, key=itemgetter('name')) "
method3 = "df = df.sort_values('name');\
sorted_listOfDicts = df.T.to_dict().values()"
import timeit
t = timeit.Timer(method1, setup_small)
print('Small Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_small)
print('Small Method LC2: ' + str(t.timeit(100)))
t = timeit.Timer(method3, setup_small)
print('Small Method Pandas: ' + str(t.timeit(100)))
t = timeit.Timer(method1, setup_large)
print('Large Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_large)
print('Large Method LC2: ' + str(t.timeit(100)))
t = timeit.Timer(method3, setup_large)
print('Large Method Pandas: ' + str(t.timeit(1)))
#Small Method LC: 0.000163078308105
#Small Method LC2: 0.000134944915771
#Small Method Pandas: 0.0712950229645
#Large Method LC: 0.0321750640869
#Large Method LC2: 0.0206089019775
#Large Method Pandas: 5.81405615807
I guess you've meant:
[{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
This would be sorted like this:
sorted(l,cmp=lambda x,y: cmp(x['name'],y['name']))
import operator
a_list_of_dicts.sort(key=operator.itemgetter('name'))
'key' is used to sort by an arbitrary value and 'itemgetter' sets that value to each item's 'name' attribute.
If performance is a concern, I would use operator.itemgetter
instead of lambda
as built-in functions perform faster than hand-crafted functions. The itemgetter
function seems to perform approximately 20% faster than lambda
based on my testing.
From https://wiki.python.org/moin/PythonSpeed:
Likewise, the builtin functions run faster than hand-built equivalents. For example, map(operator.add, v1, v2) is faster than map(lambda x,y: x+y, v1, v2).
Here is a comparison of sorting speed using lambda
vs itemgetter
.
import random
import operator
# Create a list of 100 dicts with random 8-letter names and random ages from 0 to 100.
l = [{'name': ''.join(random.choices(string.ascii_lowercase, k=8)), 'age': random.randint(0, 100)} for i in range(100)]
# Test the performance with a lambda function sorting on name
%timeit sorted(l, key=lambda x: x['name'])
13 µs ± 388 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# Test the performance with itemgetter sorting on name
%timeit sorted(l, key=operator.itemgetter('name'))
10.7 µs ± 38.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# Check that each technique produces the same sort order
sorted(l, key=lambda x: x['name']) == sorted(l, key=operator.itemgetter('name'))
True
Both techniques sort the list in the same order (verified by execution of the final statement in the code block), but the first one is a little faster.
If you want to sort the list by multiple keys, you can do the following:
my_list = [{'name':'Homer', 'age':39}, {'name':'Milhouse', 'age':10}, {'name':'Bart', 'age':10} ]
sortedlist = sorted(my_list , key=lambda elem: "%02d %s" % (elem['age'], elem['name']))
It is rather hackish, since it relies on converting the values into a single string representation for comparison, but it works as expected for numbers including negative ones (although you will need to format your string appropriately with zero paddings if you are using numbers).
You have to implement your own comparison function that will compare the dictionaries by values of name keys. See Sorting Mini-HOW TO from PythonInfo Wiki
import operator
a_list_of_dicts.sort(key=operator.itemgetter('name'))
'key' is used to sort by an arbitrary value and 'itemgetter' sets that value to each item's 'name' attribute.
You could use a custom comparison function, or you could pass in a function that calculates a custom sort key. That's usually more efficient as the key is only calculated once per item, while the comparison function would be called many more times.
You could do it this way:
def mykey(adict): return adict['name']
x = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age':10}]
sorted(x, key=mykey)
But the standard library contains a generic routine for getting items of arbitrary objects: itemgetter
. So try this instead:
from operator import itemgetter
x = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age':10}]
sorted(x, key=itemgetter('name'))
my_list = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
my_list.sort(lambda x,y : cmp(x['name'], y['name']))
my_list
will now be what you want.
Or better:
Since Python 2.4, there's a key
argument is both more efficient and neater:
my_list = sorted(my_list, key=lambda k: k['name'])
...the lambda is, IMO, easier to understand than operator.itemgetter
, but your mileage may vary.
I guess you've meant:
[{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
This would be sorted like this:
sorted(l,cmp=lambda x,y: cmp(x['name'],y['name']))
You could use a custom comparison function, or you could pass in a function that calculates a custom sort key. That's usually more efficient as the key is only calculated once per item, while the comparison function would be called many more times.
You could do it this way:
def mykey(adict): return adict['name']
x = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age':10}]
sorted(x, key=mykey)
But the standard library contains a generic routine for getting items of arbitrary objects: itemgetter
. So try this instead:
from operator import itemgetter
x = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age':10}]
sorted(x, key=itemgetter('name'))
You could use a custom comparison function, or you could pass in a function that calculates a custom sort key. That's usually more efficient as the key is only calculated once per item, while the comparison function would be called many more times.
You could do it this way:
def mykey(adict): return adict['name']
x = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age':10}]
sorted(x, key=mykey)
But the standard library contains a generic routine for getting items of arbitrary objects: itemgetter
. So try this instead:
from operator import itemgetter
x = [{'name': 'Homer', 'age': 39}, {'name': 'Bart', 'age':10}]
sorted(x, key=itemgetter('name'))
Here is the alternative general solution - it sorts elements of a dict by keys and values.
The advantage of it - no need to specify keys, and it would still work if some keys are missing in some of dictionaries.
def sort_key_func(item):
""" Helper function used to sort list of dicts
:param item: dict
:return: sorted list of tuples (k, v)
"""
pairs = []
for k, v in item.items():
pairs.append((k, v))
return sorted(pairs)
sorted(A, key=sort_key_func)
I have been a big fan of a filter with lambda. However, it is not best option if you consider time complexity.
sorted_list = sorted(list_to_sort, key= lambda x: x['name'])
# Returns list of values
list_to_sort.sort(key=operator.itemgetter('name'))
# Edits the list, and does not return a new list
# First option
python3.6 -m timeit -s "list_to_sort = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}, {'name':'Faaa', 'age':57}, {'name':'Errr', 'age':20}]" -s "sorted_l=[]" "sorted_l = sorted(list_to_sort, key=lambda e: e['name'])"
1000000 loops, best of 3: 0.736 µsec per loop
# Second option
python3.6 -m timeit -s "list_to_sort = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}, {'name':'Faaa', 'age':57}, {'name':'Errr', 'age':20}]" -s "sorted_l=[]" -s "import operator" "list_to_sort.sort(key=operator.itemgetter('name'))"
1000000 loops, best of 3: 0.438 µsec per loop
Using the Pandas package is another method, though its runtime at large scale is much slower than the more traditional methods proposed by others:
import pandas as pd
listOfDicts = [{'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}]
df = pd.DataFrame(listOfDicts)
df = df.sort_values('name')
sorted_listOfDicts = df.T.to_dict().values()
Here are some benchmark values for a tiny list and a large (100k+) list of dicts:
setup_large = "listOfDicts = [];\
[listOfDicts.extend(({'name':'Homer', 'age':39}, {'name':'Bart', 'age':10})) for _ in range(50000)];\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(listOfDicts);"
setup_small = "listOfDicts = [];\
listOfDicts.extend(({'name':'Homer', 'age':39}, {'name':'Bart', 'age':10}));\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(listOfDicts);"
method1 = "newlist = sorted(listOfDicts, key=lambda k: k['name'])"
method2 = "newlist = sorted(listOfDicts, key=itemgetter('name')) "
method3 = "df = df.sort_values('name');\
sorted_listOfDicts = df.T.to_dict().values()"
import timeit
t = timeit.Timer(method1, setup_small)
print('Small Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_small)
print('Small Method LC2: ' + str(t.timeit(100)))
t = timeit.Timer(method3, setup_small)
print('Small Method Pandas: ' + str(t.timeit(100)))
t = timeit.Timer(method1, setup_large)
print('Large Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_large)
print('Large Method LC2: ' + str(t.timeit(100)))
t = timeit.Timer(method3, setup_large)
print('Large Method Pandas: ' + str(t.timeit(1)))
#Small Method LC: 0.000163078308105
#Small Method LC2: 0.000134944915771
#Small Method Pandas: 0.0712950229645
#Large Method LC: 0.0321750640869
#Large Method LC2: 0.0206089019775
#Large Method Pandas: 5.81405615807
You have to implement your own comparison function that will compare the dictionaries by values of name keys. See Sorting Mini-HOW TO from PythonInfo Wiki
Let's say I have a dictionary D
with the elements below. To sort, just use the key argument in sorted
to pass a custom function as below:
D = {'eggs': 3, 'ham': 1, 'spam': 2}
def get_count(tuple):
return tuple[1]
sorted(D.items(), key = get_count, reverse=True)
# Or
sorted(D.items(), key = lambda x: x[1], reverse=True) # Avoiding get_count function call
Check this out.
Source: Stackoverflow.com