[python] Python : List of dict, if exists increment a dict value, if not append a new dict

I would like do something like that.

list_of_urls = ['http://www.google.fr/', 'http://www.google.fr/', 
                'http://www.google.cn/', 'http://www.google.com/', 
                'http://www.google.fr/', 'http://www.google.fr/', 
                'http://www.google.fr/', 'http://www.google.com/', 
                'http://www.google.fr/', 'http://www.google.com/', 
                'http://www.google.cn/']

urls = [{'url': 'http://www.google.fr/', 'nbr': 1}]

for url in list_of_urls:
    if url in [f['url'] for f in urls]:
         urls[??]['nbr'] += 1
    else:
         urls.append({'url': url, 'nbr': 1})

How can I do ? I don't know if I should take the tuple to edit it or figure out the tuple indices?

Any help ?

This question is related to python loops list tuples

The answer is


To do it exactly your way? You could use the for...else structure

for url in list_of_urls:
    for url_dict in urls:
        if url_dict['url'] == url:
            url_dict['nbr'] += 1
            break
    else:
        urls.append(dict(url=url, nbr=1))

But it is quite inelegant. Do you really have to store the visited urls as a LIST? If you sort it as a dict, indexed by url string, for example, it would be way cleaner:

urls = {'http://www.google.fr/': dict(url='http://www.google.fr/', nbr=1)}

for url in list_of_urls:
    if url in urls:
        urls[url]['nbr'] += 1
    else:
        urls[url] = dict(url=url, nbr=1)

A few things to note in that second example:

  • see how using a dict for urls removes the need for going through the whole urls list when testing for one single url. This approach will be faster.
  • Using dict( ) instead of braces makes your code shorter
  • using list_of_urls, urls and url as variable names make the code quite hard to parse. It's better to find something clearer, such as urls_to_visit, urls_already_visited and current_url. I know, it's longer. But it's clearer.

And of course I'm assuming that dict(url='http://www.google.fr', nbr=1) is a simplification of your own data structure, because otherwise, urls could simply be:

urls = {'http://www.google.fr':1}

for url in list_of_urls:
    if url in urls:
        urls[url] += 1
    else:
        urls[url] = 1

Which can get very elegant with the defaultdict stance:

urls = collections.defaultdict(int)
for url in list_of_urls:
    urls[url] += 1

Except for the first time, each time a word is seen the if statement's test fails. If you are counting a large number of words, many will probably occur multiple times. In a situation where the initialization of a value is only going to occur once and the augmentation of that value will occur many times it is cheaper to use a try statement:

urls_d = {}
for url in list_of_urls:
    try:
        urls_d[url] += 1
    except KeyError:
        urls_d[url] = 1

you can read more about this: https://wiki.python.org/moin/PythonSpeed/PerformanceTips


Use defaultdict:

from collections import defaultdict

urls = defaultdict(int)

for url in list_of_urls:
    urls[url] += 1

This always works fine for me:

for url in list_of_urls:
    urls.setdefault(url, 0)
    urls[url] += 1

Using the default works, but so does:

urls[url] = urls.get(url, 0) + 1

using .get, you can get a default return if it doesn't exist. By default it's None, but in the case I sent you, it would be 0.


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to loops

How to increment a letter N times per iteration and store in an array? Angular 2 Cannot find control with unspecified name attribute on formArrays What is the difference between i = i + 1 and i += 1 in a 'for' loop? Prime numbers between 1 to 100 in C Programming Language Python Loop: List Index Out of Range JavaScript: Difference between .forEach() and .map() Why does using from __future__ import print_function breaks Python2-style print? Creating an array from a text file in Bash Iterate through dictionary values? C# Wait until condition is true

Examples related to list

Convert List to Pandas Dataframe Column Python find elements in one list that are not in the other Sorting a list with stream.sorted() in Java Python Loop: List Index Out of Range How to combine two lists in R How do I multiply each element in a list by a number? Save a list to a .txt file The most efficient way to remove first N elements in a list? TypeError: list indices must be integers or slices, not str Parse JSON String into List<string>

Examples related to tuples

Append a tuple to a list - what's the difference between two ways? How can I access each element of a pair in a pair list? pop/remove items out of a python tuple Python convert tuple to string django: TypeError: 'tuple' object is not callable Why is there no tuple comprehension in Python? Python add item to the tuple Converting string to tuple without splitting characters Convert tuple to list and back How to form tuple column from two columns in Pandas