I would like do something like that.
list_of_urls = ['http://www.google.fr/', 'http://www.google.fr/',
'http://www.google.cn/', 'http://www.google.com/',
'http://www.google.fr/', 'http://www.google.fr/',
'http://www.google.fr/', 'http://www.google.com/',
'http://www.google.fr/', 'http://www.google.com/',
'http://www.google.cn/']
urls = [{'url': 'http://www.google.fr/', 'nbr': 1}]
for url in list_of_urls:
if url in [f['url'] for f in urls]:
urls[??]['nbr'] += 1
else:
urls.append({'url': url, 'nbr': 1})
How can I do ? I don't know if I should take the tuple to edit it or figure out the tuple indices?
Any help ?
To do it exactly your way? You could use the for...else structure
for url in list_of_urls:
for url_dict in urls:
if url_dict['url'] == url:
url_dict['nbr'] += 1
break
else:
urls.append(dict(url=url, nbr=1))
But it is quite inelegant. Do you really have to store the visited urls as a LIST? If you sort it as a dict, indexed by url string, for example, it would be way cleaner:
urls = {'http://www.google.fr/': dict(url='http://www.google.fr/', nbr=1)}
for url in list_of_urls:
if url in urls:
urls[url]['nbr'] += 1
else:
urls[url] = dict(url=url, nbr=1)
A few things to note in that second example:
urls
removes the need for going through the whole urls
list when testing for one single url
. This approach will be faster.dict( )
instead of braces makes your code shorterlist_of_urls
, urls
and url
as variable names make the code quite hard to parse. It's better to find something clearer, such as urls_to_visit
, urls_already_visited
and current_url
. I know, it's longer. But it's clearer.And of course I'm assuming that dict(url='http://www.google.fr', nbr=1)
is a simplification of your own data structure, because otherwise, urls
could simply be:
urls = {'http://www.google.fr':1}
for url in list_of_urls:
if url in urls:
urls[url] += 1
else:
urls[url] = 1
Which can get very elegant with the defaultdict stance:
urls = collections.defaultdict(int)
for url in list_of_urls:
urls[url] += 1
Except for the first time, each time a word is seen the if statement's test fails. If you are counting a large number of words, many will probably occur multiple times. In a situation where the initialization of a value is only going to occur once and the augmentation of that value will occur many times it is cheaper to use a try statement:
urls_d = {}
for url in list_of_urls:
try:
urls_d[url] += 1
except KeyError:
urls_d[url] = 1
you can read more about this: https://wiki.python.org/moin/PythonSpeed/PerformanceTips
Use defaultdict:
from collections import defaultdict
urls = defaultdict(int)
for url in list_of_urls:
urls[url] += 1
This always works fine for me:
for url in list_of_urls:
urls.setdefault(url, 0)
urls[url] += 1
Using the default works, but so does:
urls[url] = urls.get(url, 0) + 1
using .get
, you can get a default return if it doesn't exist. By default it's None, but in the case I sent you, it would be 0.
Source: Stackoverflow.com