[python] Better/Faster to Loop through set or list?

If I have a python list that is has many duplicates, and I want to iterate through each item, but not through the duplicates, is it best to use a set (as in set(mylist), or find another way to create a list without duplicates? I was thinking of just looping through the list and checking for duplicates but I figured that's what set() does when it's initialized.

So if mylist = [3,1,5,2,4,4,1,4,2,5,1,3] and I really just want to loop through [1,2,3,4,5] (order doesn't matter), should I use set(mylist) or something else?

An alternative is possible in the last example, since the list contains every integer between its min and max value, I could loop through range(min(mylist),max(mylist)) or through set(mylist). Should I generally try to avoid using set in this case? Also, would finding the min and max be slower than just creating the set?


In the case in the last example, the set is faster:

from numpy.random import random_integers
ids = random_integers(1e3,size=1e6)

def set_loop(mylist):
    idlist = []
    for id in set(mylist):
        idlist.append(id)
    return idlist

def list_loop(mylist):
    idlist = []
    for id in range(min(mylist),max(mylist)):
        idlist.append(id)
    return idlist

%timeit set_loop(ids)
#1 loops, best of 3: 232 ms per loop

%timeit list_loop(ids)
#1 loops, best of 3: 408 ms per loop

This question is related to python list loops set

The answer is


I the list is vary large looping two time over it will take a lot of time and more in the second time you are looping a set not a list and as we know iterating over a set is slower than list.

i think you need the power of generator and set.

def first_test():

    def loop_one_time(my_list):
        # create a set to keep the items.
        iterated_items = set()
        # as we know iterating over list is faster then list.
        for value in my_list: 
            # as we know checking if element exist in set is very fast not
            # metter the size of the set.
            if value not in iterated_items:  
                iterated_items.add(value) # add this item to list
                yield value


    mylist = [3,1,5,2,4,4,1,4,2,5,1,3]

    for v in loop_one_time(mylist):pass



def second_test():
    mylist = [3,1,5,2,4,4,1,4,2,5,1,3]
    s = set(mylist)
    for v in s:pass


import timeit

print(timeit.timeit('first_test()', setup='from __main__ import first_test', number=10000))
print(timeit.timeit('second_test()', setup='from __main__ import second_test', number=10000))

out put:

   0.024003583388435043
   0.010424674188938422

Note: this technique order is guaranteed


While a set may be what you want structure-wise, the question is what is faster. A list is faster. Your example code doesn't accurately compare set vs list because you're converting from a list to a set in set_loop, and then you're creating the list you'll be looping through in list_loop. The set and list you iterate through should be constructed and in memory ahead of time, and simply looped through to see which data structure is faster at iterating:

ids_list = range(1000000)
ids_set = set(ids)
def f(x):
    for i in x:
         pass

%timeit f(ids_set)
#1 loops, best of 3: 214 ms per loop
%timeit f(ids_list)
#1 loops, best of 3: 176 ms per loop

For simplicity's sake: newList = list(set(oldList))

But there are better options out there if you'd like to get speed/ordering/optimization instead: http://www.peterbe.com/plog/uniqifiers-benchmark


set is what you want, so you should use set. Trying to be clever introduces subtle bugs like forgetting to add one tomax(mylist)! Code defensively. Worry about what's faster when you determine that it is too slow.

range(min(mylist), max(mylist) + 1)  # <-- don't forget to add 1

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to list

Convert List to Pandas Dataframe Column Python find elements in one list that are not in the other Sorting a list with stream.sorted() in Java Python Loop: List Index Out of Range How to combine two lists in R How do I multiply each element in a list by a number? Save a list to a .txt file The most efficient way to remove first N elements in a list? TypeError: list indices must be integers or slices, not str Parse JSON String into List<string>

Examples related to loops

How to increment a letter N times per iteration and store in an array? Angular 2 Cannot find control with unspecified name attribute on formArrays What is the difference between i = i + 1 and i += 1 in a 'for' loop? Prime numbers between 1 to 100 in C Programming Language Python Loop: List Index Out of Range JavaScript: Difference between .forEach() and .map() Why does using from __future__ import print_function breaks Python2-style print? Creating an array from a text file in Bash Iterate through dictionary values? C# Wait until condition is true

Examples related to set

java, get set methods golang why don't we have a set datastructure Simplest way to merge ES6 Maps/Sets? Swift Set to Array JavaScript Array to Set How to sort a HashSet? Python Set Comprehension How to get first item from a java.util.Set? Getting the difference between two sets Python convert set to string and vice versa