[python] Python find elements in one list that are not in the other

I need to compare two lists in order to create a new list of specific elements found in one list but not in the other. For example:

main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"] 

I want to loop through list_1 and append to main_list all the elements from list_2 that are not found in list_1.

The result should be:

main_list=["f", "m"]

How can I do it with python?

This question is related to python list

The answer is


If you want a one-liner solution (ignoring imports) that only requires O(max(n, m)) work for inputs of length n and m, not O(n * m) work, you can do so with the itertools module:

from itertools import filterfalse

main_list = list(filterfalse(set(list_1).__contains__, list_2))

This takes advantage of the functional functions taking a callback function on construction, allowing it to create the callback once and reuse it for every element without needing to store it somewhere (because filterfalse stores it internally); list comprehensions and generator expressions can do this, but it's ugly.†

That gets the same results in a single line as:

main_list = [x for x in list_2 if x not in list_1]

with the speed of:

set_1 = set(list_1)
main_list = [x for x in list_2 if x not in set_1]

Of course, if the comparisons are intended to be positional, so:

list_1 = [1, 2, 3]
list_2 = [2, 3, 4]

should produce:

main_list = [2, 3, 4]

(because no value in list_2 has a match at the same index in list_1), you should definitely go with Patrick's answer, which involves no temporary lists or sets (even with sets being roughly O(1), they have a higher "constant" factor per check than simple equality checks) and involves O(min(n, m)) work, less than any other answer, and if your problem is position sensitive, is the only correct solution when matching elements appear at mismatched offsets.

†: The way to do the same thing with a list comprehension as a one-liner would be to abuse nested looping to create and cache value(s) in the "outermost" loop, e.g.:

main_list = [x for set_1 in (set(list_1),) for x in list_2 if x not in set_1]

which also gives a minor performance benefit on Python 3 (because now set_1 is locally scoped in the comprehension code, rather than looked up from nested scope for each check; on Python 2 that doesn't matter, because Python 2 doesn't use closures for list comprehensions; they operate in the same scope they're used in).


main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]

for i in list_2:
    if i not in list_1:
        main_list.append(i)

print(main_list)

output:

['f', 'm']

I used two methods and I found one method useful over other. Here is my answer:

My input data:

crkmod_mpp = ['M13','M18','M19','M24']
testmod_mpp = ['M13','M14','M15','M16','M17','M18','M19','M20','M21','M22','M23','M24']

Method1: np.setdiff1d I like this approach over other because it preserves the position

test= list(np.setdiff1d(testmod_mpp,crkmod_mpp))
print(test)
['M15', 'M16', 'M22', 'M23', 'M20', 'M14', 'M17', 'M21']

Method2: Though it gives same answer as in Method1 but disturbs the order

test = list(set(testmod_mpp).difference(set(crkmod_mpp)))
print(test)
['POA23', 'POA15', 'POA17', 'POA16', 'POA22', 'POA18', 'POA24', 'POA21']

Method1 np.setdiff1d meets my requirements perfectly. This answer for information.


Use a list comprehension like this:

main_list = [item for item in list_2 if item not in list_1]

Output:

>>> list_1 = ["a", "b", "c", "d", "e"]
>>> list_2 = ["a", "f", "c", "m"] 
>>> 
>>> main_list = [item for item in list_2 if item not in list_1]
>>> main_list
['f', 'm']

Edit:

Like mentioned in the comments below, with large lists, the above is not the ideal solution. When that's the case, a better option would be converting list_1 to a set first:

set_1 = set(list_1)  # this reduces the lookup time from O(n) to O(1)
main_list = [item for item in list_2 if item not in set_1]

If the number of occurences should be taken into account you probably need to use something like collections.Counter:

list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"] 
from collections import Counter
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1.get(key, 0) != counts]

>>> final
['f', 'm']

As promised this can also handle differing number of occurences as "difference":

list_1=["a", "b", "c", "d", "e", 'a']
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1.get(key, 0) != counts]

>>> final
['a', 'f', 'm']

I would zip the lists together to compare them element by element.

main_list = [b for a, b in zip(list1, list2) if a!= b]

You can use sets:

main_list = list(set(list_2) - set(list_1))

Output:

>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> set(list_2) - set(list_1)
set(['m', 'f'])
>>> list(set(list_2) - set(list_1))
['m', 'f']

Per @JonClements' comment, here is a tidier version:

>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> list(set(list_2).difference(list_1))
['m', 'f']

From ser1 remove items present in ser2.

Input

ser1 = pd.Series([1, 2, 3, 4, 5]) ser2 = pd.Series([4, 5, 6, 7, 8])

Solution

ser1[~ser1.isin(ser2)]


Not sure why the above explanations are so complicated when you have native methods available:

main_list = list(set(list_2)-set(list_1))