Python, compute list difference

Question

In Python, what is the best way to compute the difference between two lists?

Example:

    A = [1, 2, 3, 4]
    B = [2, 5]

    A - B = [1, 3, 4]
    B - A = [5]

User · Answer

    A = [1, 2, 3, 4]
    B = [2, 5]

    # A - B
    x = list(set(A) - set(B))
    # B - A
    y = list(set(B) - set(A))

    print(x)
    print(y)

User · Answer

Use set if you don't care about item order or repetition. Use list comprehensions if you do:

    >>> def diff(first, second):
    ...     second = set(second)
    ...     return [item for item in first if item not in second]
    ...
    >>> diff(A, B)
    [1, 3, 4]
    >>> diff(B, A)
    [5]
    >>>
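
To make the trade-off concrete, here is a minimal Python 3 sketch contrasting the two approaches on input that contains duplicates and is not sorted. The name ordered_diff and the sample lists are illustrative assumptions, not part of the answer above.

```python
def ordered_diff(first, second):
    """Return items of `first` not present in `second`, keeping order and repeats."""
    exclude = set(second)  # set membership tests are O(1) on average
    return [item for item in first if item not in exclude]

A = [4, 1, 1, 3, 2]
B = [2, 5]

print(ordered_diff(A, B))        # [4, 1, 1, 3]
print(sorted(set(A) - set(B)))   # [1, 3, 4]
```

The list comprehension keeps the original order and the repeated 1, while the set version drops the duplicate and has no guaranteed order (hence the sorted call).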

User · Answer

In case of a list of dictionaries, the full list comprehension solution works while the set solution raises TypeError: unhashable type: 'dict'.

Test case:

    def diff(a, b):
        return [aa for aa in a if aa not in b]

    d1 = {"a": 1, "b": 1}
    d2 = {"a": 2, "b": 2}
    d3 = {"a": 3, "b": 3}

    >>> diff([d1, d2, d3], [d2, d3])
    [{'a': 1, 'b': 1}]
    >>> diff([d1, d2, d3], [d1])
    [{'a': 2, 'b': 2}, {'a': 3, 'b': 3}]
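
If the lists of dictionaries are large, the linear `not in b` scan may become slow. One possible workaround, sketched here for Python 3 under the assumption that every dictionary is flat and all of its values are hashable, is to compare frozen snapshots of the items. The helper names freeze and diff_dicts are made up for this example.

```python
def freeze(d):
    """Hashable snapshot of a flat dict (assumes all values are hashable)."""
    return frozenset(d.items())

def diff_dicts(a, b):
    """Dicts from `a` that do not equal any dict in `b`, in original order."""
    seen = {freeze(d) for d in b}
    return [d for d in a if freeze(d) not in seen]

d1 = {"a": 1, "b": 1}
d2 = {"a": 2, "b": 2}
d3 = {"a": 3, "b": 3}

print(diff_dicts([d1, d2, d3], [d2, d3]))  # [{'a': 1, 'b': 1}]
```

Nested dictionaries or list values would still raise TypeError with this approach, in which case the plain list comprehension remains the safer choice.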

User · Answer

You can do

    list(set(A) - set(B))

and

    list(set(B) - set(A))

User · Answer

You would want to use a set instead of a list.

User · Answer

One liner:

    diff = lambda l1, l2: [x for x in l1 if x not in l2]
    diff(A, B)
    diff(B, A)

Or:

    diff = lambda l1, l2: filter(lambda x: x not in l2, l1)
    diff(A, B)
    diff(B, A)
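
One caveat for Python 3 readers: filter returns a lazy iterator there rather than a list, so the second one-liner needs a list() call to produce the same result. A small sketch, using the sample lists from the question:

```python
A = [1, 2, 3, 4]
B = [2, 5]

# list() forces the lazy filter object into a concrete list on Python 3.
diff = lambda l1, l2: list(filter(lambda x: x not in l2, l1))

print(diff(A, B))  # [1, 3, 4]
print(diff(B, A))  # [5]
```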

User · Answer

Python 2.7.3 (default, Feb 27 2014, 19:58:35) - IPython 1.1.0 - timeit: (github gist)

    def diff(a, b):
        b = set(b)
        return [aa for aa in a if aa not in b]

    def set_diff(a, b):
        return list(set(a) - set(b))

    diff_lamb_hension = lambda l1, l2: [x for x in l1 if x not in l2]

    diff_lamb_filter = lambda l1, l2: filter(lambda x: x not in l2, l1)

    from difflib import SequenceMatcher
    def squeezer(a, b):
        squeeze = SequenceMatcher(None, a, b)
        return reduce(lambda p, q: p + q, map(
            lambda t: squeeze.a[t[1]:t[2]],
            filter(lambda x: x[0] != 'equal',
                   squeeze.get_opcodes())))

Results:

    # Small
    a = range(10)
    b = range(10/2)

    timeit diff(a, b)
    100000 loops, best of 3: 1.97 µs per loop

    timeit set_diff(a, b)
    100000 loops, best of 3: 2.71 µs per loop

    timeit diff_lamb_hension(a, b)
    100000 loops, best of 3: 2.1 µs per loop

    timeit diff_lamb_filter(a, b)
    100000 loops, best of 3: 3.58 µs per loop

    timeit squeezer(a, b)
    10000 loops, best of 3: 36 µs per loop

    # Medium
    a = range(10**4)
    b = range(10**4/2)

    timeit diff(a, b)
    1000 loops, best of 3: 1.17 ms per loop

    timeit set_diff(a, b)
    1000 loops, best of 3: 1.27 ms per loop

    timeit diff_lamb_hension(a, b)
    1 loops, best of 3: 736 ms per loop

    timeit diff_lamb_filter(a, b)
    1 loops, best of 3: 732 ms per loop

    timeit squeezer(a, b)
    100 loops, best of 3: 12.8 ms per loop

    # Big
    a = xrange(10**7)
    b = xrange(10**7/2)

    timeit diff(a, b)
    1 loops, best of 3: 1.74 s per loop

    timeit set_diff(a, b)
    1 loops, best of 3: 2.57 s per loop

    timeit diff_lamb_filter(a, b)
    # too long to wait for

    timeit diff_lamb_filter(a, b)
    # too long to wait for

    timeit diff_lamb_filter(a, b)
    # TypeError: sequence index must be integer, not 'slice'

@roman-bodnarchuk: the list comprehension function def diff(a, b) seems to be faster.
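
The measurements above can be approximated outside IPython with the standard timeit module. The sketch below targets Python 3 and uses the medium input size; the repetition count of 10 is an arbitrary choice. Absolute numbers depend on the machine and interpreter version, so no expected timings are shown.

```python
import timeit

def diff(a, b):
    b = set(b)
    return [aa for aa in a if aa not in b]

def set_diff(a, b):
    return list(set(a) - set(b))

a = list(range(10**4))
b = list(range(10**4 // 2))  # // keeps the result an int on Python 3

for func in (diff, set_diff):
    # Total wall-clock time for 10 calls, converted to a per-call average.
    total = timeit.timeit(lambda: func(a, b), number=10)
    print("%s: %.3f ms per call" % (func.__name__, total / 10 * 1000))
```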

User · Answer

The simplest way is to use set().difference(set()):

    list_a = [1, 2, 3]
    list_b = [2, 3]
    print(set(list_a).difference(set(list_b)))

The answer is set([1]).
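
As a side note, set.difference accepts any iterable, so the second set() call is optional. A short Python 3 sketch using the same sample lists:

```python
list_a = [1, 2, 3]
list_b = [2, 3]

# The argument to difference() does not have to be a set.
result = set(list_a).difference(list_b)
print(result)  # {1}
```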

User · Answer

The above examples trivialized the problem of calculating differences. Assuming sorting or de-duplication definitely makes it easier to compute the difference, but if your comparison cannot afford those assumptions then you'll need a non-trivial implementation of a diff algorithm. See difflib in the Python standard library.

    #! /usr/bin/python2
    from difflib import SequenceMatcher

    A = [1, 2, 3, 4]
    B = [2, 5]

    squeeze = SequenceMatcher(None, A, B)

    print "A - B = [%s]" % (reduce(lambda p, q: p + q,
                                   map(lambda t: squeeze.a[t[1]:t[2]],
                                       filter(lambda x: x[0] != 'equal',
                                              squeeze.get_opcodes()))))

Or Python3...

    #! /usr/bin/python3
    from difflib import SequenceMatcher
    from functools import reduce

    A = [1, 2, 3, 4]
    B = [2, 5]

    squeeze = SequenceMatcher(None, A, B)

    print("A - B = [%s]" % (reduce(lambda p, q: p + q,
                                   map(lambda t: squeeze.a[t[1]:t[2]],
                                       filter(lambda x: x[0] != 'equal',
                                              squeeze.get_opcodes())))))

Output:

    A - B = [[1, 3, 4]]

User · Answer

Simple code that gives you the difference with multiple items, if you want that:

    import copy

    a = [1, 2, 3, 3, 4]
    b = [2, 4]
    tmp = copy.deepcopy(a)
    for k in b:
        # remove() deletes only the first matching occurrence
        if k in tmp:
            tmp.remove(k)
    print(tmp)

User · Answer

Adding an answer to take care of the case where we want a strict difference with repetitions, i.e., there are repetitions in the first list that we want to keep in the result, e.g. to get

    [1, 1, 1, 2] - [1, 1] --> [1, 2]

We could use an additional counter to have an elegant difference function.

    from collections import Counter

    def diff(first, second):
        secondCntr = Counter(second)
        second = set(second)
        res = []
        for i in first:
            if i not in second:
                res.append(i)
            elif i in secondCntr:
                if secondCntr[i] > 0:
                    secondCntr[i] -= 1
                else:
                    res.append(i)
        return res
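
Since a Counter returns 0 for missing keys (without inserting them), the same idea can be written a little more compactly. This is only a sketch; the function name diff_with_repeats is an illustrative assumption.

```python
from collections import Counter

def diff_with_repeats(first, second):
    """Remove one occurrence from `first` for each occurrence in `second`."""
    remaining = Counter(second)  # how many copies of each item may still be removed
    result = []
    for item in first:
        if remaining[item] > 0:
            remaining[item] -= 1   # consume one matching occurrence
        else:
            result.append(item)    # nothing left to cancel against: keep it
    return result

print(diff_with_repeats([1, 1, 1, 2], [1, 1]))  # [1, 2]
```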

User · Answer

Looking at the TimeComplexity of the in operator, in the worst case it works in O(n), even for sets. So when comparing two arrays we'll have a time complexity of O(n) in the best case and O(n^2) in the worst case.

An alternative (but unfortunately more complex) solution sorts both lists and then walks them in a single O(n) pass. Because of the sort, the total cost is O(n log n) in both the best and the worst case:

    # Compares the difference of list a and b
    # uses a callback function to compare items
    def diff(a, b, callback):
        a_missing_in_b = []
        ai = 0
        bi = 0

        a = sorted(a, callback)
        b = sorted(b, callback)

        while (ai < len(a)) and (bi < len(b)):

            cmp = callback(a[ai], b[bi])
            if cmp < 0:
                a_missing_in_b.append(a[ai])
                ai += 1
            elif cmp > 0:
                # Item b is missing in a
                bi += 1
            else:
                # a and b intersecting on this item
                ai += 1
                bi += 1

        # if a and b are not of same length, we need to add the remaining items
        for ai in xrange(ai, len(a)):
            a_missing_in_b.append(a[ai])

        return a_missing_in_b

e.g.

    >>> a = [1, 2, 3]
    >>> b = [2, 4, 6]
    >>> diff(a, b, cmp)
    [1, 3]
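
The callback-based version relies on Python 2 features (cmp, xrange, and a positional comparison argument to sorted). For items that support < and > directly, the same sort-then-merge idea can be sketched in Python 3 as follows. The name sorted_diff is illustrative, and the sort makes the total cost O(n log n).

```python
def sorted_diff(a, b):
    """Items of `a` missing from `b`, found by merging two sorted copies."""
    a = sorted(a)
    b = sorted(b)
    missing = []
    ai = bi = 0
    while ai < len(a) and bi < len(b):
        if a[ai] < b[bi]:
            missing.append(a[ai])  # a[ai] cannot appear later in b
            ai += 1
        elif a[ai] > b[bi]:
            bi += 1                # b[bi] is not in a; skip it
        else:
            ai += 1                # equal items cancel each other out
            bi += 1
    missing.extend(a[ai:])         # whatever is left in a has no partner in b
    return missing

print(sorted_diff([1, 2, 3], [2, 4, 6]))  # [1, 3]
```

Note that the result comes back in sorted order rather than in the original order of a.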

User · Answer

In case you want the difference recursively, going deep into the items of your list, I have written a package for Python: https://github.com/erasmose/deepdiff

Installation

Install from PyPi:

    pip install deepdiff

If you are on Python3 you also need to install:

    pip install future six

Example usage

    >>> from deepdiff import DeepDiff
    >>> from pprint import pprint
    >>> from __future__ import print_function

Same object returns empty

    >>> t1 = {1:1, 2:2, 3:3}
    >>> t2 = t1
    >>> ddiff = DeepDiff(t1, t2)
    >>> print (ddiff.changes)
        {}

Type of an item has changed

    >>> t1 = {1:1, 2:2, 3:3}
    >>> t2 = {1:1, 2:"2", 3:3}
    >>> ddiff = DeepDiff(t1, t2)
    >>> print (ddiff.changes)
        {'type_changes': ["root[2]: 2=<type 'int'> vs. 2=<type 'str'>"]}

Value of an item has changed

    >>> t1 = {1:1, 2:2, 3:3}
    >>> t2 = {1:1, 2:4, 3:3}
    >>> ddiff = DeepDiff(t1, t2)
    >>> print (ddiff.changes)
        {'values_changed': ['root[2]: 2 ====>> 4']}

Item added and/or removed

    >>> t1 = {1:1, 2:2, 3:3, 4:4}
    >>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint (ddiff.changes)
        {'dic_item_added': ['root[5, 6]'],
         'dic_item_removed': ['root[4]'],
         'values_changed': ['root[2]: 2 ====>> 4']}

String difference

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
    >>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint (ddiff.changes, indent = 2)
        { 'values_changed': [ 'root[2]: 2 ====>> 4',
                              "root[4]['b']:\n--- \n+++ \n@@ -1 +1 @@\n-world\n+world!"]}
    >>>
    >>> print (ddiff.changes['values_changed'][1])
        root[4]['b']:
        ---
        +++
        @@ -1 +1 @@
        -world
        +world!

String difference 2

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint (ddiff.changes, indent = 2)
        { 'values_changed': [ "root[4]['b']:\n--- \n+++ \n@@ -1,5 +1,4 @@\n-world!\n-Goodbye!\n+world\n 1\n 2\n End"]}
    >>>
    >>> print (ddiff.changes['values_changed'][0])
        root[4]['b']:
        ---
        +++
        @@ -1,5 +1,4 @@
        -world!
        -Goodbye!
        +world
         1
         2
         End

Type change

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint (ddiff.changes, indent = 2)
        { 'type_changes': [ "root[4]['b']: [1, 2, 3]=<type 'list'> vs. world\n\n\nEnd=<type 'str'>"]}

List difference

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint (ddiff.changes, indent = 2)
        { 'list_removed': ["root[4]['b']: [3]"]}

List difference 2: Note that it DOES NOT take order into account

    >>> # Note that it DOES NOT take order into account
    ... t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2]}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint (ddiff.changes, indent = 2)
        { }

List that contains dictionary:

    >>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
    >>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
    >>> ddiff = DeepDiff(t1, t2)
    >>> pprint (ddiff.changes, indent = 2)
        { 'dic_item_removed': ["root[4]['b'][2][2]"],
          'values_changed': ["root[4]['b'][2][1]: 1 ====>> 3"]}

User · Answer

If the order does not matter, you can simply calculate the set difference:

    >>> set([1,2,3,4]) - set([2,5])
    set([1, 4, 3])
    >>> set([2,5]) - set([1,2,3,4])
    set([5])
