How do you remove duplicates from a list whilst preserving order?

Question

Is there a built-in that removes duplicates from a list in Python, whilst preserving order? I know that I can use a set to remove duplicates, but that destroys the original order. I also know that I can roll my own like this:

    def uniq(input):
        output = []
        for x in input:
            if x not in output:
                output.append(x)
        return output

(Thanks to unwind for that code sample.)

But I'd like to avail myself of a built-in or a more Pythonic idiom if possible.

Related question: In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?

User · Answer

Here is a simple way to do it:

    list1 = ["hello", " ", "w", "o", "r", "l", "d"]
    sorted(set(list1), key=lambda x: list1.index(x))

That gives the output:

    ["hello", " ", "w", "o", "r", "l", "d"]

User · Answer

A solution without using imported modules or sets:

    text = "ask not what your country can do for you ask what you can do for your country"
    sentence = text.split(" ")
    noduplicates = [sentence[i] for i in range(0, len(sentence)) if sentence[i] not in sentence[:i]]
    print(noduplicates)

Gives output:

    ['ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you']

User · Answer

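Relatively effective approach for sorted NumPy arrays:

    import numpy as np

    b = np.array([1, 3, 3, 8, 12, 12, 12])
    # keep the first element, then every element that differs from its predecessor
    np.hstack([b[0], [x[0] for x in zip(b[1:], b[:-1]) if x[0] != x[1]]])

Outputs:

    array([ 1,  3,  8, 12])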

User · Answer

In Python 3.7 and above, dictionaries are guaranteed to remember their key insertion order. The answer to this question summarizes the current state of affairs.

The OrderedDict solution thus becomes obsolete, and without any import statements we can simply issue:

    >>> lst = [1, 2, 1, 3, 3, 2, 4]
    >>> list(dict.fromkeys(lst))
    [1, 2, 3, 4]

User · Answer

Not to kick a dead horse (this question is very old and already has lots of good answers), but here is a solution using pandas that is quite fast in many circumstances and is dead simple to use.

    import pandas as pd

    my_list = [0, 1, 2, 3, 4, 1, 2, 3, 5]

    >>> pd.Series(my_list).drop_duplicates().tolist()
    # Output:
    # [0, 1, 2, 3, 4, 5]

User · Answer

pandas users should check out pandas.unique.

    >>> import pandas as pd
    >>> lst = [1, 2, 1, 3, 3, 2, 4]
    >>> pd.unique(lst)
    array([1, 2, 3, 4])

The function returns a NumPy array. If needed, you can convert it to a list with the tolist method.

User · Answer

    from itertools import groupby
    [key for key, _ in groupby(sortedList)]

The list doesn't even have to be sorted; the sufficient condition is that equal values are grouped together.

Edit: I assumed that "preserving order" implies that the list is actually ordered. If this is not the case, then the solution from MizardX is the right one.

Community edit: This is however the most elegant way to "compress duplicate consecutive elements into a single element".
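
To illustrate the caveat about grouping, here is a minimal sketch (the helper name `compress_runs` and the sample lists are made up for illustration). It shows that groupby only collapses adjacent equal values, so input where equal values are scattered keeps some duplicates:

```python
from itertools import groupby

def compress_runs(seq):
    """Collapse each run of consecutive equal items into a single item."""
    return [key for key, _ in groupby(seq)]

grouped = [1, 1, 2, 2, 2, 3]
ungrouped = [1, 2, 1, 3, 3, 2]

print(compress_runs(grouped))    # [1, 2, 3]
print(compress_runs(ungrouped))  # [1, 2, 1, 3, 2]
```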

User · Answer

zmk's approach uses a list comprehension, which is very fast, yet keeps the order naturally. For applying to case sensitive strings it can be easily modified. This also preserves the original case.

    def DelDupes(aseq):
        seen = set()
        return [x for x in aseq if (x.lower() not in seen) and (not seen.add(x.lower()))]

Closely associated functions are:

    def HasDupes(aseq):
        s = set()
        return any(((x.lower() in s) or s.add(x.lower())) for x in aseq)

    def GetDupes(aseq):
        s = set()
        return set(x for x in aseq if ((x.lower() in s) or s.add(x.lower())))

User · Answer

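    l = [1,2,2,3,3,...]
    n = []
    n.extend(ele for ele in l if ele not in set(n))

A generator expression that uses the O(1) look up of a set to determine whether or not to include an element in the new list. Note, however, that set(n) is rebuilt for every element, so the pass as a whole is still quadratic; it also relies on n growing while extend consumes the generator.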
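
For comparison, a sketch that keeps one set alive for the whole pass, so each membership test is O(1) on average. The function name `dedupe` is made up here, and the items are assumed to be hashable:

```python
def dedupe(items):
    """Return a new list with first occurrences only, in original order."""
    seen = set()    # built once and updated as we go
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(dedupe([1, 2, 2, 3, 3, 1]))  # [1, 2, 3]
```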

User · Answer

    def remove_duplicates_thenSort():
        t = ['b', 'c', 'd', 'd', 'a', 'c', 'c']
        t2 = []
        for i, k in enumerate(t):
            index = t.index(k)
            if i == index:
                t2.append(t[i])
        return sorted(t2)

    print(remove_duplicates_thenSort())

User · Answer

For non-hashable types (e.g. list of lists), based on MizardX's:

    def f7_noHash(seq):
        seen = set()
        return [x for x in seq if str(x) not in seen and not seen.add(str(x))]
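
One thing to watch with a str-based key is that distinct values can share a string form. The sketch below (the name `f7_by_key` and the `key` parameter are assumptions added for illustration, not part of the answer above) shows the collision and one possible alternative key:

```python
def f7_by_key(seq, key=str):
    """Order-preserving dedupe; key(x) must return something hashable."""
    seen = set()
    result = []
    for x in seq:
        k = key(x)
        if k not in seen:
            seen.add(k)
            result.append(x)
    return result

data = [[1, 2], [1, 2], 1, "1", [3]]
print(f7_by_key(data))            # [[1, 2], 1, [3]]  (1 and "1" collide under str)
print(f7_by_key(data, key=repr))  # [[1, 2], 1, '1', [3]]
```

Using repr separates the integer from the string here, but any text-based key only works when equal objects always produce the same text.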

User · Answer

Credit to @wjandrea for the dict.fromkeys method idea:

    def solve(arr):
        return list(dict.fromkeys(arr[::-1]))[::-1]

This reverses the input and then the output, so the last occurrence of each element is the one that is kept.
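
A small sketch of the difference between the plain and the reversed variant, assuming Python 3.7+ (ordered dicts); the function names and sample data are made up for illustration:

```python
def keep_first(arr):
    return list(dict.fromkeys(arr))

def keep_last(arr):
    # Reverse, keep first occurrences, then restore the original direction.
    return list(dict.fromkeys(arr[::-1]))[::-1]

data = ["a", "b", "a", "c", "b"]
print(keep_first(data))  # ['a', 'b', 'c']
print(keep_last(data))   # ['a', 'c', 'b']
```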

User · Answer

I think if you wanna maintain the order, you can try this:

    list1 = ['b', 'c', 'd', 'b', 'c', 'a', 'a']
    list2 = list(set(list1))
    list2.sort(key=list1.index)
    print(list2)

OR similarly you can do this:

    list1 = ['b', 'c', 'd', 'b', 'c', 'a', 'a']
    list2 = sorted(set(list1), key=list1.index)
    print(list2)

You can also do this:

    list1 = ['b', 'c', 'd', 'b', 'c', 'a', 'a']
    list2 = []
    for i in list1:
        if not i in list2:
            list2.append(i)
    print(list2)

It can also be written as this:

    list1 = ['b', 'c', 'd', 'b', 'c', 'a', 'a']
    list2 = []
    [list2.append(i) for i in list1 if not i in list2]
    print(list2)

User · Answer

If you need a one-liner then maybe this would help:

    from functools import reduce  # needed on Python 3

    reduce(lambda x, y: x + y if y[0] not in x else x, map(lambda x: [x], lst))

... should work, but correct me if I'm wrong.

User · Answer

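A simple recursive solution:

    def uniquefy_list(a):
        return uniquefy_list(a[1:]) if a[0] in a[1:] else [a[0]] + uniquefy_list(a[1:]) if len(a) > 1 else [a[0]]

Note that this keeps the last occurrence of each element rather than the first, and it raises IndexError on an empty list.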

User · Answer

An in-place method

This method is quadratic, because we have a linear lookup into the list for every element of the list (to that we have to add the cost of rearranging the list because of the dels).

That said, it is possible to operate in place if we start from the end of the list and proceed toward the origin, removing each term that is present in the sub-list at its left.

This idea in code is simply:

    for i in range(len(l)-1, 0, -1):
        if l[i] in l[:i]: del l[i]

A simple test of the implementation:

    In [91]: from random import randint, seed
    In [92]: seed('20080808') ; l = [randint(1,6) for _ in range(12)] # Beijing Olympics
    In [93]: for i in range(len(l)-1, 0, -1):
        ...:     print(l)
        ...:     print(i, l[i], l[:i], end='')
        ...:     if l[i] in l[:i]:
        ...:          print(': remove', l[i])
        ...:          del l[i]
        ...:     else:
        ...:          print()
        ...: print(l)
    [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5, 2]
    11 2 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]: remove 2
    [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]
    10 5 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4]: remove 5
    [6, 5, 1, 4, 6, 1, 6, 2, 2, 4]
    9 4 [6, 5, 1, 4, 6, 1, 6, 2, 2]: remove 4
    [6, 5, 1, 4, 6, 1, 6, 2, 2]
    8 2 [6, 5, 1, 4, 6, 1, 6, 2]: remove 2
    [6, 5, 1, 4, 6, 1, 6, 2]
    7 2 [6, 5, 1, 4, 6, 1, 6]
    [6, 5, 1, 4, 6, 1, 6, 2]
    6 6 [6, 5, 1, 4, 6, 1]: remove 6
    [6, 5, 1, 4, 6, 1, 2]
    5 1 [6, 5, 1, 4, 6]: remove 1
    [6, 5, 1, 4, 6, 2]
    4 6 [6, 5, 1, 4]: remove 6
    [6, 5, 1, 4, 2]
    3 4 [6, 5, 1]
    [6, 5, 1, 4, 2]
    2 1 [6, 5]
    [6, 5, 1, 4, 2]
    1 5 [6]
    [6, 5, 1, 4, 2]

    In [94]:

User · Answer

Borrowing the recursive idea used in defining Haskell's nub function for lists, this would be a recursive approach:

    def unique(lst):
        return [] if lst==[] else [lst[0]] + unique(filter(lambda x: x!= lst[0], lst[1:]))

e.g.:

    In [118]: unique([1,5,1,1,4,3,4])
    Out[118]: [1, 5, 4, 3]

(This is Python 2 code, where filter returns a list; under Python 3 the filter call would need to be wrapped in list(...).)

I tried it for growing data sizes and saw sub-linear time-complexity (not definitive, but suggests this should be fine for normal data).

    In [122]: %timeit unique(np.random.randint(5, size=(1)))
    10000 loops, best of 3: 25.3 us per loop

    In [123]: %timeit unique(np.random.randint(5, size=(10)))
    10000 loops, best of 3: 42.9 us per loop

    In [124]: %timeit unique(np.random.randint(5, size=(100)))
    10000 loops, best of 3: 132 us per loop

    In [125]: %timeit unique(np.random.randint(5, size=(1000)))
    1000 loops, best of 3: 1.05 ms per loop

    In [126]: %timeit unique(np.random.randint(5, size=(10000)))
    100 loops, best of 3: 11 ms per loop

I also think it's interesting that this could be readily generalized to uniqueness by other operations. Like this:

    import operator
    def unique(lst, cmp_op=operator.ne):
        return [] if lst==[] else [lst[0]] + unique(filter(lambda x: cmp_op(x, lst[0]), lst[1:]), cmp_op)

For example, you could pass in a function that uses the notion of rounding to the same integer as if it was "equality" for uniqueness purposes, like this:

    def test_round(x,y):
        return round(x) != round(y)

Then unique(some_list, test_round) would provide the unique elements of the list where uniqueness no longer meant traditional equality (which is implied by using any sort of set-based or dict-key-based approach to this problem) but instead meant to take only the first element that rounds to K for each possible integer K that the elements might round to, e.g.:

    In [6]: unique([1.2, 5, 1.9, 1.1, 4.2, 3, 4.8], test_round)
    Out[6]: [1.2, 5, 1.9, 4.2, 3]

User · Answer

One-liner list comprehension:

    values_non_duplicated = [value for index, value in enumerate(values) if value not in values[:index]]

User · Answer

In CPython 3.6+ (and all other Python implementations starting with Python 3.7+), dictionaries are ordered, so the way to remove duplicates from an iterable while keeping it in the original order is:

    >>> list(dict.fromkeys('abracadabra'))
    ['a', 'b', 'r', 'c', 'd']

In Python 3.5 and below (including Python 2.7), use the OrderedDict. My timings show that this is now both the fastest and shortest of the various approaches for Python 3.5.

    >>> from collections import OrderedDict
    >>> list(OrderedDict.fromkeys('abracadabra'))
    ['a', 'b', 'r', 'c', 'd']

User · Answer

This will preserve order and run in O(n) time. Basically the idea is to create a hole wherever there is a duplicate found and sink it down to the bottom. It makes use of a read and a write pointer. Whenever a duplicate is found, only the read pointer advances and the write pointer stays on the duplicate entry to overwrite it.

    def deduplicate(l):
        count = {}
        (read, write) = (0, 0)
        while read < len(l):
            if l[read] in count:
                read += 1
                continue
            count[l[read]] = True
            l[write] = l[read]
            read += 1
            write += 1
        return l[0:write]
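
The function above returns a slice, which is a new list, and leaves stale entries at the tail of l. If the goal is to modify the list itself, one possible variation is sketched below; the name `deduplicate_in_place` is made up for illustration and the items are assumed to be hashable:

```python
def deduplicate_in_place(l):
    """Remove duplicates from l itself, keeping first occurrences."""
    seen = set()
    write = 0
    for item in l:               # the loop acts as the read pointer
        if item not in seen:
            seen.add(item)
            l[write] = item      # write never overtakes the read position
            write += 1
    del l[write:]                # drop the leftover tail

data = [3, 1, 3, 2, 1]
deduplicate_in_place(data)
print(data)  # [3, 1, 2]
```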

User · Answer

You can reference a list comprehension as it is being built by the symbol '_[1]'. For example, the following function unique-ifies a list of elements without changing their order by referencing its list comprehension. This relies on an undocumented implementation detail of old CPython 2 releases and does not work in Python 3.

    def unique(my_list):
        return [x for x in my_list if x not in locals()['_[1]']]

Demo:

    l1 = [1, 2, 3, 4, 1, 2, 3, 4, 5]
    l2 = [x for x in l1 if x not in locals()['_[1]']]
    print l2

Output:

    [1, 2, 3, 4, 5]

User · Answer

Just to add another (very performant) implementation of such a functionality from an external module [1]: iteration_utilities.unique_everseen:

    >>> from iteration_utilities import unique_everseen
    >>> lst = [1,1,1,2,3,2,2,2,1,3,4]

    >>> list(unique_everseen(lst))
    [1, 2, 3, 4]

Timings

I did some timings (Python 3.6) and these show that it's faster than all other alternatives I tested, including OrderedDict.fromkeys, f7 and more_itertools.unique_everseen:

    %matplotlib notebook

    from iteration_utilities import unique_everseen
    from collections import OrderedDict
    from more_itertools import unique_everseen as mi_unique_everseen

    def f7(seq):
        seen = set()
        seen_add = seen.add
        return [x for x in seq if not (x in seen or seen_add(x))]

    def iteration_utilities_unique_everseen(seq):
        return list(unique_everseen(seq))

    def more_itertools_unique_everseen(seq):
        return list(mi_unique_everseen(seq))

    def odict(seq):
        return list(OrderedDict.fromkeys(seq))

    from simple_benchmark import benchmark

    b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
                  {2**i: list(range(2**i)) for i in range(1, 20)},
                  'list size (no duplicates)')
    b.plot()

And just to make sure, I also did a test with more duplicates to check if it makes a difference:

    import random

    b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
                  {2**i: [random.randint(0, 2**(i-1)) for _ in range(2**i)] for i in range(1, 20)},
                  'list size (lots of duplicates)')
    b.plot()

And one containing only one value:

    b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
                  {2**i: [1]*(2**i) for i in range(1, 20)},
                  'list size (only duplicates)')
    b.plot()

In all of these cases the iteration_utilities.unique_everseen function is the fastest (on my computer).

This iteration_utilities.unique_everseen function can also handle unhashable values in the input (however with an O(n*n) performance instead of the O(n) performance when the values are hashable).

    >>> lst = [{1}, {1}, {2}, {1}, {3}]

    >>> list(unique_everseen(lst))
    [{1}, {2}, {3}]

[1] Disclaimer: I'm the author of that package.
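
To make the hashable/unhashable trade-off concrete, here is a rough sketch of how such a fallback could work. This is not the actual implementation in iteration_utilities, only an illustration under the assumption that hashable items go into a set and unhashable ones into a list that is scanned linearly; the name `unique_with_fallback` is hypothetical:

```python
def unique_with_fallback(iterable):
    """Yield first occurrences; unhashable items use a slower list scan."""
    seen_hashable = set()
    seen_unhashable = []
    for item in iterable:
        try:
            if item in seen_hashable:
                continue
            seen_hashable.add(item)
        except TypeError:                 # item cannot be hashed
            if item in seen_unhashable:   # linear scan, hence O(n*n) overall
                continue
            seen_unhashable.append(item)
        yield item

print(list(unique_with_fallback([[1], [1], [2], 1, [1], 1])))  # [[1], [2], 1]
```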

User · Answer

You could do a sort of ugly list comprehension hack.

    [l[i] for i in range(len(l)) if l.index(l[i]) == i]

User · Answer

Here you have some alternatives: http://www.peterbe.com/plog/uniqifiers-benchmark

Fastest one:

    def f7(seq):
        seen = set()
        seen_add = seen.add
        return [x for x in seq if not (x in seen or seen_add(x))]

Why assign seen.add to seen_add instead of just calling seen.add? Python is a dynamic language, and resolving seen.add each iteration is more costly than resolving a local variable. seen.add could have changed between iterations, and the runtime isn't smart enough to rule that out. To play it safe, it has to check the object each time.

If you plan on using this function a lot on the same dataset, perhaps you would be better off with an ordered set: http://code.activestate.com/recipes/528878/

O(1) insertion, deletion and member-check per operation.

(Small additional note: seen.add() always returns None, so the or above is there only as a way to attempt a set update, and not as an integral part of the logical test.)
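
If you want to check the effect of the local binding on your own interpreter, a rough timing sketch could look like this. The names `f7_attr` and `f7_local` and the test data are made up for illustration, and the size of the gap (if any) depends on the Python version and machine:

```python
import timeit

def f7_attr(seq):
    seen = set()
    # seen.add is looked up on every iteration
    return [x for x in seq if not (x in seen or seen.add(x))]

def f7_local(seq):
    seen = set()
    seen_add = seen.add   # bind the method once, outside the loop
    return [x for x in seq if not (x in seen or seen_add(x))]

data = list(range(1000)) * 2
assert f7_attr(data) == f7_local(data) == list(range(1000))

for fn in (f7_attr, f7_local):
    elapsed = timeit.timeit(lambda: fn(data), number=200)
    print(f"{fn.__name__}: {elapsed:.4f} s for 200 runs")
```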

User · Answer

    sequence = ['1', '2', '3', '3', '6', '4', '5', '6']
    unique = []
    [unique.append(item) for item in sequence if item not in unique]

unique → ['1', '2', '3', '6', '4', '5']

User · Answer

For another very late answer to another very old question:

The itertools recipes have a function that does this, using the seen set technique, but:

- Handles a standard key function.
- Uses no unseemly hacks.
- Optimizes the loop by pre-binding seen.add instead of looking it up N times. (f7 also does this, but some versions don't.)
- Optimizes the loop by using ifilterfalse, so you only have to loop over the unique elements in Python, instead of all of them. (You still iterate over all of them inside ifilterfalse, of course, but that's in C, and much faster.)

Is it actually faster than f7? It depends on your data, so you'll have to test it and see. If you want a list in the end, f7 uses a listcomp, and there's no way to do that here. (You can directly append instead of yielding, or you can feed the generator into the list function, but neither one can be as fast as the LIST_APPEND inside a listcomp.) At any rate, usually, squeezing out a few microseconds is not going to be as important as having an easily-understandable, reusable, already-written function that doesn't require DSU when you want to decorate.

As with all of the recipes, it's also available in more-itertools.

If you just want the no-key case, you can simplify it as follows (ifilterfalse is the Python 2 name; in Python 3 it is itertools.filterfalse):

    import itertools

    def unique(iterable):
        seen = set()
        seen_add = seen.add
        for element in itertools.ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element

User · Answer

Eliminating the duplicate values in a sequence while preserving the order of the remaining items, using a general purpose generator function.

    # for hashable sequence
    def remove_duplicates(items):
        seen = set()
        for item in items:
            if item not in seen:
                yield item
                seen.add(item)

    a = [1, 5, 2, 1, 9, 1, 5, 10]
    list(remove_duplicates(a))
    # [1, 5, 2, 9, 10]

    # for unhashable sequence
    def remove_duplicates(items, key=None):
        seen = set()
        for item in items:
            val = item if key is None else key(item)
            if val not in seen:
                yield item
                seen.add(val)

    a = [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
    list(remove_duplicates(a, key=lambda d: (d['x'], d['y'])))
    # [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
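
One practical consequence of writing this as a generator is that it is lazy, so it can be applied to streams that never end. A small sketch (the cycle/islice usage is an illustration added here, not part of the answer above):

```python
from itertools import cycle, islice

def remove_duplicates(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

# cycle("abca") repeats forever, but only three distinct values ever appear.
first_three = list(islice(remove_duplicates(cycle("abca")), 3))
print(first_three)  # ['a', 'b', 'c']
```

Asking for a fourth item here would never return, because the generator keeps consuming the endless input while looking for a value it has not seen.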

User · Answer

A 5x faster reduce variant, but more sophisticated:

    >>> from functools import reduce  # needed on Python 3
    >>> l = [5, 6, 6, 1, 1, 2, 2, 3, 4]
    >>> reduce(lambda r, v: v in r[1] and r or (r[0].append(v) or r[1].add(v)) or r, l, ([], set()))[0]
    [5, 6, 1, 2, 3, 4]

Explanation:

    default = (list(), set())
    # use list to keep order
    # use set to make lookup faster

    def reducer(result, item):
        if item not in result[1]:
            result[0].append(item)
            result[1].add(item)
        return result

    >>> reduce(reducer, l, default)[0]
    [5, 6, 1, 2, 3, 4]

User · Answer

Edit 2020

As of CPython/PyPy 3.6 (and as a language guarantee in 3.7), plain dict is insertion ordered, and even more efficient than the (also C implemented) collections.OrderedDict. So the fastest solution, by far, is also the simplest:

    >>> items = [1, 2, 0, 1, 3, 2]
    >>> list(dict.fromkeys(items))
    [1, 2, 0, 3]

Like list(set(items)), this pushes all the work to the C layer (on CPython), but since dicts are insertion ordered, dict.fromkeys doesn't lose ordering. It's slower than list(set(items)) (takes 50-100% longer typically), but much faster than any other order-preserving solution (takes about half the time of hacks involving use of sets in a listcomp).

Edit 2016

As Raymond pointed out, in Python 3.5+ where OrderedDict is implemented in C, the list comprehension approach will be slower than OrderedDict (unless you actually need the list at the end - and even then, only if the input is very short). So the best solution for 3.5+ is OrderedDict.

Important Edit 2015

As @abarnert notes, the more_itertools library (pip install more_itertools) contains a unique_everseen function that is built to solve this problem without any unreadable (not seen.add) mutations in list comprehensions. This is also the fastest solution:

    >>> from more_itertools import unique_everseen
    >>> items = [1, 2, 0, 1, 3, 2]
    >>> list(unique_everseen(items))
    [1, 2, 0, 3]

Just one simple library import and no hacks. This comes from an implementation of the itertools recipe unique_everseen, which looks like:

    from itertools import filterfalse

    def unique_everseen(iterable, key=None):
        "List unique elements, preserving order. Remember all elements ever seen."
        # unique_everseen('AAAABBBCCDAABBB') --> A B C D
        # unique_everseen('ABBCcAD', str.lower) --> A B C D
        seen = set()
        seen_add = seen.add
        if key is None:
            for element in filterfalse(seen.__contains__, iterable):
                seen_add(element)
                yield element
        else:
            for element in iterable:
                k = key(element)
                if k not in seen:
                    seen_add(k)
                    yield element

In Python 2.7+, the accepted common idiom for this (which works but isn't optimized for speed; I would now use unique_everseen) uses collections.OrderedDict:

Runtime: O(N)

    >>> from collections import OrderedDict
    >>> items = [1, 2, 0, 1, 3, 2]
    >>> list(OrderedDict.fromkeys(items))
    [1, 2, 0, 3]

This looks much nicer than:

    seen = set()
    [x for x in seq if x not in seen and not seen.add(x)]

and doesn't utilize the ugly hack:

    not seen.add(x)

which relies on the fact that set.add is an in-place method that always returns None, so not None evaluates to True. Note however that the hack solution is faster in raw speed, though it has the same runtime complexity O(N).

User · Answer

If you need one liner then maybe this would help   reduce lambda x  y  x   y if y 0  not in x else x  map lambda x   x  lst         should work but correct me if i m wrong

User · Answer

Eliminating the duplicate values in a sequence  but preserve the order of the remaining items  Use of general purpose generator function     for hashable sequence def remove duplicates items       seen   set       for item in items          if item not in seen              yield item             seen add item   a    1  5  2  1  9  1  5  10  list remove duplicates a      1  5  2  9  10       for unhashable sequence def remove duplicates items  key None       seen   set       for item in items          val   item if key is None else key item          if val not in seen              yield item             seen add val   a       x   1   y   2     x   1   y   3     x   1   y   2     x   2   y   4   list remove duplicates a  key lambda d   d  x   d  y           x   1   y   2     x   1   y   3     x   2   y   4

User · Answer

l    1 2 2 3 3      n      n extend ele for ele in l if ele not in set n     A generator expression that uses the O 1  look up of a set to determine whether or not to include an element in the new list

User · Answer

You could do a sort of ugly list comprehension hack    l i  for i in range len l   if l index l i      i

User · Answer

Borrowing the recursive idea used in definining Haskell s nub function for lists  this would be a recursive approach   def unique lst       return    if lst     else  lst 0     unique filter lambda x  x   lst 0   lst 1       e g    In  118   unique  1 5 1 1 4 3 4   Out 118    1  5  4  3    I tried it for growing data sizes and saw sub-linear time-complexity  not definitive  but suggests this should be fine for normal data    In  122    timeit unique np random randint 5  size  1    10000 loops  best of 3  25 3 us per loop  In  123    timeit unique np random randint 5  size  10    10000 loops  best of 3  42 9 us per loop  In  124    timeit unique np random randint 5  size  100    10000 loops  best of 3  132 us per loop  In  125    timeit unique np random randint 5  size  1000    1000 loops  best of 3  1 05 ms per loop  In  126    timeit unique np random randint 5  size  10000    100 loops  best of 3  11 ms per loop   I also think it s interesting that this could be readily generalized to uniqueness by other operations  Like this   import operator def unique lst  cmp op operator ne       return    if lst     else  lst 0     unique filter lambda x  cmp op x  lst 0    lst 1     cmp op    For example  you could pass in a function that uses the notion of rounding to the same integer as if it was  equality  for uniqueness purposes  like this   def test round x y       return round x     round y    then unique some list  test round  would provide the unique elements of the list where uniqueness no longer meant traditional equality  which is implied by using any sort of set-based or dict-key-based approach to this problem  but instead meant to take only the first element that rounds to K for each possible integer K that the elements might round to  e g    In  6   unique  1 2  5  1 9  1 1  4 2  3  4 8   test round  Out 6    1 2  5  1 9  4 2  3

User · Answer

Just to add another (very performant) implementation of such a functionality from an external module [1]: iteration_utilities.unique_everseen:

    >>> from iteration_utilities import unique_everseen
    >>> lst = [1,1,1,2,3,2,2,2,1,3,4]

    >>> list(unique_everseen(lst))
    [1, 2, 3, 4]

Timings

I did some timings (Python 3.6) and these show that it's faster than all other alternatives I tested, including OrderedDict.fromkeys, f7 and more_itertools.unique_everseen:

    %matplotlib notebook

    from iteration_utilities import unique_everseen
    from collections import OrderedDict
    from more_itertools import unique_everseen as mi_unique_everseen

    def f7(seq):
        seen = set()
        seen_add = seen.add
        return [x for x in seq if not (x in seen or seen_add(x))]

    def iteration_utilities_unique_everseen(seq):
        return list(unique_everseen(seq))

    def more_itertools_unique_everseen(seq):
        return list(mi_unique_everseen(seq))

    def odict(seq):
        return list(OrderedDict.fromkeys(seq))

    from simple_benchmark import benchmark

    b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
                  {2**i: list(range(2**i)) for i in range(1, 20)},
                  'list size (no duplicates)')
    b.plot()

And just to make sure, I also did a test with more duplicates to check whether it makes a difference:

    import random

    b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
                  {2**i: [random.randint(0, 2**(i-1)) for _ in range(2**i)] for i in range(1, 20)},
                  'list size (lots of duplicates)')
    b.plot()

And one containing only one value:

    b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
                  {2**i: [1]*(2**i) for i in range(1, 20)},
                  'list size (only duplicates)')
    b.plot()

In all of these cases the iteration_utilities.unique_everseen function is the fastest (on my computer).

This iteration_utilities.unique_everseen function can also handle unhashable values in the input (however with an O(n*n) performance instead of the O(n) performance when the values are hashable).

    >>> lst = [{1}, {1}, {2}, {1}, {3}]

    >>> list(unique_everseen(lst))
    [{1}, {2}, {3}]

[1] Disclaimer: I'm the author of that package.
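To illustrate why unhashable values cost O(n*n) while hashable ones cost O(n), here is a pure-Python sketch of one way such a fallback could work. It is not the package's actual code; the name `unique_everseen_mixed` and the set-plus-list strategy are assumptions made for this example.

```python
def unique_everseen_mixed(iterable):
    """Yield each distinct item once, in first-seen order.

    Hashable items are tracked in a set (O(1) lookups); anything that
    cannot be hashed falls back to a list (O(n) lookups).
    """
    seen_hashable = set()
    seen_unhashable = []
    for item in iterable:
        try:
            if item in seen_hashable:
                continue
            seen_hashable.add(item)       # raises TypeError if unhashable
        except TypeError:
            if item in seen_unhashable:   # linear scan, compares with ==
                continue
            seen_unhashable.append(item)
        yield item


print(list(unique_everseen_mixed([1, 1, 1, 2, 3, 2, 2, 2, 1, 3, 4])))
# [1, 2, 3, 4]
print(list(unique_everseen_mixed([{1}, {1}, {2}, {1}, {3}])))
# [{1}, {2}, {3}]
```

The linear scan over `seen_unhashable` is what makes the unhashable case quadratic in the worst case.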

User · Answer

    sequence = ['1', '2', '3', '3', '6', '4', '5', '6']
    unique = []
    [unique.append(item) for item in sequence if item not in unique]

unique → ['1', '2', '3', '6', '4', '5']

User · Answer

This will preserve order and run in O(n) time. Basically, the idea is to create a hole wherever a duplicate is found and sink it down to the bottom. It makes use of a read and a write pointer: whenever a duplicate is found, only the read pointer advances, and the write pointer stays on the duplicate entry to overwrite it.

    def deduplicate(l):
        count = {}
        (read, write) = (0, 0)
        while read < len(l):
            if l[read] in count:
                read += 1
                continue
            count[l[read]] = True
            l[write] = l[read]
            read += 1
            write += 1
        return l[0:write]
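Note that the function above returns a slice, while the original list keeps its old length with stale entries left in the tail. A minimal variation, assuming the caller wants the list object itself shortened, is to delete everything past the write pointer. The name `compact_unique` is made up for this sketch, and a set replaces the dict since only membership is needed.

```python
def compact_unique(items):
    """Remove duplicates from `items` in place, keeping first occurrences."""
    seen = set()
    write = 0
    for read in range(len(items)):
        value = items[read]
        if value in seen:
            continue              # duplicate: only the read pointer advances
        seen.add(value)
        items[write] = value      # write <= read, so nothing unread is overwritten
        write += 1
    del items[write:]             # drop the stale tail
    return items


data = [1, 2, 1, 3, 3, 2, 4]
compact_unique(data)
print(data)  # [1, 2, 3, 4]
```

This still requires hashable elements and O(n) extra memory for the set of seen values.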

User · Answer

You can reference a list comprehension as it is being built by the symbol '_[1]'. This relies on an undocumented implementation detail of old CPython 2 releases; on Python 3 the lookup fails with a KeyError. For example, the following function unique-ifies a list of elements without changing their order by referencing its list comprehension:

    def unique(my_list):
        return [x for x in my_list if x not in locals()['_[1]']]

Demo:

    l1 = [1, 2, 3, 4, 1, 2, 3, 4, 5]
    l2 = [x for x in l1 if x not in locals()['_[1]']]
    print l2

Output:

    [1, 2, 3, 4, 5]

User · Answer

    from itertools import groupby
    [key for key, _ in groupby(sortedList)]

The list doesn't even have to be sorted; the sufficient condition is that equal values are grouped together.

Edit: I assumed that "preserving order" implies that the list is actually ordered. If this is not the case, then the solution from MizardX is the right one.

Community edit: This is however the most elegant way to "compress duplicate consecutive elements into a single element".
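A short sketch of the grouping condition may help here: `groupby` starts a new group whenever the value changes, so only consecutive duplicates collapse. The helper name `collapse_runs` and the sample lists are illustrative only.

```python
from itertools import groupby


def collapse_runs(items):
    """Collapse each run of equal neighbours into a single element."""
    return [key for key, _group in groupby(items)]


grouped = [1, 1, 2, 2, 2, 3]      # equal values sit next to each other
ungrouped = [1, 2, 1, 1, 3, 2]    # equal values are scattered

print(collapse_runs(grouped))     # [1, 2, 3]
print(collapse_runs(ungrouped))   # [1, 2, 1, 3, 2]  (1 and 2 appear twice)
```

So on input that is not grouped, the result can still contain duplicates.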

User · Answer

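    x = [1, 2, 1, 3, 1, 4]

    # brute force method
    arr = []
    for i in x:
        if not i in arr:
            arr.append(i)  # append keeps first-seen order
    # arr == [1, 2, 3, 4]

    # recursive method
    tmp = []
    def remove_duplicates(j=0):
        if j < len(x):
            if not x[j] in tmp:
                tmp.append(x[j])
            i = j + 1
            remove_duplicates(i)

    remove_duplicates()
    # tmp == [1, 2, 3, 4]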

User · Answer

5x faster reduce variant, but more sophisticated:

    >>> from functools import reduce  # required on Python 3
    >>> l = [5, 6, 6, 1, 1, 2, 2, 3, 4]
    >>> reduce(lambda r, v: v in r[1] and r or (r[0].append(v) or r[1].add(v)) or r, l, ([], set()))[0]
    [5, 6, 1, 2, 3, 4]

Explanation:

    default = (list(), set())
    # use list to keep order
    # use set to make lookup faster

    def reducer(result, item):
        if item not in result[1]:
            result[0].append(item)
            result[1].add(item)
        return result

    >>> reduce(reducer, l, default)[0]
    [5, 6, 1, 2, 3, 4]

User · Answer

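MizardX's answer gives a good collection of multiple approaches.

This is what I came up with while thinking aloud:

    mylist = [x for i, x in enumerate(mylist) if x not in mylist[i+1:]]

Note that this keeps the last occurrence of each element rather than the first, and that the slice lookup makes it quadratic.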
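The direction of the slice decides which occurrence survives. The following sketch contrasts the two choices; the function names `keep_last` and `keep_first` are hypothetical.

```python
def keep_last(items):
    """Keep an element only if it does not appear again later."""
    return [x for i, x in enumerate(items) if x not in items[i + 1:]]


def keep_first(items):
    """Keep an element only if it has not appeared earlier."""
    return [x for i, x in enumerate(items) if x not in items[:i]]


data = [1, 2, 1, 3, 2]
print(keep_last(data))    # [1, 3, 2]
print(keep_first(data))   # [1, 2, 3]
```

Both variants build a new slice and scan it for every element, so they are best suited to short lists.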

User · Answer

If you routinely use pandas, and aesthetics is preferred over performance, then consider the built-in method pandas.Series.drop_duplicates:

    import pandas as pd
    import numpy as np

    uniquifier = lambda alist: pd.Series(alist).drop_duplicates().tolist()

    # from the chosen answer
    def f7(seq):
        seen = set()
        seen_add = seen.add
        return [x for x in seq if not (x in seen or seen_add(x))]

    alist = np.random.randint(low=0, high=1000, size=10000).tolist()

    print(uniquifier(alist) == f7(alist))  # True

Timing:

    In [104]: %timeit f7(alist)
    1000 loops, best of 3: 1.3 ms per loop
    In [110]: %timeit uniquifier(alist)
    100 loops, best of 3: 4.39 ms per loop

User · Answer

One-liner list comprehension:

    values_non_duplicated = [value for index, value in enumerate(values) if value not in values[:index]]

User · Answer

Edit 2020

As of CPython/PyPy 3.6 (and as a language guarantee in 3.7), plain dict is insertion-ordered, and even more efficient than the (also C-implemented) collections.OrderedDict. So the fastest solution, by far, is also the simplest:

    >>> items = [1, 2, 0, 1, 3, 2]
    >>> list(dict.fromkeys(items))
    [1, 2, 0, 3]

Like list(set(items)), this pushes all the work to the C layer (on CPython), but since dicts are insertion-ordered, dict.fromkeys doesn't lose ordering. It's slower than list(set(items)) (takes 50-100% longer typically), but much faster than any other order-preserving solution (takes about half the time of hacks involving use of sets in a listcomp).

Edit 2016

As Raymond pointed out, in Python 3.5+, where OrderedDict is implemented in C, the list comprehension approach will be slower than OrderedDict (unless you actually need the list at the end - and even then, only if the input is very short). So the best solution for 3.5+ is OrderedDict.

Important Edit 2015

As @abarnert notes, the more_itertools library (pip install more_itertools) contains a unique_everseen function that is built to solve this problem without any unreadable (not seen.add) mutations in list comprehensions. It was also the fastest solution at the time:

    >>> from more_itertools import unique_everseen
    >>> items = [1, 2, 0, 1, 3, 2]
    >>> list(unique_everseen(items))
    [1, 2, 0, 3]

Just one simple library import and no hacks. This comes from an implementation of the itertools recipe unique_everseen, which looks like:

    from itertools import filterfalse

    def unique_everseen(iterable, key=None):
        "List unique elements, preserving order. Remember all elements ever seen."
        # unique_everseen('AAAABBBCCDAABBB') --> A B C D
        # unique_everseen('ABBCcAD', str.lower) --> A B C D
        seen = set()
        seen_add = seen.add
        if key is None:
            for element in filterfalse(seen.__contains__, iterable):
                seen_add(element)
                yield element
        else:
            for element in iterable:
                k = key(element)
                if k not in seen:
                    seen_add(k)
                    yield element

In Python 2.7+, the accepted common idiom for this (which works but isn't optimized for speed; I would now use unique_everseen) uses collections.OrderedDict:

Runtime: O(N)

    >>> from collections import OrderedDict
    >>> items = [1, 2, 0, 1, 3, 2]
    >>> list(OrderedDict.fromkeys(items))
    [1, 2, 0, 3]

This looks much nicer than:

    seen = set()
    [x for x in seq if x not in seen and not seen.add(x)]

and doesn't utilize the ugly hack:

    not seen.add(x)

which relies on the fact that set.add is an in-place method that always returns None, so not None evaluates to True.

Note however that the hack solution is faster in raw speed, though it has the same runtime complexity O(N).
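The `key` parameter of the recipe has no direct counterpart in `dict.fromkeys`, but a plain dict can cover it as well. The following is a sketch under the assumption of Python 3.7+ (where dict insertion order is guaranteed); the function name `unique_by_key` is invented for this example and is not part of any library.

```python
def unique_by_key(iterable, key=None):
    """Return the first element seen for each key, in first-seen order.

    Assumes Python 3.7+, where dicts preserve insertion order.
    """
    if key is None:
        return list(dict.fromkeys(iterable))
    first_by_key = {}
    for element in iterable:
        # setdefault stores the element only if its key is new,
        # so later elements with the same key are ignored.
        first_by_key.setdefault(key(element), element)
    return list(first_by_key.values())


print(unique_by_key([1, 2, 0, 1, 3, 2]))        # [1, 2, 0, 3]
print(unique_by_key('ABBCcAD', key=str.lower))  # ['A', 'B', 'C', 'D']
```

Unlike the generator recipe, this builds the whole result eagerly, so it is not suitable for unbounded iterables.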

User · Answer

An in-place method

This method is quadratic, because we have a linear lookup into the list for every element of the list (to that we have to add the cost of rearranging the list because of the dels).

That said, it is possible to operate in place if we start from the end of the list and proceed toward the origin, removing each term that is present in the sub-list at its left.

This idea in code is simply:

    for i in range(len(l)-1, 0, -1):
        if l[i] in l[:i]: del l[i]

A simple test of the implementation:

    In [91]: from random import randint, seed
    In [92]: seed('20080808') ; l = [randint(1,6) for _ in range(12)] # Beijing Olympics
    In [93]: for i in range(len(l)-1, 0, -1):
        ...:     print(l)
        ...:     print(i, l[i], l[:i], end='')
        ...:     if l[i] in l[:i]:
        ...:         print( ': remove', l[i])
        ...:         del l[i]
        ...:     else:
        ...:         print()
        ...: print(l)
    [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5, 2]
    11 2 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]: remove 2
    [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]
    10 5 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4]: remove 5
    [6, 5, 1, 4, 6, 1, 6, 2, 2, 4]
    9 4 [6, 5, 1, 4, 6, 1, 6, 2, 2]: remove 4
    [6, 5, 1, 4, 6, 1, 6, 2, 2]
    8 2 [6, 5, 1, 4, 6, 1, 6, 2]: remove 2
    [6, 5, 1, 4, 6, 1, 6, 2]
    7 2 [6, 5, 1, 4, 6, 1, 6]
    [6, 5, 1, 4, 6, 1, 6, 2]
    6 6 [6, 5, 1, 4, 6, 1]: remove 6
    [6, 5, 1, 4, 6, 1, 2]
    5 1 [6, 5, 1, 4, 6]: remove 1
    [6, 5, 1, 4, 6, 2]
    4 6 [6, 5, 1, 4]: remove 6
    [6, 5, 1, 4, 2]
    3 4 [6, 5, 1]
    [6, 5, 1, 4, 2]
    2 1 [6, 5]
    [6, 5, 1, 4, 2]
    1 5 [6]
    [6, 5, 1, 4, 2]
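If the elements are hashable, the quadratic cost can be avoided while still mutating the same list object, by assigning to the full slice. This is a sketch assuming Python 3.7+ for ordered dicts; `dedupe_backward` restates the loop from the answer as a function, and `dedupe_slice_assign` is a hypothetical name for the alternative.

```python
def dedupe_backward(items):
    """Backward deletion as described in the answer: O(n^2), works for any ==."""
    for i in range(len(items) - 1, 0, -1):
        if items[i] in items[:i]:
            del items[i]


def dedupe_slice_assign(items):
    """Replace the list's contents in place; O(n), needs hashable elements."""
    # Slice assignment changes the existing list object, so every
    # other reference to it sees the deduplicated contents.
    items[:] = dict.fromkeys(items)


a = [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5, 2]
b = list(a)
alias = b                 # a second reference to the same list object

dedupe_backward(a)
dedupe_slice_assign(b)
print(a)                  # [6, 5, 1, 4, 2]
print(alias)              # [6, 5, 1, 4, 2]
```

The slice assignment builds a temporary dict, so it trades O(n) extra memory for the speed-up, whereas the backward loop needs no auxiliary structure beyond the temporary slices.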
