How do I find the duplicates in a list and create another list with them

Question

How can I find the duplicates in a Python list and create another list of the duplicates  The list only contains integers

User · Answer

You can use iteration utilities duplicates    gt  gt  gt  from iteration utilities import duplicates   gt  gt  gt  list duplicates  1 1 2 1 2 3 4 2     1  1  2  2    or if you only want one of each duplicate this can be combined with iteration utilities unique everseen    gt  gt  gt  from iteration utilities import unique everseen   gt  gt  gt  list unique everseen duplicates  1 1 2 1 2 3 4 2      1  2    It can also handle unhashable elements  however at the cost of performance     gt  gt  gt  list duplicates   1    2    1    3    1       1    1     gt  gt  gt  list unique everseen duplicates   1    2    1    3    1        1     That s something that only a few of the other approaches here can handle   Benchmarks  I did a quick benchmark containing most  but not all  of the approaches mentioned here   The first benchmark included only a small range of list-lengths because some approaches have O n  2  behavior   In the graphs the y-axis represents the time  so a lower value means better  It s also plotted log-log so the wide range of values can be visualized better     Removing the O n  2  approaches I did another benchmark up to half a million elements in a list     As you can see the iteration utilities duplicates approach is faster than any of the other approaches and even chaining unique everseen duplicates       was faster or equally fast than the other approaches   One additional interesting thing to note here is that the pandas approaches are very slow for small lists but can easily compete for longer lists   However as these benchmarks show most of the approaches perform roughly equally  so it doesn t matter much which one is used  except for the 3 that had O n  2  runtime     from iteration utilities import duplicates  unique everseen from collections import Counter import pandas as pd import itertools  def georg counter it       return  item for item  count in Counter it  items   if count  gt  1   def georg set it       seen   set       uniq          for x in it          if x not in seen              uniq append x              seen add x   def georg set2 it       seen   set       return  x for x in it if x not in seen and not seen add x       def georg set3 it       seen          dupes           for x in it          if x not in seen              seen x    1         else              if seen x     1                  dupes append x              seen x     1  def RiteshKumar count l       return set  x for x in l if l count x   gt  1    def moooeeeep seq       seen   set       seen add   seen add       adds all elements it doesn t know yet to seen and all other to seen twice     seen twice   set  x for x in seq if x in seen or seen add x          turn the set into a list  as requested      return list  seen twice    def F1Rumors implementation c       a  b   itertools tee sorted c       next b  None      r   None     for k  g in zip a  b           if k    g  continue         if k    r              yield k             r   k  def F1Rumors c       return list F1Rumors implementation c    def Edward a       d          for elem in a          if elem in d              d elem     1         else              d elem    1     return  x for x  y in d items   if y  gt  1   def wordsmith a       return pd Series a  pd Series a  duplicated    values  def NikhilPrabhu li       li   li copy       for x in set li           li remove x       return list set li    def firelynx a       vc   pd Series a  value counts       return vc vc  gt  1  index tolist    def HenryDev myList       newList   set        for i in myList          if myList count i   gt   2              newList add i       return list newList   def yota number lst       seen set   set       duplicate set   set x for x in number lst if x in seen set or seen set add x       return seen set - duplicate set  def IgorVishnevskiy l       s set l      d        for x in l          if x in s              s remove x          else              d append x      return d  def it duplicates l       return list duplicates l    def it unique duplicates l       return list unique everseen duplicates l      Benchmark 1  from simple benchmark import benchmark import random  funcs         georg counter  georg set  georg set2  georg set3  RiteshKumar count  moooeeeep       F1Rumors  Edward  wordsmith  NikhilPrabhu  firelynx      HenryDev  yota  IgorVishnevskiy  it duplicates  it unique duplicates    args    2  i   random randint 0  2   i-1   for   in range 2  i   for i in range 2  12    b   benchmark funcs  args   list size    b plot     Benchmark 2  funcs         georg counter  georg set  georg set2  georg set3  moooeeeep       F1Rumors  Edward  wordsmith  firelynx      yota  IgorVishnevskiy  it duplicates  it unique duplicates    args    2  i   random randint 0  2   i-1   for   in range 2  i   for i in range 2  20    b   benchmark funcs  args   list size   b plot     Disclaimer  1 This is from a third-party library I have written  iteration utilities

User · Answer

the third example of the accepted answer give an erroneous answer and does not attempt to give duplicates  Here is the correct version    number lst    1  1  2  3  5        seen set   set   duplicate set   set x for x in number lst if x in seen set or seen set add x   unique set   seen set - duplicate set

User · Answer

def removeduplicates a     seen   set      for i in a      if i not in seen        seen add i    return seen   print removeduplicates  1 1 2 2

User · Answer

I guess the most effective way to find duplicates in a list is  from collections import Counter  def duplicates values       dups   Counter values  - Counter set values       return list dups keys     print duplicates  1 2 3 6 5 2     It uses Counter once on all the elements  and then on all unique elements  Subtracting the first one with the second will leave out the duplicates only

User · Answer

I came across this question whilst looking in to something related - and wonder why no-one offered a generator based solution   Solving this problem would be    gt  gt  gt  print list getDupes 9  1 2 3 2 1 5 6 5 5 5     1  2  5    I was concerned with scalability  so tested several approaches  including naive items that work well on small lists  but scale horribly as lists get larger  note- would have been better to use timeit  but this is illustrative    I included  moooeeeep for comparison  it is impressively fast  fastest if the input list is completely random  and an itertools approach that is even faster again for mostly sorted lists     Now includes pandas approach from  firelynx -- slow  but not horribly so  and simple  Note - sort tee zip approach is consistently fastest on my machine for large mostly ordered lists  moooeeeep is fastest for shuffled lists  but your mileage may vary   Advantages   very quick simple to test for  any  duplicates using the same code   Assumptions   Duplicates should be reported once only Duplicate order does not need to be preserved Duplicate might be anywhere in the list     Fastest solution  1m entries   def getDupes c              sort tee izip            a  b   itertools tee sorted c           next b  None          r   None         for k  g in itertools izip a  b               if k    g  continue             if k    r                  yield k                 r   k     Approaches tested  import itertools import time import random  def getDupes 1 c          naive        for i in xrange 0  len c            if c i  in c  i               yield c i   def getDupes 2 c          set len change        s   set       for i in c          l   len s          s add i          if len s     l              yield i  def getDupes 3 c          in dict        d          for i in c          if i in d              if d i                   yield i                 d i    False         else              d i    True  def getDupes 4 c          in set        s r   set   set       for i in c          if i not in s              s add i          elif i not in r              r add i              yield i  def getDupes 5 c          sort adjacent        c   sorted c      r   None     for i in xrange 1  len c            if c i     c i - 1               if c i     r                  yield c i                  r   c i   def getDupes 6 c          sort groupby        def multiple x           try              x next               x next               return True         except              return False     for k  g in itertools ifilter lambda x  multiple x 1    itertools groupby sorted c             yield k  def getDupes 7 c          sort zip        c   sorted c      r   None     for k  g in zip c  -1  c 1             if k    g              if k    r                  yield k                 r   k  def getDupes 8 c          sort izip        c   sorted c      r   None     for k  g in itertools izip c  -1  c 1             if k    g              if k    r                  yield k                 r   k  def getDupes 9 c          sort tee izip        a  b   itertools tee sorted c       next b  None      r   None     for k  g in itertools izip a  b           if k    g  continue         if k    r              yield k             r   k  def getDupes a l          moooeeeep        seen   set       seen add   seen add       adds all elements it doesn t know yet to seen and all other to seen twice     for x in l          if x in seen or seen add x               yield x  def getDupes b x          iter  sorted        x   sorted x      def  matches            for k g in itertools izip x  -1  x 1                 if k    g                  yield k     for k  n in itertools groupby  matches             yield k  def getDupes c a          pandas        import pandas as pd     vc   pd Series a  value counts       i   vc vc  gt  1  index     for   in i          yield    def hasDupes fn c       try          if fn c  next    return True      Found a dupe     except StopIteration          pass     return False  def getDupes fn c       return list fn c    STABLE   True if STABLE      print  Finding FIRST then ALL duplicates  single dupe of  nth  placed element in 1m element array  else      print  Finding FIRST then ALL duplicates  single dupe of  n  included in randomised 1m element array  for location in  50 250000 500000 750000 999999       for test in  getDupes 2  getDupes 3  getDupes 4  getDupes 5  getDupes 6                   getDupes 8  getDupes 9  getDupes a  getDupes b  getDupes c           print  Test  -15s  10d -    test   doc   or test   name   location           deltas              for FIRST in  True False               for i in xrange 0  5                   c   range 0 1000000                  if STABLE                      c 0    location                 else                      c append location                      random shuffle c                  start   time time                   if FIRST                      print     if location    test c  next   else                      else                      print     if  location     list test c   else                      deltas append time time  -start              print   --  0 3f     sum deltas  len deltas            print     print   The results for the  all dupes  test were consistent  finding  first  duplicate then  all  duplicates in this array   Finding FIRST then ALL duplicates  single dupe of  nth  placed element in 1m element array Test set len change      500000 -             -- 0 264              -- 0 402   Test in dict             500000 -             -- 0 163              -- 0 250   Test in set              500000 -             -- 0 163              -- 0 249   Test sort adjacent       500000 -             -- 0 159              -- 0 229   Test sort groupby        500000 -             -- 0 860              -- 1 286   Test sort izip           500000 -             -- 0 165              -- 0 229   Test sort tee izip       500000 -             -- 0 145              -- 0 206    Test moooeeeep           500000 -             -- 0 149              -- 0 232   Test iter  sorted        500000 -             -- 0 160              -- 0 221   Test pandas              500000 -             -- 0 493              -- 0 499     When the lists are shuffled first  the price of the sort becomes apparent - the efficiency drops noticeably and the  moooeeeep approach dominates  with set  amp  dict approaches being similar but lessor performers   Finding FIRST then ALL duplicates  single dupe of  n  included in randomised 1m element array Test set len change      500000 -             -- 0 321              -- 0 473   Test in dict             500000 -             -- 0 285              -- 0 360   Test in set              500000 -             -- 0 309              -- 0 365   Test sort adjacent       500000 -             -- 0 756              -- 0 823   Test sort groupby        500000 -             -- 1 459              -- 1 896   Test sort izip           500000 -             -- 0 786              -- 0 845   Test sort tee izip       500000 -             -- 0 743              -- 0 804   Test moooeeeep           500000 -             -- 0 234              -- 0 311    Test iter  sorted        500000 -             -- 0 776              -- 0 840   Test pandas              500000 -             -- 0 539              -- 0 540

User · Answer

Some other tests  Of course to do     set  x for x in l if l count x   gt  1        is too costly  It s about 500 times faster  the more long array gives better results  to use the next final method   def dups count dict l       d           for item in l          if item not in d              d item    0          d item     1      result d    key  val for key  val in d iteritems   if val  gt  1       return result d keys     Only 2 loops  no very costly l count   operations   Here is a code to compare the methods for example  The code is below  here is the output   dups count  13 368s   this is a function which uses l count   dups count dict  0 014s   this is a final best function  of the 3 functions  dups count counter  0 024s   collections Counter   The testing code   import numpy as np from time import time from collections import Counter  class TimerCounter object       def   init   self           self  time sum   0      def start self           self time   time        def stop self           self  time sum    time   - self time      def get time sum self           return self  time sum   def dups count l       return set  x for x in l if l count x   gt  1     def dups count dict l       d           for item in l          if item not in d              d item    0          d item     1      result d    key  val for key  val in d iteritems   if val  gt  1       return result d keys     def dups counter l       counter   Counter l           result d    key  val for key  val in counter iteritems   if val  gt  1       return result d keys      def gen array        np random seed 17      return list np random randint 0  5000  10000     def assert equal results  results       primary result   results 0      other results   results 1        for other result in other results          assert set primary result     set other result  and len primary result     len other result    if   name         main         dups count time   TimerCounter       dups count dict time   TimerCounter       dups count counter   TimerCounter        l   gen array        for i in range 3           dups count time start           result1   dups count l          dups count time stop            dups count dict time start           result2   dups count dict l          dups count dict time stop            dups count counter start           result3   dups counter l          dups count counter stop            assert equal results result1  result2  result3       print  dups count    3f    dups count time get time sum       print  dups count dict    3f    dups count dict time get time sum       print  dups count counter    3f    dups count counter get time sum

User · Answer

I would do this with pandas  because I use pandas a lot  import pandas as pd a    1 2 3 3 3 4 5 6 6 7  vc   pd Series a  value counts   vc vc  gt  1  index tolist     Gives   3 6    Probably isn t very efficient  but it sure is less code than a lot of the other answers  so I thought I would contribute

User · Answer

When using toolz   from toolz import frequencies  valfilter  a    1 2 2 3 4 5 4   gt  gt  gt  list valfilter lambda count  count  gt  1  frequencies a   keys     2 4

User · Answer

raw list    1 2 3 3 4 5 6 6 7 2 3 4 2 3 4 1 3 4    clean list   list set raw list   duplicated items       for item in raw list      try          clean list remove item      except ValueError          duplicated items append item    print duplicated items     3  6  2  3  4  2  3  4  1  3  4    You basically remove duplicates by converting to set  clean list   then iterate the raw list  while removing each item in the clean list for occurrence in raw list  If item is not found  the raised ValueError Exception is caught and the item is added to duplicated items list    If the index of duplicated items is needed  just enumerate the list and play around with the index   for index  item in enumerate raw list    which is faster and optimised for large lists  like thousands  of elements

User · Answer

We can use itertools groupby in order to find all the items that have dups   from itertools import groupby  myList     2  4  6  8  4  6  12    when the list is sorted  groupby groups by consecutive elements which are similar for x  y in groupby sorted myList           list y  returns all the occurences of item x     if len list y    gt  1          print x     The output will be   4 6

User · Answer

Without converting to list and probably the simplest way would be something like below  This may be useful during a interview when they ask not to use sets  a  1 2 3 3 3  dup    for each in a    if each not in dup      dup append each  print dup            else to get 2 separate lists of unique values and duplicate values  a  1 2 3 3 3  uniques    dups     for each in a    if each not in uniques      uniques append each    else      dups append each  print  Unique values are below    print uniques  print  Duplicate values are below    print dups

User · Answer

use of list count   method in the list to find out the duplicate elements of a given list  arr    dup     for i in range int input  Enter range of list            arr append int input  Enter Element in a list       for i in arr      if arr count i  gt 1 and i not in dup          dup append i  print dup

User · Answer

list2    1  2  3  4  1  2  3  lset   set     lset add item   list2 append item    for item in list2 if item not in lset  print list lset

User · Answer

one-liner  for fun  and where a single statement is required    lambda iterable  reduce lambda  uniq  dup   item   uniq  dup    item   if item in uniq else  uniq    item   dup   iterable   set    set      some iterable

User · Answer

collections Counter is new in python 2 7    Python 2 5 4  r254 67916  May 31 2010  15 03 39    GCC 4 1 2 20080704  Red Hat 4 1 2-46   on linux2 a    1 2 3 2 1 5 6 5 5 5  import collections print  x for x  y in collections Counter a  items   if y   1  Type  help    copyright    credits  or  license  for more information    File     line 1  in  AttributeError   module  object has no attribute  Counter         In an earlier version you can use a conventional dict instead   a    1 2 3 2 1 5 6 5 5 5  d      for elem in a      if elem in d          d elem     1     else          d elem    1  print  x for x  y in d items   if y  gt  1

User · Answer

this is the way I had to do it because I challenged myself not to use other methods   def dupList oldlist       if type oldlist   type  2 2            oldlist  x for x in oldlist      newList        newList newList oldlist     oldlist oldlist     forbidden        checkPoint 0     for i in range len oldlist             print  start i   i         if i in forbidden              continue         else              for j in range len oldlist                     print  start j   j                 if j in forbidden                      continue                 else                       print  after Else                      if i  j                            print  i j   i j                          print oldlist                          print newList                         if oldlist j   oldlist i                                print  oldlist i  oldlist j    oldlist i  oldlist j                              forbidden append j                               print  forbidden   forbidden                             del newList j-checkPoint                               print newList                             checkPoint checkPoint 1     return newList   so your sample works as    gt  gt  gt a    1 2 3 3 3 4 5 6 6 7   gt  gt  gt dupList a   1  2  3  4  5  6  7

User · Answer

A bit late  but maybe helpful for some  For a largish list  I found this worked for me   l  1 2 3 5 4 1 3 1  s set l  d    for x in l      if x in s          s remove x      else          d append x  d  1 3 1    Shows just and all duplicates and preserves order

User · Answer

Very simple and quick way of finding dupes with one iteration in Python is   testList     red    blue    red    green    blue    blue    testListDict       for item in testList    try      testListDict item     1   except      testListDict item    1  print testListDict   Output will be as follows    gt  gt  gt  print testListDict   blue   3   green   1   red   2    This and more in my blog http   www howtoprogramwithpython com

User · Answer

One line solution   set  i for i in list if sum  1 for a in list if a    i    gt  1

User · Answer

You don t need the count  just whether or not the item was seen before  Adapted that answer to this problem   def list duplicates seq     seen   set     seen add   seen add     adds all elements it doesn t know yet to seen and all other to seen twice   seen twice   set  x for x in seq if x in seen or seen add x        turn the set into a list  as requested    return list  seen twice    a    1 2 3 2 1 5 6 5 5 5  list duplicates a    yields  1  2  5      Just in case speed matters  here are some timings     file  test py import collections  def thg435 l       return  x for x  y in collections Counter l  items   if y  gt  1   def moooeeeep l       seen   set       seen add   seen add       adds all elements it doesn t know yet to seen and all other to seen twice     seen twice   set  x for x in l if x in seen or seen add x          turn the set into a list  as requested      return list  seen twice    def RiteshKumar l       return list set  x for x in l if l count x   gt  1     def JohnLaRooy L       seen   set       seen2   set       seen add   seen add     seen2 add   seen2 add     for item in L          if item in seen              seen2 add item          else              seen add item      return list seen2   l    1 2 3 2 1 5 6 5 5 5  100   Here are the results   well done  JohnLaRooy      python -mtimeit -s  import test   test JohnLaRooy test l   10000 loops  best of 3  74 6 usec per loop   python -mtimeit -s  import test   test moooeeeep test l   10000 loops  best of 3  91 3 usec per loop   python -mtimeit -s  import test   test thg435 test l   1000 loops  best of 3  266 usec per loop   python -mtimeit -s  import test   test RiteshKumar test l   100 loops  best of 3  8 35 msec per loop   Interestingly  besides the timings itself  also the ranking slightly changes when pypy is used  Most interestingly  the Counter-based approach benefits hugely from pypy s optimizations  whereas the method caching approach I have suggested seems to have almost no effect     pypy -mtimeit -s  import test   test JohnLaRooy test l   100000 loops  best of 3  17 8 usec per loop   pypy -mtimeit -s  import test   test thg435 test l   10000 loops  best of 3  23 usec per loop   pypy -mtimeit -s  import test   test moooeeeep test l   10000 loops  best of 3  39 3 usec per loop   Apparantly this effect is related to the  duplicatedness  of the input data  I have set l    random randrange 1000000  for i in xrange 10000   and got these results     pypy -mtimeit -s  import test   test moooeeeep test l   1000 loops  best of 3  495 usec per loop   pypy -mtimeit -s  import test   test JohnLaRooy test l   1000 loops  best of 3  499 usec per loop   pypy -mtimeit -s  import test   test thg435 test l   1000 loops  best of 3  1 68 msec per loop

User · Answer

To remove duplicates use set a   To print duplicates  something like   a    1 2 3 2 1 5 6 5 5 5   import collections print  item for item  count in collections Counter a  items   if count  gt  1        1  2  5    Note that Counter is not particularly efficient  timings  and probably overkill here  set will perform better  This code computes a list of unique elements in the source order   seen   set   uniq      for x in a      if x not in seen          uniq append x          seen add x    or  more concisely   seen   set   uniq    x for x in a if x not in seen and not seen add x         I don t recommend the latter style  because it is not obvious what not seen add x  is doing   the set add   method always returns None  hence the need for not    To compute the list of duplicated elements without libraries   seen      dupes       for x in a      if x not in seen          seen x    1     else          if seen x     1              dupes append x          seen x     1   If list elements are not hashable  you cannot use sets dicts and have to resort to a quadratic time solution  compare each with each   For example   a     1    2    3    1    5    3    no dupes    x for n  x in enumerate a  if x not in a  n   print no dupes     1    2    3    5    dupes    x for n  x in enumerate a  if x in a  n   print dupes     1    3

User · Answer

Method 1   list set  val for idx  val in enumerate input list  if val in input list idx 1        Explanation   val for idx  val in enumerate input list  if val in input list idx 1    is a list comprehension  that returns an element  if the same element is present from it s current position  in list  the index   Example   input list    42 31 42 31 3 31 31 5 6 6 6 6 6 7 42   starting with the first element in list  42  with index 0  it checks if the element 42  is present in input list 1    i e   from index 1 till end of list  Because 42 is present in input list 1    it will return 42   Then it goes to the next element 31  with index 1  and checks if element 31 is present in the input list 2    i e   from index 2 till end of list    Because 31 is present in input list 2    it will return 31   similarly it goes through all the elements in the list  and will return only the  repeated duplicate elements into a list   Then because we have duplicates  in a list  we need to pick one of each duplicate  i e  remove duplicate among duplicates  and to do so  we do call a python built-in named set    and it removes the duplicates   Then we are left with a set  but not a list  and hence to convert from a set to list  we use  typecasting  list    and that converts the set of elements to a list   Method 2   def dupes ilist       temp list        initially  empty temporary list     dupe list        initially  empty duplicate list     for each in ilist          if each in temp list    Found a Duplicate element             if not each in dupe list    Avoid duplicate elements in dupe list                 dupe list append each    Add duplicate element to dupe list         else               temp list append each    Add a new  non-duplicate  to temp list      return dupe list   Explanation  Here We create two empty lists  to start with  Then keep traversing through all the elements of the list  to see if it exists in temp list  initially empty   If it is not there in the temp list  then we add it to the temp list  using append method   If it already exists in temp list  it means  that the current element of the list is a duplicate  and hence we need to add it to dupe list using append method

User · Answer

Here s a fast generator that uses a dict to store each element as a key with a boolean value for checking if the duplicate item has already been yielded   For lists with all elements that are hashable types   def gen dupes array       unique          for value in array          if value in unique and unique value               unique value    False             yield value         else              unique value    True  array    1  2  2  3  4  1  5  2  6  6  print list gen dupes array        gt   2  1  6    For lists that might contain lists   def gen dupes array       unique          for value in array          is list   False         if type value  is list              value   tuple value              is list   True          if value in unique and unique value               unique value    False             if is list                  value   list value               yield value         else              unique value    True  array    1  2  2   1  2   3  4   1  2   5  2  6  6  print list gen dupes array        gt   2   1  2   6

User · Answer

This seems somewhat competitive despite its O n log n  complexity  see benchmarks below  a   sorted a  dupes   list set a   2    amp  set a 1  2     Sorting brings duplicates next to each other  so they re both at an even index and at an odd index  Unique values are only at an even or at an odd index  not both  So the intersection of even-index values and odd-index values is the duplicates  Benchmark results   This uses MSeifert s benchmark  but only with the solutions from the accepted answer  the georgs   the slowest solutions  the fastest solution  excluding it duplicates as it doesn t uniquify the duplicates   and mine  Otherwise it d be too crowded and the colors too similar  First line could be a sort   if we re allowed to modify the given list  that would be a bit faster  But the benchmark re-uses the same list multiple times  so modifying it would mess with the benchmark  And apparently set a   2   intersection a 1  2   wouldn t create a second set and be a bit faster  but meh  it s also a bit longer

User · Answer

Using pandas    gt  gt  gt  import pandas as pd  gt  gt  gt  a    1  2  1  3  3  3  0   gt  gt  gt  pd Series a  pd Series a  duplicated    values array  1  3  3

User · Answer

I do not see a solution that is purely using iterators  so here we go  This requires the list to be sorted  which may be the drawback here   a    1 2 3 2 1 5 6 5 5 5  a sort   set map lambda x  x 0   filter lambda x  x 0     x 1   zip a  a 1         1  2  5    You can easily check how fast this is on your machine with a million potential duplicates with this piece of code   First generate the data  import random from itertools import chain a   list chain    n    random randint 1  2  for n in range 1000000       And run the test   set map lambda x  x 0   filter lambda x  x 0     x 1   zip a  a 1         Needless to say  this solution is only good if your list is already sorted

User · Answer

gt  gt  gt  l    1 2 3 4 4 5 5 6 1   gt  gt  gt  set  x for x in l if l count x   gt  1   set  1  4  5

User · Answer

Here s a neat and concise solution -  for x in set li       li remove x   li   list set li

User · Answer

How about simply loop through each element in the list by checking the number of occurrences  then adding them to a set which will then print the duplicates  Hope this helps someone out there   myList     2  4   6  8  4  6  12   newList   set    for i in myList      if myList count i   gt   2          newList add i   print list newList       4   6

User · Answer

Try this For check duplicates  gt  gt  gt  def checkDuplicate List       duplicate        for i in List                 checking whether the item is already present in dictionary or not                increasing count if present                initializing count to 1 if not present          duplicate i  duplicate get i 0  1      return  k for k v in duplicate items   if v gt 1    gt  gt  gt  checkDuplicate  1 2 3  quot s quot  1 2 3    1  2  3

User · Answer

I am entering much much late in to this discussion  Even though  I would like to deal with this problem with one liners   Because that s the charm of Python  if we just want to get the duplicates in to a separate list  or any collection  I would suggest to do as below Say we have a duplicated list which we can call as  target       target  1 2 3 4 4 4 3 5 6 8 4 3    Now if we want to get the duplicates we can use the one liner as below       duplicates dict set  x target count x   for x in filter lambda rec   target count rec  gt 1 target      This code will put the duplicated records as key and count as value in to the dictionary  duplicates   duplicate  dictionary will look like as below        3  3  4  4   it saying 3 is repeated 3 times and 4 is 4 times   If you just want all the records with duplicates alone in a list  its again much shorter code       duplicates filter lambda rec   target count rec  gt 1 target    Output will be        3  4  4  4  3  4  3    This works perfectly in python 2 7 x   versions

User · Answer

Using Set Function eg -  arr  1 4 2 5 2 3 4 1 4 5 2 3  arr2 list set arr   print arr2       Output -  1  2  3  4  5     Remove Duplicates using array   eg -  arr  1 4 2 5 2 3 4 1 4 5 2 3  arr3    for i in arr      if i not in arr3        arr3 append i  print arr3       Output -       1  4  2  5  3     Using Lambda Function   eg -  rem duplicate func lambda arr set arr  print rem duplicate func arr        Output -       1  2  3  4  5     Remove Duplicate Values from dictionary   eg -  dict1        car    Ford   Toyota   Ford   Toyota         brand    Mustang   Ranz   Mustang   Ranz     dict2    for key value in dict1 items        dict2 key  set value  print dict2       Output -        car     Toyota    Ford     brand     Ranz    Mustang       Symmetric difference- Remove duplicate elements   eg -  set1  1 2 4 5  set2  2 1 5 7  rem dup ele set1 symmetric difference set2  print rem dup ele       Output -       4  7

User · Answer

There are a lot of answers up here  but I think this is relatively a very readable and easy to understand approach   def get duplicates sorted list       duplicates          last   sorted list 0      for x in sorted list 1            if x    last              duplicates append x          last   x     return set duplicates    Notes    If you wish to preserve duplication count  get rid of the cast to  set  at the bottom to get the full list If you prefer to use generators  replace duplicates append x  with yield x and the return statement at the bottom  you can cast to set later

User · Answer

Python 3 8 one-liner if you don t care to write your own algorithm or use libraries  l    1 2 3 2 1 5 6 5 5 5   res     x  count  for x  g in groupby sorted l   if  count    len list g     gt  1   print res   Prints item and count    1  2    2  2    5  4    groupby takes a grouping function so you can define your groupings in different ways and return additional Tuple fields as needed

User · Answer

Use the sort   function  Duplicates can be identified by looping over it and checking l1 i     l1 i 1

[python] How do I find the duplicates in a list and create another list with them?

Benchmarks

Benchmark 1

Benchmark 2

Disclaimer

Examples related to python

Examples related to list

Examples related to duplicates