How to find and return a duplicate value in array

Question

arr is an array of strings:

    ["hello", "world", "stack", "overflow", "hello", "again"]

What would be an easy and elegant way to check whether arr has duplicates and, if so, return one of them (no matter which)?

Examples:

    ["A", "B", "C", "B", "A"]    # => "A" or "B"
    ["A", "B", "C"]              # => nil

User · Answer

I needed to find out how many duplicates there were and what they were, so I wrote a function building off of what Naveed had posted earlier:

    def print_duplicates(array)
      puts "Array count: #{array.count}"
      map = {}
      total_dups = 0
      array.each do |v|
        map[v] = (map[v] || 0) + 1
      end

      map.each do |k, v|
        if v != 1
          puts "#{k} appears #{v} times"
          total_dups += 1
        end
      end
      puts "Total items that are duplicated: #{total_dups}"
    end

User · Answer

detect only finds one duplicate; find_all will find them all:

    a = ["A", "B", "C", "B", "A"]
    a.find_all { |e| a.count(e) > 1 }

User · Answer

    a = ["A", "B", "C", "B", "A"]
    a.each_with_object(Hash.new(0)) { |i, hash| hash[i] += 1 }.select { |_, count| count > 1 }.keys

This is an O(n) procedure.

Alternatively you can do either of the following lines. Also O(n), but only one iteration:

    a.each_with_object(Hash.new(0).merge dup: []) { |x, h| h[:dup] << x if (h[x] += 1) == 2 }[:dup]

    a.inject(Hash.new(0).merge dup: []) { |h, x| h[:dup] << x if (h[x] += 1) == 2; h }[:dup]
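
On newer Rubies the counting step above can be written more briefly. The following is only a sketch, assuming Ruby 2.7 or later (where Enumerable#tally is available); duplicates_by_tally is a made-up helper name, not something from this answer.

```ruby
# Sketch only: assumes Ruby 2.7+ for Enumerable#tally.
# duplicates_by_tally is a hypothetical helper name.
def duplicates_by_tally(arr)
  counts = arr.tally                    # element => number of occurrences
  counts.select { |_, n| n > 1 }.keys   # keep elements seen more than once
end

p duplicates_by_tally(["A", "B", "C", "B", "A"])  # prints ["A", "B"]
p duplicates_by_tally(["A", "B", "C"])            # prints []
```

Like the each_with_object version, this makes one pass to count and one pass over the counts, so it stays O(n).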

User · Answer

    a = ["A", "B", "C", "B", "A"]
    a.detect { |e| a.count(e) > 1 }

I know this isn't a very elegant answer, but I love it. It's a beautiful one-liner, and it works perfectly fine unless you need to process a huge data set.

Looking for a faster solution? Here you go:

    def find_one_using_hash_map(array)
      map = {}
      dup = nil
      array.each do |v|
        map[v] = (map[v] || 0) + 1

        if map[v] > 1
          dup = v
          break
        end
      end

      return dup
    end

It's linear, O(n), but now you have to manage multiple lines of code, write test cases, etc.

If you need an even faster solution, maybe try C instead.

And here is the gist comparing different solutions: https://gist.github.com/naveed-ahmad/8f0b926ffccf5fbd206a1cc58ce9743e

User · Answer

You can do this in a few ways, with the first option being the fastest:

    ary = ["A", "B", "C", "B", "A"]

    ary.group_by { |e| e }.select { |k, v| v.size > 1 }.map(&:first)

    ary.sort.chunk { |e| e }.select { |e, chunk| chunk.size > 1 }.map(&:first)

And an O(N^2) option (i.e. less efficient):

    ary.select { |e| ary.count(e) > 1 }.uniq

User · Answer

Ruby Array objects have a great method, select:

    select { |item| block } -> new_ary
    select -> an_enumerator

The first form is what interests you here. It allows you to select objects which pass a test.

Ruby Array objects have another method, count:

    count -> int
    count(obj) -> int
    count { |item| block } -> int

In this case, you are interested in duplicates (objects which appear more than once in the array). The appropriate test is a.count(obj) > 1.

If a = ["A", "B", "C", "B", "A"], then

    a.select { |item| a.count(item) > 1 }.uniq
    # => ["A", "B"]

You state that you only want one object. So pick one.
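
To make the "pick one" step concrete, here is a minimal sketch. first_duplicate_by_count is a hypothetical name chosen for illustration; it simply takes the first element of the select/uniq result described above.

```ruby
# Sketch only: first_duplicate_by_count is a hypothetical helper name.
# It keeps the elements that occur more than once and returns the first,
# or nil when nothing is repeated.
def first_duplicate_by_count(arr)
  arr.select { |item| arr.count(item) > 1 }.uniq.first
end

p first_duplicate_by_count(["A", "B", "C", "B", "A"])  # prints "A"
p first_duplicate_by_count(["A", "B", "C"])            # prints nil
```

Because count rescans the array for every element, this is quadratic in the array size, which is fine for short arrays.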

User · Answer

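    [1,2,3].uniq!.nil?     # => true
    [1,2,3,3].uniq!.nil?   # => false

Notice that the above is destructive: uniq! removes the duplicates from the array in place, and returns nil only when it had nothing to remove.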
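
If mutating the array is not acceptable, a non-destructive check is possible by comparing lengths. This is only a sketch; duplicates? is a hypothetical helper name, and it answers yes or no rather than returning the duplicated value.

```ruby
# Sketch only: duplicates? is a hypothetical helper name.
# uniq (without the bang) returns a new array and leaves arr untouched.
def duplicates?(arr)
  arr.uniq.length != arr.length
end

list = [1, 2, 3, 3]
puts duplicates?(list)       # prints true
puts duplicates?([1, 2, 3])  # prints false
p list                       # prints [1, 2, 3, 3]
```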

User · Answer

Here are two more ways of finding a duplicate.

Use a set

    require 'set'

    def find_a_dup_using_set(arr)
      s = Set.new
      arr.find { |e| !s.add?(e) }
    end

    find_a_dup_using_set arr
      #=> "hello"

Use select in place of find to return an array of all duplicates.

Use Array#difference

    class Array
      def difference(other)
        h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
        reject { |e| h[e] > 0 && h[e] -= 1 }
      end
    end

    def find_a_dup_using_difference(arr)
      arr.difference(arr.uniq).first
    end

    find_a_dup_using_difference arr
      #=> "hello"

Drop .first to return an array of all duplicates.

Both methods return nil if there are no duplicates.

I proposed that Array#difference be added to the Ruby core. More information is in my answer here.

Benchmark

Let's compare suggested methods. First, we need an array for testing:

    CAPS = ('AAA'..'ZZZ').to_a.first(10_000)

    def test_array(nelements, ndups)
      arr = CAPS[0, nelements-ndups]
      arr = arr.concat(arr[0,ndups]).shuffle
    end

and a method to run the benchmarks for different test arrays:

    require 'fruity'

    def benchmark(nelements, ndups)
      arr = test_array nelements, ndups
      puts "\n#{ndups} duplicates\n"
      compare(
        Naveed:    -> {arr.detect{|e| arr.count(e) > 1}},
        Sergio:    -> {(arr.inject(Hash.new(0)) {|h,e| h[e] += 1; h}.find {|k,v| v > 1} ||
                           [nil]).first },
        Ryan:      -> {(arr.group_by{|e| e}.find {|k,v| v.size > 1} ||
                           [nil]).first},
        Chris:     -> {arr.detect {|e| arr.rindex(e) != arr.index(e)} },
        Cary_set:  -> {find_a_dup_using_set(arr)},
        Cary_diff: -> {find_a_dup_using_difference(arr)}
      )
    end

I did not include @JjP's answer because only one duplicate is to be returned, and when his/her answer is modified to do that it is the same as @Naveed's earlier answer. Nor did I include @Marin's answer, which, while posted before @Naveed's answer, returned all duplicates rather than just one (a minor point, but there's no point evaluating both, as they are identical when returning just one duplicate).

I also modified other answers that returned all duplicates to return just the first one found, but that should have essentially no effect on performance, as they computed all duplicates before selecting one.

The results for each benchmark are listed from fastest to slowest.

First suppose the array contains 100 elements:

    benchmark(100, 0)
    0 duplicates
    Running each test 64 times. Test will take about 2 seconds.
    Cary_set is similar to Cary_diff
    Cary_diff is similar to Ryan
    Ryan is similar to Sergio
    Sergio is faster than Chris by 4x ± 1.0
    Chris is faster than Naveed by 2x ± 1.0

    benchmark(100, 1)
    1 duplicates
    Running each test 128 times. Test will take about 2 seconds.
    Cary_set is similar to Cary_diff
    Cary_diff is faster than Ryan by 2x ± 1.0
    Ryan is similar to Sergio
    Sergio is faster than Chris by 2x ± 1.0
    Chris is faster than Naveed by 2x ± 1.0

    benchmark(100, 10)
    10 duplicates
    Running each test 1024 times. Test will take about 3 seconds.
    Chris is faster than Naveed by 2x ± 1.0
    Naveed is faster than Cary_diff by 2x ± 1.0 (results differ: AAC vs AAF)
    Cary_diff is similar to Cary_set
    Cary_set is faster than Sergio by 3x ± 1.0 (results differ: AAF vs AAC)
    Sergio is similar to Ryan

Now consider an array with 10,000 elements:

    benchmark(10000, 0)
    0 duplicates
    Running each test once. Test will take about 4 minutes.
    Ryan is similar to Sergio
    Sergio is similar to Cary_set
    Cary_set is similar to Cary_diff
    Cary_diff is faster than Chris by 400x ± 100.0
    Chris is faster than Naveed by 3x ± 0.1

    benchmark(10000, 1)
    1 duplicates
    Running each test once. Test will take about 1 second.
    Cary_set is similar to Cary_diff
    Cary_diff is similar to Sergio
    Sergio is similar to Ryan
    Ryan is faster than Chris by 2x ± 1.0
    Chris is faster than Naveed by 2x ± 1.0

    benchmark(10000, 10)
    10 duplicates
    Running each test once. Test will take about 11 seconds.
    Cary_set is similar to Cary_diff
    Cary_diff is faster than Sergio by 3x ± 1.0 (results differ: AAE vs AAA)
    Sergio is similar to Ryan
    Ryan is faster than Chris by 20x ± 10.0
    Chris is faster than Naveed by 3x ± 1.0

    benchmark(10000, 100)
    100 duplicates
    Cary_set is similar to Cary_diff
    Cary_diff is faster than Sergio by 11x ± 10.0 (results differ: ADG vs ACL)
    Sergio is similar to Ryan
    Ryan is similar to Chris
    Chris is faster than Naveed by 3x ± 1.0

Note that find_a_dup_using_difference(arr) would be much more efficient if Array#difference were implemented in C, which would be the case if it were added to the Ruby core.

Conclusion

Many of the answers are reasonable, but using a Set is the clear best choice. It is fastest in the medium-hard cases, joint fastest in the hardest, and can be beaten only in computationally trivial cases, when your choice won't matter anyway.

The one very special case in which you might pick Chris' solution would be if you want to use the method to separately de-duplicate thousands of small arrays and expect to find a duplicate typically less than 10 items in. This will be a bit faster as it avoids the small additional overhead of creating the Set.
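
As a sketch of the "use select in place of find" variant mentioned for the Set approach: all_dups_using_set is a hypothetical name, and the trailing uniq is my own addition so that each duplicated value is reported once rather than once per extra occurrence.

```ruby
require 'set'

# Sketch only: all_dups_using_set is a hypothetical helper name.
# Set#add? returns nil when the element is already in the set, so the
# block is true exactly for the second and later occurrences.
def all_dups_using_set(arr)
  seen = Set.new
  arr.select { |e| !seen.add?(e) }.uniq
end

p all_dups_using_set(%w[hello world stack overflow hello again])  # prints ["hello"]
p all_dups_using_set(%w[A B C])                                   # prints []
```

Note that this version returns an empty array, not nil, when there are no duplicates.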

User · Answer

If you are comparing two different arrays (instead of one against itself), a very fast way is to use the intersection operator & provided by Ruby's Array class.

    # Given
    a = ['a', 'b', 'c', 'd']
    b = ['e', 'f', 'c', 'd']

    # Then this...
    a & b # => ['c', 'd']
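
One detail worth seeing in a runnable form: & returns each common element only once, in the order of the left-hand array. The sketch below wraps the operator in a hypothetical helper, common_elements, purely for illustration.

```ruby
# Sketch only: common_elements is a hypothetical wrapper around Array#&.
def common_elements(first, second)
  first & second
end

p common_elements(%w[a b c d], %w[e f c d])  # prints ["c", "d"]
p common_elements(%w[a a b], %w[a])          # prints ["a"]
p common_elements(%w[a b], %w[c d])          # prints []
```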

User · Answer

I know this thread is about Ruby specifically, but I landed here looking for how to do this within the context of Ruby on Rails with ActiveRecord and thought I would share my solution too.

    class ActiveRecordClass < ActiveRecord::Base
      # has two columns, a primary key (id) and an email_address (string)
    end

    ActiveRecordClass.group(:email_address).having("count(*) > 1").count.keys

The above returns an array of all email addresses that are duplicated in this example's database table (which in Rails would be "active_record_classes").

User · Answer

Simply find the first instance where the index of the object (counting from the left) does not equal the index of the object (counting from the right).

    arr.detect { |e| arr.rindex(e) != arr.index(e) }

If there are no duplicates, the return value will be nil.

I believe this is the fastest solution posted in the thread so far, as well, since it doesn't rely on the creation of additional objects, and #index and #rindex are implemented in C. The big-O runtime is N^2 and thus slower than Sergio's, but the wall time could be much faster due to the fact that the "slow" parts run in C.

User · Answer

each_with_object is your friend!

    input = [:bla, :blubb, :bleh, :bla, :bleh, :bla, :blubb, :brrr]

    # to get the counts of the elements in the array:
    > input.each_with_object({}) { |x, h| h[x] ||= 0; h[x] += 1 }
    => {:bla=>3, :blubb=>2, :bleh=>2, :brrr=>1}

    # to get only the counts of the non-unique elements in the array:
    > input.each_with_object({}) { |x, h| h[x] ||= 0; h[x] += 1 }.reject { |k, v| v < 2 }
    => {:bla=>3, :blubb=>2, :bleh=>2}

User · Answer

Alas, most of the answers are O(n^2).

Here is an O(n) solution:

    a = %w{the quick brown fox jumps over the lazy dog}
    h = Hash.new(0)
    a.find { |each| (h[each] += 1) == 2 } # => "the"

What is the complexity of this?

Runs in O(n) and breaks on the first match.
Uses O(n) memory, but only the minimal amount.

Now, depending on how frequent duplicates are in your array, these runtimes might actually become even better. For example, if the array of size O(n) has been sampled from a population of only k << n different elements, the complexity for both runtime and space becomes O(k). However, it is more likely that the original poster is validating input and wants to make sure there are no duplicates. In that case both runtime and memory complexity are O(n), since we expect the elements to have no repetitions for the majority of inputs.

User · Answer

Something like this will work:

    arr = ["A", "B", "C", "B", "A"]
    arr.inject(Hash.new(0)) { |h, e| h[e] += 1; h }.
        select { |k, v| v > 1 }.
        collect { |x| x.first }

That is, put all values into a hash where the key is the element of the array and the value is the number of occurrences. Then select all elements which occur more than once. Easy.

User · Answer

Try this. If you want to find the most frequently duplicated element, together with how many times it occurs, you can use:

    def get_maximum_duplicated_element_with_count(input_array)
      a = input_array
      max_duplicated_val = max_duplicated_val_count = 0
      a.each do |n|
        max_duplicated_val, max_duplicated_val_count = n, a.count(n) if a.count(n) > max_duplicated_val_count
      end
      puts "Maximum Duplicated element Is => #{max_duplicated_val}"
      puts "#{max_duplicated_val} is Duplicated #{max_duplicated_val_count} times"
    end

    get_maximum_duplicated_element_with_count([1, 4, 4, 5, 6, 6, 2, 6])

Output will be:

    Maximum Duplicated element Is => 6
    6 is Duplicated 3 times

User · Answer

This code will return a list of duplicated values. Hash keys are used as an efficient way of checking which values have already been seen. Based on whether a value has been seen, the original array ary is partitioned into 2 arrays: the first containing unique values and the second containing duplicates.

    ary = ["hello", "world", "stack", "overflow", "hello", "again"]

    hash = {}
    ary.partition { |v| hash.has_key?(v) ? false : hash[v] = 0 }.last.uniq

    => ["hello"]

You can further shorten it, albeit at the cost of slightly more complex syntax, to this form:

    hash = {}
    ary.partition { |v| !hash.has_key?(v) && hash[v] = 0 }.last.uniq

User · Answer

    a = ["A", "B", "C", "B", "A"]
    b = a.select { |e| a.count(e) > 1 }.uniq
    c = a - b
    d = b + c

Results:

    d
    => ["A", "B", "C"]

User · Answer

Let's create a duplication method that takes an array of elements as input. In the method body, we create two new array objects: one for elements already seen and one for duplicates. We then iterate through each object in the given array and, on every iteration, check whether that object already exists in the seen array.

If the object exists in the seen array, it is considered a duplicate and is pushed into the duplication array.
If the object does not exist in the seen array, it is considered unique so far. In either case the object is then pushed into the seen array.

Let's demonstrate this in code:

    def duplication given_array
      seen_objects = []
      duplication_objects = []

      given_array.each do |element|
        duplication_objects << element if seen_objects.include?(element)
        seen_objects << element
      end

      duplication_objects
    end

Now call the duplication method and output the result:

    dup_elements = duplication [1,2,3,4,4,5,6,6]
    puts dup_elements.inspect

User · Answer

Here is my take on it on a big set of data, such as a legacy dBase table, to find duplicate parts:

    # Assuming ps is an array of 20000 part numbers & we want to find duplicates (actually had to do it recently).
    # Having a result hash with part number and number of times the part is
    # duplicated is much more convenient in the real world application.
    # Takes about 6 seconds to run on my data set
    # - not too bad for an export script handling 20000 parts

    h = {};

    # or for readability

    h = {} # result hash
    ps.select { |e|
      ct = ps.count(e)
      h[e] = ct if ct > 1
    }; nil # so that the huge result of select doesn't print in the console
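
Calling count once per element rescans the whole array each time, which is where most of those seconds go. A single counting pass should produce the same kind of result hash; the following is only a sketch, with duplicate_part_counts as a hypothetical name and made-up part numbers standing in for the real data.

```ruby
# Sketch only: duplicate_part_counts is a hypothetical helper name.
# One pass builds part => count, then only repeated parts are kept.
def duplicate_part_counts(parts)
  counts = parts.each_with_object(Hash.new(0)) { |part, h| h[part] += 1 }
  counts.select { |_, ct| ct > 1 }
end

ps = %w[P100 P200 P100 P300 P200 P100]   # made-up part numbers
h = duplicate_part_counts(ps)
h.each { |part, ct| puts "#{part} appears #{ct} times" }
```

For the sample above this reports P100 three times and P200 twice; P300 is left out because it occurs only once.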

User · Answer

find_all() returns an array containing all elements of enum for which block is not false.

To get duplicate elements:

    >> arr = ["A", "B", "C", "B", "A"]
    >> arr.find_all { |x| arr.count(x) > 1 }

    => ["A", "B", "B", "A"]

Or duplicate uniq elements:

    >> arr.find_all { |x| arr.count(x) > 1 }.uniq
    => ["A", "B"]

User · Answer

    r = [1, 2, 3, 5, 1, 2, 3, 1, 2, 1]

    r.group_by(&:itself).map { |k, v| v.size > 1 ? [k] + [v.size] : nil }.compact.sort_by(&:last).map(&:first)
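
Since the one-liner comes without explanation, here is a spelled-out sketch of what it appears to do: group equal values, keep the groups with more than one member, and list the duplicated values from least to most frequent. duplicates_by_frequency is a hypothetical name, and select replaces the map/compact pair, which should be equivalent.

```ruby
# Sketch only: duplicates_by_frequency is a hypothetical helper name.
def duplicates_by_frequency(values)
  values.group_by(&:itself)
        .select { |_, group| group.size > 1 }
        .sort_by { |_, group| group.size }
        .map(&:first)
end

p duplicates_by_frequency([1, 2, 3, 5, 1, 2, 3, 1, 2, 1])  # prints [3, 2, 1]
```

Here 3 occurs twice, 2 three times and 1 four times, while 5 is dropped because it is not repeated.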
