[ruby-on-rails] Remove duplicate elements from array in Ruby

I have a Ruby array which contains duplicate elements.

array = [1,2,2,1,4,4,5,6,7,8,5,6]

How can I remove all the duplicate elements from this array while retaining all unique elements without using for-loops and iteration?

This question is related to ruby-on-rails arrays ruby duplicates

The answer is


Just to provide some insight:

require 'fruity'
require 'set'

array = [1,2,2,1,4,4,5,6,7,8,5,6] * 1_000

def mithun_sasidharan(ary)
  ary.uniq
end

def jaredsmith(ary)
  ary & ary
end

def lri(ary)
  counts = Hash.new(0)
  ary.each { |v| counts[v] += 1 }
  counts.select { |v, count| count == 1 }.keys 
end

def finks(ary)
  ary.to_set
end

def santosh_mohanty(ary)
    result = ary.reject.with_index do |ele,index|
      res = (ary[index+1] ^ ele)
      res == 0
    end
end

SHORT_ARRAY = [1,1,2,2,3,1]
mithun_sasidharan(SHORT_ARRAY) # => [1, 2, 3]
jaredsmith(SHORT_ARRAY) # => [1, 2, 3]
lri(SHORT_ARRAY) # => [3]
finks(SHORT_ARRAY) # => #<Set: {1, 2, 3}>
santosh_mohanty(SHORT_ARRAY) # => [1, 2, 3, 1]

puts 'Ruby v%s' % RUBY_VERSION

compare do
  _mithun_sasidharan { mithun_sasidharan(array) }
  _jaredsmith { jaredsmith(array) }
  _lri { lri(array) }
  _finks { finks(array) }
  _santosh_mohanty { santosh_mohanty(array) }
end

Which, when run, results in:

# >> Ruby v2.7.1
# >> Running each test 16 times. Test will take about 2 seconds.
# >> _mithun_sasidharan is faster than _jaredsmith by 2x ± 0.1
# >> _jaredsmith is faster than _santosh_mohanty by 4x ± 0.1 (results differ: [1, 2, 4, 5, 6, 7, 8] vs [1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, ...
# >> _santosh_mohanty is similar to _lri (results differ: [1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, 7, 8, 5, 6, 1, 2, 1, 4, 5, 6, ...
# >> _lri is similar to _finks (results differ: [] vs #<Set: {1, 2, 4, 5, 6, 7, 8}>)

Note: these returned bad results:

  • lri(SHORT_ARRAY) # => [3]
  • finks(SHORT_ARRAY) # => #<Set: {1, 2, 3}>
  • santosh_mohanty(SHORT_ARRAY) # => [1, 2, 3, 1]

If someone was looking for a way to remove all instances of repeated values, see "How can I efficiently extract repeated elements in a Ruby array?".

a = [1, 2, 2, 3]
counts = Hash.new(0)
a.each { |v| counts[v] += 1 }
p counts.select { |v, count| count == 1 }.keys # [1, 3]

You can remove the duplicate elements with the uniq method:

array.uniq  # => [1, 2, 4, 5, 6, 7, 8]

What might also be useful to know is that uniq takes a block, so if you have a have an array of keys:

["bucket1:file1", "bucket2:file1", "bucket3:file2", "bucket4:file2"]

and you want to know what the unique files are, you can find it out with:

a.uniq { |f| f[/\d+$/] }.map { |p| p.split(':').last }

Just another alternative if anyone cares.

You can also use the to_set method of an array which converts the Array into a Set and by definition, set elements are unique.

[1,2,3,4,5,5,5,6].to_set => [1,2,3,4,5,6]

The simplest ways for me are these ones:

array = [1, 2, 2, 3]

Array#to_set

array.to_set.to_a

# [1, 2, 3]

Array#uniq

array.uniq

# [1, 2, 3]

You can return the intersection.

a = [1,1,2,3]
a & a

This will also delete duplicates.


Try using the XOR operator, without using built-in functions:

a = [3,2,3,2,3,5,6,7].sort!

result = a.reject.with_index do |ele,index|
  res = (a[index+1] ^ ele)
  res == 0
end

print result

With built-in functions:

a = [3,2,3,2,3,5,6,7]

a.uniq

Examples related to ruby-on-rails

Embed ruby within URL : Middleman Blog Titlecase all entries into a form_for text field Where do I put a single filter that filters methods in two controllers in Rails Empty brackets '[]' appearing when using .where How to integrate Dart into a Rails app Rails 2.3.4 Persisting Model on Validation Failure How to fix "Your Ruby version is 2.3.0, but your Gemfile specified 2.2.5" while server starting Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5432? Rails: Can't verify CSRF token authenticity when making a POST request Uncaught ReferenceError: React is not defined

Examples related to arrays

PHP array value passes to next row Use NSInteger as array index How do I show a message in the foreach loop? Objects are not valid as a React child. If you meant to render a collection of children, use an array instead Iterating over arrays in Python 3 Best way to "push" into C# array Sort Array of object by object field in Angular 6 Checking for duplicate strings in JavaScript array what does numpy ndarray shape do? How to round a numpy array?

Examples related to ruby

Uninitialized Constant MessagesController Embed ruby within URL : Middleman Blog Titlecase all entries into a form_for text field Ruby - ignore "exit" in code Empty brackets '[]' appearing when using .where find_spec_for_exe': can't find gem bundler (>= 0.a) (Gem::GemNotFoundException) How to update Ruby Version 2.0.0 to the latest version in Mac OSX Yosemite? How to fix "Your Ruby version is 2.3.0, but your Gemfile specified 2.2.5" while server starting Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5432? How to update Ruby with Homebrew?

Examples related to duplicates

Remove duplicates from dataframe, based on two columns A,B, keeping row with max value in another column C Remove duplicates from a dataframe in PySpark How to "select distinct" across multiple data frame columns in pandas? How to find duplicate records in PostgreSQL Drop all duplicate rows across multiple columns in Python Pandas Left Join without duplicate rows from left table Finding duplicate integers in an array and display how many times they occurred How do I use SELECT GROUP BY in DataTable.Select(Expression)? How to delete duplicate rows in SQL Server? Python copy files to a new directory and rename if file name already exists