[ruby] Get index of array element faster than O(n)

Given I have a HUGE array, and a value from it. I want to get index of the value in array. Is there any other way, rather then call Array#index to get it? The problem comes from the need of keeping really huge array and calling Array#index enormous amount of times.

After a couple of tries I found that caching indexes inside elements by storing structs with (value, index) fields instead of the value itself gives a huge step in performance (20x times win).

Still I wonder if there's a more convenient way of finding index of en element without caching (or there's a good caching technique that will boost up the performance).

This question is related to ruby arrays performance indexing

The answer is


Why not use index or rindex?

array = %w( a b c d e)
# get FIRST index of element searched
puts array.index('a')
# get LAST index of element searched
puts array.rindex('a')

index: http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-index

rindex: http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-rindex


If your array has a natural order use binary search.

Use binary search.

Binary search has O(log n) access time.

Here are the steps on how to use binary search,

  • What is the ordering of you array? For example, is it sorted by name?
  • Use bsearch to find elements or indices

Code example

# assume array is sorted by name!

array.bsearch { |each| "Jamie" <=> each.name } # returns element
(0..array.size).bsearch { |n| "Jamie" <=> array[n].name } # returns index

Still I wonder if there's a more convenient way of finding index of en element without caching (or there's a good caching technique that will boost up the performance).

You can use binary search (if your array is ordered and the values you store in the array are comparable in some way). For that to work you need to be able to tell the binary search whether it should be looking "to the left" or "to the right" of the current element. But I believe there is nothing wrong with storing the index at insertion time and then using it if you are getting the element from the same array.


Taking a combination of @sawa's answer and the comment listed there you could implement a "quick" index and rindex on the array class.

class Array
  def quick_index el
    hash = Hash[self.map.with_index.to_a]
    hash[el]
  end

  def quick_rindex el
    hash = Hash[self.reverse.map.with_index.to_a]
    array.length - 1 - hash[el]
  end
end

If it's a sorted array you could use a Binary search algorithm (O(log n)). For example, extending the Array-class with this functionality:

class Array
  def b_search(e, l = 0, u = length - 1)
    return if lower_index > upper_index

    midpoint_index = (lower_index + upper_index) / 2
    return midpoint_index if self[midpoint_index] == value

    if value < self[midpoint_index]
      b_search(value, lower_index, upper_index - 1)
    else
      b_search(value, lower_index + 1, upper_index)
    end
  end
end

Is there a good reason not to use a hash? Lookups are O(1) vs. O(n) for the array.


Other answers don't take into account the possibility of an entry listed multiple times in an array. This will return a hash where each key is a unique object in the array and each value is an array of indices that corresponds to where the object lives:

a = [1, 2, 3, 1, 2, 3, 4]
=> [1, 2, 3, 1, 2, 3, 4]

indices = a.each_with_index.inject(Hash.new { Array.new }) do |hash, (obj, i)| 
    hash[obj] += [i]
    hash
end
=> { 1 => [0, 3], 2 => [1, 4], 3 => [2, 5], 4 => [6] }

This allows for a quick search for duplicate entries:

indices.select { |k, v| v.size > 1 }
=> { 1 => [0, 3], 2 => [1, 4], 3 => [2, 5] }

Examples related to ruby

Uninitialized Constant MessagesController Embed ruby within URL : Middleman Blog Titlecase all entries into a form_for text field Ruby - ignore "exit" in code Empty brackets '[]' appearing when using .where find_spec_for_exe': can't find gem bundler (>= 0.a) (Gem::GemNotFoundException) How to update Ruby Version 2.0.0 to the latest version in Mac OSX Yosemite? How to fix "Your Ruby version is 2.3.0, but your Gemfile specified 2.2.5" while server starting Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5432? How to update Ruby with Homebrew?

Examples related to arrays

PHP array value passes to next row Use NSInteger as array index How do I show a message in the foreach loop? Objects are not valid as a React child. If you meant to render a collection of children, use an array instead Iterating over arrays in Python 3 Best way to "push" into C# array Sort Array of object by object field in Angular 6 Checking for duplicate strings in JavaScript array what does numpy ndarray shape do? How to round a numpy array?

Examples related to performance

Why is 2 * (i * i) faster than 2 * i * i in Java? What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check if a key exists in Json Object and get its value Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Most efficient way to map function over numpy array The most efficient way to remove first N elements in a list? Fastest way to get the first n elements of a List into an Array Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3? pandas loc vs. iloc vs. at vs. iat? Android Recyclerview vs ListView with Viewholder

Examples related to indexing

numpy array TypeError: only integer scalar arrays can be converted to a scalar index How to print a specific row of a pandas DataFrame? What does 'index 0 is out of bounds for axis 0 with size 0' mean? How does String.Index work in Swift Pandas KeyError: value not in index Update row values where certain condition is met in pandas Pandas split DataFrame by column value Rebuild all indexes in a Database How are iloc and loc different? pandas loc vs. iloc vs. at vs. iat?