How do I get indices of N maximum values in a NumPy array?

Question

NumPy proposes a way to get the index of the maximum value of an array via np.argmax. I would like a similar thing, but returning the indexes of the N maximum values. For instance, if I have an array [1, 3, 2, 4, 5], function(array, n=3) would return the indices [4, 3, 1], which correspond to the elements [5, 4, 3].

User · Answer

I think the most time-efficient way is to manually iterate through the array and keep a k-size min-heap, as other people have mentioned.

I also came up with a brute-force approach:

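top_k_index_list = []
for i in range(k):
    # argmax returns the index of the current largest element
    top_k_index_list.append(np.argmax(my_array))
    # mask that element out; my_array needs a float dtype to hold -inf
    my_array[top_k_index_list[-1]] = -float('inf')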

After using argmax to get the index of the largest element, set that element to a large negative value. The next call to argmax will then return the index of the second largest element, and so on. If you need the array intact afterwards, record the original values of these elements and restore them at the end.
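The k-size min-heap idea mentioned at the top of this answer could look roughly like the sketch below. It uses the standard-library heapq module; the function name top_k_indices_heap is made up for illustration, and a pure-Python loop like this is usually slower than NumPy's vectorized routines on large arrays.

```python
import heapq

def top_k_indices_heap(values, k):
    """Return indices of the k largest items, largest first (hypothetical helper)."""
    if k <= 0:
        return []
    heap = []  # min-heap of (value, index); heap[0] is the smallest value kept so far
    for index, value in enumerate(values):
        if len(heap) < k:
            heapq.heappush(heap, (value, index))
        elif value > heap[0][0]:
            # The new value beats the smallest of the current top k: swap it in
            heapq.heapreplace(heap, (value, index))
    # Sort the k survivors by value, descending
    return [index for value, index in sorted(heap, reverse=True)]

print(top_k_indices_heap([1, 3, 2, 4, 5], 3))  # [4, 3, 1]
```

Unlike the brute-force loop, this does not modify the input, and it makes a single pass over the data.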

User · Answer

For multidimensional arrays you can use the axis keyword in order to apply the partitioning along the expected axis.

# For a 2D array
indices = np.argpartition(arr, -N, axis=1)[:, -N:]

And for grabbing the items:

x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

But note that this won't return a sorted result. In that case you can use np.argsort() along the intended axis:

indices = np.argsort(arr, axis=1)[:, -N:]

# Result
x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

Here is an example:

In [42]: a = np.random.randint(0, 20, (10, 10))

In [44]: a
Out[44]:
array([[ 7, 11, 12,  0,  2,  3,  4, 10,  6, 10],
       [16, 16,  4,  3, 18,  5, 10,  4, 14,  9],
       [ 2,  9, 15, 12, 18,  3, 13, 11,  5, 10],
       [14,  0,  9, 11,  1,  4,  9, 19, 18, 12],
       [ 0, 10,  5, 15,  9, 18,  5,  2, 16, 19],
       [14, 19,  3, 11, 13, 11, 13, 11,  1, 14],
       [ 7, 15, 18,  6,  5, 13,  1,  7,  9, 19],
       [11, 17, 11, 16, 14,  3, 16,  1, 12, 19],
       [ 2,  4, 14,  8,  6,  9, 14,  9,  1,  5],
       [ 1, 10, 15,  0,  1,  9, 18,  2,  2, 12]])

In [45]: np.argpartition(a, np.argmin(a, axis=0))[:, 1:]  # 1 is because the first item is the minimum one
Out[45]:
array([[4, 5, 6, 8, 0, 7, 9, 1, 2],
       [2, 7, 5, 9, 6, 8, 1, 0, 4],
       [5, 8, 1, 9, 7, 3, 6, 2, 4],
       [4, 5, 2, 6, 3, 9, 0, 8, 7],
       [7, 2, 6, 4, 1, 3, 8, 5, 9],
       [2, 3, 5, 7, 6, 4, 0, 9, 1],
       [4, 3, 0, 7, 8, 5, 1, 2, 9],
       [5, 2, 0, 8, 4, 6, 3, 1, 9],
       [0, 1, 9, 4, 3, 7, 5, 2, 6],
       [0, 4, 7, 8, 5, 1, 9, 2, 6]])

In [46]: np.argpartition(a, np.argmin(a, axis=0))[:, -3:]
Out[46]:
array([[9, 1, 2],
       [1, 0, 4],
       [6, 2, 4],
       [0, 8, 7],
       [8, 5, 9],
       [0, 9, 1],
       [1, 2, 9],
       [3, 1, 9],
       [5, 2, 6],
       [9, 2, 6]])

In [89]: a[np.repeat(np.arange(x), 3), ind.ravel()].reshape(x, 3)
Out[89]:
array([[10, 11, 12],
       [16, 16, 18],
       [13, 15, 18],
       [14, 18, 19],
       [16, 18, 19],
       [14, 14, 19],
       [15, 18, 19],
       [16, 17, 19],
       [ 9, 14, 14],
       [12, 15, 18]])
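On NumPy 1.15 or later, np.take_along_axis can stand in for the repeat/ravel indexing used in this answer. The following is a minimal sketch under that assumption; the helper name top_n_per_row is hypothetical and not part of NumPy, and it expects a 2D array with 1 <= n <= number of columns.

```python
import numpy as np

def top_n_per_row(arr, n):
    """Return (values, indices) of the n largest entries in each row, largest first."""
    # Unordered indices of the n largest entries in every row
    part = np.argpartition(arr, -n, axis=1)[:, -n:]
    part_vals = np.take_along_axis(arr, part, axis=1)
    # Order only those n candidates per row, descending by value
    order = np.argsort(-part_vals, axis=1)
    indices = np.take_along_axis(part, order, axis=1)
    values = np.take_along_axis(arr, indices, axis=1)
    return values, indices

a = np.array([[7, 11, 12, 0, 2],
              [16, 16, 4, 3, 18]])
values, indices = top_n_per_row(a, 2)
print(values)  # [[12 11] [18 16]]
```

When a row contains ties (the two 16s in the second row), which of the tied positions is returned is not guaranteed.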

User · Answer

Simpler yet:

idx = (-arr).argsort()[:n]

where n is the number of maximum values.
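As a quick check of the negation trick, here is a small sketch on the array from the question, next to an equivalent form that reverses an ascending sort instead. The variable names idx_neg and idx_rev are only for illustration. Negation is not safe for unsigned integer dtypes, where the reversed form is the safer choice.

```python
import numpy as np

arr = np.array([1, 3, 2, 4, 5])
n = 3

# Negating the values turns "largest first" into an ordinary ascending sort
idx_neg = (-arr).argsort()[:n]

# Same result without negation: sort ascending, reverse, take the first n
idx_rev = arr.argsort()[::-1][:n]

print(idx_neg)  # [4 3 1]
print(idx_rev)  # [4 3 1]
```

With tied values the two forms may order the tied indices differently.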

User · Answer

This code works for a numpy 2D matrix array:

mat = np.array([[1, 3], [2, 5]])  # numpy matrix

n = 2  # n
n_largest_mat = np.sort(mat, axis=None)[-n:]  # n_largest
tf_n_largest = np.zeros((2, 2), dtype=bool)  # all false matrix
for x in n_largest_mat:
    tf_n_largest = (tf_n_largest) | (mat == x)  # true-false

n_largest_elems = mat[tf_n_largest]  # true-false indexing

This produces a true-false n_largest matrix indexing that also works to extract the n_largest elements from a matrix array.

User · Answer

bottleneck has a partial sort function, if the expense of sorting the entire array just to get the N largest values is too great. I know nothing about this module; I just googled "numpy partial sort".

User · Answer

Three Answers Compared For Coding Ease And Speed

Speed was important for my needs, so I tested three answers to this question. Code from those three answers was modified as needed for my specific case. I then compared the speed of each method.

Coding wise:

NPE's answer was the next most elegant and adequately fast for my needs.
Fred Foo's answer required the most refactoring for my needs but was the fastest. I went with this answer, because even though it took more work, it was not too bad and had significant speed advantages.
off99555's answer was the most elegant, but it is the slowest.

Complete Code for Test and Comparisons

import numpy as np
import time
import random
import sys
from operator import itemgetter
from heapq import nlargest

# Fake Data Setup
a1 = list(range(1000000))
random.shuffle(a1)
a1 = np.array(a1)

# NPE's Answer Modified A Bit For My Case
t0 = time.time()
indices = np.flip(np.argsort(a1))[:5]
results = []
for index in indices:
    results.append((index, a1[index]))
t1 = time.time()
print("NPE's Answer:")
print(results)
print(t1 - t0)
print()

# Fred Foo's Answer Modified A Bit For My Case
t0 = time.time()
indices = np.argpartition(a1, -6)[-5:]
results = []
for index in indices:
    results.append((a1[index], index))
results.sort(reverse=True)
results = [(b, a) for a, b in results]
t1 = time.time()
print("Fred Foo's Answer:")
print(results)
print(t1 - t0)
print()

# off99555's Answer - No Modification Needed For My Needs
t0 = time.time()
result = nlargest(5, enumerate(a1), itemgetter(1))
t1 = time.time()
print("off99555's Answer:")
print(result)
print(t1 - t0)

Output with Speed Reports

NPE's Answer:
[(631934, 999999), (788104, 999998), (413003, 999997), (536514, 999996), (81029, 999995)]
0.1349949836730957

Fred Foo's Answer:
[(631934, 999999), (788104, 999998), (413003, 999997), (536514, 999996), (81029, 999995)]
0.011161565780639648

off99555's Answer:
[(631934, 999999), (788104, 999998), (413003, 999997), (536514, 999996), (81029, 999995)]
0.439760684967041

User · Answer

The following is a very easy way to see the maximum elements and their positions. Here axis is the domain: axis = 0 gives the column-wise maximum and axis = 1 gives the row-wise maximum for the 2D case. For higher dimensions, pick whichever axis you need.

M = np.random.random((3, 4))
print(M)
print(M.max(axis=1), M.argmax(axis=1))
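To make the axis convention concrete, here is a small sketch with a fixed matrix (the values are made up for illustration) so the results can be checked by eye. Note that max and argmax return only the single largest entry per row or column, not the N largest.

```python
import numpy as np

M = np.array([[1, 9, 3],
              [7, 2, 8]])

# axis=0: one result per column
print(M.max(axis=0), M.argmax(axis=0))  # [7 9 8] [1 0 1]

# axis=1: one result per row
print(M.max(axis=1), M.argmax(axis=1))  # [9 8] [1 2]
```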

User · Answer

I found it most intuitive to use np.unique.

The idea is that the unique method returns the indices of the input values. Then, from the max unique value and the indices, the position of the original values can be recreated.

multi_max = [1, 1, 2, 2, 4, 0, 0, 4]
uniques, idx = np.unique(multi_max, return_inverse=True)
print(np.squeeze(np.argwhere(idx == np.argmax(uniques))))
>> [4 7]

User · Answer

Here's a more complicated way that increases n if the nth value has ties:

>>> def get_top_n_plus_ties(arr, n):
...     sorted_args = np.argsort(-arr)
...     # value of the n-th largest element (index n - 1 after sorting)
...     thresh = arr[sorted_args[n - 1]]
...     # count everything at least as large, so ties with the n-th value are kept
...     n_ = np.sum(arr >= thresh)
...     return sorted_args[:n_]
>>> get_top_n_plus_ties(np.array([2, 9, 8, 3, 0, 2, 8, 3, 1, 9, 5]), 3)
array([1, 9, 2, 6])

User · Answer

If you happen to be working with a multidimensional array then you'll need to flatten and unravel the indices:

def largest_indices(ary, n):
    """Returns the n largest indices from a numpy array."""
    flat = ary.flatten()
    indices = np.argpartition(flat, -n)[-n:]
    indices = indices[np.argsort(-flat[indices])]
    return np.unravel_index(indices, ary.shape)

For example:

>>> xs = np.sin(np.arange(9)).reshape((3, 3))
>>> xs
array([[ 0.        ,  0.84147098,  0.90929743],
       [ 0.14112001, -0.7568025 , -0.95892427],
       [-0.2794155 ,  0.6569866 ,  0.98935825]])
>>> largest_indices(xs, 3)
(array([2, 0, 0]), array([2, 2, 1]))
>>> xs[largest_indices(xs, 3)]
array([ 0.98935825,  0.90929743,  0.84147098])

User · Answer

Use:

>>> import heapq
>>> import numpy
>>> a = numpy.array([1, 3, 2, 4, 5])
>>> heapq.nlargest(3, range(len(a)), a.take)
[4, 3, 1]

For regular Python lists:

>>> a = [1, 3, 2, 4, 5]
>>> heapq.nlargest(3, range(len(a)), a.__getitem__)
[4, 3, 1]

If you use Python 2, use xrange instead of range.

Source: heapq (Heap queue algorithm)

User · Answer

This will be faster than a full sort depending on the size of your original array and the size of your selection:

>>> A = np.random.randint(0, 10, 10)
>>> A
array([5, 1, 5, 5, 2, 3, 2, 4, 1, 0])
>>> B = np.zeros(3, int)
>>> for i in range(3):
...     idx = np.argmax(A)
...     B[i] = idx; A[idx] = 0  # something smaller than A.min()
...
>>> B
array([0, 2, 3])

It, of course, involves tampering with your original array, which you could fix (if needed) by making a copy or by putting the original values back, whichever is cheaper for your use case.

User · Answer

Use:

def max_indices(arr, k):
    '''
    Returns the indices of the k first largest elements of arr
    (in descending order in values)
    '''
    assert k <= arr.size, 'k should be smaller or equal to the array size'
    arr_ = arr.astype(float)  # make a copy of arr
    max_idxs = []
    for _ in range(k):
        max_element = np.max(arr_)
        if np.isinf(max_element):
            break
        else:
            idx = np.where(arr_ == max_element)
        max_idxs.append(idx)
        arr_[idx] = -np.inf
    return max_idxs

It also works with 2D arrays. For example:

In [0]: A = np.array([[ 0.51845014,  0.72528114],
                      [ 0.88421561,  0.18798661],
                      [ 0.89832036,  0.19448609],
                      [ 0.89832036,  0.19448609]])

In [1]: max_indices(A, 8)
Out[1]:
[(array([2, 3], dtype=int64), array([0, 0], dtype=int64)),
 (array([1], dtype=int64), array([0], dtype=int64)),
 (array([0], dtype=int64), array([1], dtype=int64)),
 (array([0], dtype=int64), array([0], dtype=int64)),
 (array([2, 3], dtype=int64), array([1, 1], dtype=int64)),
 (array([1], dtype=int64), array([1], dtype=int64))]

In [2]: A[max_indices(A, 8)[0]][0]
Out[2]: array([ 0.89832036])

User · Answer

Newer NumPy versions (1.8 and up) have a function called argpartition for this. To get the indices of the four largest elements, do:

>>> a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> a
array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> ind = np.argpartition(a, -4)[-4:]
>>> ind
array([1, 5, 8, 0])
>>> a[ind]
array([4, 9, 6, 9])

Unlike argsort, this function runs in linear time in the worst case, but the returned indices are not sorted, as can be seen from the result of evaluating a[ind]. If you need that too, sort them afterwards:

>>> ind[np.argsort(a[ind])]
array([1, 8, 5, 0])

To get the top-k elements in sorted order in this way takes O(n + k log k) time.
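The two steps above (partition, then sort only the selected indices) can be wrapped in one function. This is a minimal sketch for 1D arrays; the name top_k_indices is made up for illustration, and it returns the largest value first rather than last.

```python
import numpy as np

def top_k_indices(a, k):
    """Indices of the k largest values of a 1-D array, largest first (illustrative helper)."""
    # O(n): the last k slots hold the indices of the k largest values, in no particular order
    ind = np.argpartition(a, -k)[-k:]
    # O(k log k): order just those k indices by value, then reverse to descending
    return ind[np.argsort(a[ind])[::-1]]

a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
top = top_k_indices(a, 4)
print(a[top])  # [9 9 6 4]
```

Which of several tied indices ends up in the result (here, which of the three 4s) is not guaranteed.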

User · Answer

If you don't care about the order of the K largest elements, you can use argpartition, which should perform better than a full sort through argsort.

K = 4  # We want the indices of the four largest values
a = np.array([0, 8, 0, 4, 5, 8, 8, 0, 4, 2])
np.argpartition(a, -K)[-K:]
array([4, 1, 5, 6])

Credits go to this question.

I ran a few tests and it looks like argpartition outperforms argsort as the size of the array and the value of K increase.

User · Answer

Use:

from operator import itemgetter
from heapq import nlargest
result = nlargest(N, enumerate(your_list), itemgetter(1))

Now the result list would contain N tuples (index, value) where value is maximized.
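As a concrete run of this approach, here is a short sketch using the array from the question as your_list (the sample data and N = 3 are assumptions for illustration). It also shows how to pull out just the indices.

```python
from heapq import nlargest
from operator import itemgetter

your_list = [1, 3, 2, 4, 5]
N = 3

# enumerate yields (index, value) pairs; itemgetter(1) ranks the pairs by value
result = nlargest(N, enumerate(your_list), key=itemgetter(1))
print(result)  # [(4, 5), (3, 4), (1, 3)]

# Keep only the indices, largest value first
indices = [index for index, value in result]
print(indices)  # [4, 3, 1]
```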

User · Answer

When top_k << axis_length, this is better than argsort.

import numpy as np

def get_sorted_top_k(array, top_k=1, axis=-1, reverse=False):
    if reverse:
        axis_length = array.shape[axis]
        partition_index = np.take(np.argpartition(array, kth=-top_k, axis=axis),
                                  range(axis_length - top_k, axis_length), axis)
    else:
        partition_index = np.take(np.argpartition(array, kth=top_k, axis=axis), range(0, top_k), axis)
    top_scores = np.take_along_axis(array, partition_index, axis)
    # resort partition
    sorted_index = np.argsort(top_scores, axis=axis)
    if reverse:
        sorted_index = np.flip(sorted_index, axis=axis)
    top_sorted_scores = np.take_along_axis(top_scores, sorted_index, axis)
    top_sorted_indexes = np.take_along_axis(partition_index, sorted_index, axis)
    return top_sorted_scores, top_sorted_indexes

if __name__ == "__main__":
    import time
    from sklearn.metrics.pairwise import cosine_similarity

    x = np.random.rand(10, 128)
    y = np.random.rand(1000000, 128)
    z = cosine_similarity(x, y)
    start_time = time.time()
    sorted_index_1 = get_sorted_top_k(z, top_k=3, axis=1, reverse=True)[1]
    print(time.time() - start_time)

User · Answer

np.argpartition only returns the k largest indices, performs a local sort, and is faster than np.argsort (which performs a full sort) when the array is quite large. But the returned indices are NOT in ascending/descending order, so if you want the top-k indices in strict order, np.argpartition alone won't return what you want.

Apart from doing a sort manually after np.argpartition, my solution is to use PyTorch's torch.topk. PyTorch is a tool for neural network construction that provides NumPy-like APIs with both CPU and GPU support. It's as fast as NumPy with MKL, and offers a GPU boost if you need large matrix/vector calculations. torch.topk can return the top-k indices in strict ascending or descending order.

Note that torch.topk accepts a torch tensor, and returns both the top-k values and the top-k indices as torch.Tensor objects. Similar to np, torch.topk also accepts a dimension argument (dim, the counterpart of NumPy's axis) so that you can handle multi-dimensional arrays/tensors.

User · Answer

The simplest I've been able to come up with is:

In [1]: import numpy as np

In [2]: arr = np.array([1, 3, 2, 4, 5])

In [3]: arr.argsort()[-3:][::-1]
Out[3]: array([4, 3, 1])

This involves a complete sort of the array. I wonder if numpy provides a built-in way to do a partial sort; so far I haven't been able to find one.

If this solution turns out to be too slow (especially for small n), it may be worth looking at coding something up in Cython.
