[python] Numpy: Get random set of rows from 2D array

I have a very large 2D array which looks something like this:

a=
[[a1, b1, c1],
 [a2, b2, c2],
 ...,
 [an, bn, cn]]

Using numpy, is there an easy way to get a new 2D array with, e.g., 2 random rows from the initial array a (without replacement)?

e.g.

b=
[[a4,  b4,  c4],
 [a99, b99, c99]]

Tags: python, numpy

Answers:


This is an old post, but this is what works best for me:

A[np.random.choice(A.shape[0], num_rows_2_sample, replace=False)]

Change replace=False to replace=True to get the same thing, but with replacement.
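
For context, here is a minimal runnable version of the same idea (A and num_rows_2_sample are made-up placeholders standing in for your data):

import numpy as np

A = np.arange(30).reshape(10, 3)  # stand-in for your large array
num_rows_2_sample = 2
b = A[np.random.choice(A.shape[0], num_rows_2_sample, replace=False)]
print(b.shape)  # (2, 3)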


This is a similar answer to the one Hezi Rasheff provided, but simplified so newer Python users understand what's going on (I noticed many new data science students fetch random samples in the weirdest ways because they don't know what they are doing in Python).

You can get a number of random indices from your array by using:

indices = np.random.choice(A.shape[0], amount_of_samples, replace=False)

You can then use fancy indexing with your numpy array to get the samples at those indices:

A[indices]

This will get you the specified number of random samples from your data.
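
For example, with a made-up 10x3 array, printing the indices makes the two steps explicit:

import numpy as np

A = np.arange(30).reshape(10, 3)
indices = np.random.choice(A.shape[0], 2, replace=False)
print(indices)     # e.g. [7 2] -- two distinct row numbers
print(A[indices])  # the corresponding rows, shape (2, 3)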


If you want to generate multiple random subsets of rows, for example if you're doing RANSAC:

import numpy as np

num_pop = 10        # total number of rows available
num_samples = 2     # how many subsets to draw
pop_in_sample = 3   # rows per subset
rows_to_sample = np.random.random([num_pop, 5])

# argsort of random numbers gives distinct indices within each row,
# so every subset is drawn without replacement
random_numbers = np.random.random([num_samples, num_pop])
samples = np.argsort(random_numbers, axis=1)[:, :pop_in_sample]

# will be shape [num_samples, pop_in_sample, 5]
row_subsets = rows_to_sample[samples, :]
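
Note that each subset is drawn without replacement (argsort guarantees distinct indices within a row), but different subsets can overlap each other. Continuing the snippet above, a quick sanity check:

print(row_subsets.shape)  # (2, 3, 5)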

I see permutation has been suggested. In fact, it can be done in one line:

>>> A = np.random.randint(5, size=(10,3))
>>> np.random.permutation(A)[:2]

array([[0, 3, 0],
       [3, 1, 2]])

If you just need a random sample of the rows, you can also use random.sample from the standard library. Note that random.sample requires a sequence, so on recent Python versions you have to convert the array to a list of rows first:

import random
new_array = random.sample(list(old_array), x)

Here x has to be an int defining the number of rows you want to randomly pick.
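
A complete sketch, converting back to a numpy array at the end (old_array and x are placeholders):

import random
import numpy as np

old_array = np.arange(30).reshape(10, 3)
x = 2
new_array = np.array(random.sample(list(old_array), x))
print(new_array.shape)  # (2, 3)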


An alternative is to use the choice method of the Generator class (see https://github.com/numpy/numpy/issues/10835):

import numpy as np

# generate the random array
A = np.random.randint(5, size=(10,3))

# use the choice method of the Generator class
rng = np.random.default_rng()
A_sampled = rng.choice(A, 2)

leading to sampled data such as:

array([[1, 3, 2],
       [1, 2, 1]])
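
Note that Generator.choice samples with replacement by default; to match the question's without-replacement requirement, pass replace=False:

A_sampled = rng.choice(A, 2, replace=False)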

The running times of the approaches compare as follows:

%timeit rng.choice(A, 2)
15.1 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.random.permutation(A)[:2]
4.22 µs ± 83.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit A[np.random.randint(A.shape[0], size=2), :]
10.6 µs ± 418 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

But when the array grows large, e.g. A = np.random.randint(10, size=(1000, 300)), working on the indices becomes the best approach:

%timeit A[np.random.randint(A.shape[0], size=50), :]
17.6 µs ± 657 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit rng.choice(A, 50)
22.3 µs ± 134 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit np.random.permutation(A)[:50]
143 µs ± 1.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

So the permutation method seems to be the most efficient when the array is small, while working on the indices is the better solution when the array gets big.


Another option is to create a random mask if you just want to down-sample your data by a certain factor. Say I want to down-sample to 25% of my original data set, which is currently held in the array data_arr:

import numpy as np

# generate a random boolean mask the length of the data
# use p=0.75 for False and p=0.25 for True
mask = np.random.choice([False, True], len(data_arr), p=[0.75, 0.25])

Now you can call data_arr[mask] to get back ~25% of the rows, randomly sampled.
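
A minimal end-to-end sketch (data_arr is made up here; the exact row count varies per run because each row is kept independently with probability 0.25):

import numpy as np

data_arr = np.arange(400).reshape(100, 4)
mask = np.random.choice([False, True], len(data_arr), p=[0.75, 0.25])
sampled = data_arr[mask]
print(sampled.shape)  # roughly (25, 4), varies per run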