Better way to shuffle two numpy arrays in unison

Question

I have two numpy arrays of different shapes, but with the same length (leading dimension). I want to shuffle each of them, such that corresponding elements continue to correspond -- i.e. shuffle them in unison with respect to their leading indices.

This code works, and illustrates my goals:

    def shuffle_in_unison(a, b):
        assert len(a) == len(b)
        shuffled_a = numpy.empty(a.shape, dtype=a.dtype)
        shuffled_b = numpy.empty(b.shape, dtype=b.dtype)
        permutation = numpy.random.permutation(len(a))
        for old_index, new_index in enumerate(permutation):
            shuffled_a[new_index] = a[old_index]
            shuffled_b[new_index] = b[old_index]
        return shuffled_a, shuffled_b

For example:

    >>> a = numpy.asarray([[1, 1], [2, 2], [3, 3]])
    >>> b = numpy.asarray([1, 2, 3])
    >>> shuffle_in_unison(a, b)
    (array([[2, 2],
           [1, 1],
           [3, 3]]), array([2, 1, 3]))

However, this feels clunky, inefficient, and slow, and it requires making a copy of the arrays -- I'd rather shuffle them in-place, since they'll be quite large.

Is there a better way to go about this? Faster execution and lower memory usage are my primary goals, but elegant code would be nice, too.

One other thought I had was this:

    def shuffle_in_unison_scary(a, b):
        rng_state = numpy.random.get_state()
        numpy.random.shuffle(a)
        numpy.random.set_state(rng_state)
        numpy.random.shuffle(b)

This works... but it's a little scary, as I see little guarantee it'll continue to work -- it doesn't look like the sort of thing that's guaranteed to survive across numpy versions, for example.

User · Answer

This seems like a very simple solution:

    import numpy as np

    def shuffle_in_unison(a, b):
        assert len(a) == len(b)
        c = np.arange(len(a))
        np.random.shuffle(c)
        return a[c], b[c]

    a = np.asarray([[1, 1], [2, 2], [3, 3]])
    b = np.asarray([11, 22, 33])

    shuffle_in_unison(a, b)
    Out[94]:
    (array([[3, 3],
            [2, 2],
            [1, 1]]),
     array([33, 22, 11]))

User · Answer

You can make an index array like this:

    s = np.arange(0, len(a), 1)

then shuffle it:

    np.random.shuffle(s)

Now use s to index your arrays. The same shuffled indices give the same ordering in both:

    x_data = x_data[s]
    x_label = x_label[s]

User · Answer

    from numpy.random import permutation
    from sklearn.datasets import load_iris

    iris = load_iris()
    X = iris.data    # numpy array
    y = iris.target  # numpy array

    # Data is currently unshuffled; we should shuffle
    # each X[i] with its corresponding y[i]
    perm = permutation(len(X))
    X = X[perm]
    y = y[perm]

User · Answer

Say we have two arrays, a and b:

    a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    b = np.array([[9, 1, 1], [6, 6, 6], [4, 2, 0]])

We can first obtain row indices by permuting the first dimension:

    indices = np.random.permutation(a.shape[0])
    # [1 2 0]

Then use advanced indexing. Here we are using the same indices to shuffle both arrays in unison:

    a_shuffled = a[indices[:, np.newaxis], np.arange(a.shape[1])]
    b_shuffled = b[indices[:, np.newaxis], np.arange(b.shape[1])]

This is equivalent to:

    np.take(a, indices, axis=0)
    # [[4 5 6]
    #  [7 8 9]
    #  [1 2 3]]

    np.take(b, indices, axis=0)
    # [[6 6 6]
    #  [4 2 0]
    #  [9 1 1]]
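
To make the equivalence above concrete, here is a small check with the permutation fixed to [1, 2, 0] (fixed by hand so the result is reproducible; a real shuffle would draw it from np.random.permutation). It also compares against plain a[indices], which is my addition and not part of the answer.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[9, 1, 1], [6, 6, 6], [4, 2, 0]])

# Fixed permutation for the sake of the example.
indices = np.array([1, 2, 0])

# Advanced indexing with an explicit column index, as in the answer.
a_adv = a[indices[:, np.newaxis], np.arange(a.shape[1])]

# The same row selection written two other ways.
a_take = np.take(a, indices, axis=0)
a_plain = a[indices]

b_take = np.take(b, indices, axis=0)

print(a_take)
print(b_take)
```

All three forms return new arrays, so the originals are left untouched.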

User · Answer

Just use numpy.

First merge the two input arrays (the 1D array is the labels y, the 2D array is the data x) and shuffle the merged array with NumPy's shuffle method. Finally split them and return:

    import numpy as np

    def shuffle_2d(a, b):
        rows = a.shape[0]
        if b.shape != (rows, 1):
            b = b.reshape((rows, 1))
        # note: hstack casts both arrays to a common dtype
        S = np.hstack((b, a))
        np.random.shuffle(S)
        b, a = S[:, 0], S[:, 1:]
        return a, b

    features, samples = 2, 5
    x, y = np.random.random((samples, features)), np.arange(samples)
    x, y = shuffle_2d(x, y)

User · Answer

Your "scary" solution does not appear scary to me. Calling shuffle() for two sequences of the same length results in the same number of calls to the random number generator, and these are the only "random" elements in the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator will give the same results in the second call to shuffle(), so the whole algorithm will generate the same permutation.

If you don't like this, a different solution would be to store your data in one array instead of two right from the beginning, and create two views into this single array simulating the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.

Example: Let's assume the arrays a and b look like this:

    a = numpy.array([[[  0.,   1.,   2.],
                      [  3.,   4.,   5.]],

                     [[  6.,   7.,   8.],
                      [  9.,  10.,  11.]],

                     [[ 12.,  13.,  14.],
                      [ 15.,  16.,  17.]]])

    b = numpy.array([[ 0.,  1.],
                     [ 2.,  3.],
                     [ 4.,  5.]])

We can now construct a single array containing all the data:

    c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
    # array([[  0.,   1.,   2.,   3.,   4.,   5.,   0.,   1.],
    #        [  6.,   7.,   8.,   9.,  10.,  11.,   2.,   3.],
    #        [ 12.,  13.,  14.,  15.,  16.,  17.,   4.,   5.]])

Now we create views simulating the original a and b:

    a2 = c[:, :a.size//len(a)].reshape(a.shape)
    b2 = c[:, a.size//len(a):].reshape(b.shape)

The data of a2 and b2 is shared with c. To shuffle both arrays simultaneously, use numpy.random.shuffle(c).

In production code, you would of course try to avoid creating the original a and b at all and right away create c, a2 and b2.

This solution could be adapted to the case that a and b have different dtypes.
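
As a rough check of the single-array idea, the sketch below builds c, a2 and b2 as described and confirms that the views share memory with c and stay aligned after one in-place shuffle. The test data (np.arange reshaped) and the name split are my own choices for illustration, not something from the answer.

```python
import numpy as np

# Example data: row k of a starts with 6*k, row k of b starts with 2*k.
a = np.arange(18, dtype=float).reshape(3, 2, 3)
b = np.arange(6, dtype=float).reshape(3, 2)

# One backing array: each row is the flattened a-row followed by the b-row.
c = np.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]

# Number of columns in each row of c that belong to a (hypothetical helper name).
split = a.size // len(a)

# Views into c that look like the original arrays.
a2 = c[:, :split].reshape(a.shape)
b2 = c[:, split:].reshape(b.shape)

print(np.shares_memory(a2, c), np.shares_memory(b2, c))

# Shuffling the rows of c in place reorders a2 and b2 together.
np.random.shuffle(c)

# Row k of a2 still pairs with row k of b2.
aligned = np.array_equal(a2[:, 0, 0] / 6, b2[:, 0] / 2)
print(aligned)
```

Note that a and b themselves are not shuffled here; only c and its views are, which is why the answer suggests creating c first and never materialising separate originals.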

User · Answer

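I extended Python's random.shuffle() to take a second argument:

    import random

    def shuffle_together(x, y):
        assert len(x) == len(y)

        for i in reversed(range(1, len(x))):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = int(random.random() * (i + 1))
            # tuple swaps are fine for lists and 1-D arrays; for
            # multi-dimensional NumPy arrays use x[[i, j]] = x[[j, i]]
            x[i], x[j] = x[j], x[i]
            y[i], y[j] = y[j], y[i]

That way I can be sure that the shuffling happens in-place, and the function is not too long or complicated.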

User · Answer

If you want to avoid copying arrays, then I would suggest that instead of generating a permutation list, you go through every element in the array and randomly swap it with another position in the array:

    for old_index in range(len(a)):
        new_index = numpy.random.randint(old_index + 1)
        # index with lists so the swap also works for multi-dimensional arrays
        a[[old_index, new_index]] = a[[new_index, old_index]]
        b[[old_index, new_index]] = b[[new_index, old_index]]

This implements the Knuth-Fisher-Yates shuffle algorithm.
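
One detail worth checking when swapping rows in place: for a multi-dimensional NumPy array, arr[i] is a view, so a tuple-style swap can overwrite one row before it has been read. The sketch below compares the two swap styles on a small 2-D array; the helper names are made up for this illustration.

```python
import numpy as np

def swap_rows_tuple(arr, i, j):
    # For 2-D input, arr[i] and arr[j] are views into arr. Row i is
    # overwritten first, and row j is then assigned from that same,
    # already-overwritten row.
    arr[i], arr[j] = arr[j], arr[i]

def swap_rows_fancy(arr, i, j):
    # Indexing with a list copies both rows before assigning them back.
    arr[[i, j]] = arr[[j, i]]

x = np.array([[1, 1], [2, 2], [3, 3]])
y = x.copy()

swap_rows_tuple(x, 0, 2)
swap_rows_fancy(y, 0, 2)

print(x)  # row 0 was duplicated instead of swapped
print(y)  # rows 0 and 2 were swapped
```

For 1-D arrays and plain Python lists the tuple swap behaves as expected, because indexing returns a value rather than a view.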

User · Answer

James posted a helpful sklearn solution in 2015, but it passes a random_state argument, which is not needed. In the code below, NumPy's global random state is used automatically.

    import numpy as np
    from sklearn.utils import shuffle

    X = np.array([[1., 0.], [2., 1.], [0., 0.]])
    y = np.array([0, 1, 2])
    X, y = shuffle(X, y)

User · Answer

    import numpy as np
    from sklearn.utils import shuffle

    X = np.array([[1., 0.], [2., 1.], [0., 0.]])
    y = np.array([0, 1, 2])
    X, y = shuffle(X, y, random_state=0)

To learn more, see http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html

User · Answer

Shuffle any number of arrays together, in-place, using only NumPy.

    import numpy as np

    def shuffle_arrays(arrays, set_seed=-1):
        """Shuffles arrays in-place, in the same order, along axis=0

        Parameters:
        -----------
        arrays : List of NumPy arrays.
        set_seed : Seed value if int >= 0, else seed is random.
        """
        assert all(len(arr) == len(arrays[0]) for arr in arrays)
        seed = np.random.randint(0, 2**(32 - 1) - 1) if set_seed < 0 else set_seed

        for arr in arrays:
            rstate = np.random.RandomState(seed)
            rstate.shuffle(arr)

And can be used like this:

    a = np.array([1, 2, 3, 4, 5])
    b = np.array([10, 20, 30, 40, 50])
    c = np.array([[1, 10, 11], [2, 20, 22], [3, 30, 33], [4, 40, 44], [5, 50, 55]])

    shuffle_arrays([a, b, c])

A few things to note:

- The assert ensures that all input arrays have the same length along their first dimension.
- Arrays are shuffled in-place along their first dimension; nothing is returned.
- The random seed lies within the positive int32 range.
- If a repeatable shuffle is needed, the seed value can be set.

After the shuffle, the data can be split using np.split or referenced using slices, depending on the application.

User · Answer

One way in which in-place shuffling can be done for connected lists is using a seed (it could be random) and using numpy.random.shuffle to do the shuffling.

    # Set seed to a random number if you want the shuffling to be non-deterministic.
    def shuffle(a, b, seed):
        np.random.seed(seed)
        np.random.shuffle(a)
        np.random.seed(seed)
        np.random.shuffle(b)

That's it. This will shuffle both a and b in the exact same way. This is also done in-place, which is always a plus.

EDIT: don't use np.random.seed(); use np.random.RandomState instead:

    def shuffle(a, b, seed):
        rand_state = np.random.RandomState(seed)
        rand_state.shuffle(a)
        rand_state.seed(seed)
        rand_state.shuffle(b)

When calling it, just pass in any seed to feed the random state:

    a = [1, 2, 3, 4]
    b = [11, 22, 33, 44]
    shuffle(a, b, 12345)

Output:

    >>> a
    [1, 4, 2, 3]

    >>> b
    [11, 44, 22, 33]

Edit: Fixed code to re-seed the random state.
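
For what it's worth, the re-seeding idea can be checked on arrays of different shapes, which is the situation in the question. This is only a sketch: shuffle_with_seed is a hypothetical name, and it builds a fresh RandomState per array rather than re-seeding one, which should draw the same random numbers.

```python
import numpy as np

def shuffle_with_seed(a, b, seed):
    """Shuffle a and b in place along axis 0 using the same seed."""
    assert len(a) == len(b)
    # Two generators built from one seed produce the same sequence of draws,
    # so equal-length inputs receive the same permutation.
    np.random.RandomState(seed).shuffle(a)
    np.random.RandomState(seed).shuffle(b)

# Row k of a is [k, k]; element k of b is 11 * k.
a = np.repeat(np.arange(10), 2).reshape(10, 2)
b = np.arange(10) * 11

shuffle_with_seed(a, b, 12345)

# Every row of a still lines up with its partner in b.
still_paired = np.array_equal(a[:, 0] * 11, b)
print(still_paired)
```

Nothing is returned and no shuffled copy is built, so the extra memory used is small compared with index-based approaches.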

User · Answer

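With an example, this is what I'm doing:

    # assumed import: the original does not show where shuffle comes from
    from random import shuffle

    # pair each image with its label so they move together
    combo = []
    for i in range(60000):
        combo.append((images[i], labels[i]))

    shuffle(combo)

    # unpack the shuffled pairs back into two arrays
    im = []
    lab = []
    for c in combo:
        im.append(c[0])
        lab.append(c[1])
    images = np.asarray(im)
    labels = np.asarray(lab)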

User · Answer

You can use NumPy's array indexing:

    def unison_shuffled_copies(a, b):
        assert len(a) == len(b)
        p = numpy.random.permutation(len(a))
        return a[p], b[p]

This will result in the creation of separate unison-shuffled arrays.
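
A quick way to convince yourself that indexing with one shared permutation keeps the pairs together is to run it on the arrays from the question and compare the results. This is a minimal sketch of the same function with an import and a check added.

```python
import numpy as np

def unison_shuffled_copies(a, b):
    """Return shuffled copies of a and b that share one permutation."""
    assert len(a) == len(b)
    p = np.random.permutation(len(a))
    # Indexing with an integer array returns new arrays; a and b are untouched.
    return a[p], b[p]

a = np.asarray([[1, 1], [2, 2], [3, 3]])
b = np.asarray([1, 2, 3])

shuffled_a, shuffled_b = unison_shuffled_copies(a, b)

# Row k of a was [k, k] and paired with k in b, so the first column
# of the shuffled a must equal the shuffled b.
rows_match = np.array_equal(shuffled_a[:, 0], shuffled_b)
print(rows_match)
```

Because copies are made, peak memory is roughly double that of the inputs, which is the trade-off the question was hoping to avoid.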

User · Answer

Very simple solution:

    randomize = np.arange(len(x))
    np.random.shuffle(randomize)
    x = x[randomize]
    y = y[randomize]

The two arrays x and y are now both randomly shuffled in the same way.

User · Answer

There is a well-known function that can handle this:

    from sklearn.model_selection import train_test_split
    X, _, Y, _ = train_test_split(X, Y, test_size=0.0)

Just setting test_size to 0 will avoid splitting and give you shuffled data. Though it is usually used to split train and test data, it does shuffle them too.

From the documentation:

    Split arrays or matrices into random train and test subsets

    Quick utility that wraps input validation and
    next(ShuffleSplit().split(X, y)) and application to input data into a
    single call for splitting (and optionally subsampling) data in a
    oneliner.
