scikit-learn random state in splitting dataset

Question

Can anyone tell me why we set random_state to zero when splitting the train and test sets?

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

I have seen situations like this where random_state is set to 1:

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

What is the consequence of this random state in cross-validation as well?

User · Answer

If you don't specify random_state in the code, then every time you execute your code a new random value is generated, and the train and test datasets will have different values each time.

However, if you use a particular value for random_state (random_state=1 or any other value), the result will be the same every time, i.e. the same values in the train and test datasets.

User · Answer

The random_state parameter is used for reproducibility of the initial shuffling of the training dataset and of the reshuffling after each epoch.

User · Answer

It doesn't matter whether random_state is 0, 1 or any other integer. What matters is that it is set to the same value if you want to validate your processing over multiple runs of the code. By the way, I have seen random_state=42 used in many official scikit-learn examples as well as elsewhere.

random_state, as the name suggests, is used for initializing the internal random number generator, which decides the splitting of data into train and test indices in your case. In the documentation, it is stated that:

- If random_state is None or np.random, then a randomly-initialized RandomState object is returned.
- If random_state is an integer, then it is used to seed a new RandomState object.
- If random_state is a RandomState object, then it is passed through.

This is to check and validate the data when running the code multiple times. Setting random_state to a fixed value guarantees that the same sequence of random numbers is generated each time you run the code. Unless there is some other randomness present in the process, the results produced will be the same every time. This helps in verifying the output.
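The three cases listed in the documentation can be checked directly. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; it uses sklearn.utils.check_random_state, the helper scikit-learn provides for turning a random_state argument into a generator. The variable names are illustrative only.

```python
import numpy as np
from sklearn.utils import check_random_state

# None: scikit-learn falls back to NumPy's global RandomState (not reseeded here).
rng_none = check_random_state(None)

# int: seeds a brand-new RandomState, so the same draws come out every time.
rng_a = check_random_state(0)
rng_b = check_random_state(0)
draws_a = rng_a.permutation(10)
draws_b = rng_b.permutation(10)
same_sequence = bool(np.array_equal(draws_a, draws_b))

# RandomState instance: passed through unchanged (the same object comes back).
existing = np.random.RandomState(42)
passed_through = check_random_state(existing) is existing

print(same_sequence, passed_through)
```

Two generators seeded with the same integer produce the same permutation, which is why a fixed random_state yields the same train/test indices on every run.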

User · Answer

When random_state is set to an integer, train_test_split will return the same results for each execution.

When random_state is set to None, train_test_split will return different results for each execution.

See the example below:

    from sklearn.model_selection import train_test_split

    X_data = list(range(10))
    y_data = list(range(10))

    # Fixed seed: the same split on every iteration.
    for i in range(5):
        X_train, X_test, y_train, y_test = train_test_split(
            X_data, y_data, test_size=0.3, random_state=0)  # zero or any other integer
        print(y_test)

    print("*" * 30)

    # No seed: a different split on every iteration.
    for i in range(5):
        X_train, X_test, y_train, y_test = train_test_split(
            X_data, y_data, test_size=0.3, random_state=None)
        print(y_test)

Output:

    [2, 8, 4]
    [2, 8, 4]
    [2, 8, 4]
    [2, 8, 4]
    [2, 8, 4]
    ******************************
    [4, 7, 6]
    [4, 3, 7]
    [8, 1, 4]
    [9, 5, 8]
    [6, 4, 5]

User · Answer

random_state is None by default, which means that every time you run your program you will get a different output, because the split between train and test varies.

With random_state set to any int value, every time you run your program you will get the same output, because the split between train and test does not vary.

User · Answer

If you don't specify random_state in your code, then every time you run (execute) your code a new random value is generated, and the train and test datasets will have different values each time.

However, if a fixed value is assigned, like random_state=0 or 1 or 42, then no matter how many times you execute your code the result will be the same, i.e. the same values in the train and test datasets.

User · Answer

random_state splits the data randomly, but with a twist: the order of the data will be the same for a particular value of random_state. You need to understand that it is not a boolean value. If you pass any integer, starting from 0, as random_state, it fixes a permanent order. For example, the order you get with random_state=0 stays the same. If you then run with random_state=5 and come back to random_state=0, you'll get the same order again. The same holds for every integer. However, random_state=None splits randomly each time.
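The point about switching seeds and coming back can be tried out with a short sketch, assuming scikit-learn is installed. The helper name held_out and the toy data are hypothetical, chosen only for illustration.

```python
from sklearn.model_selection import train_test_split

data = list(range(10))  # toy data, purely illustrative

def held_out(seed):
    """Return the test portion that train_test_split produces for a given seed."""
    _, test = train_test_split(data, test_size=0.3, random_state=seed)
    return test

first_zero = held_out(0)
five = held_out(5)
second_zero = held_out(0)  # back to seed 0 after using seed 5

print(first_zero == second_zero)  # the seed-0 split is reproduced
```

Using a different seed in between has no effect on what seed 0 produces, because each call builds its own generator from the integer it is given.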

User · Answer

random_state is an integer value that selects one particular random combination of train and test. When you set the test size to 1/4, a set of permutations and combinations of train and test is generated, and each combination has one state. Suppose you have a dataset [1, 2, 3, 4] (the mapping shown is illustrative):

Train        Test    State
[1, 2, 3]    [4]     0
[1, 3, 4]    [2]     1
[4, 2, 3]    [1]     2
[2, 4, 1]    [3]     3

We need it because, during parameter tuning of a model, the same state is used again and again, so that there is no interference with the accuracy. For random forest there is a similar story, but in a different way, with respect to the variables.
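The same idea carries over to cross-validation, which the question also asks about. Below is a minimal sketch, assuming scikit-learn and NumPy are installed, that uses KFold with shuffle=True; the helper fold_test_indices and the toy array are hypothetical. With a fixed random_state, every run sees the same folds, so scores from different parameter settings are compared on identical splits.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(6, 2)  # six toy samples

def fold_test_indices(seed):
    """Collect the test indices of every fold for a shuffled 3-fold split."""
    cv = KFold(n_splits=3, shuffle=True, random_state=seed)
    return [test_idx.tolist() for _, test_idx in cv.split(X)]

run_1 = fold_test_indices(0)
run_2 = fold_test_indices(0)

print(run_1 == run_2)  # same seed, same folds
```

Note that random_state only matters for KFold when shuffle=True; without shuffling the folds are already deterministic.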

User · Answer

Across multiple executions of our model, random_state makes sure that the data values are the same for the training and testing datasets. It fixes the order of the data for train_test_split.
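To see why a seed fixes the order, here is a conceptual sketch using only the Python standard library. It is not scikit-learn's implementation; toy_split is a hypothetical helper that shuffles indices with a seeded generator and then slices them, which is the general idea behind a seeded split.

```python
import random

def toy_split(items, test_size, seed=None):
    """Hypothetical helper: shuffle indices with a seeded generator, then slice."""
    rng = random.Random(seed)      # seed=None lets Python pick its own seed
    indices = list(range(len(items)))
    rng.shuffle(indices)           # the seed determines this order
    n_test = int(round(test_size * len(items)))
    test = [items[i] for i in indices[:n_test]]
    train = [items[i] for i in indices[n_test:]]
    return train, test

data = list("abcdefghij")
split_a = toy_split(data, 0.3, seed=0)
split_b = toy_split(data, 0.3, seed=0)

print(split_a == split_b)  # identical seed, identical train/test contents
```

The shuffle is the only random step, so fixing the generator's seed fixes the whole split.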
