Reasons for using the set.seed function

Question

Many times I have seen the set.seed function in R before starting the program. I know it's basically used for random number generation. Is there any specific need to set this?

User · Answer

set.seed is a base function that, used together with other functions (rnorm, runif, sample), lets you generate the same random values every time you want.

Below is an example without a fixed seed:

    > set.seed(NULL)
    > rnorm(5)
    [1]  1.5982677 -2.2572974  2.3057461  0.5935456  0.1143519
    > rnorm(5)
    [1]  0.15135371  0.20266228  0.95084266  0.09319339 -1.11049182
    > set.seed(NULL)
    > runif(5)
    [1] 0.05697712 0.31892399 0.92547023 0.88360393 0.90015169
    > runif(5)
    [1] 0.09374559 0.64406494 0.65817582 0.30179009 0.19760375
    > set.seed(NULL)
    > sample(5)
    [1] 5 4 3 1 2
    > sample(5)
    [1] 2 1 5 4 3

Below is an example with set.seed:

    > set.seed(123)
    > rnorm(5)
    [1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774
    > set.seed(123)
    > rnorm(5)
    [1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774
    > set.seed(123)
    > runif(5)
    [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673
    > set.seed(123)
    > runif(5)
    [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673
    > set.seed(123)
    > sample(5)
    [1] 3 2 5 4 1
    > set.seed(123)
    > sample(5)
    [1] 3 2 5 4 1

User · Answer

Fixing the seed is essential when we try to optimize a function that involves randomly generated numbers (e.g. in simulation-based estimation). Loosely speaking, if we do not fix the seed, the variation due to drawing different random numbers will likely cause the optimization algorithm to fail.

Suppose that, for some reason, you want to estimate the standard deviation (sd) of a mean-zero normal distribution by simulation, given a sample. This can be achieved by running a numerical optimization around these steps:

1. (Setting the seed)
2. Given a value for sd, generate normally distributed data
3. Evaluate the likelihood of your data given the simulated distributions

The following functions do this, once without step 1 and once including it:

    # without fixing the seed
    simllh <- function(sd, y, Ns) {
      simdist <- density(rnorm(Ns, mean = 0, sd = sd))
      llh <- sapply(y, function(x) simdist$y[which.min((x - simdist$x)^2)])
      return(-sum(log(llh)))
    }

    # same function with fixed seed
    simllh.fix.seed <- function(sd, y, Ns) {
      set.seed(48)
      simdist <- density(rnorm(Ns, mean = 0, sd = sd))
      llh <- sapply(y, function(x) simdist$y[which.min((x - simdist$x)^2)])
      return(-sum(log(llh)))
    }

We can check the relative performance of the two functions in discovering the true parameter value with a short Monte Carlo study:

    N <- 20; sd <- 2                            # features of simulated data
    est1 <- rep(NA, 1000); est2 <- rep(NA, 1000) # initialize the estimate stores
    for (i in 1:1000) {
      as.numeric(Sys.time()) -> t
      set.seed((t - floor(t)) * 1e8 -> seed)    # set the seed to random seed
      y <- rnorm(N, sd = sd)                    # generate the data
      est1[i] <- optim(1, simllh, y = y, Ns = 1000, lower = 0.01)$par
      est2[i] <- optim(1, simllh.fix.seed, y = y, Ns = 1000, lower = 0.01)$par
    }
    hist(est1)
    hist(est2)

The resulting distributions of the parameter estimates are:

[Figures: histogram of est1 (seed not fixed) and histogram of est2 (seed fixed)]

When we fix the seed, the numerical search ends up close to the true parameter value of 2 far more often.
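To see why the optimizer struggles, it may help to evaluate the objective twice at the same parameter value. The sketch below is my own simplification of the functions above (the name obj, the fix_seed argument and the seed used for the sample are assumptions for illustration): without a fixed seed the function returns a different value on every call, so the optimizer is chasing a moving target.

```r
# Illustrative sample; the seed 1 is arbitrary
set.seed(1)
y <- rnorm(20, sd = 2)

# Simulated negative log-likelihood, optionally with a fixed seed.
# Note: calling set.seed() inside a function resets the global RNG state.
obj <- function(sd, fix_seed) {
  if (fix_seed) set.seed(48)
  sim <- density(rnorm(1000, mean = 0, sd = sd))
  dens <- sapply(y, function(x) sim$y[which.min((x - sim$x)^2)])
  -sum(log(dens))
}

noisy <- c(obj(2, FALSE), obj(2, FALSE))  # two calls, same sd, different draws
fixed <- c(obj(2, TRUE),  obj(2, TRUE))   # two calls, same sd, same draws

print(noisy)  # the two values differ
print(fixed)  # the two values are identical
```

With the seed fixed, the objective becomes a deterministic function of sd, which is what a numerical optimizer expects.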

User · Answer

Basically, the set.seed() function helps to reuse the same set of random values, which we may need in the future to evaluate a particular task again with the same random values. We just need to call it before using any random-number-generating function.

User · Answer

Just adding some additional aspects.

Need for setting the seed: in the academic world, if one claims that his algorithm achieves, say, 98.05% performance in one simulation, others need to be able to reproduce it.

    ?set.seed

Going through the help file of this function, these are some interesting facts:

(1) set.seed() returns NULL, invisibly.

(2) "Initially, there is no seed; a new one is created from the current time and the process ID when one is required. Hence different sessions will give different simulation results, by default. However, the seed might be restored from a previous session if a previously saved workspace is restored." This is why you would want to call set.seed() with the same integer value the next time you want the same sequence of random numbers.
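A quick way to check these facts in a session is sketched below. It uses only base R (withVisible, exists); the seed value 123 and the variable names are arbitrary choices of mine.

```r
# Capture both the return value and its visibility flag
res <- withVisible(set.seed(123))
is.null(res$value)   # the return value is NULL
res$visible          # visibility flag of the call

# Once a seed has been set, the RNG state lives in .Random.seed
# in the global environment
exists(".Random.seed", envir = globalenv(), inherits = FALSE)

# Same integer seed, same sequence
set.seed(123); a <- runif(3)
set.seed(123); b <- runif(3)
identical(a, b)
```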

User · Answer

You have to set the seed every time you want to get a reproducible random result.

    set.seed(1)
    rnorm(4)
    set.seed(1)
    rnorm(4)
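The "every time" part matters because a second call without re-seeding simply continues the stream where the first call stopped. A small sketch, assuming the default RNG settings (the variable names are mine):

```r
set.seed(1)
first  <- rnorm(4)
second <- rnorm(4)   # no re-seeding: continues the stream, so not equal to `first`

set.seed(1)
both <- rnorm(8)     # same stream drawn in one go

identical(both, c(first, second))  # the 8 draws are the two blocks of 4

set.seed(1)
identical(rnorm(4), first)         # re-seeding restarts the stream
```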

User · Answer

The need is the possible desire for reproducible results, which may for example come from trying to debug your program, or of course from trying to redo what it does.

These two results we will "never" reproduce, as I just asked for something "random":

    R> sample(LETTERS, 5)
    [1] "K" "N" "R" "Z" "G"
    R> sample(LETTERS, 5)
    [1] "L" "P" "J" "E" "D"

These two, however, are identical because I set the seed:

    R> set.seed(42); sample(LETTERS, 5)
    [1] "X" "Z" "G" "T" "O"
    R> set.seed(42); sample(LETTERS, 5)
    [1] "X" "Z" "G" "T" "O"
    R>

There is a vast literature on all that; Wikipedia is a good start. In essence, these RNGs are called Pseudo Random Number Generators because they are in fact fully algorithmic: given the same seed, you get the same sequence. And that is a feature and not a bug.
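To make the "fully algorithmic" point concrete, here is a toy linear congruential generator. This is not what R uses internally (R's default generator is the Mersenne Twister); toy_lcg is a hypothetical helper written only for illustration, using the classic multiplier 16807 and modulus 2^31 - 1.

```r
# Toy pseudo-random generator: each state is computed from the previous one,
# so the whole sequence is determined by the seed.
toy_lcg <- function(seed, n) {
  a <- 16807        # multiplier
  m <- 2147483647   # modulus, 2^31 - 1
  state <- seed
  out <- numeric(n)
  for (i in seq_len(n)) {
    state <- (a * state) %% m   # next state depends only on the current state
    out[i] <- state / m         # scale to the interval (0, 1)
  }
  out
}

identical(toy_lcg(42, 5), toy_lcg(42, 5))  # same seed, same sequence
identical(toy_lcg(42, 5), toy_lcg(43, 5))  # different seed, different sequence
```

Nothing in the function is random at all; the seed is the only input, which is exactly why set.seed makes results reproducible.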
