[r] Reasons for using the set.seed function

Many times I have seen the set.seed function in R, before starting the program. I know it's basically used for the random number generation. Is there any specific need to set this?

This question is related to r random

The answer is


set.seed is a base function that it is able to generate (every time you want) together other functions (rnorm, runif, sample) the same random value.

Below an example without set.seed

> set.seed(NULL)
> rnorm(5)
[1]  1.5982677 -2.2572974  2.3057461  0.5935456  0.1143519
> rnorm(5)
[1]  0.15135371  0.20266228  0.95084266  0.09319339 -1.11049182
> set.seed(NULL)
> runif(5)
[1] 0.05697712 0.31892399 0.92547023 0.88360393 0.90015169
> runif(5)
[1] 0.09374559 0.64406494 0.65817582 0.30179009 0.19760375
> set.seed(NULL)
> sample(5)
[1] 5 4 3 1 2
> sample(5)
[1] 2 1 5 4 3

Below an example with set.seed

> set.seed(123)
> rnorm(5)
[1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774
> set.seed(123)
> rnorm(5)
[1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774
> set.seed(123)
> runif(5)
[1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673
> set.seed(123)
> runif(5)
[1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673
> set.seed(123)
> sample(5)
[1] 3 2 5 4 1
> set.seed(123)
> sample(5)
[1] 3 2 5 4 1

Just adding some addition aspects. Need for setting seed: In the academic world, if one claims that his algorithm achieves, say 98.05% performance in one simulation, others need to be able to reproduce it.

?set.seed

Going through the help file of this function, these are some interesting facts:

(1) set.seed() returns NULL, invisible

(2) "Initially, there is no seed; a new one is created from the current time and the process ID when one is required. Hence different sessions will give different simulation results, by default. However, the seed might be restored from a previous session if a previously saved workspace is restored.", this is why you would want to call set.seed() with same integer values the next time you want a same sequence of random sequence.


Fixing the seed is essential when we try to optimize a function that involves randomly generated numbers (e.g. in simulation based estimation). Loosely speaking, if we do not fix the seed, the variation due to drawing different random numbers will likely cause the optimization algorithm to fail.

Suppose that, for some reason, you want to estimate the standard deviation (sd) of a mean-zero normal distribution by simulation, given a sample. This can be achieved by running a numerical optimization around steps

  1. (Setting the seed)
  2. Given a value for sd, generate normally distributed data
  3. Evaluate the likelihood of your data given the simulated distributions

The following functions do this, once without step 1., once including it:

# without fixing the seed
simllh <- function(sd, y, Ns){
  simdist <- density(rnorm(Ns, mean = 0, sd = sd))
  llh <- sapply(y, function(x){ simdist$y[which.min((x - simdist$x)^2)] })
  return(-sum(log(llh)))
}
# same function with fixed seed
simllh.fix.seed <- function(sd,y,Ns){
  set.seed(48)
  simdist <- density(rnorm(Ns,mean=0,sd=sd))
  llh <- sapply(y,function(x){simdist$y[which.min((x-simdist$x)^2)]})
  return(-sum(log(llh)))
}

We can check the relative performance of the two functions in discovering the true parameter value with a short Monte Carlo study:

N <- 20; sd <- 2 # features of simulated data
est1 <- rep(NA,1000); est2 <- rep(NA,1000) # initialize the estimate stores
for (i in 1:1000) {
  as.numeric(Sys.time())-> t; set.seed((t - floor(t)) * 1e8 -> seed) # set the seed to random seed
  y <- rnorm(N, sd = sd) # generate the data
  est1[i] <- optim(1, simllh, y = y, Ns = 1000, lower = 0.01)$par
  est2[i] <- optim(1, simllh.fix.seed, y = y, Ns = 1000, lower = 0.01)$par
}
hist(est1)
hist(est2)

The resulting distributions of the parameter estimates are:

Histogram of parameter estimates without fixing the seed Histogram of parameter estimates fixing the seed

When we fix the seed, the numerical search ends up close to the true parameter value of 2 far more often.


basically set.seed() function will help to reuse the same set of random variables , which we may need in future to again evaluate particular task again with same random varibales

we just need to declare it before using any random numbers generating function.


You have to set seed every time you want to get a reproducible random result.

set.seed(1)
rnorm(4)
set.seed(1)
rnorm(4)