[python] What does numpy.random.seed(0) do?

What does np.random.seed do in the below code from a Scikit-Learn tutorial? I'm not very familiar with NumPy's random state generator stuff, so I'd really appreciate a layman's terms explanation of this.

np.random.seed(0)
indices = np.random.permutation(len(iris_X))

This question is related to python numpy

The answer is


Imagine you are showing someone how to code something with a bunch of "random" numbers. By using numpy seed they can use the same seed number and get the same set of "random" numbers.

So it's not exactly random because an algorithm spits out the numbers but it looks like a randomly generated bunch.


I hope to give a really short answer:

seed make (the next series) random numbers predictable. You can think every time after you call seed, it pre-defines series numbers and numpy random keeps the iterator of it, then every time you get a random number it just gonna call get next.

e.g.:

np.random.seed(2)
np.random.randn(2) # array([-0.41675785, -0.05626683])
np.random.randn(1) # array([-1.24528809])

np.random.seed(2)
np.random.randn(1) # array([-0.41675785])
np.random.randn(2) # array([-0.05626683, -1.24528809])

You can notice when I set the same seed, no matter how many random number you request from numpy each time, it always gives the same series of numbers, in this case which is array([-0.41675785, -0.05626683, -1.24528809]).


A random seed specifies the start point when a computer generates a random number sequence.

For example, let’s say you wanted to generate a random number in Excel (Note: Excel sets a limit of 9999 for the seed). If you enter a number into the Random Seed box during the process, you’ll be able to use the same set of random numbers again. If you typed “77” into the box, and typed “77” the next time you run the random number generator, Excel will display that same set of random numbers. If you type “99”, you’ll get an entirely different set of numbers. But if you revert back to a seed of 77, then you’ll get the same set of random numbers you started with.

For example, “take a number x, add 900 +x, then subtract 52.” In order for the process to start, you have to specify a starting number, x (the seed). Let’s take the starting number 77:

Add 900 + 77 = 977 Subtract 52 = 925 Following the same algorithm, the second “random” number would be:

900 + 925 = 1825 Subtract 52 = 1773 This simple example follows a pattern, but the algorithms behind computer number generation are much more complicated


There is a nice explanation in Numpy docs: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.RandomState.html it refers to Mersenne Twister pseudo-random number generator. More details on the algorithm here: https://en.wikipedia.org/wiki/Mersenne_Twister


If you set the np.random.seed(a_fixed_number) every time you call the numpy's other random function, the result will be the same:

>>> import numpy as np
>>> np.random.seed(0) 
>>> perm = np.random.permutation(10) 
>>> print perm 
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0) 
>>> print np.random.permutation(10) 
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0) 
>>> print np.random.permutation(10) 
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0) 
>>> print np.random.permutation(10) 
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0) 
>>> print np.random.rand(4) 
[0.5488135  0.71518937 0.60276338 0.54488318]
>>> np.random.seed(0) 
>>> print np.random.rand(4) 
[0.5488135  0.71518937 0.60276338 0.54488318]

However, if you just call it once and use various random functions, the results will still be different:

>>> import numpy as np
>>> np.random.seed(0) 
>>> perm = np.random.permutation(10)
>>> print perm 
[2 8 4 9 1 6 7 3 0 5]
>>> np.random.seed(0) 
>>> print np.random.permutation(10)
[2 8 4 9 1 6 7 3 0 5]
>>> print np.random.permutation(10) 
[3 5 1 2 9 8 0 6 7 4]
>>> print np.random.permutation(10) 
[2 3 8 4 5 1 0 6 9 7]
>>> print np.random.rand(4) 
[0.64817187 0.36824154 0.95715516 0.14035078]
>>> print np.random.rand(4) 
[0.87008726 0.47360805 0.80091075 0.52047748]

numpy.random.seed(0)
numpy.random.randint(10, size=5)

This produces the following output: array([5, 0, 3, 3, 7]) Again,if we run the same code we will get the same result.

Now if we change the seed value 0 to 1 or others:

numpy.random.seed(1)
numpy.random.randint(10, size=5)

This produces the following output: array([5 8 9 5 0]) but now the output not the same like above.


I have used this very often in neural networks. It is well known that when we start training a neural network we randomly initialise the weights. The model is trained on these weights on a particular dataset. After number of epochs you get trained set of weights.

Now suppose you want to again train from scratch or you want to pass the model to others to reproduce your results, the weights will be again initialised to a random numbers which mostly will be different from earlier ones. The obtained trained weights after same number of epochs ( keeping same data and other parameters ) as earlier one will differ. The problem is your model is no more reproducible that is every time you train your model from scratch it provides you different sets of weights. This is because the model is being initialized by different random numbers every time.

What if every time you start training from scratch the model is initialised to the same set of random initialise weights? In this case your model could become reproducible. This is achieved by numpy.random.seed(0). By mentioning seed() to a particular number, you are hanging on to same set of random numbers always.


As noted, numpy.random.seed(0) sets the random seed to 0, so the pseudo random numbers you get from random will start from the same point. This can be good for debuging in some cases. HOWEVER, after some reading, this seems to be the wrong way to go at it, if you have threads because it is not thread safe.

from differences-between-numpy-random-and-random-random-in-python:

For numpy.random.seed(), the main difficulty is that it is not thread-safe - that is, it's not safe to use if you have many different threads of execution, because it's not guaranteed to work if two different threads are executing the function at the same time. If you're not using threads, and if you can reasonably expect that you won't need to rewrite your program this way in the future, numpy.random.seed() should be fine for testing purposes. If there's any reason to suspect that you may need threads in the future, it's much safer in the long run to do as suggested, and to make a local instance of the numpy.random.Random class. As far as I can tell, random.random.seed() is thread-safe (or at least, I haven't found any evidence to the contrary).

example of how to go about this:

from numpy.random import RandomState
prng = RandomState()
print prng.permutation(10)
prng = RandomState()
print prng.permutation(10)
prng = RandomState(42)
print prng.permutation(10)
prng = RandomState(42)
print prng.permutation(10)

may give:

[3 0 4 6 8 2 1 9 7 5]

[1 6 9 0 2 7 8 3 5 4]

[8 1 5 0 7 2 9 4 3 6]

[8 1 5 0 7 2 9 4 3 6]

Lastly, note that there might be cases where initializing to 0 (as opposed to a seed that has not all bits 0) may result to non-uniform distributions for some few first iterations because of the way xor works, but this depends on the algorithm, and is beyond my current worries and the scope of this question.


All the random numbers generated after setting particular seed value are same across all the platforms/systems.


All the answers above show the implementation of np.random.seed() in code. I'll try my best to explain briefly why it actually happens. Computers are machines that are designed based on predefined algorithms. Any output from a computer is the result of the algorithm implemented on the input. So when we request a computer to generate random numbers, sure they are random but the computer did not just come up with them randomly!

So when we write np.random.seed(any_number_here) the algorithm will output a particular set of numbers that is unique to the argument any_number_here. It's almost like a particular set of random numbers can be obtained if we pass the correct argument. But this will require us to know about how the algorithm works which is quite tedious.

So, for example if I write np.random.seed(10) the particular set of numbers that I obtain will remain the same even if I execute the same line after 10 years unless the algorithm changes.