[c++] Generating random integer from a range

I need a function which would generate a random integer in given range (including border values). I don't unreasonable quality/randomness requirements, I have four requirements:

  • I need it to be fast. My project needs to generate millions (or sometimes even tens of millions) of random numbers and my current generator function has proven to be a bottleneck.
  • I need it to be reasonably uniform (use of rand() is perfectly fine).
  • the min-max ranges can be anything from <0, 1> to <-32727, 32727>.
  • it has to be seedable.

I currently have following C++ code:

output = min + (rand() * (int)(max - min) / RAND_MAX)

The problem is, that it is not really uniform - max is returned only when rand() = RAND_MAX (for Visual C++ it is 1/32727). This is major issue for small ranges like <-1, 1>, where the last value is almost never returned.

So I grabbed pen and paper and came up with following formula (which builds on the (int)(n + 0.5) integer rounding trick):

enter image description here

But it still doesn't give me uniform distribution. Repeated runs with 10000 samples give me ratio of 37:50:13 for values values -1, 0. 1.

Could you please suggest better formula? (or even whole pseudo-random number generator function)

This question is related to c++ random

The answer is


Let's split the problem into two parts:

  • Generate a random number n in the range 0 through (max-min).
  • Add min to that number

The first part is obviously the hardest. Let's assume that the return value of rand() is perfectly uniform. Using modulo will add bias to the first (RAND_MAX + 1) % (max-min+1) numbers. So if we could magically change RAND_MAX to RAND_MAX - (RAND_MAX + 1) % (max-min+1), there would no longer be any bias.

It turns out that we can use this intuition if we are willing to allow pseudo-nondeterminism into the running time of our algorithm. Whenever rand() returns a number which is too large, we simply ask for another random number until we get one which is small enough.

The running time is now geometrically distributed, with expected value 1/p where p is the probability of getting a small enough number on the first try. Since RAND_MAX - (RAND_MAX + 1) % (max-min+1) is always less than (RAND_MAX + 1) / 2, we know that p > 1/2, so the expected number of iterations will always be less than two for any range. It should be possible to generate tens of millions of random numbers in less than a second on a standard CPU with this technique.

EDIT:

Although the above is technically correct, DSimon's answer is probably more useful in practice. You shouldn't implement this stuff yourself. I have seen a lot of implementations of rejection sampling and it is often very difficult to see if it's correct or not.


The formula for this is very simple, so try this expression,

 int num = (int) rand() % (max - min) + min;  
 //Where rand() returns a random number between 0.0 and 1.0

int RandU(int nMin, int nMax)
{
    return nMin + (int)((double)rand() / (RAND_MAX+1) * (nMax-nMin+1));
}

This is a mapping of 32768 integers to (nMax-nMin+1) integers. The mapping will be quite good if (nMax-nMin+1) is small (as in your requirement). Note however that if (nMax-nMin+1) is large, the mapping won't work (For example - you can't map 32768 values to 30000 values with equal probability). If such ranges are needed - you should use a 32-bit or 64-bit random source, instead of the 15-bit rand(), or ignore rand() results which are out-of-range.


If your compiler supports C++0x and using it is an option for you, then the new standard <random> header is likely to meet your needs. It has a high quality uniform_int_distribution which will accept minimum and maximum bounds (inclusive as you need), and you can choose among various random number generators to plug into that distribution.

Here is code that generates a million random ints uniformly distributed in [-57, 365]. I've used the new std <chrono> facilities to time it as you mentioned performance is a major concern for you.

#include <iostream>
#include <random>
#include <chrono>

int main()
{
    typedef std::chrono::high_resolution_clock Clock;
    typedef std::chrono::duration<double> sec;
    Clock::time_point t0 = Clock::now();
    const int N = 10000000;
    typedef std::minstd_rand G;
    G g;
    typedef std::uniform_int_distribution<> D;
    D d(-57, 365);
    int c = 0;
    for (int i = 0; i < N; ++i) 
        c += d(g);
    Clock::time_point t1 = Clock::now();
    std::cout << N/sec(t1-t0).count() << " random numbers per second.\n";
    return c;
}

For me (2.8 GHz Intel Core i5) this prints out:

2.10268e+07 random numbers per second.

You can seed the generator by passing in an int to its constructor:

    G g(seed);

If you later find that int doesn't cover the range you need for your distribution, this can be remedied by changing the uniform_int_distribution like so (e.g. to long long):

    typedef std::uniform_int_distribution<long long> D;

If you later find that the minstd_rand isn't a high enough quality generator, that can also easily be swapped out. E.g.:

    typedef std::mt19937 G;  // Now using mersenne_twister_engine

Having separate control over the random number generator, and the random distribution can be quite liberating.

I've also computed (not shown) the first 4 "moments" of this distribution (using minstd_rand) and compared them to the theoretical values in an attempt to quantify the quality of the distribution:

min = -57
max = 365
mean = 154.131
x_mean = 154
var = 14931.9
x_var = 14910.7
skew = -0.00197375
x_skew = 0
kurtosis = -1.20129
x_kurtosis = -1.20001

(The x_ prefix refers to "expected")


The simplest (and hence best) C++ (using the 2011 standard) answer is

#include <random>

std::random_device rd;     // only used once to initialise (seed) engine
std::mt19937 rng(rd());    // random-number engine used (Mersenne-Twister in this case)
std::uniform_int_distribution<int> uni(min,max); // guaranteed unbiased

auto random_integer = uni(rng);

No need to re-invent the wheel. No need to worry about bias. No need to worry about using time as random seed.


I recommend the Boost.Random library, it's super detailed and well-documented, lets you explicitly specify what distribution you want, and in non-cryptographic scenarios can actually outperform a typical C library rand implementation.


Notice that in most suggestions the initial random value that you have got from rand() function, which is typically from 0 to RAND_MAX, is simply wasted. You are creating only one random number out of it, while there is a sound procedure that can give you more.

Assume that you want [min,max] region of integer random numbers. We start from [0, max-min]

Take base b=max-min+1

Start from representing a number you got from rand() in base b.

That way you have got floor(log(b,RAND_MAX)) because each digit in base b, except possibly the last one, represents a random number in the range [0, max-min].

Of course the final shift to [min,max] is simple for each random number r+min.

int n = NUM_DIGIT-1;
while(n >= 0)
{
    r[n] = res % b;
    res -= r[n];
    res /= b;
    n--;
}

If NUM_DIGIT is the number of digit in base b that you can extract and that is

NUM_DIGIT = floor(log(b,RAND_MAX))

then the above is as a simple implementation of extracting NUM_DIGIT random numbers from 0 to b-1 out of one RAND_MAX random number providing b < RAND_MAX.


How about the Mersenne Twister? The boost implementation is rather easy to use and is well tested in many real-world applications. I've used it myself in several academic projects such as artificial intelligence and evolutionary algorithms.

Here's their example where they make a simple function to roll a six-sided die:

#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int.hpp>
#include <boost/random/variate_generator.hpp>

boost::mt19937 gen;

int roll_die() {
    boost::uniform_int<> dist(1, 6);
    boost::variate_generator<boost::mt19937&, boost::uniform_int<> > die(gen, dist);
    return die();
}

Oh, and here's some more pimping of this generator just in case you aren't convinced you should use it over the vastly inferior rand():

The Mersenne Twister is a "random number" generator invented by Makoto Matsumoto and Takuji Nishimura; their website includes numerous implementations of the algorithm.

Essentially, the Mersenne Twister is a very large linear-feedback shift register. The algorithm operates on a 19,937 bit seed, stored in an 624-element array of 32-bit unsigned integers. The value 2^19937-1 is a Mersenne prime; the technique for manipulating the seed is based on an older "twisting" algorithm -- hence the name "Mersenne Twister".

An appealing aspect of the Mersenne Twister is its use of binary operations -- as opposed to time-consuming multiplication -- for generating numbers. The algorithm also has a very long period, and good granularity. It is both fast and effective for non-cryptographic applications.


The following expression should be unbiased if I am not mistaken:

std::floor( ( max - min + 1.0 ) * rand() ) + min;

I am assuming here that rand() gives you a random value in the range between 0.0 and 1.0 NOT including 1.0 and that max and min are integers with the condition that min < max.


assume min and max are int values, [ and ] means include this value, ( and ) means not include this value, using above to get the right value using c++ rand()

reference: for ()[] define, visit:

https://en.wikipedia.org/wiki/Interval_(mathematics)

for rand and srand function or RAND_MAX define, visit:

http://en.cppreference.com/w/cpp/numeric/random/rand

[min, max]

int randNum = rand() % (max - min + 1) + min

(min, max]

int randNum = rand() % (max - min) + min + 1

[min, max)

int randNum = rand() % (max - min) + min

(min, max)

int randNum = rand() % (max - min - 1) + min + 1

Here is an unbiased version that generates numbers in [low, high]:

int r;
do {
  r = rand();
} while (r < ((unsigned int)(RAND_MAX) + 1) % (high + 1 - low));
return r % (high + 1 - low) + low;

If your range is reasonably small, there is no reason to cache the right-hand side of the comparison in the do loop.


In this thread rejection sampling was already discussed, but I wanted to suggest one optimization based on the fact that rand() % 2^something does not introduce any bias as already mentioned above.

The algorithm is really simple:

  • calculate the smallest power of 2 greater than the interval length
  • randomize one number in that "new" interval
  • return that number if it is less than the length of the original interval
    • reject otherwise

Here's my sample code:

int randInInterval(int min, int max) {
    int intervalLen = max - min + 1;
    //now calculate the smallest power of 2 that is >= than `intervalLen`
    int ceilingPowerOf2 = pow(2, ceil(log2(intervalLen)));

    int randomNumber = rand() % ceilingPowerOf2; //this is "as uniform as rand()"

    if (randomNumber < intervalLen)
        return min + randomNumber;      //ok!
    return randInInterval(min, max);    //reject sample and try again
} 

This works well especially for small intervals, because the power of 2 will be "nearer" to the real interval length, and so the number of misses will be smaller.

PS
Obviously avoiding the recursion would be more efficient (no need to calculate over and over the log ceiling..) but I thought it was more readable for this example.