We are using the convention rand(n) -> [0, n - 1]
here
From many of the answer I read, they provide either uniformity or halt guarantee, but not both (adam rosenfeld second answer might).
It is, however, possible to do so. We basically have this distribution:
This leaves us a hole in the distribution over [0-6]
: 5 and 6 have no
probability of ocurrence. Imagine now we try to fill the hole it by shifting the
probability distribution and summing.
Indeed, we can the initial distribution with itself shifted by one, and repeating by summing the obtained distribution with the initial one shifted by two, then three and so on, until 7, not included (we covered the whole range). This is shown on the following figure. The order of the colors, corresponding to the steps, is blue -> green -> cyan -> white -> magenta -> yellow -> red.
Because each slot is covered by 5 of the 7 shifted distributions (shift varies from
0 to 6), and because we assume the random numbers are independent from one
ran5()
call to another, we obtain
p(x) = 5 / 35 = 1 / 7 for all x in [0, 6]
This means that, given 7 independent random numbers from ran5()
, we can
compute a random number with uniform probability in the [0-6]
range.
In fact, the ran5() probability
distribution does not even need to be uniform, as long as the samples are
independent (so the distribution stays the same from trial to trial).
Also, this is valid for other numbers than 5 and 7.
This gives us the following python function:
def rand_range_transform(rands):
"""
returns a uniform random number in [0, len(rands) - 1]
if all r in rands are independent random numbers from the same uniform distribution
"""
return sum((x + i) for i, x in enumerate(rands)) % len(rands) # a single modulo outside the sum is enough in modulo arithmetic
This can be used like this:
rand5 = lambda : random.randrange(5)
def rand7():
return rand_range_transform([rand5() for _ in range(7)])
If we call rand7()
70000 times, we can get:
max: 6 min: 0 mean: 2.99711428571 std: 2.00194697049
0: 10019
1: 10016
2: 10071
3: 10044
4: 9775
5: 10042
6: 10033
This is good, although far from perfect. The fact is, one of our assumption is most likely false in this implementation: we use a PRNG, and as such, the result of the next call is dependent from the last result.
That said, using a truly random source of numbers, the output should also be truly random. And this algorithm terminates in every case.
But this comes with a cost: we need 7 calls to rand5()
for a single rand7()
call.