Original post

The other day, while playing with a simple program involving randomness, I

noticed something strange. Python’s `random.randint()` function feels quite

slow, in comparison to other randomness-generating functions. Since

`randint()` is the canonical answer for “give me a random integer” in Python,

I decided to dig deeper to understand what’s going on.

This is a brief post that dives into the implementation of the `random`

module, and discusses some alternative methods for generating pseudo-random

integers.

First, a basic benchmark (Python 3.6):

```
$ python3 -m timeit -s 'import random' 'random.random()'
10000000 loops, best of 3: 0.0523 usec per loop
$ python3 -m timeit -s 'import random' 'random.randint(0, 128)'
1000000 loops, best of 3: 1.09 usec per loop
```

Whoa! It’s about 20x more expensive to generate a random integer in the range

`[0, 128)` than to generate a random float in the range `[0, 1)`. That’s

pretty steep, indeed.

To understand why `randint()` is so slow, we’ll have to dig into the Python

source. Let’s start with `random()`

[1]. In `Lib/random.py`, the exported function `random` is an alias to the

`random` method of the class `Random`, which inherits this method directly

from `_Random`. This is the C companion defined in

`Modules/_randommodule.c`, and it defines its `random` method as follows:

```
static PyObject *
random_random(RandomObject *self, PyObject *Py_UNUSED(ignored))
{
uint32_t a=genrand_int32(self)>>5, b=genrand_int32(self)>>6;
return PyFloat_FromDouble((a*67108864.0+b)*(1.0/9007199254740992.0));
}
```

Where `getrand_int32` is defined directly above and implements a step of the

Mersenne Twister PRNG. All

in all, when we call `random.random()` in Python, the C function is directly

invoked and there’s not much extra work done beyond converting the result of

`genrand_int32` to a floating point number in a line of C.

Now let’s take a look at what `randint()` is up to:

```
def randint(self, a, b):
"""Return random integer in range [a, b], including both end points.
"""
return self.randrange(a, b+1)
```

It calls `randrange`, fair enough. Here it is:

```
def randrange(self, start, stop=None, step=1, _int=int):
"""Choose a random item from range(start, stop[, step]).
This fixes the problem with randint() which includes the
endpoint; in Python this is usually not what you want.
"""
# This code is a bit messy to make it fast for the
# common case while still doing adequate error checking.
istart = _int(start)
if istart != start:
raise ValueError("non-integer arg 1 for randrange()")
if stop is None:
if istart > 0:
return self._randbelow(istart)
raise ValueError("empty range for randrange()")
# stop argument supplied.
istop = _int(stop)
if istop != stop:
raise ValueError("non-integer stop for randrange()")
width = istop - istart
if step == 1 and width > 0:
return istart + self._randbelow(width)
if step == 1:
raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
# Non-unit step argument supplied.
istep = _int(step)
if istep != step:
raise ValueError("non-integer step for randrange()")
if istep > 0:
n = (width + istep - 1) // istep
elif istep < 0:
n = (width + istep + 1) // istep
else:
raise ValueError("zero step for randrange()")
if n <= 0:
raise ValueError("empty range for randrange()")
return istart + istep*self._randbelow(n)
```

That’s quite a bit of case checking and setting up parameters before we get to

the next level. There are a couple of fast-path cases (for example, when the

`stop` parameter is not supplied, this function will be a bit faster),

but overall after a bunch of checking we get to call the `_randbelow()`

method.

By default, `_randbelow()` gets mapped to `_randbelow_with_getrandbits()`:

```
def _randbelow_with_getrandbits(self, n):
"Return a random int in the range [0,n). Raises ValueError if n==0."
getrandbits = self.getrandbits
k = n.bit_length() # don't use (n-1) here because n can be 1
r = getrandbits(k) # 0 <= r < 2**k
while r >= n:
r = getrandbits(k)
return r
```

Note that it does a couple more computations and can end up invoking

`getrandbits()` multiple times (esp. if `n` is far from a power of two).

`getrandbits()` is in C, and while it also ends up invoking the PRNG

`getrand_int32()`, it’s somewhat heavier than `random()` and runs twice as

slow.

In other words, there’s a lot of Python and C code in the way to invoke the same

underlying C function. Since Python is bytecode-interpreted, all of this ends up

being quite a bit slower than simply calling the C function directly. A death

by a thousand cuts. To be fair, `randint()` is also more flexible in that it

can generate pseudo-random numbers of any size; that said, it’s not very common

to need huge pseudo-random numbers, and our tests were with small numbers

anyway.

Here’s a couple of experiments to help us test this hypothesis. First, let’s try

to hit the fast-path we’ve seen above in `randrange`, by calling `randrange`

without a `stop` parameter:

```
$ python3 -m timeit -s 'import random' 'random.randrange(1)'
1000000 loops, best of 3: 0.784 usec per loop
```

As expected, the run-time is somewhat better than `randint`. Another

experiment is to rerun the comparison in PyPy, which is a JIT compiler that

should end up tracing through the Python code and generating efficient machine

code that strips a lot of abstractions.

```
$ pypy -m timeit -s 'import random' 'random.random()'
100000000 loops, best of 3: 0.0139 usec per loop
$ pypy -m timeit -s 'import random' 'random.randint(0, 128)'
100000000 loops, best of 3: 0.0168 usec per loop
```

As expected, the difference between these calls in PyPy is small.

## Faster methods for generating pseudo-random integers

So `randint()` turns out to be very slow. In most cases, no one cares; but

just occasionally, we need many random numbers – so what is there to do?

One tried and true trick is just using `random.random()` instead, multiplying

by our integer limit:

```
$ python3 -m timeit -s 'import random' 'int(128 * random.random())'
10000000 loops, best of 3: 0.193 usec per loop
```

This gives us pseudo-random integers in the range `[0, 128)`, much faster. One

word of caution: Python represents its floats in double-precision, with 53 bits

of accuracy. When the limit is above 53 bits, the numbers we’ll be getting using

this method are not quite random – bits will be missing. This is rarely a

problem because we don’t usually need such huge integers, but definitely

something to keep in mind [2].

Another quick way to generate pseudo-random integers is to use `getrandbits()`

directly:

```
$ python3 -m timeit -s 'import random' 'random.getrandbits(7)'
10000000 loops, best of 3: 0.102 usec per loop
```

This method is fast but limited – it only supports ranges that are powers of

two. If we want to limit the range we can’t just compute a modulo – this will

skew the distribution; rather we’ll have to use a loop similarly to what

`_randbelow_with_getrandbits()` does in the sample above. This will slow

things down, of course.

Finally, we can turn away from the `random` module altogether, and use Numpy:

```
$ python3 -m timeit -s 'import numpy.random' 'numpy.random.randint(128)'
1000000 loops, best of 3: 1.21 usec per loop
```

Surprisingly, this is slow; that’s because Numpy isn’t great for working with

single datums – it likes to amortize costs over large arrays created /

manipulated in C. To see this in action, let’s see how long it takes to generate

100 random integers:

```
$ python3 -m timeit -s 'import numpy.random' 'numpy.random.randint(128, size=100)'
1000000 loops, best of 3: 1.91 usec per loop
```

Only 60% slower than generating a single one! With 0.019 usec per integer, this

is the fastest method *by far* – 3 times faster than calling

`random.random()`. The reason this method is so fast is because the Python

call overheads are amortized over all generated integers, and deep inside Numpy

runs an efficient C loop to generate them.

To conclude, use Numpy if you want to generate large numbers of random ints; if

you’re just generating one-at-a-time, it may not be as useful (but then how much

do you care about performance, really?)

## Conclusion: performance vs. abstraction

In programming, performance and abstraction/flexibility are often at odds. By

making a certain function more flexible, we inevitably make it slower – and

`randint()` is a great practical example of this problem. 9 times out of 10

we don’t care about the performance of these functions, but when we do, it’s

useful to know what’s going on and how to improve the situation.

In a way, pure Python code itself is one of the slowest abstractions we

encounter, since every line gets translated to a bunch of bytecode that

has to be interpreted at run-time.

To mitigate these effects, Python programmers who care about performance have

many techniques at their disposal. Libraries like Numpy carefully move as much

compute as possible to underlying C code; PyPy is a JIT compiler that can speed

up most pure Python code sequences considerably. Numba is somewhere in between,

while Cython lets us re-write chosen functions in a restricted subset of Python

that can be efficiently compiled to machine code.

[1] | From this point on, file names point to source files in the CPython repository. Feel free to follow along on your machine or on Github. |

[2] |
As an experiment, try to generate pseudo-random integers up to 2^54 More generally, the closer the multiplier is to machine precision, the |