Advanced vectorisation in numpy

Martin McBride, 2021-02-27
Tags arrays where fancy indexing vectorisation
Categories numpy

Here are some additional vectorisation techniques.

The where parameter

All universal functions have a where parameter.

The where parameter takes an array-like set of booleans. It applies the function only where the array is true. For example:

a = np.array([-1, 2, -3, 4])
t = a>0
b = np.sqrt(a, where=t)

Here we first create an array t that is true wherever the value in a is greater than 0. That is:

t = [False  True False  True]

When we calculate the square root, the calculation is only performed where t is true, so we don't attempt to calculate negative square roots. This means that in the output array b, only elements 1 and 3 are assigned values.

b = [?? 1.41421356 ?? 2.]

You can visualise it like this:

The np.sqrt function takes the input array a, and the where array t. It uses t as a filter, so that it only calculates the square root and writes teh result into b for those indices where t is true.

Here is the equivalent code using a loop:

a = np.array([-1, 2, -3, 4])
t = a>0
b = np.empty_like(a)
for i in range(a.size):
    if i[i]:
        b[i] = sqrt(a[i])

Using the where parameter allows us to perform the same functionality as the loop, but with the speed of a vectorised function.

If you run this example yourself, elements 0 and 2 (shown as ??) are uninitialised. They could be anything, maybe a crazy large floating point number, maybe something that looks sensible like 1. But even if it looks sensible, it is just whatever happened to be in memory, so it is actually meaningless.

We can avoid this randomness by creating an output array that is initialised to a particular value (for example using full), and then use out to update that array.

a = np.array([-1, 2, -3, 4])
b = np.full(4, -1.0)
np.sqrt(a, where=a>0, out=b)

This initialised b to -1, so that any elements that are left unchanged by where will remain set to -1:

b = [-1.          1.41421356 -1.          2.        ]

Fancy indexing

Fancy indexing allows you to use one array as an index into another array. It can be used to pick values from an array, and also to implement table lookup algorithms.

Here is a simple example:

a = np.array([1, 3, 5, 7, 9])
idx = np.array([1, 1, 2, 3, 3, 4])

b = a[idx]

Here the array a is a set of values for testing.

The array idx contains values that act as indexes into the array a. So value 1 indexes element 1 in a (which has the value 3), etc.

Now look at the final line:

b = a[idx]

Normally a[n] fetches the nth element from a. But because we are passing in an array, idx, the statement fetches an array of elements.

idx has values 1, 1, 2, 3, 3, 4. If we look at b it contains:

b = [3 3 5 7 7 9]

The array has been filled with element 1 from a (which is 3), then element 1 again, then element 2 from a (which is 5), then element 3 from a (which is 7), and so on.

This is called fancy indexing. It selects values from a based on the index values in idx.

Notice that the length of b is the same as the length of idx, rather than a. In fact, b will always be the same shape as idx. For example:

a = np.array([1, 3, 5, 7, 9])
idx = np.array([0, 4, 2, 3])

b = a[idx]

Here, b will be an array of length 6 (the same as idx). The first element of b will be set to element 0 from a. The second element of b will be set to element 4 from a, and so on. The result is:

b = [1 9, 5 7]

Notice that the shape of b is controlled by the shape of idx, as this diagram shows:

What is happening, in effect, is that the array idx is being passed through a lookup table (array a) and then written out to array b. So b will always be the same shape as idx.

Here is the equivalent code using a loop:

a = np.array([1, 3, 5, 7, 9])
idx = np.array([0, 4, 2, 3])
b = np.empty_like(idx)
for i in range(idx.size):
    b[i] = a[idx[i]]

The fancy indexing version is generally far more efficient than the loop.

Visit the PythonInformer Discussion Forum for numeric Python.

If you found this article useful, you might be interested in the book NumPy Recipes or other books by the same author.


Popular tags

2d arrays abstract data type alignment and animation arc array arrays behavioural pattern bezier curve built-in function callable object circle classes close closure cmyk colour comparison operator comprehension context context manager conversion creational pattern data types design pattern device space dictionary drawing duck typing efficiency else encryption enumerate fill filter font font style for loop function function composition function plot functools game development generativepy tutorial generator geometry gif gradient greyscale higher order function hsl html image image processing imagesurface immutable object index inner function input installing iter iterable iterator itertools l system lambda function len line linspace list list comprehension logical operator lru_cache magic method mandelbrot mandelbrot set map monad mutability named parameter numeric python numpy object open operator optional parameter or partial application path polygon positional parameter print pure function pycairo radial gradient range recipes rectangle recursion reduce rgb rotation scaling sector segment sequence singleton slice slicing sound spirograph sprite square str strategy stream string stroke structural pattern subpath symmetric encryption template text text metrics tinkerbell fractal transform translation transparency tuple turtle unpacking user space vectorisation webserver website while loop zip