Python informer

Advanced vectorisation in numpy

This article is part of a series on numpy.

One of the key benefits of numpy is its ability to perform operations on an entire array with a single operator or function call.

This means that you avoid executing a relatively slow Python loop, and instead use numpy to execute the loop in optimised C code.

There are various ways to do this that cover most of the common looping scenarios you might meet when processing large arrays.

Vectorised operators

Suppose we have two numpy arrays a and b, of equal size, and we want to create a new array c where:

c[i] = a[i] + b[i]    # for every i

That is, we want to do element-wise addition. We could do this in a loop:

import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 4, 6, 8])
c = np.zeros_like(a)        # Zeros array, same shape and type as a
for i in range(a.size):     # Explicit Python loop over every index
    c[i] = a[i] + b[i]

The problem here is that the for loop is executed by the Python interpreter. Python is fairly efficient, but an interpreted loop is usually considerably slower than native code, and for a very large array that can be a significant performance hit.

With numpy you can just do this:

a = np.array([1, 2, 3, 4])
b = np.array([2, 4, 6, 8])
c = a + b    # [ 3  6  9 12]

numpy will then loop over arrays a and b, adding each pair of elements and placing the result in c. It will all be done in optimised, native code. Not only does this make the code easier to read, but it also makes it run faster.
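To get a rough feel for the difference, a sketch like the following times both approaches using the standard library timeit module. The array size, the repeat count and the helper names add_with_loop and add_vectorised are arbitrary choices made for illustration. Actual timings depend on the machine and the numpy version, so no figures are quoted here.

```python
import timeit

import numpy as np

# Arbitrary array size chosen for illustration
n = 100_000
a = np.arange(n)
b = np.arange(n) * 2


def add_with_loop(a, b):
    """Element-wise addition using an explicit Python loop."""
    c = np.zeros_like(a)
    for i in range(a.size):
        c[i] = a[i] + b[i]
    return c


def add_vectorised(a, b):
    """Element-wise addition using numpy's vectorised + operator."""
    return a + b


# Time 5 runs of each version (the repeat count is also arbitrary)
loop_time = timeit.timeit(lambda: add_with_loop(a, b), number=5)
vec_time = timeit.timeit(lambda: add_vectorised(a, b), number=5)
print(f"loop: {loop_time:.4f}s, vectorised: {vec_time:.4f}s")
```

Both functions return the same array, so the comparison is purely about where the loop runs.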

You can create more complex expressions, like this:

a = np.array([1, 2, 3, 4])
b = np.array([2, 4, 6, 8])
c = np.array([1, 4, 9, 16])
x = a*b + 5                #[ 7 13 23 37]

The other arithmetic and comparison operators work in the same way. For example, modulo can be used:

y = (a + c) % 7            #[2 6 5 6]

In the next case z is calculated using a comparison operator, so the result array contains boolean values:

z = b > c                  #[ True False False False]

The arrays must be of compatible shape. Usually this means that they need to be the same shape (in this case, a 1-dimensional array of 4 elements), but there are [broadcast rules](/python-libraries/numpy/avoiding-loops/broadcasting/) that allow for other situations.
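As a brief illustration of what compatible shape can mean, the sketch below combines an array with a scalar, combines a column with a row, and shows what happens when the shapes cannot be reconciled. The names doubled, col and grid are made up for this example, and it only touches on the simplest broadcasting cases.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# A scalar is broadcast across every element of a
doubled = a * 2                    # [2 4 6 8]

# A (3, 1) column combined with a (4,) row gives a (3, 4) result
col = np.array([[10], [20], [30]])
grid = col + a
print(grid.shape)                  # (3, 4)

# Shapes that cannot be broadcast together raise a ValueError
try:
    a + np.array([1, 2, 3])
except ValueError as err:
    print("shape mismatch:", err)
```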

Universal functions

numpy also comes with a full set of mathematical functions, known as universal functions (ufuncs), that operate element-wise on arrays. In this case we use linspace to generate a set of 11 evenly spaced time values t from 0 to 1, and then calculate an exponential decay s. This uses vectorised operators and the universal function np.exp:

t = np.linspace(0, 1, 11)
s = 3*np.exp(-t*4)

Giving:

[3.         2.01096014 1.34798689 0.90358264 0.60568955 0.40600585
 0.27215386 0.18243019 0.12228661 0.08197117 0.05494692]
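Other universal functions follow the same pattern. As a small sketch, here are a one-argument function (np.sqrt) and a two-argument function (np.maximum) applied to the arrays b and c from the earlier examples. The names roots and larger are just for illustration.

```python
import numpy as np

b = np.array([2, 4, 6, 8])
c = np.array([1, 4, 9, 16])

# One-argument ufunc: square root of every element
roots = np.sqrt(c)            # [1. 2. 3. 4.]

# Two-argument ufunc: the larger of each pair of elements
larger = np.maximum(b, c)     # [ 2  4  9 16]

# Both are instances of numpy's ufunc type, as is np.exp
print(isinstance(np.exp, np.ufunc))   # True
```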

Sums and averages

If you need to find the mean of all the elements in an array, you could use a loop like this:

a = np.array([1, 2, 3, 4])
total = 0                   # Avoid shadowing the built-in sum function
for x in a:
    total += x

mean = total / a.size

However, numpy provides a function np.mean that calculates the mean of an array without an explicit Python loop.
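A minimal sketch of the vectorised version, using the same array, might look like this. It also shows np.sum and the axis argument on a small 2-dimensional array, where m is a made-up example array.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

mean = np.mean(a)      # 2.5
total = np.sum(a)      # 10

# The same calculation is available as a method on the array
also_mean = a.mean()   # 2.5

# For a 2-dimensional array, axis selects the direction to average over
m = np.array([[1, 2], [3, 4]])
column_means = np.mean(m, axis=0)   # [2. 3.]
row_means = np.mean(m, axis=1)      # [1.5 3.5]
```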
