# Advanced vectorisation in numpy

Martin McBride, 2018-09-15
Tags arrays data types vectorisation
Categories numpy

One of the key benefits of numpy is its ability to perform operations on an entire array with a single operator or function call.

This means that you avoid executing a relatively slow Python loop, and instead use numpy to execute the loop in optimised C code.

There are various ways to do this that cover most of the common looping scenarios you might meet when processing large arrays.

## Vectorised operators

Suppose we have two numpy arrays a and b, of equal size, and we want to create a new array c where:

    c[i] = a[i] + b[i]    # for every i


That is, we want to do element-wise addition. We could do this with a loop:

    import numpy as np

    a = np.array([1, 2, 3, 4])
    b = np.array([2, 4, 6, 8])
    c = np.zeros_like(a)        # zeros array, same shape and type as a
    for i in range(a.size):
        c[i] = a[i] + b[i]


The problem here is that the for loop is executed by the Python interpreter. Each iteration involves interpreting bytecode and converting array elements to and from Python objects, so it is usually much slower than the equivalent loop in native code. If the array is very large, that can be a significant performance hit.

With numpy you can just do this:

    a = np.array([1, 2, 3, 4])
    b = np.array([2, 4, 6, 8])
    c = a + b    # [ 3  6  9 12]


numpy will then loop over arrays a and b, adding each pair of elements and placing the result in c. It will all be done in optimised, native code. Not only does this make the code easier to read, but it also makes it run faster.
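To check that the two approaches really are equivalent, and to get a feel for the speed difference, here is a small sketch. The function names add_loop and add_vectorised and the array size are hypothetical choices for illustration; the timings printed will vary from machine to machine, so no particular speed-up is claimed here.

```python
import timeit

import numpy as np

# Two fairly large arrays of equal size (size chosen arbitrarily)
n = 100_000
a = np.arange(n)
b = np.arange(n) * 2

def add_loop(a, b):
    """Element-wise addition using an explicit Python loop (hypothetical helper)."""
    c = np.zeros_like(a)
    for i in range(a.size):
        c[i] = a[i] + b[i]
    return c

def add_vectorised(a, b):
    """Element-wise addition using numpy's vectorised + operator (hypothetical helper)."""
    return a + b

# Both approaches produce identical results
print(np.array_equal(add_loop(a, b), add_vectorised(a, b)))   # True

# Time each version; the exact numbers depend on your machine
loop_time = timeit.timeit(lambda: add_loop(a, b), number=3)
vec_time = timeit.timeit(lambda: add_vectorised(a, b), number=3)
print(f"loop: {loop_time:.4f}s  vectorised: {vec_time:.4f}s")
```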

You can create more complex expressions, like this:

    a = np.array([1, 2, 3, 4])
    b = np.array([2, 4, 6, 8])
    c = np.array([1, 4, 9, 16])
    x = a*b + 5                # [ 7 13 23 37]


The other arithmetic operators work in the same way. For example, modulo can be used:

    y = (a + c) % 7            # [2 6 5 6]
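Each of these operators corresponds to a numpy function that does the same element-wise job: + is np.add, * is np.multiply and % is np.mod. The sketch below rewrites the two expressions above using those functions, which can be handy when you want to pass the operation around as a function.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 4, 6, 8])
c = np.array([1, 4, 9, 16])

# Function-call equivalents of the operator expressions
x = np.add(np.multiply(a, b), 5)   # same as a*b + 5
y = np.mod(np.add(a, c), 7)        # same as (a + c) % 7

print(x)   # [ 7 13 23 37]
print(y)   # [2 6 5 6]
```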


In the next case z is calculated using a comparison operator, so the result array contains boolean values:

    z = b > c                  # [ True False False False]
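Boolean arrays like z can be combined element-wise, but not with Python's and, or and not keywords, because those try to reduce the whole array to a single True or False. numpy uses the bitwise operators & (and), | (or) and ~ (not) for this. A short sketch:

```python
import numpy as np

b = np.array([2, 4, 6, 8])
c = np.array([1, 4, 9, 16])

gt = b > c      # [ True False False False]
eq = b == c     # [False  True False False]

# Parentheses are needed because & and | bind more tightly than comparisons
either = (b > c) | (b == c)   # [ True  True False False]
neither = ~either             # [False False  True  True]

# Python's `and` asks for the truth value of the whole array,
# which is ambiguous when the array has more than one element
try:
    gt and eq
except ValueError as e:
    print("ValueError:", e)
```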


The arrays must be of compatible shape. Usually this means that they need to be the same shape (in this case, a one-dimensional array of 4 elements), but there are [broadcast rules](/python-libraries/numpy/avoiding-loops/broadcasting/) that allow for other situations, such as adding the single number 5 to every element of a*b in the earlier example.
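As a brief illustration of what broadcasting allows (the linked article covers the full rules), here is a sketch that combines a 2-D array with a scalar and with a 1-D row, and shows the error numpy raises when the shapes cannot be matched. The arrays themselves are arbitrary examples.

```python
import numpy as np

m = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])       # shape (2, 4)
row = np.array([10, 20, 30, 40])   # shape (4,)

# A scalar is applied to every element
print(m * 2)
# The 1-D row is added to each row of the 2-D array
print(m + row)
# [[11 22 33 44]
#  [15 26 37 48]]

# Shapes (2, 4) and (3,) are not compatible, so numpy raises an error
try:
    m + np.array([1, 2, 3])
except ValueError as e:
    print("ValueError:", e)
```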

## Universal functions

Numpy also comes with a full set of mathematical functions, called universal functions (ufuncs), that operate element-wise on arrays. Here we use linspace to generate 11 time values t from 0 to 1, and then calculate an exponential decay s. This uses vectorised operators and the universal function np.exp:

    t = np.linspace(0, 1, 11)
    s = 3*np.exp(-t*4)


Giving:

    [3.         2.01096014 1.34798689 0.90358264 0.60568955 0.40600585
     0.27215386 0.18243019 0.12228661 0.08197117 0.05494692]
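For comparison, the sketch below computes the same decay one element at a time with the standard library's math.exp, which only accepts a single number, and checks that the results agree with the vectorised version to within floating-point rounding. It also shows another ufunc, np.sqrt, used in the same way.

```python
import math

import numpy as np

t = np.linspace(0, 1, 11)

# Vectorised: np.exp is applied to every element of t in one call
s = 3 * np.exp(-t * 4)

# Equivalent Python loop: math.exp works on one number at a time
s_loop = np.array([3 * math.exp(-x * 4) for x in t])

print(np.allclose(s, s_loop))   # True

# Other ufuncs work the same way, e.g. np.sqrt
print(np.sqrt(np.array([1, 4, 9, 16])))   # [1. 2. 3. 4.]
```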


## Sums and averages

If you need to find the mean of all the elements in an array, you could use a loop like this:

    a = np.array([1, 2, 3, 4])
    total = 0                  # avoid shadowing the built-in sum()
    for x in a:
        total += x

    mean = total / a.size


However, numpy provides a function, np.mean, that calculates the mean of an array without a Python loop. For the array above, np.mean(a) returns 2.5.
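A short sketch of np.sum and np.mean on the same array, plus the axis parameter, which lets you reduce a 2-D array along its columns or rows instead of over every element. The 2-D array m is an arbitrary example.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Whole-array reductions, computed without a Python loop
print(np.sum(a))    # 10
print(np.mean(a))   # 2.5

# For a 2-D array, axis chooses the direction of the reduction
m = np.array([[1, 2],
              [3, 4]])
print(np.mean(m))          # 2.5 (all elements)
print(np.mean(m, axis=0))  # [2. 3.] (mean of each column)
print(np.sum(m, axis=1))   # [3 7] (sum of each row)
```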

