Martin McBride, 2021-02-27

Tags arrays, data types, vectorisation

Categories numpy

Vectorisation is the secret sauce of NumPy. It allows you to perform element-wise operations on NumPy arrays without using Python loops. Behind the scenes, the processing is done by optimised C code.

This can allow many array operations to be written in simple Python but execute almost as fast as C code. This can make a huge difference when processing a large array, such as an array of image data.

Here is a simple example of vectorisation:

    import numpy as np

    a = np.array([1.1, 3.6, 4.0, 8.2])
    b = a + 1

We first create an array with content `[1.1, 3.6, 4.0, 8.2]`. Then we execute:

    b = a + 1

Now, because `a` is a NumPy array, Python uses the NumPy version of the `+` operator. This operator adds 1 to each element in the array, resulting in this:

    b = [2.1 4.6 5. 9.2]

That is vectorisation in a nutshell. You can apply pretty much any Python maths operator to an array, and it will automatically be applied to every element of that array, at lightning speed!

Here is the equivalent code to do a similar thing with a Python list:

    a = [1.1, 3.6, 4.0, 8.2]
    b = []
    for x in a:
        b.append(x + 1)

Or if you prefer to use a list comprehension:

    a = [1.1, 3.6, 4.0, 8.2]
    b = [x + 1 for x in a]

As you can see, not only is the NumPy version faster, it is also shorter and more readable!
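To get a rough feel for the speed difference, a sketch like the following can be used. It relies on the standard library `timeit` module. The array size and repeat count are arbitrary choices made here for illustration, and the function names are hypothetical. Timings vary from machine to machine, so no figures are quoted.

```python
import timeit

import numpy as np

# Arbitrary test size, chosen for illustration only
n = 100_000
values = [float(i) for i in range(n)]
arr = np.array(values)


def add_one_list():
    # Pure Python: the interpreter visits every element in turn
    return [x + 1 for x in values]


def add_one_numpy():
    # Vectorised: the loop over elements runs inside NumPy
    return arr + 1


list_time = timeit.timeit(add_one_list, number=10)
numpy_time = timeit.timeit(add_one_numpy, number=10)

print(f"list comprehension: {list_time:.4f} s")
print(f"NumPy vectorised:   {numpy_time:.4f} s")
```

Both functions produce the same values, so only the time taken differs.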

In the next example we use the `dtype` parameter to create an array of 16-bit integers:

    a = np.array([1, 3, 4, 8], dtype=np.int16)
    b = a * 2

This time we are multiplying `a` by 2. Again, we are using the NumPy `*` operator. If `a` were a list, the multiply operator would do something very different (it would repeat the list). But for a NumPy array, it simply doubles every element:

    [ 2  6  8 16]

If you check the `dtype` of `b`, you will find it is also `int16`. The new array takes the type of the original array.
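You can confirm this by printing the `dtype` attribute of each array, as in this short sketch:

```python
import numpy as np

a = np.array([1, 3, 4, 8], dtype=np.int16)
b = a * 2

print(a.dtype)  # int16
print(b.dtype)  # int16
```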

In fact, that isn't quite true. Similar to normal Python arithmetic, ints can be converted to floats automatically when required. So for example:

    a = np.array([1, 3, 4, 8], dtype=np.int16)
    b = a * 2.1

Because we are multiplying an int by a float, the result is automatically a float, so we get this:

    [ 2.1  6.3  8.4 16.8]

The result is a float array.
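Again, the `dtype` attribute shows what has happened. The sketch below prints the result type and uses `np.issubdtype` to check that it is a floating point type. The exact float type printed is whatever NumPy chooses for this promotion (normally `float64`), so the check is written to accept any float type.

```python
import numpy as np

a = np.array([1, 3, 4, 8], dtype=np.int16)
b = a * 2.1

print(a.dtype)                              # int16
print(b.dtype)                              # a floating point type
print(np.issubdtype(b.dtype, np.floating))  # True
```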

We can apply vectorisation to a multi-dimensional array. NumPy just applies the operation to every element in the array, so we don't need to do anything special:

    a = np.array([[7, 5, 3], [2, 4, 6]], dtype=np.int16)
    b = a*2 - 5

This gives:

    [[ 9  5  1]
     [-1  3  7]]

Notice also that we have used a compound expression: we are multiplying `a` by 2, then subtracting 5.

You can use more than one array in a NumPy expression. For example:

    a = np.array([1, 2, 3, 4])
    b = np.array([10, 20, 30, 40])
    c = a + b

This performs an element by element addition of the two arrays `a` and `b`, to create the result in `c`. This gives:

    c = [11 22 33 44]

This result is obtained by adding each corresponding element in `a` and `b`:

The first element of each array (1 and 10) add to give the first element of the result (11). The second element of each array (2 and 20) add to give the second element of the result (22). And so on.

Arrays must have *compatible shapes* to be combined in this way. Two arrays that have the exact same shape will always be compatible. However, NumPy arrays also support *broadcasting*. This allows two arrays of different shapes to be matched under specific circumstances, by replicating elements to make them the same shape. This is covered in a later chapter.
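As a brief taste of broadcasting, here is a minimal sketch. The arrays and shapes are chosen here purely for illustration. A one-dimensional array of length 3 is combined with a 2 by 3 array, and a second attempt with mismatched shapes shows the error NumPy raises.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
b = np.array([10, 20, 30])            # shape (3,)

# b is treated as if it were repeated for each row of a
c = a + b
print(c)
# [[11 22 33]
#  [14 25 36]]

# Shapes (2, 3) and (2,) cannot be matched, so NumPy raises ValueError
try:
    a + np.array([10, 20])
except ValueError as err:
    print("Incompatible shapes:", err)
```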

You can use multi-dimensional arrays in a NumPy expression. For example, here we use 2 by 2 arrays:

    a = np.array([[1, 2], [3, 4]])
    b = np.array([[5, 6], [7, 8]])
    c = a * b

This performs an element by element multiplication of the two arrays `a` and `b`, to create the result in `c`. This gives:

    c = [[ 5 12]
         [21 32]]

This result is obtained by multiplying each corresponding element in `a` and `b`:

The element [0, 0] of each array (1 and 5) multiply to give the element [0, 0] of the result (5). The element [0, 1] of each array (2 and 6) multiply to give the element [0, 1] of the result (12). And so on.

If you are familiar with *matrix multiplication*, note that NumPy array multiplication **doesn't** work in the same way. You can use the NumPy `dot` function to perform matrix multiplication, but the `*` operator always performs element by element multiplication.
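The difference is easy to see by running both on the same pair of 2 by 2 arrays. The sketch below uses `np.dot` as described above. It also shows the `@` operator, which is not mentioned in the text above but gives the same result as `np.dot` for two-dimensional arrays.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b    # element by element multiplication
matrix = np.dot(a, b)  # matrix multiplication
also_matrix = a @ b    # same as np.dot for 2D arrays

print(elementwise)
# [[ 5 12]
#  [21 32]]

print(matrix)
# [[19 22]
#  [43 50]]
```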

You can, of course, use expressions that include multiple arrays, for example:

    a = np.array([1, 2, 3, 4])
    b = np.array([5, 6, 7, 8])
    c = np.array([10, 11, 12, 13])
    d = np.array([14, 15, 16, 17])
    e = a * b + c**2 + c**3 + 2*d

As always, the arrays must be compatible, and broadcasting can be used.
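As a quick check of the multi-array expression above, the sketch below evaluates it and compares the result with an explicit element by element calculation. The name `e_loop` is introduced here purely for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([10, 11, 12, 13])
d = np.array([14, 15, 16, 17])

# Vectorised version: one expression covers every element
e = a * b + c**2 + c**3 + 2*d

# The same calculation, one element at a time, for comparison
e_loop = [a[i]*b[i] + c[i]**2 + c[i]**3 + 2*d[i] for i in range(len(a))]

print(e)  # [1133 1494 1925 2432]
```

For the first element, this is 1*5 + 10**2 + 10**3 + 2*14 = 5 + 100 + 1000 + 28 = 1133.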

You can vectorise conditional operators:

    a = np.array([1, 6, 9, 4, 2, 8, 7])
    b = a > 5

What will this give us? Well, a regular conditional expression returns a `bool` value, so a vectorised conditional expression will give us an array of NumPy bools:

    b = [False True True False False True True]

The array `b` is `True` at every position where the element of `a` is greater than 5, and `False` otherwise.
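A boolean array like this is useful in its own right. As a small sketch going slightly beyond the text above, it can be used to count the matching elements or to select them from `a`:

```python
import numpy as np

a = np.array([1, 6, 9, 4, 2, 8, 7])
b = a > 5

print(b.dtype)  # bool
print(b.sum())  # 4, because each True counts as 1
print(a[b])     # [6 9 8 7], the elements of a where b is True
```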

You cannot use `and` or `or` directly with a NumPy array. That is to say, the following are not allowed:

    a = np.array([1, 2, 3, 4, 5, 6])
    b = a > 2 and a < 5   # Not allowed
    c = a < 3 or a > 4    # Not allowed

Instead you must use special NumPy *universal functions*. For example `np.logical_and` is used in place of `and`. This is covered further in the universal functions article.
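Here is a minimal sketch of the two disallowed expressions rewritten with `np.logical_and` and `np.logical_or`. It also shows the `&` and `|` operators, which are not mentioned above but give the same result for boolean arrays.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

b = np.logical_and(a > 2, a < 5)
c = np.logical_or(a < 3, a > 4)

print(b)  # [False False  True  True False False]
print(c)  # [ True  True False False  True  True]

# The & and | operators are an alternative for boolean arrays.
# The parentheses are required, because & and | bind more
# tightly than < and >.
b2 = (a > 2) & (a < 5)
c2 = (a < 3) | (a > 4)
```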

Copyright (c) Axlesoft Ltd 2020