Vectorisation in numpy

By Martin McBride, 2021-02-27

Tags: arrays data types vectorisation
Categories: numpy

Vectorisation is the secret sauce of NumPy. It allows you to perform element-wise operations on NumPy arrays without using Python loops. Behind the scenes, the processing is done by optimised C code.

This lets you write many array operations in simple Python that execute almost as fast as C code, which can make a huge difference when processing a large array, such as an array of image data.

Performing simple maths on an array

Here is a simple example of vectorisation:

import numpy as np

a = np.array([1.1, 3.6, 4.0, 8.2])
b = a + 1

We first create an array with content [1.1, 3.6, 4.0, 8.2]. Then we execute:

b = a + 1

Now because a is a NumPy array, Python uses the NumPy version of the + operator. This operator adds 1 to each element in the array, resulting in this:

b = [2.1 4.6 5.  9.2]

That is vectorisation in a nutshell. You can apply pretty much any Python maths operator to an array, and it will automatically be applied to every element of that array, at lightning speed!
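As a quick sketch of that claim, the same pattern works with other arithmetic operators. The variable names below are illustrative only and do not appear elsewhere in this article:

```python
import numpy as np

a = np.array([1.1, 3.6, 4.0, 8.2])

diff = a - 1      # subtract 1 from every element
half = a / 2      # divide every element by 2
square = a ** 2   # square every element
neg = -a          # negate every element

print(diff)
print(half)
print(square)
print(neg)
```

Each result is a new array with the same shape as a; the original array is left unchanged.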

Here is the equivalent code to do a similar thing with a Python list:

a = [1.1, 3.6, 4.0, 8.2]
b = []

for x in a:
    b.append(x + 1)

Or if you prefer to use a list comprehension:

a = [1.1, 3.6, 4.0, 8.2]
b = [x + 1 for x in a]

For large arrays the NumPy version is not only faster, it is also shorter and more readable!
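If you want to measure the difference yourself, a rough sketch using the standard timeit module might look like this. The array size and repeat count are arbitrary assumptions, and the timings will vary from machine to machine:

```python
import timeit

import numpy as np

n = 100_000  # array size chosen arbitrarily for illustration
data_list = [float(i) for i in range(n)]
data_array = np.array(data_list)


def add_one_list():
    # Pure Python: one interpreter-level addition per element
    return [x + 1 for x in data_list]


def add_one_numpy():
    # Vectorised: the loop runs inside NumPy
    return data_array + 1


# Time 20 runs of each approach
list_time = timeit.timeit(add_one_list, number=20)
numpy_time = timeit.timeit(add_one_numpy, number=20)

print(f"list comprehension: {list_time:.4f} s")
print(f"NumPy:              {numpy_time:.4f} s")
```

For a tiny array like the four-element example above, the difference is negligible; the benefit shows up as the array grows.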

Vectorisation with other data types

In this case we will use the dtype parameter to create an array of 16 bit integers:

a = np.array([1, 3, 4, 8], dtype=np.int16)
b = a * 2

This time we are multiplying a by 2. Again, we are using the NumPy * operator. If a was a list, the multiply operator would do something very different of course (it would repeat the list)! But for a NumPy array, it simply doubles every element:

[ 2  6  8 16]

If you check the dtype of b, you will find it is also int16. The new array takes the type of the original array.
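A minimal check of the dtype, using the same values as above:

```python
import numpy as np

a = np.array([1, 3, 4, 8], dtype=np.int16)
b = a * 2

print(a.dtype)  # int16
print(b.dtype)  # int16
```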

In fact, that isn't quite true. Similar to normal Python arithmetic, ints can be converted to floats automatically when required. So for example:

a = np.array([1, 3, 4, 8], dtype=np.int16)
b = a * 2.1

Because we are multiplying an int by a float, the result is automatically a float, so we get this:

[ 2.1  6.3  8.4 16.8]

Which is a float array.
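You can confirm the conversion by inspecting the dtype of each array. This sketch checks only the kind of the dtype (integer or floating point), on the assumption that the exact float width could differ between NumPy versions:

```python
import numpy as np

a = np.array([1, 3, 4, 8], dtype=np.int16)
b = a * 2.1

print(a.dtype.kind)  # 'i' means signed integer
print(b.dtype.kind)  # 'f' means floating point
print(b.dtype)       # the exact float type chosen by NumPy
```

Note that a itself is not changed; it is still an int16 array.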

Vectorisation with multi-dimensional arrays

We can apply vectorisation to a multi-dimensional array. NumPy just applies the operation to every element in the array, so we don't need to do anything special:

a = np.array([[7, 5, 3],
              [2, 4, 6]], dtype=np.int16)
b = a*2 - 5

This gives:

[[ 9  5  1]
 [-1  3  7]]

Notice also that we have used a compound expression: we are multiplying a by 2, then subtracting 5.
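To make it clear what the vectorised expression is doing, here is a sketch that compares it with an explicit nested loop. The loop version and the name expected are for illustration only; you would not normally write this in NumPy code:

```python
import numpy as np

a = np.array([[7, 5, 3],
              [2, 4, 6]], dtype=np.int16)

# Vectorised version
b = a*2 - 5

# Loop-based equivalent, for comparison only
expected = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        expected[i, j] = a[i, j]*2 - 5

print(b.shape)                      # (2, 3), the same shape as a
print(np.array_equal(b, expected))  # True
```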

Expressions using two arrays

You can use more than one array in a NumPy expression. For example:

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

c = a + b

This performs an element by element addition of the two arrays a and b, to create the result in c. This gives:

c = [11 22 33 44]

This result is obtained by adding each corresponding element in a and b:

The first element of each array (1 and 10) add to give the first element of the result (11). The second element of each array (2 and 20) add to give the second element of the result (22). And so on.

Arrays must have compatible shapes to be combined in this way. Two arrays that have the exact same shape will always be compatible. However, NumPy arrays also support broadcasting. This allows two arrays of different shapes to be matched under specific circumstances, by replicating elements to make them the same shape. This is covered in a later chapter.
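As a small sketch of what happens when the shapes are not compatible, here we try to add a 4-element array to a 3-element array. NumPy raises a ValueError rather than guessing how to line the elements up:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30])  # only 3 elements, so it cannot match a

try:
    c = a + b
except ValueError as err:
    c = None
    print("Cannot add:", err)
```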

Expressions using two multi-dimensional arrays

You can use multi-dimensional arrays in a NumPy expression. For example, here we use 2 by 2 arrays:

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

c = a * b

This performs an element by element multiplication of the two arrays a and b, to create the result in c. This gives:

c = [[ 5 12]
     [21 32]]

This result is obtained by multiplying each corresponding element in a and b:

The element [0, 0] of each array (1 and 5) multiply to give the element [0, 0] of the result (5). The element [0, 1] of each array (2 and 6) multiply to give the element [0, 1] of the result (12). And so on.

If you are familiar with matrix multiplication, note that NumPy array multiplication doesn't work in the same way. You can use the NumPy dot function to perform matrix multiplication, but the * operator always performs element by element multiplication.
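The difference is easy to see by running both on the same pair of arrays. This sketch also uses the @ operator, which is not mentioned above but gives the same result as np.dot for 2-dimensional arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b    # multiplies matching elements
matrix = np.dot(a, b)  # true matrix multiplication
matrix_at = a @ b      # same as np.dot for 2-D arrays

print(elementwise)  # [[ 5 12] [21 32]]
print(matrix)       # [[19 22] [43 50]]
```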

More complex expressions

You can, of course, use expressions that include multiple arrays, for example:

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([10, 11, 12, 13])
d = np.array([14, 15, 16, 17])

e = a * b + c**2 + c**3 + 2*d

As always, the arrays must be compatible, and broadcasting can be used.
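Here is a sketch that evaluates the expression and checks it against a plain Python calculation done one element at a time. The name check is illustrative only:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([10, 11, 12, 13])
d = np.array([14, 15, 16, 17])

# Vectorised: the whole expression is applied element by element
e = a * b + c**2 + c**3 + 2*d

# The same calculation using ordinary Python integers
check = [w*x + y**2 + y**3 + 2*z
         for w, x, y, z in zip(a.tolist(), b.tolist(), c.tolist(), d.tolist())]

print(e)      # [1133 1494 1925 2432]
print(check)  # [1133, 1494, 1925, 2432]
```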

Using conditional operators

You can vectorise conditional operators:

a = np.array([1, 6, 9, 4, 2, 8, 7])
b = a > 5

What will this give us? Well, a regular conditional expression returns a bool value, so a vectorised conditional expression will give us an array of NumPy bools:

b = [False  True  True False False  True  True]

The array b is true for every element of a that is greater than 5, false otherwise.
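As a short sketch, you can inspect the dtype of the result and count how many elements passed the test. The use of np.count_nonzero here is just one convenient way to count the True values:

```python
import numpy as np

a = np.array([1, 6, 9, 4, 2, 8, 7])
b = a > 5

print(b.dtype)              # bool
print(np.count_nonzero(b))  # 4 elements of a are greater than 5

# Other comparison operators are vectorised in the same way
print(a == 4)
print(a <= 2)
```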

Combining conditional operators

You cannot use the Python boolean operators (and, or) directly with a NumPy array. That is to say, the following are not allowed:

a = np.array([1, 2, 3, 4, 5, 6])

b = a > 2 and a < 5  # Not allowed
c = a < 3 or a > 4   # Not allowed

Instead, you must use special NumPy universal functions. For example, np.logical_and is used in place of and. This is covered further in the universal functions article.
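A minimal sketch of the two disallowed expressions rewritten with np.logical_and and np.logical_or. The try block is included only to show the error that plain and produces:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# Using 'and' directly raises an error
try:
    bad = a > 2 and a < 5
except ValueError as err:
    bad = None
    print("Error:", err)

# The NumPy functions work element by element
b = np.logical_and(a > 2, a < 5)
c = np.logical_or(a < 3, a > 4)

print(b)  # [False False  True  True False False]
print(c)  # [ True  True False False  True  True]
```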
