NumPy introduction

By Martin McBride, 2021-09-21

NumPy is a Python package that allows you to efficiently store and process large arrays of numerical data. Obvious examples of this type of data are sound data and image data, but NumPy can also be used anywhere you have large data sets to process.

Part of the attraction of NumPy is that it uses simple and familiar Python syntax to perform complex operations on arrays, which simplifies your code. The other benefit is that NumPy is highly efficient, both in terms of speed and memory usage. These two factors are not unrelated - NumPy provides high-level array operations, and these operations are efficient because, under the hood, the entire processing loop is written in C.

In this tutorial, we will take a quick tour of NumPy arrays.

Before you start, you will need to install NumPy. The official numpy.org site will point you at the latest version, with instructions for installing the package.

NumPy for data science

NumPy is a key library for handling large numerical data sets in Python. It is often used as the interface between other data science libraries such as Pandas, Matplotlib, and SciPy.

Import NumPy

First, of course, you must import the NumPy package. It is common practice to import numpy as np (so that you can use the short name np in your code). You don't have to, but most people who use NumPy do and will recognise the np prefix.

>>> import numpy as np

Creating NumPy arrays

There are several ways to create NumPy arrays. We will just look at a couple of methods here.

You can create an array of zeros using the zeros function, supplying the required array length:

>>> a = np.zeros(5)
>>> print(a)
[0. 0. 0. 0. 0.]

>>> m = np.zeros((3, 4))
>>> print(m)
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

As you can see, we can also create a 2-dimensional array by passing in a tuple such as (3, 4) to specify the number of rows and columns. You can create a 3-dimensional array by passing in a tuple with 3 values, and so on. You can have as many dimensions as you like.
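As a small sketch of the 3-dimensional case described above, the following creates an array from a 3-value tuple and inspects it. The variable name cube is just an illustrative choice; ndim, shape and size are standard array attributes.

```python
import numpy as np

# A 3-dimensional array of zeros: 2 blocks, each with 3 rows and 4 columns
cube = np.zeros((2, 3, 4))

print(cube.ndim)   # number of dimensions: 3
print(cube.shape)  # size along each dimension: (2, 3, 4)
print(cube.size)   # total number of elements: 24
```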

You can also initialise an array from the values in a list, using the array function:

>>> a = np.array([2, 4, 6, 8])
>>> print(a)
[2 4 6 8]

A multidimensional list will create a multidimensional NumPy array:

>>> m = np.array([[1, 2], [3, 4], [5, 6]])
>>> print(m)
[[1 2]
 [3 4]
 [5 6]]
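To confirm how the nested list maps onto rows and columns, you can check the shape of the array and pick out individual elements. This is only a minimal sketch that reuses the array m from the example above:

```python
import numpy as np

m = np.array([[1, 2], [3, 4], [5, 6]])

print(m.shape)   # (3, 2): 3 rows, 2 columns
print(m[2, 1])   # row 2, column 1 (counting from 0): 6
print(m[0])      # the first row, as a 1-dimensional array: [1 2]
```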

Vectorised operators

When you apply arithmetic operations to NumPy arrays, they are automatically applied to each element individually. This is called vectorisation. Here is a simple example:

>>> x = np.array([1, 3, 5, 7])
>>> y = np.array([0, 1, 2, 3])
>>> z = x * y
>>> print(z)
[ 0  3 10 21]

Each element of z is calculated by multiplying together the corresponding elements of x and y:

x[0] is 1, y[0] is 0, so z[0] is 0
x[1] is 3, y[1] is 1, so z[1] is 3
x[2] is 5, y[2] is 2, so z[2] is 10, etc

This makes your code a lot neater, and it is usually faster too. The implicit loop runs in NumPy's compiled C code rather than in a Python for loop.
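For comparison, here is a sketch of the same calculation written as an explicit Python loop, alongside the vectorised form. The names z_loop and z_vec are illustrative only. The last two lines assume nothing beyond standard NumPy behaviour: other arithmetic operators, and operations with a single number, are applied element by element in the same way.

```python
import numpy as np

x = np.array([1, 3, 5, 7])
y = np.array([0, 1, 2, 3])

# Explicit Python loop: multiply the arrays one element at a time
z_loop = []
for i in range(len(x)):
    z_loop.append(x[i] * y[i])

# Vectorised version: one expression, no visible loop
z_vec = x * y

print(np.array_equal(z_vec, z_loop))  # True

# Other operators are also applied element by element
print(x + 10)      # [11 13 15 17]
print(x * 2 - y)   # [ 2  5  8 11]
```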

Universal functions

NumPy has its own versions of common maths functions such as sin, cos, and exp, which are applied to each element individually. These are called universal functions (ufuncs). For example:

>>> a = np.array([1, 4, 9, 16])
>>> b = np.sqrt(a)
>>> print(b)
[1. 2. 3. 4.]

This code applies the square root function to all the elements in a and creates a new NumPy array with the results. As with vectorised operators, the implicit loop is performed very efficiently.
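The sketch below compares np.sqrt with the equivalent element-by-element calculation using the standard math module, then applies np.sin to an array of angles. The names b_loop and angles are illustrative, not part of NumPy.

```python
import math
import numpy as np

a = np.array([1, 4, 9, 16])

# Universal function: applied to every element in one call
b = np.sqrt(a)

# The same result, computed one element at a time with math.sqrt
b_loop = np.array([math.sqrt(v) for v in a])
print(np.allclose(b, b_loop))  # True

# Other universal functions work the same way
angles = np.array([0.0, np.pi / 2, np.pi])
print(np.sin(angles))  # approximately 0, 1, 0 (with a tiny rounding error)
```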

Slices

You can slice NumPy arrays, just like lists. You can also slice multidimensional arrays. For example, this code inserts a 2 row by 4 column array into the middle two rows of a 4 by 4 array:

>>> a = np.zeros((4, 4))
>>> b = np.array([[1., 2., 3., 4.], [5., 6., 7., 8.]])
>>> a[1:3] = b
>>> print(a)
[[0. 0. 0. 0.]
 [1. 2. 3. 4.]
 [5. 6. 7. 8.]
 [0. 0. 0. 0.]]

You can also insert a 4 by 2 array into the middle two columns of a 4 by 4 array:

>>> a = np.zeros((4, 4))
>>> b = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
>>> a[:,1:3] = b
>>> print(a)
[[0. 1. 2. 0.]
 [0. 3. 4. 0.]
 [0. 5. 6. 0.]
 [0. 7. 8. 0.]]

You can slice in more than one dimension, and copy a slice of one array into a slice of another. This code copies the middle two rows of b (4 elements) into the bottom right corner of a:

>>> a = np.zeros((4, 4))
>>> b = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
>>> a[2:4, 2:4] = b[1:3]
>>> print(a)
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 3. 4.]
 [0. 0. 5. 6.]]
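The examples above all assign into a slice. Reading a slice uses the same notation. The following sketch shows list-style slicing on a 1-dimensional array and slicing in two dimensions. The names v and grid are illustrative only.

```python
import numpy as np

# 1-dimensional slices use the same start:stop:step notation as lists
v = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(v[2:5])    # [2 3 4]
print(v[::2])    # every second element: [0 2 4 6 8]
print(v[-3:])    # the last three elements: [7 8 9]

# 2-dimensional slices take one slice per dimension: rows, then columns
grid = np.array([[0, 1, 2, 3],
                 [4, 5, 6, 7],
                 [8, 9, 10, 11],
                 [12, 13, 14, 15]])
middle = grid[1:3, 1:3]   # the central 2 by 2 block
first_col = grid[:, 0]    # every row, first column
print(middle)
print(first_col)
```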

Data types

NumPy uses homogeneous arrays (all the elements must be the same type). This is different to a Python list, where different elements of the same list can have different types.

By default, when you create an array with the zeros function, it will contain floating-point values. The same applies to the ones function, which fills the array with ones instead. You can choose a different type by using the dtype parameter. For example, this creates an array of ones with 16-bit integer values:

>>> a = np.ones(4, dtype=np.int16)
>>> print(a)
[1 1 1 1]

If you create an array using the array function, the data type will depend on the types in the source list. If the source list is all integers, the NumPy array will contain integers. If the list is all floats, the array will contain floats. If the list is a mixture of integers and floats, the array will contain all floats, with the integer values converted to float. Once again, you can use the dtype parameter to override this.
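You can check these rules by inspecting the dtype attribute of each array. This is a minimal sketch with illustrative variable names. Note that the exact integer type NumPy chooses by default (for example int64 or int32) can depend on your platform and NumPy version.

```python
import numpy as np

ints = np.array([1, 2, 3])           # all integers
floats = np.array([1.5, 2.5, 3.5])   # all floats
mixed = np.array([1, 2.5, 3])        # mixture: integers are converted to float
forced = np.array([1, 2, 3], dtype=np.float32)  # dtype overrides the default

print(ints.dtype)    # an integer type; the exact width is platform dependent
print(floats.dtype)  # float64
print(mixed.dtype)   # float64
print(forced.dtype)  # float32
print(mixed)         # the integers 1 and 3 are stored as 1.0 and 3.0
```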

Arrays filled with a value range

You can use the arange function to fill an array with a range of values:

>>> a = np.arange(5)
>>> print(a)
[0 1 2 3 4]

This function accepts optional start and step arguments, just like the standard range function. Unlike range, it also accepts float values:

>>> a = np.arange(1.0, 3.0, .3)
>>> print(a)
[1.  1.3 1.6 1.9 2.2 2.5 2.8]

An alternative function, linspace, allows you to specify the exact start and end values, and the exact number of elements, and it will calculate the increment between the values:

>>> a = np.linspace(1.0, 3.0, 5)
>>> print(a)
[1.  1.5 2.  2.5 3. ]
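The sketch below contrasts the two functions. With arange you choose the step and the end value is excluded, as with range. With linspace you choose the number of elements, both end values are included, and the increment works out as (end - start) / (count - 1). The variable names are illustrative only.

```python
import numpy as np

# arange: start, stop, step. The stop value itself is not included.
evens = np.arange(2, 11, 2)
print(evens)   # [ 2  4  6  8 10]

# linspace: start, end, number of elements. Both ends are included.
points = np.linspace(1.0, 3.0, 5)

# The increment linspace uses for 5 elements between 1.0 and 3.0
step = (3.0 - 1.0) / (5 - 1)
print(step)    # 0.5

# Every gap between neighbouring elements matches that increment
print(np.allclose(np.diff(points), step))  # True
```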

This has just been a quick introduction to NumPy arrays. You can learn more by following the more detailed articles in the rest of this tutorial, or by visiting the numpy.org site.
