NumPy introduction

By Martin McBride, 2021-09-21
Tags: arrays data types vectorisation pandas matplotlib scipy data science
Categories: numpy


NumPy is a Python package that allows you to efficiently store and process large arrays of numerical data. Obvious examples of this type of data are sound data and image data, but NumPy can also be used anywhere you have large data sets to process.

Part of the attraction of NumPy is that it uses simple and familiar Python syntax to perform complex operations on arrays, which simplifies your code. The other benefit is that NumPy is highly efficient, both in terms of speed and memory usage. These two factors are not unrelated - NumPy provides high-level array operations, and these operations are efficient because, under the hood, the entire processing loop is written in C.

In this tutorial, we will take a quick tour of NumPy arrays.

Before you start, you will need to install NumPy. The official numpy.org site will point you at the latest version, with instructions for installing the package.

NumPy for data science

NumPy is a key library for handling large numerical data sets in Python. It is often used as the interface between other data science libraries such as Pandas, Matplotlib, and SciPy.

Import NumPy

First, of course, you must import the NumPy package. It is common practice to import numpy as np (so that you can use the short name np in your code). You don't have to, but most people who use NumPy do and will recognise the np prefix.

>>> import numpy as np
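
A quick way to confirm that the import worked is to print the installed version. This is only a small check; the variable name version is illustrative, and the string you see depends on your own installation, so no output is shown:

```python
import numpy as np

# np.__version__ is a string; its value depends on the installed release
version = np.__version__
print(version)
```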

Creating NumPy arrays

There are several ways to create NumPy arrays; we will look at just a couple of them here.

You can create an array of zeros using the zeros function, supplying the required array length:

>>> a = np.zeros(5)
>>> print(a)
[0. 0. 0. 0. 0.]

>>> m = np.zeros((3, 4))
>>> print(m)
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

As you can see, we can also create a 2-dimensional array by passing in a tuple such as (3, 4) to specify the number of rows and columns. You can create a 3-dimensional array by passing in a tuple with 3 values, and so on. You can have as many dimensions as you like.
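
As a minimal sketch of the 3-dimensional case, the following creates an array from a tuple with 3 values and inspects it using the ndim, shape and size attributes. The array name c and the sizes chosen are just for illustration:

```python
import numpy as np

# A tuple with three values gives a 3-dimensional array
c = np.zeros((2, 3, 4))

print(c.ndim)   # number of dimensions: 3
print(c.shape)  # length of each dimension: (2, 3, 4)
print(c.size)   # total number of elements: 24
```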

You can also initialise an array from the values in a list, using the array function:

>>> a = np.array([2, 4, 6, 8])
>>> print(a)
[2 4 6 8]

A multidimensional list will create a multidimensional NumPy array:

>>> m = np.array([[1, 2], [3, 4], [5, 6]])
>>> print(m)
[[1 2]
 [3 4]
 [5 6]]

Vectorised operators

When you apply arithmetic operations to NumPy arrays, they are automatically applied to each element individually. This is called vectorisation. Here is a simple example:

>>> x = np.array([1, 3, 5, 7])
>>> y = np.array([0, 1, 2, 3])
>>> z = x * y
>>> print(z)
[ 0  3 10 21]

Each element of z is calculated by multiplying together the corresponding elements of x and y:

  • x[0] is 1, y[0] is 0, so z[0] is 0
  • x[1] is 3, y[1] is 1, so z[1] is 3
  • x[2] is 5, y[2] is 2, so z[2] is 10, etc

This makes your code a lot neater, and it is usually faster too. The implicit loop runs in NumPy's compiled C code, which avoids the per-element overhead of a Python for loop.
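
To make the implicit loop concrete, here is a sketch that compares the vectorised expression with an explicit Python loop that does the same work. The name z_loop is illustrative, and the loop is written out only for comparison; you would not normally write it this way:

```python
import numpy as np

x = np.array([1, 3, 5, 7])
y = np.array([0, 1, 2, 3])

# Vectorised form: one expression, the loop runs inside NumPy
z = x * y

# Equivalent explicit Python loop, for comparison only
z_loop = np.zeros(len(x), dtype=x.dtype)
for i in range(len(x)):
    z_loop[i] = x[i] * y[i]

print(z)                          # [ 0  3 10 21]
print(np.array_equal(z, z_loop))  # True
```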

Universal functions

NumPy has its own versions of common maths functions such as sin, cos and exp, which are applied to each element individually. These are known as universal functions (ufuncs). For example:

>>> a = np.array([1, 4, 9, 16])
>>> b = np.sqrt(a)
>>> print(b)
[1. 2. 3. 4.]

This code applies the square root function to all the elements in a and creates a new NumPy array with the results. As with vectorised operators, the implicit loop is performed very efficiently.
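
The same pattern applies to the other functions mentioned above. This sketch uses np.sin and np.exp on small illustrative arrays (the names angles, s and e are made up for the example). Printed output is not shown because floating-point rounding means some results are very close to, rather than exactly, the mathematical values:

```python
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])

s = np.sin(angles)                # sine of each element
e = np.exp(np.array([0.0, 1.0]))  # e**0 and e**1

print(s)
print(e)

# Compare with the expected values, allowing for rounding error
print(np.allclose(s, [0.0, 1.0, 0.0]))  # True
print(np.allclose(e, [1.0, np.e]))      # True
```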

Slices

You can slice NumPy arrays, just like lists. You can also slice multidimensional arrays. For example, this code inserts a 2 row by 4 column array into the middle two rows of a 4 by 4 array:

>>> a = np.zeros((4, 4))
>>> b = np.array([[1., 2., 3., 4.], [5., 6., 7., 8.]])
>>> a[1:3] = b
>>> print(a)
[[0. 0. 0. 0.]
 [1. 2. 3. 4.]
 [5. 6. 7. 8.]
 [0. 0. 0. 0.]]

You can also insert a 4 by 2 array into the middle two columns of a 4 by 4 array:

>>> a = np.zeros((4, 4))
>>> b = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
>>> a[:,1:3] = b
>>> print(a)
[[0. 1. 2. 0.]
 [0. 3. 4. 0.]
 [0. 5. 6. 0.]
 [0. 7. 8. 0.]]

You can slice in more than one dimension, and copy a slice of one array into a slice of another. This code copies the middle two rows of b (4 elements in total) into the bottom right corner of a:

>>> a = np.zeros((4, 4))
>>> b = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
>>> a[2:4, 2:4] = b[1:3]
>>> print(a)
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 3. 4.]
 [0. 0. 5. 6.]]
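
The examples above all assign to a slice. Slices can also be used to read values, in the same way as list slices. Here is a short sketch using illustrative arrays v and m:

```python
import numpy as np

v = np.array([10, 20, 30, 40, 50])
print(v[1:4])   # elements 1 to 3: [20 30 40]
print(v[::2])   # every second element: [10 30 50]

m = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

block = m[0:2, 1:3]  # rows 0 and 1, columns 1 and 2
column = m[:, 0]     # every row, first column only

print(block.tolist())   # [[2, 3], [6, 7]]
print(column.tolist())  # [1, 5, 9]
```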

Data types

NumPy uses homogeneous arrays (all the elements must be the same type). This is different to a Python list, where different elements of the same list can have different types.

By default, when you create an array with the zeros function (or the similar ones function, which fills the array with 1s), it will contain floating-point values. You can choose a different type by using the dtype parameter. For example, this creates an array of 16-bit integer values, each set to 1:

>>> a = np.ones(4, dtype=np.int16)
>>> print(a)
[1 1 1 1]
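
The choice of type also determines how much memory the array uses. As a rough illustration, the itemsize attribute gives the number of bytes per element and nbytes gives the total for the array data. The names small and large are just for this example:

```python
import numpy as np

small = np.ones(4, dtype=np.int16)    # 16-bit integers, 2 bytes each
large = np.ones(4, dtype=np.float64)  # 64-bit floats, 8 bytes each

print(small.itemsize, small.nbytes)   # 2 8
print(large.itemsize, large.nbytes)   # 8 32
```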

If you create an array using the array function, the data type will depend on the types in the source list. If the source list is all integers, the NumPy array will contain integers. If the list contains any floats, the array will contain floats, with any integer values converted to float. Once again, you can use the dtype parameter to override this.
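
You can check which type was chosen by looking at the dtype attribute. This sketch uses illustrative names (ints, mixed, forced). Note that the exact integer type NumPy picks by default depends on your platform and NumPy version, so it is not shown:

```python
import numpy as np

ints = np.array([1, 2, 3])       # all integers
mixed = np.array([1, 2.5, 3])    # integers and a float
forced = np.array([1, 2, 3], dtype=np.float32)  # type set explicitly

print(ints.dtype)    # an integer type; the width is platform dependent
print(mixed.dtype)   # float64
print(forced.dtype)  # float32
```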

Arrays filled with a value range

You can use the arange function to fill an array with a range of values:

>>> a = np.arange(5)
>>> print(a)
[0 1 2 3 4]

This function can be used with optional start and step arguments, just like the standard range function. Unlike range, it also accepts float values:

>>> a = np.arange(1.0, 3.0, .3)
>>> print(a)
[1.  1.3 1.6 1.9 2.2 2.5 2.8]
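
For comparison, here is a short sketch of arange with integer start, stop and step values, which behaves like range. As with range, the stop value itself is not included:

```python
import numpy as np

a = np.arange(2, 10, 2)   # start at 2, stop before 10, step 2
print(a)                  # [2 4 6 8]

b = np.arange(10, 0, -3)  # a negative step counts downwards
print(b)                  # [10  7  4  1]

# The same values as the built-in range
print(a.tolist() == list(range(2, 10, 2)))  # True
```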

An alternative function, linspace, allows you to specify the exact start and end values, and the exact number of elements, and it will calculate the increment between the values:

>>> a = np.linspace(1.0, 3.0, 5)
>>> print(a)
[1.  1.5 2.  2.5 3. ]
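
If you want to know the increment that linspace calculated, you can pass the optional retstep=True argument, which makes the function return the step along with the array. A minimal sketch, with the name step chosen for illustration:

```python
import numpy as np

# retstep=True returns a tuple of (array, step)
a, step = np.linspace(1.0, 3.0, 5, retstep=True)

print(len(a))       # 5
print(step)         # 0.5
print(a[0], a[-1])  # 1.0 3.0
```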

This has just been a quick introduction to NumPy arrays. You can learn more by following the more detailed articles in the rest of this tutorial, or by visiting the numpy.org site.
