Creating data series in numpy

By Martin McBride, 2019-09-15
Tags: arrays data types arange linspace vectorisation
Categories: numpy


In this section we will look at how to create numpy arrays initialised with data series.

arange

arange works in a similar way to the built-in range function, except that it creates a numpy array. The other difference is that it can work with floating point values:

import numpy as np

r1 = np.arange(4.9)
print(r1)
r2 = np.arange(.5, 4.9)
print(r2)
r3 = np.arange(.5, 4.9, 1.3)
print(r3)

r1 uses the default start and step values. It counts from 0.0 up to but not including 4.9, in steps of 1.0:

[0. 1. 2. 3. 4.]

r2 uses the default step value. It counts from 0.5 up to but not including 4.9, in steps of 1.0:

[0.5 1.5 2.5 3.5 4.5]

r3 counts from 0.5 up to but not including 4.9, in steps of 1.3:

[0.5 1.8 3.1 4.4]
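With integer arguments, arange follows the same start, stop and step rules as range, including a negative step for counting down. The sketch below compares the two. The particular numbers are arbitrary and were chosen only for illustration.

```python
import numpy as np

# Integer arguments: same start/stop/step rules as the built-in range
up = np.arange(2, 12, 3)
print(up)                      # [ 2  5  8 11]
print(list(range(2, 12, 3)))   # [2, 5, 8, 11]

# A negative step counts down, still stopping before the stop value
down = np.arange(10, 0, -3)
print(down)                    # [10  7  4  1]
```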

Setting the type

You can set the type of the array using the dtype parameter of arange:

i1 = np.arange(5, dtype=np.int8)
print(i1)

This creates an array of 8-bit integers:

[0 1 2 3 4]

All the functions described in this section support dtype. The types available are described in data types.
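The dtype parameter accepts either a numpy type object, such as np.int8, or the type's name as a string, such as 'int8'. The short sketch below shows both forms, and also dtype being passed to linspace. The values themselves are arbitrary.

```python
import numpy as np

a = np.arange(5, dtype=np.int8)     # numpy type object
b = np.arange(5, dtype='int8')      # equivalent string name
print(a.dtype, b.dtype)             # int8 int8

# linspace accepts dtype too; here the values are stored as 32-bit floats
c = np.linspace(0, 1, 5, dtype=np.float32)
print(c.dtype)                      # float32
print(c)
```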

arange and rounding errors

There is a potential problem with arange when using floating point values. Consider this:

r2 = np.arange(0, 6, 1.2)

This creates an array:

[0.  1.2 2.4 3.6 4.8]

That is what you would expect. The next value would be 6.0, and since arange counts up to but not including 6.0, the array has only 5 elements.

A problem could occur if a rounding error caused the final calculation to be very slightly wrong, for example 5.999999999999999. Since that is less than 6.0, the final element would be included in the array, so it would now have 6 elements.

This means that, in some cases, the length of the array could change because of tiny rounding errors. One way to avoid this is to use linspace.
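This problem does occur in practice. arange works out the length of its result as ceil((stop - start) / step), so a rounding error in that division can add an extra element. The sketch below uses 1 to 1.3 in steps of 0.1, a case where this happens with standard 64-bit floats, and then builds a similar series with linspace (described in the next section), which takes the number of values directly.

```python
import numpy as np

# (1.3 - 1) / 0.1 evaluates to 3.0000000000000004 in floating point,
# so arange rounds the length up to 4 and includes a value at about 1.3,
# even though the stop value is 1.3
a = np.arange(1, 1.3, 0.1)
print(len(a))   # 4
print(a)

# linspace is given the number of values, so the length cannot drift.
# endpoint=False gives arange-like "up to but not including" behaviour.
b = np.linspace(1, 1.3, 3, endpoint=False)
print(len(b))   # 3
print(b)
```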

linspace

linspace creates a series of equally spaced numbers, in a similar way to arange. The difference is that linspace takes the start and end points plus the number of values required, rather than a step size:

k5 = np.linspace(0, 10, 5)
print(k5)

This prints:

[ 0.   2.5  5.   7.5 10. ]

That is, 5 equally spaced values between 0 and 10, inclusive. Unlike arange, the start and end values will be exactly correct (exactly 0 and 10) because they are specified rather than being calculated. You will also get exactly the required number of elements in the array.
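Because linspace counts values rather than intervals, n values give n - 1 gaps. To get a particular spacing with both ends included, you therefore need one more value than the number of steps. Here is a small sketch using 0 to 1 with a spacing of 0.1 as an arbitrary example. The use of round() to compute the count is an illustrative choice, not something linspace requires.

```python
import numpy as np

start, stop, step = 0.0, 1.0, 0.1

# 10 intervals of 0.1 need 11 values when both ends are included.
# round() guards against (stop - start) / step being slightly off a whole number.
num = round((stop - start) / step) + 1
x = np.linspace(start, stop, num)

print(num)          # 11
print(x[0], x[-1])  # 0.0 1.0
```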

endpoint parameter for linspace

endpoint can be set to False to alter the behaviour of linspace (making it a bit more like arange):

k5 = np.linspace(0, 10, 5, endpoint=False)
print(k5)

In this case, linspace creates 6 equally spaced values, but doesn't return the final value (so the result still has 5 elements). Here is the result:

[0. 2. 4. 6. 8.]

As you can see, the range is now divided into intervals of 2.0 (rather than 2.5), but the final element is 8.0 rather than 10.0.
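With endpoint=False the spacing is (stop - start) / num instead of (stop - start) / (num - 1), so when the range divides evenly the result matches the equivalent arange call. A quick check follows. It uses np.allclose rather than == so that tiny rounding differences would not cause a false mismatch.

```python
import numpy as np

lin = np.linspace(0, 10, 5, endpoint=False)   # spacing (10 - 0) / 5 = 2.0
ar = np.arange(0, 10, 2.0)                    # explicit step of 2.0

print(lin)                    # [0. 2. 4. 6. 8.]
print(np.allclose(lin, ar))   # True
```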

retstep parameter for linspace

retstep can be set to True to obtain the step size used by linspace. The sample array and the step are returned as a tuple:

k5, step = np.linspace(0, 10, 5, retstep=True)
print(k5)
print(step)

This prints:

[ 0.   2.5  5.   7.5 10. ]  # samples
2.5                         # step size
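retstep also makes it easy to confirm how the step is calculated in each mode. The sketch below reuses the 0 to 10, 5-value example and compares the returned step against the two formulas. The variable names are only for illustration.

```python
import numpy as np

start, stop, num = 0, 10, 5

_, step_incl = np.linspace(start, stop, num, retstep=True)
_, step_excl = np.linspace(start, stop, num, endpoint=False, retstep=True)

# With the endpoint included, num values span num - 1 intervals
print(step_incl, (stop - start) / (num - 1))   # 2.5 2.5
# Without it, the range is divided into num intervals
print(step_excl, (stop - start) / num)         # 2.0 2.0
```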

Using vectorisation

If you need a non-standard data series, it is usually most efficient to create it using vectorisation, where possible.

For example, to create a series containing the cube of each number (0, 1, 8, 27, 64, ...) you could do this:

cubes = np.arange(10)**3
print(cubes)

This will normally be a lot quicker than using a Python loop.
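As a rough comparison, the sketch below builds the same series of cubes with a Python loop and with vectorised arithmetic, checks that the results agree, and times both with timeit. Timings depend on the machine and the array size, so no figures are quoted here. The size of 100,000 and the explicit int64 type (chosen so the cubes cannot overflow on platforms where the default integer is 32 bits) are assumptions made for illustration.

```python
import timeit
import numpy as np

n = 100_000

def cubes_loop():
    # Build a Python list one element at a time, then convert it to an array
    return np.array([i**3 for i in range(n)], dtype=np.int64)

def cubes_vectorised():
    # One array operation applied to every element at once
    return np.arange(n, dtype=np.int64)**3

# Both approaches give the same series
assert np.array_equal(cubes_loop(), cubes_vectorised())

print("loop:      ", timeit.timeit(cubes_loop, number=10))
print("vectorised:", timeit.timeit(cubes_vectorised, number=10))
```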

