In this section we will look at how to create numpy arrays initialised with data series. A previous article shows how to create simple arrays filled with a constant value.
From an existing sequence
An easy way to create an array based on data is to use the array function:
import numpy as np
k1 = [1, 3, 5, 7]
d1 = np.array(k1)
print(d1)
This creates a numpy array d1 based on the list k1:
[1 3 5 7]
You can do this for more dimensions:
k2 = [[10, 20, 30], [40, 50, 60]]
d2 = np.array(k2)
print(d2)
This prints:
[[10 20 30]
 [40 50 60]]
The array function will accept any sequence, an existing numpy array, or anything that behaves like a numpy array.
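As a small sketch of the point above, the following passes a tuple, a range and an existing array to array. The variable names are illustrative only and are not taken from the article's own examples.

```python
import numpy as np

# A tuple and a range are both ordinary Python sequences
from_tuple = np.array((1, 3, 5, 7))
from_range = np.array(range(4))

# An existing numpy array is accepted too; by default the result is a copy
original = np.array([10, 20, 30])
copied = np.array(original)
copied[0] = 99  # changes the copy only, not the original

print(from_tuple)
print(from_range)
print(original)
print(copied)
```

Because array copies its input by default, changing copied leaves original untouched.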
The arange function works in a similar way to the built-in range function, except that it creates a numpy array. The other difference is that it can work with floating point values:
r1 = np.arange(4.9)
print(r1)
r2 = np.arange(.5, 4.9)
print(r2)
r3 = np.arange(.5, 4.9, 1.3)
print(r3)
r1 uses the default start and step values. It counts from 0.0 up to but not including 4.9, in steps of 1.0:
[0. 1. 2. 3. 4.]
r2 uses the default step value. It counts from 0.5 up to but not including 4.9, in steps of 1.0:
[0.5 1.5 2.5 3.5 4.5]
r3 counts from 0.5 up to but not including 4.9, in steps of 1.3:
[0.5 1.8 3.1 4.4]
A potential problem with arange is that the values of the series are calculated, and therefore may be subject to rounding errors. In particular, the final item in the sequence might not be the exact expected value, and with a non-integer step the number of items can occasionally differ from what you expect.
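One way to keep the number of items predictable, offered here as a suggestion rather than as part of the original examples, is to let arange count in whole numbers and then scale and shift the result. The names start, step and count are illustrative.

```python
import numpy as np

start, step, count = 0.5, 1.3, 4

# Count 0, 1, 2, 3 as integers, then scale and shift.
# The length is fixed by count, not by a floating point comparison.
r = start + step * np.arange(count)

print(r)
print(len(r))
```

The individual values are still floating point calculations, so they may carry small rounding errors, but the length of the result is always exactly count.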
The linspace function creates a series of equally spaced numbers, in a similar way to arange. The difference is that linspace specifies the start and end points, plus the required number of values:
k5 = np.linspace(0, 10, 5)
print(k5)
This prints:
[ 0.   2.5  5.   7.5 10. ]
That is, 5 equally spaced values between 0 and 10, inclusive. Unlike arange, the start and end values will be exactly correct (exactly 0 and 10) because they are specified rather than being calculated.
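The following sketch checks that behaviour for a different range, chosen here purely for illustration: 11 values from 0 to 1, where the spacing of 0.1 cannot be represented exactly in floating point.

```python
import numpy as np

k = np.linspace(0, 1, 11)

# The first and last values are exactly the requested end points
print(k[0] == 0.0)
print(k[-1] == 1.0)

# The gaps between neighbouring values are all (approximately) 0.1
print(np.diff(k))
```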
endpoint parameter for linspace
The endpoint parameter can be set to False to alter the behaviour of linspace (to make it a bit more like arange):
k5 = np.linspace(0, 10, 5, endpoint=False)
print(k5)
In this case, linspace creates 6 equally spaced values, but doesn’t return the final value (so the result still has 5 elements). Here is the result:
[0. 2. 4. 6. 8.]
As you can see, the range is now divided into intervals of 2.0 (rather than 2.5), but the final element is 8.0 rather than 10.0.
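To make the "6 values with the last one dropped" description concrete, this short sketch compares the two forms directly. The variable names are illustrative.

```python
import numpy as np

without_end = np.linspace(0, 10, 5, endpoint=False)
with_end = np.linspace(0, 10, 6)

# Dropping the last of the 6 values leaves the same 5 values
print(without_end)
print(with_end[:-1])
print(np.allclose(without_end, with_end[:-1]))
```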
retstep parameter for linspace
The retstep parameter can be set to True to obtain the step size used by linspace. The sample array and the step are returned as a tuple:
k5, step = np.linspace(0, 10, 5, retstep=True)
print(k5)
print(step)
This prints:
[ 0.   2.5  5.   7.5 10. ] # samples
2.5 # step size
There are various ways to create an array of random numbers in numpy.
If you read the numpy documentation, you will find that most of the random functions have several variants that do more or less the same thing. They might vary in minor ways - parameter order, whether the value range is inclusive or exclusive etc. The basic set described below should be enough to do everything you need, but if you prefer to use the other variants they will deliver the same results.
The random function creates an array of random numbers in the range 0.0 up to but not including 1.0. This means that the range can include anything from 0.0 up to the largest float that is less than 1 (e.g. something like 0.99999999…), but it will never actually include 1.0. In maths we sometimes write this as [0.0, 1.0). The values are distributed uniformly, so every value is equally likely to occur.
r = np.random.random((3, 2))
print(r)
This creates a 3 by 2 array of random numbers, like this (of course you will get different numbers):
[[0.40704545 0.47734427]
 [0.76764629 0.37887717]
 [0.82443478 0.36409071]]
If you want to create random numbers over a different range, for example [a, b), you can do it using vectorised operators like this:
a, b = 2.0, 5.0  # example bounds
r = (b - a)*np.random.random((3, 2)) + a
print(r)
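The same scaling can be wrapped in a small function. The helper name uniform_in_range and the bounds used below are hypothetical, chosen only to illustrate the idea.

```python
import numpy as np

def uniform_in_range(a, b, shape):
    """Hypothetical helper: uniform random values scaled from [0, 1) to the range a to b."""
    return (b - a) * np.random.random(shape) + a

r = uniform_in_range(-2.0, 3.0, (1000,))

# The smallest and largest values should fall between -2.0 and 3.0
print(r.min(), r.max())
```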
The randint function creates an array of integers. In its simplest form it creates values in the range [0, high), that is integers from 0 up to but not including high:
r = np.random.randint(4, size=(3, 4))
print(r)
Notice that size is passed in as a named parameter. Unfortunately it isn’t simply the first parameter, as it is in most numpy functions.
This code, with a value of 4, will create values in the range 0 to 3:
[[3 3 0 3]
 [3 1 3 0]
 [2 3 3 1]]
You can also pass in two values, low and high, resulting in numbers in the range [low, high). For example, to simulate a dice (output values 1 to 6 inclusive), you would use values 1 and 7:
r = np.random.randint(1, 7, size=10)
print(r)
This prints something like:
[1 3 3 5 4 1 2 1 6 4]
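As a rough check of the dice example, this sketch rolls many times and counts how often each face appears. The use of np.bincount for the tally is a choice made here, not something the article covers, and the counts will differ on every run.

```python
import numpy as np

rolls = np.random.randint(1, 7, size=6000)

# bincount tallies each integer value; index 0 is unused for a dice,
# so it is sliced off to leave the counts for faces 1 to 6
counts = np.bincount(rolls, minlength=7)[1:]

print(rolls.min(), rolls.max())
print(counts)
```

With 6000 rolls each face should appear roughly 1000 times, and no roll should ever be 0 or 7.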
The choice function picks values at random from a list (in this case the list is all prime numbers less than 20):
r = np.random.choice([2, 3, 5, 7, 11, 13, 17, 19], size=10)
print(r)
This prints something like:
[17 19 7 5 11 11 2 7 11 3]
There are other options (for example you can set different probabilities for each item in the list) but we won’t cover that here.
The standard_normal function creates values using the standard Normal distribution. The Normal distribution is the classic bell shaped curve; the standard version is centred on zero and has a standard deviation of 1.
r = np.random.standard_normal((3, 3))
print(r)
This prints something like:
[[-0.20059509 -1.70950313  0.1355992 ]
 [-0.84462048  1.27934375  1.30837433]
 [-1.34519813 -1.18474318 -0.83397725]]
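If you need a Normal distribution with a different centre or spread, one approach is to scale and shift the standard values using vectorised operators. This is a sketch only; mu and sigma are illustrative names for the desired mean and standard deviation.

```python
import numpy as np

mu, sigma = 10.0, 2.0  # illustrative mean and standard deviation

# Multiply by sigma to change the spread, then add mu to move the centre
samples = mu + sigma * np.random.standard_normal(100_000)

# With this many samples the results should be close to mu and sigma
print(samples.mean(), samples.std())
```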
If you need a non-standard data series, it will usually be most efficient to use vectorisation if possible.
For example, to create a series containing the cubes of each number: 0, 1, 8, 27, 64… you could do this:
cubes = np.arange(10)**3
print(cubes)
This will normally be a lot quicker than using a Python loop.
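For comparison, this sketch builds the same series both ways and confirms that the results match. The loop version uses a list comprehension, and the variable names are illustrative.

```python
import numpy as np

n = 10

# Vectorised: one operation applied to the whole array
vectorised = np.arange(n)**3

# Loop: each cube is calculated in Python, then converted to an array
looped = np.array([i**3 for i in range(n)])

print(vectorised)
print(np.array_equal(vectorised, looped))
```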