Data types

By Martin McBride, 2019-09-14

Tags: data types efficiency
Categories: numpy

numpy supports five main data types - ints, unsigned ints, floats, complex numbers, and booleans.

Integers

Integers in Python can represent positive or negative numbers of any size. That is because Python integers are objects, and the implementation automatically grabs more memory if necessary to store very large values.

Integers in numpy are very different. An integer occupies a fixed number of bytes. For example, the type np.int32 occupies exactly 4 bytes of memory (a byte contains 8 bits, so 4 bytes is 32 bits, hence int32). These are called primitive types because they aren't objects; they are just data bytes stored directly in memory.
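As a rough illustration of the difference (a minimal sketch; the variable names are made up for this example, and the exact object sizes reported by sys.getsizeof depend on your Python build):

```python
import sys
import numpy as np

# A Python int is an object that grows to hold larger values.
small = 1
big = 2 ** 100
grows = sys.getsizeof(big) > sys.getsizeof(small)

# A numpy int32 always occupies 4 bytes per element.
int32_size = np.dtype(np.int32).itemsize
a = np.array([1, 2, 3], dtype=np.int32)

print(grows)        # True on CPython
print(int32_size)   # 4
print(a.nbytes)     # 12, three elements of 4 bytes each
```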

The reasons for using primitive types are explained in detail in the article on numpy efficiency. In summary:

An array of primitive types takes a lot less memory than a list of Python integer objects.
Accessing primitive values is faster.
Primitive types don't require garbage collection.
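The memory point can be checked with a rough estimate. This is only a sketch: summing sys.getsizeof over the list elements is an approximation that is specific to CPython, and the names used here are illustrative.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# The list holds n pointers, plus a separate int object for each value.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# The array holds n raw 8-byte values in a single block of memory.
array_bytes = np_array.nbytes

print(list_bytes, array_bytes)
```

On a typical 64-bit CPython build the list estimate comes out several times larger than the 8000 bytes used by the array data.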

In fact, numpy provides several different integer sizes:

Type	Bytes	Range
np.int8	1	-128 to 127
np.int16	2	-32768 to 32767
np.int32	4	-2147483648 to 2147483647
np.int64	8	-9223372036854775808 to 9223372036854775807

There are a couple of reasons for this. The first is fairly obvious: if you are using data that has a limited range, there is no point using more memory than you need. For example, sound data is often stored using 16 bits per sample (i.e. the sound is represented by an array of 16 bit values). Storing this data as 64 bit integers would make no sense, because you would be using 4 times as much memory for no reason.
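The saving is easy to measure with the nbytes attribute. The sketch below assumes one second of mono audio at 44100 samples per second, which is just an illustrative figure:

```python
import numpy as np

# One second of hypothetical mono audio, 16 bits per sample.
samples = np.zeros(44100, dtype=np.int16)

# The same data converted to 64 bit integers.
as_int64 = samples.astype(np.int64)

ratio = as_int64.nbytes // samples.nbytes
print(samples.nbytes)   # 88200
print(as_int64.nbytes)  # 352800
print(ratio)            # 4
```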

The second reason is slightly less obvious. Some applications use a mix of Python and C code for efficiency. With numpy, it is possible to pass a pointer to the array data into a C function, so that the C code can access the data in memory without the need to make a copy of it. This can improve efficiency when dealing with very large arrays. For this to work, the data needs to be stored in the format the C code is expecting. So if the C code is expecting an array of 16 bit integers, it is useful to be able to specify that in numpy. We won't be covering that in these tutorials, as it is quite specialised.

Unsigned integers

Unsigned integers are similar to normal integers, but they can only hold non-negative values (zero or greater). Here are the available types:

Type	Bytes	Range
np.uint8	1	0 to 255
np.uint16	2	0 to 65535
np.uint32	4	0 to 4294967295
np.uint64	8	0 to 18446744073709551615

Unsigned integers are useful for data that can never be negative, for example population data. The population of a town can never be less than zero.

The advantage of unsigned data is that it can represent larger positive numbers than signed data. An int8 goes up to 127, but a uint8 goes up to 255.
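The limits can be read from np.iinfo rather than memorised. The short sketch below also shows one consequence of fixed-size storage that is worth knowing about: array arithmetic that goes past the maximum wraps around rather than growing. The variable names are illustrative.

```python
import numpy as np

signed_max = np.iinfo(np.int8).max
unsigned_min = np.iinfo(np.uint8).min
unsigned_max = np.iinfo(np.uint8).max
print(signed_max, unsigned_min, unsigned_max)  # 127 0 255

# Adding 1 to the largest uint8 value wraps around to 0.
top = np.array([255], dtype=np.uint8)
one = np.array([1], dtype=np.uint8)
wrapped = top + one
print(wrapped)  # [0]
```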

Floats

numpy floating point numbers also have different sizes (usually called precisions). There are two main types:

Type	Bytes	Range	Precision
np.float32	4	±1.18×10⁻³⁸ to ±3.4×10³⁸	7 to 8 decimal digits
np.float64	8	±2.23×10⁻³⁰⁸ to ±1.80×10³⁰⁸	15 to 16 decimal digits

float64 numbers store floating point numbers in the same way as a Python float value. They are sometimes called double precision.

float32 numbers take half as much storage as float64, but they have considerably smaller range and precision. They are sometimes called single precision.
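A small sketch of the difference in precision, using np.finfo to inspect each type. The value 1e-8 is chosen only because it is small enough to be lost in float32 but not in float64:

```python
import numpy as np

f32 = np.finfo(np.float32)
f64 = np.finfo(np.float64)
print(f32.bits, f32.max, f32.eps)
print(f64.bits, f64.max, f64.eps)

# Adding 1e-8 to 1 is lost in single precision ...
lost = np.float32(1.0) + np.float32(1e-8) == np.float32(1.0)
# ... but is kept in double precision.
kept = np.float64(1.0) + np.float64(1e-8) != np.float64(1.0)
print(lost, kept)  # True True
```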

Complex numbers

A complex number consists of two floating point numbers, one representing the real part and one representing the imaginary part. If you have not met complex numbers before, the Wikipedia article on complex numbers is a good introduction.

Type	Bytes	Precision
np.complex64	8	Two 32-bit floats
np.complex128	16	Two 64-bit floats

complex128 is equivalent to the Python complex type.
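As a quick check of the table above (a minimal sketch with made-up values), the real and imaginary parts of a complex64 array are each stored as float32:

```python
import numpy as np

z = np.array([1 + 2j, 3 - 4j], dtype=np.complex64)
print(z.itemsize)                  # 8 bytes per element
print(z.real.dtype, z.imag.dtype)  # float32 float32

w = np.array([1 + 2j], dtype=np.complex128)
print(w.itemsize)                  # 16 bytes per element

# The Python complex type maps to complex128.
same_as_python = np.dtype(complex) == np.complex128
print(same_as_python)              # True
```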

Booleans

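numpy supports boolean values with the type np.bool_ (the name np.bool was removed in numpy 1.24 and returned as an alias for np.bool_ in numpy 2.0). A bool is one byte in size, with 0 representing false and any non-zero value representing true.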
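A minimal sketch of the conversion rule, using np.bool_ because that name is available across numpy versions:

```python
import numpy as np

# Zero becomes False, any non-zero value becomes True.
flags = np.array([0, 1, 5, -2], dtype=np.bool_)
print(flags)           # [False  True  True  True]
print(flags.itemsize)  # 1 byte per element
```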

Setting the data type

All of the functions available for creating numpy arrays have an optional parameter dtype that allows you to specify the data type (such as np.uint8 or np.float64). For example:

a = np.zeros((2, 3), dtype=np.int32)

This creates an array of zeros with 2 rows and 3 columns and data type int32:

[[0 0 0]
 [0 0 0]]
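The same parameter works with other creation functions, and the dtype attribute tells you what an array contains. The sketch below also uses astype, which returns a converted copy; the variable names are illustrative.

```python
import numpy as np

a = np.zeros((2, 3), dtype=np.int32)
b = np.ones(4, dtype=np.float32)
c = np.array([1, 2, 3], dtype=np.uint8)
print(a.dtype, b.dtype, c.dtype)  # int32 float32 uint8

# astype returns a new array with the requested type; c is unchanged.
d = c.astype(np.float64)
print(d.dtype, c.dtype)           # float64 uint8
```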

System dependent types

numpy also provides a number of types that don't specify a particular size. These include np.byte, np.short, np.intc, np.int_ and np.longlong, amongst others. There are also unsigned versions such as np.ubyte and np.ushort. (Older code may use np.int, but that was only an alias for the Python int type and was removed in numpy 1.24.)

These types have system dependent sizes. For example np.int_ might be equivalent to np.int32 or np.int64 depending on the system it is running on. It depends on the type of processor, the type of operating system, and perhaps the version of the operating system.

In general, don't use these types. They are provided for situations where numpy is passing data in memory to a library written in C. For historical reasons, C has always had system dependent types like int and short whose exact size can vary between systems. If you were interfacing to such a library you would need to use compatible types. Unless you are using a library that specifically tells you to use these types, don't use them. Stick to the fixed-size types shown above instead.
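If you are curious what these types map to on your own machine, a sketch like the following prints their sizes. The output is deliberately not shown, because it varies between systems and numpy versions.

```python
import numpy as np

# Names of some C-style types that exist in both numpy 1.x and 2.x.
names = ("byte", "short", "intc", "int_", "longlong")

sizes = {}
for name in names:
    dt = np.dtype(getattr(np, name))
    sizes[name] = dt.itemsize
    # dt.name is the fixed-size type this maps to on the current system.
    print(name, dt.itemsize, dt.name)
```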

Data ordering

Some functions (such as zeros used above) allow you to select an order for the data. The choices are C-style or Fortran-style ordering (sometimes a couple of other variants too). Again, these options are intended for use if you are passing data in memory to a library written in C (or even Fortran). Unless you have good reason to change it, just use the default option.
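To see what the option changes, the sketch below creates the same 2 by 3 int32 array in both orders and prints the strides, which are the number of bytes to step to reach the next row and the next column:

```python
import numpy as np

c_arr = np.zeros((2, 3), dtype=np.int32)             # default, order="C"
f_arr = np.zeros((2, 3), dtype=np.int32, order="F")  # Fortran-style

# C order stores each row contiguously; Fortran order stores each column contiguously.
print(c_arr.strides)  # (12, 4)
print(f_arr.strides)  # (4, 8)
print(c_arr.flags["C_CONTIGUOUS"], f_arr.flags["F_CONTIGUOUS"])  # True True
```

Both arrays hold the same values and behave the same way when indexed; only the layout in memory differs.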