Low level code optimisation

By Martin McBride, 2022-03-16

Tags: optimisation
Categories: python language intermediate python

This article looks at low-level code optimisation techniques. These techniques often involve changing individual lines of code to do the same thing in a slightly different way, with the aim of making the code run faster.

This type of optimisation should not be taken lightly, and should only be applied when strictly necessary. These techniques have several downsides:

It takes longer to write highly optimised code.
It often makes the code more difficult to understand.
They need to be carefully maintained. A developer who doesn't understand the optimisations might accidentally undo them in a naive attempt to improve the code.
When applied in the wrong situation, they might well make the code slower.

These optimisations should not be applied in every situation, nor should they be applied on a hunch that they might help performance. At best you might be wasting your time, at worst you will be making the code slower.

They should only be used when absolutely necessary.

1 Profile and experiment

If code is running slower than expected or required, it is important to find out which parts of the code are taking all the time. If the code is taking an hour to process some data, there is no point optimising a line of code that only gets called once right at the start and takes a few microseconds of CPU time.

There are several ways to identify the areas of code that might be problematic.

It is important to perform these steps using realistic data, and preferably multiple sets of data. Using unrepresentative data can sometimes give misleading results.

The first is to simply observe where the program gets stuck. This might be something very obvious, for example the program start to run slowly when it processes a particular type of file, or the program takes a long time to start up but runs fine after that. It might be something more subtle. These things might help you identify the areas to concentrate on.

The next stage might be to add logging statements at key points, for example the entry points to the main sections of code, and at the start of any loops you suspect might be slow. If you are not using a logger, then print statements are a good alternative. This is often an iterative process, starting with the top-level code then going into more detail in the parts of code that seem to be taking the most time.

Bear in mind that if some part of the code is creating lots of log messages (hundreds per second or more) this alone might slow the program down.

In many cases, you will find that the cause of the slowdown is either:

A loop that executes many, many times and takes too long. You can try the techniques here, or if that fails it might require a more structural solution such as using a different algorithm or finding a library that performs the calculation in C code.
The code spends time waiting for something, for example waiting for data from the network. You can investigate a non-blocking solution, or using multi-threading to allow the program to continue working until the data arrives.

The rest of this article is mainly about the first case, where your code is stuck in a loop and the solution is to make the loop run faster.

At this point, you might want to attempt some profiling. This allows you to find out exactly how slow your code is, so you can try different techniques and check which methods give the greatest improvement. This isn't always necessary, for example if the loop has an obvious problem then you can just go ahead and fix it.

If you do need to go down the profiling route, there are several methods:

Adding a timestamp to the logging messages you already have is a quick and dirty method, but be aware that it can distort the timings if you have a high rate of messages.
The timeit module is useful if you need to speed up a block of code that is fairly isolated and can be run independently of the main application. timeit will execute and time a code fragment.
The cProfile module allows you to profile an entire program and will tell you how long the program spent in each function call.

Using these techniques, you should be able to identify the slow part of your code and evaluate different improvements to see which works best.

2 Exit early

Consider this code:

for x in k:
    stmt1
    stmt2
    if cond:
        break
    stmt3
    stmt4

In this code, we loop over a list k. In each cycle:

We execute some initial code stmt1, stmt2.
We check the condition cond and exit the current iteration early if it is true (using break).
We then execute stmt3, stmt4, only of cond was false.

For this code to be optimal, we need to ensure that the initial code only contains the things that have to be run every time. If there are things in the initial code that don't need to be run if cond is true, then we are wasting time executing them. They should be placed after the if statement.

This might seem obvious, but it can easily be missed in complex code. In particular, if the initial code makes calls to other functions, you should check that those functions aren't doing unnecessary work.

You should also check the if statement if it uses compound conditions:

if cond1 and cond2:
    ...

In this case, Python first evaluates cond1. If cond1 is false, Python won't evaluate cond2 at all. It doesn't need to, because if the first condition is false the entire condition must be false. This is called short circuit evaluation.

This means that swapping cond1 and cond2 will potentially make a difference to how fast the code runs, and if the conditions involve calling a slow function it might make a big difference.

This means that:

If one condition is quick to evaluate, and one is slower, do the quick one first, because then you might not need to do the slow one at all.
If one condition is almost always false, do that one first, because then you won't usually need to evaluate the other one.

A similar situation exists for or conditions, except that the cond2 is not evaluated if cond1 is true. As ever, it is worth testing to be sure the optimisation is worthwhile.

This optimisation is potentially fragile, in that a future developer might change the code without understanding the importance of the order of operations.

3 Use iterators rather than indexes for looping

When looping over a list or other sequence, we should always try to use the iterator form rather than the index form:

# Don't do this
for i in in range(len(k)):
    print(k[i])

# Do this
for x in k:
    print(x)

Apart from being shorter and more pythonic, the second form is faster. But there are a few situations where it might appear that we need to use indexing. We usually don't - Python has some built-in functions to help us.

The first is where we need to use the index within the loop for any reason. In this example, we will print the index along with the value:

# Don't do this
for i in in range(len(k)):
    print(i, k[i])

# Do this
for i, x in in enumerate(k):
    print(i, x)

The enumerate function returns a sequence of tuples each containing the index and value of the next element. These values are unpacked into the variables i and x.

The other situation is where we need to loop over two (or more) lists together. In this example, we will print corresponding values from lists p and q:

# Don't do this
for i in in range(len(p)):
    print(p[i], q[i])

# Do this
for a, b in zip(p, q):
    print(a, b)

The enumerate function returns a sequence of tuples each containing the next value from p and the next value from q. These values are unpacked into the variables a and b.

These will handle most cases, with no real disadvantages. There may still be cases where you have to loop over indexes, but they should be quite rare.

4 Reduce the number of function calls

Calling a function takes time. We don't (and shouldn't) usually worry about it, because it doesn't take that long, and using functions can make our code much more readable, maintainable and reusable. In general code, functions are all good.

However, in some types of loops, we need to consider the time taken to call a function. The time to execute a loop can be expressed, roughly, as:

total_time = number_of_iterations * time_per_iteration

Sometimes we will see loops that don't do much (the time_per_iteration is very small), but if the number_of_iterations is huge the loop can still take a long time. If the body of the loop makes function calls, that time might be significant.

A special case of this is if we are looping over a sequence, calling a function once per element:

def f1(element):
    do_stuff_with_element

for x in long_list:
    f1(x)

We can reduce the number of function calls by putting the for loop inside the function:

def f1(list_of_elements):
    for element in list_of_elements:
        do_stuff_with_element

f1(long_list)

The general rule here is to put loops inside functions rather than putting functions inside loops. Or put another way, we should try to pass aggregate data (lists etc) into functions.

A second technique is more general but also a little more drastic. Consider this code:

def f2():
    do_stuff2

def f3():
    do_stuff3

while cond:
    f2()
    f3()

We can remove the function calls by putting the code directly in the loop:

while cond:
    do_stuff2
    do_stuff3

Of course, this means our loop has a blob of unstructured code rather than some nice clean function calls, but if it makes the difference between the code being fast enough or not, then it might be the only option.

As with all these examples, measure the resulting performance and decide if it is worth it, then add comments or other documentation to explain why the code is the way it is.

5 Cache attributes locally

There are various types of variables in Python, for example:

Local variables, defined and used within a function.
Global variables, defined outside of a function, and accessible from anywhere, even from other modules.
Instance variables, defined in a class and accessible via objects of that class.

Accessing local variables is usually slightly faster, so if you have code that makes frequent access to other types of variables, it can be worth making local copies outside of the loop.

For example:

global_var = 0

def access_global():
    for i in range(1000):
        calculation_with_global_var

def cache_global():
    x = global_var
    for i in range(1000):
        calculation_with_x
    global_var = x

The first function, access_global does its work directly with global_var.

The second function, cache_global copies global_var to a local variable x, does the same work with x, then copies the result back to global_var. This will be slightly faster than

We can do something similar with methods of objects:

k = [1, 2, 3, 4]
ki = k.insert

ki(2, 'a')  # Same as k.insert(2, 'a')

Here we have created ki, a function that is equivalent to the insert method of the object k. This means that when we call the ki function we don't need to specify that we wish to operate on the k object, because the ki function is already bound to the k object.

These techniques can shave time off critical code, but they should only be used when necessary. Caching attributes creates code that can be confusing to read. And when aliasing variables, we must remember to copy the result back to the original variable at the end.

6 Avoid excessive abstraction in time-critical code

This is really a generalisation of the previous two items.

Deep levels of abstraction include:

Deeply nested functions - functions that call functions that call functions.
Deeply nested objects - objects that are members of other objects that are members of other objects.
A combination of the above.

Abstraction is good. It is a key part of writing clean, maintainable code. Every level of abstraction will make the code slightly slower, but most of the time this doesn't really matter. If the code is waiting for the user to press a key, or for a byte to be read from disk, then taking a few nanoseconds longer to process that event will make no difference.

But in time-critical areas of the code, it may well be worth limiting the number of levels of abstraction, so that we don't need to do too many of the hacks we discussed earlier.

7 Multiple assignments

Python allows multiple assignments in one statement. For example here are two different ways to assign values to a, b, and c:

# Individual assignments
a = 1
b = 2
c = 2

# Equivalent multiple assignments
a, b, c = 1, 2, 3

In most cases, this is a matter of style. However the second method might be slightly faster, so should be considered if the code needs to be optimised.

8 Lazy importing

This final item is more about latency than actual speed.

We normally import all the modules we need at the top of our Python file:

import numpy
import matplotlib

# Use numpy

# Use matplotlib

This is normally good practice, as it lets us see at a glance which modules are being used.

However, you don't strictly need to import a module until the point where you actually use it:

import numpy

# Use numpy

import matplotlib

# Use matplotlib

This won't make the code run any faster, but it might allow it to start up slightly faster because it doesn't need to load matplotlib until it is required.

Incidentally, don't worry about importing the same module more than once. Python keeps track and will ignore any attempts to import a module that has already been imported.

Summary

We have looked at some low-level coding changes that can help speed up Python code in certain circumstances. They are worth trying if you have identified particular areas of your code that are running too slow. But all of these methods have downsides in terms of code clarity and maintainability, and they might even be counterproductive in some cases. They should be used carefully.