Martin McBride, 2022-02-14
Tags optimisation
Categories python language intermediate python

Python is an interpreted language, which means it will generally be slower than compiled languages such as C++. For this reason, it is sometimes necessary to take additional steps to optimise Python code. In this article, we will look at some of the principles we should follow when optimising Python code.

Optimisation techniques fall into 3 main categories:

  • Good practice techniques - things that are a good idea anyway but might also make your code more performant.
  • Structural optimisation - looking at the big picture, choosing optimal algorithms, data structures, modules etc to speed up code.
  • Low level optimisation - analysing the relevant parts of the code line by line, tweaking it to improve performance.

There is quite a lot of nonsense talked about optimisation. Some people will pick on some feature of the language and say that you must never use this feature because it is slow. They are wrong - in most cases creating slightly inefficient code will make no more than an infinitesimal difference to the performance of the program, and the feature itself might make the code more readable. Some people will tell you that doing this one thing will make all your code run twice as fast. They usually back this up by supplying timings of one very specific and unrealistic code sample that will indeed run twice as fast with a small tweak. But that doesn't carry over to other code.

In fact, optimising code is often quite a laborious process, with no magic bullets. But if you are smart about it, you will get 99% of the benefit by carefully optimising 1% of your code.

There are exceptions, for example, if you are writing code to run on a small embedded system that doesn't have a powerful processor. But the discussion below applies to most situations.

Python is slow but computers are fast

Modern computers are extremely fast, of course. They might struggle a bit with the most intensive tasks such as mining bitcoins or rendering complex 3D scenes at 70 frames per second, but for most tasks they have plenty of capacity to spare. The fact that Python might be slightly slower than compiled code doesn't usually make any noticeable difference to overall performance.

We use Python because it is a high level, expressive language that allows us to quickly develop readable, maintainable code. And for 99% of your code that is exactly what you should do - write clean code without worrying about performance at all. Concentrate your optimisation effort on those parts of the code that genuinely require it.

Apply good practice

Of course, we should always try to write good quality code. Fortunately, in a lot of cases high quality, Pythonic code happens to be pretty efficient too. This type of efficiency is all good, and should always be our aim.

Optimisation has its downsides

Optimising code - that is, changing the code specifically to make it run faster - often comes at a cost. That is the main reason we don't want to optimise everything. As we mentioned before, optimising code is time-consuming, and that is wasted time if the optimisations provide no detectable improvement.

Highly optimised code often does things in non-standard or non-obvious ways, which can make the code less readable and more difficult to maintain. It can even lead to bugs if maintainers don't properly understand what is going on.

Optimisations often involve trade-offs. For example, sometimes optimised code might use more memory in order to run faster. If the optimised code doesn't result in any useful performance benefit, it is better to not introduce negative side effects.
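A common example of this memory-for-speed trade-off is memoisation. The sketch below uses functools.lru_cache (a real standard library decorator) on a deliberately naive Fibonacci function; the cache stores previously computed results in memory so the exponential recursion collapses to linear time. The fib function itself is just an illustrative example.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Naive recursion, made fast by caching: each result is stored
    # in memory so it is only ever computed once.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))  # 354224848179261915075 - instant with the cache
```

Without the decorator, fib(100) would take longer than the age of the universe; with it, the call returns immediately, at the cost of holding 101 cached results in memory. Whether that trade is worthwhile depends on how often the function is called and how much memory is available.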

In the worst case, an inappropriate optimisation can actually take longer than the non-optimised code. For example, if your code does some complex pre-calculations to allow the body of a loop to run faster, but then that loop only executes a couple of times, it could be counter-productive.

Optimisations need to be justified, and where they are necessary they should be well documented for the sake of maintainability.

Profile and select areas for improvement

Profiling is a good way to identify which areas of the code to work on. It is generally best to start on the inner loops of the most serious bottlenecks and work outwards.

Speeding up an inner loop will often be enough. Remember that if a loop executes 10,000 times, then any improvement made inside the loop will be 10,000 times more effective than an improvement made outside the loop.

When profiling, it is useful to use a variety of realistic data, and maybe some corner cases too. Trivial data, or unusually short data, can give misleading timings.
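The standard library's cProfile module is a convenient way to get these timings. Here is a minimal sketch; the slow_part and main functions are stand-ins for your own code, and the report is sorted by cumulative time so the biggest bottlenecks appear first.

```python
import cProfile
import io
import pstats

def slow_part():
    # Stand-in for an expensive inner function.
    return sum(i * i for i in range(100_000))

def main():
    # Stand-in for the program's main work loop.
    total = 0
    for _ in range(20):
        total += slow_part()
    return total

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Print the 5 most expensive entries, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The report shows how much time was spent in each function and how many times it was called, which tells you immediately whether slow_part is worth optimising at all.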

Make it work then make it fast

There are many variants of this old adage, but the meaning is the same. Trying to develop non-trivial, optimised code in one step is very difficult, so we should break it down into stages:

  • Get the code working first (even if it is slow).
  • Test it, ideally with appropriate unit tests so that you can easily verify later versions.
  • Check the working code in, so you can always get back to it if you need to.
  • Then try to improve the speed by optimising the code.

The last step may well require multiple iterations, trying different optimisation techniques, before an acceptable result is obtained.
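The test step matters because it lets you verify each optimised version against the original. A minimal sketch of the idea, using two hypothetical versions of the same function:

```python
def mean_slow(values):
    # First working version: clear, but does more work than needed.
    total = 0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count

def mean_fast(values):
    # Optimised version written afterwards, using built-ins that
    # run as compiled C code.
    return sum(values) / len(values)

# The check from step 2: the optimised version must agree with the
# known-working one before it replaces it.
data = [1.5, 2.5, 4.0]
assert mean_slow(data) == mean_fast(data)
```

In real code these checks would live in a proper unit test suite, so that every optimisation iteration can be verified with a single command.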

Structural optimisation

Structural optimisation looks at the big picture. Low level optimisation (below) looks at the fine detail of the code.

Structural optimisation covers questions like:

  • Does the code contain unnecessary repetition, for example, large data structures being copied unnecessarily, or maybe a configuration file being read in at the start of every operation rather than once only?
  • Are the algorithms optimal? As a simple example if your code needs to search for values in a large array, then using a binary search or hash table can be far quicker than a linear search.
  • Could the code be speeded up by switching to a procedural or functional programming paradigm rather than using OOP? Python allows different paradigms to be mixed freely.
  • Are the data structures optimal? For example, list is often the go-to data structure used in Python. Lists are implemented as arrays with dynamic reallocation, but for certain applications, a linked list (such as collections.deque) or set might be faster.
  • Could compiled code libraries be used to speed things up? Libraries such as NumPy or Pandas can be used to process large amounts of data in a single Python call, with the data processing being performed by highly optimised C code.
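The data structure point is easy to demonstrate. An `in` test on a list is a linear search, while the same test on a set is a hash lookup. The sketch below times both with the standard timeit module; the exact numbers will vary by machine, but the set lookup is typically orders of magnitude faster for large collections.

```python
import timeit

values = list(range(10_000))
values_set = set(values)
target = 9_999  # last element: worst case for a linear search

# Time 1,000 membership tests on each data structure.
list_time = timeit.timeit(lambda: target in values, number=1_000)
set_time = timeit.timeit(lambda: target in values_set, number=1_000)

print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

Note that the two structures are not interchangeable in general - a list preserves order and allows duplicates - so this is a structural decision, not a line-by-line tweak.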

Low-level optimisation

Low-level optimisation involves making line by line code changes to try to speed up the execution. It is usually applied to code that is executed many times as part of an inner loop.

Generally, this involves re-writing the code in a way that is different to the way you might naturally do the task, specifically to make it run faster. This is obviously not something we should be doing to large amounts of code unless we absolutely have to.

There are quite a few different techniques but some examples are:

  • Caching object attributes and non-local variables.
  • Delaying importing modules until they are needed.
  • Remapping functions at runtime.
  • Avoiding function calls by including the code inline.
  • Avoiding the use of objects in time-critical code, using local functions and variables instead.
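As an illustration of the first technique, the sketch below caches a module attribute (math.sqrt) and a bound method (list.append) in local variables before an inner loop. Local variable lookups are cheaper in CPython than attribute lookups, so this can shave time off a hot loop - at a clear cost in readability. The functions themselves are hypothetical examples.

```python
import math

def norms_plain(points):
    # Straightforward version: math.sqrt and result.append are
    # looked up afresh on every iteration.
    result = []
    for x, y in points:
        result.append(math.sqrt(x * x + y * y))
    return result

def norms_cached(points):
    # Low-level tweak: cache the attribute and the bound method in
    # local variables before entering the loop.
    sqrt = math.sqrt
    result = []
    append = result.append
    for x, y in points:
        append(sqrt(x * x + y * y))
    return result

print(norms_cached([(3, 4), (5, 12)]))  # [5.0, 13.0]
```

Both versions produce identical results; only profiling on realistic data can tell you whether the cached version is worth its reduced clarity.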

To a greater or lesser extent, each of these techniques makes the code more difficult to read and maintain. For compiled languages like C, the compiler applies techniques like this automatically, and aggressively, so you have the best of both worlds - clean code but a high level of optimisation. As an interpreted language, Python can't do much automatic optimisation so we must do it manually.

In future articles, we will look at each of the main topics mentioned here, in more detail.

If you found this article useful, you might be interested in the book Python Quick Start or other books by the same author.

