Pure functions

Martin McBride, 2019-09-11
Tags pure function lru_cache functools
Categories functional programming

Functional programming has the concept of pure functions. What are they, and why are they so useful?

Mathematical functions

Functional programming is based on the mathematical idea of functions. A mathematical function such as sin(x) simply returns a value: you give it an x value, and it returns the sine of x. If you give it the same x value, you will always get the same answer.

Python functions are different to mathematical functions, because a Python function doesn't just calculate values; it can actually do things too. A Python function can set a global variable that might influence the result of a different function when that function is called. It might write something to disk, or send some data across the network. These influences are called side effects.

Side effects make programming, and debugging, much more difficult. You don't just have to think about what a particular function is doing; you also have to consider what other functions might have done previously that can affect things. Pure functions attempt to eliminate side effects.

Pure functions

The basic definition of a pure function is a function that doesn't cause or rely on side effects. The output of a pure function should only depend on its inputs.

There are two basic ways a function can cause side effects that directly affect other parts of the code. The first is by reading or writing global variables. For example:

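gvalue = 0

def set_value(x):
    # Writes to a global variable: a side effect
    global gvalue
    gvalue = x

def print_value():
    # Reads the global variable, so its output depends on hidden state
    print(gvalue)

set_value(3)
print_value()
set_value(5)
print_value()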


Here, set_value isn't a pure function, because it sets the global variable gvalue, which in turn affects how print_value behaves. print_value isn't a pure function either, because its output depends on the global variable. You can't possibly predict what print_value is going to print without knowing how set_value was called before.
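For contrast, here is a minimal sketch of a pure alternative, in which the value is passed in as an argument instead of being shared through a global variable. The name format_value is hypothetical, and it returns a string rather than printing, so that its result depends only on its input:

```python
def format_value(x):
    # Pure: the result depends only on the argument x,
    # and no global variable is read or written
    return f"value: {x}"

print(format_value(3))  # value: 3
print(format_value(5))  # value: 5
```

Calling format_value(3) gives the same string no matter what else the program has done beforehand.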

The other way that a function can create side effects is by altering data structures. For example:

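def tail(s):
    # Deletes the first element of the caller's list in place: a side effect
    del s[0]
    return s

def print_value():
    a = [1, 2, 3]
    b = tail(a)
    print(b)  # [2, 3]
    print(a)  # [2, 3] as well, because tail changed a

print_value()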


Here the function tail accepts a list as input. It returns a list that contains all the elements except the first (head) element.

The function print_value calls tail, passing in the list a, which is [1, 2, 3]. The return value b is as we expect, [2, 3], the tail of the list.

However, when we print a after the call to tail, we see that it now contains [2, 3] as well. Calling tail has altered the list we passed into it, which is also a side effect. If print_value was expecting a to remain unchanged, it might not work properly.
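One way to see what went wrong is to check object identity with the is operator. The impure tail hands back the very list it received, so a and b are two names for one object. Calling it again on the same variable also gives a different answer each time, which a pure function never would:

```python
def tail(s):
    # Impure version from above: mutates the list it is given
    del s[0]
    return s

a = [1, 2, 3]
b = tail(a)

# tail returned the same list object it was given, now shortened
print(b is a)  # True
print(a)       # [2, 3]

# Calling tail again on a gives a different result, because a has changed
print(tail(a))  # [3]
```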

A pure function must not alter the value of any data structure that is passed into it. This version of tail is not pure. We could create a pure version like this:

def tail(s):
    # Returns a new list; the list passed in is left unchanged
    return s[1:]

def print_value():
    a = [1, 2, 3]
    b = tail(a)
    print(b)  # [2, 3]
    print(a)  # [1, 2, 3]

print_value()


This time, tail returns the slice s[1:], which is a new list containing a copy of the tail of s. The original list is not changed.
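Repeating the identity check on the pure version shows the difference. The slice is a separate list, the input survives untouched, and calling tail twice with the same input gives equal results:

```python
def tail(s):
    # Pure version: slicing builds a new list, so s is not modified
    return s[1:]

a = [1, 2, 3]
b = tail(a)

print(b is a)              # False: b is a new list
print(a)                   # [1, 2, 3]
print(tail(a) == tail(a))  # True: same input, same result
```

Note that slicing makes a shallow copy: if the list contains mutable objects, such as other lists, the new list refers to the same inner objects, so mutating one of those would still be visible through both lists.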

Other considerations

Functions that read or write data to disk can also cause unwanted side effects. For example, if function_a writes data to a configuration file, and function_b reads that data, then neither function can be considered pure, because they interact (at least indirectly). The same thing can happen with functions that interact with a database, or exchange data over a network, where one function can indirectly influence another.
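A minimal sketch of that situation follows, assuming a temporary file stands in for the configuration file (the file and the setting values are illustrative only). function_b takes no arguments, yet what it returns depends entirely on what function_a last wrote:

```python
import os
import tempfile

# A temporary file stands in for a real configuration file (illustrative only)
fd, CONFIG_PATH = tempfile.mkstemp()
os.close(fd)

def function_a(setting):
    # Side effect: writes to a file that other code may read
    with open(CONFIG_PATH, "w") as f:
        f.write(setting)

def function_b():
    # The result depends on the file's contents, not on any argument
    with open(CONFIG_PATH) as f:
        return f.read()

function_a("fast")
print(function_b())  # fast
function_a("slow")
print(function_b())  # slow

os.remove(CONFIG_PATH)
```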

Of course, writing to a file doesn't automatically mean a function cannot be pure. For example, if a function simply writes to a log file, that might not cause any side effects provided no other parts of the program take actions based on what is written to the file.

A second aspect of pure functions that we mentioned earlier is that the output of a pure function should depend only on its inputs. Put another way, if you call it twice with the same inputs, you should always get the same result.

This is generally true of maths functions like sin or sqrt. There are some cases where it might not be true:

• Functions in the random module generate random values. Every time you call a random function you will normally get a different result. That is the whole point, of course, but it means the functions in this module are not strictly pure.
• The input function, which queries the user for an input value on the command line, returns a completely unpredictable result (whatever the user decided to type in), so is not a pure function.
• Any function that reads data from a file, database or network is also unpredictable and so not pure.
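The random module makes the hidden dependency easy to see. random.random() takes no arguments, so if it were pure it would always return the same value. Instead it reads and updates the module's internal generator state, and resetting that state with random.seed makes the values repeat:

```python
import random

# No arguments, yet the values normally differ from call to call,
# because each call reads and advances hidden generator state
first = random.random()
second = random.random()
print(first, second)

# Resetting the hidden state makes the same value come back,
# showing that the output depends on that state, not on the inputs
random.seed(42)
a = random.random()
random.seed(42)
b = random.random()
print(a == b)  # True
```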

The main advantage of pure functions is predictability. Pure functions eliminate unexpected interactions that are the cause of so many bugs.

An additional benefit is that pure functions can make it far easier to use multithreading in your program. Imagine that you needed to run the same function on a very large number of data items. With pure code, you know that each call to the function operates completely independently of the others. So it doesn't matter what order you process the data in; you will still get the same result. You can split the data between different threads, or even different computers, and process it in parallel without any danger of things getting out of step.
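As a small sketch of this, the standard library's concurrent.futures module can spread calls to a pure function across a pool of threads. The square function and the data are illustrative; because square touches nothing outside itself, the parallel result matches the sequential one:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Pure: the result depends only on x, and nothing outside is modified
    return x * x

data = list(range(10))

sequential = [square(x) for x in data]

# executor.map runs the calls on several threads but returns
# the results in the same order as the input
with ThreadPoolExecutor(max_workers=4) as executor:
    parallel = list(executor.map(square, data))

print(sequential == parallel)  # True
```

For CPU-bound work in CPython, ProcessPoolExecutor is usually a better fit than threads because of the global interpreter lock; a pure, module-level function like square can be used with either executor unchanged.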

Finally, if you have pure functions where the output depends only on the input values, you can avoid having to calculate the same value more than once. For example, if we needed to calculate the square root of these numbers:

[9, 16, 9, 25]


We could keep a record (a cache) of each value we have already calculated, together with its result. When we reach the second occurrence of 9 in the list, we can skip the calculation and simply return the previous result, 3. If you are performing complex calculations on a set of data that has a lot of repeated values, this type of caching can be a major optimisation.
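A minimal hand-written version of this idea might look like the following. The names cached_sqrt, cache and calculated are illustrative, not part of any library. The cache is itself mutable state, but because math.sqrt is pure, a stored answer can never go out of date:

```python
import math

cache = {}       # maps each input value to its already-computed square root
calculated = []  # records which values actually had to be computed

def cached_sqrt(x):
    if x not in cache:
        calculated.append(x)
        cache[x] = math.sqrt(x)
    return cache[x]

results = [cached_sqrt(x) for x in [9, 16, 9, 25]]
print(results)     # [3.0, 4.0, 3.0, 5.0]
print(calculated)  # [9, 16, 25]: the second 9 came from the cache
```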

The functools module contains a decorator lru_cache that can apply this sort of caching more or less automatically.
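As a sketch, applying lru_cache to a square-root function looks like this (slow_sqrt is a hypothetical stand-in for an expensive calculation). lru_cache keys its cache on the arguments, so they must be hashable, and it is only safe on pure functions, since a cached result from an impure function could be out of date:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)  # maxsize=None lets the cache grow without limit
def slow_sqrt(x):
    # Stand-in for an expensive pure calculation
    return math.sqrt(x)

results = [slow_sqrt(x) for x in [9, 16, 9, 25]]
print(results)                 # [3.0, 4.0, 3.0, 5.0]
print(slow_sqrt.cache_info())  # CacheInfo(hits=1, misses=3, maxsize=None, currsize=3)
```

The cache_info() output confirms that only three square roots were calculated; the second 9 was a cache hit.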