functools module

By Martin McBride, 2021-11-20
Tags: functools python standard library
Categories: python standard library


The functools module provides support for higher-order functions - that is, functions that act on other functions or return other functions.

Python supports higher-order functions natively, so you can easily create your own higher-order functions. The functools module provides implementations of some commonly used techniques:

  • Caching the results of pure functions.
  • Implementing partial functions.
  • A generic reduce implementation.
  • Single dispatch.
  • Function wrapping.
  • A few other utility functions.

This article uses Python 3.9. Some of the features here are not available in earlier versions.

Caching

In functional programming, we say that a function is pure if:

  • It has no external dependencies. That is, if you call the function twice with the same parameters it will always return the same result.
  • It has no side effects. When you call the function it only calculates the result; it doesn't make any changes that are detectable from outside the function, such as changing non-local variables or reading/writing IO streams.

For more details see this article.

An important property of a pure function is that you can cache the result. For example:

import math

def pure(x, y):
    return math.sqrt(x*x + y*y)

a = pure(1, 2)
b = pure(1, 2)

The function pure is a pure function. We have called it twice with the same parameters, so in fact we could have done this instead:

a = pure(1, 2)
b = a

In this case, we have only called the function once and re-used the value to set b. The overall effect of this code is completely indistinguishable from the first case: no other part of the code is affected in any way, it just runs slightly faster.

Of course, if pure wasn't a pure function, you would not be able to do this. For example, if pure wrote the answer out to a file, the two cases would not be identical, because the first would write two values to the file while the second would only write one.
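As a concrete sketch, a file-writing version of the function (which is not pure) might look like this; the file name is just for illustration:

import math

def not_pure(x, y):
    result = math.sqrt(x*x + y*y)
    # Side effect: append each result to a file, so calling the function
    # twice is observably different from calling it once.
    with open("results.txt", "a") as f:
        f.write(str(result) + "\n")
    return result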

Caching takes advantage of this property of pure functions. It stores the parameters and result whenever the function is called, and if the exact same parameters are used again it will simply return the stored result. There are several ways to do this, which we will look at here.
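In its simplest form, a cache like this is just a dictionary keyed by the function's arguments. Here is a minimal sketch of the idea; the functools decorators described below do this for you, with extra handling for keyword arguments and cache statistics:

import math

_cache = {}

def cached_pure(x, y):
    key = (x, y)
    if key not in _cache:       # only compute on a cache miss
        _cache[key] = math.sqrt(x*x + y*y)
    return _cache[key]          # a repeated call returns the stored result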

This type of caching is also sometimes called memoization.

cache

cache is a decorator that caches the result for every combination of input parameters. It is used like this:

import math
from functools import cache

@cache
def pure(x, y):
    return math.sqrt(x*x + y*y)

a = pure(1, 2)
b = pure(1, 2)

When pure(1, 2) is called for the first time, the function is executed and the result is cached. When pure(1, 2) is executed again, the function is not called. The cached result is returned instead.
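We can verify this because the decorator adds a cache_info method to the wrapped function, which reports the number of cache hits and misses:

print(pure.cache_info())    # CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)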

lru_cache

lru_cache is similar to cache, except that, by default, it stores at most 128 values. It is used like this:

import math
from functools import lru_cache

@lru_cache
def pure(x, y):
    return math.sqrt(x*x + y*y)

a = pure(1, 2)
b = pure(1, 2)

This behaves just like cache until the maximum number of values has been stored. At that point, it will throw away the least recently used (LRU) entry, to make room for a new entry. The total number of stored entries will never exceed 128.
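As a sketch, here is a deliberately tiny cache of two entries, so the eviction is easy to see:

import math
from functools import lru_cache

@lru_cache(maxsize=2)
def pure(x, y):
    return math.sqrt(x*x + y*y)

pure(1, 2)
pure(3, 4)
pure(5, 6)                  # cache is full, so the (1, 2) entry is evicted
pure(1, 2)                  # a miss - the result has to be recomputed
print(pure.cache_info())    # CacheInfo(hits=0, misses=4, maxsize=2, currsize=2)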

The policy of throwing away the least recently used entry is useful if you are in a situation where you are more likely to repeat parameters that have been used recently.

Why would you want to throw previous values away, rather than retaining every value (like cache does)? There are two reasons:

  • The cache could get very big if you allow it to grow forever.
  • A larger cache will take longer to search, so each function call will take longer to execute.

lru_cache is a compromise, retaining the entries that are judged most likely to be needed again, but keeping the cache to a limited size.

There are optional parameters:

  • maxsize is the size of the cache, default 128. If this is set to None, the size is unlimited (similar to cache).
  • typed controls whether the type is taken into account when deciding if values are equal, default False.

To understand the typed parameter, consider these two function calls:

  • pure(1, 2)
  • pure(1.0, 2)

If typed is False, these two calls would be considered the same, because they both pass the values 1 and 2. But if typed is True the two calls are considered to be different because the type of the first parameter is different. This is useful if the function might provide different results depending on the data type.
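Both parameters are passed to the decorator itself. For example, a sketch using a larger cache with type-sensitive keys:

import math
from functools import lru_cache

@lru_cache(maxsize=256, typed=True)
def pure(x, y):
    return math.sqrt(x*x + y*y)

pure(1, 2)                  # cached as a call with int arguments
pure(1.0, 2)                # stored as a separate entry because typed=True
print(pure.cache_info())    # CacheInfo(hits=0, misses=2, maxsize=256, currsize=2)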

Pros and cons of caching

Caching potentially saves time by avoiding calling a function multiple times with the same parameters.

It has a couple of potential disadvantages. Firstly, it takes time to check the cache before each function call. This means that in every case where the call is not found in the cache, the call will take longer than it would without caching.

Caching is ideal if you are using a pure function that takes a long time to execute, and where calls with the same parameters are very likely to occur.

If you cache a function that is very quick to execute, the time spent checking the cache might take longer than simply calling the function. If you cache a function where repeated parameter values are quite rare, the time spent checking the cache might outweigh the savings on the rare occasions that a match is found.

The second disadvantage is that it only works if the function is pure. If you apply caching to a function that is not pure, you may well create serious bugs.
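As a sketch, here is a function that is not pure because it depends on a global variable. Once it is cached, a later change to that variable is silently ignored:

from functools import cache

scale = 2

@cache
def scaled(x):
    return x * scale        # depends on a global variable, so not pure

print(scaled(10))           # 20
scale = 3
print(scaled(10))           # still 20 - the stale cached result is returned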

Whenever you apply caching to a function there is potential for problems if the function is not pure, or if it is later modified so that it is no longer pure. This is an ongoing risk. It can be mitigated by only caching simple functions that are inherently pure, and by highlighting the requirement with a comment. It is also a good reason to avoid caching unless there is a reasonably significant benefit.

