functools module

By Martin McBride, 2021-11-20
Tags: functools python standard library
Categories: python standard library


The functools module provides support for higher-order functions - that is, functions that act on other functions or return other functions.

Python supports higher-order functions natively, so you can easily create your own higher-order functions. The functools module provides implementations of some commonly used techniques:

  • Caching the results of pure functions.
  • Implementing partial functions.
  • A generic reduce implementation.
  • Single dispatch.
  • Function wrapping.
  • A few other utility functions.

This article uses Python 3.9. Some of the features here are not available in earlier versions.

Caching

In functional programming, we say that a function is pure if:

  • It has no external dependencies. That is, if you call the function twice with the same parameters it will always return the same result.
  • It has no side effects. When you call the function it only calculates the result; it doesn't make any changes that are detectable from outside the function, such as changing non-local variables or reading/writing IO streams.

For more details see this article.

An important property of a pure function is that you can cache the result. For example:

import math

def pure(x, y):
    return math.sqrt(x*x + y*y)

a = pure(1, 2)
b = pure(1, 2)

The function pure is a pure function. We have called it twice with the same parameters, so in fact we could have done this instead:

a = pure(1, 2)
b = a

In this case, we have only called the function once and re-used the value to set b. The overall effect of this code is completely indistinguishable from the first case: no other part of the code is affected in any way, it just runs slightly faster.

Of course, if pure wasn't a pure function, you would not be able to do this. For example, if pure wrote the answer out to a file, the two cases would not be identical, because the first would write two values to the file while the second would only write one.
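As a concrete sketch, a file-writing version of the function (which is not pure) might look like this; the file name is just for illustration:

import math

def not_pure(x, y):
    result = math.sqrt(x*x + y*y)
    # Side effect: append each result to a file, so calling the function
    # twice is observably different from calling it once.
    with open("results.txt", "a") as f:
        f.write(str(result) + "\n")
    return result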

Caching takes advantage of this property of pure functions. It stores the parameters and result whenever the function is called, and if the exact same parameters are used again it will simply return the stored result. There are several ways to do this, which we will look at here.
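In its simplest form, a cache like this is just a dictionary keyed by the function's arguments. Here is a minimal sketch of the idea; the functools decorators described below do this for you, with extra handling for keyword arguments and cache statistics:

import math

_cache = {}

def cached_pure(x, y):
    key = (x, y)
    if key not in _cache:       # only compute on a cache miss
        _cache[key] = math.sqrt(x*x + y*y)
    return _cache[key]          # a repeated call returns the stored result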

This type of caching is also sometimes called memoization.

cache

cache is a decorator that caches the result for every combination of input parameters. It is used like this:

import math
from functools import cache

@cache
def pure(x, y):
    return math.sqrt(x*x + y*y)

a = pure(1, 2)
b = pure(1, 2)

When pure(1, 2) is called for the first time, the function is executed and the result is cached. When pure(1, 2) is executed again, the function is not called. The cached result is returned instead.
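We can verify this because the decorator adds a cache_info method to the wrapped function, which reports the number of cache hits and misses:

print(pure.cache_info())    # CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)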

lru_cache

lru_cache is similar to cache, except that, by default, it stores at most 128 values. It is used like this:

import math
from functools import lru_cache

@lru_cache
def pure(x, y):
    return math.sqrt(x*x + y*y)

a = pure(1, 2)
b = pure(1, 2)

This behaves just like cache until the maximum number of values has been stored. At that point, it will throw away the least recently used (LRU) entry, to make room for a new entry. The total number of stored entries will never exceed 128.
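As a sketch, here is a deliberately tiny cache of two entries, so the eviction is easy to see:

import math
from functools import lru_cache

@lru_cache(maxsize=2)
def pure(x, y):
    return math.sqrt(x*x + y*y)

pure(1, 2)
pure(3, 4)
pure(5, 6)                  # cache is full, so the (1, 2) entry is evicted
pure(1, 2)                  # a miss - the result has to be recomputed
print(pure.cache_info())    # CacheInfo(hits=0, misses=4, maxsize=2, currsize=2)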

The policy of throwing away the least recently used entry is useful if you are in a situation where you are more likely to repeat parameters that have been used recently.

Why would you want to throw previous values away, rather than retaining every value (like cache does)? There are two reasons:

  • The cache could get very big if you allow it to grow forever.
  • A larger cache will take longer to search, so each function call will take longer to execute.

lru_cache is a compromise, retaining the entries that are judged most likely to be needed again, but keeping the cache to a limited size.

There are optional parameters:

  • maxsize is the size of the cache, default 128. If this is set to None, the size is unlimited (similar to cache).
  • typed controls whether the type is taken into account when deciding if values are equal, default False.

To understand the typed parameter, consider these two function calls:

  • pure(1, 2)
  • pure(1.0, 2)

If typed is False, these two calls would be considered the same, because they both pass the values 1 and 2. But if typed is True the two calls are considered to be different because the type of the first parameter is different. This is useful if the function might provide different results depending on the data type.
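Both parameters are passed to the decorator itself. For example, a sketch using a larger cache with type-sensitive keys:

import math
from functools import lru_cache

@lru_cache(maxsize=256, typed=True)
def pure(x, y):
    return math.sqrt(x*x + y*y)

pure(1, 2)                  # cached as a call with int arguments
pure(1.0, 2)                # stored as a separate entry because typed=True
print(pure.cache_info())    # CacheInfo(hits=0, misses=2, maxsize=256, currsize=2)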

Pros and cons of caching

Caching potentially saves time by avoiding calling a function multiple times with the same parameters.

It has a couple of potential disadvantages. Firstly, it takes time to check the cache before each function call. This means that in every case where the call is not found in the cache, the call will take longer than it would without caching.

Caching is ideal if you are using a pure function that takes a long time to execute, and where calls with the same parameters are very likely to occur.

If you cache a function that is very quick to execute, the time spent checking the cache might take longer than simply calling the function. If you cache a function where repeated parameter values are quite rare, the time spent checking the cache might outweigh the savings on the rare occasions that a match is found.

The second disadvantage is that it only works if the function is pure. If you apply caching to a function that is not pure, you may well create serious bugs.
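As a sketch, here is a function that is not pure because it depends on a global variable. Once it is cached, a later change to that variable is silently ignored:

from functools import cache

scale = 2

@cache
def scaled(x):
    return x * scale        # depends on a global variable, so not pure

print(scaled(10))           # 20
scale = 3
print(scaled(10))           # still 20 - the stale cached result is returned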

Whenever you apply caching to a function there is potential for problems if the function is not pure, or if it is later modified so that it is no longer pure. This is an ongoing risk. It can be mitigated by only caching simple functions that are inherently pure, and by highlighting the requirement with a comment. It is also a good reason to avoid caching unless there is a reasonably significant benefit.

