This article is part of a series on functional programming.
As an example of functional programming, in this article we will use some Python built-in functions to analyse a text file. In the first example we will count the number of words in the file. In the second example we will extract the unique words.
The basic pattern we will be using is map/reduce. In this pattern the
map function is used to pre-process the input data. The
reduce function is
then used to combine the data to give the final result.
As an aside, Python has function equivalents of every operator. These functions live in the
operator module. For example, the
operator.add function is equivalent to the
+ operator. Therefore the following two lines of code do the same thing:
import operator c = operator.add(a, b) c = a + b
This is very useful because you can use
add wherever you need a function object that perform the equivalent of
+. There are functions
for every operator, including non-symbolic operators like
The simplest form of
map takes a function object and an iterable. It applies the function to each item in the iterable, and returns
an iterator that accesses the results. For example:
from operator import neg it = map(neg, [1, -3, 5, -7]) print(list(it))
Here, we apply the
neg function (again from the
operator module) to the list
[1, -3, 5, -7].
neg is a function that does the same job as the unary minus operator. This results in a iterator
Remember that iterators are lazy - they don't perform any calculations until they need to. If we try to print the iterator we would just see the iterator object (a
map object), because the actual work of the map operation hasn't been done yet. When we convert the iterator to to a list, the iterable is processed to obtain its elements:
[-1, 3, -1, 7]
map can take more than one iterable, provided the function object takes the same number of parameters. For example,
mul (the function equivalent of
*) takes two parameters, so
map accepts two iterables:
from operator import mul it = map(mul, [1, 2, 3, 4], [10, 100, 1000]) print(list(it))
This takes a number from each list, and multiplies them together: 1*10, 2*100, 3*1000. If the iterables have different lengths (as in this example) it stops when the shortest sequence is exhausted. So it prints:
[10, 200, 3000]
reduce function takes a sequence of values and reduces it to a single value by applying a function repeatedly, and accumulating the result.
Here is a simple example to illustrate how it works:
from functools import reduce from operator import add x = reduce(add, [1, 2, 3, 4]) print(x)
What reduce does is take the first two elements and combine them using the function (
add in this case). It then takes the result and
combines it with the next element, and so on. It calculates
In other words, it sums all the values. In fact, there is a built-in function
sum that does the same job - you should use that normally as
it is simpler and more efficient. This is just an illustration.
As a practical example, here is a list of strings (an excerpt from the Zen of Python):
zen = ['beautiful is better than ugly', 'explicit is better than implicit', 'simple is better than complex', 'complex is better than complicated', 'flat is better than nested', 'sparse is better than dense', 'readability counts']
Now imagine we want to count the total number of words in all these strings. We can use map/reduce:
mapto convert the list of strings into a sequence of number-of-words values for each string
reduceto convert the list of lengths to a single total number-of-words
Here is how this looks in code
from functools import reduce from operator import add def number_of_words(s): words = s.split(' ') return len(words) counts = map(number_of_words, zen) total = reduce(add, counts)
We could get rid of the
number_of_words function by using a lambda:
lambda x: len(x.split(' '))
You can also combine the
reduce into a single line. This is our code for finding the total number of words in a
list of sentences:
total = reduce(add, map(lambda x: len(x.split(' ')), zen))
If you think this is a bit too terse, you could leave it as separate lines - that is a matter of style. But if you are used to functional Python this code is quite readable.
Now suppose we want to count the number of unique words in the list of sentences. We could do it like this:
mapto convert each sentence into a set of words (a set will only contain one copy of each word, even if it appears more than once in the sentence).
reduceto combine the list of sets into a single set.
This example is a little different because, rather than obtaining a single integer, we are obtaining a set of values. We are using
reduce to take a groups of sets of words, and reduce it down to a single set of unique words. But that is a perfectly valid use of
Here is the
def set_of_words(s): words = s.split(' ') return set(words) sets = map(set_of_words, zen)
Once again, we use
split to convert a string to a list of words. But this time we use
set to convert the list into a set. This creates
a list of sets (one set for each sentence).
Next we want to find the union of all these sets:
(((s1 | s2) | s3) | s4) etc...
Just as we can use the
add function instead of the + operator, we can use the
or_ function instead of the | operator (don't forget
or_ from operator):
all_words = reduce(or_, sets) total = len(all_words)
Once again we can use a lambda and code the whole thing as one line:
total = len(reduce(or_, map(lambda x: set(x.split(' ')), zen))
In this case, of course we could have simply concatenated all our input strings together, and removed the need to bother with map/reduce. However, imagine that instead of processing a few sentences, we had a massive database with millions of sentences.
The first advantage of
map is that it uses lazy iteration, so we don't have to load all that data into memory. We only ever need to store one sentence in memory at a time.
The second advantage is that every
map operation is independent. That is the beauty of functional programming, we can map the data in any order because there are no side effects. If we wanted, we could split our database into parts, and run
map on several different computers, each processing its own section of the database. A final computer could run the reduce task, speeding the whole process up by many times.
The map/reduce pattern is often used in big data applications because it allows for large scale parallelisation of data processing. But even on smaller data sets, it is an interesting way of breaking problems down into functional units.
If you found this article useful you might be interested in my ebook Functional Programming in Python.
Copyright (c) Axlesoft Ltd 2020