As an example of functional programming, in this article we will use some of Python's built-in functions to analyse a text file. In the first example we will count the number of words in the file. In the second example we will extract the unique words.
The basic method is map/reduce. In this model the map function is used to pre-process the input data. The reduce function is then used to combine the data to give the final result.
Built-in operator functions
As an aside, Python's operator module provides function equivalents of its operators. For example, the following two lines of code do the same thing:
c = operator.add(a, b)  # requires: import operator
c = a + b
This is very useful because you can use add wherever you need a function object that performs the equivalent of +. There are functions for almost every operator, including non-symbolic operators.
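As a small illustration of the non-symbolic case, the sketch below uses three functions from the standard operator module: is_, not_ and contains (the function form of in, which takes the container as its first argument). The sample values are made up for this example.

```python
from operator import contains, is_, not_

a = None
words = ['flat', 'nested']  # illustrative data only

# is_(x, y) is the function form of "x is y"
same = is_(a, None)

# not_(x) is the function form of "not x"
empty = not_(words)

# contains(seq, item) is the function form of "item in seq"
found = contains(words, 'flat')

print(same, empty, found)  # True False True
```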
The map function
The simplest form of map takes a function object and an iterable. It applies the function to each item in the iterable, and returns an iterator containing the results. For example:
from operator import neg
it = map(neg, [1, -3, 5, -7])
print(list(it))
Here, we apply the neg function to the list [1, -3, 5, -7]. neg is a function from the operator module that does the same job as the unary minus operator. This results in an iterator over the negated values.
If we try to print the iterator itself we would just see the iterator object (a map object), so we must convert it to a list to see its elements:
[-1, 3, -5, 7]
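One consequence of map returning an iterator is that the results can only be consumed once. The following minimal sketch reuses the neg example to show this; the variable names first_pass and second_pass are just for illustration.

```python
from operator import neg

it = map(neg, [1, -3, 5, -7])

first_pass = list(it)   # consumes every item from the iterator
second_pass = list(it)  # the iterator is now exhausted

print(first_pass)   # [-1, 3, -5, 7]
print(second_pass)  # []
```

If the values are needed more than once, it is usually simplest to keep the list rather than the map object.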
map can take more than one iterable, provided the function object takes the same number of parameters. For example:
from operator import mul
it = map(mul, [1, 2, 3, 4], [10, 100, 1000])
print(list(it))
This takes a number from each list, and multiplies them together: 1*10, 2*100, 3*1000. If the iterables have different lengths (as in this example) it stops when the shortest sequence is exhausted. So it prints:
[10, 200, 3000]
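One way to picture the multi-iterable form is as a loop over zip, which also stops at the shortest input. This is only a sketch of the equivalence for this particular example, not a description of how map is implemented.

```python
from operator import mul

a = [1, 2, 3, 4]
b = [10, 100, 1000]

# map pairs up items from each iterable and passes them to mul
mapped = list(map(mul, a, b))

# the same pairing written with zip and a list comprehension
zipped = [x * y for x, y in zip(a, b)]

print(mapped)            # [10, 200, 3000]
print(mapped == zipped)  # True
```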
The reduce function
The reduce function takes a sequence of values and reduces it to a single value by applying a function repeatedly and accumulating the result.
Here is a simple example to illustrate how it works:
from functools import reduce
from operator import add
x = reduce(add, [1, 2, 3, 4])
print(x)
What reduce does is take the first two elements and combine them using the function (add in this case). It then takes the result and combines it with the next element, and so on. It calculates ((1 + 2) + 3) + 4, which is 10. In other words, it sums all the values. In fact, there is a built-in function sum that does the same job; you should normally use that, as it is simpler and more efficient. This is just an illustration.
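To make the accumulation explicit, here is a rough sketch of what reduce does, written as a plain loop. The helper my_reduce is hypothetical and exists only for this illustration; it also takes a starting value, which functools.reduce accepts as an optional third argument.

```python
from functools import reduce
from operator import add

def my_reduce(function, iterable, initial):
    # Hypothetical helper: a simplified stand-in for functools.reduce
    result = initial
    for item in iterable:
        # combine the running result with the next item
        result = function(result, item)
    return result

values = [1, 2, 3, 4]

loop_total = my_reduce(add, values, 0)
real_total = reduce(add, values, 0)
empty_total = reduce(add, [], 0)  # the initial value is returned for empty input

print(loop_total, real_total, empty_total)  # 10 10 0
```

Supplying the initial value is a sensible habit, because reduce raises a TypeError when it is given an empty sequence and no starting value.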
As a practical example, here is a list of strings (an excerpt from the Zen of Python):
zen = [
    'beautiful is better than ugly',
    'explicit is better than implicit',
    'simple is better than complex',
    'complex is better than complicated',
    'flat is better than nested',
    'sparse is better than dense',
    'readability counts',
]
Now imagine we want to count the total number of words in all these strings. We can use map/reduce:
- map to convert the list of strings into a sequence of number-of-words values for each string
- reduce to convert the list of lengths to a single total number-of-words
Here is how this looks in code:
from functools import reduce
from operator import add

def number_of_words(s):
    words = s.split(' ')
    return len(words)

counts = map(number_of_words, zen)
total = reduce(add, counts)
We could get rid of the number_of_words function by using a lambda:
lambda x: len(x.split(' '))
You can also combine the map and reduce calls into a single line. This is our code for finding the total number of words in a list of sentences:
total = reduce(add, map(lambda x: len(x.split(' ')), zen))
If you think this is a bit too terse, you could leave it as separate lines. But if you are used to functional Python this code is fairly readable.
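For reference, here is a self-contained version of the word count that can be run as is. As a variation not used in the text above, it calls split() with no argument, which splits on any run of whitespace; split(' ') would count an empty string for every extra space. It also shows the equivalent sum form for comparison.

```python
from functools import reduce
from operator import add

zen = [
    'beautiful is better than ugly',
    'explicit is better than implicit',
    'simple is better than complex',
    'complex is better than complicated',
    'flat is better than nested',
    'sparse is better than dense',
    'readability counts',
]

# map each sentence to its word count, then reduce the counts to a total
total = reduce(add, map(lambda s: len(s.split()), zen))
print(total)  # 32

# the same total using the built-in sum
total_with_sum = sum(len(s.split()) for s in zen)
print(total_with_sum)  # 32

# why split() can be safer than split(' ') when spacing is irregular
double_space = 'flat  is'  # note the two spaces
print(len(double_space.split(' ')))  # 3, includes an empty string
print(len(double_space.split()))     # 2
```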
Counting unique words
Now suppose we want to count the number of unique words in the list of sentences. We could do it like this:
- map to convert each sentence into a set of words (a set will only contain one copy of each word, even if it appears more than once in the sentence).
- reduce to combine the list of sets into a single set.
- Find the length of the set.
Here is the map step:
def set_of_words(s):
    words = s.split(' ')
    return set(words)

sets = map(set_of_words, zen)
Once again, we use split to convert a string to a list of words. But this time we use set to convert the list into a set. This creates an iterator of sets (one set for each sentence).
Next we want to find the union of all these sets:
(((s1 | s2) | s3) | s4) etc...
Just as we can use the add function instead of the + operator, we can use the or_ function instead of the | operator (don’t forget to import or_ from operator):
all_words = reduce(or_, sets)
total = len(all_words)
Once again we can use a lambda and code the whole thing as one line:
total = len(reduce(or_, map(lambda x: set(x.split(' ')), zen)))
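Below is a runnable sketch of the unique word count. It passes an empty set as the starting value to reduce, an addition to the code above that keeps the call safe if the input list is empty. For comparison it also shows set().union, a standard alternative that gives the same result here.

```python
from functools import reduce
from operator import or_

zen = [
    'beautiful is better than ugly',
    'explicit is better than implicit',
    'simple is better than complex',
    'complex is better than complicated',
    'flat is better than nested',
    'sparse is better than dense',
    'readability counts',
]

# map each sentence to a set of words, then reduce with set union
all_words = reduce(or_, map(lambda s: set(s.split(' ')), zen), set())
total = len(all_words)
print(total)  # 16

# alternative: set.union accepts any number of iterables
alt_words = set().union(*(s.split(' ') for s in zen))
print(alt_words == all_words)  # True
```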
Why bother with map/reduce
In this case, of course, we could have simply concatenated all our input strings together, and removed the need to bother with map/reduce. However, imagine that instead of processing a few sentences, we had a massive database with millions of sentences.
The first advantage of map is that it uses lazy iteration, so we don’t have to load all that data into memory. We only ever need to store one sentence in memory at a time.
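The lazy behaviour can be sketched with a file-like object, since iterating over a file yields one line at a time. Here io.StringIO stands in for a large file on disk; the three lines of text are placeholders chosen for this example.

```python
import io
from functools import reduce
from operator import add

# Stand-in for a large text file; a real file object iterates the same way
fake_file = io.StringIO(
    'flat is better than nested\n'
    'sparse is better than dense\n'
    'readability counts\n'
)

# map pulls one line at a time from the file as reduce asks for values
counts = map(lambda line: len(line.split()), fake_file)
total = reduce(add, counts, 0)

print(total)  # 12
```

At no point does this build a list of all the lines or all the counts; only the running total and the current line are held.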
The second advantage is that every map operation is independent. That is the beauty of functional programming: we can map the data in any order because there are no side effects. If we wanted, we could split our database into parts, and run map on several different computers, each processing its own section of the database. A final computer could then run the reduce task, speeding the whole process up by many times.
Map/reduce is often used in big data applications because it allows for large scale parallelisation of data processing.
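The idea of splitting the work can be sketched on a single machine. The example below is only an illustration under simplifying assumptions: it uses threads from concurrent.futures in place of separate computers, the chunk size of 2 is arbitrary, and the helper words_in_chunk is a hypothetical name. A real speed-up for this kind of work would normally need separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import or_

def words_in_chunk(chunk):
    # Hypothetical worker: map/reduce one chunk of sentences to a set of words
    return reduce(or_, map(lambda s: set(s.split(' ')), chunk), set())

sentences = [
    'beautiful is better than ugly',
    'explicit is better than implicit',
    'flat is better than nested',
    'readability counts',
]

# split the data into independent chunks of two sentences each
chunks = [sentences[i:i + 2] for i in range(0, len(sentences), 2)]

# each worker processes its own chunk; the order of execution does not matter
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_sets = list(pool.map(words_in_chunk, chunks))

# final reduce step: combine the partial results
all_words = reduce(or_, partial_sets, set())
print(len(all_words))  # 11
```

This works because set union is associative, so it does not matter how the sentences are grouped into chunks before the final reduce.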