Iterator/iterable protocol

By Martin McBride, 2021-09-03

Tags: magic method iterable iterator generator itertools
Categories: object protocols

Python has two related kinds of object, iterators and iterables, that support sequences of values.

The most common use of these types is the for loop. A for loop works with any iterable, and will execute once for each element in the iterable:

k = [10, 20, 30]
for x in k:
    print(x)

This code works because lists are iterables, and Python for loops know how to work with iterables.

Iterables and iterators

An iterable is a Python object that we can iterate over. It is often an object that contains data, for example, a list. When we iterate over a list, we just get the list items, one by one.

An iterator is an object that does the iterating. For example, with a list:

The list object holds the data.
A list_iterator object does the iterating.

One way to think of this is that an iterable is like a book (it has a list of pages), while an iterator is like a bookmark (it tells us where we are in the book).

If we have an iterable object, we can get an iterator by calling the built-in iter function:

k = [10, 20, 30]
it = iter(k)
print(type(it))  # <class 'list_iterator'>

For a list, the iter function returns a specific iterator, of type list_iterator, that has been initialised specifically to iterate over the list k.

The iter function calls the magic method __iter__ on the object, as we will see later.

We can get values from the iterator using the built-in next function:

print(next(it)) # 10
print(next(it)) # 20
print(next(it)) # 30
print(next(it)) # Raises StopIteration

Each call to next returns the next value from the list. This works for the first three calls, returning 10, 20 then 30. Since the list is only 3 elements long, the fourth call has no value to return. It raises a StopIteration exception.

The next function calls the magic method __next__ on the object, as we will see later.

An iterator can only be used once. When it runs out of values it is no longer useful, because there is no way to reset it to the beginning. But of course, we can use iter to get a new iterator if we want to loop over the values again.
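
As a small sketch of this behaviour (the variable names are only illustrative), the code below drains an iterator, shows that it stays empty, and then asks the list for a fresh one. It also uses the optional default argument of next, which is returned instead of raising StopIteration:

```python
k = [10, 20, 30]
it = iter(k)

first_pass = list(it)      # consumes every value: [10, 20, 30]
second_pass = list(it)     # the iterator is now exhausted: []

# next() accepts an optional default, returned instead of raising StopIteration
leftover = next(it, None)  # None

# Asking the list for a new iterator starts again from the beginning
fresh = iter(k)
restarted = next(fresh)    # 10

print(first_pass, second_pass, leftover, restarted)
```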

The steps we just went through are exactly what a for loop does, under the hood:

We supply the for loop with an iterable (such as a list).
It gets an iterator from the iterable.
It fetches values from the iterator, one by one.
When the iterator raises StopIteration, the loop terminates.

However, the for loop does not go through the iter and next functions. It calls the __iter__ and __next__ methods on the objects directly.
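
The steps above can be sketched as an explicit while loop. This is only an illustration written with the iter and next functions, not what the interpreter literally executes, and the seen list is an assumption added so the result can be inspected:

```python
k = [10, 20, 30]
seen = []

it = iter(k)              # get an iterator from the iterable
while True:
    try:
        x = next(it)      # fetch the next value
    except StopIteration:
        break             # the iterator is exhausted, so the loop ends
    seen.append(x)        # the body of the for loop would run here
    print(x)
```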

Why do we have separate iterables and iterators?

You might be wondering why we have iterables and iterators. Wouldn't it be easier if iterables just had a next method that you could use directly?

There is a very good reason for this. The iterator keeps track of where it is in the list of items. We could merge the iterator function into the iterable, and it would work in most cases. But what if we ever wanted to iterate over the same object twice, at the same time?

This is not quite as far-fetched as it might sound. Here is an example:

k = [10, 20, 30]
for x in k:
    for y in k:
        print(x, y)

This code loops over k, and each time through the outer loop it loops over k again. It prints every ordered pair of values from k, nine lines in all, starting with 10 10 and ending with 30 30.

The reason this code works is that each for loop creates its own iterator. Although both iterators are working on the same list, they each keep track of where they are, so everything works out. If the iterator was maintained by the list itself, both loops would be incrementing the same iterator, so things would go wrong.

To go back to the bookmark analogy, suppose two people were both sharing the same book (iterable) - maybe one reads it in the mornings, the other reads it in the evenings. They would each need a separate bookmark (iterator), because at any given time they might each be on a different page of the book. There is nothing wrong with them both reading the same book, but if they tried to share the same bookmark things would go very wrong.
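
To make the two bookmarks concrete, here is a minimal sketch that creates two iterators over the same list by hand. The names morning and evening are just labels borrowed from the analogy:

```python
k = [10, 20, 30]

morning = iter(k)   # first reader's bookmark
evening = iter(k)   # second reader's bookmark

a = next(morning)   # 10
b = next(morning)   # 20
c = next(evening)   # 10, because this iterator keeps its own position

print(a, b, c)
```

Advancing one iterator has no effect on the other, which is why the nested for loops work.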

Every iterator is an iterable

As we have seen, a Python for loop expects an iterable. It uses that iterable to obtain an iterator, and then loops over the iterator.

But what if, for some reason, you had an iterator that you wanted to loop over? Well, you might think that would fail, because Python is asking an iterator to give it an iterator, and only iterables can do that.

That would be rather silly, of course, because you already have an iterator!

To avoid this nonsense, Python has a rule that every iterator must be an iterable too. So you can ask an iterator to give you an iterator. In most cases, it will simply return itself.

Here is an example:

k = [10, 20, 30]
it = iter(k)
it2 = iter(it)
print(it is it2) # True

Here, it is the iterator obtained from the list (an iterable) k. it2 is the iterator obtained from the iterator it. As the example shows, it2 is the same object as it. iter(it) just returns it.
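
One practical consequence, sketched below, is that a for loop can pick up a partly used iterator and carry on from its current position. The first and rest names are only for illustration:

```python
k = [10, 20, 30]
it = iter(k)

first = next(it)        # consume 10 by hand

rest = []
for x in it:            # the for loop asks it for an iterator and gets it back
    rest.append(x)      # so the loop continues from where we left off

print(first, rest)      # 10 [20, 30]
```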

Built-in iterables and iterators

An iterable is something that Python can iterate over, in other words, anything that you can use in a for loop.

This includes lists, tuples, strings, sets, dictionaries, and arrays.

It also includes range objects. The function range(10) returns a range object that is an iterable and produces the sequence 0 to 9. That is how a basic for loop like this works:

for i in range(10): # Range returns a range object
    print(i)

range returns a range object. The for loop iterates over the range object.
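
A short sketch showing that a range object is an iterable rather than an iterator: it can be looped over more than once, and iter returns a separate iterator object each time. The type name noted in the comment is what CPython reports and may differ in other implementations:

```python
r = range(3)

first = list(r)     # [0, 1, 2]
second = list(r)    # [0, 1, 2] again, because each pass gets a fresh iterator

it = iter(r)
kind = type(it).__name__      # 'range_iterator' in CPython
is_own_iterator = it is r     # False: the range and its iterator are separate

print(first, second, kind, is_own_iterator)
```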

The Python itertools module provides a useful set of additional iterators.
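
As a brief illustration, here are three of the standard itertools functions (count, islice and chain). The choice of these three is ours, and the module contains many more:

```python
import itertools

# count(1) is an endless iterator: 1, 2, 3, ...
# islice takes just the first 5 values from it
first_five = list(itertools.islice(itertools.count(1), 5))

# chain joins several iterables end to end
joined = list(itertools.chain([10, 20], "ab"))

print(first_five)   # [1, 2, 3, 4, 5]
print(joined)       # [10, 20, 'a', 'b']
```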

Creating your own iterables and iterators

Python also provides generators. A generator looks a lot like a function, but instead of returning a value, it creates an iterator. The body of the generator defines the sequence of values that the iterator will provide. If you require a specific iterator that isn't provided by itertools, you will usually be able to implement it using a generator. That will normally be the simplest option.

Here is an example of a generator that creates a geometric progression of length n. A geometric progression is one where each term is equal to the previous term multiplied by some value a. So if a is 2 and n is 6 the sequence would be:

2 4 8 16 32 64

Here is how we use a generator to create such an iterable:

def geometric(a, n):
    current = 1
    for i in range(n):
        current *= a
        yield current

for x in geometric(3, 5):
    print(x)

The generator geometric is similar to a function, but it has a yield statement to send back a value. Unlike a return statement in a function, the yield statement doesn't end the generator. The generator carries on creating more values until the loop ends.

Calling geometric returns a generator object, which is a type of iterator. When we loop over this iterator in a for loop, it will create a sequence of the first 5 powers of 3.
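
Because the generator object is an iterator, everything described earlier applies to it. The sketch below repeats the generator so that it runs on its own, then checks that iter returns the same object and that next works as expected:

```python
def geometric(a, n):
    current = 1
    for i in range(n):
        current *= a
        yield current

g = geometric(3, 5)

kind = type(g).__name__         # 'generator'
returns_itself = iter(g) is g   # True: a generator is its own iterator

first = next(g)                 # 3
second = next(g)                # 9
remaining = list(g)             # [27, 81, 243]

print(kind, returns_itself, first, second, remaining)
```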

Alternatively, you can create your own iterable and iterator objects, as described below.

Creating your own iterators

To create an iterator, we should normally make a class that implements the __iter__ and __next__ methods:

__iter__ should return the object itself (see the discussion above).
__next__ should return the next item from the sequence, or raise a StopIteration exception when the sequence has ended.

Here is the geometric iterator implemented as a class:

class Geometric:
    def __init__(self, a, n):
        self.a = a
        self.n = n
        self.current = 1

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.current *= self.a
        self.n -= 1
        return self.current

for x in Geometric(3, 5):
    print(x)

Here we have implemented the two required methods to create an iterator. The basic logic is identical to the generator example.

Notice that Geometric is just a basic class. It doesn't inherit some special base class that makes it into an iterator. The class implements __iter__ and __next__, so Python treats it as an iterator. This is an example of duck typing.

The previous generator implementation is considerably shorter than this version and easier to read and understand. You should normally use a generator where possible. You would only need to use an iterator class if you need it to do extra things a generator can't handle.
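
One possible example of such an extra thing is a method that looks ahead without consuming a value. The class below is a hypothetical variant of Geometric, and the peek method is our own invention for illustration, not part of the iterator protocol:

```python
class PeekableGeometric:
    """Hypothetical variant of Geometric with an extra peek() method."""

    def __init__(self, a, n):
        self.a = a
        self.n = n
        self.current = 1

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.current *= self.a
        self.n -= 1
        return self.current

    def peek(self):
        """Return the next value without consuming it, or None if exhausted."""
        if self.n <= 0:
            return None
        return self.current * self.a

g = PeekableGeometric(3, 3)
peeked = g.peek()      # 3, nothing consumed yet
values = list(g)       # [3, 9, 27]
after = g.peek()       # None, the sequence has ended

print(peeked, values, after)
```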

Creating your own iterable

Iterators are fine for generating sequences. However, as we saw earlier, if you want to iterate over a data structure you will normally need to implement both an iterable and a separate iterator.

As an example, we will make a class that holds an IP address as a sequence of 4 integers:

class IPAddress:
    def __init__(self, a, b, c, d):
        self.address = [a, b, c, d]

    def __iter__(self):
        return IPAddressIterator(self)

This class simply holds the 4 parts of the IP address in a list address. The __iter__ method returns an IPAddressIterator, passing itself as a parameter. We also need to write the iterator class:

class IPAddressIterator:
    def __init__(self, obj):
        self.obj = obj
        self.position = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.position >= 4:
            raise StopIteration
        value = self.obj.address[self.position]
        self.position += 1
        return value

This class stores the IPAddress as obj. Its __iter__ method returns itself.

The __next__ method uses position to store the current position in the IPAddress object's address list. On the first call it will return address[0], then on the next call address[1], and so on. After 4 calls it will raise the StopIteration exception. This is fairly similar to the Geometric iterator, except that it is using an iterable as its source of values.

Using these two classes we can iterate over the values in an IP address:

for x in IPAddress(255, 0, 0, 1):
    print(x)
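
As a shorter alternative sketch, the two ideas from this article can be combined by writing __iter__ as a generator. Each call then returns a new generator object that acts as the iterator, so no separate iterator class is needed. This is one common idiom rather than the only way to do it:

```python
class IPAddress:
    def __init__(self, a, b, c, d):
        self.address = [a, b, c, d]

    def __iter__(self):
        # Because this method contains yield, each call returns a new
        # generator object, which serves as the iterator.
        for part in self.address:
            yield part

ip = IPAddress(255, 0, 0, 1)
parts = list(ip)                            # [255, 0, 0, 1]
pairs = [(x, y) for x in ip for y in ip]    # nested loops get independent iterators

print(parts, len(pairs))                    # [255, 0, 0, 1] 16
```

Since the data is held in a list, __iter__ could also simply return iter(self.address).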

An alternative to the iterator protocol

There is an older style iteration protocol that can also be used to make objects iterable. To use this protocol, the object must implement:

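__len__ that returns the length of the sequence (this is part of the sequence protocol, although the for loop itself does not call it).
__getitem__(key) that returns the item identified by key, an integer in the range 0 to len - 1, and raises IndexError for any key outside that range.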

If an object does not implement __iter__ but does implement __getitem__, and we use it in a for loop, Python will attempt to iterate the object by calling __getitem__ with key values that count up from 0, stopping when __getitem__ raises IndexError. Here is an example:

class IPAddress:
    def __init__(self, a, b, c, d):
        self.address = [a, b, c, d]

    def __len__(self):
        return 4

    def __getitem__(self, i):
        return self.address[i]

for x in IPAddress(255, 0, 0, 1):
    print(x)

This implementation is quite a lot simpler. It doesn't require a separate iterator class because Python itself is keeping track of the current position in the object.

The only thing to bear in mind is that, although a for loop always requests the keys in the order 0, 1, 2, 3 and so on, other code that indexes the object directly can ask for any key at any time. So our code must be able to calculate the nth value directly, which makes it inefficient for values that can only be created sequentially.
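
To observe which keys a for loop actually passes to __getitem__, here is a small sketch using a hypothetical LoggedIPAddress class that records every key it is asked for. The class and its keys_requested attribute are assumptions added purely for illustration:

```python
class LoggedIPAddress:
    """Hypothetical variant of IPAddress that records each key requested."""

    def __init__(self, a, b, c, d):
        self.address = [a, b, c, d]
        self.keys_requested = []

    def __getitem__(self, i):
        self.keys_requested.append(i)
        return self.address[i]   # raises IndexError when i is 4

ip = LoggedIPAddress(255, 0, 0, 1)
values = [x for x in ip]

print(values)             # [255, 0, 0, 1]
print(ip.keys_requested)  # [0, 1, 2, 3, 4]
```

The final request, for key 4, raises IndexError inside the list lookup, and that is what tells Python the sequence has ended.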
