Martin McBride, 2021-09-03
Tags magic method iterable iterator generator itertools
Categories object protocols
Python has two related types, iterators and iterables, that support sequences of values.
The most common use of these types is the for loop. A for loop works with any iterable, and will execute once for each element in the iterable:
    k = [10, 20, 30]
    for x in k:
        print(x)
This code works because lists are iterables, and Python for loops know how to work with iterables.
Iterables and iterators
An iterable is a Python object that we can iterate over. It is often an object that contains data, for example, a list. When we iterate over a list, we just get the list items, one by one.
An iterator is an object that does the iterating. For example, with a list:
- The list object holds the data.
- The list_iterator object does the iterating.
One way to think of this is that an iterable is like a book (it has a list of pages), and an iterator is like a bookmark (it tells us where we are in the book).
If we have an iterable object, we can get an iterator by calling the built-in iter function:
    k = [10, 20, 30]
    it = iter(k)
    print(type(it))  # <class 'list_iterator'>
For a list, the iter function returns a specific iterator, of type list_iterator, that has been initialised specifically to iterate over the list k.
The iter function calls the magic method __iter__ on the object, as we will see later.
We can get values from the iterator using the built-in next function:
    print(next(it))  # 10
    print(next(it))  # 20
    print(next(it))  # 30
    print(next(it))  # Raises StopIteration
Each call to next returns the next value from the list. This works for the first three calls, returning 10, 20 then 30. Since the list is only 3 elements long, the fourth call has no value to return. It raises a StopIteration exception.
The next function calls the magic method __next__ on the object, as we will see later.
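When you call next by hand, you usually need to decide what happens once the values run out. The sketch below shows two common options: catching StopIteration, or passing a default as the second argument of next. The variable names are only illustrative.

```python
it = iter([10, 20])

first = next(it)    # 10
second = next(it)   # 20

# Option 1: catch the exception explicitly.
try:
    third = next(it)
except StopIteration:
    third = "no more values"

# Option 2: give next() a default, returned instead of raising StopIteration.
fourth = next(it, None)

print(first, second, third, fourth)
```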
An iterator can only be used once. When it runs out of values, it is no longer useful; there is no way to reset it to the beginning again. But of course, we can use iter to get a new iterator if we want to loop over the values again.
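A short sketch of this one-shot behaviour, using list to drain the iterator (the variable names are just for illustration):

```python
k = [10, 20, 30]
it = iter(k)

first_pass = list(it)     # consumes every value
second_pass = list(it)    # the same iterator is now exhausted

fresh = iter(k)           # a new iterator starts from the beginning
third_pass = list(fresh)

print(first_pass)   # [10, 20, 30]
print(second_pass)  # []
print(third_pass)   # [10, 20, 30]
```

The list itself is untouched throughout; only the iterator is used up.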
The steps we just went through are exactly what a for loop does, under the hood:
- We supply the for loop with an iterable (such as a list).
- It gets an iterator from the iterable.
- It fetches values from the iterator, one by one.
- When the iterator raises StopIteration, the loop terminates.
However, Python bypasses the iter and next functions; it just calls the __iter__ and __next__ methods on the objects instead.
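The steps above can be written out as a while loop. This is only a rough equivalent of what a for loop does, written with the iter and next built-ins for readability; the collected list is an addition so that the result can be inspected.

```python
k = [10, 20, 30]
collected = []

it = iter(k)                  # get an iterator from the iterable
while True:
    try:
        x = next(it)          # fetch the next value
    except StopIteration:     # no values left, so leave the loop
        break
    collected.append(x)       # the body of the loop
    print(x)
```

This behaves like for x in k: print(x), with the bookkeeping made visible.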
Why do we have separate iterables and iterators?
You might be wondering why we have iterables and iterators. Wouldn't it be easier if iterables just had a next method that you could use directly?
There is a very good reason for this. The iterator keeps track of where it is in the list of items. We could merge the iterator function into the iterable, and it would work in most cases. But what if we ever wanted to iterate over the same object twice, at the same time?
This is not quite as far-fetched as it might sound. Here is an example:
    k = [10, 20, 30]
    for x in k:
        for y in k:
            print(x, y)
This code loops over k, and each time through the loop it loops over k again. It prints every possible pair of values from k:

    10 10
    10 20
    10 30
    20 10
    20 20
    20 30
    30 10
    30 20
    30 30
The reason this code works is that each for loop creates its own iterator. Although both iterators are working on the same list, they each keep track of where they are, so everything works out. If the iterator was maintained by the list itself, both loops would be incrementing the same iterator, so things would go wrong.
To go back to the bookmark analogy, suppose two people were both sharing the same book (iterable) - maybe one reads it in the mornings, the other reads it in the evenings. They would each need a separate bookmark (iterator), because at any given time they might each be on a different page of the book. There is nothing wrong with them both reading the same book, but if they tried to share the same bookmark things would go very wrong.
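To see what "sharing the same bookmark" looks like in code, the sketch below deliberately passes one iterator to both loops. The names independent, shared and entangled are illustrative.

```python
k = [10, 20, 30]

# Normal case: each for loop asks the list for its own iterator.
independent = []
for x in k:
    for y in k:
        independent.append((x, y))

# Deliberately share ONE iterator between both loops.
shared = iter(k)
entangled = []
for x in shared:
    for y in shared:   # the inner loop uses up the values the outer loop needs
        entangled.append((x, y))

print(len(independent))  # 9
print(entangled)         # [(10, 20), (10, 30)]
```

The outer loop takes 10, the inner loop consumes 20 and 30, and then there is nothing left for either loop.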
Every iterator is an iterable
As we have seen, a Python for loop expects an iterable. It uses that iterable to obtain an iterator, and then loops over the iterator.
But what if, for some reason, you had an iterator that you wanted to loop over? Well, you might think that would fail, because Python is asking an iterator to give it an iterator, and only iterables can do that.
That would be rather silly, of course, because you already have an iterator!
To avoid this nonsense, Python has a rule that every iterator must be an iterable too. So you can ask an iterator to give you an iterator. In most cases, it will simply return itself.
Here is an example:
    k = [10, 20, 30]
    it = iter(k)
    it2 = iter(it)
    print(it is it2)  # True
The variable it is the iterator obtained from the list (an iterable), and it2 is the iterator obtained from the iterator it. As the example shows, it2 is the same object as it, because iter(it) just returns it.
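One practical consequence is that a partly used iterator can be handed to a for loop, which then carries on from wherever the iterator has got to. A small sketch (variable names are illustrative):

```python
k = [10, 20, 30]
it = iter(k)

first = next(it)    # take the first value by hand

rest = []
for x in it:        # the for loop calls iter(it), which returns it itself
    rest.append(x)

print(first)  # 10
print(rest)   # [20, 30]
```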
Built-in iterables and iterators
An iterable is something that Python can iterate over. In other words, anything that you can use in a for loop.
This includes lists, tuples, strings, sets, dictionaries and arrays.
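As a quick illustration, the same plain for loop works on each of these types. The collect helper below is hypothetical, written only for this sketch.

```python
from array import array


def collect(iterable):
    """Loop over any iterable with a plain for loop and return the items."""
    items = []
    for item in iterable:
        items.append(item)
    return items


letters = collect("abc")                    # a string yields its characters
pair = collect((1, 2))                      # a tuple yields its elements
keys = collect({"x": 1, "y": 2})            # a dictionary yields its keys
unique = sorted(collect({3, 1, 2}))         # set order is arbitrary, so sort it
numbers = collect(array("i", [7, 8, 9]))    # an array yields its numbers

print(letters, pair, keys, unique, numbers)
```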
It also includes range objects. The function range(10) returns a range object that is an iterable and produces the sequence 0 to 9. That is how a basic for loop like this works:
    for i in range(10):  # range returns a range object
        print(i)
range returns a range object. The for loop iterates over the range object.
The itertools module provides a useful set of additional iterators.
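A few itertools functions, as a small taste rather than a survey. The ones shown here (count, islice, chain and product) are standard library functions; the variable names are illustrative.

```python
import itertools

# count() is an endless iterator; islice() takes a finite slice of it.
first_five = list(itertools.islice(itertools.count(1), 5))

# chain() walks several iterables one after another.
joined = list(itertools.chain([10, 20], (30, 40)))

# product() yields every pair, like two nested for loops over the same list.
pairs = list(itertools.product([10, 20], repeat=2))

print(first_five)  # [1, 2, 3, 4, 5]
print(joined)      # [10, 20, 30, 40]
print(pairs)       # [(10, 10), (10, 20), (20, 10), (20, 20)]
```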
Creating your own iterables and iterators
Python also provides generators. A generator looks a lot like a function, but instead of returning a value, it creates an iterator. The body of the generator defines the sequence of values that the iterator will provide. If you require a specific iterator that isn't provided by itertools, you will usually be able to implement it using a generator. That will normally be the simplest option.
Here is an example of a generator that creates a geometric progression of length n. A geometric progression is one where each term is equal to the previous term multiplied by some value a. So if a is 2 and n is 6 the sequence would be:
2 4 8 16 32 64
Here is how we use a generator to create such an iterable:
    def geometric(a, n):
        current = 1
        for i in range(n):
            current *= a
            yield current

    for x in geometric(3, 5):
        print(x)
geometric is similar to a function, but it has a yield statement to send back a value. Unlike a return statement in a function, the yield statement doesn't end the generator; it carries on creating more values until the loop ends.
Calling geometric returns a generator object, which is a type of iterator. When we loop over this iterator in a for loop, it will create a sequence of the first 5 powers of 3.
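Because a generator object is an iterator, everything said earlier about iterators applies to it: iter returns the object itself, next fetches one value at a time, and it can only be used once. A brief sketch, repeating the generator so the block is self-contained:

```python
def geometric(a, n):
    current = 1
    for _ in range(n):
        current *= a
        yield current


gen = geometric(3, 5)

is_own_iterator = iter(gen) is gen   # a generator is its own iterator
first = next(gen)                    # values are produced one at a time
second = next(gen)
remaining = list(gen)                # drains whatever is left
leftover = list(gen)                 # nothing more once it is exhausted

print(is_own_iterator)  # True
print(first, second)    # 3 9
print(remaining)        # [27, 81, 243]
print(leftover)         # []
```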
Alternatively, you can create your own iterable and iterator objects, as described below.
Creating your own iterators
To create an iterator, we should normally make a class that implements the __iter__ and __next__ methods:
- __iter__ should return the object itself (see the discussion above).
- __next__ should return the next item from the sequence, or raise a StopIteration exception when the sequence has ended.
Here is the geometric iterator implemented as a class:
    class Geometric:

        def __init__(self, a, n):
            self.a = a
            self.n = n
            self.current = 1

        def __iter__(self):
            return self

        def __next__(self):
            if self.n <= 0:
                raise StopIteration
            self.current *= self.a
            self.n -= 1
            return self.current

    for x in Geometric(3, 5):
        print(x)
Here we have implemented the two required methods to create an iterator. The basic logic is identical to the generator example.
Geometric is just a basic class. It doesn't inherit from some special base class that makes it into an iterator. The class implements __iter__ and __next__, so Python treats it as an iterator. This is an example of duck typing.
The previous generator implementation is considerably shorter than this version and easier to read and understand. You should normally use a generator where possible. You would only need to use an iterator class if you need it to do extra things a generator can't handle.
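As one possible example of such an extra thing, an iterator class can offer additional methods alongside __next__. The remaining method below is hypothetical, added purely for illustration; it is not part of the iterator protocol.

```python
class Geometric:

    def __init__(self, a, n):
        self.a = a
        self.n = n
        self.current = 1

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.current *= self.a
        self.n -= 1
        return self.current

    def remaining(self):
        """Hypothetical helper: how many terms are still to come."""
        return self.n


g = Geometric(3, 5)
first = next(g)
left_after_one = g.remaining()
rest = list(g)
left_at_end = g.remaining()

print(first, left_after_one)  # 3 4
print(rest, left_at_end)      # [9, 27, 81, 243] 0
```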
Creating your own iterable
Iterators are fine for generating sequences. However, if you want to iterate over a data structure, you will normally need to implement both an iterable and a separate iterator, for the reasons we saw earlier.
As an example, we will make a class that holds an IP address as a sequence of 4 integers:
    class IPAddress:

        def __init__(self, a, b, c, d):
            self.address = [a, b, c, d]

        def __iter__(self):
            return IPAddressIterator(self)
This class simply holds the 4 parts of the IP address in a list. The __iter__ method returns an IPAddressIterator, passing itself as a parameter. We also need to write the iterator class:
    class IPAddressIterator:

        def __init__(self, obj):
            self.obj = obj
            self.position = 0

        def __iter__(self):
            return self

        def __next__(self):
            if self.position >= 4:
                raise StopIteration
            value = self.obj.address[self.position]
            self.position += 1
            return value
This class stores the IPAddress object in obj. The __iter__ method returns itself. The __next__ method uses position to store the current position in the IPAddress object's address list. On the first call it will return address[0], then on the next call address[1], and so on. Once all 4 values have been returned, the next call raises the StopIteration exception. This is fairly similar to the Geometric iterator, except that it is using an iterable as its source of values.
Using these two classes we can iterate over the values in an IP address:
    for x in IPAddress(255, 0, 0, 1):
        print(x)
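Since generators create iterators, a shorter variant is possible when the data already lives in a list: write __iter__ as a generator method. This is an alternative sketch, not the approach used in the classes above. Each call to __iter__ returns a new generator, so nested loops still get independent iterators.

```python
class IPAddress:

    def __init__(self, a, b, c, d):
        self.address = [a, b, c, d]

    def __iter__(self):
        # A generator method: every call creates a new, independent
        # generator (an iterator), so no separate iterator class is needed.
        for part in self.address:
            yield part


ip = IPAddress(255, 0, 0, 1)
parts = list(ip)
pairs = [(x, y) for x in ip for y in ip]   # nested loops stay independent

print(parts)       # [255, 0, 0, 1]
print(len(pairs))  # 16
```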
An alternative to the iterator protocol
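There is an older style iteration protocol that can also be used to make objects iterable. To use this protocol, the object implements:
- __getitem__(key) that returns the item identified by key, an integer in the range 0 to len - 1, and raises IndexError for a key beyond the end.
- __len__ that returns the length of the sequence. A for loop only needs __getitem__, but a sequence-like object should normally provide __len__ as well.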
If an object has no __iter__ method, and we use it in a for loop, Python will attempt to iterate over the object by calling __getitem__ with key values that count up from 0, stopping when __getitem__ raises IndexError. Here is an example:
    class IPAddress:

        def __init__(self, a, b, c, d):
            self.address = [a, b, c, d]

        def __len__(self):
            return 4

        def __getitem__(self, i):
            return self.address[i]

    for x in IPAddress(255, 0, 0, 1):
        print(x)
This implementation is quite a lot simpler. It doesn't require a separate iterator class because Python itself is keeping track of the current position in the object.
The only thing to bear in mind is that, although a for loop presents keys in the order 0, 1, 2, 3, other code can index the object directly and might make __getitem__ calls in any order. So our code must be able to calculate the nth value directly, which makes it inefficient for values that can only be created sequentially.
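To see which keys actually reach __getitem__, the sketch below logs every key it is given. The requested list is a hypothetical addition for illustration only, and __len__ is left out because the for loop itself relies only on __getitem__ and the IndexError raised past the end of the list.

```python
class IPAddress:

    def __init__(self, a, b, c, d):
        self.address = [a, b, c, d]
        self.requested = []          # hypothetical log of the keys asked for

    def __getitem__(self, i):
        self.requested.append(i)
        return self.address[i]       # the list raises IndexError when i is 4


ip = IPAddress(255, 0, 0, 1)

values = []
for x in ip:
    values.append(x)
keys_from_loop = list(ip.requested)

third = ip[2]                        # direct indexing can ask for any key

print(values)          # [255, 0, 0, 1]
print(keys_from_loop)  # [0, 1, 2, 3, 4]
print(third)           # 0
```

The final key, 4, is the call that raises IndexError and ends the loop.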