Python informer

Improve your Python coding skills

Collections

A collection is a general name for types of object that contain a group of other items. Examples are lists, tuples, dictionaries and sets.

You can create your own collections. Our example Matrix class is a collection. In a later article we will look at a priority queue class, another collection.

Collection behaviour

Lists, tuples and the other built-in collections share several common behaviours. We will probably want to emulate some of those behaviours in our own collections.

  • You can find the length (number of elements) of a collection using the built-in len function.
  • You can access (read, write or delete) elements using [] notation. Some collections, such as lists, use integers or slices to select elements. Some, such as dictionaries, can use string or even tuple values as indices.
  • You can loop over elements in a collection using a for loop.
  • You can check if a collection contains an element using the in operator.
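As a quick illustration, here is how those four behaviours look on a built-in list and dictionary. The variable names and values are made up for this sketch:

```python
# Illustrative values only; any list or dict behaves the same way.
numbers = [10, 20, 30]
ages = {'ann': 31, ('bob', 'jr'): 7}   # string and tuple keys

size = len(numbers)                # length via the built-in len function
second = numbers[1]                # read an element with [] notation
numbers[1] = 25                    # write an element
del numbers[0]                     # delete an element
child_age = ages[('bob', 'jr')]    # a tuple used as an index

total = 0
for n in numbers:                  # loop over the elements
    total += n

has_30 = 30 in numbers             # membership test with the in operator

print(size, second, numbers, child_age, total, has_30)
# prints: 3 20 [25, 30] 7 55 True
```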

We can do all these things with our own custom collections by defining the relevant magic methods, as described below. Of course, not every collection has to implement every behaviour.

We will extend our Matrix class as an example.

Supporting len

We can support the built-in len function by adding a __len__ method to our Matrix class:

class Matrix:

    def __init__(self, a, b, c, d):
        self.data = [a, b, c, d]

    def __str__(self):
        return '[{}, {}][{}, {}]'.format(self.data[0],
                                         self.data[1],
                                         self.data[2],
                                         self.data[3])

    def __len__(self):
        return 4


a = Matrix(1, 2, 3, 4)
print(len(a))

Since our Matrix class always has exactly 4 elements, our __len__ method always returns 4. In most cases, you will need to determine the current size of your collection and return that.
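For a collection whose size changes, __len__ would typically report the current size of whatever holds the data. Here is a minimal sketch using a hypothetical Bag class (not part of the Matrix example) that stores its items in a list:

```python
class Bag:
    """Hypothetical collection whose size changes over time."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def __len__(self):
        # Report the current number of stored items.
        return len(self.items)


b = Bag()
empty_size = len(b)
b.add('apple')
b.add('pear')
filled_size = len(b)
print(empty_size, filled_size)   # prints: 0 2
```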

Getting and setting elements

We can read items from our collection using list-style [] notation. We just need to define __getitem__ like this:

class Matrix:

    def __init__(self, a, b, c, d):
        self.data = [a, b, c, d]

    def __str__(self):
        return '[{}, {}][{}, {}]'.format(self.data[0],
                                         self.data[1],
                                         self.data[2],
                                         self.data[3])

    def __getitem__(self, i):
        return self.data[i]


a = Matrix(1, 2, 3, 4)
print(a[1])

When we print a[1], Python calls the __getitem__ method with i set to 1, which returns the value of element 1 of the matrix data.
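Because this __getitem__ simply passes i on to the underlying list, anything a list accepts inside [] should also work, including negative indices and slices. The sketch below repeats a cut-down Matrix so it can run on its own. Note that a slice gives back a plain list rather than a Matrix, and an out-of-range index raises IndexError just as a list would:

```python
class Matrix:

    def __init__(self, a, b, c, d):
        self.data = [a, b, c, d]

    def __getitem__(self, i):
        # i is forwarded unchanged, so the list decides how to handle it.
        return self.data[i]


a = Matrix(1, 2, 3, 4)
last = a[-1]         # negative index counts from the end
first_two = a[0:2]   # a slice returns a plain list
print(last, first_two)   # prints: 4 [1, 2]

try:
    a[4]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)      # prints: True
```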

We can set an item by defining the __setitem__ method and adding it to the Matrix class:

    def __setitem__(self, i, value):
        self.data[i] = value

Here is the code to test this:

a[2] = 10
print(a)

This sets element 2 to the value 10 (by calling __setitem__ with i set to 2 and value set to 10). The code then prints the new value of a, which is:

[1, 2][10, 4]
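The index passed to __getitem__ and __setitem__ does not have to be an integer. As one possible variation, the hypothetical GridMatrix class below (an assumption for illustration, not the Matrix class used in this article) accepts a (row, column) tuple and converts it to a position in the same flat list:

```python
class GridMatrix:
    """Hypothetical 2x2 matrix indexed by a (row, column) tuple."""

    def __init__(self, a, b, c, d):
        self.data = [a, b, c, d]

    def __getitem__(self, key):
        row, col = key                     # g[1, 0] passes the tuple (1, 0)
        return self.data[row * 2 + col]

    def __setitem__(self, key, value):
        row, col = key
        self.data[row * 2 + col] = value


g = GridMatrix(1, 2, 3, 4)
g[1, 0] = 10
print(g[1, 0], g[0, 1])   # prints: 10 2
```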

Deleting items

Many collections allow you to delete an item using the following syntax:

del a[1]   # Remove element at position 1

Our Matrix class, by definition, always has exactly 4 elements, so it doesn’t support delete. If it did we could implement it like this:

    def __delitem__(self, i):
        del self.data[i]

Now if we use del a[1] on a Matrix object a, it will remove element 1 from the self.data list, leaving just 3 elements. However, other methods of the Matrix class rely on there being 4 elements, which is reasonable because a Matrix is supposed to have 4 elements. For example, __str__ accesses index 3, which no longer exists after running del. This means that printing the Matrix after running del raises an IndexError.

The code above is for illustration only. It shows how you might implement del for other collections, but the Matrix class, by its nature, doesn’t support del.
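For a collection that is allowed to shrink, the same __delitem__ works without side effects. Here is a small sketch using a hypothetical Shelf class, invented for this example:

```python
class Shelf:
    """Hypothetical variable-size collection that supports del."""

    def __init__(self, *items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __delitem__(self, i):
        # Remove the element at position i from the underlying list.
        del self.items[i]


s = Shelf('a', 'b', 'c')
del s[1]
print(len(s), s.items)   # prints: 2 ['a', 'c']
```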

For loops

Our Matrix class automatically supports for loops:

a = Matrix(1, 2, 3, 4)
for x in a:
    print(x)

This works because the Matrix object supports __getitem__. When a class has no __iter__ method, Python implements the for loop by calling __getitem__ with values 0, then 1, then 2 etc, and keeps looping for as long as a valid result is returned. As soon as the call raises an IndexError, the loop ends.
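To see this fallback in isolation, here is a sketch with a hypothetical Countdown class that defines only __getitem__. It raises IndexError itself once the index runs past the end, which is what stops the loop:

```python
class Countdown:
    """Hypothetical class with __getitem__ but no __iter__."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, i):
        if i < 0 or i >= self.start:
            raise IndexError(i)    # signals the end of the sequence
        return self.start - i


seen = []
for x in Countdown(3):             # calls __getitem__ with 0, 1, 2, 3
    seen.append(x)
print(seen)   # prints: [3, 2, 1]
```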

This works well for simple cases like Matrix. The alternative is to define the __iter__ method, which must return an iterator that accesses the elements. In our case, since our data is held in a list, we can use the built in iter function to obtain an iterator for the data, and return that. Here is our __iter__ method:

    def __iter__(self):
        return iter(self.data)

Using __iter__ might be slightly more efficient, as it uses the list’s built-in iterator. You can also write your own __iter__ if you need special behaviour.
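As one example of what special behaviour could mean, the sketch below assumes we would rather loop over the rows of the matrix than over its individual elements. This is an illustrative variation, not the __iter__ used in the rest of this article. Writing __iter__ as a generator (using yield) is a convenient way to return an iterator:

```python
class Matrix:

    def __init__(self, a, b, c, d):
        self.data = [a, b, c, d]

    def __iter__(self):
        # Yield one row at a time instead of one element at a time.
        yield self.data[0:2]
        yield self.data[2:4]


rows = []
for row in Matrix(1, 2, 3, 4):
    rows.append(row)
print(rows)   # prints: [[1, 2], [3, 4]]
```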

The in operator

Matrix supports the in operator (and the not in operator):

a = Matrix(1, 2, 3, 4)
print(3 in a, 5 in a)

The default implementation works by iterating over the object, effectively using a for loop. It will use __iter__ if it exists, otherwise it will use __getitem__.

You can provide your own implementation by defining a __contains__ method that accepts a value and returns True if the value is in the collection, or False otherwise. In our case, we can use the in operator on self.data:

    def __contains__(self, value):
        return value in self.data
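A custom __contains__ is most useful when membership means something other than a plain element-by-element comparison. As a sketch, the hypothetical Tags class below (invented for this example, and assuming every value is a string) ignores upper and lower case when testing membership:

```python
class Tags:
    """Hypothetical collection with case-insensitive membership."""

    def __init__(self, *names):
        # Store lower-case copies in a set for quick lookup.
        self.names = {n.lower() for n in names}

    def __contains__(self, value):
        # Assumes value is a string.
        return value.lower() in self.names


t = Tags('Python', 'Matrix')
found = 'python' in t
missing = 'queue' not in t
print(found, missing)   # prints: True True
```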

The full code

Here is the complete code for our Matrix collection implementation:

class Matrix:

    def __init__(self, a, b, c, d):
        self.data = [a, b, c, d]

    def __str__(self):
        return '[{}, {}][{}, {}]'.format(self.data[0],
                                         self.data[1],
                                         self.data[2],
                                         self.data[3])

    def __len__(self):
        return 4

    def __getitem__(self, i):
        return self.data[i]

    def __setitem__(self, i, value):
        self.data[i] = value

    def __iter__(self):
        return iter(self.data)

    def __contains__(self, value):
        return value in self.data