Collections

By Martin McBride, 2021-08-28
Tags: collection len iter in
Categories: magic methods


A collection is a general name for types of object that contain a group of other items. Examples are lists, tuples, dictionaries and sets.

You can create your own collections. Our example Matrix class is a collection. In a later article we will look at a priority queue class, another collection.

Collection behaviour

There are several common behaviours that lists, tuples and other collections share. We will probably want to emulate some of those behaviours in our own collections.

  • You can find the length (number of elements) of a collection using the built-in len function.
  • You can access (read, write or delete) elements using [] notation. Some collections, such as lists, use integers or slices to select elements. Some, such as dictionaries, can use string or even tuple values as indices.
  • You can loop over elements in a collection using a for loop.
  • You can check if a collection contains an element using the in operator.
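As a quick reminder, the four behaviours listed above can be seen on built-in collections. The variable names and values here are made up for illustration:

```python
# Built-in collections already support all four behaviours.
numbers = [10, 20, 30]
ages = {'ann': 31, 'bob': 27}

# 1. Length, using the built-in len function
list_length = len(numbers)
dict_length = len(ages)

# 2. Element access with [] notation
second = numbers[1]        # read using an integer index
first_two = numbers[0:2]   # read using a slice
ages['cat'] = 45           # write using a string key
del numbers[0]             # delete the first element

# 3. Looping with a for loop
total = 0
for n in numbers:
    total += n

# 4. Membership tests with the in operator
has_bob = 'bob' in ages
has_99 = 99 in numbers

print(list_length, dict_length, second, first_two, total, has_bob, has_99)
```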

We can do all these things with our own custom collections by defining the relevant magic methods, as described below. Of course, not every collection has to implement every behaviour.

We will extend our Matrix class as an example.

Supporting len

We can support the built-in len function by adding a __len__ method to our Matrix class:

class Matrix:

    def __init__(self, a, b, c, d):
        self.data = [a, b, c, d]

    def __str__(self):
        return '[{}, {}][{}, {}]'.format(self.data[0],
                                         self.data[1],
                                         self.data[2],
                                         self.data[3])

    def __len__(self):
        return 4


a = Matrix(1, 2, 3, 4)
print(len(a))

Since our Matrix class always has exactly 4 elements, our __len__ method always returns 4. In most cases, you will need to determine the current size of your collection and return that.
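For a collection whose size can change, __len__ would typically report the current size. Here is a minimal sketch, assuming a hypothetical Bag class (not part of the Matrix example) that stores its items in a list:

```python
class Bag:
    """Hypothetical variable-size collection, for illustration only."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def __len__(self):
        # Report the current number of stored items
        return len(self.items)


bag = Bag()
empty_size = len(bag)   # nothing added yet
bag.add('apple')
bag.add('pear')
size = len(bag)         # two items added
print(empty_size, size)
```

Delegating to the len of the underlying list keeps the reported size in step with the data, without any separate counter to maintain.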

Getting and setting elements

We can get items from our collection using list-style [] notation. We just need to define __getitem__, like this:

class Matrix:

    def __init__(self, a, b, c, d):
        self.data = [a, b, c, d]

    def __str__(self):
        return '[{}, {}][{}, {}]'.format(self.data[0],
                                         self.data[1],
                                         self.data[2],
                                         self.data[3])

    def __getitem__(self, i):
        return self.data[i]


a = Matrix(1, 2, 3, 4)
print(a[1])

When we print a[1], Python calls the __getitem__ method with i set to 1, which returns the value of element 1 of the matrix data.
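Because this __getitem__ passes i straight to the underlying list, negative indices and slices also work, and an out-of-range index raises IndexError. The sketch below repeats a cut-down Matrix so that it runs on its own. Note that a slice returns a plain list here rather than a Matrix:

```python
class Matrix:

    def __init__(self, a, b, c, d):
        self.data = [a, b, c, d]

    def __getitem__(self, i):
        # i may be an int or a slice object; the list handles both
        return self.data[i]


a = Matrix(1, 2, 3, 4)
last = a[-1]        # negative index counts from the end
top_row = a[0:2]    # Python passes slice(0, 2) as i

try:
    a[4]            # index 4 is out of range for 4 elements
    out_of_range = False
except IndexError:
    out_of_range = True

print(last, top_row, out_of_range)
```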

We can set an item by defining the __setitem__ method and adding it to the Matrix class:

    def __setitem__(self, i, value):
        self.data[i] = value

Here is the code to test this:

a[2] = 10
print(a)

This sets element 2 to the value 10 (by calling __setitem__ with i set to 2 and value set to 10). The code then prints the new value of a, which is:

[1, 2][10, 4]

Deleting items

Many collections allow you to delete an item using the following syntax:

del a[1]   # Remove element at position 1

Our Matrix class, by definition, always has exactly 4 elements, so it doesn't support delete. If it did we could implement it like this:

    def __delitem__(self, i):
        del self.data[i]

Now if we use del a[1] on a Matrix object a, it will remove element 1 from the self.data list, leaving just 3 elements. However, other methods in the Matrix class rely on there being 4 elements, which is reasonable because the matrix is supposed to have 4 elements. For example, __str__ accesses element index 3, which will no longer exist after running del. This means that printing a Matrix will raise an IndexError after running del.

The code above is for illustration only. It shows how you might implement del for other collections, but the Matrix class, by its nature, doesn't support del.
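For a collection that is allowed to shrink, __delitem__ fits naturally. This is a minimal sketch, assuming a hypothetical Playlist class whose other methods do not depend on a fixed number of elements:

```python
class Playlist:
    """Hypothetical variable-size collection, for illustration only."""

    def __init__(self, *tracks):
        self.tracks = list(tracks)

    def __len__(self):
        return len(self.tracks)

    def __str__(self):
        # Works for any number of tracks, unlike Matrix.__str__
        return ', '.join(self.tracks)

    def __delitem__(self, i):
        del self.tracks[i]


p = Playlist('intro', 'verse', 'chorus')
del p[1]            # calls p.__delitem__(1)
print(len(p), p)
```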

For loops

Our Matrix class automatically supports for loops:

a = Matrix(1, 2, 3, 4)
for x in a:
    print(x)

This works because the Matrix object supports __getitem__. When a class has no __iter__ method, Python implements the for loop by calling __getitem__ with values 0, then 1, then 2 and so on, for as long as a valid result is obtained. As soon as the call raises an IndexError, the loop ends.
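To see the calls that the for loop makes, here is a small sketch using a hypothetical Squares class that defines __getitem__ but not __iter__, and records every index it is asked for:

```python
class Squares:
    """Hypothetical class with __getitem__ but no __iter__."""

    def __init__(self, count):
        self.count = count
        self.requested = []   # every index the loop asks for

    def __getitem__(self, i):
        self.requested.append(i)
        if i >= self.count:
            # Raising IndexError tells the for loop to stop
            raise IndexError(i)
        return i * i


s = Squares(3)
values = []
for x in s:
    values.append(x)

print(values)
print(s.requested)
```

The recorded indices include one extra request past the end, which is the call that raised IndexError and ended the loop.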

This works well for simple cases like Matrix. The alternative is to define the __iter__ method, which must return an iterator that accesses the elements. In our case, since our data is held in a list, we can use the built-in iter function to obtain an iterator for the data, and return that. Here is our __iter__ method:

    def __iter__(self):
        return iter(self.data)

Using __iter__ might be slightly more efficient, as it uses the list's built-in iterator. You can also override __iter__ if you need special behaviour.
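As an example of special behaviour, __iter__ can be written as a generator that visits the elements in a different order. The sketch below assumes a hypothetical ColumnMatrix variant that iterates column by column rather than row by row. It also shows that __iter__ is used in preference to __getitem__ when both exist:

```python
class ColumnMatrix:
    """Hypothetical Matrix variant that iterates column by column."""

    def __init__(self, a, b, c, d):
        self.data = [a, b, c, d]   # stored row by row: [a, b][c, d]

    def __getitem__(self, i):
        return self.data[i]

    def __iter__(self):
        # Generator: first column (a, c), then second column (b, d)
        for i in (0, 2, 1, 3):
            yield self.data[i]


m = ColumnMatrix(1, 2, 3, 4)
column_order = list(m)                # uses __iter__
row_order = [m[i] for i in range(4)]  # uses __getitem__ directly
print(column_order, row_order)
```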

The in operator

Matrix supports the in operator (and the not in operator):

a = Matrix(1, 2, 3, 4)
print(3 in a, 5 in a)

The default implementation works by iterating over the object, effectively using a for loop. It will use __iter__ if it exists, otherwise it will use __getitem__.

You can provide your own implementation by defining a __contains__ method that accepts a value and returns True if the value is in the collection, or False otherwise. In our case, we can use the in operator on self.data:

    def __contains__(self, value):
        return value in self.data
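The following sketch illustrates which method the in operator uses. The Tracked and TrackedWithContains classes are hypothetical; they simply record which magic method was called:

```python
class Tracked:
    """Hypothetical collection that records which methods are called."""

    def __init__(self, *items):
        self.data = list(items)
        self.calls = []

    def __iter__(self):
        self.calls.append('__iter__')
        return iter(self.data)


class TrackedWithContains(Tracked):

    def __contains__(self, value):
        self.calls.append('__contains__')
        # Searches the list directly, so our own __iter__ is not used
        return value in self.data


t = Tracked(1, 2, 3)
found = 3 in t              # no __contains__, so falls back to __iter__

u = TrackedWithContains(1, 2, 3)
found_in_u = 5 in u         # uses __contains__
missing = 5 not in u        # not in also uses __contains__

print(found, t.calls)
print(found_in_u, missing, u.calls)
```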

The full code

Here is the complete code for our Matrix collection implementation:

class Matrix:

    def __init__(self, a, b, c, d):
        self.data = [a, b, c, d]

    def __str__(self):
        return '[{}, {}][{}, {}]'.format(self.data[0],
                                         self.data[1],
                                         self.data[2],
                                         self.data[3])

    def __len__(self):
        return 4

    def __getitem__(self, i):
        return self.data[i]

    def __setitem__(self, i, value):
        self.data[i] = value

    def __iter__(self):
        return iter(self.data)

    def __contains__(self, value):
        return value in self.data
