Martin McBride, 2020-03-08
Tags tuple named tuple dictionary
Categories python language intermediate python
Tuples are a great way to create ad hoc data structures. Named tuples extend this idea by allowing the values within a tuple to be referred to by name. We will start with a quick recap on tuples before looking at named tuples.
Tuples are often used to group several related data values. For example, a colour is often represented by 3 values: red, green and blue. We can store this as a tuple like this:
background = (1, 0.5, 0) # (r, g, b)
We can use tuple packing to return a colour value from a function:
def get_background_color():
    # Do something to get r, g, b values (placeholder values used here)
    r, g, b = 1, 0.5, 0
    return r, g, b

color = get_background_color()
There are two ways to access the values in a tuple - unpacking and indexing:
r, g, b = color
green = color[1]
The first line unpacks the three elements of color into the variables r, g and b. The second line gets the second element of color and stores it in the variable green.
Both these methods rely on you remembering the order of the elements in the tuple. That is fine for RGB colours because they have a natural order. It becomes more difficult if you have a record holding, for example, employee information. It could include first name, surname, employee number, job title, etc, but there is no obvious way to be sure what order they are stored in, and it would be very easy to make a mistake.
Wouldn't it be great if you could access the fields by name?
Defining a named tuple
namedtuple gives you the ability to name the individual elements of a tuple. You can use the names when you create the tuple, and also to access the tuple elements.
Here is how we define a namedtuple. Note that we need to import it from the collections module:
from collections import namedtuple
Color = namedtuple('Color', ['red', 'green', 'blue'])
This creates a new class that implements a specific type of namedtuple. The first parameter of the namedtuple call is the name of the new class - we are going to call it 'Color'. The second parameter is a list of the field names. We are defining 3 fields, called 'red', 'green', and 'blue'.
The namedtuple function doesn't return a named tuple instance. It actually returns the new class, which acts as a factory: calling it creates new named tuples of class Color. We assign this class to a variable Color. It is common to use the same name for the class and the variable, but you don't have to.
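As a quick check, the sketch below inspects what the namedtuple call hands back. The variable name Shade is hypothetical, chosen only to show that the variable and the class name can differ.

```python
from collections import namedtuple

Color = namedtuple('Color', ['red', 'green', 'blue'])

# The object returned by namedtuple is a class, and a subclass of tuple
print(isinstance(Color, type))   # True
print(issubclass(Color, tuple))  # True

# The variable name does not have to match the class name
# (Shade is a hypothetical name used only for illustration)
Shade = namedtuple('Color', ['red', 'green', 'blue'])
print(Shade.__name__)            # Color

shade = Shade(red=1, green=0.5, blue=0)
print(shade)                     # Color(red=1, green=0.5, blue=0)
```

Using different names works, but the printed form then shows the class name rather than the variable name, which is one reason to keep them the same.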
You can define the fields of a named tuple using a string instead of a list, for example:
Color = namedtuple('Color', 'red, green, blue')
This has exactly the same effect as the list in the previous example. The identifiers in the string can be separated by whitespace, or commas, or both.
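To illustrate, here is a small sketch comparing several spellings of the same field list. The names ColorA to ColorD are hypothetical, and the _fields attribute used to inspect the result is described later in this article.

```python
from collections import namedtuple

# Four ways of supplying the same three field names
# (ColorA..ColorD are hypothetical names used only for comparison)
ColorA = namedtuple('Color', ['red', 'green', 'blue'])
ColorB = namedtuple('Color', 'red, green, blue')
ColorC = namedtuple('Color', 'red green blue')
ColorD = namedtuple('Color', 'red,green,blue')

# Each class ends up with the same fields in the same order
for cls in (ColorA, ColorB, ColorC, ColorD):
    print(cls._fields)
```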
Creating and using named tuples
Having defined our named tuple, we can create instances of it by calling Color, like this:
color1 = Color(red=1, green=0.5, blue=0)
print(color1)
color2 = Color(blue=1, red=1, green=0)
print(color2)
Color(red=1, green=0.5, blue=0)
Color(red=1, green=0, blue=1)
Unlike a normal tuple, this constructor can use named arguments. We can supply the 'red', 'green' and 'blue' values in any order; we don't need to worry about what order they are stored in within the tuple itself.
We can also access the elements by name, like this:
print(color1.red) # Prints 1
Again, we can access the elements of the named tuple without needing to remember the order they are stored in.
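The employee record mentioned earlier is where this pays off most. The following is a minimal sketch. The Employee type, its field names and the sample values are all made up for illustration.

```python
from collections import namedtuple

# Employee and its field names are hypothetical, for illustration only
Employee = namedtuple(
    'Employee',
    ['first_name', 'surname', 'employee_number', 'job_title'],
)

# Fields can be supplied in any order when named
emp = Employee(surname='Lovelace', job_title='Analyst',
               first_name='Ada', employee_number=1001)

# No need to remember which position holds which value
print(emp.surname)
print(emp.job_title)
```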
There are some restrictions on the names of fields. The first restriction is that all field names must be valid Python identifiers - that is to say, they must be names that would be valid as variable names, ie:
- They must be a combination of letters a-z, letters A-Z, digits 0-9 and underscore characters.
- They must not start with a digit.
- They must not be Python keywords (if, for, def etc).
The second restriction is that the names must not start with an underscore. This is because names that start with underscores are reserved for special named tuple utility functions, see below.
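The rules above can be checked with a short experiment. In this sketch the type name Example and the list of bad names are hypothetical. Each name breaks one rule, and namedtuple rejects it with a ValueError.

```python
from collections import namedtuple

# Each of these field names breaks one of the rules above
bad_names = ['1st', 'for', '_hidden', 'two words']

errors = {}
for name in bad_names:
    try:
        # Example is a hypothetical type name used only for this test
        namedtuple('Example', [name])
    except ValueError as err:
        errors[name] = str(err)
        print(f'{name!r} rejected: {err}')
```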
Named tuples have all the features of regular tuples
Named tuples are a special type of tuple, with extra features. But you can still use them like normal tuples as well if you prefer. You can use positional or named parameters to create the tuple. The positional order is red, green, blue as per the original definition of Color.
color2 = Color(blue=1, red=1, green=0)
print(color2)
color3 = Color(0, 0, 1) # Positional, must be in order r, g, b
print(color3)
color4 = Color(0.5, blue=1, green=0.5) # Positional red, then named blue, green
print(color4)
Color(red=1, green=0, blue=1)
Color(red=0, green=0, blue=1)
Color(red=0.5, green=0.5, blue=1)
You can access elements using names or indices:
print(color2.red) # Named red field
print(color2[0]) # Indexed field 0, which is red because order is r, g, b
You can also use unpacking. Again, the order is r, g, b due to the definition of Color:
r, g, b = color4
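Because a named tuple is a genuine tuple, the other tuple operations work too. The sketch below tries a few of them and also shows that, like any tuple, a named tuple cannot be modified in place.

```python
from collections import namedtuple

Color = namedtuple('Color', ['red', 'green', 'blue'])
color = Color(red=1, green=0.5, blue=0)

print(isinstance(color, tuple))  # True
print(len(color))                # 3
print(color == (1, 0.5, 0))      # True, compares like a plain tuple

# Iteration visits the values in field order: red, green, blue
for value in color:
    print(value)

# Like any tuple, a named tuple is immutable
try:
    color.red = 0.2
except AttributeError as err:
    print('Cannot assign:', err)
```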
Named tuples come with a few additional methods that can be quite useful. The method names all start with underscores - that is why field names aren't allowed to start with underscores. Doing this prevents any of the built-in methods from clashing with field names you might want to use.
_make(iter) will create a new named tuple instance from a sequence or any other iterable. For example:
k = [1, 0, 0.5]
color5 = Color._make(k)
print(color5) # Color(red=1, green=0, blue=0.5)
Of course you could alternatively use:
k = [1, 0, 0.5]
color5 = Color(*k)
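One place where _make reads naturally is when converting many rows of raw data. The sketch below assumes a hypothetical list of rows, which might in practice come from a file or a database query.

```python
from collections import namedtuple

Color = namedtuple('Color', ['red', 'green', 'blue'])

# Hypothetical raw data, for example rows read from a file
rows = [
    [1, 0, 0],
    [0, 1, 0],
    [0.5, 0.5, 0.5],
]

# Convert every row into a Color
palette = [Color._make(row) for row in rows]
for entry in palette:
    print(entry)

# _make needs exactly one value per field
try:
    Color._make([1, 0])
except TypeError as err:
    print('Wrong length:', err)
```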
_asdict() creates a dict object with the fields and values:
color = Color(1, 0, 0.5)
d = color._asdict()
print(d)
{'red': 1, 'green': 0, 'blue': 0.5}
This is useful if you want to iterate over the keys.
Note that Python versions 3.1 to 3.7 return an OrderedDict, a special version of dict that preserves the order of the entries (so the entries will always be in the order defined for the named tuple: red, green, blue in this case). Ordinary dict objects are guaranteed to preserve insertion order from Python 3.7 onwards, so as of Python 3.8 an ordinary dict is returned instead.
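Iterating over the result of _asdict() might look like the following sketch, which prints each field name alongside its value.

```python
from collections import namedtuple

Color = namedtuple('Color', ['red', 'green', 'blue'])
color = Color(1, 0, 0.5)

# Entries come out in the order the fields were defined
for name, value in color._asdict().items():
    print(f'{name} = {value}')
```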
_fields is a tuple containing the fields of the named tuple (but not the values):
color = Color(1, 0, 0.5)
t = color._fields
print(t)
_fields is not a function, it is just a data member of the named tuple. The code above gives:
('red', 'green', 'blue')
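One use for _fields is building a new named tuple type from an existing one. In this sketch ColorAlpha and its alpha field are hypothetical additions, not part of the original example.

```python
from collections import namedtuple

Color = namedtuple('Color', ['red', 'green', 'blue'])

# ColorAlpha is a hypothetical type that reuses Color's fields and adds one
ColorAlpha = namedtuple('ColorAlpha', Color._fields + ('alpha',))

print(ColorAlpha._fields)   # ('red', 'green', 'blue', 'alpha')

translucent = ColorAlpha(1, 0, 0.5, alpha=0.8)
print(translucent)          # ColorAlpha(red=1, green=0, blue=0.5, alpha=0.8)
```

Note that _fields is read here from the class rather than from an instance. Both work.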
_replace() creates a new named tuple, of the same type as the original, but with one or more of its values changed. For example:
color = Color(1, 0, 0.5)
color2 = color._replace(red=0.3, blue=0.6)
print(color2)
This creates a new named tuple, replacing the red and blue fields with new values but leaving green unchanged. The code above prints:
Color(red=0.3, green=0, blue=0.6)
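It is worth confirming that _replace() leaves the original untouched. The sketch below checks this, and also shows what happens with a field name that does not exist. The exception type for that case differs between Python versions, so both are caught here.

```python
from collections import namedtuple

Color = namedtuple('Color', ['red', 'green', 'blue'])

color = Color(1, 0, 0.5)
darker = color._replace(red=0.3)

print(color)            # Color(red=1, green=0, blue=0.5), unchanged
print(darker)           # Color(red=0.3, green=0, blue=0.5)
print(color is darker)  # False, a separate object was created

# Unknown field names are rejected; the exception type depends on
# the Python version, so catch both possibilities
try:
    color._replace(alpha=1)
except (TypeError, ValueError) as err:
    print('Rejected:', err)
```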
Under the hood
When you define a named tuple, Python actually creates a new class. The new class is created dynamically from the supplied field names. So our Color named tuple class has data members called red, green and blue. If we were to create a named tuple to hold an (x, y) screen coordinate, it would have data members named x and y.
This makes named tuples very efficient, compared to, say, a dictionary. We could use dictionaries to represent a colour:
color = dict(red=1, green=0.5, blue=0)
print(color['blue'])
The main problem here is that every colour dictionary we define has to store its own set of keys ('red', 'green' and 'blue') alongside the values. Python reduces the cost to some extent by reusing constant strings, but a dictionary object is still larger than a tuple.
A named tuple instance occupies more or less the same amount of memory as a regular tuple.
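A rough way to see the difference is sys.getsizeof. This is only a sketch: the function reports the size of the container itself, not the values it refers to, and the exact numbers depend on the Python implementation and version.

```python
import sys
from collections import namedtuple

Color = namedtuple('Color', ['red', 'green', 'blue'])

as_tuple = (1, 0.5, 0)
as_named = Color(1, 0.5, 0)
as_dict = dict(red=1, green=0.5, blue=0)

# getsizeof measures the container only, not the objects it refers to
for label, obj in [('tuple', as_tuple),
                   ('namedtuple', as_named),
                   ('dict', as_dict)]:
    print(label, sys.getsizeof(obj))
```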
Named tuples provide the convenience of using tuples as an ad hoc way of grouping related data items to create a record, but they have the added advantage of allowing you to create and read the individual fields by name as well as index.