For the Record
One of the principles of software engineering (albeit one most prominent in OO-leaning circles) is that of “encapsulation”. Large programs are built out of smaller units, and the more each of those units keeps to itself the better. Wikipedia (this century’s Webster’s dictionary) defines “Encapsulation” as:
the packing of data and functions into a single component.
But what is a “component”? In OO land, it’s a “class”, or an instance thereof: a big ball of data with lots of methods sticking out that you can interact with to retrieve, change, or use that data.
Artist’s depiction of a typical class.
Classes are the basis of object-oriented programming. They serve as a type, a collection of behavior, and a container for state. In this programmer’s decreasingly-humble opinion, anything with so many responsibilities, classes included, should generally be regarded with suspicion – especially in a language that offers the power to create less hairy alternatives.
Classes have 4 basic roles:
- Store and identify structured data
- Provide functions that operate on or with that data
- Facilitate dispatch of these functions (i.e. “methods”)
- Enable partial reuse of functionality (via inheritance)
The idea behind most object-oriented languages is that you can instantiate a class with some data to create an “object”, then forget about the data altogether and interact only with the object from then on. In theory, the user of a given API (often the same person who wrote it) doesn’t need to know how the data is stored; they can rely on the class to provide a method that does what is needed, without knowing the details of the implementation.
The present opinion of the author is that this places an undue burden of prescience on the developer of the class. Data has a funny way of being needed at unexpected times in unexpected ways. Encapsulation breaks down at the point where a user of a class (possibly the same person as that class’s author) has to add a method, or else retrieve some data in a way the class wasn’t written to expect.
Instead of combining data and behavior, we should decouple the two as far as possible. Data should flow between functions, and workflows should represent the composition of functions. This way, our architecture is made simpler and more flexible, and encapsulation is accomplished in a layered way, with functions wrapping functions.
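As a rough illustration of that idea, here is a small sketch in which the data is a plain dict and the workflow is nothing more than functions calling functions. The order data, the function names, and the tax rate are all invented for this example.

```python
# Hypothetical example: an "order" is plain data, and each step is a plain function.

def subtotal(order):
    """Sum price * quantity over the order's line items."""
    return sum(price * qty for price, qty in order['items'])

def with_tax(amount, rate=0.25):
    """Return the amount with tax added; the 25% rate is an arbitrary assumption."""
    return amount * (1 + rate)

def receipt(order):
    """The workflow is just the composition of the smaller functions."""
    return 'Total for {}: {:.2f}'.format(order['customer'], with_tax(subtotal(order)))

order = {'customer': 'Ada', 'items': [(10.0, 2), (4.0, 1)]}
print(receipt(order))  # Total for Ada: 30.00
```

Each function can be tested or reused on its own, and none of them needs to know where the order came from.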
Ok, so what are we doing here?
Back to the above list: Job #1 stands apart from the rest. Even the staunchest functional languages recognize the need to store related data together and tag it with what sort of data it is. They tend to use what they call “algebraic data types” or “records” for this task.
A record is, in essence, a class with no methods. It is initialized with some data, and the data can be accessed, but not changed. For example, an address record might have street, city, state, country, and code fields. If you want to change any of these fields, you need to make a new address, because obviously an address in a different city is a different address.
Luckily, unlike most of the other things in this book, Python already has a record utility built in. It’s called namedtuple, and you probably aren’t using it enough.
An ode to namedtuple
Namedtuple is from the endlessly-useful collections module that ships as part of Python’s standard library. It is a nice bit of syntactic sugar for defining a class with some useful properties:
- All fields must be provided
- Fields can be passed as positional or keyword arguments
- Fields are immutable
This was surely the work of a functional programmer, because it’s just perfect for the job. Here’s how it looks:
from collections import namedtuple
# Creates an Address class
Address = namedtuple('Address', ['street', 'city', 'state', 'country', 'code'])
# Creates an Address instance ("record")
addr = Address('123 Fake St', 'Podunk', 'Ohio', 'USA', '90210')
# Creates an Address instance with keyword arguments
addr = Address(city='Podunk', street='123 Fake St', state='Ohio', code='90210', country='USA')
Namedtuple instances are actually instances of the class:
isinstance(addr, Address) # True
A TypeError is raised if you instantiate with the wrong number of arguments, to prevent mishaps:
addr = Address('123 Fake St', 'Podunk', 'Ohio', 'USA') # TypeError
And finally, an AttributeError is raised if you try to modify or add a field:
addr.street = '456 Fake St.' # AttributeError
addr.province = 'BC' # AttributeError
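Since fields can’t be reassigned, a changed address has to be a new record. A minimal sketch of how that looks, using namedtuple’s documented _replace and _asdict methods (the Address definition is repeated only so the snippet runs on its own):

```python
from collections import namedtuple

Address = namedtuple('Address', ['street', 'city', 'state', 'country', 'code'])
addr = Address('123 Fake St', 'Podunk', 'Ohio', 'USA', '90210')

# _replace returns a new record; the original is left untouched
moved = addr._replace(street='456 Fake St')
print(moved.street)  # 456 Fake St
print(addr.street)   # 123 Fake St

# Records are still tuples, so unpacking and conversion to a dict both work
street, city, state, country, code = addr
print(dict(addr._asdict())['city'])  # Podunk
```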
But where do I use it?
Start with a namedtuple any time you start to write a class containing data, or any time you find yourself writing validation code against a dict or tuple. Here are some ideas:
Quickly validate user input
HelloWorldRequestData = namedtuple('HelloWorldRequestData', ['name', 'age'])
def hello(request):
    try:
        # Assumes request.POST behaves like a plain dict of field name -> value
        data = HelloWorldRequestData(**request.POST)
        return "Hello! You are {name}, {age} years old".format(name=data.name, age=data.age)
    except TypeError:  # Too few or too many data points
        return "You won't get anything if you don't tell me your name and age"
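To try the idea without a web framework, here is a small sketch that uses plain dicts as a stand-in for the submitted form data. The parse_hello helper is a hypothetical name used only for illustration. Note that the TypeError only tells you the field names don’t match; it says nothing about whether the values are sensible.

```python
from collections import namedtuple

HelloWorldRequestData = namedtuple('HelloWorldRequestData', ['name', 'age'])

def parse_hello(form):
    """Return a record if `form` has exactly the expected keys, else None."""
    try:
        return HelloWorldRequestData(**form)
    except TypeError:  # missing or unexpected keys
        return None

print(parse_hello({'name': 'Ada', 'age': '36'}))
# HelloWorldRequestData(name='Ada', age='36')
print(parse_hello({'name': 'Ada'}))                       # None (age is missing)
print(parse_hello({'name': 'Ada', 'age': '36', 'x': 1}))  # None (unexpected key)
```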
How about an error monad?
Re-jiggering an example from the last chapter to use records turns out to be a small win:
import re
from functools import reduce  # reduce is not a builtin on Python 3
ValidatedData = namedtuple('ValidatedData', ['data', 'errors'])
def bind_validated_data(vd, fn):
    # Run the next step on the current data, then merge its errors into ours
    result = fn(vd.data)
    return ValidatedData(
        data=result.data,
        errors=dict(vd.errors, **result.errors))
def validate_name(data):
    if not data.get('name'):
        return ValidatedData(data, {'name': 'No name found'})
    return ValidatedData(data, {})  # empty dict (not None) so errors can be merged
def clean_phone(data):
    phone = data.get('phone')
    if phone:
        data['phone'] = re.sub(r'[^0-9]', '', phone)  # keep digits only
        return ValidatedData(data, {})
    return ValidatedData(data, {'phone': 'Please provide a phone number'})
def validate(data):
    return reduce(  # Take note! This is a handy way to thread data through functions.
        bind_validated_data,
        [validate_name, clean_phone],
        ValidatedData(data, {}))  # start from the raw data with no errors
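The reduce call deserves a closer look. As a stripped-down sketch of the same threading trick, with the error handling removed, the following feeds a value through a list of functions one at a time. The apply_step name and the slug example are made up for illustration.

```python
from functools import reduce

def apply_step(value, fn):
    """Binding function: hand the running value to the next step."""
    return fn(value)

# Each step takes a string and returns a new string
steps = [str.strip, str.lower, lambda s: s.replace(' ', '-')]

slug = reduce(apply_step, steps, '  Hello World ')
print(slug)  # hello-world

# reduce unrolls to nested calls of the binding function, one per step
unrolled = apply_step(apply_step(apply_step('  Hello World ', steps[0]), steps[1]), steps[2])
print(unrolled == slug)  # True
```

Swapping apply_step for a binding function that also merges errors gives the validation pipeline described above.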
Enforce well-behaved classes
You can subclass the classes created by namedtuple, and inherit all the nice properties of immutability and a do-nothing constructor.
Here’s a linked list as a namedtuple subclass (note: don’t ever actually use python-based linked lists such as this if you value your stack and/or memory):
class List(namedtuple('Cell', ['first', 'rest'])):
    __slots__ = ()  # no per-instance __dict__, so new attributes still can't be added
    def cons(self, item):
        return List(item, self)
    def seq(self):
        yield self.first
        if self.rest is not None:
            for item in self.rest.seq():
                yield item
l1 = List('a', None)
l2 = l1.cons('b').cons('c')
[x for x in l2.seq()] # ['c', 'b', 'a']
repr(l2) # On Python 3: List(first='c', rest=List(first='b', rest=List(first='a', rest=None)))
Of course, I could have just written:
List = namedtuple('Cell', ['first', 'rest'])
def cons(cell, item):
    return List(item, cell)
def seq(cell):
    yield cell.first
    if cell.rest is not None:
        for item in seq(cell.rest):
            yield item
l1 = List('a', None)
l2 = cons(cons(l1, 'b'), 'c')
# or: l2 = reduce(cons, ['b', 'c'], l1)
[x for x in seq(l2)] # ['c', 'b', 'a']
Which of these is better is mostly a matter of taste – I think anyone would agree that they accomplish the same thing in nearly the same way. This is one way that immutable records co-operate with functional programming styles: methods receive no special treatment.