A different state of mind
In software development, and especially in circles frequented by functional programming wonks, “state” is a dirty word, spoken in hushed tones, always accompanied by a grimace. But what is this “state”?
State is a combination of a value and time. Some would call it a conflation of the two. Consider the following:
class MyStrawman(object):
    def __init__(self):
        self.contents = []
    def add_a_thing(self, thing):
        self.contents.append(thing)
strawman = MyStrawman()
send_to_oz(strawman)  # defined somewhere else; we can't see what it does
print(strawman.contents)
Can you tell me the contents of strawman now? If you guessed ['brain'] then you probably didn’t get the point of the story; the point is that we can’t know what strawman contains without knowing exactly what send_to_oz is doing (and it’s actually ['diploma']).
This probably isn’t enough to convince you that state is something you should wield with care, so instead, think back to the last time you were hunting a bug in a large codebase, one caused by some global or static variable holding the wrong value, and you just couldn’t find where that value was set. Common wisdom would tell you to rewrite the code to avoid global variables, but what if you went further? What if you could get right at the root of the problem, and eliminate an entire category of bugs from your software? This is the other half of functional programming.
“Pure” functional paradigms attempt to eschew state entirely, but that requires some baked-in language support to be sane. The best thing you can reasonably do in Python is to follow this one weird rule:
Write more pure functions.
A “pure” function is one that is “referentially transparent”: any call to it can be replaced by its result without changing what the program does. In practice that means it is entirely deterministic and has no side effects. A pure function will always return the same value for a given input, no matter where or when or from what context it is called.
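To make the distinction concrete, here is a minimal sketch of an impure function next to a pure one. The names (counter, impure_next_id, pure_next_id) are made up for illustration and don't come from any library.

```python
# Hypothetical names, for illustration only.
counter = 0

def impure_next_id():
    """Reads and modifies module-level state, so repeated calls differ."""
    global counter
    counter += 1
    return counter

def pure_next_id(current):
    """Depends only on its argument and changes nothing outside itself."""
    return current + 1

first = impure_next_id()
second = impure_next_id()    # same (empty) input, different result
same_a = pure_next_id(41)
same_b = pure_next_id(41)    # same input, same result, every time
```

The impure version can only be understood by knowing every other piece of code that touches counter; the pure version can be understood by reading its two lines.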
Not all your functions can be pure. A function that reads a file or writes to a database or accepts user input or returns a random number can’t be pure, and you can’t very well write useful software if it doesn’t interact with the outside world at some point. But for most of the work that your software does, writing pure functions is probably pretty straightforward – at least when working with simple data structures.
One thing that might stand in our way is that Python, for various sensible reasons, implements most of its built-in collection types as mutable objects. For example, lists, dicts, and sets are all mutable; if you append to a list, or update a dict, you change the object in place and lose access to its original contents.
Instead of thinking in terms of “updating” objects, to write pure functions that work with them, we’ll need to think in terms of creating new objects. Doing this with tuples is mandatory; tuples in Python are always immutable. For lists and sets, you can simply create new lists or sets using the list or set constructors, or the overloaded arithmetic operators.
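As a rough sketch of what that looks like in practice, each line below builds a new object and leaves the original alone. The variable names are only for illustration.

```python
original_list = [1, 2, 3]
original_set = {1, 2, 3}
original_tuple = (1, 2, 3)

# list() builds a new list from an existing one.
copied_list = list(original_list)

# + on lists and tuples returns a new sequence.
longer_list = original_list + [4]
longer_tuple = original_tuple + (4,)

# | on sets returns a new set holding the union.
bigger_set = original_set | {4}

# - on sets returns a new set without the given members.
smaller_set = original_set - {1}
```

One thing to watch for: the augmented form `+=` on a list extends the existing list in place rather than creating a new one, so `longer_list = original_list + [4]` and `original_list += [4]` are not interchangeable here.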
The dict constructor takes extra arguments, so you can use it to add items to a dict without changing the original:
my_mutable_dict = {'a': 1, 'b': 2}
my_constant_dict = {'a': 1, 'b': 2}
my_mutable_dict.update({'c': 3}) # Changes existing dict
new_dict = dict(my_constant_dict, **{'c': 3}) # Returns new dict
another_new_dict = dict(my_constant_dict, c=3)
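One caveat worth seeing in code: dict() makes a shallow copy, so only the top level is new. The sketch below uses a made-up config dict to show that nested mutable values are still shared, and that copy.deepcopy from the standard library is one way around it when you need full independence.

```python
import copy

config = {'name': 'oz', 'tags': ['green']}

# dict() copies only the top level: the new dict gets its own keys...
shallow = dict(config, name='kansas')

# ...but the nested list is the very same object in both dicts.
shares_nested_list = shallow['tags'] is config['tags']

# copy.deepcopy duplicates nested objects too, at a higher cost.
deep = copy.deepcopy(config)
deep['tags'].append('emerald')
```

After this runs, config is unchanged, but appending to shallow['tags'] would have changed config['tags'] as well.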
Failing that, write functions that behave purely
It’s important to consider performance in this approach. It’s fine for small collections, like configuration objects, but copying large amounts of data each time you want to append an item to a list is probably a bad idea.
You can get around this by simply writing functions that pretend to be pure; if you only perform mutating actions on variables created within the scope of the function, it will look pure from the outside. For example, to append a bunch of data to a list passed as an argument, you can copy the list once and then use append thereafter, returning the copy:
def add_lines_to_list(incoming_list, some_file):
    outgoing_list = list(incoming_list)
    for line in some_file:
        outgoing_list.append(line)
    return outgoing_list
By creating a copy, whoever is using your function won’t have to worry about having their original list changed. If you’re working with large enough amounts of data to make this approach problematic, you probably know it.
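Here is a small usage sketch for add_lines_to_list, repeated so the block runs on its own. It assumes an in-memory io.StringIO object as a stand-in for a real file; the sample lines are invented.

```python
import io

def add_lines_to_list(incoming_list, some_file):
    outgoing_list = list(incoming_list)   # copy once, up front
    for line in some_file:
        outgoing_list.append(line)        # mutate only the local copy
    return outgoing_list

original = ['header\n']
fake_file = io.StringIO(u'first\nsecond\n')   # stand-in for an open file

result = add_lines_to_list(original, fake_file)
```

The caller's list still holds just 'header\n', while result holds all three lines. Note that the function does consume the file object as it reads, so it only looks pure with respect to the list; calling it a second time with the same exhausted file object would add nothing.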
Next article: Lose the Loops