I’m on Python 3.7, working on a project where my colleague is on 3.6. This has quickly revealed an issue with dictionaries in 3.7 that causes serious bugs if you’re not aware of the problem and careful about handling it.
Dictionaries are the Python class for storing key/value pairs, and one of the most useful and fundamental data types in the language.
Since the beginning of recorded programming history, dictionaries have been unordered, meaning they store their data in an effectively random order:
zoo = {'birds': 10, 'lions': 1, 'zebras': 5, }
print(zoo)
To see what order a dictionary stored things in before 3.7, you can use the hash() function, which returns the number Python uses to decide where the object is stored (for strings, this number can differ from one run to the next):
print(f'zebras: {hash("zebras")}')
print(f'lions: {hash("lions")}')
print(f'birds: {hash("birds")}')
This is unintuitive, but by storing data this way, dictionaries are extremely fast to search. A list, by contrast, is very slow to search, because Python starts at the first item and goes through the list until it finds the desired item; for a large list this can take a long time. A dictionary, on the other hand, calls the hash() function for the key, gets that number, and effectively jumps right to the spot where it's stored.
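To make the difference concrete, here is a rough sketch that looks up the same item in a list and in a dictionary. The data and names (`names`, `as_list`, `as_dict`) are made up for illustration, and the exact timings will vary by machine; the point is only that the list has to scan while the dictionary hashes the key.

```python
import timeit

# Hypothetical data: 100,000 made-up animal names, stored both ways.
names = [f"animal{i}" for i in range(100_000)]
as_list = names
as_dict = {name: i for i, name in enumerate(names)}

# Worst case for the list: the item we want is the very last one.
target = names[-1]

# Both containers give the same answer to "is it in there?"...
found_in_list = target in as_list
found_in_dict = target in as_dict

# ...but the list checks items one by one, while the dict hashes the
# key and goes more or less straight to its slot.
list_time = timeit.timeit(lambda: target in as_list, number=20)
dict_time = timeit.timeit(lambda: target in as_dict, number=20)
print(f"list: {list_time:.5f}s  dict: {dict_time:.5f}s")
```

On most machines the dictionary lookup should come out faster by several orders of magnitude, though the numbers themselves are not meaningful beyond that.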
When teaching dictionaries to new programmers (and sometimes even to experienced programmers) I've had to spend a lot of time explaining why and how they work the way they do, and imploring people to remember to never rely on the order of items in a dictionary.
In Python 3.7, dictionaries now remember the order in which their items were inserted. This is quite a nice feature, and I find myself now using dictionaries even more than I did before, because they act like an ordered list, yet are still extremely fast to search. It also makes them more intuitive and removes one potential source of bugs, where people would write code assuming the dictionary was ordered when it wasn’t.
For the last week I've been running into exactly this kind of bug. I'm working on a script that will be used for a project kickoff, and which relies heavily on dictionaries being ordered to make the output usable.
Everything works great for me, but when my colleague on 3.6 runs the script, everything is in scrambled order, rendering the output almost useless. This is a very easy bug to run into unless you are also testing your script on a pre-3.7 version of Python.
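One way to catch this early, offered here as a sketch and not as something my script actually does, is to check the interpreter version at startup and warn when dictionary order isn't guaranteed. The helper name `dicts_are_ordered` is made up for this example.

```python
import sys
import warnings


def dicts_are_ordered(version_info=sys.version_info):
    """Return True if this Python guarantees dict insertion order (3.7+)."""
    return tuple(version_info[:2]) >= (3, 7)


if not dicts_are_ordered():
    # Hypothetical message; adjust to suit your own script.
    warnings.warn(
        "This script relies on ordered dictionaries; "
        "output may be scrambled on Python older than 3.7."
    )
```

This doesn't fix anything by itself, but it turns a silent ordering problem into a visible warning for whoever runs the script on an older version.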
Based on this experience, I've decided that for the foreseeable future I'm not going to use the built-in dictionary. Instead, I'll use the OrderedDict class from the collections module, which acts just like a dictionary, but remembers the order no matter what version you are running (even Python 2.7 has this class).
You can't use the dictionary literal ({}) for an OrderedDict; you have to use the class name, after which everything else is the same:
from collections import OrderedDict
mydict = OrderedDict()
mydict['birds'] = 5
mydict['lions'] = 1
mydict['zebras'] = 5
print(mydict)
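If you'd rather create the whole thing in one go, a sketch like the following should work on any version: OrderedDict accepts a list of (key, value) tuples, and a list always keeps its order.

```python
from collections import OrderedDict

# A list of (key, value) tuples keeps its order on every Python version,
# so the OrderedDict is filled in exactly this sequence.
zoo = OrderedDict([('birds', 10), ('lions', 1), ('zebras', 5)])
print(list(zoo.keys()))

# Caution: OrderedDict({'birds': 10, ...}) or OrderedDict(birds=10, ...)
# goes through a plain dictionary first, so on older versions the order
# may already be lost before OrderedDict ever sees it.
```

The tuple list is a little noisier than a literal, but it avoids handing the data to a plain dictionary at any point.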
If you take this approach you still need to be careful: I've found myself unthinkingly using a literal to create a dictionary and having the problem pop up yet again.
Even though I like having my dictionaries be ordered, this kind of change is disruptive enough that it seems like something that should go into a .0 release rather than a dot release, though I suppose that wouldn't really change the impact. In any case, now that this change has happened, be cautious with your use of dictionaries!