Memoization in Python: easier than what it should be
I was writing some DB-access-intensive Python application and felt the need to cache function results. After fiddling with some dictionaries, I felt there was some underlying pattern I wasn’t spotting. Then it hit me. I was just memoizing function calls.
So I came up with this decorator.
import functools
import cPickle
def memoize(fctn):
memory = {}
@functools.wraps(fctn)
def memo(*args,**kwargs):
haxh = cPickle.dumps((args, sorted(kwargs.iteritems())))
if haxh not in memory:
memory[haxh] = fctn(*args,**kwargs)
return memory[haxh]
if memo.__doc__:
memo.__doc__ = "\n".join([memo.__doc__,"This function is memoized."])
return memo
Python is really easy to write.
23 August 2008 – Update: Corrected the bug pointed out by John.
25 August 2008 – Update: Improved as per reddit comments’s suggestions.

August 22nd, 2008 at 5:57 pm
Very nice. This is almost a mirror image of the memoize function I wrote a few days ago, except I think yours is a bit more robust (mine has no kwargs support, for example).
Where mine shines is with recursive functions. With my first effort, if I tried to memoize the fibonacci function, it would calculate fib(n) the inefficient way and take a million years, then memoize the topmost result and return it. Unfortunately, memoizing a recursive function in this way is impossible, since the function body contains a reference to the non-memoized function.
My solution (which is not exactly great) required a change to the type of input that memoize takes. This is hard to explain and would certainly have been beyond my understanding a week ago so I’ll let my mediocre code speak for me: http://pcj600.googlepages.com/memoize.py
As you can see, the function fibonacci_constructor MAKES the fibonacci function USING a function passed as a parameter for its recursive calls. This way, memoize can step in and tell the function to do recursion using a different function – namely, the memoized version.
It’s a bit ugly and fails to opaque its functionality since it demands a second order function that you’d never write normally. What would be really nice is if we could do a find + replace on the function body, replacing all references to itself with references to the memoized version, so that there’d be no need to pass in the recursive function as a parameter. I bet this is possible with Lisp macros, but I’m still a beginner with Lisp …
August 22nd, 2008 at 6:13 pm
There is a bug on the 4th line:
>>> print list.sort.__doc__
L.sort(cmp=None, key=None, reverse=False) — stable sort *IN PLACE*;
cmp(x, y) -> -1, 0, 1
list.sort sorts in place, so the second item in your haxh tuple will always be None. Fix it by changing that line to:
haxh = tuple([args,sorted(kwargs.iteritems())])
sorted will return an iterator (which when put in to the tuple() is changed in to a list)
>>> args = (1,2,3,)
>>> kwargs={’a':5,’b':6,’aa’:9}
>>> tuple([args,sorted(kwargs.iteritems())])
((1, 2, 3), [('a', 5), ('aa', 9), ('b', 6)])
August 22nd, 2008 at 6:52 pm
I went ahead and ran this out of curiosity as soon as I got my hands on a Python interpreter and it does indeed work recursively. The subtlety is that using memoize as a decorator redefines what the function refers to, so the recursive calls are indeed to the memoized version. Disregard my previous comment; your version is ten times more elegant than mine!
August 22nd, 2008 at 7:36 pm
I am new to python and in general very interested in how the concept of memoization applies to parsers and other areas. Could you provide a working example of the function above for context? Thanks in advance!
August 22nd, 2008 at 8:03 pm
Isn’t list(kwargs.iteritems()).sort() going to have a value of None ?
August 22nd, 2008 at 9:44 pm
list(kwargs.iteritems()).sort()
does not return a list. Instead .sort() sorts the list in place and returns None.
August 23rd, 2008 at 1:49 am
John (et al.): Bug fixed! Thank you!
Jonas Gorauskas: I came up with this memoization code to cache some DB calls, as mentioned in the post. I had to run through 40k+ rows in Excel and do some auxiliary queries to construct insert statements. Those auxiliary queries would always return the same result for the same arguments. Memoization just popped into my mind.
August 25th, 2008 at 12:05 am
I’ve updated my post to reflect many of the improvements presented on reddit comments.
Thank you, all! =)
August 25th, 2008 at 1:16 am
Illustrative example:
@memoize def prt(*args,**kwargs): print "Calculated!" return (args,kwargs) a = [1,2,3] print prt(a,one=10, c='cee', d2='two') a.append(1) print prt(a, one=10, c='cee', d2='two') print prt(a, one=10, d2='two', c='cee') a.pop() print prt(a, c='cee',one=10, d2='two')