Chances are that you never touched and maybe haven't even heard about Python's weakref
module. While it might not be commonly used in your code, it's fundamental to inner workings of many libraries, frameworks and even Python itself. So, in this article we will explore what it is, how is it helpful, and how you could incorporate it into your code as well.
The Basics
To understand weakref
module and weak references, we first need a little intro to garbage collection in Python.
Python uses reference counting as a mechanism for garbage collection - in simple terms - Python keeps a reference count for each object we create and the reference count is incremented whenever the object is referenced in code; and it's decremented when an object is de-referenced (e.g. variable set to None
). If the reference count ever drop to zero, the memory for the object is deallocated (garbage-collected).
Let's look at some code to understand it a little more:
import sys
class SomeObject:
def __del__(self):
print(f"(Deleting {self=})")
obj = SomeObject()
print(sys.getrefcount(obj)) # 2
obj2 = obj
print(sys.getrefcount(obj)) # 3
obj = None
obj2 = None
# (Deleting self=<__main__.SomeObject object at 0x7d303fee7e80>)
Here we define a class that only implements a __del__
method, which is called when object is garbage-collected (GC'ed) - we do this so that we can see when the garbage collection happens.
After creating an instance of this class, we use sys.getrefcount
to get current number of references to this object. We would expect to get 1
here, but the count returned by getrefcount
is generally one higher than you might expect, that's because when we call getrefcount
, the reference is copied by value into the function's argument, temporarily bumping up the object's reference count.
Next, if we declare obj2 = obj
and call getrefcount
again, we get 3
because it's now referenced by both obj
and obj2
. Conversely, if we assign None
to these variables, the reference count will decrease to zero, and eventually we will get the message from __del__
method telling us that the object got garbage-collected.
Well, and how do weak references fit into this? If only remaining references to an object are weak references, then Python interpreter is free to garbage-collect this object. In other words - a weak reference to an object is not enough to keep the object alive:
import weakref
obj = SomeObject()
reference = weakref.ref(obj)
print(reference) # <weakref at 0x734b0a514590; to 'SomeObject' at 0x734b0a4e7700>
print(reference()) # <__main__.SomeObject object at 0x707038c0b700>
print(obj.__weakref__) # <weakref at 0x734b0a514590; to 'SomeObject' at 0x734b0a4e7700>
print(sys.getrefcount(obj)) # 2
obj = None
# (Deleting self=<__main__.SomeObject object at 0x70744d42b700>)
print(reference) # <weakref at 0x7988e2d70590; dead>
print(reference()) # None
Here we again declare a variable obj
of our class, but this time instead of creating second strong reference to this object, we create weak reference in reference
variable.
If we then check the reference count, we can see that it did not increase, and if we set the obj
variable to None
, we can see that it immediately gets garbage-collected even though the weak reference still exist.
Finally, if try to access the weak reference to the already garbage-collected object, we get a "dead" reference and None
respectively.
Also notice that when we used the weak reference to access the object, we had to call it as a function (reference()
) to retrieve to object. Therefore, it is often more convenient to use a proxy instead, especially if you need to access object attributes:
obj = SomeObject()
reference = weakref.proxy(obj)
print(reference) # <__main__.SomeObject object at 0x78a420e6b700>
obj.attr = 1
print(reference.attr) # 1
When To Use It
Now that we know how weak references work, let's look at some examples of how they could be useful.
A common use-case for weak references is tree-like data structures:
class Node:
def __init__(self, value):
self.value = value
self._parent = None
self.children = []
def __repr__(self):
return "Node({!r:})".format(self.value)
@property
def parent(self):
return self._parent if self._parent is None else self._parent()
@parent.setter
def parent(self, node):
self._parent = weakref.ref(node)
def add_child(self, child):
self.children.append(child)
child.parent = self
root = Node("parent")
n = Node("child")
root.add_child(n)
print(n.parent) # Node('parent')
del root
print(n.parent) # None
Here we implement a tree using a Node
class where child nodes have weak reference to their parent. In this relation, the child Node
can live without parent Node
, which allows parent to be silently removed/garbage-collected.
Alternatively, we can flip this around:
class Node:
def __init__(self, value):
self.value = value
self._children = weakref.WeakValueDictionary()
@property
def children(self):
return list(self._children.items())
def add_child(self, key, child):
self._children[key] = child
root = Node("parent")
n1 = Node("child one")
n2 = Node("child two")
root.add_child("n1", n1)
root.add_child("n2", n2)
print(root.children) # [('n1', Node('child one')), ('n2', Node('child two'))]
del n1
print(root.children) # [('n2', Node('child two'))]
Here instead, the parent keeps dictionary of weak references to its children. This uses WeakValueDictionary
- whenever an element (weak reference) referenced from the dictionary gets dereferenced elsewhere in the program, it automatically gets removed from the dictionary too, so we don't have manage lifecycle of dictionary items.
Another use of weakref
is in Observer design pattern:
class Observable:
def __init__(self):
self._observers = weakref.WeakSet()
def register_observer(self, obs):
self._observers.add(obs)
def notify_observers(self, *args, **kwargs):
for obs in self._observers:
obs.notify(self, *args, **kwargs)
class Observer:
def __init__(self, observable):
observable.register_observer(self)
def notify(self, observable, *args, **kwargs):
print("Got", args, kwargs, "From", observable)
subject = Observable()
observer = Observer(subject)
subject.notify_observers("test", kw="python")
# Got ('test',) {'kw': 'python'} From <__main__.Observable object at 0x757957b892d0>
The Observable
class keeps weak references to its observers, because it doesn't care if they get removed. As with previous examples, this avoids having to manage the lifecycle of dependant objects. As you probably noticed, in this example we used WeakSet
which is another class from weakref
module, it behaves just like the WeakValueDictionary
but is implemented using Set
.
Final example for this section is borrowed from weakref
docs:
import tempfile, shutil
from pathlib import Path
class TempDir:
def __init__(self):
self.name = tempfile.mkdtemp()
self._finalizer = weakref.finalize(self, shutil.rmtree, self.name)
def __repr__(self):
return "TempDir({!r:})".format(self.name)
def remove(self):
self._finalizer()
@property
def removed(self):
return not self._finalizer.alive
tmp = TempDir()
print(tmp) # TempDir('/tmp/tmp8o0aecl3')
print(tmp.removed) # False
print(Path(tmp.name).is_dir()) # True
This showcases one more feature of weakref
module, which is weakref.finalize
. As the name suggest it allows executing a finalizer function/callback when the dependant object is garbage-collected. In this case we implement a TempDir
class which can be used to create a temporary directory - in ideal case we would always remember to clean up the TempDir
when we don't need it anymore, but if we forget, we have the finalizer that will automatically run rmtree
on the directory when the TempDir
object is GC'ed, which includes when program exits completely.
Real-World Examples
The previous section has shown couple practical usages for weakref
, but let's also take a look at real-world examples - one of them being creating a cached instance:
import logging
a = logging.getLogger("first")
b = logging.getLogger("second")
print(a is b) # False
c = logging.getLogger("first")
print(a is c) # True
The above is basic usage of Python's builtin logging
module - we can see that it allows to only associate a single logger instance with a given name - meaning that when we retrieve same logger multiple times, it always returns the same cached logger instance.
If we wanted to implement this, it could look something like this:
class Logger:
def __init__(self, name):
self.name = name
_logger_cache = weakref.WeakValueDictionary()
def get_logger(name):
if name not in _logger_cache:
l = Logger(name)
_logger_cache[name] = l
else:
l = _logger_cache[name]
return l
a = get_logger("first")
b = get_logger("second")
print(a is b) # False
c = get_logger("first")
print(a is c) # True
And finally, Python itself uses weak references, e.g. in implementation of OrderedDict
:
from _weakref import proxy as _proxy
class OrderedDict(dict):
def __new__(cls, /, *args, **kwds):
self = dict.__new__(cls)
self.__hardroot = _Link()
self.__root = root = _proxy(self.__hardroot)
root.prev = root.next = root
self.__map = {}
return self
The above is snippet from CPython's collections
module. Here, the weakref.proxy
is used to prevent circular references (see the doc-strings for more details).
Conclusion
weakref
is fairly obscure, but at times very useful tool that you should keep in your toolbox. It can be very helpful when implementing caches or data structures that have reference loops in them, such as doubly linked lists.
With that said, one should be aware on weakref
support - everything said here and in the docs is CPython specific and different Python implementations will have different weakref
behavior. Also, many of the builtin types don't support weak references, such as list
, tuple
or int
.