Every Python developer is familiar with the self argument, which is present in every* method declaration of every class. We all know how to use it, but do you really know what it is, why it's there and how it works under the hood?
What We Already Know
Let's start with what we already know: self - the first argument in methods - refers to the class instance:
class MyClass:
                 ┌──────────────────────┐
                 ▼                      │
    def do_stuff(self, some_arg):       │
        print(some_arg) ▲               │
                        │               │
                        │               │
instance = MyClass()    │               │
instance.do_stuff("whatever")           │
   │                                    │
   └────────────────────────────────────┘
Also, this argument doesn't actually have to be called self - it's just a convention. You could, for example, use this, as is common in other languages (but don't).
The above code probably feels natural and obvious, since you've been using it forever. But we've given .do_stuff() only one argument (some_arg), yet the method declares two (self and some_arg), which doesn't seem to make sense. The arrows in the snippet show that self got translated into the instance, but how did it really get there?
instance = MyClass()
MyClass.do_stuff(instance, "whatever")
What Python does internally is convert instance.do_stuff("whatever") to MyClass.do_stuff(instance, "whatever"). We could end it here and just call it "Python magic", but if we want to actually understand what's going on under the hood, we need to understand what Python methods are and how they relate to functions.
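To see that the two call forms really are interchangeable, here is a minimal sketch. It uses a variant of MyClass whose do_stuff returns its arguments instead of printing them (an assumption made purely so the results can be compared):

```python
class MyClass:
    def do_stuff(self, some_arg):
        # Return both arguments so we can inspect what the method received
        return self, some_arg


instance = MyClass()

via_instance = instance.do_stuff("whatever")
via_class = MyClass.do_stuff(instance, "whatever")

# Both calls hand the very same instance to the function as "self"
print(via_instance == via_class)    # True
print(via_instance[0] is instance)  # True
```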
Class Attributes/Methods
In Python, a class doesn't store any special "method" objects - in reality, methods are just regular functions. The difference between a function and a method is that a method is defined in the namespace of a class, making it an attribute of that class.
These attributes are stored in the class dictionary __dict__, which we can access directly or using the vars builtin function:
MyClass.__dict__["do_stuff"]
# <function MyClass.do_stuff at 0x7f132b73d550>
vars(MyClass)["do_stuff"]
# <function MyClass.do_stuff at 0x7f132b73d550>
The most common way to access them, though, is as an attribute of the class:
print(MyClass.do_stuff)
# <function MyClass.do_stuff at 0x7f132b73d550>
Here we accessed the function using a class attribute, which, as expected, prints that do_stuff is a function of MyClass. We can, however, also access it using the instance attribute:
print(instance.do_stuff)
# <bound method MyClass.do_stuff of <__main__.MyClass object at 0x7ff80c78de50>>
In this case, though, we get back a "bound method" rather than the raw function. What Python does for us here is bind the class attribute to the instance, creating what's called a "bound method". This "bound method" is a wrapper around the underlying function that has the instance already inserted as the first argument (self).
Therefore, methods are plain functions that have the class instance (self) prepended to their other arguments.
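The wrapper can be inspected directly. The following sketch assumes a variant of MyClass whose do_stuff returns its argument (so there is something to compare) and uses the standard __self__ and __func__ attributes of bound methods:

```python
class MyClass:
    def do_stuff(self, some_arg):
        return some_arg


instance = MyClass()
bound = instance.do_stuff

# The bound method remembers the instance and the underlying function
print(bound.__self__ is instance)          # True
print(bound.__func__ is MyClass.do_stuff)  # True

# Calling the bound method is the same as calling the function
# with the instance prepended to the arguments
direct = bound("whatever")
manual = bound.__func__(bound.__self__, "whatever")
print(direct == manual)  # True
```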
To understand how that happens, we need to take a look at the descriptor protocol.
Descriptor Protocol
Descriptors are the mechanism behind methods (among other things). They're objects whose class defines a __get__(), __set__(), or __delete__() method. For the purpose of understanding how self works, we will only consider __get__(), which has the signature:
descr.__get__(self, instance, type=None) -> value
But what does the __get__() method actually do? It allows us to customize attribute lookup in classes, or in other words, to customize what happens when a class attribute is accessed using dot notation. This is very useful considering that methods are really just attributes of a class. It means that we can use the __get__ method to create a "bound method" of a class.
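As a small illustration of that customization, here is a sketch of a descriptor that only reports how it was accessed. The Loud and Demo names are hypothetical and exist just for this example:

```python
class Loud:
    """Hypothetical descriptor that reports how it was accessed."""

    def __get__(self, instance, objtype=None):
        if instance is None:
            # Looked up on the class itself, e.g. Demo.attr
            return f"accessed on class {objtype.__name__}"
        # Looked up on an instance, e.g. Demo().attr
        return f"accessed on an instance of {objtype.__name__}"


class Demo:
    attr = Loud()


print(Demo.attr)    # accessed on class Demo
print(Demo().attr)  # accessed on an instance of Demo
```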
To make it a little easier to understand, let's demonstrate this by implementing a "method" using a descriptor. First we create a pure-Python implementation of a function object:
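import types


class Function:
    def __get__(self, instance, objtype=None):
        if instance is None:
            # Accessed on the class (e.g. MyClass.do_stuff): return the descriptor itself
            return self
        # Accessed on an instance: bind this callable to that instance
        return types.MethodType(self, instance)

    def __call__(self, *args, **kwargs):
        # A bound method passes the instance as the first positional argument
        return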
The above Function class implements __get__, which makes it a descriptor. This dunder method receives the class instance in the instance argument. If this argument is None, we know that __get__ was invoked through the class itself (e.g. MyClass.do_stuff), so we just return self. If, however, it was invoked through an instance, such as instance.do_stuff, then we return types.MethodType, which is a way of creating a "bound method" manually.
Additionally, we also provide the __call__ dunder method. While __init__ is invoked when a class is called to initialize an instance (e.g. instance = MyClass()), __call__ is invoked when the instance itself is called (e.g. instance()). We need this because self in types.MethodType(self, instance) must be callable.
Now that we have our own function implementation, we can use it to bind a method to a class:
class MyClass:
    do_stuff = Function()

print(MyClass.__dict__["do_stuff"])  # __get__ not invoked
# <__main__.Function object at 0x7f229b046e50>
print(MyClass.do_stuff)  # __get__ invoked, but "instance" is None, "self" is returned
print(MyClass.do_stuff.__get__(None, MyClass))
# <__main__.Function object at 0x7f229b046e50>
instance = MyClass()
print(instance.do_stuff)  # __get__ invoked and "instance" is not None, "MethodType" is returned
print(instance.do_stuff.__get__(instance, MyClass))
# <bound method ? of <__main__.MyClass object at 0x7fd526a33d30>>
By giving MyClass an attribute do_stuff of type Function, we roughly simulate what Python does when you define a method in a class's namespace.
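Real functions behave the same way, because plain Python functions are themselves descriptors. The sketch below assumes a regular MyClass with an ordinary do_stuff method that returns its argument, and calls __get__ on the raw function by hand:

```python
import types


class MyClass:
    def do_stuff(self, some_arg):
        return some_arg


instance = MyClass()
raw_function = MyClass.__dict__["do_stuff"]

# Plain functions are descriptors: they define __get__
print(hasattr(raw_function, "__get__"))  # True

# Calling __get__ by hand produces the same kind of object as instance.do_stuff
bound = raw_function.__get__(instance, MyClass)
print(isinstance(bound, types.MethodType))  # True
print(bound == instance.do_stuff)           # True
print(bound("whatever"))                    # whatever
```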
To summarise, upon attribute access such as instance.do_stuff, do_stuff is first looked up in the attribute dictionary (__dict__) of instance and, since it isn't found there, in the __dict__ of its class, type(instance). If the object found on the class defines a __get__ method, then that __get__ is invoked, ultimately calling:
# For class invocation:
print(MyClass.__dict__['do_stuff'].__get__(None, MyClass))
# <__main__.Function object at 0x7f229b046e50>
# For instance invocation:
print(MyClass.__dict__['do_stuff'].__get__(instance, MyClass))
# Alternatively:
print(type(instance).__dict__['do_stuff'].__get__(instance, type(instance)))
# <bound method ? of <__main__.MyClass object at 0x7fd526a33d30>>
Which - as we now know - will return a bound method: a callable wrapper around the original function, which has self prepended to its arguments!
If you want to explore this further, you can similarly implement static and class methods. Examples of how to do that can be found in the Descriptor HowTo Guide in the Python docs.
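As a rough idea of what that looks like, here is a simplified sketch of both. The StaticMethod and ClassMethod classes are stripped-down stand-ins written for illustration, not the real builtins, and Demo is a hypothetical class that uses them:

```python
import types


class StaticMethod:
    """Simplified stand-in for staticmethod: never binds anything."""

    def __init__(self, f):
        self.f = f

    def __get__(self, instance, objtype=None):
        # Always hand back the raw function, so no "self" is prepended
        return self.f


class ClassMethod:
    """Simplified stand-in for classmethod: binds the class, not the instance."""

    def __init__(self, f):
        self.f = f

    def __get__(self, instance, objtype=None):
        if objtype is None:
            objtype = type(instance)
        # Bind the function to the class, so it arrives as "cls"
        return types.MethodType(self.f, objtype)


class Demo:
    @StaticMethod
    def add(a, b):
        return a + b

    @ClassMethod
    def name(cls):
        return cls.__name__


print(Demo.add(1, 2))    # 3
print(Demo().add(1, 2))  # 3
print(Demo.name())       # Demo
print(Demo().name())     # Demo
```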
Why It's There, Though?
We now know how it works, but a more philosophical question still stands: "Why does it have to appear in method definitions?"
The explicit self method argument is a controversial design choice, but it's a choice in favour of simplicity.
Python's self is an embodiment of the "worse is better" design philosophy, a term coined by Richard P. Gabriel. The priority of this design philosophy is "simplicity", defined as:
The design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface...
That's exactly the case with self: a simple implementation at the expense of the interface, where the method signature doesn't match its invocation.
There are more reasons why we have explicit self, or rather why it has to stay. Some of them are described in a blog post by Guido van Rossum, written in response to a proposal calling for its removal.
Closing Thoughts
Python abstracts away a lot of complexity, but digging into low-level details and intricacies can be - in my opinion - very valuable for gaining a deeper understanding of how the language works. That understanding can come in handy when things break and high-level troubleshooting/debugging isn't enough.
Additionally, understanding descriptors can actually be quite practical, as there are some use cases for them. While most of the time you will only really need the @property descriptor, there are situations where custom ones make sense, such as the ones in SQLAlchemy or, e.g., custom validators.
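To give a feel for the validator use case, here is a minimal sketch. The PositiveNumber descriptor and the Product class are hypothetical names invented for this example; it relies on the standard __set_name__ and __set__ hooks:

```python
class PositiveNumber:
    """Hypothetical validating descriptor: rejects non-positive values."""

    def __set_name__(self, owner, name):
        # Store the value on the instance under a private name
        self.private_name = "_" + name

    def __get__(self, instance, objtype=None):
        if instance is None:
            return self
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError(f"expected a positive number, got {value!r}")
        setattr(instance, self.private_name, value)


class Product:
    price = PositiveNumber()

    def __init__(self, price):
        self.price = price  # goes through PositiveNumber.__set__


product = Product(10)
print(product.price)  # 10

try:
    product.price = -5
except ValueError as error:
    print(error)  # expected a positive number, got -5
```

Because the check lives in the descriptor, every assignment to price is validated, including the one made in __init__.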