Every Python developer is familiar with the self argument, which is present in every* method declaration of every class. We all know how to use it, but do you really know what it is, why it's there, and how it works under the hood?
What We Already Know
Let's start with what we already know:
self - the first argument in methods - refers to the class instance:
    class MyClass:
                      ┌─────────────────┐
                      ▼                 │
        def do_stuff(self, some_arg):   │
            print(some_arg)▲            │
                           │            │
                           │            │
                           │            │
                           │            │
    instance = MyClass()   │            │
    instance.do_stuff("whatever")       │
        │                               │
        └───────────────────────────────┘
Also, this argument doesn't actually have to be called self - it's just a convention. You could, for example, use this, as is common in other languages (but don't).
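As a quick illustration of that convention, here is a minimal sketch in which the first parameter is named this instead of self. The Greeter class and its greet method are made up for this example; the point is only that Python cares about the parameter's position, not its name.

```python
class Greeter:
    # "this" is a deliberately unconventional name for the first parameter;
    # Python binds the instance to it purely by position.
    def greet(this, name):
        return f"Hello, {name}! I am {type(this).__name__}."


greeter = Greeter()
message = greeter.greet("world")
print(message)  # Hello, world! I am Greeter.
```

It works, but linters and other readers will expect self, so stick with the convention in real code.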
The above code probably feels natural and obvious since you've been using it forever, but we gave .do_stuff() only one argument (some_arg), yet the method declares two (self and some_arg), which doesn't seem to add up. The arrows in the snippet show that self got translated into the instance, but how did it really get there?
    instance = MyClass()
    MyClass.do_stuff(instance, "whatever")
What Python does internally is convert instance.do_stuff("whatever") into MyClass.do_stuff(instance, "whatever"). We could end it here and just call it "Python magic", but if we want to actually understand what's going on under the hood, we need to understand what Python methods are and how they relate to functions.
In Python, a method defined in a class body is not a special kind of object - in reality, it's just a regular function. The difference between a function and a method is that a method is defined in the namespace of a class, making it an attribute of that class.
These attributes are stored in the class dictionary, __dict__, which we can access directly or through the vars built-in function:
    MyClass.__dict__["do_stuff"]
    # <function MyClass.do_stuff at 0x7f132b73d550>
    vars(MyClass)["do_stuff"]
    # <function MyClass.do_stuff at 0x7f132b73d550>
The more common way to access them is as an attribute of the class:
print(MyClass.do_stuff) # <function MyClass.do_stuff at 0x7f132b73d550>
Here we accessed the function through a class attribute, which, as expected, prints that do_stuff is a function of MyClass. We can, however, also access it through the instance attribute:
    print(instance.do_stuff)
    # <bound method MyClass.do_stuff of <__main__.MyClass object at 0x7ff80c78de50>>
In this case, though, we get back a "bound method" rather than the raw function. What Python does for us here is bind the class attribute to the instance, creating what's called a "bound method". This "bound method" is a wrapper around the underlying function that has the instance already inserted as the first argument (self).
Therefore, methods are plain functions that have the class instance (self) prepended to their other arguments.
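The wrapper can be inspected directly. The sketch below assumes a variation of MyClass whose do_stuff returns its argument instead of printing it (so the result can be compared); __self__ and __func__ are the standard attributes of bound method objects.

```python
class MyClass:
    def do_stuff(self, some_arg):
        # Returns instead of printing so the two call styles can be compared.
        return some_arg


instance = MyClass()
bound = instance.do_stuff

# The bound method remembers both the instance and the plain function.
same_instance = bound.__self__ is instance
same_function = bound.__func__ is MyClass.__dict__["do_stuff"]

# Calling the bound method is equivalent to calling the function
# with the instance passed explicitly as the first argument.
result_bound = bound("whatever")
result_plain = MyClass.do_stuff(instance, "whatever")
print(same_instance, same_function, result_bound == result_plain)  # True True True
```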
To understand how that happens, we need to take a look at the descriptor protocol.
Descriptors are the mechanism behind methods (among other things). They're objects of classes that define any of the __get__(), __set__(), or __delete__() methods. For the purpose of understanding how self works, we will only consider __get__(), which has the signature:
descr.__get__(self, instance, type=None) -> value
But what does the __get__() method actually do? It allows us to customize attribute lookup in classes - or, in other words, to customize what happens when a class attribute is accessed using dot notation. This is very useful considering that methods are really just attributes of a class. It means that we can use the __get__ method to create a "bound method" of a class.
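Before building our own descriptor, it may help to see that ordinary functions already implement __get__. This is a small sketch that calls it by hand; MyClass here is a variant whose do_stuff returns its argument so the result is easy to check.

```python
class MyClass:
    def do_stuff(self, some_arg):
        return some_arg


instance = MyClass()
func = MyClass.__dict__["do_stuff"]  # raw function, no binding yet

# Plain functions are descriptors: they implement __get__.
has_get = hasattr(func, "__get__")

# Calling __get__ by hand produces the same kind of bound method
# that instance.do_stuff would give us.
bound = func.__get__(instance, MyClass)
manual_result = bound("whatever")

# With instance=None (access through the class), the function itself comes back.
unbound = func.__get__(None, MyClass)
print(has_get, manual_result, unbound is func)  # True whatever True
```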
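To make it a little easier to understand, let's demonstrate this by implementing a "method" using a descriptor. First, we create a pure-Python implementation of a function object:
    import types

    class Function:
        def __get__(self, instance, objtype=None):
            if instance is None:
                # Accessed through the class (e.g. MyClass.do_stuff)
                return self
            # Accessed through an instance: bind this object to it
            return types.MethodType(self, instance)

        def __call__(self, *args, **kwargs):
            # Placeholder body; accepts any arguments so that the bound
            # method (which passes the instance first) can be called
            return None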
The Function class implements __get__, which makes it a descriptor. This dunder method receives the class instance in its instance argument. If this argument is None, we know that __get__ was triggered by access through the class (e.g. MyClass.do_stuff), so we just return self. If, however, it was triggered through a class instance, such as instance.do_stuff, then we return types.MethodType(self, instance), which is a way of creating a "bound method" manually.
Additionally, we also provide the __call__ dunder method. While __init__ is invoked when a class is called to initialize an instance (e.g. instance = MyClass()), __call__ is invoked when the instance itself is called (e.g. instance()). We need this because types.MethodType requires its first argument (here self) to be callable.
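The callable requirement can be checked in isolation. In this sketch, NotCallable and Echo are hypothetical helper classes invented for the demonstration; only types.MethodType comes from the standard library.

```python
import types


class NotCallable:
    pass


class Echo:
    def __call__(self, *args):
        # Return whatever was passed in, so we can see what the bound method sends.
        return args


target = object()

# An object without __call__ is rejected by MethodType.
try:
    types.MethodType(NotCallable(), target)
    rejected = False
except TypeError:
    rejected = True

# A callable object is accepted, and the bound target is passed first on each call.
bound = types.MethodType(Echo(), target)
received = bound("x")
print(rejected, received == (target, "x"))  # True True
```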
Now that we have our own function implementation, we can use it to bind a method to a class:
    class MyClass:
        do_stuff = Function()

    print(MyClass.__dict__["do_stuff"])  # __get__ not invoked
    # <__main__.Function object at 0x7f229b046e50>

    print(MyClass.do_stuff)  # __get__ invoked, but "instance" is None, "self" is returned
    print(MyClass.do_stuff.__get__(None, MyClass))
    # <__main__.Function object at 0x7f229b046e50>

    instance = MyClass()
    print(instance.do_stuff)  # __get__ invoked and "instance" is not None, "MethodType" is returned
    print(instance.do_stuff.__get__(instance, MyClass))
    # <bound method ? of <__main__.MyClass object at 0x7fd526a33d30>>
By giving MyClass an attribute do_stuff of type Function, we roughly simulate what Python does when you define a method in a class's namespace.
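To take the simulation one step further, the descriptor can wrap a real function and forward calls to it. The following is only a sketch under that assumption: WrappingFunction and _do_stuff are hypothetical names introduced here, not part of the example above or of Python itself.

```python
import types


class WrappingFunction:
    """Hypothetical variant of Function that forwards calls to a wrapped callable."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, objtype=None):
        if instance is None:
            return self
        return types.MethodType(self, instance)

    def __call__(self, *args, **kwargs):
        # When called through a bound method, args[0] is the instance.
        return self.func(*args, **kwargs)


def _do_stuff(self, some_arg):
    return (type(self).__name__, some_arg)


class MyClass:
    do_stuff = WrappingFunction(_do_stuff)


instance = MyClass()
result = instance.do_stuff("whatever")
print(result)  # ('MyClass', 'whatever')
```

Here the instance reaches _do_stuff as self without ever being passed explicitly, which is the same effect a normal method definition gives you.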
To summarise, upon attribute access such as instance.do_stuff, do_stuff is looked up in the attribute dictionary (__dict__) of the class. If the object found there defines a __get__ method, then do_stuff.__get__ is invoked, ultimately calling:
    # For class invocation:
    print(MyClass.__dict__['do_stuff'].__get__(None, MyClass))
    # <__main__.Function object at 0x7f229b046e50>

    # For instance invocation:
    print(MyClass.__dict__['do_stuff'].__get__(instance, MyClass))
    # Alternatively:
    print(type(instance).__dict__['do_stuff'].__get__(instance, type(instance)))
    # <bound method ? of <__main__.MyClass object at 0x7fd526a33d30>>
Which, in the instance case, will - as we now know - return a bound method: a callable wrapper around the original function that has self prepended to its arguments!
If you want to explore this further, you can similarly implement static and class methods - examples of how to do that can be found in the descriptor guide in the Python docs.
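As a rough idea of what such implementations can look like, here is a simplified sketch. StaticMethod and ClassMethod are hypothetical stand-ins written for this example; they are not the real built-ins and skip details the real ones handle.

```python
import types


class StaticMethod:
    """Simplified stand-in for staticmethod: never binds anything."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, objtype=None):
        # Return the raw function regardless of how it was accessed.
        return self.func


class ClassMethod:
    """Simplified stand-in for classmethod: binds the class instead of the instance."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, objtype=None):
        if objtype is None:
            objtype = type(instance)
        return types.MethodType(self.func, objtype)


class MyClass:
    @StaticMethod
    def add(a, b):
        return a + b

    @ClassMethod
    def name(cls):
        return cls.__name__


instance = MyClass()
print(MyClass.add(1, 2), instance.add(1, 2))  # 3 3
print(MyClass.name(), instance.name())  # MyClass MyClass
```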
Why It's There, Though?
We now know how it works, but a more philosophical question stands: "Why does it have to appear in method definitions?"
The explicit self method argument is a controversial design choice, but it's a choice in favour of simplicity.
self is the embodiment of the "worse is better" design philosophy. The priority of this design philosophy is "simplicity", defined as:
The design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface...
That's exactly the case with self - a simple implementation at the expense of the interface, where the method signature doesn't match its invocation.
There are more reasons why we have an explicit self, or rather why it has to stay. Some of them are described in a blog post by Guido van Rossum, written in response to a proposal calling for its removal.
Python abstracts away a lot of complexity, but digging into low level details and intricacies can be - in my opinion - very valuable for getting greater understanding of how the language works, which can come in handy when things break and high level troubleshooting/debugging isn't enough.
Additionally, understanding descriptors can actually be quite practical, as there are some use cases for them. While most of the time you will only really need the @property descriptor, there are situations where custom ones make sense, such as the ones in SQLAlchemy or, for example, custom validators.
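For a taste of the validator use case, here is a minimal sketch of a custom descriptor. PositiveNumber and Order are hypothetical names, and the validation rule is purely illustrative; it relies only on the standard __set_name__, __get__ and __set__ hooks.

```python
class PositiveNumber:
    """Hypothetical validating descriptor; the name and rule are illustrative."""

    def __set_name__(self, owner, name):
        # Store the value on the instance under a private name.
        self.private_name = "_" + name

    def __get__(self, instance, objtype=None):
        if instance is None:
            return self
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        if not isinstance(value, (int, float)) or value <= 0:
            raise ValueError(f"expected a positive number, got {value!r}")
        setattr(instance, self.private_name, value)


class Order:
    quantity = PositiveNumber()

    def __init__(self, quantity):
        self.quantity = quantity  # goes through PositiveNumber.__set__


order = Order(3)
print(order.quantity)  # 3

try:
    order.quantity = -1
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # expected a positive number, got -1
```

Because the check lives in the descriptor, every class that declares a PositiveNumber attribute gets the same validation without repeating it in each setter.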