What is Python's "self" Argument, Anyway?

Every Python developer is familiar with the self argument, which is present in nearly every method declaration of every class. We all know how to use it, but do you really know what it is, why it's there, and how it works under the hood?

What We Already Know

Let's start with what we already know: self - the first argument in methods - refers to the class instance:

class MyClass:
    def do_stuff(self, some_arg):   # self ◄── instance, some_arg ◄── "whatever"
        print(some_arg)


instance = MyClass()
instance.do_stuff("whatever")

Also, this argument doesn't actually have to be called self - it's just a convention. You could, for example, use this, as is common in other languages (but don't).

The above code probably feels natural and obvious since you've been writing it forever, but we've given .do_stuff() only one argument (some_arg), yet the method declares two (self and some_arg), which doesn't seem to make sense. The arrows in the snippet show that self got translated into the instance, but how did it really get there?

instance = MyClass()

MyClass.do_stuff(instance, "whatever")

What Python does internally is convert instance.do_stuff("whatever") into MyClass.do_stuff(instance, "whatever"). We could end it here and just call it "Python magic", but if we want to actually understand what's going on under the hood, we need to understand what Python methods are and how they relate to functions.
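One place where this conversion becomes visible is the error message for a call with too many arguments. The sketch below reuses the MyClass example from above; the extra "extra" argument is only there to trigger the error. The message counts the instance as one of the positional arguments, so two explicit arguments are reported as three:

```python
class MyClass:
    def do_stuff(self, some_arg):
        print(some_arg)


instance = MyClass()

try:
    # Two explicit arguments, plus the instance that Python passes as "self"
    instance.do_stuff("whatever", "extra")
except TypeError as error:
    message = str(error)
    print(message)  # the message reports 3 arguments given, not 2
```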

Class Attributes/Methods

In Python, a class doesn't store any special "method" object - in reality, methods are just regular functions. The difference between a function and a method is that a method is defined in the namespace of a class, making it an attribute of said class.

These attributes are stored in the class dictionary __dict__, which we can access directly or using the vars builtin function:

print(MyClass.__dict__["do_stuff"])
# <function MyClass.do_stuff at 0x7f132b73d550>

print(vars(MyClass)["do_stuff"])
# <function MyClass.do_stuff at 0x7f132b73d550>

The most common way to access them, though, is as an attribute of the class:

print(MyClass.do_stuff)
# <function MyClass.do_stuff at 0x7f132b73d550>

Here we accessed the function using a class attribute, which, as expected, prints that do_stuff is a function of MyClass. We can, however, also access it using the instance attribute:

print(instance.do_stuff)
# <bound method MyClass.do_stuff of <__main__.MyClass object at 0x7ff80c78de50>>

In this case, though, we get back a "bound method" rather than the raw function. What Python does for us here is bind the class attribute to the instance, creating what's called a "bound method". This "bound method" is a wrapper around the underlying function that already has the instance inserted as the first argument (self).

Therefore, methods are plain functions that have the class instance (self) prepended to their other arguments.
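A bound method exposes both halves of this wrapper through its __self__ and __func__ attributes. The following is a small sketch using the same MyClass shape as above (do_stuff returns its argument here instead of printing it, purely so the results can be compared):

```python
class MyClass:
    def do_stuff(self, some_arg):
        return some_arg


instance = MyClass()
bound = instance.do_stuff

# The bound method remembers the instance and the underlying function
print(bound.__self__ is instance)          # True
print(bound.__func__ is MyClass.do_stuff)  # True

# Calling it is the same as calling the function with the instance prepended
print(bound("whatever") == bound.__func__(bound.__self__, "whatever"))  # True

# Each attribute access builds a new wrapper object, although they compare equal
print(instance.do_stuff is instance.do_stuff)  # False
print(instance.do_stuff == instance.do_stuff)  # True
```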

To understand how that happens, we need to take a look at the descriptor protocol.

Descriptor Protocol

Descriptors are the mechanism behind methods (among other things). They're objects whose class defines a __get__(), __set__(), or __delete__() method. For the purpose of understanding how self works, we will only consider __get__(), which has the signature:

descr.__get__(self, instance, type=None) -> value

But what does the __get__() method actually do? It allows us to customize attribute lookup in classes - or in other words - customize what happens when a class attribute is accessed using dot notation. This is very useful considering that methods are really just attributes of a class. It means that we can use the __get__ method to create a "bound method" of a class.

To make it a little easier to understand, let's demonstrate this by implementing a "method" using a descriptor. First we create a pure-Python implementation of a function object:
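Before getting to methods, here is a minimal sketch of customized lookup on its own. The Shout and Person names are made up for this illustration; the point is only that the value returned by dot access is whatever __get__ decides to return:

```python
class Shout:
    """Hypothetical descriptor that customizes what attribute access returns."""

    def __get__(self, instance, objtype=None):
        if instance is None:
            # Accessed on the class itself, e.g. Person.loud_name
            return self
        # Accessed on an instance, e.g. person.loud_name
        return instance.name.upper()


class Person:
    loud_name = Shout()

    def __init__(self, name):
        self.name = name


person = Person("guido")
print(person.loud_name)                     # GUIDO
print(isinstance(Person.loud_name, Shout))  # True
```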

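import types

class Function:
    def __get__(self, instance, objtype=None):
        if instance is None:
            # Accessed on the class (MyClass.do_stuff): nothing to bind to
            return self
        # Accessed on an instance: bind this callable to that instance
        return types.MethodType(self, instance)

    def __call__(self, *args, **kwargs):
        # Placeholder body; a bound method invokes this as self(instance, *args)
        return None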

The above Function class implements __get__, which makes it a descriptor. This dunder method receives the class instance in the instance argument. If this argument is None, we know that __get__ was invoked through the class (e.g. MyClass.do_stuff), so we just return self. If, however, it was invoked through a class instance, such as instance.do_stuff, then we return types.MethodType, which is a way of creating a "bound method" manually.

Additionally, we also provide the __call__ dunder method. While __init__ is invoked when a class is called to initialize an instance (e.g. instance = MyClass()), __call__ is invoked when the instance itself is called (e.g. instance()). We need this because self in types.MethodType(self, instance) must be callable.
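The difference between the two dunder methods can be seen in isolation with a small sketch. The Counter class is hypothetical and has nothing to do with descriptors; it only shows which method runs for which kind of call:

```python
class Counter:
    def __init__(self, start):
        # Runs when the class is called: Counter(10)
        self.value = start

    def __call__(self, step=1):
        # Runs when the instance is called: counter() or counter(5)
        self.value += step
        return self.value


counter = Counter(10)
first = counter()    # 10 + 1
second = counter(5)  # 11 + 5
print(first, second)      # 11 16
print(callable(counter))  # True
```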

Now that we have our own function implementation, we can use it to bind a method to a class:

class MyClass:
    do_stuff = Function()

print(MyClass.__dict__["do_stuff"])  # __get__ not invoked
# <__main__.Function object at 0x7f229b046e50>

print(MyClass.do_stuff)  # __get__ invoked, but "instance" is None, "self" is returned
print(MyClass.do_stuff.__get__(None, MyClass))
# <__main__.Function object at 0x7f229b046e50>

instance = MyClass()
print(instance.do_stuff)  #  __get__ invoked and "instance" is not None, "MethodType" is returned
print(instance.do_stuff.__get__(instance, MyClass))
# <bound method ? of <__main__.MyClass object at 0x7fd526a33d30>>

By giving MyClass an attribute do_stuff of type Function, we roughly simulate what Python does when you define a method in a class's namespace.

To summarise, upon attribute access such as instance.do_stuff, do_stuff is first looked up in the attribute dictionary (__dict__) of the instance and, when it isn't found there, in the __dict__ of its class (and base classes). If the object found on the class defines a __get__ method, that method is invoked, ultimately calling:
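The Function class above never runs any real code when called. As a rough sketch of the missing piece, the variant below wraps an ordinary function and forwards the call to it, so the instance actually arrives as self. The names WrappedFunction, plain_do_stuff and Demo are invented for this example and are not part of any library:

```python
import types


class WrappedFunction:
    """Hypothetical descriptor that wraps a plain function."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, objtype=None):
        if instance is None:
            return self
        # Bind this callable to the instance, just like Function above
        return types.MethodType(self, instance)

    def __call__(self, *args, **kwargs):
        # For a bound call, args[0] is the instance, i.e. what becomes "self"
        return self.func(*args, **kwargs)


def plain_do_stuff(self, some_arg):
    return f"{type(self).__name__} got {some_arg}"


class Demo:
    do_stuff = WrappedFunction(plain_do_stuff)


demo = Demo()
result = demo.do_stuff("whatever")
print(result)  # Demo got whatever
```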

# For class invocation:
print(MyClass.__dict__['do_stuff'].__get__(None, MyClass))
# <__main__.Function object at 0x7f229b046e50>

# For instance invocation:
print(MyClass.__dict__['do_stuff'].__get__(instance, MyClass))
# Alternatively:
print(type(instance).__dict__['do_stuff'].__get__(instance, type(instance)))
# <bound method ? of <__main__.MyClass object at 0x7fd526a33d30>>

The instance invocation - as we now know - returns a bound method: a callable wrapper around the original function, which has self prepended to its arguments!

If you want to explore this further, you can similarly implement static and class methods - examples of how to do that can be found in the docs.
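The same manual __get__ call works on a regular function defined in a class, since plain functions are descriptors too. The sketch below also compares the result with functools.partial, which is only a rough analogy for "a function with its first argument pre-filled" and not what Python uses internally:

```python
from functools import partial


class MyClass:
    def do_stuff(self, some_arg):
        return f"{some_arg} from {type(self).__name__}"


instance = MyClass()
func = MyClass.__dict__["do_stuff"]      # plain function, no binding applied

bound = func.__get__(instance, MyClass)  # what attribute access does for us
by_hand = partial(func, instance)        # rough analogy: pre-fill "self"

print(bound("whatever"))           # whatever from MyClass
print(by_hand("whatever"))         # whatever from MyClass
print(bound == instance.do_stuff)  # True
```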
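As a starting point, static and class methods could be sketched along the following lines. This is a simplified illustration, not the real implementation of the staticmethod and classmethod builtins; the StaticMethod, ClassMethod and Tool names are made up here:

```python
import types


class StaticMethod:
    def __init__(self, func):
        self.func = func

    def __get__(self, instance, objtype=None):
        # No binding at all: hand back the plain function
        return self.func


class ClassMethod:
    def __init__(self, func):
        self.func = func

    def __get__(self, instance, objtype=None):
        if objtype is None:
            objtype = type(instance)
        # Bind the function to the class rather than to the instance
        return types.MethodType(self.func, objtype)


class Tool:
    @StaticMethod
    def add(a, b):
        return a + b

    @ClassMethod
    def name(cls):
        return cls.__name__


tool = Tool()
print(Tool.add(1, 2), tool.add(1, 2))  # 3 3
print(Tool.name(), tool.name())        # Tool Tool
```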

Why It's There, Though?

We now know how it works, but a more philosophical question stands - "Why does it have to appear in method definitions?"

The explicit self method argument is a controversial design choice, but it's a choice in favour of simplicity.

Python's self is an embodiment of the "worse is better" design philosophy. The priority of this design philosophy is "simplicity", defined as:

The design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface...

That's exactly the case with self - a simple implementation at the expense of the interface, where a method's signature doesn't match its invocation.

There are more reasons why we have an explicit self, or rather why it has to stay. Some of them are described in a blog post by Guido van Rossum, written in response to a proposal calling for its removal.

Closing Thoughts

Python abstracts away a lot of complexity, but digging into low-level details and intricacies can be - in my opinion - very valuable for getting a greater understanding of how the language works, which can come in handy when things break and high-level troubleshooting/debugging isn't enough.

Additionally, understanding descriptors can actually be quite practical, as there are some use cases for them. While most of the time you will only really need the @property descriptor, there are situations where custom ones make sense, such as the ones in SQLAlchemy or, for example, custom validators.
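To give an idea of what a custom validator might look like, here is a minimal sketch. The PositiveNumber and Product names are invented for this example, and the validation rule is deliberately simplistic:

```python
class PositiveNumber:
    """Hypothetical data descriptor that validates values on assignment."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created; pick a storage name
        self.private_name = "_" + name

    def __get__(self, instance, objtype=None):
        if instance is None:
            return self
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        if not isinstance(value, (int, float)) or value <= 0:
            raise ValueError(f"expected a positive number, got {value!r}")
        setattr(instance, self.private_name, value)


class Product:
    price = PositiveNumber()

    def __init__(self, price):
        self.price = price  # goes through PositiveNumber.__set__


product = Product(9.99)
print(product.price)  # 9.99

try:
    product.price = -1
except ValueError as error:
    rejected = str(error)
    print(rejected)  # expected a positive number, got -1
```

Because the descriptor defines __set__, every assignment to price is checked, including the one inside __init__.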