Real Multithreading is Coming to Python - Learn How You Can Use It Now | Martin Heinz

Python is a 32-year-old language, yet it still doesn't have proper, true parallelism/concurrency. This is going to change soon, thanks to the introduction of a "Per-Interpreter GIL" (Global Interpreter Lock), which will land in Python 3.12. While the release of Python 3.12 is some months away, the code is already there, so let's take an early peek at how we can use it to write truly concurrent Python code using the sub-interpreters API.

Sub-Interpreters

Let's first explain how this "Per-Interpreter GIL" solves Python's lack of proper concurrency.

Simply put, the GIL, or Global Interpreter Lock, is a mutex that allows only one thread at a time to hold control of the Python interpreter. This means that even if you create multiple threads in Python (e.g. using the threading module), only one of them runs Python code at any given moment.
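
A rough way to observe this is to time a CPU-bound function run twice in a row and then in two threads. This is only a minimal sketch: the names count_down and timed and the loop size are illustrative, and the timings depend on the machine and the Python build, so no particular numbers are claimed.

```python
import threading
import time


def count_down(n):
    """Pure-Python, CPU-bound busy loop."""
    while n > 0:
        n -= 1


def timed(fn, *args):
    """Return how long fn(*args) takes, in seconds."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def run_sequential(n, workers):
    # Run the same work one call after another in the current thread
    for _ in range(workers):
        count_down(n)


def run_threaded(n, workers):
    # Run the same work in separate threads and wait for all of them
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


N = 1_000_000  # illustrative loop size
sequential = timed(run_sequential, N, 2)
threaded = timed(run_threaded, N, 2)
print(f"sequential: {sequential:.3f}s, threaded: {threaded:.3f}s")
```

On a build with a single GIL, the threaded run typically takes about as long as the sequential one, because the threads take turns instead of running side by side.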

With the introduction of the "Per-Interpreter GIL", individual Python interpreters no longer share the same GIL. This level of isolation allows each of these sub-interpreters to run truly concurrently. This means that we can bypass Python's concurrency limitations by spawning additional sub-interpreters, each of which has its own GIL (global state).

For a more in-depth explanation, see PEP 684, which describes this feature/change.

Setup

To use this new, bleeding-edge feature, we need an up-to-date Python version, which requires building from source:


# https://devguide.python.org/getting-started/setup-building/#unix-compiling
git clone https://github.com/python/cpython.git
cd cpython

./configure --enable-optimizations --prefix=$(pwd)/python-3.12
make -s -j2
./python
# Python 3.12.0a7+ (heads/main:22f3425c3d, May 10 2023, 12:52:07) [GCC 11.3.0] on linux
# Type "help", "copyright", "credits" or "license" for more information.

Where is it? (C-API)

We have the latest and greatest version installed, so how do we use the sub-interpreters? Can we simply import it? Well, unfortunately, not yet.

As pointed out in PEP 684:

...this is an advanced feature meant for a narrow set of users of the C-API.

The features of the Per-Interpreter GIL are - for now - only available through the C-API, so there's no direct interface for Python developers. Such an interface is expected to come with PEP 554, which - if accepted - is supposed to land in Python 3.13. Until then, we will have to hack our way to the sub-interpreter implementation.

So, while there is no documentation for it and no documented module we could import, there are bits and pieces in the CPython codebase that show us a lot about how to use it.

We have 2 options here:

We can use the _xxsubinterpreters module, which is implemented in C, hence the weird name. Because it's implemented in C, you can't easily inspect the code (at least not in Python).
Or we can take advantage of CPython's test module, which contains sample Interpreter (and Channel) classes used for testing.


# Choose one of these:
import _xxsubinterpreters as interpreters
from test.support import interpreters

For the most part, in the following examples, we will be using the second option.
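
Since both modules are undocumented internals, it can help to check which of them the current build actually ships before importing anything. The sketch below uses importlib.util.find_spec for that. The helper name available_modules is made up for this example, and neither module is guaranteed to exist in other Python versions.

```python
import importlib.util

# The two module names mentioned above
CANDIDATES = ("_xxsubinterpreters", "test.support.interpreters")


def available_modules(names=CANDIDATES):
    """Return the subset of `names` that the import system can locate."""
    found = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Parent package is missing, or the module has no usable spec
            spec = None
        if spec is not None:
            found.append(name)
    return found


print(available_modules())
```

An empty list means the interpreter running the script has neither module, for example because it is an older release rather than the freshly built one.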

We've found the sub-interpreters, but we will also need to borrow some helper functions from Python's test module, which we will use to pass code to a sub-interpreter:


from textwrap import dedent
import os
# https://github.com/python/cpython/blob/
#   15665d896bae9c3d8b60bd7210ac1b7dc533b093/Lib/test/test__xxsubinterpreters.py#L75
def _captured_script(script):
    r, w = os.pipe()
    indented = script.replace('\n', '\n                ')
    wrapped = dedent(f"""
        import contextlib
        with open({w}, 'w', encoding="utf-8") as spipe:
            with contextlib.redirect_stdout(spipe):
                {indented}
        """)
    return wrapped, open(r, encoding="utf-8")


def _run_output(interp, request, channels=None):
    script, rpipe = _captured_script(request)
    with rpipe:
        interp.run(script, channels=channels)
        return rpipe.read()
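
To see what these helpers actually hand to the sub-interpreter, the sketch below repeats the wrapping step and runs the generated script with exec() in the current interpreter. Note that exec() is only a stand-in for interp.run here, so that the example works without any sub-interpreter; the function name captured_script simply mirrors the helper above.

```python
import os
from textwrap import dedent


def captured_script(script):
    """Same wrapping logic as the _captured_script helper above."""
    r, w = os.pipe()
    indented = script.replace('\n', '\n                ')
    wrapped = dedent(f"""
        import contextlib
        with open({w}, 'w', encoding="utf-8") as spipe:
            with contextlib.redirect_stdout(spipe):
                {indented}
        """)
    return wrapped, open(r, encoding="utf-8")


wrapped, rpipe = captured_script("print('hello from the wrapped script')")
print(wrapped)  # the source that would be passed to interp.run()

with rpipe:
    exec(wrapped, {})  # stand-in for interp.run(wrapped)
    output = rpipe.read()  # everything the script printed

print(f"Captured: {output!r}")
```

The wrapper redirects the script's stdout into the write end of a pipe, and the caller reads the text back from the read end once the script has finished.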

Putting the interpreters module and the above helpers together, we can spawn our first sub-interpreter:


from test.support import interpreters

main = interpreters.get_main()
print(f"Main interpreter ID: {main}")
# Main interpreter ID: Interpreter(id=0, isolated=None)

interp = interpreters.create()

print(f"Sub-interpreter: {interp}")
# Sub-interpreter: Interpreter(id=1, isolated=True)

# https://github.com/python/cpython/blob/
#   15665d896bae9c3d8b60bd7210ac1b7dc533b093/Lib/test/test__xxsubinterpreters.py#L236
code = dedent("""
            from test.support import interpreters
            cur = interpreters.get_current()
            print(cur.id)
            """)

out = _run_output(interp, code)

print(f"All Interpreters: {interpreters.list_all()}")
# All Interpreters: [Interpreter(id=0, isolated=None), Interpreter(id=1, isolated=None)]
print(f"Output: {out}")  # Result of 'print(cur.id)'
# Output: 1

One way to spawn and run a new interpreter is to use the create function and then pass the interpreter to the _run_output helper function along with the code we want to execute.

An easier way is to simply...


interp = interpreters.create()
interp.run(code)

...use the run method of an interpreter.

However, if we try to run either of the above 2 code snippets, we will receive the following error:


Fatal Python error: PyInterpreterState_Delete: remaining subinterpreters
Python runtime state: finalizing (tstate=0x000055b5926bf398)

To avoid it, we also need to clean up any dangling interpreters:


def cleanup_interpreters():
    for i in interpreters.list_all():
        if i.id == 0:  # main
            continue
        try:
            print(f"Cleaning up interpreter: {i}")
            i.close()
        except RuntimeError:
            pass  # already destroyed

cleanup_interpreters()
# Cleaning up interpreter: Interpreter(id=1, isolated=None)
# Cleaning up interpreter: Interpreter(id=2, isolated=None)
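
Calling a cleanup function by hand is easy to forget. One possible refinement is a small context manager that closes the interpreter when the block exits. This is a sketch and not part of CPython: closing_interpreter is a hypothetical helper that only assumes the factory returns an object with a close() method, as the Interpreter class used above does. FakeInterpreter is a stand-in so that the snippet runs on any Python version.

```python
from contextlib import contextmanager


@contextmanager
def closing_interpreter(create):
    """Create an interpreter via create() and close it when the block exits."""
    interp = create()
    try:
        yield interp
    finally:
        try:
            interp.close()
        except RuntimeError:
            pass  # already destroyed


class FakeInterpreter:
    """Stand-in exposing only close(), for demonstration purposes."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


with closing_interpreter(FakeInterpreter) as interp:
    print(f"Inside the block, closed: {interp.closed}")
print(f"After the block, closed: {interp.closed}")
```

With the real module, the factory would be interpreters.create, and the body of the with block would call interp.run(code).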

Threading

While running the code with the above helper functions works, it might be more convenient to use the familiar interface of the threading module:


import threading

def run_in_thread():
    t = threading.Thread(target=interpreters.create)
    print(t)
    t.start()
    print(t)
    t.join()
    print(t)

run_in_thread()
run_in_thread()

# <Thread(Thread-1 (create), initial)>
# <Thread(Thread-1 (create), started 139772371633728)>
# <Thread(Thread-1 (create), stopped 139772371633728)>
# <Thread(Thread-2 (create), initial)>
# <Thread(Thread-2 (create), started 139772371633728)>
# <Thread(Thread-2 (create), stopped 139772371633728)>

Here we pass the interpreters.create function to the Thread, which then spawns the new sub-interpreter inside a thread.

We can also combine the 2 approaches and pass the helper function to the threading.Thread:


import time

def run_in_thread():
    interp = interpreters.create(isolated=True)
    t = threading.Thread(target=_run_output, args=(interp, dedent("""
            import _xxsubinterpreters as _interpreters
            cur = _interpreters.get_current()

            import time
            time.sleep(2)
            # Can't print from here, won't bubble-up to main interpreter

            assert isinstance(cur, _interpreters.InterpreterID)
            """)))
    print(f"Created Thread: {t}")
    t.start()
    return t


t1 = run_in_thread()
print(f"First running Thread: {t1}")
t2 = run_in_thread()
print(f"Second running Thread: {t2}")
time.sleep(4)  # Need to sleep to give Threads time to complete
cleanup_interpreters()

Here we also demonstrate how to use the _xxsubinterpreters module instead of the one in test.support. We also sleep for 2 seconds in each thread to simulate some "work". Notice that we don't even bother calling join on the threads; we simply clean up the interpreters when they complete.
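
The fixed sleep works here because we know each thread needs about 2 seconds. When the duration is unknown, joining the threads is a more robust way to wait. A minimal sketch with plain threads, where work is a placeholder for the _run_output call and the 0.1 second delay is arbitrary:

```python
import threading
import time

results = []


def work(seconds):
    """Placeholder for running a script in a sub-interpreter."""
    time.sleep(seconds)
    results.append(seconds)


threads = [threading.Thread(target=work, args=(0.1,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # blocks until this thread has finished

print(f"Finished threads: {len(results)}")
# In the sub-interpreter example, cleanup_interpreters() would be called here
```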

Channels

If we dig through the CPython test module a little more, we will also find that there is an implementation of RecvChannel and SendChannel classes which resemble the channels known from Golang. To use them:


# https://github.com/python/cpython/blob/
#   15665d896bae9c3d8b60bd7210ac1b7dc533b093/Lib/test/test_interpreters.py#L583
r, s = interpreters.create_channel()

print(f"Channel: {r}, {s}")
# Channel: RecvChannel(id=0), SendChannel(id=0)

orig = b'spam'
s.send_nowait(orig)
obj = r.recv()
print(f"Received: {obj}")
# Received: b'spam'

cleanup_interpreters()
# Need clean up, otherwise:

# free(): invalid pointer
# Aborted (core dumped)

This example shows how we can create a channel with a receiving end (r) and a sending end (s). We can then pass data into the channel with send_nowait and read it on the other side with the recv function. This channel is really just another sub-interpreter, so - same as before - we need to clean up when we're done with it.
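
For readers who have not used Go channels, a loose analogy from the standard library is a queue.Queue shared between threads. This is a different mechanism and not the channel implementation: it only works inside a single interpreter and is shown purely to illustrate the pattern of sending without waiting and receiving with blocking.

```python
import queue
import threading

channel = queue.Queue()
received = []


def consumer():
    # get() blocks until an item is available, similar in spirit to recv()
    received.append(channel.get())


t = threading.Thread(target=consumer)
t.start()
channel.put_nowait(b"spam")  # does not wait for the consumer, like send_nowait()
t.join()

print(f"Received: {received[0]}")
```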

Digging Deeper

And finally, if we want to tweak the sub-interpreter options, which are generally set in C code, we can use a function from the test.support module, more specifically run_in_subinterp_with_config:


import test.support

def run_in_thread(script):
    test.support.run_in_subinterp_with_config(
        script,
        use_main_obmalloc=True,
        allow_fork=True,
        allow_exec=True,
        allow_threads=True,
        allow_daemon_threads=False,
        check_multi_interp_extensions=False,
        own_gil=True,
    )

code = dedent("""
            from test.support import interpreters
            cur = interpreters.get_current()
            print(cur)
            """)

run_in_thread(code)
# Interpreter(id=7, isolated=None)
run_in_thread(code)
# Interpreter(id=8, isolated=None)

This function is a Python API for the underlying C function. It exposes some sub-interpreter options, such as own_gil, which specifies whether the sub-interpreter should have its own GIL.

Conclusion

I'm very happy to see that this change is finally coming to CPython and I want to highlight the amazing work and perseverance by Eric Snow, who made this happen.

With that said - as you could see here - the API isn't exactly easy to use, so unless you have C expertise or a very urgent need for sub-interpreters, you might be better off waiting for proper support (hopefully) in Python 3.13. We have waited for many years, what's one more, right? Alternatively, you could try the extrainterpreters project, which provides a friendlier Python API to sub-interpreters.

While this is hard to use for the average Python developer, I'm sure tool and library developers will put it to good use, and we will see performance improvements popping up in many libraries that can leverage sub-interpreters through the C-API.