Debugging Crashes and Deadlocks in Python using PyStack

There are certain bugs and issues that are very hard to troubleshoot. Just ask yourself, "How would I debug a deadlock, a segmentation fault, a crashing application, or a hanging process?"

Now there's a tool to help you with all of that - PyStack is a powerful new debugger that - according to its docs - "uses forbidden magic to let you inspect the stack frames of a running Python process or a Python core dump."

So, let's put that statement to the test and explore how to use PyStack to track down and troubleshoot all the above-mentioned issues and bugs...

Why?

There are many debugging tools out there, including interactive debuggers in your IDE, so you might be wondering, "Why is this even needed?"

While traditional debuggers or even simple print statements are often enough for debugging, there are bugs and issues that are much harder to track down and troubleshoot.

One such category is deadlocks, or any application that gets stuck. When you have a hanging process, you might ask yourself, "Is it doing something? Or is it stuck/dead/deadlocked?" and PyStack can help answer that.

Another category is hybrid applications - that is, applications that combine C/C++ and Python. That includes Python extension modules, but also libraries such as NumPy or TensorFlow which many of us use every day. Well, how would you debug NumPy crashing with a segfault?

And then there are weird issues that happen only in specific circumstances, e.g. under heavy load, or after the app has been running for X hours.

But aren't there tools for this already? Well, yes, but actually no.

The only real alternative is GDB, which isn't exactly user-friendly, and it can't debug both Python and C/C++ at the same time.

There are also two Python tools, austin and py-spy; however, those are profilers, not debuggers.

Now, what is so special about PyStack and why is it superior to the above-mentioned tools? For a start, unlike GDB, PyStack only reads the process's memory (optionally stopping the process) and doesn't modify it. GDB can modify the running program, which can be a big no-no.

It can also inspect core dump files - which is a big statement, because core dumps can be wild. It's also able to print local variables as part of the stack trace. And it automatically downloads debugging information for the particular distro if it isn't available locally. That's just a sample of the reasons why PyStack is very cool.

Setup

Before we start debugging, we need to do some setup:


sudo apt update
sudo apt install systemd-coredump python3-pip python3.10-venv

echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

python3 -m venv ./venv
source ./venv/bin/activate

pip install pystack pytest-pystack

One of the features of PyStack is the ability to inspect core dumps, so we need to enable core dump generation on our system - the simplest way to achieve that is to install systemd-coredump. All the core dumps will then be automatically stored in /var/lib/systemd/coredump/. Also, to be able to trace a process, PyStack needs to send ptrace syscalls to the traced process, which we can temporarily enable with echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope.
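
To quickly sanity-check both of these, we can inspect the knobs directly (a minimal sketch - coredumpctl ships with systemd-coredump, and its list will stay empty until something actually dumps core):


cat /proc/sys/kernel/yama/ptrace_scope
# 0 -> processes are allowed to attach to other (non-child) processes

coredumpctl list
# shows the core dumps captured so far (may require sudo)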

Finally, we need to install pystack itself, as well as the Pytest plugin for it.

Debugging

PyStack has two ways of debugging a program - either by attaching to a running process or by analyzing a core dump of a crashed process.

Let's start with the former. We will use the following snippet that simply sleeps forever, simulating a hanging process:


# wait.py
from time import sleep

def wait():
    some_var = "data"
    other_var = [1, 2, 3]
    while True:
        sleep(5)
        print("Sleeping...")

wait()

We can run it in the background and then run pystack remote against its PID to see what's happening in the code:


nohup python wait.py &
# [1] 44000

pystack remote 44000 --locals --no-block
# Traceback for thread 44000 (python) [] (most recent call last):
#     (Python) File "/home/.../wait.py", line 18, in <module>
#         wait()
#     (Python) File "/home/.../wait.py", line 15, in wait
#         sleep(5)
#       Locals:
#         other_var: [1, 2, 3]
#         some_var: "data"

We already know that the program hangs on the sleep(5) call and the above output confirms it. It also shows us the local variables which can provide important context.

Here's also a screenshot, so that you can see the nice color highlighting:

[Screenshot: pystack remote output]
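
By the way, PyStack can also include the native (C/C++) frames in the traceback, which becomes relevant for the hybrid applications mentioned earlier. Assuming the same process as above, the invocation looks like this:


pystack remote 44000 --native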

Unfortunately, the above example might not work on your Mac or Windows, because PyStack only supports Linux. It does, however, work just fine in Docker!

Let's demonstrate that with another code snippet, this time an actual deadlock of two threads, each of which grabs one lock and then waits forever for the other:


# deadlock.py
import threading
import time

l1 = threading.Lock()
l2 = threading.Lock()

def f1(name):
    print('thread', name, 'about to lock l1')
    with l1:
        print('thread', name, 'has lock l1')
        time.sleep(0.3)
        print('thread', name, 'about to lock l2')
        with l2:
            print('thread', name, 'run into deadlock')

def f2(name):
    print('thread', name, 'about to lock l2')
    with l2:
        print('thread', name, 'has lock l2')
        time.sleep(0.3)
        print('thread', name, 'about to lock l1')
        with l1:
            print('thread', name, 'run into deadlock')


if __name__ == '__main__':
    t1 = threading.Thread(target=f1, args=['t1', ])
    t2 = threading.Thread(target=f2, args=['t2', ])

    t1.start()
    t2.start()
    t1.join()
    t2.join()

To make this into a container image, you can use the following Dockerfile:


FROM python:3.10
COPY deadlock.py .

RUN pip install pystack pytest-pystack

CMD python3 -u deadlock.py

To build it, run it, and then debug it:


docker build -t python-pystack -f Dockerfile .
docker run --cap-add=SYS_PTRACE --name python-pystack --rm python-pystack

docker exec -it python-pystack /bin/bash
root@438137c1086c:/# ps aux

# USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
# root           1  0.0  0.0   2484   576 ?        Ss   15:20   0:00 /bin/sh -c python3 -u deadlock.py
# root           7  0.0  0.0 159348  7784 ?        Sl   15:20   0:00 python3 -u deadlock.py
# root          10  0.0  0.0   6056  3592 pts/0    Ss   15:20   0:00 /bin/bash
# root          16  0.0  0.0   8652  3204 pts/0    R+   15:20   0:00 ps aux

root@438137c1086c:/# pystack remote 7 --locals --no-block

When we run the container, we pass --cap-add=SYS_PTRACE to docker run - this gives the container the privileges needed to trace a process. Next, we exec into it and find the process we want to trace. After that, we run pystack remote and get the following output:

[Screenshot: pystack remote output for the deadlocked process in Docker]

Core Files

Sometimes we aren't lucky enough to have the process still running so that we can debug it live. Sometimes the application just crashes with nothing useful in the logs. In those cases we might need to inspect the core dump of the process.

If you're unfamiliar with core files (core dumps), here's a definition:


The core dump refers to the recording of the state of the program, i.e. its resources in memory and processor.

Essentially, it's a snapshot of the process at the moment it crashed - produced when the familiar Segmentation fault (core dumped) message appears.

To analyze a core dump, we need to force Python to crash, for example by seg-faulting or aborting:


# crash.py
import os
import ctypes

def crash():
    some_var = "data"
    other_var = [1, 2, 3]
    print(some_var)
    ctypes.string_at(0)  # Segmentation Fault (core dumped)
    # os.abort()         # Aborted (core dumped)

crash()

If we then run this script, we will get the familiar message:


python crash.py
# Segmentation fault (core dumped)

cp /var/lib/systemd/coredump/core.python.1000.49...e5d.44204.1688657327000000.zst core.zst

unzstd core.zst
# core.zst            : 4292608 bytes

pystack core ./core --locals
# ...
pystack core ./core --native
# ...

We can find the core file in /var/lib/systemd/coredump/. It's compressed with Zstandard (.zst), so we uncompress it with unzstd. We can then analyze the core file with pystack core ..., which gives the following output:

[Screenshot: pystack core output]

Notice that here we again see all the local variables, which is very cool because we're working with a crashed program, so we can't just call __repr__ on the variables.

While debugging a live process is often more convenient, you can also force a hanging process to crash with kill -SIGABRT <PID>, which will generate a core file.
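
For example, we could capture the state of the hanging wait.py process from earlier this way (PID taken from the example above; coredumpctl dump is just an alternative to copying the compressed file out of /var/lib/systemd/coredump/ by hand):


kill -SIGABRT 44000
# the shell reports: Aborted (core dumped)

coredumpctl dump 44000 --output=./core
# extracts the most recent core dump for that PID (may require sudo)

pystack core ./core --locals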

As was mentioned in the beginning, you're most likely to find use for PyStack when using libraries like NumPy or PyTorch that have modules written in C/C++. So, let's look at a real NumPy bug that causes a segfault:


# pip install numpy
from multiprocessing import shared_memory
import numpy as np

def np_array_from_shm_name(name, shape=(2,)):
    shm = shared_memory.SharedMemory(name=name)
    np_array = np.ndarray(shape, buffer=shm.buf, dtype=np.int32)
    np_array[:] = (0, 0)
    return np_array

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=8)
    np_array = np_array_from_shm_name(shm.name)
    np_array[:] = (0, 0)  # segfaults
    shm.unlink()

This is a code snippet adapted from this GitHub issue. If you run it, you will get a segfault, and analyzing the generated core file with PyStack gives much more information about the issue, which would be very hard to gather without it:

[Screenshot: pystack core output for the NumPy segfault]
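
The likely culprit here is that the SharedMemory object created inside np_array_from_shm_name is garbage-collected as soon as the function returns, so the buffer backing the array is released and the write in the main block touches memory that is no longer mapped. Here's a minimal sketch of one way to avoid the crash - keeping the SharedMemory object alive for as long as the array that uses its buffer (just one possible fix, not necessarily the one from the issue):


from multiprocessing import shared_memory
import numpy as np

def np_array_from_shm_name(name, shape=(2,)):
    shm = shared_memory.SharedMemory(name=name)
    np_array = np.ndarray(shape, buffer=shm.buf, dtype=np.int32)
    np_array[:] = (0, 0)
    return np_array, shm  # return the mapping too, so it isn't collected

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=8)
    np_array, view = np_array_from_shm_name(shm.name)
    np_array[:] = (0, 0)  # the buffer is still mapped, so this no longer crashes
    del np_array          # drop the array before releasing the mapping
    view.close()
    shm.close()
    shm.unlink()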

Pytest

During the initial setup we installed pytest-pystack along with pystack itself, so let's put it to some use. This Pytest plugin automatically runs pystack remote on the process running our tests whenever a particular test exceeds a given timeout.

For demonstration, we will use the following two test cases:


# examples_test.py
import time
import threading
import deadlock

def test_sleeping_test():
    print("start")
    time.sleep(6)
    print("finished")

def test_deadlock():
    t1 = threading.Thread(target=deadlock.f1, args=['t1', ])
    t2 = threading.Thread(target=deadlock.f2, args=['t2', ])

    t1.start()
    t2.start()
    t1.join()
    t2.join()

The first one simply sleeps for a couple of seconds, while the second reuses the deadlock example from earlier. To run the test suite with the PyStack plugin enabled, use:


pytest -s --pystack-threshold=2 --pystack-args='--locals' --pystack-output-file='./pystack.log'

This will execute the tests as usual, but if any of them exceeds 2 seconds, PyStack will inspect the process and write the trace to the specified file. I won't paste the generated file here, because it's quite long, but you can view it in this Gist.

Closing Thoughts

While PyStack is a tool with quite a narrow use case, and you won't need it for debugging everyday issues, it can be invaluable when an issue like a segfault or a deadlock actually arises.

Being able to figure out what a running program is up to or what a program was doing when it crashed is incredibly useful and there really aren't any other tools that can do it so well. Also, if you're already using a profiler (e.g. py-spy or austin) or other tools that inspect stack information, then you should give PyStack a shot because these tools complement each other nicely.
