Recipes and Tricks for Effective Structural Pattern Matching in Python

If you're a Python developer, you're probably aware that the match/case statement was introduced to the language in version 3.10. It looks like the basic switch statement we all know from other languages, but in Python it's much more than an alternative to if/elif chains.

In this article we will explore advanced features of the match/case syntax, or as it's properly called, structural pattern matching, along with tips, tricks and recipes that will help you use it to its full potential.

RegEx Matching

The match/case syntax provides a lot of matching patterns out of the box, but unfortunately there's currently no native way to match against regular expressions. We can, however, implement it pretty easily thanks to the fact that literal and value patterns use == (__eq__) to evaluate the match. Therefore, all we need is a class that implements a custom __eq__ method:


import re
from dataclasses import dataclass

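@dataclass
class RegexEqual(str):
    string: str
    match: re.Match | None = None  # Last regex match, set by __eq__

    def __eq__(self, pattern):
        # Literal patterns compare with ==, so treat the case string as a regex
        self.match = re.search(pattern, self.string)
        return self.match is not None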

print(bool(RegexEqual("Something") == "^S.*ing$"))  # True

match RegexEqual("Something to match"):
    case "^...match":
        print("Nope...")
    case "^S.*ing$":
        print("Closer...")
    case "^S.*match$":
        print("Yep!")

The above could also be further extended to allow us to access RegEx capture groups, so that we can capture them into variables as part of the matching:


@dataclass
class RegexEqual(str):
    ...

    def __getitem__(self, group):
        return self.match[group]

match RegexEqual("Something to match"):
    case "^Some(.*ing).*$" as capture:
        print(f"Captured: '{capture[1]}'")  # Captured: 'thing'

JSON Processing

A common use case for match/case is matching JSON structures represented as Python dictionaries. This can be done with a mapping pattern, which is triggered by case {...}: ..., like so:


orders = [
    {"statusCode": 200, "id": 1345347, "price": 235.80, "items": ["HDD", "CPU", "Headphones", "Webcam"]},
    {"statusCode": 500, "id": 0, "price": 0, "items": []},
    {"statusCode": 202, "price": 30.80, "items": ["Thumb Drive"]},  # No "id" field
    {"statusCode": 404, },
]

def process_json(response: dict):
    match response:
        case {"statusCode": 200, "id": _, "price": _, "items": [*products]}:  # Capture list
            print(f"Order contains following products: {products}")
        case {"statusCode": code, "id": _, "price": _, "items": _} if code >= 400:  # Capture and guard
            print(f"Failed with status code: {code}")
        case {"statusCode": _, "price": _, "items": _}:
            print("Missing required field: ID")
        case {"statusCode": code, **fields}:  # Destructure rest of the dictionary
            print(f"Code: {code}, data: {fields}")

for order in orders:
    process_json(order)

# Order contains following products: ['HDD', 'CPU', 'Headphones', 'Webcam']
# Failed with status code: 500
# Missing required field: ID
# Code: 404, data: {}

I think that the above code nicely demonstrates the versatility of the feature:

  • In the first case we can see that you can use a starred capture to bind a whole list inside the mapping
  • In the second one we can see a capture paired with a guard, which can be handy when matching REST API response codes
  • The third one shows how it can be used to validate that required dict/JSON fields are present. Keep in mind that mapping patterns ignore extra keys, so this case relies on the stricter cases above it failing first
  • Finally, in the fourth case we demonstrate that you can alternatively destructure the rest of the mapping with **fields so that you don't have to list out all individual fields

Set Membership

Similarly to the RegEx matching shown earlier, there's no built-in pattern for testing whether the subject is a member of an existing set of values. We can, however, once again take advantage of __eq__ and implement our own set-matching class:


from types import SimpleNamespace

class InSet(set):
    def __eq__(self, elem):
        return elem in self

Produce = SimpleNamespace(
    fruit=InSet({"apple", "banana", "peach"}),
    vegetable=InSet({"cucumber", "lettuce", "onion"})
)

food = "cucumber"

match food:
    case Produce.fruit:
        print(f"{food} is a fruit.")
    case Produce.vegetable:
        print(f"{food} is a vegetable.")

# cucumber is a vegetable.

The above example uses SimpleNamespace to wrap the individual sets into a single data container. You might try to "simplify" this and use case fruit and case vegetable directly. That, however, won't work, because a bare name triggers the capture pattern, which means that it would assign the value of food to fruit (and the second case would never be reached). To avoid this, a dotted name such as X.some_var must be used, because dotted names always trigger the value pattern.

Matching Builtin Types

It's pretty common to write conditionals in Python that test the type of a variable. You can use structural pattern matching for this, but there's a gotcha:


some_var = "not a float"

match some_var:
    case float:  # Wrong! - matches any subject, because Python sees float as a variable
        print(f"'{some_var}' is float")

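# Prints: 'not a float' is float

# The capture above rebound the name float at module level.
# Remove that binding so that float refers to the builtin again:
del float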

some_var = 3.14

match some_var:
    case float():  # Correct!
        print(f"{some_var} is float")

# Prints: 3.14 is float

As with the previous example about set membership, if you use something like case float or case int, you will trigger the capture pattern, which binds the subject to the name float and shadows the builtin for the rest of the scope. Instead, you have to use case float(), which triggers a class pattern and tests whether the subject of match ... is an instance of the specified type.

A bare class pattern like this is an isinstance() check, so it works with any class. In addition, 11 builtin types get special treatment, namely: bool, bytearray, bytes, dict, float, frozenset, int, list, set, str and tuple. They accept a single positional subpattern that matches the whole subject, e.g. case str(text): .... To match by behavior rather than by concrete type, you can use an Abstract Base Class, e.g. case collections.abc.Iterable(): ....

We now know how to test the type of a variable, but what if the subject is itself a raw type?


import builtins

type_ = int

# Matches raw types, not their instances
match type_:
    case builtins.str:
        print(f"{type_} is a String.")
    case builtins.int:
        print(f"{type_} is an Integer.")
    case _:
        print("Invalid type.")

# Prints: <class 'int'> is an Integer.

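In that case we have to use Python's builtins module, which gives us direct access to Python's built-in identifiers, including types such as builtins.str, builtins.dict or builtins.complex. Because builtins.str is a dotted name, it triggers a value pattern and the subject is compared against the type itself with ==.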

Matching Positional Arguments

By default, when using the class pattern to match a class, such as case MyClass(key="value"): ..., you're required to use keyword arguments. That can, however, be a little verbose if your class has many attributes. To solve this we can use __match_args__:


class Location:
    __match_args__ = ('country', 'city')

    def __init__(self, country, city):
        self.country = country
        self.city = city

def test_positional_args(location):
    match location:
        case Location("Germany", "Berlin"):
            print("Hallo Berlin!")
        case Location(_, "London"):
            print("There's London in multiple countries...")
        case Location("Canada", _):
            print("Hello Canada!")

test_positional_args(Location("Canada", "Toronto"))
# Prints: Hello Canada!
# Without __match_args__: TypeError: Location() accepts 0 positional sub-patterns (2 given)

The __match_args__ class attribute allows us to specify a tuple of instance attribute names, in the order in which they will be used as positional arguments. Not all instance attributes have to be listed in __match_args__: consider putting only the required ones there while leaving out the optional ones.

Additionally, if you use a dataclass, you get this feature out of the box, and the order of field definitions determines the order of positional arguments:


from dataclasses import dataclass

@dataclass
class Location:
    country: str
    city: str

# __match_args__ present without explicitly defining it:
print(Location.__match_args__)
# ('country', 'city')

Soft Keywords

If you have an old code base that happens to use case or match as variable names, you might think that you'd have to refactor your code in order to upgrade to Python 3.10, which introduces the case and match keywords. That's however not the case, because both of these are "soft keywords", which means that they're treated as reserved words only in contexts where that makes sense.

Thanks to that, even the following (questionable/wild) code will work:


import re

support_ticket = "Support case no.: 152 is closed."
match = re.match(r"Support case no\.: (\d+) is (open|closed)\.", support_ticket)

match = match.groups() if match else None

match match:
    case case, "closed":
        print(f"Case {case} is done.")
    case case, "open":
        print(f"Case {case} is still in progress.")
    case _:
        print("Case has unknown status")

Branch Reachability

While very powerful, structural pattern matching has limitations and quirks that you should be aware of.

One such limitation is branch reachability:


rows = [
    {"success": True, "value": 100},
    {"success": False, "value": 200},
    {"success": True, "value": 200},  # Should be matched by 3rd case
    {"success": False, "value": 200},
]

for row in rows:
    match row:
        case {"success": True, "value": _}:
            print("First")
        case {"success": _, "value": 200}:
            print("Second")
        # Unreachable: the first case already matches these rows. Moving it to the top fixes this.
        case {"success": True, "value": 200}:
            print("Third")
        case {"success": _, "value": _}:
            print("None matches")

# Prints:
# First
# Second
# First
# Second

In the example above, the third record in the rows variable should clearly be matched by the third case, but it isn't. Instead, it falls into the first one, because cases are tried from top to bottom and the first matching one wins. To fix this we need to move the third case to the top and it will work as expected.

This just shows that the order of cases matters and that you should be careful when writing these kinds of match/case statements, because a misplaced case can create hard-to-debug issues. As a rule of thumb, put the most specific patterns first.

Hopefully, future versions of Python will include some level of code analysis that might catch at least some of these issues.

Exhaustiveness

Another hard-to-debug issue you might encounter stems from a missing branch, that is, a match that doesn't cover all the possible cases. This can be mitigated by adding a safety assertion like so:


from enum import Enum
from typing import NoReturn

class Color(Enum):
    RED = "Red"
    GREEN = "Green"
    BLUE = "Blue"

def exhaustiveness_check(value: NoReturn) -> NoReturn:
    # Raise explicitly: a plain assert statement is stripped when Python runs with -O
    raise AssertionError(f"This code should never be reached, got: {value}")

def some_func(color: Color) -> str:
    match color:
        case Color.RED:
            return "Color is red."
        case Color.GREEN:
            return "Color is green."
    exhaustiveness_check(color)

some_func(Color.RED)

This will make the code throw an AssertionError at runtime as soon as an unhandled value such as Color.BLUE is passed in, but that might be too little too late. A better solution is to use a static type checker like mypy:


# python -m pip install -U mypy

def some_func(color: Color) -> str:
    match color:
        case Color.RED:
            return "Color is red."
        case Color.GREEN:
            return "Color is green."
    exhaustiveness_check(color)

some_func(Color.RED)
# `mypy examples.py` leads to:
# ...error: Argument 1 to "exhaustiveness_check" has incompatible type "Literal[Color.BLUE]"; expected "NoReturn"

After checking the above code with mypy examples.py we get an error telling us exactly which case is missing. See also the mypy docs for details on exhaustiveness checking.

A similar feature is also available in Pyright, which is Microsoft's static type checker for Python.

Conclusion

Structural pattern matching in Python is very close in syntax to the switch/case known from other languages, but it's much more than that: it's both a control flow and a destructuring tool.

To take full advantage of all its features, not just the advanced ones presented here, make sure to take a look at PEP 636 which provides an in-depth tutorial for the majority of its use cases.
