It's Time to Say Goodbye to These Obsolete Python Libraries

With every Python release, new modules are added and new, better ways of doing things are introduced. We all get used to the good old Python libraries and to certain ways of doing things, but it's time to upgrade and make use of the new and improved modules and their features.

Pathlib

pathlib is definitely one of the bigger recent additions to Python's standard library. It's been part of the standard library since Python 3.4, yet a lot of people still use the os module for filesystem operations.

pathlib, however, has many advantages over the old os.path: while the os module represents paths as raw strings, pathlib uses an object-oriented style, which makes it more readable and natural to write:


from pathlib import Path
import os.path

# Old, Unreadable
two_dirs_up = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# New, Readable
two_dirs_up = Path(__file__).resolve().parent.parent

The fact that paths are treated as objects rather than strings also makes it possible to create the object once and then look up its attributes or perform operations on it:


readme = Path("README.md").resolve()

print(f"Absolute path: {readme.absolute()}")
# Absolute path: /home/martin/some/path/README.md
print(f"File name: {readme.name}")
# File name: README.md
print(f"Path root: {readme.root}")
# Path root: /
print(f"Parent directory: {readme.parent}")
# Parent directory: /home/martin/some/path
print(f"File extension: {readme.suffix}")
# File extension: .md
print(f"Is it absolute: {readme.is_absolute()}")
# Is it absolute: True

The feature I love most about pathlib, though, is the ability to use the / ("division") operator to join paths:


# Operators:
etc = Path('/etc')

joined = etc / "cron.d" / "anacron"
print(f"Exists? - {joined.exists()}")
# Exists? - True

This makes handling paths so easy, and it really is a chef’s kiss 👌.

That said, it's important to note that pathlib is only a replacement for os.path, not for the whole os module. It does, however, also include functionality from the glob module, so if you're used to combining os.path with glob.glob, you can simply forget that those two exist.
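As a rough illustration of that glob replacement, the sketch below builds a throwaway directory (the file names are made up for the example) and finds files with Path.glob() and Path.rglob() where you might previously have used glob.glob() together with os.path.join():

```python
import tempfile
from pathlib import Path

# Build a small, throwaway directory tree; the file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "pkg").mkdir(parents=True)
    for name in ("main.py", "src/util.py", "src/pkg/core.py", "notes.txt"):
        (root / name).touch()

    # Old: glob.glob(os.path.join(tmp, "*.py")) returns a list of strings
    # New: Path.glob() yields Path objects, matching only the top level here
    top_level = sorted(p.name for p in root.glob("*.py"))

    # Old: glob.glob(os.path.join(tmp, "**", "*.py"), recursive=True)
    # New: Path.rglob() searches the directory and all its subdirectories
    everywhere = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))

    print(top_level)   # ['main.py']
    print(everywhere)  # ['main.py', 'src/pkg/core.py', 'src/util.py']
```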

In the above snippets we presented some handy path manipulations and object attributes, but pathlib also covers many of the functions you're used to from os and os.path, such as:


print(f"Working directory: {Path.cwd()}")  # same as os.getcwd()
# Working directory: /home/martin/some/path
(Path.cwd() / "new_dir").mkdir(parents=True, exist_ok=True)  # same as os.makedirs(..., exist_ok=True)
print(Path("README.md").resolve())  # like os.path.abspath(), but also resolves symlinks (os.path.realpath())
# /home/martin/some/path/README.md
print(Path.home())  # same as os.path.expanduser("~")
# /home/martin

For a full mapping of os and os.path functions to their pathlib equivalents, see the docs.
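To make that mapping a bit more concrete, here's a small side-by-side check of a few common os.path calls and their pathlib counterparts. The path is an arbitrary example, and the sketch uses posixpath (the POSIX flavour of os.path) and PurePosixPath so the comparison doesn't depend on the operating system or on the file actually existing:

```python
import posixpath
from pathlib import PurePosixPath

# Arbitrary example path; nothing here touches the filesystem.
raw = "/home/martin/some/path/archive.tar.gz"
p = PurePosixPath(raw)

# os.path-style call (string in, string out) vs. the pathlib attribute/method
pairs = {
    "basename": (posixpath.basename(raw), p.name),
    "dirname": (posixpath.dirname(raw), str(p.parent)),
    "splitext": (posixpath.splitext(raw)[1], p.suffix),
    "join": (posixpath.join(posixpath.dirname(raw), "README.md"), str(p.parent / "README.md")),
    "isabs": (posixpath.isabs(raw), p.is_absolute()),
}

for func, (old, new) in pairs.items():
    print(f"{func}: {old!r} vs {new!r}")
```

Note that p.suffix only returns the last extension (".gz"); pathlib also offers p.suffixes and p.stem when you need the full picture.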

For more examples of how great pathlib is, check out this nice write-up by Trey Hunner.

Secrets

Speaking of the os module, another part of it that you should stop using is os.urandom. Instead, you should use the secrets module, available since Python 3.6:


# Old:
import os

length = 64

value = os.urandom(length)
print(f"Bytes: {value}")
# Bytes: b'\xfa\xf3...\xf2\x1b\xf5\xb6'
print(f"Hex: {value.hex()}")
# Hex: faf3cc656370e31a938e7...33d9b023c3c24f1bf5

# New:
import secrets

value = secrets.token_bytes(length)
print(f"Bytes: {value}")
# Bytes: b'U\xe9n\x87...\x85>\x04j:\xb0'
value = secrets.token_hex(length)
print(f"Hex: {value}")
# Hex: fb5dd85e7d73f7a08b8e3...4fd9f95beb08d77391

Using os.urandom isn't actually the problem here, though. The secrets module was introduced because people were using the random module to generate passwords and similar values, even though random doesn't produce cryptographically secure tokens.

According to the docs, the random module should not be used for security purposes. You should use either secrets or os.urandom, but the secrets module is definitely preferable, considering that it's newer and includes convenience functions for hexadecimal tokens as well as URL-safe tokens.
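A minimal sketch of those convenience functions: token_urlsafe() for things like password-reset links, secrets.choice() for building a random password, and compare_digest() for comparing secrets in constant time. The alphabet, the length and the token_matches helper are arbitrary choices for illustration, not a recommended password policy:

```python
import secrets
import string

# URL-safe, Base64-encoded text token built from 32 random bytes
reset_token = secrets.token_urlsafe(32)

# Random password: every character is drawn from a CSPRNG (unlike random.choice)
alphabet = string.ascii_letters + string.digits  # arbitrary example alphabet
password = "".join(secrets.choice(alphabet) for _ in range(16))


# Hypothetical helper: compare a user-supplied token with the stored one in
# constant time, so the comparison doesn't leak information through timing
def token_matches(supplied: str, stored: str) -> bool:
    return secrets.compare_digest(supplied, stored)


print(reset_token)  # random on every run
print(password)     # random on every run
print(token_matches(reset_token, reset_token))  # True
```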

Zoneinfo

Until Python 3.9, there was no built-in library for working with IANA time zones, so everyone was using pytz. Now that we have zoneinfo in the standard library, it's time to switch!


from datetime import datetime
# Old:
import pytz  # pip install pytz

dt = datetime(2022, 6, 4)
nyc = pytz.timezone("America/New_York")

localized = nyc.localize(dt)
print(f"Datetime: {localized}, Timezone: {localized.tzname()}, TZ Info: {localized.tzinfo}")

# New:
from zoneinfo import ZoneInfo

nyc = ZoneInfo("America/New_York")
localized = datetime(2022, 6, 4, tzinfo=nyc)
print(f"Datetime: {localized}, Timezone: {localized.tzname()}, TZ Info: {localized.tzinfo}")
# Datetime: 2022-06-04 00:00:00-04:00, Timezone: EDT, TZ Info: America/New_York

The datetime module delegates all time zone handling to the abstract base class datetime.tzinfo. This abstract base class needs a concrete implementation, and before zoneinfo was introduced, that would most likely come from pytz. Now that zoneinfo is in the standard library, we can use it instead.
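To illustrate that ZoneInfo is simply a concrete datetime.tzinfo implementation, the sketch below attaches it to a datetime and converts between zones with astimezone(). The zones and dates are arbitrary examples, and the code assumes time zone data is available (from the system or the tzdata package):

```python
from datetime import datetime, tzinfo
from zoneinfo import ZoneInfo

nyc = ZoneInfo("America/New_York")
tokyo = ZoneInfo("Asia/Tokyo")

# ZoneInfo is a concrete subclass of the abstract datetime.tzinfo
print(isinstance(nyc, tzinfo))  # True

meeting = datetime(2022, 6, 4, 9, 30, tzinfo=nyc)
# astimezone() converts to another zone; the instant in time stays the same
in_tokyo = meeting.astimezone(tokyo)
print(in_tokyo)             # 2022-06-04 22:30:00+09:00
print(in_tokyo == meeting)  # True: aware datetimes compare by instant

# Daylight saving time comes from the zone rules: EDT in June, EST in December
print(meeting.tzname())                                   # EDT
print(datetime(2022, 12, 4, 9, 30, tzinfo=nyc).tzname())  # EST
```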

Using zoneinfo has one caveat, however: it assumes that time zone data is available on the system, which is typically the case on UNIX systems. If your system doesn't have time zone data, you should install the tzdata package, a first-party package maintained by the CPython core developers that contains the IANA time zone database.

Dataclasses

An important addition in Python 3.7 was the dataclasses module, which is a replacement for namedtuple.

You might be wondering why you would need to replace namedtuple. Here are some reasons why you should consider switching to dataclasses:

  • Can be mutable,
  • By default provides the __init__, __repr__ and __eq__ magic methods (and __hash__ when frozen=True),
  • Allows you to specify default values,
  • Supports inheritance.
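Here's a quick sketch of those points in action: mutability, default values, inheritance and the generated __eq__. The Person and Employee classes are hypothetical, and the list[str] annotation assumes Python 3.9 or newer:

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    surname: str
    # Mutable defaults must go through default_factory so instances don't share one list
    tags: list[str] = field(default_factory=list)


@dataclass
class Employee(Person):  # inheritance: adds a field on top of Person's fields
    team: str = "unassigned"  # plain default value


e = Employee("John", "Doe")
e.team = "platform"  # instances are mutable by default
print(e)
# Employee(name='John', surname='Doe', tags=[], team='platform')
print(e == Employee("John", "Doe", [], "platform"))  # True, thanks to the generated __eq__
```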

Additionally, dataclasses support the frozen and slots (from 3.10) parameters, as in @dataclass(frozen=True, slots=True), to give feature parity with named tuples.

And switching really shouldn't be too difficult, as you only need to change the definitions:


# Old:
# from collections import namedtuple
from typing import NamedTuple
import sys

User = NamedTuple("User", [("name", str), ("surname", str), ("password", bytes)])

u = User("John", "Doe", b'tfeL+uD...\xd2')
print(f"Size: {sys.getsizeof(u)}")
# Size: 64

# New:
from dataclasses import dataclass

@dataclass()
class User:
    name: str
    surname: str
    password: bytes

u = User("John", "Doe", b'tfeL+uD...\xd2')

print(u)
# User(name='John', surname='Doe', password=b'tfeL+uD...\xd2')

print(f"Size: {sys.getsizeof(u)}, {sys.getsizeof(u) + sys.getsizeof(vars(u))}")
# Size: 48, 152

In the above code we also included a size comparison, as that's one of the bigger differences between namedtuple and dataclasses. As you can see, named tuples are significantly smaller, which is because dataclasses (without slots=True) store attributes in a per-instance dict.

As for speed, attribute access times should be mostly the same, or at least not different enough to matter unless you plan to create millions of instances. Note that the dataclass below uses slots=True, which requires Python 3.10 or newer:


import timeit

setup = '''
from typing import NamedTuple
User = NamedTuple("User", [("name", str), ("surname", str), ("password", bytes)])
u = User("John", "Doe", b'')
'''

print(f"Access speed: {min(timeit.repeat('u.name', setup=setup, number=10000000))}")
# Access speed: 0.16838401100540068

setup = '''
from dataclasses import dataclass

@dataclass(slots=True)
class User:
    name: str
    surname: str
    password: bytes

u = User("John", "Doe", b'')
'''

print(f"Access speed: {min(timeit.repeat('u.name', setup=setup, number=10000000))}")
# Access speed: 0.17728697300481144

If the above persuaded you to switch to dataclasses, but you're stuck on Python 3.6, you can grab a backport from https://pypi.org/project/dataclasses/.

Conversely, if you don't want to switch and really want to use named tuples for some reason, then you should at the very least use NamedTuple from the typing module instead of namedtuple from collections:


# Bad:
from collections import namedtuple
Point = namedtuple("Point", ["x", "y"])

# Better:
from typing import NamedTuple
class Point(NamedTuple):
    x: float
    y: float
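The class-based NamedTuple syntax also lets you add type hints, default values and methods while keeping the immutable tuple behaviour. A short sketch, where the z default and the distance_to_origin method are hypothetical additions:

```python
import math
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float
    z: float = 0.0  # default value, similar to a dataclass field default

    def distance_to_origin(self) -> float:
        return math.sqrt(self.x**2 + self.y**2 + self.z**2)


p = Point(3.0, 4.0)
print(p)                       # Point(x=3.0, y=4.0, z=0.0)
print(p.distance_to_origin())  # 5.0

x, y, z = p  # still a regular tuple, so unpacking works
# _replace() returns a new instance, since tuples are immutable
print(p._replace(z=12.0).distance_to_origin())  # 13.0
```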

Finally, if you use neither namedtuple nor dataclasses, you might want to consider going straight to Pydantic.

Proper Logging

This isn't a recent addition to the standard library, but it bears repeating: you should use proper logging instead of print statements. It's fine to use print if you're debugging an issue locally, but for any production-ready program that will run without user intervention, proper logging is a must.

Especially considering that setting up Python logging is as easy as:


import logging
logging.basicConfig(
    filename='application.log',
    level=logging.WARNING,
    format='[%(asctime)s] {%(pathname)s:%(lineno)d} %(levelname)s - %(message)s',
    datefmt='%H:%M:%S'
)

logging.error("Some serious error occurred.")
# [12:52:35] {<stdin>:1} ERROR - Some serious error occurred.
logging.warning('Some warning.')
# [12:52:35] {<stdin>:1} WARNING - Some warning.

Even the simple configuration above gives you a superior debugging experience compared to print statements. On top of that, you can further customize the logging library to log to different places, change log levels, automatically rotate logs, etc. For examples of how to set all of that up, see my previous article Ultimate Guide to Python Debugging.
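As a taste of that customization, the sketch below logs to a size-based rotating file via logging.handlers.RotatingFileHandler while also echoing errors to the console. The logger name, file location and size limits are arbitrary example values:

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path

log_dir = Path(tempfile.mkdtemp())  # throwaway directory for the example
log_file = log_dir / "application.log"

logger = logging.getLogger("my_app")  # hypothetical application logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # don't also hand records to the root logger's handlers

# Rotate after ~1 MB, keeping application.log.1 ... application.log.3 as backups
file_handler = RotatingFileHandler(log_file, maxBytes=1_000_000, backupCount=3)
file_handler.setLevel(logging.INFO)
file_handler.setFormatter(logging.Formatter(
    "[%(asctime)s] {%(pathname)s:%(lineno)d} %(levelname)s - %(message)s",
    datefmt="%H:%M:%S",
))

# Only errors and above are also written to the console (stderr)
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.ERROR)

logger.addHandler(file_handler)
logger.addHandler(console_handler)

logger.debug("Not recorded anywhere: below both handler levels.")
logger.info("Recorded in the file only.")
logger.error("Recorded in the file and printed to the console.")

file_handler.flush()
print(log_file.read_text())
```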

f-strings

Python includes quite a few ways to format strings, including C-style (%) formatting, f-strings, template strings, and the .format() method. One of them, f-strings (formatted string literals), is simply superior, though. They're more natural to write, more readable, and the fastest of the options mentioned above.

Therefore, I think there's no point arguing about or explaining why you should use them. There are, however, a couple of cases where f-strings can't or shouldn't be used:

The only reason to ever use % formatting is logging:


import logging

things = "something happened..."

logger = logging.getLogger(__name__)
logger.error("Message: %s", things)  # Evaluated inside logger method
logger.error(f"Message: {things}")  # Evaluated immediately

In the example above, if you use an f-string, the expression is evaluated immediately, while with C-style formatting, substitution is deferred until it's actually needed (and skipped entirely if the message gets filtered out by its log level). This is also important for grouping messages, where all messages with the same template can be recorded as one. That doesn't work with f-strings, because the template is populated with data before it's passed to the logger.
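To see the deferred evaluation in action, the sketch below uses a hypothetical Expensive class that counts how often it's converted to a string. With the logger's level set to WARNING, the %-style debug call never formats its argument, while the f-string version does the work up front and then throws it away:

```python
import logging


class Expensive:
    """Hypothetical object whose string conversion we'd like to avoid."""

    calls = 0

    def __str__(self) -> str:
        Expensive.calls += 1
        return "expensive representation"


logger = logging.getLogger("lazy_demo")  # example logger name
logger.setLevel(logging.WARNING)  # DEBUG messages are filtered out

obj = Expensive()

logger.debug("Value: %s", obj)  # filtered before any formatting happens
print(Expensive.calls)  # 0

logger.debug(f"Value: {obj}")  # the f-string is built before debug() is even called
print(Expensive.calls)  # 1
```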

Also, there are things that f-strings simply cannot do, such as populating a template at runtime, that is, dynamic formatting. That's why f-strings are called formatted string literals: the template has to be written out in the source code.


# Dynamically set both the template and its parameters
def func(tpl: str, param1: str, param2: str) -> str:
    return tpl.format(param1=param1, param2=param2)

some_template = "First template: {param1}, {param2}"
another_template = "Other template: {param1} and {param2}"

print(func(some_template, "Hello", "World"))
# First template: Hello, World
print(func(another_template, "Hello", "Python"))
# Other template: Hello and Python

# Dynamically reuse same template with different parameters.
inputs = ["Hello", "World", "!"]
template = "Here's some dynamic value: {value}"

for value in inputs:
    print(template.format(value=value))

The bottom line, though, is to use f-strings wherever possible because they're more readable and more performant, while being aware that there are cases where other formatting styles are still preferred or necessary.
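Template strings (string.Template) are another option for runtime templates, particularly when the template comes from users or config files, since they only support simple $name substitution rather than arbitrary attribute or index lookups. A small sketch with a made-up template:

```python
from string import Template

# Hypothetical template, e.g. loaded from a config file at runtime
tpl = Template("Hello $name, you have $count new messages")

print(tpl.substitute(name="Martin", count=3))
# Hello Martin, you have 3 new messages

# safe_substitute() leaves unknown placeholders in place instead of raising
print(tpl.safe_substitute(name="Martin"))
# Hello Martin, you have $count new messages

# substitute() raises KeyError when a placeholder has no value
try:
    tpl.substitute(name="Martin")
except KeyError as missing:
    print(f"Missing value for: {missing}")
```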

Tomllib

TOML is a widely used configuration format and is especially important to Python's tooling and ecosystem because of its use in pyproject.toml configuration files. Before Python 3.11, you had to use external libraries to manage TOML files, but starting with Python 3.11, there's a built-in library named tomllib, which is based on the tomli package.

So, as soon as you switch to Python 3.11, you should get into the habit of using import tomllib instead of import tomli. It's one less dependency to worry about!


# import tomli as tomllib
import tomllib

with open("pyproject.toml", "rb") as f:
    config = tomllib.load(f)
    print(config)
    # {'project': {'authors': [{'email': 'contact@martinheinz.dev',
    #                           'name': 'Martin Heinz'}],
    #              'dependencies': ['flask', 'requests'],
    #              'description': 'Example Package',
    #              'name': 'some-app',
    #              'version': '0.1.0'}}

toml_string = """
[project]
name = "another-app"
description = "Example Package"
version = "0.1.1"
"""

config = tomllib.loads(toml_string)
print(config)
# {'project': {'name': 'another-app', 'description': 'Example Package', 'version': '0.1.1'}}
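If your project still has to support Python versions older than 3.11, a common pattern, sketched below under the assumption that tomli is installed on those older versions, is to import tomllib conditionally and fall back to tomli, which it's based on. Keep in mind that tomllib can only read TOML; writing requires a third-party library such as tomli-w:

```python
import sys

if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib  # pip install tomli (same load/loads API)

# Hypothetical config; loads() parses a str, load() a file opened in binary mode
config = tomllib.loads("""
[tool.example]
retries = 3
hosts = ["alpha", "beta"]
""")

print(config["tool"]["example"]["retries"])  # 3
print(config["tool"]["example"]["hosts"])    # ['alpha', 'beta']

# Invalid TOML raises TOMLDecodeError
try:
    tomllib.loads("retries = ")
except tomllib.TOMLDecodeError as err:
    print(f"Invalid TOML: {err}")
```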

Setuptools

The last one is more of a deprecation notice:

> As Distutils is deprecated, any usage of functions or objects from distutils is similarly discouraged, and Setuptools aims to replace or deprecate all such uses.

It's time to say goodbye to the distutils package and switch to setuptools. The setuptools docs provide guidance on how to replace usages of distutils. Apart from that, PEP 632 provides migration advice for the parts of distutils not covered by setuptools. Keep in mind that distutils was deprecated in Python 3.10 and removed entirely in Python 3.12.
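For a few commonly used distutils helpers, the standard library already has alternatives, in line with the migration advice in PEP 632: shutil.which() instead of distutils.spawn.find_executable(), subprocess.run() instead of distutils.spawn.spawn(), and the standalone sysconfig module instead of distutils.sysconfig. A small sketch, where "python3" is just an example executable name:

```python
import shutil
import subprocess
import sys
import sysconfig

# Old: distutils.spawn.find_executable("python3")
# New: shutil.which() returns the executable's full path, or None if it's not on PATH
python_path = shutil.which("python3")
print(python_path)

# Old: distutils.spawn.spawn([...])
# New: subprocess.run(); check=True raises CalledProcessError on a non-zero exit code
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # hello from a child process

# Old: from distutils import sysconfig
# New: the standalone sysconfig module
print(sysconfig.get_platform())
print(sysconfig.get_paths()["purelib"])  # where pure-Python packages get installed
```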

Conclusion

Every new Python release brings new features, so I'd recommend checking the "New Modules", "Deprecated Modules" and "Removed Modules" sections in the Python release notes, which is a great way to stay up to date with major changes to Python's standard library. This way you can continuously incorporate new features and best practices into your projects.

Now you might think that making all these changes and upgrades would require a lot of effort. In reality, you might be able to run pyupgrade on your project and upgrade the syntax to the latest Python version automatically where possible.
