Python is a popular choice for automating anything and everything, including system administration tasks and tasks that require running other programs or interacting with the operating system. There are, however, many ways to achieve this in Python, and most of them are arguably bad.
So, in this article we will look at all the options you have in Python for running other processes - the bad, the good, and most importantly, the right way to do it.
The Options
Python has way too many builtin options for interfacing with other programs - some better, some worse - and honestly, I don't like any of them. Let's quickly glance over each option and see when (if ever) it makes sense to use a particular module.
Native Tools
The general rule of thumb is to use native functions instead of directly calling other programs or OS commands. So, first let's look at the native Python options (a short sketch follows the list):
- pathlib - If you need to create or delete a file/directory, check whether a file exists, change permissions, etc., there's absolutely no reason to run system commands - just use pathlib, it has everything you need. Once you start using pathlib, you will also realise that you can forget about other Python modules, such as glob or os.path.
- tempfile - Similarly, if you need a temporary file, just use the tempfile module; don't mess with /tmp manually.
- shutil - pathlib should satisfy most of your file-related needs in Python, but if you need, for example, to copy, move, chown, which, or create an archive, then you should turn to shutil.
- signal - in case you need to use signal handlers.
- syslog - for an interface to the Unix syslog.
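To make that concrete, here's a minimal sketch (the file paths below are made up for illustration) of common shell one-liners replaced with these modules:
from pathlib import Path
import shutil
import tempfile

path = Path('/tmp/example/config.txt')
path.parent.mkdir(parents=True, exist_ok=True)      # mkdir -p /tmp/example
path.write_text('key=value\n')                      # echo 'key=value' > /tmp/example/config.txt
print(path.exists())                                # test -f /tmp/example/config.txt
path.chmod(0o600)                                   # chmod 600 /tmp/example/config.txt

with tempfile.NamedTemporaryFile(mode='w') as tmp:  # no need to touch /tmp manually
    tmp.write('scratch data')

shutil.copy(path, '/tmp/example/config.bak')        # cp src dst
print(shutil.which('python3'))                      # which python3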
If none of the above builtin options satisfies your needs, only then does it make sense to start interacting with the OS or other programs directly...
OS Module
Starting with the worst option - the os module - it provides low-level functions for interacting with the OS, many of which have been superseded by functions in other modules.
If you simply wanted to call some other program, you could use the os.system function, but you shouldn't. I don't even want to give you an example, because you simply should not use it.
While os should not be your first choice, there are a couple of functions that you might find useful:
import os
print(os.getenv('PATH'))
# /home/martin/.local/bin:/usr/local/sbin:/usr/local/bin:...
print(os.uname())
# posix.uname_result(sysname='Linux', nodename='...', release='...', version='...', machine='x86_64')
print(os.times())
# posix.times_result(user=0.01, system=0.0, children_user=0.0, children_system=0.0, elapsed=1740.63)
print(os.cpu_count())
# 16
print(os.getloadavg())
# (2.021484375, 2.35595703125, 2.04052734375)
old_umask = os.umask(0o022)
# Do stuff with files...
os.umask(old_umask) # restore old umask
# Only if you need cryptographically stronger randomness than the pseudo-random 'random' module provides:
from base64 import b64encode
random_bytes = os.urandom(64)
print(b64encode(random_bytes).decode('utf-8'))
# C2F3kHjdzxcP7461ETRj/YZredUf+NH...hxz9MXXHJNfo5nXVH7e5olqLwhahqFCe/mzLQ==
Apart from the functions shown above, there are also functions for creating fds (file descriptors), pipes, opening PTYs, chroot, chmod, mkdir, kill, stat, but I'd like to discourage you from using them, as there are better options. There's even a section in the docs that shows how to replace the os module with the subprocess module, so don't even think about using os.popen, os.spawn or os.system.
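Just to illustrate what such a replacement looks like, here's a minimal sketch of an os.popen call next to its subprocess.run equivalent:
import os
import subprocess

# Old style with os.popen:
output = os.popen('ls -la /tmp').read()

# Equivalent with subprocess.run - no shell involved:
output = subprocess.run(['ls', '-la', '/tmp'], capture_output=True, encoding='utf-8').stdout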
The same goes for using the os module for file/path operations - please don't. There's a whole section in the docs on how to use pathlib instead of os.path and other path-related functions.
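As a quick illustration, here's the same path juggling with os.path and with pathlib (the paths are made up):
import os.path
from pathlib import Path

# os.path style - loose functions operating on strings:
print(os.path.join('/tmp', 'data', 'file.txt'))
print(os.path.exists('/tmp/data/file.txt'))

# pathlib style - one object with everything on it:
path = Path('/tmp') / 'data' / 'file.txt'
print(path)
print(path.exists())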
Most of the remaining functions in the os module are a direct interface to the OS (or C language) API, e.g. os.dup, os.splice, os.mkfifo, os.execv, os.fork, etc. If you need to use those, then I'm not sure whether Python is the right language for the task...
Subprocess Module
A second - slightly better - option that we have in Python is the subprocess module:
import subprocess
p = subprocess.run('ls -la', shell=True, check=True, capture_output=True, encoding='utf-8')
# 'p' is an instance of 'CompletedProcess(args='ls -la', returncode=0)'
print(f'Command {p.args} exited with {p.returncode} code, output: \n{p.stdout}')
# Command ls -la exited with 0 code, output:
# total 36
# drwxrwxr-x 2 martin martin 4096 apr 22 12:53 .
# drwxrwxr-x 42 martin martin 20480 apr 22 11:01 ..
# ...
As stated in the docs:
The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle.
In most cases it should be enough for you to use subprocess.run, passing in kwargs to alter its behavior, e.g. shell=True allows you to pass the command as a single string, check=True causes it to throw an exception if the exit code is not 0, and capture_output=True populates the stdout (and stderr) attributes.
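Note that shell=True is only needed when the command is a single string; a minimal sketch of the same call with a list of arguments, which avoids spawning a shell (and any shell-injection issues):
p = subprocess.run(['ls', '-la'], check=True, capture_output=True, encoding='utf-8')
print(p.stdout)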
While subprocess.run() is the recommended way to invoke processes, there are other (older, mostly superseded) options in this module: call, check_call, check_output, getstatusoutput, getoutput. Generally, you should use only run and Popen:
with subprocess.Popen(['ls', '-la'], stdout=subprocess.PIPE, encoding='utf-8') as process:
    # process.wait(timeout=5)  # Returns only code: 0
    outs, errs = process.communicate(timeout=5)
    print(f'Command {process.args} exited with {process.returncode} code, output: \n{outs}')

# Pipes
import shlex
ls = shlex.split('ls -la')
awk = shlex.split("awk '{print $9}'")
ls_process = subprocess.Popen(ls, stdout=subprocess.PIPE)
awk_process = subprocess.Popen(awk, stdin=ls_process.stdout, stdout=subprocess.PIPE, encoding='utf-8')
for line in awk_process.stdout:
    print(line.strip())
# .
# ..
# examples.py
# ...
The first example above shows the Popen equivalent of the previously shown subprocess.run. However, you should only use Popen when you need more flexibility than run provides - e.g. in the second example, you can see how to pipe the output of one command into another, effectively running ls -la | awk '{print $9}'. You can also see that we used shlex.split, which is a convenience function that splits a string into an array of tokens that can be passed into Popen or run without using shell=True.
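To show what that tokenization actually produces:
import shlex
print(shlex.split("awk '{print $9}'"))
# ['awk', '{print $9}']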
When using Popen, you can additionally use terminate(), kill() and send_signal() for further interaction with the process.
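A small sketch of those - using the 'sleep' command purely as a stand-in for a long-running process:
import signal
import subprocess
import time

process = subprocess.Popen(['sleep', '30'])
time.sleep(0.1)                     # give the process a moment to start
process.send_signal(signal.SIGINT)  # same as pressing Ctrl+C
process.wait()
print(process.returncode)           # negative value means "killed by that signal", here -2

process = subprocess.Popen(['sleep', '30'])
process.terminate()                 # sends SIGTERM
process.wait()

process = subprocess.Popen(['sleep', '30'])
process.kill()                      # sends SIGKILL
process.wait()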
In the previous examples we didn't really do any error handling, but there's a lot that can go wrong when running other processes. For simple-ish scripting, check=True is probably enough, as it will cause CalledProcessError to be raised as soon as the subprocess runs into a non-zero return code, so your program will fail fast and loud, which is good. If you also set the timeout argument, then you can also get a TimeoutExpired exception. Generally, all of the exceptions in the subprocess module inherit from SubprocessError, so if you want to catch exceptions broadly, you can simply watch for SubprocessError.
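A minimal sketch of that defensive pattern (the command here is just an example that fails):
try:
    subprocess.run(['ls', '/nonexistent'], check=True, timeout=5,
                   capture_output=True, encoding='utf-8')
except subprocess.CalledProcessError as e:
    print(f'Command {e.cmd} failed with code {e.returncode}: {e.stderr}')
except subprocess.TimeoutExpired as e:
    print(f'Command {e.cmd} timed out after {e.timeout} seconds')
except subprocess.SubprocessError:
    print('Some other subprocess failure')  # catch-all for the module's exceptions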
The Right Way
The Zen of Python states that:
There should be one-- and preferably only one --obvious way to do it.
But so far we've seen quite a few ways, all in Python's builtin modules - so which one is the right one? In my opinion... none of them...
While I love Python's standard library, I believe one of its missing "batteries" is a better subprocess module.
If you find yourself orchestrating lots of other processes in Python, then you should at least take a look at the sh library:
# https://pypi.org/project/sh/
# pip install sh
import sh
# Run any command in $PATH...
print(sh.ls('-la'))
ls_cmd = sh.Command('ls')
print(ls_cmd('-la')) # Explicit
# total 36
# drwxrwxr-x 2 martin martin 4096 apr 8 14:18 .
# drwxrwxr-x 41 martin martin 20480 apr 7 15:23 ..
# -rw-rw-r-- 1 martin martin 30 apr 8 14:18 examples.py
# If command is not in PATH:
custom_cmd = sh.Command('/path/to/my/cmd')
custom_cmd('some', 'args')
with sh.contrib.sudo:
    # Do stuff using 'sudo'...
    ...
When we invoke sh.some_command, the sh library tries to find a shell builtin or a binary with that name in your $PATH. If it finds such a command, it simply executes it for you. If the command is not in $PATH, then you can create an instance of Command and call it that way. In case you need to use sudo, you can use the sudo context manager from the contrib module. Simple and straightforward, right?
To write the output of a command to a file, you only need to pass the _out argument to the function:
sh.ip.address(_out='/tmp/ipaddr')
# Same as 'ip address > /tmp/ipaddr'
The above also shows how to invoke subcommands - just use dots.
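Based on my reading of the sh docs, _out also accepts an open file object or a callable that gets invoked for each line of output - a short sketch:
# _out with an open file object:
with open('/tmp/ipaddr', 'w') as f:
    sh.ip.address(_out=f)

# _out with a callback that receives each line of output:
def collect(line):
    print('got:', line.strip())

sh.ip.address(_out=collect)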
And finally, you can also use pipes (|) by using the _in argument:
print(sh.awk('{print $9}', _in=sh.ls('-la')))
# Same as "ls -la | awk '{print $9}'"
print(sh.wc('-l', _in=sh.ls('.', '-1')))
# Same as "ls -1 | wc -l"
As for error handling, you can simply watch for ErrorReturnCode or TimeoutException exceptions:
try:
    sh.cat('/tmp/doesnt/exist')
except sh.ErrorReturnCode as e:
    print(f'Command {e.full_cmd} exited with {e.exit_code}')
# Command /usr/bin/cat /tmp/doesnt/exist exited with 1

curl = sh.curl('https://httpbin.org/delay/5', _bg=True)
try:
    curl.wait(timeout=3)
except sh.TimeoutException:
    print("Command timed out...")
    curl.kill()
Additionally, if your process is terminated by a signal, you will receive a SignalException; you can check for a specific signal with e.g. SignalException_SIGKILL (or _SIGTERM, _SIGSTOP, ...).
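A small sketch of catching it - killing a backgrounded 'sleep' (used here purely for illustration):
p = sh.sleep(30, _bg=True)
p.kill()  # send SIGKILL to the background process
try:
    p.wait()
except sh.SignalException_SIGKILL:
    print('Process was killed with SIGKILL')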
This library also has builtin logging support; all you have to do is turn it on:
import logging
# Turn on default logging:
logging.basicConfig(level=logging.INFO)
sh.ls('-la')
# INFO:sh.command:<Command '/usr/bin/ls -la', pid 1631463>: process started
# Change log level:
logging.getLogger('sh').setLevel(logging.DEBUG)
sh.ls('-la')
# INFO:sh.command:<Command '/usr/bin/ls -la', pid 1631661>: process started
# DEBUG:sh.command:<Command '/usr/bin/ls -la'>: starting process
# DEBUG:sh.command.process:<Command '/usr/bin/ls -la'>.<Process 1631666 ['/usr/bin/ls', '-la']>: started process
# ...
The above examples should cover most use cases, but if you're trying to do something more advanced or obscure, then check out the tutorials or FAQ in the library's docs, which have additional examples.
Closing Thoughts
I want to stress again - you should always prefer native Python functions to system commands. Also, always prefer 3rd-party client libraries such as kubernetes-client or a cloud provider's SDK over running their CLI commands directly. That - in my opinion - applies even if you're coming from a SysAdmin background and are more comfortable with shell than with Python. And finally, while Python is a great and much more robust language than shell, if you find yourself stringing together too many other programs/commands, maybe, just maybe, you should just write a shell script instead.