Almost every application requires some form of authentication, password handling or use of secure credentials such as API keys. You might not be a security expert, but you should know how to deal with all these passwords and credentials securely, to keep your users' credentials and data protected, as well as your own API keys and various tokens.
Keeping these security elements safe includes generating them, verifying them, storing them securely and protecting them from adversaries. So, in this article we will explore Python libraries, tools and concepts that will help us with exactly that!
Prompting For Password
Let's start simple: you have a basic Python application with a command line interface, and you need to ask the user for a password. You could use input(), but that would show the password in the terminal. To avoid that, you should use getpass instead:
import getpass
user = getpass.getuser()
password = getpass.getpass()
# Do Stuff...
getpass is a very simple standard library module that allows you to prompt the user for a password, as well as get their username by extracting the current user's login name. Be aware, though, that not every system supports hiding passwords. Python will try to warn you about that (with a GetPassWarning), so just read the warnings in the command line.
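When setting a new password, it's common to ask for it twice. Below is a minimal sketch of that, which also turns getpass's warning about visible input into an error (using the standard warnings module), so the password is never silently echoed. The ask_new_password function and its prompt and attempts parameters are hypothetical names chosen for this illustration:

```python
import getpass
import warnings


def ask_new_password(prompt=getpass.getpass, attempts=3):
    """Ask for a new password twice and return it once both entries match.

    `prompt` defaults to getpass.getpass, but can be swapped out (e.g. in tests).
    """
    with warnings.catch_warnings():
        # Raise GetPassWarning instead of echoing the password to the terminal
        warnings.simplefilter("error", getpass.GetPassWarning)
        for _ in range(attempts):
            first = prompt("New password: ")
            second = prompt("Repeat password: ")
            if first and first == second:
                return first
            print("Passwords were empty or did not match, try again.")
    raise ValueError(f"no matching password after {attempts} attempts")
```

Calling ask_new_password() from a terminal prompts twice and returns the password only if both entries agree.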
Generating
Sometimes it might be preferable to generate a password rather than prompt the user for one, for example if you want to set an initial password that gets changed upon first login.
The standard library doesn't ship a ready-made password generator, but implementing one with the secrets module isn't difficult:
import string
import secrets
length = 15
# Choose wide set of characters, but consider what your system can handle
alphabet = string.ascii_letters + string.digits + string.punctuation
password = ''.join(secrets.choice(alphabet) for i in range(length))
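If your system also requires at least one character from each class, one option (similar in spirit to the recipes in the secrets module documentation) is to keep drawing whole candidates until one satisfies every rule. This is a minimal sketch; generate_password and the exact set of rules are assumptions for illustration:

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 15) -> str:
    """Return a random password containing at least one lowercase letter,
    one uppercase letter, one digit and one punctuation character.

    Whole candidates are redrawn until one passes every rule (rejection
    sampling), so characters are never forced into fixed positions.
    """
    if length < 4:
        raise ValueError("length must be at least 4 to fit one character of each class")
    while True:
        password = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (any(c.islower() for c in password)
                and any(c.isupper() for c in password)
                and any(c.isdigit() for c in password)
                and any(c in string.punctuation for c in password)):
            return password
```

Redrawing the whole password, rather than patching a missing class into a random position, keeps the result uniformly random among the passwords that meet the rules.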
The passwords generated using the above code will be strong, but very hard to remember. If it's just an initial, temporary password or a short-lived token, then that's fine, but if the user should use it for longer, then it's more appropriate to use a passphrase instead.
We could build a passphrase generator like we did with simple passwords above, but why bother when there's a library available for this? This library is called xkcdpass, after the famous XKCD comic about password strength, and, exactly as the comic describes, it generates strong passphrases made of words:
# pip install xkcdpass
from xkcdpass import xkcd_password as xp
word_file = xp.locate_wordfile()
words = xp.generate_wordlist(wordfile=word_file, min_length=5, max_length=10)
for _ in range(4):
    print(xp.generate_xkcdpassword(words, acrostic="python", numwords=6, delimiter="*"))
# punch*yesterday*throwback*heaviness*overnight*numbing
# plethora*yesterday*thigh*handlebar*outmost*natural
# pyromania*yearly*twisty*hyphen*overstuff*nuzzle
# pandemic*yearly*theology*hatching*overlaid*neurosis
This snippet starts by finding a word/dictionary file on your system, such as /usr/dict/words, and picks all the words within the specified length range, from which it then builds the word list used for generating the passphrase. The generator itself has a few arguments which we can use to customize the passphrase. Apart from the obvious ones, like the number of words and the delimiter, it also has an acrostic parameter, which is a word whose characters will be used as the first letters of the words in the passphrase (sounds complicated? well, see the example passphrases above).
If you really wanted to build this yourself instead of adding a dependency to your project, you can use the recipe in the Python docs.
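A rough sketch of that do-it-yourself approach, picking words with secrets.choice, could look like this. The dictionary path /usr/share/dict/words is an assumption (it exists on many Unix-like systems, but not everywhere), and load_words and generate_passphrase are hypothetical helper names:

```python
import secrets


def load_words(path="/usr/share/dict/words", min_length=5, max_length=10):
    """Read a word list, keeping unique lowercase alphabetic words of suitable length."""
    with open(path, encoding="utf-8") as f:
        words = {line.strip().lower() for line in f}
    return sorted(w for w in words if w.isalpha() and min_length <= len(w) <= max_length)


def generate_passphrase(words, numwords=6, delimiter="*"):
    """Join `numwords` words chosen uniformly at random with a CSPRNG."""
    if not words:
        raise ValueError("word list is empty")
    return delimiter.join(secrets.choice(words) for _ in range(numwords))
```

For example, generate_passphrase(load_words()) returns a passphrase shaped like the xkcdpass output above. Each uniformly chosen word adds log2(len(words)) bits of entropy, which is why a bigger word list and more words make the passphrase stronger.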
Hashing
Now that we've asked the user for a password or generated one for them, what do we do with it? We might want to store it somewhere in a database, but as you probably (hopefully) know, you should never store a password in plaintext. Why is that?
Well, passwords should never be stored in a recoverable format, whether plain text or encrypted. They should be hashed using a cryptographically strong one-way function. This way, if someone gets hold of the password hashes in the database, they will have a very hard time recovering any actual passwords, because the only way to recover a password from its hash is to brute-force it, that is, to take possible plaintext passwords, hash them with the same algorithm and compare the results with the entries in the database.
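To see what brute-forcing means in practice, here is a toy dictionary attack against a fast, unsalted SHA-256 hash. The "leaked" hash and the candidate list are made-up data for illustration:

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


# A "leaked" database entry storing an unsalted, fast hash (toy data)
leaked_hash = sha256_hex("sunshine")

# The attacker hashes candidate passwords and compares the results with the leak
candidates = ["123456", "password", "qwerty", "sunshine", "letmein"]
recovered = next((c for c in candidates if sha256_hex(c) == leaked_hash), None)
print(recovered)  # sunshine
```

Real attacks work the same way, just with huge wordlists and GPUs, which is why the next two ideas, salting and slow hashing, matter.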
To make brute-forcing more difficult, a salt should additionally be used. A salt is a random string stored alongside the hashed password. It gets appended to the password before hashing, so identical passwords produce different hashes, and precomputed tables of hashes (rainbow tables) become useless to an attacker.
However, with modern hardware that can attempt billions of hashes per second, making the password hard to guess isn't enough. Therefore, slow hash functions are used for password hashing, making it much less efficient for an attacker to brute-force a password.
(Note: the above greatly oversimplifies the logic and reasons for using these hash functions. For a more thought-out explanation, see for example this article.)
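Both ideas, a salt and a deliberately slow function, can be seen with the standard library's hashlib.pbkdf2_hmac, used here purely for illustration. The 600,000 iteration count is an illustrative work factor rather than a tuned recommendation, and slow_hash is a hypothetical helper name:

```python
import hashlib
import os


def slow_hash(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    # PBKDF2 applies HMAC-SHA256 `iterations` times; that count is the work factor
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)


salt_a = os.urandom(16)
salt_b = os.urandom(16)

same_salt = slow_hash("hunter2", salt_a) == slow_hash("hunter2", salt_a)
other_salt = slow_hash("hunter2", salt_a) == slow_hash("hunter2", salt_b)
print(same_salt)   # True: same password + same salt -> same hash, so verification works
print(other_salt)  # False: different salt -> different hash, so precomputed tables don't help
```

Each call takes a noticeable fraction of a second, which is negligible for a single login but multiplies across every guess an attacker makes.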
There are quite a few libraries and individual hashing algorithms out there, but the above requirements narrow our choice down significantly. The go-to solution for password hashing in Python should be passlib, as it provides proper algorithms as well as a high-level interface usable even by people who aren't well-versed in cryptography.
# pip install "passlib[bcrypt]"  (the bcrypt handler needs a backend)
from passlib.hash import bcrypt
from getpass import getpass
print(bcrypt.setting_kwds)
# ('salt', 'rounds', 'ident', 'truncate_error')
print(bcrypt.default_rounds)
# 12
hasher = bcrypt.using(rounds=13) # Make it slower
password = getpass()
hashed_password = hasher.hash(password)
print(hashed_password)
# $2b$13$H9.qdcodBFCYOWDVMrjx/uT.fbKzYloMYD7Hj2ItDmEOnX5lw.BX.
# \__/\/ \____________________/\_____________________________/
# Alg Rounds Salt (22 char) Hash (31 char)
print(hasher.verify(password, hashed_password))
# True
print(hasher.verify("not-the-password", hashed_password))
# False
In this snippet we use bcrypt as our algorithm of choice, as it's one of the most popular and well-tested hashing algorithms. First, we inspect its possible settings and check the default number of rounds used by the algorithm. We then modify the hasher to use a higher number of rounds (cost factor), making the hashing slower and therefore the hashes harder to crack. For bcrypt the cost factor is logarithmic (2^rounds iterations), so each increment roughly doubles the hashing time. This number should be the largest possible that doesn't cause an intolerable delay for your users (~300ms). passlib updates the default rounds value periodically, so you don't necessarily need to change this value yourself.
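Finding that "largest tolerable" value can be automated. Below is a minimal sketch, assuming hashing time grows with the cost factor (as it does for bcrypt). pick_cost and its parameters are hypothetical names, and the callable you pass in does the actual hashing; the clock parameter exists only so the function can be tested without real timing:

```python
import time
from typing import Callable


def pick_cost(hash_at_cost: Callable[[int], object], target_seconds: float = 0.3,
              min_cost: int = 10, max_cost: int = 20,
              clock: Callable[[], float] = time.perf_counter) -> int:
    """Return the largest cost factor whose single hash finishes within target_seconds.

    Assumes time grows with the cost factor, so the scan stops at the first
    cost that is too slow.
    """
    best = None
    for cost in range(min_cost, max_cost + 1):
        started = clock()
        hash_at_cost(cost)  # hash a sample password at this cost
        if clock() - started > target_seconds:
            break
        best = cost
    if best is None:
        raise ValueError("min_cost is already slower than target_seconds")
    return best
```

With passlib, it might be called as pick_cost(lambda r: bcrypt.using(rounds=r).hash("benchmark")). Run it on the hardware that will actually serve your users, since timings differ a lot between machines.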
With the hasher ready, we prompt the user for a password and hash it. At this point we could store it in a database; here, for demonstration purposes, we go ahead and verify it against the original plaintext password (and against a wrong one).
From the above code, we can see that the whole usage of passlib boils down to the hash and verify methods of our algorithm of choice. If, however, you wanted more control over schemes, rounds, etc., then you can use the CryptContext class:
from getpass import getpass
from passlib.context import CryptContext
ctx = CryptContext(schemes=["bcrypt", "argon2", "scrypt"],
                   default="bcrypt",
                   bcrypt__rounds=14)
password = getpass()
hashed_password = ctx.hash(password)
print(hashed_password)
# $2b$14$pFTXqnHjn91C8k8ehbuM.uSJM.H5S0l7vkxE8NxgAiS2LiMWMziAe
print(ctx.verify(password, hashed_password))
# True
print(ctx.verify("not-the-password", hashed_password))
# False
This context object allows us to work with multiple schemes, set defaults or configure cost factors. If your application's authentication is simple, then this is probably not necessary, but if you need to support multiple hashing algorithms, deprecate old ones, re-hash existing hashes or handle similar advanced tasks, then you might want to look into the full CryptContext integration tutorial.
Another reason you might want to use CryptContext is if you need to deal with operating system passwords, such as the ones in /etc/shadow. For that, you can use the preconfigured contexts available in passlib.hosts; for more details, see the passlib.hosts documentation.
For completeness, let me also list a couple of other libraries available, including their (different) use cases:
- bcrypt is the library and algorithm which we used above. passlib uses it as a backend for its bcrypt handler, so there's not really a reason to use this low-level library directly.
- crypt is a Python standard library module that provides functions that could be used for password hashing. The algorithms provided are, however, dependent on your system, and the ones listed in the docs aren't as strong as the ones shown above. The module was also deprecated in Python 3.11 and removed in Python 3.13.
- hashlib is another builtin module. This one, however, includes strong hashing functions suitable for password hashing. Its interface exposes more of the algorithms' parameters, so it requires a bit more knowledge to use properly (securely). You could absolutely use functions from this module, such as hashlib.scrypt, for hashing your passwords.
- hmac, the last hashing module the Python standard library has to offer, is just not suitable for password hashing. HMAC is used to verify the integrity and authenticity of a message and doesn't have the properties required for password hashing.
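As a rough idea of what using hashlib.scrypt directly involves, here is a minimal hash-and-verify sketch. The parameters n=2**14, r=8, p=1 are illustrative (raise n as far as your latency budget allows), and the "scrypt$n$r$p$salt$hash" storage string is a hypothetical format for this example, not a standard:

```python
import hashlib
import hmac
import os

N, R, P = 2**14, 8, 1  # illustrative cost parameters; memory use is about 128 * R * N bytes


def hash_password(password: str) -> str:
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P)
    # Store the parameters and salt next to the hash so it can be verified later
    return f"scrypt${N}${R}${P}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    _, n, r, p, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.scrypt(password.encode(), salt=bytes.fromhex(salt_hex),
                            n=int(n), r=int(r), p=int(p))
    # Constant-time comparison avoids leaking how many bytes matched
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))
```

Note that hmac appears here only for its constant-time compare_digest helper, not for hashing. Keeping track of parameters, salts and the storage format yourself is exactly the bookkeeping that passlib does for you.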
A little side note: with all the newly acquired knowledge about proper ways to store passwords, let's imagine that you forgot your password to some service. You click on "Forgot password?" on the website, and instead of a recovery link, they send you your actual password. That means they store your password in a recoverable form (plaintext or reversible encryption), and it also means that you should run away from that service (and if you used the same password in other places, change it there too).
Storing Securely
In the previous section we assumed that the intention was to store other users' credentials, but what about your own passwords that you use to log in to remote systems?
Leaving passwords in code is obviously a terrible choice, as they're lying there in plaintext for anyone to see, and you're also running the risk of accidentally pushing them to a git repo. A slightly better option is to store them in environment variables. You can create a .env file, add it to .gitignore and populate it with the credentials needed for the current project. You can then use the dotenv package to get all these variables into your application like so:
# pip install python-dotenv
import os
from os.path import join, dirname
from dotenv import load_dotenv
dotenv_path = join(dirname(__file__), ".env")
load_dotenv(dotenv_path)
API_KEY = os.environ.get("API_KEY", "default")
print(API_KEY)
# a3491fb2-000f-4d9f-943e-127cfe29c39c
This snippet first builds the path to the .env file using os.path functions, which is then used to load the environment variables with load_dotenv(). If your .env file sits in the same directory as your script, like in the example above, then you can simplify the code and just call load_dotenv(find_dotenv()) (importing find_dotenv from dotenv as well), which finds the environment file automatically. When the file is loaded, all that's left is to retrieve individual variables using os.environ.get.
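The snippet above falls back to the string "default" when API_KEY is missing, which can hide a misconfiguration. If you'd rather fail early, a small helper like the hypothetical require_env below can be called after load_dotenv(); it relies only on os.environ:

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable `name`, or fail loudly if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

For example, API_KEY = require_env("API_KEY") stops the application at startup instead of letting it run with a placeholder credential.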
Alternatively, if you don't want to pollute your environment with application variables and secrets, you can load them directly like this:
from dotenv import dotenv_values
config = dotenv_values(".env")
print(config)
# OrderedDict([('API_KEY', 'a3491fb2-000f-4d9f-943e-127cfe29c39c')])
The above solution is fine, but we can do better. Instead of storing passwords in an unprotected file, we can use the system's keyring, which is an application that can store secure credentials in an encrypted file in your home directory. By default, this file is encrypted with your user account's login password, so it gets unlocked automatically when you log in, and you therefore don't have to worry about an extra password.
To use keyring credentials in Python applications, we can use a library called keyring:
# pip install keyring
import keyring
import keyring.util.platform_ as keyring_platform
print(keyring_platform.config_root())
# /home/username/.config/python_keyring # Might be different for you
print(keyring.get_keyring())
# keyring.backends.SecretService.Keyring (priority: 5)
NAMESPACE = "my-app"
ENTRY = "API_KEY"
keyring.set_password(NAMESPACE, ENTRY, "a3491fb2-000f-4d9f-943e-127cfe29c39c")
print(keyring.get_password(NAMESPACE, ENTRY))
# a3491fb2-000f-4d9f-943e-127cfe29c39c
cred = keyring.get_credential(NAMESPACE, ENTRY)
print(f"Password for username {cred.username} in namespace {NAMESPACE} is {cred.password}")
# Password for username API_KEY in namespace my-app is a3491fb2-000f-4d9f-943e-127cfe29c39c
In the above code, we start by checking the location of the keyring config file, which is where you can make configuration adjustments if needed. We then check the active keyring and proceed with adding a password into it. Each entry has 3 attributes: service, username and password, where service acts as a namespace, which in this case would be the name of an application. To create and retrieve an entry, we can just use set_password and get_password respectively. In addition to that, get_credential can also be used; it returns a credential object which has attributes for the username and password.
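On machines without a usable keyring backend (CI servers or containers, for example), keyring can raise an error instead of returning a password. One way to cope, sketched below as an assumption rather than anything the keyring library prescribes, is to try the keyring first and fall back to an environment variable. get_secret and the SERVICE_NAME variable naming convention are hypothetical:

```python
import os
from typing import Optional

try:
    import keyring
    from keyring.errors import KeyringError
except ImportError:  # keyring is not installed
    keyring = None


def get_secret(service: str, name: str) -> Optional[str]:
    """Look up a secret in the system keyring, falling back to an environment variable.

    The fallback variable is SERVICE_NAME upper-cased, e.g. MY_APP_API_KEY.
    """
    if keyring is not None:
        try:
            value = keyring.get_password(service, name)
        except KeyringError:  # e.g. no usable backend on this machine
            value = None
        if value is not None:
            return value
    env_name = f"{service}_{name}".upper().replace("-", "_")
    return os.environ.get(env_name)
```

With the entry from the example above, get_secret("my-app", "API_KEY") would return the stored key on a desktop, and the MY_APP_API_KEY variable on a machine without a keyring.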
Closing Thoughts
Even if you're not a security specialist, you're still responsible for the basic security features of the applications you build. This includes taking good care of users' data and especially passwords, so hopefully some of these examples and recipes will help you do that.
Beyond the approaches and techniques shown in this article, the best way to handle passwords is to avoid using them altogether, by delegating authentication to an OIDC provider (e.g. Google or GitHub) or by replacing them with key-based authentication and encryption, which we will dive into in the next article.