Fairy tales in password hashing with scrypt

Update: the author of scrypt said that he's added a warning on top of scryptenc.h.

TL;DR: scrypt is a password-based key derivation function that is often used as a password hashing scheme. libscrypt is considered the official implementation of scrypt, but it's actually a file encryption API with the scrypt function never exposed directly. If one misuses libscrypt for password hashing they may end up with extremely weak hashes.

Password-based key derivation functions (PBKDF) are algorithms that take a password and some non-secret data (e.g., salt), and output one or more secret keys that can be further used in other crypto constructions. These functions are generally made deliberately slow so as to frustrate brute-force attack or dictionary attack on the input password.

scrypt is considered a state-of-the-art PBKDF, but its most common use is as a password hashing scheme, even though it wasn't originally designed for this purpose. The author of scrypt has never released a standalone library for the scrypt function; he's released only a utility, confusingly named scrypt, that uses the scrypt function to implement an encryption API.

This makes a lot of senses, as scrypt was originally built for Tarsnap, the online file backup service built by the same person. It's also, however, a crypto disaster waiting to happen: if one uses the file encryption API as a password hashing scheme they may end up with extremely weak hashes. Since the introduction of scrypt I've seen a few of such misuses, all of which were made by otherwise competent programmers.

The scrypt encryption API is declared as follows:

 * scryptenc_buf(inbuf, inbuflen, outbuf, passwd, passwdlen,
 *     maxmem, maxmemfrac, maxtime):
 * Encrypt inbuflen bytes from inbuf, writing the resulting inbuflen + 128
 * bytes to outbuf.
int scryptenc_buf(const uint8_t *, size_t, uint8_t *,
    const uint8_t *, size_t, size_t, double, double);

The intended usage of this function is to derive a key from the password using scrypt (with a randomly generated salt,) then use the derived key to encrypt the input buffer. I've found that many people that want to use scrypt as a password hashing function end up using scryptenc_buf.

In the first case the programmer used scryptenc_buf to encrypt an empty string using the input password as the key, and saved the output as the password hash. There's no badness: although the hash isn't compatible with scrypt anymore, it still has the same strength. It went downhill very fast from here though.

The second programmer used the same function, but she encrypted the input password with a static key. Since scryptenc_buf generates a random salt for each invocation, each password is probably encrypted with a unique key (derived from the random salt and the static key,) but it's still pretty bad: if anyone obtains the static key they can recover all passwords from their hashes. The developer knew that encrypting password is a bad idea, but she was confused by the API. After all the hashes looked random. As the saying goes, bad crypto is usually indistinguishable from good crypto.

The third programmer, for some reason that I've forgotten, wanted to use a single salt for all users, so he modified scryptenc_buf as follows:

scryptenc_buf(const uint8_t * inbuf, size_t inbuflen, uint8_t * outbuf, const uint8_t * passwd,
    size_t passwdlen, const uint8_t * salt, size_t saltlen,
    size_t maxmem, double maxmemfrac, double maxtime);

Like the second programmer, he encrypted the input password with a static key. With a static salt, all passwords are encrypted under one single key. The encryption algorithm used in scryptenc_buf is AES in counter mode with a zero IV. Did you spot the vulnerability? It's the classic keystream reuse: knowing a single password leads to the recovery of all passwords from their hashes. If one can register as a user they can recover all passwords. How comes hashing password introduces such a deadly vulnerability? I run out of people to blame.

The last case was a group of Java programmers. One of them wrote a JNI wrapper on top of libscrypt. The wrapper accepted a memcost parameter, which should be the same as \(N\) in the original scrypt paper, but somehow its author wanted it to be \(\log{N}\). When another programmer called this function he passed, however, \(N\), so the actual memcost parameter became super big. This mismatch should be caught easily, as libscrypt would return an error code if it couldn't allocate the required memory. Checking return value for errors seems not to be, however, a popular pattern in the Java world.

As a result no password was hashed, and all stored hashes were a series of zero bytes. Anyone could sign in to anybody else's accounts using any passwords. Fortunately, they brought my consulting team in very early in the development process, and I found the bug before they had any real users. Moral of the story? Always look at the output of your crypto.

If I develop a crypto library, I'll conduct user studies like how they do it in usability research. Give developers the library and ask them to conduct a specific task. Rinse and repeat until nobody would misuse it.