Introduction

Passwords are a fact of life now, and they're the first line of defence when it comes to protecting your accounts online.

But how are passwords stored? What happens if an attacker steals the list of passwords? Here we find out how passwords are stored as hashes to make them more difficult to steal, as well as some of the ways attackers get around this protection.

A password in a modern computer system is simply a series of characters. These can be arranged to form something memorable to a human, but could also come from a special device or piece of software for managing passwords.

When we log in to a computer, or a remote service, we give our username and password. That password is then checked against a stored version. If the password matches the one stored for that username, we're assumed to be the owner of that account.

Just like with ancient watchwords, we have an immediate problem: eavesdropping. If someone can monitor your conversation, they can make a note of your password and then later pretend to be you. This is one of the reasons secure communication is so important. But, for now, let's assume we have solved that issue.

The typical password scenario requires that everyone has their own password and doesn't share it. There must also be a list of passwords somewhere so that we can check them when people log in. This list of passwords is an issue. The first ever case of a password system being compromised involved the password list being stolen, and every year we hear of more cases of stolen passwords, with many hundreds of lists being discovered shared and traded by hackers. If someone steals the list of passwords, they have access to all of the accounts! This is where hashing is used to protect passwords.

Hashing

The process of "hashing" is easy to explain, but difficult to correctly implement. Luckily, the hard work has been done and we have plenty of algorithms to choose from.

If we put some data into a hashing algorithm, we get some apparently random data back. The same input always leads to the same output, but you can't work out the input from the output, and any small change in the input leads to drastic changes in the output.
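These properties are easy to see for yourself. The short sketch below uses SHA-256 from Python's standard `hashlib` module; the choice of algorithm and the helper name `sha256_hex` are assumptions made for illustration, not something the article prescribes.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("swordfish")
again = sha256_hex("swordfish")    # same input as before
changed = sha256_hex("swordfisi")  # only the last letter differs

# Count how many hex digits differ between the two hashes.
differing = sum(a != b for a, b in zip(first, changed))

print(first)
print(changed)
print("same input, same output:", first == again)
print(differing, "of", len(first), "hex digits differ")
```

Running it should show two unrelated-looking strings of 64 hex digits, even though the inputs differ by a single letter.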

In password storage, this is used to protect the secret passwords of users. Rather than keep a list of passwords to check when someone logs in, which could be stolen, we store a list of hashes of passwords. So, if my password was "swordfish", I would find that stored against my username was some random-looking data. Since you can't undo the hashing and turn the random data back into a password, it's pretty secure, but how is it checked? Well, when I try to log in, the system takes the password I give it and hashes it, then compares the output to the stored hash value. If they match, I must have known the password!
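A minimal sketch of that check might look like the following. The username `alice`, the `stored_hashes` dictionary and the function names are all hypothetical, and plain SHA-256 simply stands in for whatever hashing algorithm a real system would choose.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Hash a password (plain SHA-256, for illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# The system keeps hashes, never the passwords themselves.
stored_hashes = {"alice": hash_password("swordfish")}


def check_login(username: str, attempt: str) -> bool:
    """Hash the attempted password and compare it to the stored hash."""
    stored = stored_hashes.get(username)
    if stored is None:
        return False  # unknown username
    # compare_digest compares the two strings in constant time.
    return hmac.compare_digest(stored, hash_password(attempt))


print(check_login("alice", "swordfish"))  # correct password
print(check_login("alice", "sw0rdfish"))  # wrong password
```

Note that the login check never needs to turn the stored hash back into a password; it only repeats the hashing and compares the results.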

Undoing Hashes

It's not /entirely/ true to say that we can't reverse a hash. We could try every possible password, putting each through the hash function, to see which ones match. This is called a brute force attack because it relies on nothing more than lots of time and computing power.
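To make the idea concrete, here is a small sketch of a brute force attack on a deliberately tiny search space: lowercase letters only, up to three characters long. The hash function (SHA-256), the target password `cat` and the function names are assumptions chosen so the example finishes quickly.

```python
import hashlib
import itertools
import string


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def brute_force(target_hash, max_length, alphabet=string.ascii_lowercase):
    """Try every string over alphabet up to max_length characters long."""
    for length in range(1, max_length + 1):
        for letters in itertools.product(alphabet, repeat=length):
            guess = "".join(letters)
            if sha256_hex(guess) == target_hash:
                return guess
    return None  # nothing in the search space matched


stolen_hash = sha256_hex("cat")  # pretend this came from a stolen list
print(brute_force(stolen_hash, max_length=3))
```

With 26 letters and a maximum length of three there are only 18,278 candidates, so the search is almost instant. Each extra character or extra symbol in the alphabet multiplies the work.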

This is also why password choice is important. Usually, someone trying to get into your account can't simply keep guessing your password until they get in. Most systems add a small delay after an unsuccessful attempt, which makes it impractical to try every password in a reasonable amount of time. If the list of hashes is stolen, however, an attacker can try lots of passwords much faster, because they can hash each guess and compare it to the stolen hash themselves.

It might seem that it's just as difficult to brute-force the password 'mittens' when we only know the hash as it would be to find, by brute force, the password 'sdfhg.28b!8GGG=' from its hash.

However, if we did try to brute-force these two, we would find the answer to the first one just by trying every word in the dictionary. It might take a while, but it could be done without too much difficulty. There are around 150,000 English words in common use; let's say we include archaic and derivative words and make the total number of words 250,000. Assume it takes 1 second per hash. That means it would take around 3 days to generate a hash of every possible word, assuming a single thread and no parallelism.

To find the second one, we would have to try every possible combination of letters, numbers and punctuation. Assume we limit the search to passwords of up to length 15 and there are 72 possible characters (26 lowercase, 26 uppercase, 10 digits and 10 punctuation marks). Then the total number of possible passwords is $72+72^2+72^3+\ldots +72^{15}\approx 7.35\times 10^{27}$, which would take around 230,000,000,000,000,000,000 years at the rate of 1 hash per second.
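The arithmetic behind these estimates can be checked with a few lines of Python. The variable names are just for illustration, and the rate of one hash per second is the same simplifying assumption used above.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

# Dictionary attack: one hash per word, one hash per second.
dictionary_words = 250_000
dictionary_days = dictionary_words / SECONDS_PER_DAY

# Full brute force: every string of length 1 to 15 over 72 characters.
alphabet_size = 72
max_length = 15
total_passwords = sum(alphabet_size ** n for n in range(1, max_length + 1))
brute_force_years = total_passwords / SECONDS_PER_YEAR

print(f"dictionary attack: {dictionary_days:.1f} days")
print(f"possible passwords: {total_passwords:.2e}")
print(f"brute force: {brute_force_years:.1e} years")
```

The figures come out at roughly 2.9 days for the dictionary and about 2.3 × 10^20 years for the full search.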

Even worse, attackers have access to lists of previously calculated hashes, too. This means that there is probably already a database entry for 'mittens' and its hash, so 'reversing' it is really just a matter of looking up the answer. Attackers also use special tools to create lists of words with numbers added on the end or with certain letters replaced, because people so often use these tricks to meet password requirements.
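A rough sketch of both ideas, a precomputed lookup table and simple word variations, is shown below. The three-word list, the particular letter substitutions and the function names are made up for illustration; real tools use far larger word lists and many more rules.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def variants(word):
    """Yield a word plus a few common variations of it."""
    swaps = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0"})
    for base in (word, word.capitalize(), word.translate(swaps)):
        yield base
        for digit in range(10):
            yield f"{base}{digit}"  # same word with a digit on the end


def build_lookup(words):
    """Precompute a hash -> password table for every variation."""
    return {sha256_hex(v): v for word in words for v in variants(word)}


lookup = build_lookup(["mittens", "swordfish", "password"])

# 'Reversing' a stolen hash is now a single dictionary lookup.
print(lookup.get(sha256_hex("m1tt3ns7")))
print(lookup.get(sha256_hex("sdfhg.28b!8GGG=")))
```

The disguised 'm1tt3ns7' is found immediately because it is still built from a dictionary word, while the random 15-character password is not in the table at all.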

Conclusion

So, next time you pick a password, remember that it's not someone trying to guess that is the problem, but someone possibly using big databases of password-like words to reverse the hash!

