Hashing
Hashing is an off-shoot of cryptography with very similar workings, but quite different requirements. In this article, we will look at how hashing helps us to securely store passwords, prove who wrote what, and even allow two people to talk openly about a secret without people listening in to know what it is!
Hash Functions
A hash function is an algorithm that maps data the message to an array of a fixed size (the hash value or digest).
The cool thing about hash functions are they are one way, that means that they are "impossible" to invert. This means that the only way to discover the message that was hashed would be a brute force search of all possible inputs.
Cryptographic Impossibility
In truth nothing is "impossible" in cryptography, given enough time and resources.
However, we have the concept of "Computationally Infeasible", IE it takes that much processing time and power to reverse the algorithm, that it may as well be impossible.
Characteristics of Hash functions
A Hash function should be deterministic, meaning that the input will always result in the same hash value.
Small changes to the input value will result in large changes to the Hashed value. This means it becomes impossible to correlate values between similar inputs and gain some insight into the message.
Collisions should be avoided. In hashing a collision is where two (or more) inputs give the same hashed value. In simple hash functions it is easy to get collisions, (although in some cases, such as storing data in a hash table this may not matter). While in modern cryptographic hashes it is extremely hard, or impossible :p, to find collisions.
Common Hashing Algorithms
Hash functions have evolved over time. Many have been proposed by developers, and some have been found to be insecure on modern hardware (like MD5 and SHA1). In this section we will have a quick run through the most common functions
Bcrypt
A common hashing function, that is the default in many Linux systems, and has libraries available for most programming languages.
Bcrypt also has the advantage of being a "slow" hashing function. Which can add some extra protection against brute force based attacks. While a fast algorithm like SHA-3 may be as secure mathematically, an attacker can generate many more hashes per second, making brute force faster.
SHA-2
The SHA family of hash functions (SHA-2 and versions of SHA with > 256 bits) are considered highly secure.
SHA-2 is widely used, and is considered a official crypto standard
The higher number or Bits in the hash (ie SHA-256, SHA-512) the greater the collision resistance of the hash.
While SHA-128 is considered weak, SHA-256 and above are considered strong enough for modern use
fast Hashing Function
This hashing function is classed as "fast". This makes them a bit more susceptible to brute force as we can try many more hashes.
However, they still have an important place outside of password storage, such as generating checksums.
SHA-3
Is an improved version of the SHA-2 algorithm.
Hashes generated with SHA-3 are considered more secure than SHA-2 hashes of the
same length.
They may also be known as
- Keccak
- SHAKE
SHA-3 is considered highly secure and is the recommended crypto standard in the US.
fast Hashing Function
This hashing function is classed as "fast". This makes them a bit more susceptible to brute force as we can try many more hashes.
The BLAKE family
BLAKE/ BLAKE2 are also considered to be secure
fast Hashing Function
This hashing function is classed as "fast". This makes them a bit more susceptible to brute force as we can try many more hashes.
"Insecure" hashing function
The following hash functions are considered insecure, and shouldn't be used for passwords (although they may have a use for checksums etc.)
- SHA-0
- MD4
- MD5
HASH SECURITY
So important I am going to SHOUT in BOLD
When it comes to using hashes (or any crypto) RTFM.
The world is constantly changing as new techniques are invented, and computers get faster.
This means that any of our hashing algorithms could become insecure.
Before using them check, to see if there are any recent notifications about the use of this particular algorithm.
Hash Functions in Programming
As well as passwords (see below) we use hash functions all the time.
-
Hash Table like data structures (like pythons dictionary) use hashes to store the data, giving us quick lookups.
-
Unique IDs that are used in things like Git, based on the hashed content of the repository.
-
NoSQL style databases often use hashes to generate the GUID.
Hashes and Passwords
One of the major uses of hash functions is storing passwords.
Read the story of the first computer password and the first computer password theft
First Password Theft #firstPassword
There are a few items worth discussing here:
If the system did use "knowledge based authentication" instead of a password, wouldn't it make that knowledge a password anyway? That is, if you had the password "swordfish1", is it really that different to choosing your mother's maiden-name? Is there a security difference between choosing a password that has no meaning and being asked for facts about you?
What does the theft of the password file tell us? Does it highlight any issues with storing collected passwords?
Passwords are a fact of life now, and they’re the first line of defence when it comes to protecting your accounts on-line. But how are passwords stored? What happens if an attacker steals the list of passwords? Here we find out how passwords are stored as hashes to make it more difficult to steal them, as well as some of the ways attackers get around this problem.
Hashing Passwords
A password in a modern computer system is simply a series of characters These can be arranged to form something memorable to a human, but could also come from a special device or piece of software for managing passwords.
When we log in to a computer, or a remote service, we give our username and password That password is then checked against a stored version. If the username and password match, you're assumed to be the owner of that account.
A pseudo code version of this might look like
userInput = getPassword()
#Networky Stuff
if userInput == storedPassword:
accessGranted()
else:
accessDenied()
But we have two problems here:
Eavesdropping.
If someone can monitor the process of sending the password to the server, then they could see the password itself, and later pretend to be you.
While its obvious that this can happen in an online context, where we have to transmit the password. We should also consider that keyloggers, or shoulder surfing exist, and someone can intercept your password as you type it.
Note
While we might not see the full password when someone types it, just watching the fingers on the keyboard can give clues. For example we can pick out when they hit the top row of the keyboard (symbols?), and try to infer patterns. Its a fun game to play.
Lets assume, for the moment that this is a solved problem: either you're logging in to a local machine, and nobody is looking over your shoulder; or you're logging in to a remote service over an encrypted channel. Either way, it's a separate problem and as interesting and full of caveats as it is, we'll focus on the problem of storing passwords for now.
### Storing the password
In our system there has to be a list of passwords somewhere. They have to be stored in order for us to check them.
This list of passwords is an issue. The first ever case of a password system being compromised involved the password list being stolen, and every year we hear of more cases of stolen passwords, with many hundreds of lists being discovered shared and traded by hackers. If someone steals the list of passwords, they have access to all of the accounts!
For example if our passwords are stored in plain text we could have a database that looks something like this2.
Username | Password |
---|---|
Eugene | god |
Margo | secret |
Lauren | sex |
Hal | love |
The problem with this is obvious. If the famous hacker "Crash Override" gets hold of the database file. The passwords for all the users are there in plain text.
Not just hackers
Also, think back to our discussions on the Insider threat.
Is it a good idea that your password is stored in plain text where any employee of the company might see it?
Imagine if Eugene is a disgruntled (or bored), if he can get access to the database, he knows everyone's password.
Hashing to protect passwords.
This is where hashing is used to protect passwords.
Rather than store the password in plain text, we instead store the hashed version of it. This (somewhat) resolves the problem of keeping passwords hanging around in plain text format for anyone to see. And if we have chosen a strong hashing algorithm, then any one that does get hold of the password list is going to have to go to some effort to get the plain-text password.
Our password database and algorithm for checking passwords now looks something like this
userInput = getPassword()
#Networky Stuff
hashedInput = hash(userInput)
if hashedInput == storedPassword:
accessGranted()
else:
accessDenied()
Username | Password |
---|---|
Eugene | a4757d7419ff3b48e92e90596f0e7548 |
Margo | 5ebe2294ecd0e0f08eab7690d2a6ee69 |
Lauren | 3c3662bcb661d6de679c636744c66b62 |
Hal | b5c0b187fe309af0f4d35982fd961d7e |
The Return of Crash Override
The benefits here are obvious. If our hacker (or Eugene) gets hold of the password list, he has no way of knowing what the original passwords were. So they are going to have to put in some hard work to get the original passwords.
Even Eugene knowing his own password (and the format of the hash) wont help, as there is no obvious relationship between the input and the hashed value.
Where to Hash #wheretohash
In the algorithm given above we do the hashing on the server. However, it could be possible to do the hash on the client then send the hashed password to the server. This would mean that if anyone managed to intercept the message, they would only have the hashed password.
Discuss this using the tag #wheretohash
Cracking Hashed Passwords
Lets now put on our Black Hats. We have the hashed list, how are we going to try to get those plain text passwords?
As hashes are "impossible" to reverse, our only option is to try to brute force the passwords. This means we have to run every possible password combination through the hashing algorithm until we find a match. This will then tell us the password for a given user (or the collision of a password for a user)
The pseudocode for cracking the hashes would look something like this.
for item in allOfTheThings:
if targetHash = hash(item):
return True
Some of the users may well have used common passwords. Without any other information, this doesn’t help because it’s just as difficult to brute force the password “swordfish” from its hash as it would be to find, by brute force, the password “sdfhg.28b!8GGG=” from its hash.
However, if we did try to brute-force these two, we would find the answer to the first one if we tried every word in the dictionary.
Trying pure brute force means we have to we would have to try every possible combination of letters,
numbers and punctuation. Assume we limit the search to passwords of up to length 15 and there are 72
possible characters (26 lowercase, 26 uppercase, 10 digits and 10 punctuation marks), then the total number of possible passwords is:
\(72+722+723+…+7225=7.36 \times 10^(1027)\) , which would take around 230000000000000000000
years at the rate of 1 hash per second.
Cracking Speeds
Gordon E Moores observations on computing speed are still not dead. With modern hardware a desktop computer can crack a several thousand hashes a second. GPU based cracking rigs have been known to hit a couple of Billion hashes a second (with trivial hashing algorithms)3 This reduces significantly (a hundred thousand or so), with more complex hash functions like Bcrypt.
So we don't want our hash functions to be too fast* to compute. Fast enough not to inconvenience the user but also strengthened against brute forcing.
Using password lists, it might take a while, but it could be done without too much difficulty. There are around 150,000 English words in common use – let’s say we include archaic and derivative words and make the total number of words 250,000. Assume it takes 1 second per hash. That means it would take around 3 days to generate a hash of every possible word, assuming a single thread and no parallelism. To find the second one.
Attackers have access to lists of previously calculated hashes too. This means that there is probably already a database entry for “mittens” and its hash, so “reversing” it is really just a matter of looking up the answer.
Crack Some hashes
See if you can find an on-line database of hashes and use it to break in to any of the accounts in our list of password hashes.
Rainbow Tables #rainbowtables
Read the article on Rainbow tables and discuss. We have only discussed their use in reversing hashes, but rainbow tables exist to speed up the decryption of mobile phone calls and other kinds of communication. What do you think are the best general steps to take to defend against this type of attack?
Understanding Rainbow Tables
http://www.drchaos.com/understanding-rainbow-tables/
Salts
If attackers can store databases of common passwords, even variations on common passwords and password schemes, how can we protect them? A Salt is used to dramatically increase the difficulty of reversing a hashed password that doesn’t rely on the user changing the complexity of their password.
To Salt a password we simply add something else (usually we append the salt to
the end of the text) to the password before it
is hashed. For example with a SALT of SALT our password of swordfish becomes
swordfishSALT, which is then hashed to 382d56fcea63e60bd01f0466f6a3858f
.
This means that using rainbow tables to look up pre-calculated hashes becomes "impossible". As we would have to generate a whole new table for each Salt that is used.
When picking a salt there are some considerations to be made:
-
The Salt should not be reused
We should have a different salt for every entry in our database. Otherwise we can simplify an attackers job. While it would take some time to generate a new rainbow table for each data breach, its not prohibitively time expensive.
-
The Salt should be random
Linking the salt to characteristics of the account is also a bad idea. Lets imagine we salt the password with the username. For example my salted password could be "Swordfish:aa9863"
This might seem like a good idea, as it means the attacker needs to brute force the hashes for every user on the system. However, there are plenty of common user names (for example admin) all we need to do is generate tables for those common names and we are good to go.
-
The Salt should be long
This goes back to the effort required. While it might take some time to create lookup tables for a 3 digit salt, gives 1000 different combinations. It wont take that much time....
However, A 16 Digit alphanumeric hash, takes us back to our "impossible" \(1 \times 10^{1026}\) permutations.
Storing Salts
As its main function is to increase the complexity of brute force attacks, the salt doesn't have to be kept secret. (It also needs to be stored somewhere, otherwise we wont know that it is when we need to use it)
A common approach is to store both the salt and the password as a single string. Containing details of both the hash and the password itself.
For example Linux stores the password details for all users in the file
/etc/shadow
The stored password information for the user kali with password kali looks like this.
kali:$6$bxUuAgeYfsaRVAA9$vbbptatuaIM/jrhIab7bIsmZQK87dZ3B6aGfvzHstVY9qZF3cJeUN8qomDkHgvLW9Kv3ImOrPvimSGrHdQXJI0:18529:0:99999:7:::
There is a lot of information here. With elements separated by :
But the interesting parts for us are the
first two items
- username
- password
The password information contains several parts separated by dollar symbols. \(id\)salt$hashed:
- The Hash Format \(n\)
- \(1\) == md5
- \(2\) == Blowfish
- \(5\) == SHA-245
- \(6\) == SHA-512
- The Salt
- The Hashed Password
So we can see from the example above the password details are
- Hash format (SHA-512)
- Salt
bxUuAgeYfsaRVAA9
- Password:
vbbptatuaIM/jrhIab7bIsmZQK87dZ3B6aGfvzHstVY9qZF3cJeUN8qomDkHgvLW9Kv3ImOrPvimSGrHdQXJI0
Storing Hashes #storingHashes
Its common practice to store the hash and salt as a single string, but it is sensible?
For example, in the Linux shadow file example above there is a lot of information that could help the bad guys break my password.
What are your views on this:
- What assumptions do you think are made to justify this decision
- What alternative approaches could you think of
Hashes for error checking
Hashes for authenticating messages
Now we have looked in more detail at hashes, we will investigate other ways they can be used to add proof of origin to messages or allow two parties to convince each other that the other knows the same secret as them without eavesdroppers from learning what the secret is or giving away the secret if the other party was lying.
A hash is a one-way function and we have already seen how we can use the equivalence of two hash values as a pretty good proxy for equivalence of the values that were hashed. Now we will see how we can apply this to ideas of “shared secrets” that can be verified between two parties without actually giving away the secret.
Hash based authentication codes
Research and discuss hash-based message authentication codes used in real products and services. Are there any uses that surprise you?
If you like to get technical, try reading about how such protocols are constructed to defend against a “length extension” attack. It is interesting to see that simply changing the order in which things are computed can change the security of a protocol even when the underlying components (in this case, the hashing algorithm) remain the same. In other words: having a cryptographically secure hash doesn’t mean its use will always lead to a cryptographically secure protocol.
-
Why is the password Always Swordfish. It could be due to the Marx Brothers ↩
-
Bonus points if you get the reference.... ↩
-
Everything you wanted to know about length extension attacks ↩