Wednesday, January 25, 2012

Password hashing And encryption

I happened to write this while I was diging how Magento stores password.
well magento stores hashed password separated by a two character salt in db. A sample looks like -

353dc2ba6108461cf3468184bdd0e174:QP
split = 353dc2ba6108461cf3468184bdd0e174:QP
magentoPass = split[0];
salt = split[1];

##Authenticate user entered manager.
# is used for concatanation
if( md(5) [split[1]#userenteredpasswd])==split[0])
{
    # User Authenticated
}


Now lets understand some cryptography Hashing and encryption.

Password hashing and encryption are two different things. Hashing is one way function while encryption is two way. Reverse encryption (called decryption) is possible but you can not reverse hash and generate your original string back.  You can find another string which has same hash value by brute force methods.

Collision - Hashing generates a fixed length output (called as message digest, checksum) for all input values which makes it possible to have two different strings (S1#S2) but H(S1)=H(S2). That is called collision. Probability of collision depends upon the algorithm and length of hashed output.

Collision



To secure passwords, cryptographic hash algorithms are used(MD5, SHA-1, SHA-2).
But Plain hashing is easily defeated using a dictionary attack, where an attacker just pre-hashes every word in a dictionary (or every combination of characters up to a certain length), then uses this new dictionary to look up hashed passwords. Salting, Key stretching, HMAC can be used to strengthen your hashing, will discuss that later. 

How hash algorithms work - Most algorithms(MD5, SHA-1, SHA-2, SHA-3) are based on Block Ciphers. Input string is divided into fixed length blocks and then padded, encrypted with keys, processed with complex operations (ADD/AND/XOR/ROTATE) sequentially to generate final output. That makes it hard to break. For example computing 40*11 is easy but factorizing 440 and generating 40 and 11 is a bit hard because of multiple possibilities. That is what it makes hard to break.

Salting -  

hash = md5(password + salt);
Instead of directly hashing the input you can add a pinch of salt to make it spicy :)
salt could be any digit random string, which you store along with the hashed value. You can store your passwords in db as hash:salt. For authentications when user enters a password, it's hash is calculated using entered password and salt(from db) and validated against the hash(from db).

Key Stretching - You add the salt and hash it for n times. You have to store that count(n) as well, along with salt.

hash = md5(password + salt); 
for (i = 0; i < 1000; i++) 
   hash = md5(hash + password + salt); 
}


Don't use


hash = md5(hash) 

in the loop, it will only increase the collision probability.

Common Hashing Algorithms-

CRC32 (Cyclic redundancy check) - Very simple hash function.Produces 32 bit long hash value and typically expressed as 8 digit hexadecimal string.
 
MD-5 - MD5 produces 128 bit(16 byte) hash value.  It is typically expressed as 32 digit hexadecimal string(each hexadecimal digit takes 4 bits). This algorithm has been broken and found vulnerable.

SHA-1 - SHA-1 produces 160 bit hash value. It is typically expressed as 40 digit hexadecimal string. It's approximately two-three times slower then MD5 algorithm This algorithm is also broken and found vulnerable.

SHA-2(SHA-512) - SHA-2 produces 512 bit hash value. typically expressed as 128 digit hexadecimal string. This is not yet broken.

HMAC - Unlike the other hashes mentioned above, HMAC (Hashed Message Authentication Code) is a key dependant hash.  HMACs are useful when authentication but not secrecy of a message is required.Current HMAC specification  is defined as

 # used for concatenation

H(key1 ∥ H(key2 ∥ message)). 


The cryptographic strength of the HMAC depends upon the size of the secret key that is used. The most common attack against HMACs is brute force to uncover the secret key. HMACs are substantially less affected by collisions than their underlying hashing algorithms alone. Therefore, HMAC-MD5 does not suffer from the same weaknesses that have been found in MD5.

How does  HMAC work?


Lets Define-
ipad = inner padding: the byte 0x36 repeated the same number of times as the block size
opad = outer padding: the byte 0x5c repeated the same number of times as the block size
text = the message we wish to compute the HMAC over.

The length of the key should be less than or equal to the block size (64 bytes for MD5 and SHA-1), though greater than the size of the message digest (16 bytes for MD5, 20 bytes for SHA-1). If the key length is greater than the block size, a (fixed length) hash of the key should be used.

To compute HMAC over the data 'text' the following steps are performed:

1. the key is appended with zero bytes until it equals the block size in length.
2. the key is XORed with ipad
3.text is appended to the result of 2
4.the hash algorithm is applied to the result of 3
5.the key is XORed with opad
6.the result of 4 is appended to the result of 5
7.the hash algorithm is applied to the result of 6.
 

How to break a Hash -
If you want to find a given plaintext for a certain hash there are two simple methods:
        - Hash each plaintext one by one, until you find the hash.
        - Hash each plaintext one by one, but store each generated hash in a sorted table so that you can  easily look the hash up later without generating the hashes again.
 Simple..  huh? well lets discuss that.

What is Rainbow Table-
Hash Function converts plain text into hashed output which cannot be dehashed. But how abut if there is a mapping table which stores every possible mapping from Hash Value to Plain Text, then you just need to see if your hash value exists in this table or not and you will get the plain text for that hash. That table is known as Rainbow table.  Obviouslly there is no such pre existing table so this is how you use Rainbow table to break a hash -

Inputs-
hash password(to be broken)
input password is 6 digit (numeric)
hash alogorithm used- Md5
Output-
input password


lets say my password is 493823
MD5("493823") -> "222f00dc4b7f9131c89cff641d1a8c50"
I have a Reduction function which takes first 6 numbers from the last generated hash. R("222f00dc4b7f9131c89cff641d1a8c50") -> "222004". 


This input is feed to MD5 and output is feed to Reduction function, and the cycle goes on. It represents a chain from starting input text to ending hash(you can choose to end at any hash). The table only stores the starting plaintext, and the final hash you choose to end with, and so a chain "containing" millions of hashes can be represented with only a single starting plaintext, and a single finishing hash.
Now we can use these chains to break the given hash password and find the input password.
The algorithm is:
  • start-Look for the hash in the list of final hashes, if it is there break out of the loop.
  • If it isn't there reduce the hash into another plaintext, and hash the new plaintext.
  • Goto the start point.
  • If the hash matches one of the final hashes, the chain for which the hash matches the final hash contains the original hash.







+Nishant


References- http://kestas.kuliukas.com/RainbowTables/



No comments:

Post a Comment