Hashing Function Basics

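A hash function is an algorithm that takes an input of any size and outputs a fixed-length value, often called a hash, digest, or fingerprint. The same input always produces the same hash, and with a well-designed cryptographic hash function it is practically impossible to find two different inputs that share one. The value cannot realistically be predicted without knowing all of the data used to create it. The operations used to calculate the hash are complex and layered, which makes reverse-calculating the original data infeasible even for the most powerful computers. For a further look, check out this article.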

How does hashing work?

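The best known and most widely used hash functions belong to the SHA (Secure Hash Algorithm) family. In names such as SHA-256 and SHA-384, the number after the dash indicates the length in bits of the hash that the function generates: SHA-256 generates a hash of 256 bits, SHA-384 one of 384 bits, and so on. (SHA-1 is an exception: the 1 is a version number, and its hash is 160 bits long.) Let’s look at a specific example where we create a fingerprint for the words “hash” and “hesh”, which we pass to MD5, an older algorithm outside the SHA family that produces a 128-bit hash. MD5 is no longer considered secure against collisions, but its short output is convenient for illustration.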

MD5 (“hash”) = 0800FC577294C34E0B28AD2839435945
MD5 (“hesh”) = AD8E9EC499F16542D9AC8873DDEF9AFE

Shouldn’t the result be 128 characters long? These hashes are 32 characters long because they are written in hexadecimal notation. This means that each character can take one of 16 values (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E or F). Since bits are binary (they can only be 0 or 1), we need four bits to represent each character (2*2*2*2=2^4=16). In other words, if we multiply the length of the hex hash (32 characters) by the number of bits per character (4), we get 128 bits.

Now, notice that merely changing one letter (in this case, replacing “a” with “e”) generated a completely different fingerprint. The only thing we can deduce from the hashes themselves is the output length of the algorithm that generated them.
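The length arithmetic and the one-letter comparison above can be reproduced with Python's standard hashlib module. This is a minimal sketch: the variable names are illustrative, and it assumes a Python build where MD5 is available (some restricted builds disable it).

```python
import hashlib

# Hash two inputs that differ by a single letter.
digest_hash = hashlib.md5("hash".encode("utf-8"))
digest_hesh = hashlib.md5("hesh".encode("utf-8"))

# hexdigest() returns lowercase hex; upper() matches the notation used above.
hex_hash = digest_hash.hexdigest().upper()
hex_hesh = digest_hesh.hexdigest().upper()

# Each hex character encodes 4 bits, so 32 characters * 4 bits = 128 bits.
hex_length = len(hex_hash)
bit_length = hex_length * 4

# Count how many character positions differ between the two fingerprints.
differing_positions = sum(a != b for a, b in zip(hex_hash, hex_hesh))

# digest_size is reported in bytes; multiply by 8 to get the length in bits.
sha_bits = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha256", "sha384")
}

print("MD5(hash) =", hex_hash)
print("MD5(hesh) =", hex_hesh)
print("hex characters:", hex_length, "bits:", bit_length)
print("positions that differ:", differing_positions)
print("SHA output lengths in bits:", sha_bits)
```

Running it prints both fingerprints, so they can be compared by eye with the values listed above.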

Practical uses

A hash function can produce a relatively short, fixed-length hash for any arbitrarily long text, and in practice two different texts will not share the same hash. The original text cannot feasibly be retrieved from the hash. Programmers use these functions to check data integrity and to store passwords. In the first case, we calculate a hash for the data and store it in a safe place. When we need to verify that the data has not changed, we calculate the hash again. If the hashes do not match, the data has changed since the first hash was taken.
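The integrity check described above might look like the following sketch. SHA-256 is an assumed choice of algorithm, and the function names and sample data are hypothetical.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, expected_hash: str) -> bool:
    """Recalculate the hash and compare it with the stored one."""
    return hmac.compare_digest(fingerprint(data), expected_hash)


# Step 1: calculate the hash once and keep it somewhere safe.
original = b"Quarterly report, final version"
stored_hash = fingerprint(original)

# Step 2: later, recalculate and compare.
unchanged = is_unchanged(original, stored_hash)
tampered = is_unchanged(b"Quarterly report, final versiom", stored_hash)

print("original still matches:", unchanged)
print("altered copy matches:", tampered)
```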

In the second case, when a user sets a password, the user's computer calculates its hash, and the database receives and stores only that hash. When the user logs in afterwards, the password they enter is hashed and compared with the hash in storage. The advantage is that even if someone snoops on the data in transit, or is able to read the database, they see only the hash and cannot read the password itself from it.
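A sketch of the store-then-compare flow for passwords follows. Note that it swaps in a technique the text does not mention: instead of a single plain hash, it uses PBKDF2-HMAC-SHA256 with a random salt, a common way to make stored password hashes harder to guess. The iteration count, salt size, and function names are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor for this sketch; real deployments tune this value.
ITERATIONS = 200_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a password, creating a new salt if needed."""
    if salt is None:
        salt = os.urandom(16)  # random per-password salt
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(attempt: str, salt: bytes, stored_digest: bytes) -> bool:
    """Hash the login attempt with the stored salt and compare the results."""
    _, candidate = hash_password(attempt, salt)
    return hmac.compare_digest(candidate, stored_digest)


# When the password is set: only the salt and the digest are stored.
salt, stored_digest = hash_password("correct horse battery staple")

# When the user logs in: hash the attempt and compare.
print(verify_password("correct horse battery staple", salt, stored_digest))
print(verify_password("wrong guess", salt, stored_digest))
```

The password itself is never written to storage; only the salt and the digest are kept, which is the property the paragraph above relies on.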

More uses for the everyday user

An ordinary user may encounter hashes when downloading software from a server. The developer or distributor lists the hash of the original file on their site. When a user needs to download the program from an untrusted source, such as a mirror site, they should get the hash from the original website. After downloading the file, they must calculate the hash of the file they received and compare the two. If someone altered the file by, for example, adding malware (malicious code), the hashes will not match. For this to work, the user must always get the original hash from a trusted source. If the hash comes from the download server or someone associated with it, the listed hash could simply be the one corresponding to the altered file.
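Checking a downloaded file against a published hash could be sketched as follows. It assumes the publisher lists a SHA-256 hash in hexadecimal; the function names, the file name, and the placeholder value in the usage note are hypothetical.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the file's hash with the one copied from the trusted site."""
    # Published hashes may be upper case or carry stray whitespace.
    return file_sha256(path) == published_hex.strip().lower()
```

A typical call would be matches_published_hash("installer.bin", hash_from_original_site), where both arguments are placeholders for the downloaded file and the hash copied from the developer's own website.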

The last case we will list is when a user needs to verify that the data already stored on their own computer or server has not been altered. For this purpose, the user can install an integrity verification program that calculates the hashes of each of the selected files at one point in time. The user can then store these hashes on an offline medium, such as a USB stick, or on a secure server. When they later want to verify that nothing has changed a file, they can recalculate the hash of that file and compare it with the one they stored. As always, any discrepancy means that the file has changed.
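What such an integrity verification program does can be sketched in a few functions. This is not how any particular tool works; the JSON manifest format, the SHA-256 choice, and all function names are assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Iterable, List


def sha256_of(path: Path) -> str:
    """Hash one file (read whole into memory, for simplicity)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(paths: Iterable[str]) -> Dict[str, str]:
    """Record the current hash of every selected file."""
    return {str(p): sha256_of(Path(p)) for p in paths}


def save_manifest(manifest: Dict[str, str], destination: str) -> None:
    """Write the hashes to a file that can be moved to offline storage."""
    Path(destination).write_text(json.dumps(manifest, indent=2), encoding="utf-8")


def find_changes(manifest_file: str) -> List[str]:
    """Return the files that are missing or whose hash no longer matches."""
    manifest = json.loads(Path(manifest_file).read_text(encoding="utf-8"))
    changed = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != expected:
            changed.append(name)
    return changed
```

The manifest is only as trustworthy as the place it is kept, which is why the text above suggests an offline medium or a secure server.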