If you know some cryptography terminology, you have probably met the concepts of encryption, hashing and signing. Together they make up the technology that gives people secure means of storing and transmitting information. I have probably missed something, but these are the ones you run into every time you deal with modern forms of digital communication. Here I want to review them.

Modern-day cryptography relies heavily on asymmetric, or public-key, encryption algorithms, which use a public key and a private key. It is called asymmetric because the two keys perform different functions. The key pair is generated so that the keys are tied together mathematically: every time someone encrypts a message with your pre-shared public key, you, and only you, by design, can decrypt it with the private key from the same pair. This property makes it possible for anyone to send you messages that nobody else can read. An attacker can’t derive the private key from the public key alone. The scheme is still vulnerable to a MITM (man-in-the-middle) attack when the public key is not authenticated: the attacker swaps the genuine public key for his own and so intercepts the communication between the two endpoints completely. He receives the public key belonging to Alice (a very popular name in cryptography), generates his own key pair, substitutes his public key for Alice’s and gives it to Bob, the person Alice wants to communicate with. Bob encrypts his messages with the MITM’s public key and sends them off; the MITM can now read them because he owns the private key of that pair, and he can re-encrypt them with Alice’s real public key and pass them on so that nobody notices.
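To make the key-pair idea a bit more tangible, here is a minimal sketch of textbook RSA in Python. The primes, the exponent and the function names are toy values I picked for illustration; they are not taken from any real system, and numbers this small offer no security at all.

```python
# Toy textbook RSA: shows how a public and a private key are tied together.
# All numbers are illustrative assumptions and far too small for real use.
p, q = 61, 53                    # two secret primes
n = p * q                        # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                           # public exponent, coprime with phi
d = pow(e, -1, phi)              # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)              # handed out to anyone
private_key = (d, n)             # never leaves the owner


def encrypt(message, key):
    """Encrypt an integer smaller than n with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    """Recover the integer with the matching private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


ciphertext = encrypt(65, public_key)
recovered = decrypt(ciphertext, private_key)
print(ciphertext, recovered)
```

Anyone holding public_key can call encrypt, but only the holder of private_key gets the original number back. Real implementations use keys thousands of bits long plus a padding scheme, which this sketch leaves out.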

Public-key encryption contrasts with symmetric-key encryption, where a single key is used both to encrypt and to decrypt data. Encryption algorithms of this type were known long before asymmetric encryption emerged in the 1970s. The Caesar and Vigenère ciphers and the Enigma cipher machine are probably the best known in this field, but they can’t be used for reliable encryption nowadays. Symmetric-key encryption itself is still in use today, though: AES, for example, is one of the popular symmetric encryption algorithms. This scheme is also vulnerable to MITM, even more so than public-key ciphers, because the secret key itself has to be transmitted to the other side somehow.
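As a small illustration of "one key for both directions", here is a sketch of the Caesar cipher mentioned above. The function name and the sample message are my own choices; the last lines also show why the cipher is useless today, since there are only 26 keys to try.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar(text, key):
    """Shift every lowercase letter by `key` positions; other characters pass through."""
    shift = key % 26
    shifted = ALPHABET[shift:] + ALPHABET[:shift]
    return text.translate(str.maketrans(ALPHABET, shifted))


secret_key = 3
ciphertext = caesar("attack at dawn", secret_key)    # encrypt with the key
plaintext = caesar(ciphertext, -secret_key)          # decrypt with the same key

# Brute force: an attacker simply tries every possible key.
candidates = [caesar(ciphertext, -k) for k in range(26)]
print(ciphertext, "|", plaintext)
```

Both sides need the same secret_key, which is exactly the value that has to be delivered safely before any message can be sent.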

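You might wonder why anyone had to introduce public-key cryptography if it still can’t protect you from MITM attacks. The difference lies in what has to stay secret. If someone gets hold of the key of a symmetric encryption algorithm while it is being delivered, they can read and change messages between the communication points without raising suspicion. With a public-key algorithm only the public key travels, and it does not need to be secret; it only needs to be authentic. In short, public-key encryption provides better key distribution (see https://en.wikipedia.org/wiki/Public-key_cryptography#Description).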

The next important concept is hashing. Hashing means applying a hash algorithm to its argument (a digital file, a number, text, etc.) to produce a random-looking output value of fixed length. For example, SHA-256 generates a 256-bit value and MD5 a 128-bit value. Some criteria for a proper hashing algorithm:

  • It makes it impossible (very often, when you encounter this word in a cryptography context, it means “infeasible to do”) for anyone to recover the argument from the hash result alone;
  • It must not depend on random input, i.e. it should produce the same (deterministic) output every time you give it the same input;
  • It shouldn’t make hash collisions easy to find;
  • Any change to the message should significantly change the output hash value: the MD5 output for “SHA-256” is d03884c5866de30f7185fc25e69d9b9e, while “SHA-255” gives 5446bcd0d10495c03c8ea0d95edf25ee. Don’t be confused if you try it out with several tools and get different results: different character encodings and “hidden symbols” such as a trailing ASCII newline are to blame.
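Some of these criteria can be checked by hand with Python's standard hashlib module. The sketch below uses SHA-256 instead of MD5, and the helper names are my own; it prints the digests instead of quoting them so you can compare with whatever tool you use.

```python
import hashlib


def sha256_hex(text):
    """Hex digest of the UTF-8 encoding of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(a_hex, b_hex):
    """Number of bit positions in which two equal-length hex digests differ."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")


first = sha256_hex("SHA-256")
second = sha256_hex("SHA-255")           # one character changed
with_newline = sha256_hex("SHA-256\n")   # the "hidden symbol" case

print(first)
print(second)
print("bits changed:", differing_bits(first, second), "of 256")
print("same input, same output:", sha256_hex("SHA-256") == first)
print("trailing newline changes the hash:", with_newline != first)
```

The output length is always 64 hex characters (256 bits) no matter how long the input is, and the trailing newline alone is enough to give a completely different digest.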

The most important property of any hashing algorithm is that you can’t easily derive the data given to the hash function from its output value. The output should also be unpredictable until you actually compute it. This is how one-way functions work.

Now here’s a bit of math: there is a set A which contains every possible input for the hash algorithm, and there is a limited set B of possible outputs. The hash function maps data from the first set to the second. As B is limited, there are always going to be hash collisions, because we map an infinite amount of data onto a finite set. In other words, if a hash function were to map the words “cat”, “apple”, “sky”, “bee”… to just “aaa”, “aab” and “aac”, at least two words would inevitably share an output, for example “cat”->“aab” and “sky”->“aab”. This is exactly what a hash collision is. It is the task of the applied algorithm to make hash collisions infeasible to find.
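The counting argument is easy to reproduce if the output set is made tiny on purpose. In this sketch tiny_hash is a deliberately weak function I made up (the first byte of SHA-256), so it has only 256 possible outputs and any 257 distinct inputs must contain a collision.

```python
import hashlib


def tiny_hash(text):
    """First byte of SHA-256: only 256 possible outputs (deliberately weak)."""
    return hashlib.sha256(text.encode("utf-8")).digest()[0]


def find_collision(candidates):
    """Return (first_input, second_input, shared_hash) or None if no collision."""
    seen = {}
    for word in candidates:
        value = tiny_hash(word)
        if value in seen:
            return seen[value], word, value
        seen[value] = word
    return None


# 257 distinct inputs, 256 possible outputs: a collision is guaranteed.
collision = find_collision(f"word{i}" for i in range(257))
print(collision)
```

A real hash function faces the same guarantee; the only difference is that its output set is so large that searching for a collision like this is not feasible.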

Finally, signing is the procedure that allows a third party to verify that the information received is authentic and was not changed during transmission. This cryptographic routine serves as an authenticity and data-integrity check, whereas encryption on its own provides only confidentiality.

Let’s imagine you are a software vendor who is about to distribute software updates. To keep bad people from slipping malicious code into your update packages on their way to your clients, the clients have to be able to make sure that the packages were really released by you and weren’t changed in transit.

You could do it this way:

  1. Generate a key pair on your side and transmit your public key to the client. Signing always takes place in the context of asymmetric cryptography.

  2. Take a hash of the software update files and encrypt the hash with your private key; this is the step where you sign.

  3. Encrypt the software update files with either your client’s public key or a symmetric key you have already shared with him, and send the encrypted data to him together with the signature.

  4. Now your client decrypts the signature (the encrypted hash of the update files) with the public key you sent in step 1, and decrypts the software files with either his private key or the pre-shared symmetric key. Yes, the client decrypts the signature with your public key: in signing, the roles of the two keys in a pair are reversed relative to normal encryption.

Your client can now take the hash of the decrypted software files and compare it with the hash received in the signature. If they are equal, everything’s OK; otherwise the files or the signature were altered on the way, or they were not signed by you at all. By design, it is impossible for an eavesdropper to forge your signature because he only knows your public key, and he can’t derive your private key from that alone.
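The sign-and-verify part of these steps can be sketched with the same textbook RSA arithmetic. Everything here is an assumption made for illustration: the two primes are small Mersenne primes, the hash is reduced modulo N so it fits, there is no padding, and step 3 (encrypting the files themselves) is left out. It shows the roles of the keys and is not a scheme to deploy.

```python
import hashlib

# Toy key pair (illustrative assumption): two Mersenne primes as the factors.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                                # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))        # private exponent (Python 3.8+)


def digest_as_int(data):
    """SHA-256 of the data as an integer, reduced so it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data):
    """Vendor side: 'encrypt' the hash with the private exponent."""
    return pow(digest_as_int(data), D, N)


def verify(data, signature):
    """Client side: 'decrypt' the signature with the public exponent and compare hashes."""
    return pow(signature, E, N) == digest_as_int(data)


update = b"update-1.2.3 contents"
signature = sign(update)

print(verify(update, signature))                     # untouched package
print(verify(update + b" + malware", signature))     # package changed in transit
```

The client only ever needs E and N, so publishing them gives nobody the ability to produce a valid signature for a modified package.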

Encryption provides you and the party you are talking to with confidentiality: no one else can see what data you send to each other. Hashing helps with integrity; on its own it does not prove who produced the data, which is what signing adds on top. You could take a hash of every file on your filesystem and build a table whose entries consist of a file name and that file’s hash. If any file were altered later, you could recompute the hashes, compare them with the table and see the changes, as long as the table itself is kept somewhere an attacker can’t modify it. You can also be highly confident that no two different files would share the same hash value, because it should be too hard to produce a hash collision with a decent hashing algorithm.
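The file table described above might look like this in Python. The function names, the choice of SHA-256 and the throwaway directory used in demo are assumptions for the sake of the example.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old, new):
    """Names that were added, removed, or whose hash differs between two manifests."""
    names = set(old) | set(new)
    return sorted(name for name in names if old.get(name) != new.get(name))


def demo():
    """Create two files, record their hashes, alter one file, report the difference."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_text("hello")
        (root / "b.txt").write_text("world")
        before = build_manifest(root)
        (root / "b.txt").write_text("w0rld")     # simulated tampering
        after = build_manifest(root)
        return changed_files(before, after)


print(demo())
```

Reading whole files into memory is fine for a sketch; for large files you would feed the hash object in chunks instead.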

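These concepts also relate to the CIA triad: Confidentiality, Integrity, Availability. Authenticity is not one of the three letters, but it is often listed alongside them as an extension. The triad is implicitly used as a set of criteria when developing secure data storage and communication systems.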