08/06/2017

504 views

I decided that I really wanted to grok (fully understand) RSA and public key encryption. RSA encryption is enormously important
and is used for things like validating certificates in HTTPS requests, moving credit card information over the internet, and
encrypting messages in secure chat applications. This article is my attempt to put together a single source that describes
everything you need to know to **really** understand it.

We're going to start with a high level example where we
**encrypt**
a message using a **public key**, and then decrypt it using a **private key**. Then we'll move on to explain all the key math
concepts you'll need to understand to grok RSA encryption in its fullness. Be warned, there is quite a lot of math. After that
we'll talk about how we generate our public and private keys. To appreciate this you will find a basic understanding of some programming
language helpful. By the time you have finished reading this article you should have the understanding required to perform your own
encryption using RSA.

• High Level Overview

• One Way Functions

• Prime Factorisation

• Coprimes

• Euler's Totient

• Euler's Theorum

• Generating Our Keys

• One Way Functions

• Prime Factorisation

• Coprimes

• Euler's Totient

• Euler's Theorum

• Generating Our Keys

We start with two pairs of numbers, and we call each pair our keys. For the overview we are going to assume we already have our keys. First we take our **public key**, the pair
of numbers **(5, 14)**, and pass it to the person we want to be able to send us messages. They then use this key to encrypt their message. Before
they encrypt it however, they will have to turn the message into a number (or a list of numbers). This can be as simple as encoding each character in
your method as their set unicode number or something like that. Imagine for example they want to send the character **'B'** and they convert it to
a **2**. To encrypt their message they take this number and do the following:

The word **mod**, in case you are wondering, is the modulus operation for getting the remainder of division by the two numbers. In the above it
is essentially the same as getting the remainder of **32** divided by **14**. Our friend then sends us the number **4** and we
use our **private key**, the pair **(11, 14)**, to convert it back to its original value. We do that by doing the following;

Once we have this number back we just have to convert it back into characters again. **2** becomes the letter **'B'** again. If we
want to send our friend a message back they will have to send us their public key so we can do the same process the other way around. This
is secure because we never send the detail required to **decrypt** the message to anyone, so it can never be stolen. And for reasons we'll
discuss later, working out the details for decrypting the messages without the private key is almost impossible (at least in any sensible
amount of time). Simple right? Well maybe. If you're anything like me you will have loads of questions at this point. How do we get those
keys? And how exactly does that work? We'll get to that. But first there are a few concepts you'll want to understand.

A one way function can easily compute an output from some input, but reversing the process and getting the input back from the output is hard. Easy and hard are obviously relative terms, but in the context of functions we tend to mean that easy processes have low complexity and hard processes have high complexity. The one way function we use in RSA is multiplying two very large prime numbers, which is easy. The reverse process is very hard, and is called prime factorisation.

Every number can be created as the product of a single unique combination of prime numbers. If the number is a prime number, then its single unique combination of primes is simply itself. We call this unique combination a numbers prime factors. Below are a few examples:

4 = 2 × 2

5 = 5

6 = 2 × 3

54 = 2 × 3 × 3 × 3

210 = 2 × 3 × 5 × 7

126945253 = 11261 × 11273

5 = 5

6 = 2 × 3

54 = 2 × 3 × 3 × 3

210 = 2 × 3 × 5 × 7

126945253 = 11261 × 11273

The process of taking a number and finding its prime factors is called, guess what, prime factorisation. Prime factorisation is a very hard problem. So hard in fact that it could take a computer 600,000 years to work out the prime factors of a 300 digit number. If you had the prime factors however, it would only take a tiny fraction of a second to multiply them.

Coprimes are numbers that share no prime factors, or are not divisible by any of the same numbers except **1**. For example
**4** and **9** are coprimes because the prime factors of **4** are **2 × 2** and the prime factors of **9**
are **3 × 3**. A prime numbers is coprime with all other numbers except those that are divisible by itself. This
also means that a prime number is coprime with all numbers below its value. It is also worth noting that **1** is coprime
to all numbers because it cannot share prime factors with other numbers (because it has none) and it is not divisible by any of
the same numbers except itself.

A number's **totient** is the number of whole positive numbers below it that do not share any prime factors. In other words
the totient of a number is the number of coprimes below its value. The totient of a number **n** is often written **Φ(n)**.
Working out a number's totient is a hard problem, but in RSA encryption we rely on two quirks to make it easy for us. The totient
of a prime number is always just the prime number minus one. As described earlier all numbers below a prime are coprimes with it.
Also, the results of totient calculations are multiplicative, which means that:

This means that if **a** and **b** are primes we can also work it out like so:

By knowing the prime factors of a part of our keys in advance we can easily work out its totient value. This is extremely useful for us when we generate our public and private keys.

The totient of 12 is the number of numbers below 12 that are coprime.

11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1

The totient of 12 is 4.

The totient of 7 is the number of numbers below 7 that are coprime.

6, 5, 4, 3, 2, 1

The totient of 7 is 6, i.e. 7-1 because 7 is prime.

11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1

The totient of 12 is 4.

The totient of 7 is the number of numbers below 7 that are coprime.

6, 5, 4, 3, 2, 1

The totient of 7 is 6, i.e. 7-1 because 7 is prime.

Euler's theorum is key to understanding how RSA encryption works.
It explains why we can convert between our message and the encrypted version using our keys and the **mod** operator.
What Euler's theorum says is that if we have two coprimes **a** and **n**, the following is always true:

For any two coprimes **a** and **n**, **a** to the power of the totient of **n** divided by **n** leaves a remainder of one.
An interesting little quirk you might think, but this turns out to be extremely useful.

At the beginning we described passing our **public key** to another person so that they can encrypt and send us messages. We then
described using our **private key** to decrypt those messages. So our public key **(e, n)** and private key **(d, n)** are both a
pair of values. We use the letter **e** for our public key as an abbreviation of **encrypt**, and the letter **d** in our
private key as an abbreviation for **decrypt**. So let's go through how we get these keys. First we pick two huge prime numbers
with more than a hundred digits each and call them **p** and **q**. **n** then becomes the product of **p** and **q**:

Now we need the totient of **n** to work out the public and private components of our key **e** and **d**. Remembering our trick
from earlier, we can get the totient of **n** because we know its prime factors **p** and **q**:

We can now use our totient **Φ(n)** to find **e** for our public key. To find **e** we just needs some value that is a coprime
of (shares no common divisors but one) **Φ(n)** but that is greater than **1** and less than **Φ(n)**.

And lastly we need to find **d** for our private key that we keep to ourselves. To get **d** we need to find a value that satisfies the
following condition:

The value of **d** can be quite difficult to find. As such there are a number of different algorithms that are used to find the value of
**d**. Although you can find it manually you will want to use one of these algorithms, for example the
extended Euclidean algorithm.

p, q = 2 large primes

n = pq

Φ(n) = (p - 1)(q - 1)

e = coprime of n where 1 < e < Φ(n)

d = some d where de mod Φ(n) = 1

Encrypted Message C = M^{e} mod n

Decrypted Message M = C^{d} mod n

n = pq

Φ(n) = (p - 1)(q - 1)

e = coprime of n where 1 < e < Φ(n)

d = some d where de mod Φ(n) = 1

Encrypted Message C = M

Decrypted Message M = C

The point of all this is to transfer a message that cannot be read by anyone except the intended recipient. To use a common metaphor, its like sending someone a lock that they can use to lock their message in a box and send it to us. But we only send them the lock without the key. If anyone were to try and intercept this box on the way back to its intended recipient they would have no way of opening it.

In RSA the lock is **(d, n)** and the key is **(e, n)**. If an impostor wants to find **e** they would need to work out
the prime factors of **n** and go through the process of generating the keys fresh. But as we discussed, this is a very hard
problem indeed. Chances are they wouldn't live long enough for a computer to give them the answer.

I hope you've found this break-down helpful. If you don't grok it and have questions please pop them in the comments below this article and I will try and get back to you. Your questions may also help me to revise the article and make it clearer.