39. Cryptography
[status: content-mostly-written]
We will discuss a narrow slice of cryptography. We will start with a weak approach, and then look at how you could get a stronger encryption with key exchange and a pseudorandom number generator.
39.1. Preliminary: ASCII values
A key aspect of some of the techniques we will use is that each character correponds to an integer value. In English such integers are called the ASCII values of characters, and they are sequential for sequential letters. ASCII stands for “American Standard Code for Information Interchange”.
To familiarize yourself with the ASCII codes for each character take a look at the man page for ascii:
man ascii
and pay attention to the decimal and hex encoding of the various characters. You will recognize digits (ascii values 48 to 57), upper case letters (65 to 90) and lower case letters (97 to 122). There are also many standard keyboard symbols as well as control characters.
In Python you convert from a character to its ASCII value with
ord(ch)
. You can convert the other way from an ascii value to a
char with chr(num)
. For example, in the python interpreter you
could type:
>>> print(ord('e'))
>>> print(ord('r'))
>>> print(ord('r') - ord('e'))
>>> print(chr(97))
>>> print(chr(68))
## now look at the difference between digits and their ascii values
>>> print(chr(51))
>>> print(chr(51) - chr(49))
39.2. Weak crypto
We will look at two types of weak cyphers: Caesar and substitution cyphers. We will write programs that encode text with these cyphers, and then we will look at how to write programs to attack the cyphers and possibly decrypt messages.
39.2.1. A simple Caesar encryptor
In a Caesar code we shift each letter in the alphabet by a given number. If the shift is 5 then ‘a’ becomes ‘f’, ‘l’ becomes ‘q’, ‘x’ becomes ‘c’ and so forth. The world ‘hello’ rotated by 5 becomes ‘mjqqt’, and ‘hello’ rotated by 13 becomes ‘uryyb’.
This program will rotate its input by 13 characters. Our program will
shift the ascii values and then produce the corresponding characters.
Type the program caesar.py
in Listing 39.2.1 into a file
caesar.py
.
#! /usr/bin/env python3
import sys
def main():
shift = 13
encrypted_line = ''
fname = ''
if len(sys.argv) > 2:
print('error: usage is "%s [filepath]"' % sys.argv[0])
sys.exit(1)
elif len(sys.argv) == 2:
fname = sys.argv[1]
f = open(fname, 'r')
else:
fname = 'standard-input'
f = sys.stdin
## now go through the file, one character at a time
for c in f.read():
if c < 'a' or c > 'z': # we only handle 'a' <= c <= 'z'
encrypted_line = encrypted_line + c # add it with no encryption
continue
ascii_value = ord(c)
## find the new character by shifting this character
crypt_char = ascii_value + shift
if crypt_char > ord('z'):
## if we go beyond z we wrap back around to 'a'
crypt_char = (crypt_char - ord('a')) % 26 + ord('a')
# print(c, '->', crypt_char)
encrypted_line = encrypted_line + chr(crypt_char)
print('encrypted: ', encrypted_line)
main()
A very simple exercise: change caesar.py
to take an argument
which is the shift number for the Caesar cipher, that way it does
not have to be 13. You would see if sys.argv[1]
is set, and if
so use int(sys.argv[1])
as your offset.
As a slightly more elaborate exercise you could take caesar.py
and adapt it to handle upper case characters as well as lower case
ones.
39.2.2. Substitution ciphers
A simple substitution makes a table between clear text letters and encrypted letters. They don’t have have to be shifted by a fixed amount: the shift can be different for each character, but each clear text character will always map to the same secret encrypted letter.
Here is an example of substitution in which ‘a’ is mapped to ‘b’, ‘b’ to ‘o’, ‘c’ to ‘m’, ‘d’ to ‘l’, and so forth up to ‘z’ being mapped to ‘p’. There seems to be no rhyme or reason to the mapping. The word ‘hello’ woulud become ‘itxxn’:
abcdefghijklmnopqrstuvwxyz
bomltzhiqjgxwdnsvackeuyfrp
Note that the substitute characters do not have to be letters of the alphabet, as long as there is a bijective mapping (one-to-one correspondence) between the substitute characters and the letters of the alphabet.
A program which encrypts with that substitution cypher is
substitution.py
in Listing 39.2.2.
#! /usr/bin/env python3
import sys
def main():
original = list('abcdefghijklmnopqrstuvwxyz')
#substitute = list('bomltzhiqjgxwdnsvackeuyfrp')
substitute = list('omltzniqjgxwdhsvackeuyfrpb')
encrypted_line = ''
fname = ''
if len(sys.argv) > 2:
print('error: usage is "%s [filepath]"' % sys.argv[0])
sys.exit(1)
elif len(sys.argv) == 2:
fname = sys.argv[1]
f = open(fname, 'r')
else:
fname = 'standard-input'
f = sys.stdin
## now go through the file, one character at a time
for c in f.read():
if c < 'a' or c > 'z': # we only handle 'a' <= c <= 'z'
encrypted_line = encrypted_line + c
continue
## pos is the position in the original list
pos = ord(c) - ord('a')
## find the new character by taking that position in the
## substitute list
crypt_char = substitute[pos]
# print(c, '->', crypt_char)
encrypted_line = encrypted_line + crypt_char
print('encrypted: ', encrypted_line)
main()
39.2.3. A “literary” substitution cypher
Edgar Allan Poe wrote a short story called “The Gold-Bug” (see https://www.gutenberg.org/files/2147/2147-h/2147-h.htm#link2H_4_0006 and https://en.wikipedia.org/wiki/The_Gold-Bug and http://self.gutenberg.org/articles/eng/The_Gold-Bug )
In this story a Mr. Legrand is obsessed with decrypting a cryptogram purportedly written by the famous pirate Captain Kidd, giving directions to his mythical buried treasure. The cryptogram reads:
53‡‡†305))6*;4826)4‡.)4‡);806*;48†8
¶60))85;;]8*;:‡*8†83(88)5*†;46(;88*96
*?;8)*‡(;485);5*†2:*‡(;4956*2(5*—4)8
¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡
1;48†85;4)485†528806*81(‡9;48;(88;4
(‡?34;48)4‡;161;:188;‡?;
39.2.4. Preparing to attack substitution cyphers: frequency analysis
The field of cryptanalysis is more than a thousand years old. Most classical cyphers were substitution cyphers, and the method of frequency analysis is effective in attacking those.
To see how this works, let us remember the words of Mr. Legrand in Edgar Allan Poe’s “The Gold-Bug”:
“Now, in English, the letter which most frequently occurs is e. Afterwards, succession runs thus: a o i d h n r s t u y c f g l m w b k p q x z. E predominates so remarkably that an individual sentence of any length is rarely seen, in which it is not the prevailing character.
“Here, then, we leave, in the very beginning, the groundwork for something more than a mere guess. The general use which may be made of the table is obvious […]
The key words are “[…] groundwork for something more than a mere guess”: the happiest words for a code breaker to hear.
Let us make a histogram plot of the frequency with which letters occur in English. How do we make such a plot? And how do we get a statistically valid sample of English text so that we can count how often letters typically occur?
To make the plot we can write the program letter_frequency.py
in
Listing 39.2.3.
#! /usr/bin/env python3
import sys
def main():
fname = ''
if len(sys.argv) > 2:
print('error: usage is "%s [filepath]"' % sys.argv[0])
sys.exit(1)
elif len(sys.argv) == 2:
fname = sys.argv[1]
f = open(fname, 'r')
else:
fname = 'standard-input'
f = sys.stdin
letter_count = [0]*26
for c in f.read():
if not c.isalpha(): # special cases: character is not alphabetic
continue
if ord(c) > 127: # or not ascii
continue
## special case: character is uppercase, so we fold it down to
## lowercase
if c.isupper():
c = c.lower()
## now we have a lower chase char, so we find its position in
## the alphabet
pos = ord(c) - ord('a')
## now increase the letter_count histogram for that position
letter_count[pos] += 1
f.close()
print_frequency_histogram(fname, letter_count, offset=ord('a'))
def print_frequency_histogram(title, hist, offset=0):
print('====== frequency histogram for %s ======' % title)
for i in range(len(hist)): # simple ascii table+histogram
c = chr(i+offset)
if not (i + offset in range(ord('A'), 1+ord('Z'))
or i + offset in range(ord('a'), 1+ord('z'))
or i + offset in range(ord('0'), 1+ord('9'))):
c = ' '
print('%3d -- %s %8d -- '
% (i, c, hist[i]), end="")
# % (i, chr(i + ord('a')), hist[i]), end="")
## now print a histogram of the letter X, but scale it down so
## the max has some 50 Xs
n_Xs_to_print = float(hist[i]) * 50.0 / max(hist)
for j in range(int(n_Xs_to_print)):
print('X', end="")
print()
if __name__ == '__main__':
main()
Now let us analyze three different samples of English words with our
letter_frequency.py
program.
39.2.4.2. A work by Shakespeare
King Lear by William Shakespeare can be found at http://www.gutenberg.org/ebooks/1128 and the plain text version is at: http://www.gutenberg.org/cache/epub/1128/pg1128.txt
We can download its text and analyze it with:
wget --output-document king-lear.txt http://www.gutenberg.org/cache/epub/1128/pg1128.txt
python3 ./letter_frequency.py king-lear.txt
...
====== frequency histogram for king-lear.txt ======
0 -- a 8373 -- XXXXXXXXXXXXXXXXXXXXXXXXX
1 -- b 1734 -- XXXXX
2 -- c 2603 -- XXXXXXXX
3 -- d 4519 -- XXXXXXXXXXXXX
4 -- e 16158 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
5 -- f 2512 -- XXXXXXX
6 -- g 2580 -- XXXXXXX
7 -- h 7046 -- XXXXXXXXXXXXXXXXXXXXX
8 -- i 7300 -- XXXXXXXXXXXXXXXXXXXXXX
9 -- j 89 --
10 -- k 1290 -- XXX
11 -- l 5150 -- XXXXXXXXXXXXXXX
12 -- m 3243 -- XXXXXXXXXX
13 -- n 7316 -- XXXXXXXXXXXXXXXXXXXXXX
14 -- o 9752 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
15 -- p 1841 -- XXXXX
16 -- q 78 --
17 -- r 7703 -- XXXXXXXXXXXXXXXXXXXXXXX
18 -- s 7150 -- XXXXXXXXXXXXXXXXXXXXXX
19 -- t 10797 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
20 -- u 4494 -- XXXXXXXXXXXXX
21 -- v 480 -- X
22 -- w 2726 -- XXXXXXXX
23 -- x 120 --
24 -- y 2794 -- XXXXXXXX
25 -- z 34 --
39.2.4.3. A contemporary English book
Looking at the books on Project Gutenberg sorted by release date at
https://www.gutenberg.org/ebooks/search/?sort_order=release_date
we can pick “An Ocean Tragedy” by William Clark Russell and download its text and then run our program with:
wget --output-document ocean-tragedy.txt https://www.gutenberg.org/files/56363/56363-0.txt
python3 ./letter_frequency.py ocean-tragedy.txt
...
====== frequency histogram for ocean-tragedy.txt ======
0 -- a 64744 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
1 -- b 12390 -- XXXXXX
2 -- c 18785 -- XXXXXXXXX
3 -- d 34787 -- XXXXXXXXXXXXXXXXXX
4 -- e 95677 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
5 -- f 20728 -- XXXXXXXXXX
6 -- g 18775 -- XXXXXXXXX
7 -- h 53974 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXX
8 -- i 59395 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
9 -- j 973 --
10 -- k 7401 -- XXX
11 -- l 34322 -- XXXXXXXXXXXXXXXXX
12 -- m 19202 -- XXXXXXXXXX
13 -- n 54177 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXX
14 -- o 60199 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
15 -- p 13126 -- XXXXXX
16 -- q 820 --
17 -- r 44082 -- XXXXXXXXXXXXXXXXXXXXXXX
18 -- s 51953 -- XXXXXXXXXXXXXXXXXXXXXXXXXXX
19 -- t 71954 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
20 -- u 22794 -- XXXXXXXXXXX
21 -- v 6639 -- XXX
22 -- w 21089 -- XXXXXXXXXXX
23 -- x 1308 --
24 -- y 14615 -- XXXXXXX
25 -- z 586 --
39.2.4.4. Conclusion: signatures
Take a close look at the three histograms, the one from
/usr/share/dict/words
has a certain feeling
39.2.4.5. Exercises
Discuss with your fellow students why the histogram of letters in
/usr/share/dict/words
does not resemble that in actual books.
Are there significant differences in the letter frequency between Shakespeare and the modern book? Discuss this with your fellow students.
Study the differnce in letter frequency distributions between short and long bits of text. Take a few single-paragraph files and look at their letter distribution and compare it to what you got with a full book.
Look at books in other languages and see what the letter
distribution is there. Can you identify the language a book was
written in just by looking at the distribution of letters? Note:
the program letter_frequency.py
will run on a european language
file that has accented letters (like Italian or German or French)
which are not part of ASCII, but it will do so by dropping all
those characters. You will want to adapt it to handle those
special accented characters.
39.2.5. Applying the frequency analysis to a message
Let’s do this as a sequence of exercises:
First type a reasonably long message in English into a file. Then
run your caesar.py
or substitution.py
on that file to
encrypt it and save the output to another file.
Email that file to a “friend” and to a hypothetical “foe”. Only tell your friend what the Caesar shift or the substitution map is, and see if the foe can crack the code.
Alternate being writer, friend and foe so that everyone in your group gets to decrypt a message.
Is it easier to decode short or long messages?
The foe should try to crack the code without knowing what the
substitution table is. How can she use the letter_frequency.py
program to do this?
39.3. Strong crypto
We have seen how easy it is to break simple substitution cyphers. Those patterns in the distribution of letters are a clear sign that we are not encrypting very intelligently.
Between the 1880s and 1910s stronger methods of encryption were invented. The one time pad is a theoretically unbreakable method of encryption. Both sender and recipient have the same random sequence of symbols, and each character they send gets folded in some way with the next symbol in the one time pad.
We will implement something similar to the one time pad, but first we need to review some aspects of binary representation of numbers.
39.3.1. Binary numbers, XOR, hiding the byte
Remember that characters are represented by their ASCII codes. The ASCII code for ‘g’ is 103 in decimal, 0x67 in hex, and 01100111 in binary.
We want to find a way to fold ‘g’ with some random byte (remember: a byte is an 8-bit number). Let’s choose 01101010. We could fold the two together in many different ways, but we want to then be able to separate them back out.
The approach that is usually used is to use the XOR operator. What is this?
We will apply the various logic operators to bits: if 0 means false and 1 means true, then we have the following truth tables:
0 OR 0 = 0
0 OR 1 = 1
1 OR 0 = 1
1 OR 1 = 1
0 AND 0 = 0
0 AND 1 = 0
1 AND 0 = 0
1 AND 1 = 1
NOT 0 = 1
NOT 1 = 0
Sometimes people find the behavior of OR to be counter-intuitive because “1 OR 1 = 1”. People sometimes remember their parents saying “you can have cake OR you can have ice cream but not both”. That violates “1 OR 1 = 1”, but parents don’t usually accept this logic. For parents and cryptographers we have another logical operation called XOR (exclusive OR). The truth table for XOR is:
0 XOR 0 = 0
0 XOR 1 = 1
1 XOR 0 = 1
1 XOR 1 = 0
This was done one bit at a time. Suppose you have the character ‘g’ (01100111) and our pattern 01101010. The XOR of those two is done bit-by-bit:
01100111 ('g')
XOR
01101010 (number from one time pad)
|
v
00001101 (encrypted number)
The cool thing about the XOR function is that if you XOR with the same number again you get back the original: all those bits get flipped back:
01100111 XOR 01101010 -> 00001101 (the encrypted number)
00001101 XOR 01101010 -> 01100111 (back to 'g'!)
What does this mean? It means that if you know the secret number you can encrypt your numbers, and you can also decrypt them! And someone who does not have have the secret number cannot make any progress.
39.3.2. Manipulating bits in python
Python offers us the “bitwise xor” operator which uses the symbol
^
Let us see examples of how to use it:
## in the python interpreter, at the >>> prompt, type the following
x = int('01100111', 2)
y = int('01101010', 2)
z = x ^ y
print('%d ^ %d = %d' % (x, y, z))
print('0x%x ^ 0x%x = 0x%x' % (x, y, z))
print('%s ^ %s = %s' % (bin(x), bin(y), bin(z)))
print('to restore the original number:')
print('%s ^ %s = %s' % (bin(x), bin(z), bin(z^y)))
print(bin(x), bin(z^y), x == z^y)
39.3.3. Revisiting random number generators
We have explored random numbers before, but our angle here is slightly different. We are now interested in how two different people can get the same stream of random numbers.
First let us look at an example of how to get the first ten numbers in a stream. At the Python interpreter type:
import random
for i in range(10):
print(i, random.randint(0, 255))
Now run the same thing in a different terminal. You will see that the two streams are quite different, even though you are using the same random number generator in both examples.
The reason for the difference is that the starting point for the sequence is different. This starting point is called the random number seed, and by default it is set to a value determined by some real time event, like the current wall clock time.
But almost all random number generators allow you to set the seed to a given value, so now let us try running the following code in two different terminals:
import random
random.seed(7419)
for i in range(10):
print(i, random.randint(0, 255))
The output will be the same for both: when we set an identical seed, the random sequences are identical.
Note that the seed (which I set to 7419) could be any integer – each different seed will produce a different sequence, but the if you use that integer twice then you get the same sequence.
This setting of the random number seed will be crucial to allowing a person to decrypt our message.
39.3.4. Implementing tougher encryption
The idea is be to generate a stream of random numbers, use them to encrypt a stream of bytes with the XOR operation. Later we will use that same sequence to decrypt the encrypted bytes.
The program which will encrypt text is pad_encrypt.py
in
Listing 39.3.1.
#! /usr/bin/env python3
import sys
import random
def main():
## we need a known sequence, so we use a known random number seed
seed = 1234
random.seed(seed)
fname = ''
if len(sys.argv) > 2:
print('error: usage is "%s [filepath]"' % sys.argv[0])
sys.exit(1)
elif len(sys.argv) == 2:
fname = sys.argv[1]
f = open(fname, 'r')
else:
fname = 'standard-input'
f = sys.stdin
## now process the file
encrypted_text = ''
for c in f.read():
# print(c, ord(c), bin(ord(c)))
ascii_value = ord(c) # turn char into an integer
if ascii_value > 255:
continue
next_random = random.randint(0, 255) # get next random number
## find the new character by XORing it with the random number
crypt_char = ascii_value ^ next_random
## add the encrypted character to the stream of encrypted text
encrypted_text = encrypted_text + chr(crypt_char)
dump_bytes_as_hex(encrypted_text)
def dump_bytes_as_hex(bytes, show_ascii=False):
"""print out a stream of bytes as 2-digit hex bytes; optionally
also show if there are valid ASCII substrings"""
hex_line = ''
ascii_line = ''
for i in range(len(bytes)):
if i > 0 and i % 15 == 0:
print_line(hex_line, ascii_line, show_ascii)
hex_line = ''
ascii_line = ''
hex_line += ' 0x%02x' % ord(bytes[i])
if bytes[i].isalnum():
ascii_line += bytes[i]
else:
ascii_line += '.'
## print the final bytes
print_line(hex_line, ascii_line, show_ascii)
def print_line(hex_line, ascii_line, show_ascii):
print(hex_line, end="")
if show_ascii:
## attempt at being "clever": print 4 spaces plus some extra
## if the line is shorter, so that the printable part is
## always aligned
print(' '*(4 + 5*(15 - len(ascii_line))), end="")
print(ascii_line, end="")
print()
main()
You can look at the result of encrypting a small file by typing a
paragraph into a text file and running pad_encrypt.py filename
and
for very short snippets you can do something like echo "just three
words" | python3 pad_encrypt.py
and get:
0x8b 0x4e 0x70 0x5a 0x31 0x5e 0x5a 0xc7 0x1c 0x6d 0x2f 0x7f 0xde 0x85 0x89
0x3f 0x24
As you can see the program pad_encrypt.py
outputs the encrypted
bytes as hex numbers that look like 0x5a. This is because these
encrypted bytes would seldom be readable as part of a string: most of
them would not be ascii values for anything printable.
39.3.5. Decrypting this tougher encryption
The program to decrypt text encoded by pad_encrypt.py
is
pad_decrypt.py
in Listing 39.3.2.
#! /usr/bin/env python3
import sys
import random
def main():
## we need a known sequence, so we use a known random number seed
seed = 1234
random.seed(seed)
fname = ''
if len(sys.argv) > 2:
print('error: usage is "%s [filepath]"' % sys.argv[0])
sys.exit(1)
elif len(sys.argv) == 2:
fname = sys.argv[1]
f = open(fname, 'r')
else:
fname = 'standard-input'
f = sys.stdin
encrypted_letter_count = [0]*256
decrypted_letter_count = [0]*26
decrypted_text = ''
for line in f.readlines():
words = line.split()
## extract the encrypted characters
for word in words:
crypt_char = int(word, 16)
encrypted_letter_count[crypt_char] += 1 # track a histogram
next_random = random.randint(0, 255) # get next random number
ascii_value = crypt_char ^ next_random
if ascii_value >= ord('a') and ascii_value <= ord('z'):
decrypted_letter_count[ascii_value-ord('a')] += 1 # track a histogram
decrypted_char = chr(ascii_value)
## add the decrypted character to the stream of decrypted text
decrypted_text = decrypted_text + decrypted_char
from letter_frequency import print_frequency_histogram
print_frequency_histogram('king lear encrypted',
encrypted_letter_count)
print()
print_frequency_histogram('king lear decrypted',
decrypted_letter_count, offset=ord('a'))
## now write the output to a file ending in _decrypted
fout = fname + '_decrypted'
open(fout, 'w').write(decrypted_text)
print('output written to %s' % fout)
def dump_bytes_as_hex(bytes, show_ascii=False):
"""print out a stream of bytes as 2-digit hex bytes; optionally
also show if there are valid ASCII substrings"""
hex_line = ''
ascii_line = ''
for i in range(len(bytes)):
if i > 0 and i % 15 == 0:
print_stuff(hex_line, ascii_line, show_ascii)
hex_line = ''
ascii_line = ''
hex_line += ' 0x%02x' % ord(bytes[i])
if bytes[i].isalnum():
ascii_line += bytes[i]
else:
ascii_line += '.'
## print the final bytes
print_stuff(hex_line, ascii_line, show_ascii)
def print_stuff(hex_line, ascii_line, show_ascii):
print(hex_line, end="")
if show_ascii:
print(' '*(4 + 5*(15 - len(ascii_line))), end="")
print(ascii_line, end="")
print()
if __name__ == '__main__':
main()
You will notice that pad_decrypt.py
has an interesting feature: it
uses the function print_frequency_histogram()
from
letter_frequency.py
to show us the frequency histogram for both
the encrypted and decrypted text:
...
====== frequency histogram for king lear encrypted ======
0 -- 611 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
1 -- 592 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
2 -- 586 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
3 -- 611 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
4 -- 628 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
5 -- 623 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
6 -- 628 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
7 -- 607 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
8 -- 633 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
9 -- 615 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
10 -- 631 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
...
249 -- 611 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
250 -- 584 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
251 -- 599 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
252 -- 641 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
253 -- 615 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
254 -- 597 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
255 -- 641 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
====== frequency histogram for king lear decrypted ======
0 -- a 7882 -- XXXXXXXXXXXXXXXXXXXXXXXXX
1 -- b 1345 -- XXXX
2 -- c 2221 -- XXXXXXX
3 -- d 4274 -- XXXXXXXXXXXXX
4 -- e 15706 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
5 -- f 2050 -- XXXXXX
6 -- g 2064 -- XXXXXX
7 -- h 6733 -- XXXXXXXXXXXXXXXXXXXXX
8 -- i 6345 -- XXXXXXXXXXXXXXXXXXXX
9 -- j 85 --
10 -- k 1039 -- XXX
11 -- l 4586 -- XXXXXXXXXXXXXX
12 -- m 2947 -- XXXXXXXXX
13 -- n 7042 -- XXXXXXXXXXXXXXXXXXXXXX
14 -- o 9503 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
15 -- p 1602 -- XXXXX
16 -- q 61 --
17 -- r 7450 -- XXXXXXXXXXXXXXXXXXXXXXX
18 -- s 6516 -- XXXXXXXXXXXXXXXXXXXX
19 -- t 10002 -- XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
20 -- u 4442 -- XXXXXXXXXXXXXX
21 -- v 413 -- X
22 -- w 2320 -- XXXXXXX
23 -- x 118 --
24 -- y 2663 -- XXXXXXXX
25 -- z 31 --
output written to king-lear-encrypted-pad.txt_decrypted
You should be delighted at how the encrypted text has no discernible pattern in its frequency distribution, while the decrypted version has the pattern we are familiar with.
This means that one cannot decrypt this code using frequency analysis.
39.4. Further reading
39.4.1. Technical details
39.4.2. Historical
https://en.wikipedia.org/wiki/Venona_project which contains the statement “[…] Due to a serious blunder on the part of the Soviets, some of this traffic was vulnerable to cryptanalysis. The Soviet company that manufactured the one-time pads produced around 35,000 pages of duplicate key numbers, as a result of pressures brought about by the German advance on Moscow during World War II. […]”
“The Code Book”, by Simon Singh
39.4.3. Videos
Crash Course https://www.youtube.com/watch?v=jhXCTbFnK8o
Cris Moore’s talk on cryptography: https://www.youtube.com/watch?v=3J2RHGVf0ho