from __init__ import install_dependencies
await install_dependencies()
%reload_ext divewidgets
Python is a popular tool among hackers and engineers. In this lab, you will learn Cryptology in cybersecurity, which covers:
- Cryptography: Encryption and decryption using a cipher.
- Cryptanalysis: Devising an attack to break a cipher.
Symmetric key cipher¶
We first implement a simple cipher known as the Caesar cipher. Named after Julius Caesar, who used it to protect his private correspondence, this cipher involves shifting letters in the alphabet by a fixed number of positions:
The Caesar cipher
Encrypt/decrypt a character¶
How to encrypt a character?
The following code encrypts a character char
using a non-negative integer key
.
cc_n = 1114112
def cc_encrypt_character(char, key):
"""
Return the encryption of a character by an integer key using Caesar cipher.
Parameters
----------
char: str
a unicode (UTF-8) character to be encrypted.
key int:
secret key to encrypt char.
"""
char_code = ord(char)
shifted_char_code = (char_code + key) % cc_n
encrypted_char = chr(shifted_char_code)
return encrypted_char
For example, to encrypt the letter 'A'
using a secret key 5
:
cc_encrypt_character("A", 5)
The character 'A'
is encrypted to the character 'F'
as follows:
ord(char)
return the integer65
, which is the code point (integer representation) of the unicode of'A'
.(char_code + key) % cc_n
cyclic shifts the code by the key5
.chr(shifted_char_code)
converts the shifted code back to a character, which is'F'
.
Encryption steps | ||||||||
---|---|---|---|---|---|---|---|---|
char | ... | A | B | C | D | E | F | ... |
ord(char) | ... | 65 | 66 | 67 | 68 | 69 | 70 | ... |
(ord(char) + key) % cc_n | ... | 70 | 71 | 72 | 73 | 74 | 75 | ... |
chr((ord(char) + key) % cc_n) | ... | F | G | H | I | J | K | ... |
You may learn more about ord
and chr
from their docstrings:
help(ord)
help(chr)
How to decrypt a character?
Mathematically, we define the encryption and decryption of a character for Caesar cipher as
where is the character code and is the secret key, both of which are in . mod
operator above is the modulo operator. In Mathematics, it has a lower precedence than addition and multiplication and is typeset with an extra space accordingly.
The encryption and decryption satisfy the recoverability condition
so two people with a common secret key can encrypt and decrypt a character, but others without the key cannot. This defines a symmetric cipher.
The following code decrypts a character using a key.
def cc_decrypt_character(char, key):
"""
Return the decryption of a character by the key using Caesar cipher.
Parameters
----------
char: str
a unicode (UTF-8) character to be decrypted.
key: int
secret key to decrypt char.
"""
char_code = ord(char)
shifted_char_code = (char_code - key) % cc_n
decrypted_char = chr(shifted_char_code)
return decrypted_char
For instance, to decrypt the letter 'F'
by the secret key 5
:
cc_decrypt_character("F", 5)
The character 'F'
is decrypted back to 'A'
because
(char_code - key) % cc_n
reverse cyclic shifts the code by the key 5
.
Encryption steps | Decryption steps | ||||||||
---|---|---|---|---|---|---|---|---|---|
char | ... | A | B | C | D | E | F | ... | chr((ord(char) - key) % cc_n) |
ord(char) | ... | 65 | 66 | 67 | 68 | 69 | 70 | ... | (ord(char) - key) % cc_n |
(ord(char) + key) % cc_n | ... | 70 | 71 | 72 | 73 | 74 | 75 | ... | ord(char) |
chr((ord(char) + key) % cc_n) | ... | F | G | H | I | J | K | ... | char |
YOUR ANSWER HERE
Encrypt a plaintext and decrypt a ciphertext¶
Of course, it is more interesting to encrypt a string instead of a character. The following code implements this in one line.
def cc_encrypt(plaintext, key):
"""
Return the ciphertext of a plaintext by the key using the Caesar cipher.
Parameters
----------
plaintext: str
A unicode (UTF-8) message to be encrypted.
public_key: int
Public key to encrypt plaintext.
"""
return "".join([chr((ord(char) + key) % cc_n) for char in plaintext])
The above function encrypts a message, referred to as the plaintext, by replacing each character with its encryption. This is referred to as a substitution cipher.
def cc_decrypt(ciphertext, key):
"""
Return the plaintext that encrypts to ciphertext by the key using Caesar cipher.
Parameters
----------
ciphertext: str
message to be decrypted.
key: int
secret key to decrypt the ciphertext.
"""
# YOUR CODE HERE
raise NotImplementedError
Source
# tests
assert cc_decrypt(r"bcdefghijklmnopqrstuvwxyz{", 1) == "abcdefghijklmnopqrstuvwxyz"
assert cc_decrypt(r"Mjqqt1%\twqi&", 5) == "Hello, World!"
Source
# hidden tests
Another symmetric key cipher is the columnar transposition cipher. A transposition cipher encrypts a text by permuting instead of substituting characters.
# YOUR CODE HERE
raise NotImplementedError
Source
# tests
key = "ZEBRAS"
plaintext = "WEAREDISCOVEREDFLEEATONCE"
ciphertext = "EVLNACDTESEAROFODEECWIREE"
assert ct_encrypt(plaintext, key) == ciphertext
assert ct_decrypt(ciphertext, key) == plaintext
Source
# hidden tests
Brute-force attack¶
A result from the work of Claude Shannon, the Father of Information Theory, work is that the security of a secrecy system is fundamentally linked to the length of the key. A brute-force attack is a method of defeating a cryptographic system by systematically trying all possible keys until the correct one is found. The feasibility of a brute-force attack depends on the length of the key and the computational power available to the attacker.
Create an English dictionary¶
You will perform a brute-force attack to guess the key that encrypts an English text. The process is straightforward:
- Attempt to decrypt the ciphertext using various keys.
- Evaluate which of the resulting plaintexts is the most English-like.
To check whether a plaintext is English-like, we need to have a list of English words. One way is to type them out but this is tedious. Alternatively, we can obtain the list from the Natural Language Toolkit (NLTK):
%pip install nltk
import nltk
nltk.download("words")
from nltk.corpus import words
words.words()
returns a list of words. We can check whether a string is in the list using the operator in
.
for word in "Ada", "ada", "Hello", "hello":
print("{!r} in dictionary? {}".format(word, word in words.words()))
However, there are two issues:
- Checking membership is slow for a long list.
- Both ‘Hello’ and ‘ada’ are English-like but not in the words list.
# YOUR CODE HERE
raise NotImplementedError
Source
# tests
assert isinstance(dictionary, set) and len(dictionary) == 234377
assert all(word in dictionary for word in ("ada", "hello"))
assert all(word not in dictionary for word in ("Ada", "hola"))
Source
# hidden tests
Identify English-like text¶
To determine how English-like a text is, we calculate the following score:
where tokens are substrings, not necessarily English words, separated by white space characters in the text.
def tokenizer(text):
"""Returns the list of tokens of the text."""
return text.split()
def get_score(text):
"""Returns the fraction of tokens which appear in dictionary."""
tokens = tokenizer(text)
words = [token for token in tokens if token in dictionary]
return len(words) / len(tokens)
# tests
get_score("hello world"), get_score("Hello, World!")
As shown in the tests above, the code fails to handle text with punctuations and uppercase letters properly.
In particular,
- while
get_score
recognizeshello world
as English-like and returns the maximum score of 1, - it fails to recognize
Hello, World!
as English-like and returns the minimum score of 0.
Why? Every word in dictionary
- are in lowercase, and
- have no leading/trailing punctuations.
import string
def tokenizer(text):
"""Returns the list of tokens of the text such that
1) each token has no leading or trailing spaces/punctuations, and
2) all letters in each token are in lowercase."""
# YOUR CODE HERE
raise NotImplementedError
Source
# tests
assert tokenizer("Hello, World!") == ["hello", "world"]
assert get_score("Hello, World!") >= 0.99999
assert tokenizer("Do you know Jean-Pierre?") == ["do", "you", "know", "jean-pierre"]
assert get_score("Do you know Jean-Pierre?") >= 0.99999
Source
# hidden tests
Launch a brute-force attack¶
def cc_attack(ciphertext, threshold=0.6):
"""Returns a generator that generates the next guess of the key that
decrypts the ciphertext to a text with get_score(text) at least the threshold.
"""
# YOUR CODE HERE
raise NotImplementedError
Source
# tests
ciphertext = cc_encrypt("Hello, World!", 12345)
key_generator = cc_attack(ciphertext)
key_guess = next(key_generator)
assert key_guess == 12345
text = cc_decrypt(ciphertext, key_guess)
print("guess of the key: {}\nscore: {}\ntext :{}".format(key_guess, get_score(text), text))
Source
# hidden tests
- Shannon, C. E. (1949). Communication Theory of Secrecy Systems*. Bell System Technical Journal, 28(4), 656–715. 10.1002/j.1538-7305.1949.tb00928.x