{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "6saJj6PoXkOf"
},
"source": [
"# Cryptography"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Aue46UK2YMW6"
},
"source": [
"**Why cryptography?**\n",
"- Cryptographic analysis provides distinct advantages over some more en vogue computational neuroscience methods, like deep learning\n",
"  - It sits at a crucial point between supervised and unsupervised learning: you don't need training labels, but you still draw on pre-existing knowledge about the structure of information in the world\n",
"- Cryptographic analysis centers on using the underlying non-uniform distribution of data in the world to decode information\n",
"  - This is something humans naturally do to learn -- e.g. children use morpheme co-occurrence rates to learn language\n",
"  - On a similar note, human adults are able to remember larger numbers of social relationships when those relationships have properties that make them representable in fewer bits\n",
"\n",
"**What will we cover in this notebook?**\n",
"- A hands-on example of using cryptographic analysis to break a substitution cipher in English"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Run the cell below to download data and prepare our environment!"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 86
},
"id": "dinSRX3QXKn-",
"outputId": "bf98b7a0-059b-46c1-f0c4-3eba91941619"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[nltk_data] Downloading package words to /Users/f004p57/nltk_data...\n",
"[nltk_data] Package words is already up-to-date!\n",
"[nltk_data] Downloading package gutenberg to\n",
"[nltk_data] /Users/f004p57/nltk_data...\n",
"[nltk_data] Package gutenberg is already up-to-date!\n"
]
}
],
"source": [
"#file imports\n",
"import nltk\n",
"from nltk.corpus import words as wordlist\n",
"import re\n",
"import pandas\n",
"from collections import Counter\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import random\n",
"import os\n",
"import copy\n",
"from IPython.display import HTML, display\n",
"import string\n",
"\n",
"#set the hash seed and download corpora\n",
"os.environ['PYTHONHASHSEED'] = '0'\n",
"nltk.download('words')\n",
"nltk.download('gutenberg')\n",
"\n",
"#make sure the display fits the width of the page\n",
"def set_css():\n",
"    display(HTML('''\n",
"    \n",
"    '''))\n",
"get_ipython().events.register('pre_run_cell', set_css)\n",
"\n",
"#count the letters a-z in a text, optionally plotting their relative frequencies\n",
"def get_letter_distribution(text, plot=False, title='', zipfian=False):\n",
"    text = re.sub(r'[^a-z]', '', text.lower())\n",
"    letter_counts = Counter(sorted(text))\n",
"    if plot:\n",
"        if zipfian:\n",
"            #order the bars from most to least frequent letter\n",
"            letters, vals = zip(*letter_counts.most_common())\n",
"            letters = np.array(letters)\n",
"            vals = np.array(vals).astype('float64')\n",
"            vals /= sum(vals)\n",
"            plt.bar(letters, vals)\n",
"        else:\n",
"            plt.bar(list(letter_counts.keys()), np.array(list(letter_counts.values()))/len(text))\n",
"        plt.ylim([0,0.2])\n",
"        plt.xlabel('letter')\n",
"        plt.ylabel('fraction of all letters')\n",
"        plt.title(title)\n",
"        plt.show()\n",
"    return letter_counts\n",
"\n",
"#build a random substitution cipher covering lowercase and uppercase letters\n",
"alphabet = list(string.ascii_lowercase)\n",
"random.shuffle(alphabet)\n",
"cipher = dict(zip(sorted(alphabet), alphabet))\n",
"for letter in alphabet:\n",
"    cipher[letter.upper()] = cipher[letter].upper()\n",
"\n",
"#encrypt the text, leaving non-letter characters untouched\n",
"original_quote = nltk.corpus.gutenberg.raw(nltk.corpus.gutenberg.fileids()[1])\n",
"encrypted_quote = ''\n",
"for letter in original_quote:\n",
"    if letter in cipher.keys():\n",
"        encrypted_quote+=(cipher[letter])\n",
"    else:\n",
"        encrypted_quote+=(letter)\n",
"\n",
"#apply a ciphertext-to-plaintext mapping to the encrypted text\n",
"def decrypt(encrypted_quote, reverse_cipher):\n",
"    unencrypted_quote = ''\n",
"    for letter in encrypted_quote:\n",
"        if letter in reverse_cipher.keys():\n",
"            unencrypted_quote+=(reverse_cipher[letter])\n",
"        else:\n",
"            unencrypted_quote+=(letter)\n",
"    return unencrypted_quote\n",
"\n",
"#map the two most common single-letter words to 'a' and 'i'\n",
"def fix_single_letter_words(encrypted_quote, reverse_cipher):\n",
"    single_letter_words = []\n",
"    for word in re.sub(r'[^a-zA-Z ]', '', encrypted_quote.lower()).split():\n",
"        if len(word) == 1:\n",
"            single_letter_words.append(word)\n",
"    (first, count1), (second, count2) = Counter(single_letter_words).most_common()[0:2]\n",
"    if reverse_cipher[first] != 'a':\n",
"        #whichever key currently decodes to 'a' takes over the old value of first\n",
"        for letter in reverse_cipher.keys():\n",
"            if reverse_cipher[letter] == 'a':\n",
"                reverse_cipher[letter] = reverse_cipher[first]\n",
"            if reverse_cipher[letter] == 'A':\n",
"                reverse_cipher[letter] = reverse_cipher[first].upper()\n",
"        reverse_cipher[first] = 'a'\n",
"        reverse_cipher[first.upper()] = 'A'\n",
"    if reverse_cipher[second] != 'i':\n",
"        for letter in reverse_cipher.keys():\n",
"            if reverse_cipher[letter] == 'i':\n",
"                reverse_cipher[letter] = reverse_cipher[second]\n",
"            if reverse_cipher[letter] == 'I':\n",
"                reverse_cipher[letter] = reverse_cipher[second].upper()\n",
"        reverse_cipher[second] = 'i'\n",
"        reverse_cipher[second.upper()] = 'I'\n",
"    return reverse_cipher\n",
"\n",
"#fraction (between 0 and 1) of recovered words found in the English word list\n",
"english = set(wordlist.words())\n",
"def get_percent_real(recovered):\n",
"    reals = [word in english for word in re.sub(r'[^a-zA-Z ]', '', recovered.lower()).split()]\n",
"    percent_real = sum(reals)/len(reals)\n",
"    return percent_real\n",
"\n",
"#return a copy of the mapping with the values of L1 and L2 exchanged\n",
"def swap_keys(reverse_cipher, L1, L2):\n",
"    updated = copy.deepcopy(reverse_cipher)\n",
"    L1V, L2V = reverse_cipher[L1], reverse_cipher[L2]\n",
"    updated[L1], updated[L2] = L2V, L1V\n",
"    updated[L1.upper()], updated[L2.upper()] = L2V.upper(), L1V.upper()\n",
"    return updated\n",
"\n",
"#same as swap_keys, but modifies the mapping directly\n",
"def swap_keys_inplace(reverse_cipher, L1, L2):\n",
"    L1V, L2V = reverse_cipher[L1], reverse_cipher[L2]\n",
"    reverse_cipher[L1], reverse_cipher[L2] = L2V, L1V\n",
"    reverse_cipher[L1.upper()], reverse_cipher[L2.upper()] = L2V.upper(), L1V.upper()\n",
"\n",
"\n",
"#try swapping letter pairs with similar frequencies, keeping swaps that raise the score\n",
"def improve_deciphering(second_pass, reverse_cipher, percent_real_before):\n",
"    c = get_letter_distribution(second_pass)\n",
"    #pairwise frequency differences; unfilled entries stay at 1 so they sort last\n",
"    dists = np.ones((26,26))\n",
"    for i in range(26):\n",
"        for j in range(i+1,26):\n",
"            dists[i,j] = abs(c[alphabet[i]] - c[alphabet[j]])/sum(c.values())\n",
"    percent_real = percent_real_before\n",
"    indeces = np.argsort(dists.flatten())\n",
"    while percent_real < 0.81:\n",
"        for idx in indeces:\n",
"            i,j=np.unravel_index(idx, (26,26))\n",
"            L1, L2 = alphabet[i], alphabet[j]\n",
"            updated = swap_keys(reverse_cipher, L1, L2)\n",
"            unencrypted_quote = ''\n",
"            for letter in encrypted_quote:\n",
"                if letter in updated.keys():\n",
"                    unencrypted_quote+=(updated[letter])\n",
"                else:\n",
"                    unencrypted_quote+=(letter)\n",
"            reals = [word in english for word in re.sub(r'[^a-zA-Z ]', '', unencrypted_quote.lower()).split()]\n",
"            percent_real = sum(reals)/len(reals)\n",
"            print(percent_real)\n",
"            if percent_real > percent_real_before+(0.82-percent_real_before)/4:\n",
"                print(percent_real, '\\t', L1, '\\t',L2,'\\n\\n')\n",
"                print(re.sub(r'[^a-zA-Z ]', '', unencrypted_quote.lower())[35:776])\n",
"                swap_keys_inplace(reverse_cipher, L1, L2)\n",
"                percent_real_before = percent_real\n",
"                break\n",
"        else:\n",
"            #no swap improved the score enough, so stop instead of looping forever\n",
"            break\n",
"    return reverse_cipher"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZXKEjXeO0Sgg"
},
"source": [
"Let's use some cryptanalysis to solve a letter substitution cipher. The text below is a passage from a classic novel in which each letter has been replaced with another letter from the English alphabet, with unique pairings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 329
},
"id": "Ta1c1ZqS1qn2",
"outputId": "3d60dcc9-772a-4fec-c5b5-1d55c361f4a0"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Skxdbjf 1\n",
"\n",
"\n",
"Rcf Yxabjf Jaacpb, pq Tjaangsk Kxaa, cg Rpijfrjbrkcfj, yxr x ixg ykp,\n",
"qpf kcr pyg xihrjijgb, gjzjf bppt hd xgn oppt ohb bkj Oxfpgjbxmj;\n",
"bkjfj kj qphge psshdxbcpg qpf xg ceaj kphf, xge spgrpaxbcpg cg x\n",
"ecrbfjrrje pgj; bkjfj kcr qxshabcjr yjfj fphrje cgbp xeicfxbcpg xge\n",
"fjrdjsb, on spgbjidaxbcgm bkj acicbje fjigxgb pq bkj jxfacjrb dxbjgbr;\n",
"bkjfj xgn hgyjaspij rjgrxbcpgr, xfcrcgm qfpi epijrbcs xqqxcfr\n",
"skxgmje gxbhfxaan cgbp dcbn xge spgbjidb xr kj bhfgje pzjf\n",
"bkj xaiprb jgeajrr sfjxbcpgr pq bkj axrb sjgbhfn; xge bkjfj,\n",
"cq jzjfn pbkjf ajxq yjfj dpyjfajrr, kj sphae fjxe kcr pyg kcrbpfn\n",
"ycbk xg cgbjfjrb ykcsk gjzjf qxcaje. Bkcr yxr bkj dxmj xb ykcsk\n",
"bkj qxzphfcbj zpahij xayxnr pdjgje:\n",
"\n",
" \"JAACPB PQ TJAANGSK KXAA.\n",
"\n",
"\"\n"
]
}
],
"source": [
"print(encrypted_quote[35:776])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "a_CL2tY92Q6e"
},
"source": [
"If we wanted to decrypt this, we could try a few approaches. We could brute force it, trying every possible assignment of unique letter replacement pairs. This comes out to $26! \\approx 4.03 \\times 10^{26}$ potential combinations, which would take a prohibitively long time to run.\n",
"\n",
"A more fruitful approach might be to use the underlying statistics of the English language to help us seed this process with educated guesses. Let's take a look at how often particular letters show up in English text (as approximated by a collection of classic novels from Project Gutenberg)."
]
},
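{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick aside before the plots: the sketch below puts the brute-force estimate in perspective by computing $26!$ exactly and converting it into a rough search time. The rate of one billion keys per second is an assumed, purely illustrative figure, and `n_keys`, `keys_per_second`, and `years` are names introduced only for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"# every key is a permutation of the 26 letters\n",
"n_keys = math.factorial(26)\n",
"\n",
"# ASSUMPTION: an illustrative testing rate, not a measured one\n",
"keys_per_second = 1e9\n",
"seconds_per_year = 60 * 60 * 24 * 365\n",
"years = n_keys / keys_per_second / seconds_per_year\n",
"\n",
"print(f'26! = {n_keys:.3e} possible keys')\n",
"print(f'roughly {years:.1e} years at {keys_per_second:.0e} keys per second')"
]
},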
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 851
},
"id": "BVbthXmHutQJ",
"outputId": "9d8570fe-be96-43e6-a816-55128bd08088"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"for fileid in nltk.corpus.gutenberg.fileids()[-3:]:\n",
"    words = re.sub(r'[^a-z]', '', nltk.corpus.gutenberg.raw(fileid).lower())\n",
"    get_letter_distribution(words, plot=True, title=fileid)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kXcnEeaj0UyB"
},
"source": [
"From the above distributions, you can see that some letters are consistently used more frequently than others across these texts. Let's take a look at the distribution defined by the average over all of the texts in the corpus:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 573
},
"id": "YO6n9vk_0Wwk",
"outputId": "10d1e8ad-4050-4007-ab6d-2d36f031a961"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"words = ''\n",
"for fileid in nltk.corpus.gutenberg.fileids():\n",
"    words += re.sub(r'[^a-z]', '', nltk.corpus.gutenberg.raw(fileid).lower())\n",
"get_letter_distribution(words, plot=True, title='Averaged Distribution');\n",
"get_letter_distribution(words, plot=True, title='Averaged Distribution; Re-Ordered', zipfian=True);"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rfY5CytS1EQU"
},
"source": [
"With this knowledge about expected patterns of letter occurrences, let's see how they compare to the statistics of our encrypted text."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 573
},
"id": "c7Mqrp-46HCV",
"outputId": "2e96736b-155f-4ec0-938a-81e905f3cec1"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEWCAYAAABxMXBSAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3debxcRZ338c+XsC8BWVQIhLAEEXBEuQFcQEDAqGzPGCDKOoNmGIy4DD6GUREz+AjqjI6KSpBdFBAEgoJBQRGRJQtLFo2EsOSGIJCwL4GE3/NH1YWTzr23z0n63HRyv+/Xq1/pU6equk6nb//6VJ2qo4jAzMysrNVWdAPMzGzl4sBhZmaVOHCYmVklDhxmZlaJA4eZmVXiwGFmZpU4cNhKTdIQSSFp9eWo4yeSvtqi9gyW9LykAXn7j5I+2Yq6c303SDquVfWtbCQdL+nPffRa+0jq7IvXWtk4cKzCJD0k6aX8Rdb1+OGKbleRpNMl/azG+rveg+ckPS3pL5JOlPT6Zz8iToyI/ypZ1/695YmIRyJi/YhY3IK2L/XeRMSHI+Ki5a27xGtfKOmV/JlZIOl3knZczjoPknSXpBckzZd0qaQtW9Vm6zsOHKu+g/MXWddjdCsrX55f+n3o4IjYANgaOBP4EnBeq19kJXkvqvhWRKwPDALmshzvmaQRwM+B7wGbAjsDC4E/S3pTD2Va+n52nQXa8nPg6Ke6TvklfUfSU5IelPThwv6NJV0g6dG8/5qcvo+kTklfkvQYcIGkaZIOLpRdQ9KTkt5V6EoaleuaJ+mUnG848J/AkfmX7b05fUNJ5+W8cyWdUej6GZDb/KSk2cBHyx5zRDwTEeOBI4HjJO2S67xQ0hn5+aaSfp3PThZIulXSapIuAQYD1+W2/t/CsZ0g6RHg5h66zrbLv7SflXStpI2L72XD/8tDkvbv5b15vesrt+srkh6W9LikiyVtmPd1teM4SY/k9+vLhdd5v6SnS75vLwFXALsWym8h6SpJT+TPzsk9lZck4L+BMyLi5xHxUkQ8BnwSeB74fM53vKTbJH1X0nzgdEmbSBqf37u7gO0a6t4xnw0tkDRT0hGFfRdK+rGk6yW9AOzbW7slrZPLPCVpBjCszPvTHzlw9G97ADNJvwC/BZyX/8gBLgHWJf0yfDPw3UK5twIbk37BjwIuBo4u7P8IMC8i7i6k7QsMBQ4EviRp/4j4LfD/gMvz2dA7c94LgUXA9sC7cpmucYJPAQfl9A5gRNWDjoi7gE5gr252/0fetxnwFtKXd0TEMcAjvHEG961CmQ8Abwc+1MNLHgv8K7B5Pq7vl2hjT+9N0fH5sS+wLbA+0NgV+X7gbcAHgdMkvT3X/+eI2KhZOwAkrQd8HJiVt1cDrgPuJZ2NfBD4nKSejv9tpKD7y4ZjfA24CjigkLwHMJv03n8DOBt4mfTe/Wt+FNv1O9KZzJuBkcCPJO1UqO8TuZ4NgL80affXSIFpO9L/Zb8dS2rGgWPVd03+9dz1+FRh38MRcW7uj7+I9Mf5FkmbAx8GToyIpyLi1Yi4pVDuNeBrEbEw/xr9GfARSQPz/mNIgafo6xHxQkRMBS4gfREtRdJbSIHnczn/46SgNTJnOQL4XkTMiYgFwDeX7W3hUVLwa/Qq6X3YOh/3rdF8QbfTc1tf6mH/JRExLSJeAL4KHKHWdJscBfxPRMyOiOeBU4GRDWc7X8+/8O8lfWF2F4B6cko+K3mOFICOyenDgM0iYmxEvBIRs4FzeeP/qNGm+d953eybV9gP8GhE/CAiFgGvAB8DTsvv7zTS57TLQcBDEXFBRCzKP1SuAg4v5Lk2Im7LQeodTdp9BPCNiFgQEXMoEeD7q1WtT9aWdlhE/L6HfY91PYmIF/PJxvqkL9QFEfFUD+WeiIiXC2UflXQb8DFJV5OCzmcbyswpPH+Y9Efcna2BNYB5b5z8sFqh/Bbd1LUsBg
ELukn/NnA6cGN+/XERcWaTuuZU2P8w6fg27SFvFVuw5PE/TPqbfksh7bHC8xdJ/79lfSciviJpMPBb0pnDfaT/oy0auroGALcCSHq+kL4T8GR+vjnwYMNrbF7YD0u+V5vl4+np/3trYI+GdqzOkj9a5jTk77HdtO6ztcpz4LDuzAE2lrRRRHTXD97dL/CLSN1JqwO3R8Tchv1bAX/LzweTfvF3V9cc0qDppvlXZ6N5ua4ug3s8ih5IGkYKHEtd1hkRz5G6q/4jj4HcLGliRNzUTVtfL9bkJRvb+yrpy/IFUndgV7sGkL4sy9b7KOnLsFj3IuAfQMuuVoqIRyR9FrhI0q9J/0cPRsTQHvIvEZxy92cn6UzgW4X01UhnFNcUixeeP0E6nsbPTpc5wC0RUezqWqo5Dfl7bDdvfLamd/NaVuCuKltKRMwDbiD1F79JabB77ybFrgHeTTrTuLib/V+VtK6knYF/AS7P6f8AhuQvka7XvhH4b0kD8wDwdpI+kPNfAZwsaUulq3HGlD2uXN9BwGXAz3K3WWOegyRtn7/sngEWk7rmutq6bdnXKzha0k6S1gXGAlfm7sG/A2tL+qikNYCvAGsVyi3x3nTjF8DnJW0jaX3eGBPpLuA2Huc+kkrfUyEifkcKVKOAu4DnlC6QWEfpgoVdckDurmwApwBfkfQJSWtLeivwU2AgS46fFcstBn5FGiRfN49dFMcdfg3sIOmY/BldQ9KwrnGcbjRr9xXAqfkzvyXwmbLvT3/jwLHq67oKqOtxdclyx5B+Gf8NeBz4XG+Zc//+VcA2pD/2RreQBldvInWB3JjTuwZM50uakp8fC6wJzACeAq4kdWlA6pOeQOqvn9LDazW6TtJzpF+cXwb+hxS8ujMU+D3pap/bgR9FxB/yvm+SvvyeVr4yrKRLSAP+jwFrAydDusoLOIn0BTqXdAZSvMqqu/em6Pxc959IXUAvU/7LbivSYHEV3wb+L+ms8iDSVVYPks6efgps2FPBiLic9Jn6PDCf9H+7DvC+iJjfy2uOJnWvPUZ6Dy8o1Pkc6cKJkaSg9hhwFksG32IbFjdp99dJ3VMPkn68NI7TWSbfyMlaRdJpwA4RcXQhbQjpD3GNMr+ErW9I+inwy4iYsKLbYisfBw5rCaW5CXcDx0TEnwrpQ3DgMFul1NpVJWl4npQzS9JSfdGSviBphqT7JN0kaevCvuMk3Z8fxxXSd5M0Ndf5/cK8A1tB8iW+c4AbikHDzFZNtZ1x5CtE/k6a3NMJTAQ+HhEzCnn2Be7Ml4L+O7BPRByZf71OIk3wCmAysFtEPKU0e/Rk4E7geuD7EXFDLQdhZmZLqfOMY3dgVp6c9ArpSpZDixki4g8R8WLevIM3LiH8EPC7PBHnKdLs0OF5YtrAiLgjX6lxMXBYjcdgZmYN6pzHMYglJ9N0kpYT6MkJpEtAeyo7KD86u0lfiqRRpEsHWW+99XbbccflWtjTzKzfmTx58pMRsVljeltMAJR0NKlb6gPN8pYVEeOAcQAdHR0xadKkVlVtZtYvSOp29nydXVVzWXLG7JY5bQlK9zf4MnBIRCxsUnYuS86I7bZOMzOrT52BYyIwNM9qXZM0SWd8MYOkdwHnkILG44VdE4AD8wzON5Em+UzIs4qflbRnvprqWODaGo/BzMwa1NZVFRGLJI0mBYEBwPkRMV3SWGBSpPsifJs0K/SX+araRyLikIhYIOm/SMEHYGxeCRXSTNsLSbNOb+CNcREzM+sD/WICoMc4zMyqkzQ5Ijoa071WlZmZVeLAYWZmlThwmJlZJQ4cZmZWiQOHmZlV4sBhZmaVOHCYmVklDhxmZlaJA4eZmVXiwGFmZpU4cJiZWSUOHGZmVokDh5mZVeLAYWZmlThwmJlZJQ4cZmZWiQOHmZlV4sBhZmaV1Bo4JA2XNFPSLEljutm/t6QpkhZJGlFI31fSPYXHy5IOy/sulPRgYd+udR6DmZktafW6KpY0ADgbOADoBC
ZKGh8RMwrZHgGOB04plo2IPwC75no2BmYBNxayfDEirqyr7WZm1rPaAgewOzArImYDSLoMOBR4PXBExEN532u91DMCuCEiXqyvqWZmVladXVWDgDmF7c6cVtVI4BcNad+QdJ+k70paa1kbaGZm1bX14LikzYF3ABMKyacCOwLDgI2BL/VQdpSkSZImPfHEE7W31cysv6gzcMwFtipsb5nTqjgCuDoiXu1KiIh5kSwELiB1iS0lIsZFREdEdGy22WYVX9bMzHpSZ+CYCAyVtI2kNUldTuMr1vFxGrqp8lkIkgQcBkxrQVvNzKyk2gJHRCwCRpO6mf4KXBER0yWNlXQIgKRhkjqBw4FzJE3vKi9pCOmM5ZaGqi+VNBWYCmwKnFHXMZiZ2dIUESu6DbXr6OiISZMmrehmmJmtVCRNjoiOxvS2Hhw3M7P248BhZmaVOHCYmVklDhxmZlaJA4eZmVXiwGFmZpU4cJiZWSUOHGZmVokDh5mZVeLAYWZmlThwmJlZJQ4cZmZWiQOHmZlV4sBhZmaVOHCYmVklDhxmZlaJA4eZmVXiwGFmZpU0DRyS1pO0Wn6+g6RDJK1Rf9PMzKwdlTnj+BOwtqRBwI3AMcCFZSqXNFzSTEmzJI3pZv/ekqZIWiRpRMO+xZLuyY/xhfRtJN2Z67xc0ppl2mJmZq1RJnAoIl4E/hn4UUQcDuzctJA0ADgb+DCwE/BxSTs1ZHsEOB74eTdVvBQRu+bHIYX0s4DvRsT2wFPACSWOwczMWqRU4JD0HuAo4Dc5bUCJcrsDsyJidkS8AlwGHFrMEBEPRcR9wGtlGitJwH7AlTnpIuCwMmXNzKw1ygSOzwKnAldHxHRJ2wJ/KFFuEDCnsN2Z08paW9IkSXdI6goOmwBPR8SiZnVKGpXLT3riiScqvKyZmfVm9d525u6mQ4pdRRExGzi57oYBW0fE3ByobpY0FXimbOGIGAeMA+jo6Iia2mhm1u/0esYREYuB9y9j3XOBrQrbW+a0UiJibv53NvBH4F3AfGAjSV0Br1KdZma2/Mp0Vd0tabykYyT9c9ejRLmJwNB8FdSawEhgfJMyAEh6k6S18vNNgfcBMyIiSN1kXVdgHQdcW6ZOMzNrjTKBY23SL/39gIPz46BmhfI4xGhgAvBX4Io8RjJW0iEAkoZJ6gQOB86RND0XfzswSdK9pEBxZkTMyPu+BHxB0izSmMd55Q7VzMxaQelH/Kqto6MjJk2atKKbYWa2UpE0OSI6GtPLzBzfQdJNkqbl7X+S9JU6GmlmZu2vTFfVuaTLcV8FyPMuRtbZKDMza19lAse6EXFXQ9qibnOamdkqr0zgeFLSdkAA5DWl5tXaKjMza1u9TgDMPk2aSLejpLnAg6TlR8zMrB8qEzgiIvaXtB6wWkQ8J2mbuhtmZmbtqUxX1VUAEfFCRDyX067sJb+Zma3CejzjkLQjafn0DRtmig8kTQo0M7N+qLeuqreRZohvRJot3uU54FN1NsrMzNpXj4EjIq4FrpW0d0T8qbhP0vtqb5mZmbWlMmMc3+sm7QetboiZma0cehvjeA/wXmAzSV8o7BpIuTsAmpnZKqi3MY41gfVzng0K6c/yxrLmZmbWz/Q2xnELcIukCyPiYUnrRsSLfdg2MzNrQ2XGOLaQNAP4G4Ckd0r6Ub3NMjOzdlV2cPxDpJs5ERH3AnvX2SgzM2tfZQIHETGnIWlxDW0xM7OVQJm1quZIei8QktYAPku6FWy/MGTMb5rmeejMj/ZBS8zM2kOZM44TSSvkDgLmArvm7aYkDZc0U9IsSWO62b+3pCmSFuXl2rvSd5V0u6Tpku6TdGRh34WSHpR0T37sWqYtZmbWGk3POCLiSZZhGXVJA4CzgQOATmCipPERMaOQ7RHgeOCUhuIvAsdGxP2StgAmS5oQEU/n/V+MCC+0aGa2AvQ2AfAH5Js3dSciTm5S9+7ArIiYneu7DDgUeD1wRMRDed9rDXX/vfD8UUmPA5sBT2
NmZitUb2cck5az7kFAcVC9E9ijaiWSdidNRnygkPwNSacBNwFjImJhN+VGAaMABg8eXPVlzcysB71NALyoLxvSHUmbA5cAx0VE11nJqcBjpGAyDvgSMLaxbESMy/vp6Ojo8czJzMyqKXU57jKaC2xV2N4yp5UiaSDwG+DLEXFHV3pEzItkIXABqUvMzMz6SJ2BYyIwVNI2ktYERgLjyxTM+a8GLm4cBM9nIUgScBgwraWtNjOzXtUWOCJiETAamECa93FFREyXNFbSIQCShknqBA4HzpE0PRc/gjQ7/fhuLru9VNJUYCqwKXBGXcdgZmZLq/OqKiLieuD6hrTTCs8nkrqwGsv9DPhZD3Xu1+x1zcysPnVeVWVmZqugtr6qyszM2k9vXVXX0XtX1SG1tMjMzNpab11V3+mzVpiZ2Uqj2R0AzczMltB0kUNJQ4FvAjsBa3elR8S2NbbLzMzaVJl5HBcAPwYWAfsCF9PDpbJmZrbqKxM41omImwBFxMMRcTrgOxeZmfVTZe4AuFDSasD9kkaT1ptav95mmZlZuypzxvFZYF3gZGA34GjguDobZWZm7avMHQAn5qfPA/9Sb3PMzKzd1bk6rpmZrYIcOMzMrJIeA4eks/K/h/ddc8zMrN31dsbxkXyzpFP7qjFmZtb+ehsc/y3wFLC+pGcBkRY9FBARMbAP2mdmZm2mxzOOiPhiRGwE/CYiBkbEBsV/+7CNZmbWRspcjnuopLcAw3LSnRHxRL3NMjOzdtX0qqo8OH4X6b7gRwB3SRpRpnJJwyXNlDRL0phu9u8taYqkRY11SjpO0v35cVwhfTdJU3Od38/jMGZm1kfKLDnyFWBYRDwOIGkz4PfAlb0VkjQAOBs4AOgEJkoaHxEzCtkeAY4HTmkouzHwNaCDNK4yOZd9irTg4qeAO0n3Mx8O3FDiOMzMrAXKzONYrStoZPNLltsdmBURsyPiFeAy4NBihoh4KCLuA15rKPsh4HcRsSAHi98BwyVtDgyMiDsiIkgr9R5Woi1mZtYiZc44fitpAvCLvH0k6Zd+M4OAOYXtTmCPku3qruyg/OjsJn0pkkYBowAGDx5c8mXNzKyZMoPjX5T0z8D7c9K4iLi63mYtv4gYB4wD6Ojo6PHe6WZmVk2ZMw4i4lfAryrWPRfYqrC9ZU4rW3afhrJ/zOlbLmOdZmbWAnWuVTURGCppG0lrAiOB8SXLTgAOlPQmSW8CDgQmRMQ84FlJe+arqY4Frq2j8WZm1r3aAkdELAJGk4LAX4ErImK6pLGSDgGQNExSJ+lS33MkTc9lFwD/RQo+E4GxOQ3gJOCnwCzgAXxFlZlZnyrVVSVpHWBwRMysUnlEXE/DQHpEnFZ4PpElu56K+c4Hzu8mfRKwS5V2mJlZ65SZAHgwcA9p7Sok7SqpbJeTmZmtYsp0VZ1OmpPxNEBE3ANsU2ObzMysjZUJHK9GxDMNab681cysnyozxjFd0ieAAZKGAicDf6m3WWZm1q7KnHF8BtgZWEiaPf4s8Lk6G2VmZu2rzMzxF4Ev54eZmfVzTQOHpOtYekzjGWAScE5EvFxHw8zMrD2VGeOYDWzGkoscPgfsAJwLHFNP01ZOQ8b8pmmeh878aB+0xMysHmUCx3sjYlhh+zpJEyNiWNdMbzMz6z/KDI6vL+n1dcnz8/Xz5iu1tMrMzNpWmTOO/wD+LOkBQKTJfydJWg+4qM7GmZlZ+ylzVdX1ef7GjjlpZmFA/Hu1tczMzNpSqUUOgaHA24C1gXdKIiIurq9ZZmbWrspcjvs10k2VdiKtdPth4M+k+32bmVk/U2ZwfATwQeCxiPgX4J3AhrW2yszM2laZrqqXIuI1SYskDQQeZ8lbwtpy8LwPM1vZlAkckyRtRJrsNxl4Hri91laZmVnbKnNV1Un56U8k/RYYGBH31dssMzNrV2XuAHhT1/OIeCgi7iumNSk7XNJMSbMkjelm/1qSLs/775Q0JKcfJemewuM1SbvmfX/MdX
bte3PZgzUzs+XX4xmHpLWBdYFNJb2JNPkPYCAwqFnFkgYAZwMHAJ3AREnjI2JGIdsJwFMRsb2kkcBZwJERcSlwaa7nHcA1+c6DXY7K9x43M7M+1ltX1b+R7ruxBWlsoytwPAv8sETduwOzImI2gKTLgEOBYuA4lHRrWoArgR9KUkQUV+P9OHBZidczM7M+0GPgiIj/Bf5X0mci4gfLUPcgYE5huxPYo6c8EbFI0jPAJsCThTxHkgJM0QWSFgNXAWc0BBoAJI0CRgEMHjy4cfdKqcwVWOCrsMysXmUGx38g6b3AkGL+vpg5LmkP4MWImFZIPioi5kragBQ4jqGbyYgRMQ4YB9DR0eF7pJuZtUiZmeOXANsB9wCLc3LQfOb4XJac77FlTusuT6ek1UkTC+cX9o/kjfuApBeOmJv/fU7Sz0ldYp7F3g2foZhZHcrM4+gAduquO6iJicBQSduQAsRI4BMNecYDx5HmhYwAbu56HUmrAUcAe3VlzsFlo4h4UtIawEHA7yu2y8zMlkOZwDENeCswr0rFecxiNDABGACcHxHTJY0FJkXEeOA84BJJs4AFpODSZW9gTtfgerYWMCEHjQGkoHFulXaZmdnyKRM4NgVmSLoLWNiVGBGHNCsYEdeTFkYspp1WeP4ycHgPZf8I7NmQ9gKwW4k2m5lZTcoEjtPrboSZma08ylxVdYukrYGhEfF7SeuSuonMzKwfKrPkyKdIk/POyUmDgGvqbJSZmbWvMl1VnyZd8nonQETc7/WhVk1e4t3MyihzI6eFEfFK10a+JNYT6szM+qkygeMWSf8JrCPpAOCXwHX1NsvMzNpVma6qMaRVbKeSFj68HvhpnY2ylYO7tsz6pzKBYx3S5L1z4fXl0tcBXqyzYWZm1p7KBI6bgP1Jt4yFFDRuBN5bV6Ns1eQzFLNVQ5kxjrUjoitokJ+vW1+TzMysnZUJHC9IenfXhqTdgJfqa5KZmbWzMl1VnwV+KelR0l0A30q6uZKZmfVDvQaOPBC+F7Aj8LacPDMiXq27Yda/+V4iZu2r166qiFgMfDwiXo2IafnhoGFm1o+V6aq6TdIPgcuBF7oSI2JKba0yM7O2VSZw7Jr/HVtIC2C/1jfHzMzaXZll1ffti4aYmdnKocyy6m+RdJ6kG/L2TpJOqL9pZmbWjsrM47iQdN/wLfL234HPlalc0nBJMyXNkjSmm/1rSbo8779T0pCcPkTSS5LuyY+fFMrsJmlqLvN9SSrTFjMza40ygWPTiLgCeA0gIhYBi5sVypfyng18GNgJ+LiknRqynQA8FRHbA98FzirseyAids2PEwvpPwY+BQzNj+EljsHMzFqk7MzxTcj34JC0J/BMiXK7A7MiYna+n8dlwKENeQ4FLsrPrwQ+2NsZhKTNgYERcUdEBHAxcFiJtpiZWYuUCRxfAMYD20m6jfRl/ZkS5QYBcwrbnTmt2zz5TOYZYJO8bxtJd0u6RdJehfydTeoEQNIoSZMkTXriiSdKNNfMzMooc1XVFEkfIM0cF30zc3weMDgi5ue1sa6RtHOVCiJiHDAOoKOjw3csNDNrkaaBQ9LawEnA+0ndVbdK+klEvNyk6Fxgq8L2ljmtuzyd+Za0GwLzczfUQoCImCzpAWCHnH/LJnWamVmNynRVXQzsDPwA+GF+fkmJchOBoZK2kbQmMJLU5VU0HjguPx8B3BwRIWmzPLiOpG1Jg+CzI2Ie8KykPfNYyLHAtSXaYmZmLVJm5vguEVG8GuoPkmY0KxQRiySNJl3KO4B0F8HpksYCkyJiPHAecImkWcACUnAB2BsYK+lV0tVcJ0bEgrzvJNIlwusAN+SHmZn1kTKBY4qkPSPiDgBJewCTylQeEdeT7lFeTDut8Pxl4PBuyl0FXNVDnZOAXcq8vpmZtV6ZwLEb8BdJj+TtwcBMSVOBiIh/qq11ZmbWdsoEDk+wMzOz15W5HPfhvmiImZmtHMqccZi1Pd8x0KzvOHBYv1Qm0DjImH
WvzDwOMzOz1zlwmJlZJQ4cZmZWiQOHmZlV4sBhZmaVOHCYmVklDhxmZlaJA4eZmVXiwGFmZpU4cJiZWSUOHGZmVokDh5mZVeLAYWZmldQaOCQNlzRT0ixJY7rZv5aky/P+OyUNyekHSJosaWr+d79CmT/mOu/JjzfXeQxmZrak2pZVlzQAOBs4AOgEJkoaHxEzCtlOAJ6KiO0ljQTOAo4EngQOjohHJe0CTAAGFcodle89btYnvAy72RvqvB/H7sCsiJgNIOky4FCgGDgOBU7Pz68EfihJEXF3Ic90YB1Ja0XEwhrba9YyDjS2Kquzq2oQMKew3cmSZw1L5ImIRcAzwCYNeT4GTGkIGhfkbqqvSlJrm21mZr1p6zsAStqZ1H11YCH5qIiYK2kD4CrgGODibsqOAkYBDB48uA9aa7ZsfNtbW9nUGTjmAlsVtrfMad3l6ZS0OrAhMB9A0pbA1cCxEfFAV4GImJv/fU7Sz0ldYksFjogYB4wD6OjoiBYdk9kK50BjK1qdXVUTgaGStpG0JjASGN+QZzxwXH4+Arg5IkLSRsBvgDERcVtXZkmrS9o0P18DOAiYVuMxmJlZg9rOOCJikaTRpCuiBgDnR8R0SWOBSRExHjgPuETSLGABKbgAjAa2B06TdFpOOxB4AZiQg8YA4PfAuXUdg9mqwAP11mq1jnFExPXA9Q1ppxWevwwc3k25M4Azeqh2t1a20cyW5EBjzbT14LiZtT8Hmv7HS46YmVklDhxmZlaJA4eZmVXiMQ4z6zOeg7JqcOAws7blQNOe3FVlZmaVOHCYmVklDhxmZlaJA4eZmVXiwXEzW2V4Fnvf8BmHmZlV4sBhZmaVOHCYmVklDhxmZlaJB8fNrN/yYPqy8RmHmZlV4sBhZmaVOHCYmVkltQYOScMlzZQ0S9KYbvavJenyvP9OSUMK+07N6TMlfahsnWZmVq/aBsclDQDOBg4AOoGJksZHxIxCthOApyJie0kjgbOAIyXtBIwEdga2AH4vaYdcplmdZma1qDKYviovCV/nVVW7A7MiYjaApMuAQ4Hil/yhwOn5+ZXADyUpp18WEQuBByXNyvVRok4zs5VO1UCzIq8IU0TUU7E0AhgeEZ/M28cAe0TE6EgVF/gAAAh+SURBVEKeaTlPZ95+ANiDFEzuiIif5fTzgBtysV7rLNQ9ChiVN98GzGzh4W0KPFlT/jrrbrf87dSWuvO3U1vqzt9Obamav53a0hf5m9k6IjZrTFxl53FExDhgXB11S5oUER115K+z7nbL305tqTt/O7Wl7vzt1Jaq+dupLX2Rf1nVOTg+F9iqsL1lTus2j6TVgQ2B+b2ULVOnmZnVqM7AMREYKmkbSWuSBrvHN+QZDxyXn48Abo7UdzYeGJmvutoGGArcVbJOMzOrUW1dVRGxSNJoYAIwADg/IqZLGgtMiojxwHnAJXnwewEpEJDzXUEa9F4EfDoiFgN0V2ddx9CLql1gVfLXWXe75W+nttSdv53aUnf+dmpL1fzt1Ja+yL9MahscNzOzVZNnjpuZWSUOHGZmVokDxzKQ9JcSeYbkeSr9Ql8cr6STJf1V0qV1vk6TNizTcZb5zPRnkk6XdMoKbsNGkk5akW0okvT8im5DTxw4lkFEvHdFt2FVoqTMZ/Ek4ICIOKruNrWaPzMrhY1InzFrwoFjGVT4JbC6pEvzr+QrJa3bS53DJN0naW1J60maLmmXXvIPkfS3CvV/NS8O+WdJv2j2607Ssbk990q6pNzhvl52W0l3SxrWpP0zJV0MTGPJ+Tnd5f8JsC1wg6TPN8l7jaTJ+T0c1VvenP9oSXdJukfSOXmdtd4MkHRurv9GSeuUeI1Sn5n8f/+b/L5Pk3RkL3nHSvpcYfsbkj7bpP4vS/p7mc9B/j/6a7NjLXwWL8x1Xyppf0m3Sbpf0u491P96W0irO/RK0hfyezKteNy95D8x/5/eI+lBSX9oUuRMYLuc/9u91PtFSSfn59+VdHN+vl
93Z8OSzpT06cJ2S8+uGs+CJZ0i6fRW1d+tiPCj4gN4vkSeIUAA78vb5wOnNClzBvAd0kKOp7aqfmAYcA+wNrABcH9vbSEtLvl3YNO8vXHJ451G+gK4G3hnifyvAXtWeN8f6mpTk3wb53/XyW3apJe8bweuA9bI2z8Cjm3S7kXArnn7CuDoVnxmcr6PAecWtjds0pYp+flqwANNjnU3YCqwLjAQmNXkc1DqWAv53pHbMTl/HrvWnbumBW3pyr8esD4wHXhXyfd0DeBW4OAyn+ES9e0J/DI/v5U0x2wN4GvAv3WT/13ALYXtGcBWLfzMLNFu4BTg9DJll/XhM456zYmI2/LznwHvb5J/LGnl3w7gWy2s/33AtRHxckQ8R/qi7M1+pD+MJwEiYkGJtgBsBlwLHBUR95bI/3BE3FGy7ipOlnQvcAfpTGZoL3k/SPpSmijpnry9bZP6H4yIe/LzyaQ/3FaZChwg6SxJe0XEMz1ljIiHgPmS3gUcCNwdEfN7qXsv4OqIeDEinqXc5Nmyx/pgREyNiNdIX+o3RfoWm9pDmapteX/O/0JEPA/8KtdRxv+SJhc3+9yXNRnYTdJAYCFwO+lvdi9SIFlCRNwNvFnSFpLeSVoRfE6L2rJCrLJrVbWJxkkyzSbNbEL6NbUG6ezghRbXX7dngEdIf+RlVixudnyVSdoH2B94T0S8KOmPpPeyxyLARRFxaoWXWVh4vph0ZtMSEfF3Se8GPgKcIemmiBjbS5GfAscDbyX9ym+1ssdazPdaYfs1VuD3jKTjga2BpRZCXVYR8aqkB0nv+1+A+4B9ge2Bv/ZQ7Jek1THeClzeqrZki1hy2KG3z3tL+IyjXoMlvSc//wTw5yb5zwG+ClxKujdJq+q/DTg4j5+sDxzUpN6bgcMlbQIgaeMSbQF4Bfg/wLGSPlGyTKttSPpF96KkHUndCr25CRgh6c2QjlXS1nU3sieStgBejLQy9LeBdzcpcjUwnNQdOaFJ3j8Bh0laR9IGwMHL297lULUtt+b860paj/Q5W+rXfZGk3UjdNkfnM6FmniN15ZZxa677T/n5iaQzvp5+vF1OWhljBCmItNI/SGc0m0hai+Z/38vNZxzLpuwv+5nApyWdT/oF/uOeMko6Fng1In6eB2f/Imm/iLh5eeuPiImSxpN+Gf2D1H3QWxfIdEnfAG6RtJg0ZnF8L+0oln1B0kHA7yQ9H2lpmb70W+BESX8lvT+9doVFxAxJXwFuVLqy61Xg08DDtbe0e+8Avi3ptdyWf+8tc0S8kgd9n468LE8veadIuhy4F3ictPbbClG1LTn/haTxBICf5i6g3owGNgb+IAnSUkef7OU15ucB/WnADRHxxV7qvhX4MnB7/sy/TC+BLP9NbQDMjYh5TdpdST4DGkt6b+YCf2tl/d3xkiMV5V/hUyJihf0qze0YAvw6Inq88qoh//oR8bzSlVd/AkZFxJQam2h9IAe7KcDhEXF/xbKnkwZgv1NH22zV5a6qCnI3wu2kK59WNuPy4O8U4CoHjZWf0i2WZ5EGoisFDbPl4TMOMzOrxGccZmZWiQOHmZlV4sBhZmaVOHCYtVCzNanUsAJrXmdoRc15MVsmDhxmfatxBdYhpMmbpUny/CtboRw4zGqSV1GdqLTK8NdzcuMKrGcCe+Xtz0saIOnbhXL/luvaR9KteSJnmeVczGrjXy5mNZB0IGlxxd1J62GNl7Q3MAbYJSJ2zfn2Ia0Ke1DeHgU8ExHD8vIRt0m6MVf77lz2wb49GrMlOXCY1ePA/OhaFmN9UiB5pES5f5I0Im9vmMu9AtzloGHtwIHDrB4CvhkR5yyRmJaKaVbuMxGxxIKF+cyk5asJmy0Lj3GY1WMC8K95NWIkDcor8DauwNq4PQH4d0lr5HI75NVgzdqGzzjMahARN0p6O3B7Xpn1edLy3g8UV2AF/hNYnG88dSHppkNDgClKBZ8ADlsBh2DWI69VZWZmlbiryszMKnHgMDOzSh
w4zMysEgcOMzOrxIHDzMwqceAwM7NKHDjMzKyS/w9DSDv2bSS+RAAAAABJRU5ErkJggg==",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"get_letter_distribution(encrypted_quote, plot=True, title='Encrypted Distribution');\n",
"get_letter_distribution(encrypted_quote, plot=True, title='Encrypted Distribution; Re-Ordered', zipfian=True);"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "i6j1ATgyLvaX"
},
"source": [
"As the above distribution shows us, even though each individual letter in the encrypted text no longer has the same occurrence probability as it does in typical English text, the general shape of the re-ordered distribution is still the same. We can use this to build a first-pass remapping to attempt deciphering the encrypted text, substituting each encrypted letter with the English letter that holds the same rank in usage frequency.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 17
},
"id": "7MG6KBzi7hit",
"outputId": "e34e2765-1a1f-4514-9145-0b190de67534"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"quote_dist = list(map(lambda x: x[0], Counter(re.sub(r'[^a-z]', '', encrypted_quote.lower())).most_common()))\n",
"english_dist = list(map(lambda x: x[0], Counter(re.sub(r'[^a-z]', '', words.lower())).most_common()))\n",
"reverse_cipher = dict(zip(quote_dist, english_dist))\n",
"for letter in alphabet:\n",
"    reverse_cipher[letter.upper()] = reverse_cipher[letter].upper()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 329
},
"id": "kso6pkEXZeuE",
"outputId": "1ad4fc0d-8661-49fa-a657-a86a6a954228"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fiabter 1\n",
"\n",
"\n",
"Snr Wadter Eddnot, oc Keddyhfi Iadd, nh Somersetsinre, was a mah wio,\n",
"cor ins owh amusemeht, hever took ub ahy pook put tie Parohetage;\n",
"tiere ie couhl offubatnoh cor ah nlde iour, ahl fohsodatnoh nh a\n",
"lnstressel ohe; tiere ins cafudtnes were rousel nhto almnratnoh ahl\n",
"resbeft, py fohtembdatnhg tie dnmntel remhaht oc tie eardnest batehts;\n",
"tiere ahy uhwedfome sehsatnohs, arnsnhg crom lomestnf accanrs\n",
"fiahgel haturaddy nhto bnty ahl fohtembt as ie turhel over\n",
"tie admost ehldess freatnohs oc tie dast fehtury; ahl tiere,\n",
"nc every otier deac were bowerdess, ie foudl real ins owh instory\n",
"wnti ah nhterest winfi hever candel. Tins was tie bage at winfi\n",
"tie cavournte vodume adways obehel:\n",
"\n",
" \"EDDNOT OC KEDDYHFI IADD.\n",
"\n",
"\"\n"
]
}
],
"source": [
"first_pass = decrypt(encrypted_quote, reverse_cipher)\n",
"print(first_pass[35:776])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DqPKao5xaJnj"
},
"source": [
"While our first pass definitely gets us closer to the original text, it's not quite good enough (unless you, like yours truly, happen to be both a Jane Austen fan and one accustomed to reading through heavy misspellings).\n",
"\n",
"At this point, we'll want to use some basic facts about English to help us out. For example, we want to make sure that any single-letter words are assigned to either A or I."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 329
},
"id": "laJUmPcSPL5f",
"outputId": "982727da-ecb0-45ab-bed7-745fef887157"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fnabter 1\n",
"\n",
"\n",
"Sir Wadter Eddiot, oc Keddyhfn Nadd, ih Somersetsnire, was a mah wno,\n",
"cor nis owh amusemeht, hever took ub ahy pook put tne Parohetage;\n",
"tnere ne couhl offubatioh cor ah ilde nour, ahl fohsodatioh ih a\n",
"listressel ohe; tnere nis cafudties were rousel ihto almiratioh ahl\n",
"resbeft, py fohtembdatihg tne dimitel remhaht oc tne eardiest batehts;\n",
"tnere ahy uhwedfome sehsatiohs, arisihg crom lomestif accairs\n",
"fnahgel haturaddy ihto bity ahl fohtembt as ne turhel over\n",
"tne admost ehldess freatiohs oc tne dast fehtury; ahl tnere,\n",
"ic every otner deac were bowerdess, ne foudl real nis owh nistory\n",
"witn ah ihterest wnifn hever caidel. Tnis was tne bage at wnifn\n",
"tne cavourite vodume adways obehel:\n",
"\n",
" \"EDDIOT OC KEDDYHFN NADD.\n",
"\n",
"\"\n"
]
}
],
"source": [
"reverse_cipher = fix_single_letter_words(encrypted_quote, reverse_cipher)\n",
"second_pass = decrypt(encrypted_quote, reverse_cipher)\n",
"print(second_pass[35:776])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MC6_2FSeb8W5"
},
"source": [
"Intuitively, this text feels closer to English than the previous two attempts. But at this point, we might want a better measure of closeness to the original text than our gut feeling of what looks better.\n",
"\n",
"As we go through the decrypted text (which is significantly larger than the single-paragraph sample displayed above), we can check what percentage of the words in our recovered text appear in a canonical list of English words. "
]
},
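{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of this metric follows. The function name `fraction_real_words` and the tiny vocabulary are made up for illustration; the notebook's own `get_percent_real` is defined elsewhere and may tokenize the text differently, so its numbers need not match this sketch exactly.\n",
"\n",
"```python\n",
"import re\n",
"\n",
"def fraction_real_words(text, vocabulary):\n",
"    # Hypothetical helper: fraction of words in text found in vocabulary.\n",
"    # Words are maximal runs of letters, compared in lowercase.\n",
"    words = re.findall('[a-z]+', text.lower())\n",
"    if not words:\n",
"        return 0.0\n",
"    return sum(word in vocabulary for word in words) / len(words)\n",
"\n",
"# Toy vocabulary; a real run would use a full English word list.\n",
"toy_vocabulary = {'sir', 'walter', 'was', 'a', 'man'}\n",
"print(fraction_real_words('Sir Wadter was a mah', toy_vocabulary))\n",
"```\n",
"\n",
"Storing the vocabulary as a `set` keeps each membership check fast, which matters when the score is recomputed for every candidate swap."
]
},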
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "93peeSaFwrOj",
"outputId": "57dd7c76-eaf3-426c-f5a2-c380ea1ca427"
},
"outputs": [
{
{
"name": "stdout",
"output_type": "stream",
"text": [
"29.5% of recovered words were valid English\n"
]
}
],
"source": [
"percent_real_before = get_percent_real(second_pass)\n",
"print(f'{percent_real_before:.1%} of recovered words were valid English')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5X0JnV5QdbGY"
},
"source": [
"We can definitely do better than 30% -- in fact, the original text contains 81.7% accepted English words. To improve our decryption, we'll try swapping the mappings of letters that have the most similar frequencies in English (e.g. R and S each constitute approximately 7.2% of letters used in this corpus). If a swap raises the percentage of accepted English words by enough (in the cell below, by at least a quarter of the remaining gap to our target), we keep it. Let's watch this technique at work. *N.B. This may take a few minutes; feel free to make a nice cup of tea while the next cell runs!*"
]
},
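{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ranking step can be sketched on its own as below. The name `closest_frequency_pairs` is hypothetical, and unlike the next cell, which compares all 26 letters, this sketch only compares letters that actually occur in the text.\n",
"\n",
"```python\n",
"import string\n",
"from collections import Counter\n",
"from itertools import combinations\n",
"\n",
"def closest_frequency_pairs(text, top=5):\n",
"    # Hypothetical helper: letter pairs ordered by how similar their\n",
"    # relative frequencies are in text, most similar first.\n",
"    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)\n",
"    total = sum(counts.values()) or 1  # avoid dividing by zero on empty text\n",
"    pairs = combinations(sorted(counts), 2)\n",
"    ranked = sorted(pairs, key=lambda p: abs(counts[p[0]] - counts[p[1]]) / total)\n",
"    return ranked[:top]\n",
"\n",
"# In this toy text, a and b occur equally often, and so do c and d.\n",
"print(closest_frequency_pairs('aaabbbcd', top=2))\n",
"```\n",
"\n",
"Trying the closest pairs first is a sensible ordering because letters with nearly equal frequencies are the ones a pure frequency match is most likely to have confused."
]
},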
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "oxKSeu6d3h1k",
"outputId": "2ad83429-c92a-4671-8fd1-aea9925a6148"
},
"outputs": [
],
"source": [
"# Rank letter pairs by how close their frequencies are in the current decryption\n",
"c = get_letter_distribution(second_pass)\n",
"dists = np.ones((26,26))\n",
"for i in range(26):\n",
"    for j in range(i+1,26):\n",
"        dists[i,j] = abs(c[alphabet[i]] - c[alphabet[j]])/sum(c.values())\n",
"percent_real = percent_real_before\n",
"indices = np.argsort(dists.flatten())\n",
"while percent_real < 0.81:\n",
"    for idx in indices:\n",
"        i,j = np.unravel_index(idx, (26,26))\n",
"        L1, L2 = alphabet[i], alphabet[j]\n",
"        # Try the swap and re-decrypt the full text\n",
"        updated = swap_keys(reverse_cipher, L1, L2)\n",
"        unencrypted_quote = ''.join(updated.get(letter, letter) for letter in encrypted_quote)\n",
"        reals = [word in english for word in re.sub(r'[^a-zA-Z ]', '', unencrypted_quote.lower()).split()]\n",
"        percent_real = sum(reals)/len(reals)\n",
"        # Keep the swap only if it closes at least a quarter of the gap to 82%\n",
"        if percent_real > percent_real_before+(0.82-percent_real_before)/4:\n",
"            print('swap', L1.upper(), 'and',L2.upper(),'\\tpercent valid English words:', percent_real,'\\n\\n')\n",
"            # print(re.sub(r'[^a-zA-Z ]', '', unencrypted_quote.lower())[35:776])\n",
"            print(unencrypted_quote[35:776])\n",
"            swap_keys_inplace(reverse_cipher, L1, L2)\n",
"            percent_real_before = percent_real\n",
"            break\n",
"    else:\n",
"        # No swap helped enough; stop instead of looping forever\n",
"        break"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EnhDUjmsqueW"
},
"source": [
"And here we find ourselves with the fully recovered opening paragraph of Jane Austen's timeless classic *Persuasion*, decrypted in a computationally feasible timeframe, based solely on our pre-existing knowledge of how frequently different letters are used in the English language.\n",
"\n",
"Now that you (hopefully!) have a stronger intuition for frequency-based decryption techniques, my recommended reading is [A cryptography-based approach for movement decoding](https://www.nature.com/articles/s41551-017-0169-7) from Eva Dyer and her colleagues in Konrad Kording's lab, who used this frequency-based approach to decode motor activity from macaque neural recordings. "
]
}
],
"metadata": {
"colab": {
"name": "Introductory Cryptography for Neuro",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 0
}