Monoalphabetic Ciphers

With only 25 possible keys, the Caesar cipher is far from secure.A dramatic increase in the key space can be achieved by allowing an arbitrary substitution. Before proceeding we define the term permutation.A permutation of a finite set of elements $S$ is an ordered sequence of all the elements of $S$, with each element appearing exactly once. For example, if $S=\{a,b,c\}$, there are six permutations of  $S$:

$$\{abc, acb, bac, bca, cab, cba\}$$

In general, there are $n!$ permutations of a set of elements, because the first element can be chosen in one of $n$ ways, the second in $n-1$ ways, the third in $n-2$ ways, and so on.

Recall the assignment for the Caesar cipher:

plain:    a b c d e f g h i j k l m n o p q r s t u v w x y z
cipher: D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

If, instead, the “cipher” line can be any permutation of the 26 alphabetic characters, then there are 26! or greater than $4 \times 10^{26}$ possible keys.This is 10 orders of magnitude greater than the key space for DES and would seem to eliminate brute-force techniques for cryptanalysis. Such an approach is referred to as a monoalphabetic substitution cipher, because a single cipher alphabet (mapping from plain alphabet to cipher alphabet) is used per message.

There is, however, another line of attack. If the cryptanalyst knows the nature of the plaintext (e.g., noncompressed English text), then the analyst can exploit the regularities of the language. To see how such a cryptanalysis might proceed, we give a partial example 

The ciphertext to be solved is

UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZ
VUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSX
EPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ

As a first step, the relative frequency of the letters can be determined and compared to a standard frequency distribution for English, such as is shown in Figure below. If the message were long enough, this technique alone might be sufficient, but because this is a relatively short message, we cannot expect an exact match. In any case, the relative frequencies of the letters in the ciphertext (in percentages) are as follows:


Comparing this breakdown with Figure , it seems likely that cipher letters $P$ and $Z$ are the equivalents of plain letters $e$ and $t$, but it is not certain which is which.The letters $S,U,O, M$, and $H$ are all of relatively high frequency and probably correspond to plain letters from the set $\{a, h, i, n, o, r, s\}$. The letters with the lowest frequencies (namely,$A, B,G,Y, I, J$) are likely included in the set $\{b, j, k, q, v, x, z\}$.

There are a number of ways to proceed at this point.We could make some tentative assignments and start to fill in the plaintext to see if it looks like a reasonable “skeleton” of a message.A more systematic approach is to look for other regularities.For example, certain words may be known to be in the text. Or we could look for repeating sequences of cipher letters and try to deduce their plaintext equivalents.

A powerful tool is to look at the frequency of two-letter combinations, known as digrams.A table similar to Figure  could be drawn up showing the relative frequency of digrams.The most common such digram is $th$. In our ciphertext, the most common digram is $ZW$, which appears three times. So we make the correspondence of $Z$ with $t$ and $W$ with $h$. Then, by our earlier hypothesis, we can equate $P$ with $e$.
Now notice that the sequence $ZWP$ appears in the ciphertext, and we can translate that sequence as $“the”$.This is the most frequent trigram (three-letter combination) in English, which seems to indicate that we are on the right track.Next, notice the sequence $ZWSZ$ in the first line.We do not know that these four letters form a complete word, but if they do, it is of the form $th_t$. If so, $S$ equates with $a$.

So far, then, we have



Only four letters have been identified, but already we have quite a bit of the message. Continued analysis of frequencies plus trial and error should easily yield a solution from this point.The complete plaintext, with spaces added between words, follows:

it was disclosed yesterday that several informal but
direct contacts have been made with political
representatives of the viet cong in moscow

Monoalphabetic ciphers are easy to break because they reflect the frequency data of the original alphabet. A countermeasure is to provide multiple substitutes, known as homophones, for a single letter. For example, the letter $e$ could be assigned a number of different cipher symbols, such as 16, 74, 35, and 21, with each homophone assigned to a letter in rotation or randomly. If the number of symbols assigned to each letter is proportional to the relative frequency of that letter, then single-letter frequency information is completely obliterated. The great mathematician Carl Friedrich Gauss believed that he had devised an unbreakable cipher using homophones.However, even with homophones, each element of plaintext affects only one element of ciphertext, and multiple-letter patterns (e.g., digram frequencies) still survive in the ciphertext, making cryptanalysis relatively straightforward.

Two principal methods are used in substitution ciphers to lessen the extent to which the structure of the plaintext survives in the ciphertext: One approach is to encrypt multiple letters of plaintext, and the other is to use multiple cipher alphabets.

Comments

Popular posts from this blog

Cryptographic Algorithms CST 393 KTU CS Honour Notes Semester V -Dr Binu V P

Data Encryption Standard - DES Algorithm

OSI Security Architecture