MATLAB project on Cryptography |
|
Encoding messages
Probably you know the 'secret code' of writing B for A, C for B, D for C and so on.
'MATLAB' becomes 'NBUMBC', which looks scrambled enough. But even if you encode the message by an
arbitrary fixed shift of the letters, an attacker can decode it easily
- by 'brute-force-attack' - trying all possible 26 shifts or
- by 'frequency attack' - counting the most frequent letter, which is probably E,if the message is long enough.
It does not help at all just to use funny symbols instead of letters.
But for all commonly used methods of encryption, the letters of the message are converted into
numbers - eg A-0, B-1 and so on, and then the numbers are scrambled, and after unscrambling the
message is translated back to letters.
Let us write m for the message and c(m) for the coded message.
The simple shift mentioned above would just do c(m)=m+1.
What do you do with Z? Z becomes A. This is best thought of as arranging the 26 letters on a
circle and shifting them around on the circle. Mathematically, you can write
c(m):=m+1 mod 26
This does exactly c(0)=1, c(1)=2, c(2)=3, ... c(25)=0, because 'mod 26' or 'modulo 26'
means 'take the remainder after division by 26'. So eg 30 mod 26 is 4, 52 mod 26 is 0. The conversion from letters to numbers and back is done by my little MATLAB functions strtovec(m,L) and vectostr(v,L).
Here, L is the length of the message blocks. I allow not only letters, but also digits and a few special characters. This means I have to work modulo 43, not modulo 26, but the principle is the same.
Hill Ciphers
Of course, people have thought of many ingenuous ways of encoding messages over the centuries.
The idea of the so-called Hill ciphers
is to break the message into blocks of length L, after converting it into numbers.
The result is a list of vectors in RL. Then each vector m
is multiplied by an L x L - matrix A and
some constant vector b is added to it. So the coding function reads
c(m):=Am+b mod 26
The modulo 26 operation is carried out with every component of the vector Am+b.
I have programmed this operation for a sample matrix A and a sample vector b
(again, with more symbols, so modulo 43 rather than 26).
This is the MATLAB file to download for this function.
Problem 1 Download the various MATLAB sample files and save them in a directory where you want to store your whole project. Start MATLAB from there.
Write a MATLAB function decrypt(c,A,b,L) which decrypts a given coded message block c of length L,
supposing it was coded using a Hill cipher with matrix A and vector b.
This is just as easy as the function for encrypting.
Test your function on the 3 coded messages
4N FU TB
which have been encrypted using
A defined by
and b=[1,1]t. You can download A and b here.
The decrypted message should be 'MA TL AB' - MATLAB broken up into blocks of size 2.
Remark. In order to be able to decrypt messages, your transformation c has to be invertible.
In particular, it is important that the function c is one-to-one, which amounts to
the matrix A having kernel zero.
You know that for real matrices, you have to have a nonzero determinant for this.
However, here we have a modulo operation. It is a fact that the coding function c is one-to-one if
det(A) is not zero modulo 43 (this is fairly hard to prove - not part of this project).
Hill Cipher for money transfer messages
Banks have to send thousands of messages about money transfers between them.
If somebody writes out a check to you which is presented to bank F, then a
message would be sent from bank F to bank G (your bank) to put this amount into your account, after the check has cleared.
Obviously, there has to be a rigid standard for these messages. The following is a simplified version of an actual message standard.
Each message block is a line, describing one money transaction.
Each line contains the following items (behind every item is its length in parentheses, followed by a comment).
- 'A1' (2, for no apparent reason>
- Date (6, yymmdd)
- Zeros (10, perhaps for future use)
- Payee name (20)
- Payee account number (10)
- Payment amount in cents (10)
- Routing number of bank F (10)
- Memo (7)
- 'X' (1, no apparent reason)
Empty spaces are filled by zeros, not blanks (this makes conversion to numbers
much easier).
For simplicity, let's assume each message has just one block (one line).
Here are some cleartext examples.
Messages are converted into vectors in R76, encrypted, sent to bank G, decrypted and converted back to letters and digits.
Differential attack on Hill ciphers
Does this system look secure to you?
It is not - there is a simple way to cheat the system!
Suppose you know the format of the messages and the general method of encryption - your bank clerk cousin Charlie told you all about the standard above (length=76), and they use a Hill cipher).
But even Charlie does not know the actual matrix A and not the vector b.
You are able to intercept the encrypted messages from bank F to bank G.
Now you open an account at bank F, too. You present 4 checks to bank F, made out to yourself at bank G, over say $13, $10, $100 and $20 - ie you make sure that bank F sends messages to bank G
causing $13, 10, 100 and 20 to be put into your account there.
You intercept these encrypted messages e1, e2, e3, e4.
You know how to transform these into vectors v1, v2, v3, v4.
You also know the original messages m1, m2, m3, m4 and you know how to transform these into vectors u1, u2, u3, u4.
Moreover, you know
v1=A*u1 + b, v2=A*u2+b, v3=A*u3 + b, v4=A*u4+b
but you know neither A nor b! So far, everything was completely legal ...
Problem 2: how can you produce a message e5, which will make bank G believe it comes from bank F and make it put $999 into your account? No guessing, please. I will assume you are called 'John R Doe' and your account number is 12345678, routing number 1234567890 in the messages.
Into 'Memo', I put 'Candy'. Hints:
- Take differences!
- Take care distinguishing the number 3 from the string '0000000003'.
- Strictly speaking, you do not need u1, u2, u3, u4 at all.
- You also do not need to find A or b at all! This is lucky, because it would be actually impossible. Just use the vectors v1..v4 to construct a vector v5.
- Can you construct a valid encrypted vector z that tells Bank G to depose ZERO dollars into your account? Of course this won't help you financially, but it might help mathematically. Note that z is not the zero vector.
- What then does v1-v2 actually mean?