Skip to main content

Bayes, Bletchley, JN-25 and a 'modern' optimization

In the current edition (June 2010) of significance (the magazine of the Royal Statistical Society) there's an article titled Edward Simpson: Bayes at Bletchley Park. Simpson is the man behind Simpson's Paradox (which, if you have not heard of it, you should immediately go and read about because it's... weird). He also worked at Bletchley Park during the Second World War.

I found the article fascinating for three reasons: it's about Bletchley Park, it's about Bayes Theorem (which I've spent quite a lot of time dealing with) and it highlights an optimization that was done to enable WRENs to work faster, yet is still in use today.

The article talks about life at Bletchley Park, and the breaking of two codes: Enigma and the Japanese JN-25. So much has been written about Enigma that I have little more to say about it, but JN-25 is a different matter. Its breaking involved a bit of Bayes and ended up helping the Allied victory at the Battle of Midway.


This Japanese Naval code works as follows. The Japanese word to be enciphered is looked up in a book where a corresponding 5 digit number is found. This number is the input to the encipherment. For example, the Japanese word for weather might be encoded as 30525. To help prevent transcription errors the Japanese code numbers were all divisible by three. Thus it would be easy to check a message before enciphering it (especially since it's easy to check if a number is divisible by 3).

So the message would consist of blocks of five-digit numbers. Next a book containing random five digit numbers was used to select a sequence of numbers that would be used to encipher the message. For each five digit group in the message, a five digit number from the book (in sequence) would be chosen.

Simpson's article has the following example:

Corresponding code numbers: 70863 34131 30525
Secret sequence from code book: 58304 68035 91107

The original five digit code representing a word would be added to the secret five digit number digit by digit with no carry (a form of modulo arithmetic). For example, "weather" might be 30525 and the chosen secret number 91107 and so the enciphered word would be 21622. The recipient would reverse the addition to get at the message.

That's a very secure system if the secret numbers are truly random and if they are not reused. In fact the Japanese secret numbers were not randomly used. The Japanese sent multiple messages using the same sequences of random numbers.

In addition, by cracking messages, Bletchley Park was able to build up a table of code numbers and their frequency of occurrence. That frequency of occurrence made an application of Bayes theorem possible.

Here's how it worked. A code breaker would be given a set of messages all known to have been enciphered with the same stretch of secret numbers. They might have between 2 and 20 messages. Suppose that in one message there's "good" (34131) enciphered as 92166 using the secret number 68035 and in another there's "stop" (50418) enciphered with the same secret number as 18443. The code breaker has in front of them 18443 and 92166 and by lining up messages on the table has columns of numbers enciphered with the same, unknown secret numbers.

The difference between 18443 and 92166 (using the modulo arithmetic) will be 26387. This is in fact the difference between the original code words 34131 and 50418: the secret number 68035 has been eliminated.

What the folks at Bletchley Park did was build themselves a big table of differences between the code words. The code breaker would look up 26387 in the book to see if this difference was known (indicating that the two underlying words had been identified and they would have been present in the book as 34131 and 50418). If 26387 appeared in the book then the actual code numbers (34131 and 50418) could be subtracted from the enciphered numbers 18443 and 92166 to reveal the secret number 68035 that was used to encipher the message.

Since they had many messages to work with they would actually check all pairs of numbers. For each pair, a difference is calculated and the resulting number looked up in the book to see if it exists.

Now for Bayes

On the table in front of the code breaker are a set of messages. They would take the number 68035 and subtract it from each enciphered number in the same column to reveal the underlying code number. If any of the numbers was not divisible by three then the decipherment had failed, but if not the next stage was undertaken.

Since the code numbers represented actual words they had a pattern of frequency of occurrence. So, the code breaker would take the list of numbers from the column they had deciphered and look up their frequency in a book. Suppose the code breaker has numbers a, b, c, ... and finds that a and c are in the book but b is not (perhaps because it's wrong or perhaps because the five digit code has never been seen before). The book gives P(a) and P(c) (the probability of those words occurring based on their frequency), the probability that those five digit groups occurred randomnly (actually 3/100000) was called P. Thus the Bayesian evidence given by a occurring is P(a)/P (and similarly for c).

For the word b that hasn't been seen before (or may be wrong) a frequency of occurrence of 1 was set which meant that P(b) was tiny and hence b was essentially ignored.

Then Bayes Theorem was used to multiply together P(a)/P and P(c)/P to get a probability that the number 68035 was the real underlying secret word used to encipher the numbers. If that probability was above a certain threshold the number 68035 was taken as genuine and the decipherment could continue.


There are three things that struck me in Simpson's explanation. Firstly, the prior probability is ignored in the Bayesian calculation. This sort of ignoring the priors happens in Naive Bayesian text classifiers where the prior probability of a 'document' or 'text' is irrelevant: it just acts as a scale factor that can be ignored. I did exactly that in the implementation of POPFile.

Secondly, the handling of the 'unknown' code numbers (b in the example above) is very similar to the way in which POPFile handles unknown words. A very small (close to 0, but non-zero) probability is chosen indicating that this word probability doesn't exist, but just might do.

The other optimization that occurs in POPFile is that rather than multiplying Bayes factors together, the logs of Bayes factors are added together. This is fast and prevents arithmetic underflow when dealing with tiny probabilities. Recall that log AB = log A + log B and log A/B = log A - log B. This is exactly the same trick that makes slide rules work (sliding the rule is 'adding' on a logrithmic scale).

This technique is common (I certainly didn't invent it) and was used at Bletchley Park. Instead of having WRENs do laborious multiplications and divisions of Bayes factors a book of scores was created consisting of the log of the Bayes factor for each code number. Checking whether the code numbers passed the Bayes test to reveal a useable secret number was thus a matter of looking up and adding these factors and seeing if the sum passed a threshold value.

PS If you have access to significance the rest of the article is well worth reading.


Popular posts from this blog

Your last name contains invalid characters

My last name is "Graham-Cumming". But here's a typical form response when I enter it:

Does the web site have any idea how rude it is to claim that my last name contains invalid characters? Clearly not. What they actually meant is: our web site will not accept that hyphen in your last name. But do they say that? No, of course not. They decide to shove in my face the claim that there's something wrong with my name.

There's nothing wrong with my name, just as there's nothing wrong with someone whose first name is Jean-Marie, or someone whose last name is O'Reilly.

What is wrong is that way this is being handled. If the system can't cope with non-letters and spaces it needs to say that. How about the following error message:

Our system is unable to process last names that contain non-letters, please replace them with spaces.

Don't blame me for having a last name that your system doesn't like, whose fault is that? Saying "Your last name …

Importing an existing SSL key/certificate pair into a Java keystore

I'm writing this blog post in case anyone else has to Google that. In Java 6 keytool has been improved so that it now becomes possible to import an existing key and certificate (say one you generated outside of the Java world) into a keystore.

You need: Java 6 and openssl.

1. Suppose you have a certificate and key in PEM format. The key is named host.key and the certificate host.crt.

2. The first step is to convert them into a single PKCS12 file using the command: openssl pkcs12 -export -in host.crt -inkey host.key > host.p12. You will be asked for various passwords (the password to access the key (if set) and then the password for the PKCS12 file being created).

3. Then import the PKCS12 file into a keystore using the command: keytool -importkeystore -srckeystore host.p12 -destkeystore host.jks -srcstoretype pkcs12. You now have a keystore named host.jks containing the certificate/key you need.

For the sake of completeness here's the output of a full session I performe…

More fun with toys: the Ikea LILLABO Train Set

As further proof of my unsuitability to be a child minder (see previous post) I found myself playing with an Ikea LILLABO 20-piece basic set train.

The train set has 16 pieces of track (12 curves, two straight pieces and a two part bridge) and 4 pieces of train. What I wondered was... how many possible looping train tracks can be made using all 16 pieces?

The answer is... 9. Here's a picture of the 9 different layouts.

The picture was generated using a little program written in Processing. The bridge is red, the straight pieces are green and the curves are blue or magenta depending on whether they are oriented clockwise or anticlockwise. The curved pieces can be oriented in either way.

To generate those layouts I wrote a small program which runs through all the possible layouts and determines which form a loop. The program eliminates duplicate layouts (such as those that are mirror images of each other).

It outputs a list of instructions for building loops. These instructions con…