Skip to main content

Bayes, Bletchley, JN-25 and a 'modern' optimization

In the current edition (June 2010) of significance (the magazine of the Royal Statistical Society) there's an article titled Edward Simpson: Bayes at Bletchley Park. Simpson is the man behind Simpson's Paradox (which, if you have not heard of it, you should immediately go and read about because it's... weird). He also worked at Bletchley Park during the Second World War.

I found the article fascinating for three reasons: it's about Bletchley Park, it's about Bayes Theorem (which I've spent quite a lot of time dealing with) and it highlights an optimization that was done to enable WRENs to work faster, yet is still in use today.

The article talks about life at Bletchley Park, and the breaking of two codes: Enigma and the Japanese JN-25. So much has been written about Enigma that I have little more to say about it, but JN-25 is a different matter. Its breaking involved a bit of Bayes and ended up helping the Allied victory at the Battle of Midway.


This Japanese Naval code works as follows. The Japanese word to be enciphered is looked up in a book where a corresponding 5 digit number is found. This number is the input to the encipherment. For example, the Japanese word for weather might be encoded as 30525. To help prevent transcription errors the Japanese code numbers were all divisible by three. Thus it would be easy to check a message before enciphering it (especially since it's easy to check if a number is divisible by 3).

So the message would consist of blocks of five-digit numbers. Next a book containing random five digit numbers was used to select a sequence of numbers that would be used to encipher the message. For each five digit group in the message, a five digit number from the book (in sequence) would be chosen.

Simpson's article has the following example:

Corresponding code numbers: 70863 34131 30525
Secret sequence from code book: 58304 68035 91107

The original five digit code representing a word would be added to the secret five digit number digit by digit with no carry (a form of modulo arithmetic). For example, "weather" might be 30525 and the chosen secret number 91107 and so the enciphered word would be 21622. The recipient would reverse the addition to get at the message.

That's a very secure system if the secret numbers are truly random and if they are not reused. In fact the Japanese secret numbers were not randomly used. The Japanese sent multiple messages using the same sequences of random numbers.

In addition, by cracking messages, Bletchley Park was able to build up a table of code numbers and their frequency of occurrence. That frequency of occurrence made an application of Bayes theorem possible.

Here's how it worked. A code breaker would be given a set of messages all known to have been enciphered with the same stretch of secret numbers. They might have between 2 and 20 messages. Suppose that in one message there's "good" (34131) enciphered as 92166 using the secret number 68035 and in another there's "stop" (50418) enciphered with the same secret number as 18443. The code breaker has in front of them 18443 and 92166 and by lining up messages on the table has columns of numbers enciphered with the same, unknown secret numbers.

The difference between 18443 and 92166 (using the modulo arithmetic) will be 26387. This is in fact the difference between the original code words 34131 and 50418: the secret number 68035 has been eliminated.

What the folks at Bletchley Park did was build themselves a big table of differences between the code words. The code breaker would look up 26387 in the book to see if this difference was known (indicating that the two underlying words had been identified and they would have been present in the book as 34131 and 50418). If 26387 appeared in the book then the actual code numbers (34131 and 50418) could be subtracted from the enciphered numbers 18443 and 92166 to reveal the secret number 68035 that was used to encipher the message.

Since they had many messages to work with they would actually check all pairs of numbers. For each pair, a difference is calculated and the resulting number looked up in the book to see if it exists.

Now for Bayes

On the table in front of the code breaker are a set of messages. They would take the number 68035 and subtract it from each enciphered number in the same column to reveal the underlying code number. If any of the numbers was not divisible by three then the decipherment had failed, but if not the next stage was undertaken.

Since the code numbers represented actual words they had a pattern of frequency of occurrence. So, the code breaker would take the list of numbers from the column they had deciphered and look up their frequency in a book. Suppose the code breaker has numbers a, b, c, ... and finds that a and c are in the book but b is not (perhaps because it's wrong or perhaps because the five digit code has never been seen before). The book gives P(a) and P(c) (the probability of those words occurring based on their frequency), the probability that those five digit groups occurred randomnly (actually 3/100000) was called P. Thus the Bayesian evidence given by a occurring is P(a)/P (and similarly for c).

For the word b that hasn't been seen before (or may be wrong) a frequency of occurrence of 1 was set which meant that P(b) was tiny and hence b was essentially ignored.

Then Bayes Theorem was used to multiply together P(a)/P and P(c)/P to get a probability that the number 68035 was the real underlying secret word used to encipher the numbers. If that probability was above a certain threshold the number 68035 was taken as genuine and the decipherment could continue.


There are three things that struck me in Simpson's explanation. Firstly, the prior probability is ignored in the Bayesian calculation. This sort of ignoring the priors happens in Naive Bayesian text classifiers where the prior probability of a 'document' or 'text' is irrelevant: it just acts as a scale factor that can be ignored. I did exactly that in the implementation of POPFile.

Secondly, the handling of the 'unknown' code numbers (b in the example above) is very similar to the way in which POPFile handles unknown words. A very small (close to 0, but non-zero) probability is chosen indicating that this word probability doesn't exist, but just might do.

The other optimization that occurs in POPFile is that rather than multiplying Bayes factors together, the logs of Bayes factors are added together. This is fast and prevents arithmetic underflow when dealing with tiny probabilities. Recall that log AB = log A + log B and log A/B = log A - log B. This is exactly the same trick that makes slide rules work (sliding the rule is 'adding' on a logrithmic scale).

This technique is common (I certainly didn't invent it) and was used at Bletchley Park. Instead of having WRENs do laborious multiplications and divisions of Bayes factors a book of scores was created consisting of the log of the Bayes factor for each code number. Checking whether the code numbers passed the Bayes test to reveal a useable secret number was thus a matter of looking up and adding these factors and seeing if the sum passed a threshold value.

PS If you have access to significance the rest of the article is well worth reading.


Popular posts from this blog

How to write a successful blog post

First, a quick clarification of 'successful'. In this instance, I mean a blog post that receives a large number of page views. For my, little blog the most successful post ever got almost 57,000 page views. Not a lot by some other standards, but I was pretty happy about it. Looking at the top 10 blog posts (by page views) on my site, I've tried to distill some wisdom about what made them successful. Your blog posting mileage may vary. 1. Avoid using the passive voice The Microsoft Word grammar checker has probably been telling you this for years, but the passive voice excludes the people involved in your blog post. And that includes you, the author, and the reader. By using personal pronouns like I, you and we, you will include the reader in your blog post. When I first started this blog I avoid using "I" because I thought I was being narcissistic. But we all like to read about other people, people help anchor a story in reality. Without people your bl

Your last name contains invalid characters

My last name is "Graham-Cumming". But here's a typical form response when I enter it: Does the web site have any idea how rude it is to claim that my last name contains invalid characters? Clearly not. What they actually meant is: our web site will not accept that hyphen in your last name. But do they say that? No, of course not. They decide to shove in my face the claim that there's something wrong with my name. There's nothing wrong with my name, just as there's nothing wrong with someone whose first name is Jean-Marie, or someone whose last name is O'Reilly. What is wrong is that way this is being handled. If the system can't cope with non-letters and spaces it needs to say that. How about the following error message: Our system is unable to process last names that contain non-letters, please replace them with spaces. Don't blame me for having a last name that your system doesn't like, whose fault is that? Saying "Your

The Elevator Button Problem

User interface design is hard. It's hard because people perceive apparently simple things very differently. For example, take a look at this interface to an elevator: From flickr Now imagine the following situation. You are on the third floor of this building and you wish to go to the tenth. The elevator is on the fifth floor and there's an indicator that tells you where it is. Which button do you press? Most people probably say: "press up" since they want to go up. Not long ago I watched someone do the opposite and questioned them about their behavior. They said: "well the elevator is on the fifth floor and I am on the third, so I want it to come down to me". Much can be learnt about the design of user interfaces by considering this, apparently, simple interface. If you think about the elevator button problem you'll find that something so simple has hidden depths. How do people learn about elevator calling? What's the right amount of