Tuesday, August 25, 2009

How to trick Apple Numbers into colouring individual bars on a bar chart

Suppose you have a bar chart like this:



and it's made from a data table like this:



And you are really proud of the sales of the XP1000 and want to change the colour of its bar to red. In Apple Numbers you can't do that directly, because colour is applied per data series, not per individual bar.

But you can fool Apple Numbers by creating two data series like this:



Then select the two series in the data table, choose a Stacking Bar chart, and you'll get a chart like this:



You can change the colour of any of the series by clicking the Fill button on the toolbar. And you can extend the trick beyond two series to colour as many individual bars as you need.
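To make the trick concrete, here's a sketch with made-up product names and sales figures (your data will differ). Start from a single series:

Product   Sales
XP900     40
XP1000    70
XP1100    50

and split it into two series, with each row filled in on exactly one of them:

Product   Sales A   Sales B
XP900     40
XP1000              70
XP1100    50

Each bar in the stacked chart is then made of a single visible segment, and the Sales B series (containing only the XP1000) can be given its own red fill.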

Sunday, August 16, 2009

Geek Weekend, Day 2: The Brunel Museum

So after yesterday's trip to Bletchley Park I stayed in London and hopped over to a spot not far from Tower Bridge where Marc Brunel and his son Isambard built the first tunnel under a navigable river: the Thames Tunnel. The tunnel was dug out by hand using a tunnel shield (the basis of all tunnel building to the present day). Workers stood inside a metal cage pressed against the undug earth, removed boards, dug in a few inches, and replaced the boards. Once the digging was done, the entire structure was forced forward a few centimeters and bricklayers would fill in behind.



The tunnel has a rich and varied history and is still in use today (read the Wikipedia link above to learn more). The entrance to the tunnel was through a massive circular tube (a caisson) which the Brunels built above ground and then sank into place. The entrance has been closed for about 140 years and is being renovated, but I was lucky enough to be taken into it by the curator of the Brunel Museum.

The museum displays works by the Brunels and runs tours through the tunnel itself. The grand entrance hall will be reopened to the public in September. Until then, here's a shot of me standing in the interior of the entrance, about 15 meters underground.


Image credit: Jonathan Histed

The diagonal line on the wall is the remains of where the grand staircase came down and brought visitors into the tunnel.

Saturday, August 15, 2009

Geek Weekend, Day 1: Bletchley Park

Left to my own devices for the weekend, I decided to embark on a Geek Weekend with visits to two places within easy reach of London. Today I visited Bletchley Park, which is simply wonderful for any geek out there.

Bletchley Park is where the cryptanalysts of the Second World War (including Alan Turing) worked in great secrecy to break the Nazi German Enigma and Lorenz ciphers. To break them they used a combination of intimate knowledge of language, mathematics and machines.

Here's a Nazi German Enigma machine:



And here's a look inside one of the rotors inside an Enigma machine to see the wiring:



Two of the code breaking machines have been reconstructed. One is the Turing Bombe, an electromechanical machine made to break the Enigma cipher. Here's a look at the wiring in the back of the Bombe:



The other machine is the Colossus, a binary computer built to decipher Lorenz. Enigma is far more famous than Lorenz, but I have a soft spot for the Lorenz code because of its close relationship to modern cryptography. Here's a Lorenz machine:



While I was there I signed a large stack of copies of my book, The Geek Atlas. If you are at Bletchley Park and pop into the shop you'll be able to buy a signed copy, if that's your thing. Of course, Bletchley Park, Enigma, Lorenz and the National Museum of Computing (also on site) are all covered in the book.

50p from every copy of The Geek Atlas bought in the UK goes to Bletchley Park, and so the folks at Bletchley treated me to a special geek moment: a chance to meet Tony Sale, who worked at MI5 and led the reconstruction of Colossus, the Lorenz-breaking machine. He took me round the back of the machine, past the No Admittance sign, to see it in operation. A geek treat if ever there was one.

The Lorenz code is essentially binary. Letters were transmitted using the Baudot code, a five-bit code. To encrypt, the Lorenz machine generated a pseudo-random sequence of Baudot codes and XORed it with the message to be transmitted. As long as the transmitting and receiving machines generated the same pseudo-random sequence, a nice property of XOR could be exploited: perform the same operation twice and you get back where you started. XORing once with the pseudo-random sequence gave the ciphertext to be transmitted; XORing again gave back the original message.
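As a toy illustration of that XOR property, here's a sketch in PHP (not a Lorenz simulator: the made-up keystream below is XORed byte-by-byte against 8-bit ASCII characters rather than 5-bit Baudot codes):

<?php
// Applying the same keystream twice returns the original message,
// which is the property the Lorenz machine relied on.
// PHP's ^ operator XORs two strings byte by byte.
$message   = "ATTACKATDAWN";
$keystream = "QZKPTLWMRBXH"; // stand-in for the machine's pseudo-random sequence

$ciphertext = $message ^ $keystream;    // encrypt
$decrypted  = $ciphertext ^ $keystream; // decrypt: XOR twice undoes itself

echo $decrypted; // prints ATTACKATDAWN
?>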

Breaking this binary code was achieved with a binary computer. After giving me a behind-the-scenes look at Colossus, Tony Sale stood for a portrait in front of the machine:



And behind the machine is where the valve-action is:



Seeing the Turing Bombe, staring into Turing's office in Hut 8, being taken around the back of Colossus by the man who put it back together, and getting to see more Enigmas, Lorenzes and Typexes than anyone could ask for made it a real treat.

The National Museum of Computing is Britain's answer to the wonderful Computer History Museum in Mountain View, CA. It contains many machines, from mainframes through the 8-bit era of British computing. All the machines are working or being restored. If you've never seen core memory, massive hard disk packs the size of washing machines, or even just a Commodore PET, it's worth visiting (and it's right next door to Colossus).

Lastly, it's worth knowing that the National Museum of Computing, despite being included in the Bletchley Park ticket price, actually receives no money from Bletchley Park. Please consider either donating money directly to the museum (I gladly emptied my pockets of change) or buying something in its shop.

And tomorrow it's a step back into the 19th century with a special visit to a place important in the life of Isambard Kingdom Brunel.

Tuesday, August 11, 2009

Regular expressions are hard, let's go shopping

After looking at a Tweet from Charles Arthur of The Guardian, I decided to hunt down his blog. I typed "Charles Arthur" into Google and the first link was to his blog.

But there was something strange about it. All the letter t's following an apostrophe were highlighted. Here's a screen shot:



Yet if I typed the exact same URL into Firefox the highlighted t's were not there. Odd. Since the URL was identical, the difference had to be something in the HTTP headers sent when I clicked through from Google.

I fired up HTTPFox and watched the transaction. Here's a screen shot of the HTTP headers of the GET request for his page. The interesting thing to look at is the Referer header.



It immediately jumped out at me that one of the parameters was aq=t. It looked to me like something on his blog was reading that parameter and using it for highlighting. Poking around, I discovered that his site is built with WordPress, and that there's a WordPress plugin (which hasn't been updated for years) intended to highlight search terms when the visitor arrives from a search engine.

Looking at the source of his web page, the CSS suggested he was using that plugin. So I downloaded the source of the plugin and took a look. There's a bug in the way it extracts the query parameters from the Referer header for Google.

Here's the code:

$query_terms = preg_replace('/^.*q=([^&]+)&?.*$/i', '$1', $referer);

That regular expression is buggy. It matches the right-most occurrence of q= followed by a run of characters ending at the next & (or at the end of the Referer header). It gets the right-most occurrence because the ^.* at the beginning is greedy: it skips over as much of the Referer header as possible before looking for q=.

In the Referer string there are two parameters containing q=: the real query parameter q and Firefox's aq. The first is the one we want, but since the regular expression doesn't check that the q= is preceded by a ? or an &, the greedy match latches onto the q= inside aq= instead.
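You can see the bug with a one-liner (the Referer below is illustrative, not the exact one from the screenshot):

<?php
// An illustrative Referer with both a real q= parameter and Firefox's aq=t.
$referer = 'http://www.google.com/search?q=charles+arthur&ie=utf-8&aq=t';

// The plugin's pattern: the greedy ^.* swallows everything up to the
// right-most q=, which is the one inside aq=t.
echo preg_replace('/^.*q=([^&]+)&?.*$/i', '$1', $referer); // prints "t"
?>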

I did a bunch of tests with wget to confirm I was right. It's a bug.

The aq=t parameter was added in 2006; here are the details. It's only present when you use Firefox's Google search box. Unfortunately, the plugin hasn't been updated since 2005.

It can be fixed by changing that line above to:

$query_terms = preg_replace('/^.*[\?&]q=([^&]+)&?.*$/i', '$1', $referer);

But the right thing to do here is to rewrite the code so that it doesn't use regular expressions at all. After all, PHP has the parse_url and parse_str functions, which can do all the URL and query-string parsing for you.
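Something along these lines would do it (a sketch, not the plugin's actual code):

<?php
// A regex-free version using PHP's own URL parsing.
$referer = 'http://www.google.com/search?q=charles+arthur&ie=utf-8&aq=t';

$query = parse_url($referer, PHP_URL_QUERY); // "q=charles+arthur&ie=utf-8&aq=t"
parse_str($query, $params);                  // decode into an associative array

$query_terms = isset($params['q']) ? $params['q'] : '';
echo $query_terms; // prints "charles arthur" -- and aq=t can't interfere
?>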

Monday, August 10, 2009

In which I resurrect a 13 year old 3.5" floppy disk and reprint my doctoral thesis

This is a follow-up to a post from the weekend about playing with my old Sharp MZ-80K. Someone commented that they'd be more impressed if I resurrected a 15 year old floppy disk than a 30 year old cassette tape.

I don't have a 15 year old floppy disk to hand, but I do have this one that's 13 years old and, according to the label, contains a copy of my doctoral thesis. The disk was created in 1996, and the files on it date to 1994; the thesis itself I completed in 1992.



But would it still read?

The first step was finding a drive. I had an old-ish 3.5" USB disk drive kicking around, so I plugged it into my MacBook Air and fired up Windows XP under VMware. It happily recognized the drive and, magically, loaded up the floppy disk:



The disk contains a single ZIP file called oxford.zip. Unzipping it and poking around in the directories reveals that it contains my thesis, all the papers I wrote as a doctoral student, my CV and helpful READ.ME files: a gift to my future self.



That's all well and good, but are any of these files usable? Can I take the LaTeX source files and produce a copy of my thesis? Or can I take the DVI file that I had saved and turn it into a PDF?

A quick copy over to the main Mac machine and a download of LaTeX later, I had a working LaTeX system again and all the files.

So to get started I grabbed the DVI file of my thesis and ran it through dvipdf. Apart from complaining about various missing fonts, it produced a totally readable PDF file and suddenly I was staring at my thesis. You can download the PDF by clicking on: The Formal Development of Secure Systems. Here's a sample page (the code at the bottom is written in Occam):



But it's not enough to stop at a DVI file; what I wanted was to compile from source. My first test was to start with something small: my CV. Magically, that worked:



And so on to my thesis. I'm not going to show all that I went through, but it worked after I'd got things in the right directories and tracked down a couple of additional style files.


BTW, does anyone have a Research Machines 380Z with working 8" drives? I have a couple of really old floppies that it would be fun to read.

Sunday, August 09, 2009

In which I switch on a 30 year old computer and it just works

Yesterday, I had the pleasure of visiting my parents and getting out an old computer. One of the first computers I used a lot was the Sharp MZ-80K, which was sold from 1979 to 1982. I was but a wee bairn, but this is the first machine I really programmed: first using BASIC, and then Z-80 assembler (and sometimes by typing characters corresponding to Z-80 opcodes directly onto the screen and then calling the address of the start of screen memory to execute the program sitting on screen).

My parents have a Sharp MZ-80K that I purchased as a nostalgia item some years ago. Yesterday I fired it up for the first time and was straight into the boot ROM. Oddly, I could remember everything about the machine's operation and shoved a cassette tape containing SP-BASIC into the tape drive, hit play and typed LOAD.

The machine duly loaded SP-BASIC and gave me the prompt.

Then I did the real test. After poking around and finding a tape of my old BASIC programs, I typed LOAD again and explored the tape. 30 years on, all the programs loaded from tape just fine and executed. I was able to spend a happy few hours playing character-based games that I wrote.

Here's a screen shot of the listing of one such game: notice the J.G.C. initials at the start. This was one of the few programs I put my name in (I think because it was clearly co-written with A.S.).


I put the survival of that tape down to two things: my parents' careful handling of a box of Sharp MZ-80K and BBC Micro tapes, and my obsession at the time with buying the highest quality CrO2 cassette tapes available: "It is still considered today by many oxide and tape manufacturers to have been the most perfect magnetic recording particulate ever invented."

Wednesday, August 05, 2009

The world's simplest log file

Back when I was doing embedded programming we had a debugging feature called 'pokeouts'. The idea was that the program could write a single character to the screen when some important event occurred.

Now, writing single characters to the screen might not seem like a good way to do debugging. After all, these days we've got tons of disk space, we can spew out log files, and our CPUs are barely burdened. But in embedded programming you tend to have little space and little time.

This system worked by having code that resembled the following:

pokeout: MOV AH, 0Eh    ; BIOS 'teletype output' function
         INT 10h        ; print the character in AL on the screen
         RET

And to make things even easier, we actually implemented this by hooking an interrupt so that programs didn't need to know whether the pokeout facility was available: they could safely do INT xxh for debugging output. This meant the logging facility could be loaded onto a running system.

It's amazing how much information you can convey with single characters scrolling across the screen. It's easy to log critical area entry and exit (we used lots of combinations of ( ) [ ] { } < > followed by single characters to identify the area). You can build on that to output hexadecimal numbers easily. And individual events can be hooked to individual characters.

I'd spend my days looking at these scrolling screens of characters waiting for a program to crash. Everything was on the screen.

For some high-speed systems even the screen pokeout was too slow. We replaced the routine above with code that wrote the pokeouts into a circular buffer in memory. When the program finally crashed, the buffer could be examined using a high-powered debugger like SoftICE to give us a trace of the program's final moments. It was the equivalent of an aircraft's 'black box'.
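The original buffer code was a few lines of assembler, but the mechanism is tiny. Here's the same idea sketched in PHP, purely to show the mechanics (all names are made up):

<?php
// A toy 'black box': a fixed-size circular buffer of pokeout characters.
// The newest character overwrites the oldest once the buffer is full.
class TraceBuffer {
    private $buf;
    private $pos = 0;
    private $size;

    public function __construct($size = 64) {
        $this->size = $size;
        $this->buf  = str_repeat(' ', $size);
    }

    // The 'pokeout': record a single character.
    public function poke($ch) {
        $this->buf[$this->pos] = $ch;
        $this->pos = ($this->pos + 1) % $this->size;
    }

    // After a crash: the program's final moments, oldest character first.
    public function dump() {
        return substr($this->buf, $this->pos) . substr($this->buf, 0, $this->pos);
    }
}

$trace = new TraceBuffer(8);
foreach (str_split('(a)(b)[c') as $ch) {
    $trace->poke($ch);
}
echo $trace->dump(); // prints the last 8 pokeouts: (a)(b)[c
?>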

Just give me a simple CPU and a few I/O ports

Back when I started programming, computers came with circuit diagrams and listings of their firmware. The early machines I used, like the Sharp MZ-80K, the BBC Micro Model B, the Apple ][ and so on, had limited instruction sets and an 'operating system' that was simple enough to comprehend if you understood assembly language. In fact, you really wanted to understand assembly language to get the most out of these machines.

Later I started doing embedded programming. I wrote a TCP/IP stack that ran on an embedded processor inside a network adapter card. Again it was possible to understand everything that was happening in that piece of hardware.

But along the way Moore's Law overtook me. The unending doubling in speed and capacity of machines means that my ability to understand the operation of the computers around me (including my phone) has long since been surpassed. There is simply too much going on.

And it's a personal tragedy. As computers have increased in complexity my enjoyment of them has plummeted. Since I can no longer understand the computer I am forced to spend my days in the lonely struggle against an implacable and yet deterministic foe: another man's APIs.

The worst thing about APIs is that you know that someone else created them, so your struggle to get the computer to do something feels futile. This is made worse by closed source software, where you are forced to rely on documentation.

Of course, back in my rose-tinted past someone else had made the machine and the BIOS, but they'd been good enough to tell you exactly how it worked, and it was small enough to comprehend.

I was reminded of all this while reading the description of the Apollo Guidance Computer. The AGC had the equivalent of just over 67KB of operating system in ROM and just over 4KB of RAM. And that was enough to put 12 men on the moon.

Even more interesting is how individuals were able to write the software for it: "Don was responsible for the LM P60's (Lunar Descent), while I was responsible for the LM P40's (which were) all other LM powered flight". Two men were able to write all that code and understand its operation.

12 men went to the moon using an understandable computer, and I sit before an unfathomable machine.

Luckily, there are fun bits of hardware still around. My next projects are going to use the Arduino.

Monday, August 03, 2009

Please don't use pie charts

I don't like pie charts. I don't like them because they fail to convey information: people have a really hard time judging relative areas, as opposed to lengths. Wikipedia mentions some of the reasons why pie charts are generally poor.

I'd go a little further and say that pie charts are really only useful when a small number of categories of data are far, far greater than the others. Like this image from Wikipedia of the English-speaking peoples:



Yep, there are lots of Americans.

Once you have data that isn't wildly different, or you have lots of categories, your pie chart would be better as either a bar chart or simply a data table. Here's a particularly bad pie chart from a blog about Microsoft Office. It depicts the number of features added in various releases.



Literally everything is wrong with this pie chart. The data being presented is the number of features added per release. Releases occur chronologically, so an obvious choice would be a bar chart, or a line chart of cumulative features, with time going from left to right. Instead we have to follow the chart around clockwise (after finding the right starting point) to follow time.

And since the releases didn't come out at equal intervals it would be really nice to compare the number of features added with the amount of time between releases.

The pie chart has no values on it at all. We don't get the actual number of features, or even the percentage added per release. So we are left staring at the chart trying to guess the relative sizes of the slices, which is made extra hard by the chart being in 3D. For example, how do Word 2000 and Word 2003 compare?

But if you still must use pie charts, I beg you not to use 3D pie charts. They are simply an abomination, and making them 3D just makes them even harder to interpret.