Sunday, December 27, 2009

Toy decoding: vtech Push and Ride Alphabet Train

So, it's Christmas and you end up visiting people with kids... and they've got a fancy new vtech Push and Ride Alphabet Train. Now, you're the world's worst child minder because you see it and think: how does that work?


Specifically, when you insert one of the 26 alphabet blocks into the side of the train how does it know to say the correct letter? And how does it know which side (letter or word) is facing outwards (so it can say a letter or a corresponding word: "A is for Apple" etc.).

Now a quick examination shows that there are 6 small switches in each block receptacle and that each block has corresponding bits of plastic and holes to make different binary patterns. The top bit (bit 5) seems to be used to indicate which side of the block is showing.

That leaves 5 bits for the alphabet. Of course that means there are 32 possible combinations (actually 31, since the all-switches-up pattern has to mean 'block not present'), and 26 letters in the alphabet. So which 5 binary combinations are not needed for the alphabet and what do they do?

First here's the mapping between letters and their five bit patterns. Here 0 = button is depressed by little sliver of plastic, and 1 = button is left up because there's a space in the block.

a 11010 b 00010 c 00011 d 00100 e 00101 f 00110
g 00111 h 01000 i 01001 j 01010 k 01011 l 01100
m 01101 n 01110 o 01111 p 10000 q 10001 r 10010
s 10011 t 10100 u 10101 v 10110 w 10111 x 11000
y 11001 z 00001

As I'm sure you've noticed there's something very odd about this sequence. Letters b through y follow a nice pattern, but what's up with a and z? Here's the same information using decimal to make the problem clear:

a 26 b 2 c 3 d 4 e 5 f 6
g 7 h 8 i 9 j 10 k 11 l 12
m 13 n 14 o 15 p 16 q 17 r 18
s 19 t 20 u 21 v 22 w 23 x 24
y 25 z 1

As you can see it appears that the numbers for a and z are swapped. You'd expect a to be 1 and z to be 26. Now, there could be some clever explanation for this but I'm guessing it's the work of Captain Cock-up.

When I used to write software in a hardware company it was pretty common for there to be mistakes in the hardware design or implementation that had to be fixed in software. I remember one very snowy December outside of Route 128 at an HP works debugging something nasty with an EISA card on which our code was running inside some new HP workstation (pretty sure it was a Series 700 with the native GSC bus and something called the Wax ASIC to provide an EISA bus). Turned out that our hardware wasn't latching things onto the EISA bus with quite the perfect timing that the ASIC needed and corrupt data was hitting the main bus. This is not the sort of thing you want to have happen. The fix was done in software to alter the order of writing (which was done with two 16-bit writes) and a little loop to spin around checking for stability.

So, I bet vtech had a little mistake like that. Somehow the codes for a and z got swapped in the hardware, and in software there's a fix.
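Here's a rough sketch in Python of how that decoding might look. It is a guess based on the table above, not vtech's firmware: the name decode_block, the bit order of the input list and the a/z swap fix are all my assumptions, and since I don't know which value of bit 5 means "letter side" the raw flag is simply returned.

```python
import string

def decode_block(switches):
    """Decode six switch readings (bit 5 first) into (letter, side_flag).

    Each reading is 1 when the switch is up and 0 when it is depressed,
    matching the convention used in the table above.
    """
    if len(switches) != 6 or any(s not in (0, 1) for s in switches):
        raise ValueError("expected six 0/1 switch readings")
    if all(s == 1 for s in switches):
        return None                      # all switches up: no block present
    side_flag = switches[0]              # bit 5: which face is showing
    code = 0
    for bit in switches[1:]:             # bits 4..0 hold the letter code
        code = (code << 1) | bit
    # Hypothetical software fix: the codes for a and z arrive swapped.
    if code == 26:
        code = 1
    elif code == 1:
        code = 26
    if not 1 <= code <= 26:
        return None                      # one of the unused patterns
    return string.ascii_lowercase[code - 1], side_flag
```

With this sketch the pattern 11010 decodes to a and 00001 to z, and the unused letter codes (0 and 27 to 30) fall through to None along with the empty receptacle.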

If you haven't played around with hardware much you might have been surprised that button depressed = 0 (see above). This is actually pretty common because it's typical to connect logic lines going into some logic (especially if it's TTL) to positive 5V (or similar) with a pull-up resistor.

In TTL logic an unconnected pin will float around and try to be high, and so most designers ensure that it is actually high with a pull-up. Then to change the input you connect the input pin to ground via your switch (with no resistance). Thus the input is normally high (which is typically interpreted as 1) and goes low (normally that's 0) when the switch is depressed.

Here's a typical circuit:


The only thing that disappointed me was that the extra 5 combinations of 1s and 0s don't do anything. I was really hoping for an Easter Egg left by the developers.

Friday, November 20, 2009

Parsing a JSON document and applying it to an HTML template in Google Go

Here's some simple code to parse a JSON document and then transform it into an HTML document using the Google Go packages json and template.

If you've done anything in a scripting language then you'll probably be surprised by the need to declare fixed struct types that match the parsed JSON document (or at least match some subset of it). Also, because of the way reflection works in Google Go, the struct member names need to begin with an uppercase letter (and for that reason I've used uppercase names everywhere).

package main

import (
    "fmt";
    "os";
    "json";
    "template"
)

type Row struct {
    Column1 string;
    Column2 string;
}

type Document struct {
    Title string;
    Rows []Row;
}

const a_document = `
{
"Title" : "This is the title",
"Rows" : [ { "Column1" : "A1", "Column2" : "B1" },
{ "Column1" : "A2", "Column2" : "B2" }
]
}`

const a_template = `
<html>
<head><title>{Title}</title></head>
<body>
<table>
{.repeated section Rows}
<tr><td>{Column1}</td><td>{Column2}</td></tr>
{.end}
</body>
</html>`

func main() {

    // The following code reads the JSON document in
    // a_document and turns it into the Document structure
    // stored in d

    var d Document;
    ok, e := json.Unmarshal( a_document, &d );

    if ok {

        // This code parses the template in a_template, places
        // it in t, then applies the parsed JSON document in
        // d to the template and prints it out

        t, e := template.Parse( a_template, nil );
        if e == nil {
            t.Execute( d, os.Stdout );
        } else {
            fmt.Printf( e.String() );
        }
    } else {
        fmt.Printf( e );
    }
}

All the real work is done in main() by calling json.Unmarshal, template.Parse and then Execute.

Here's the Makefile and output:

$ cat Makefile
P := template
all: $P

$P: $P.6
	6l -o $@ $^

%.6: %.go
	6g $<
$ make
6g template.go
6l -o template template.6
$ ./template

<html>
<head><title>This is the title</title></head>
<body>
<table>
<tr><td>A1</td><td>B1</td></tr>
<tr><td>A2</td><td>B2</td></tr>
</body>
</html>
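
For comparison with the scripting-language style mentioned earlier, here's a minimal Python sketch of the same transformation. It isn't part of the Go program: the names A_DOCUMENT and render are made up for illustration, and the HTML is built with plain string formatting rather than a template package. Unlike the Go template, this sketch also closes the table tag.

```python
import json
from html import escape

A_DOCUMENT = """
{
  "Title" : "This is the title",
  "Rows" : [ { "Column1" : "A1", "Column2" : "B1" },
             { "Column1" : "A2", "Column2" : "B2" } ]
}"""

def render(document_text):
    """Parse the JSON text and return an HTML page as a string."""
    d = json.loads(document_text)   # plain dicts and lists, no struct types
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            escape(r["Column1"]), escape(r["Column2"]))
        for r in d["Rows"])
    return ("<html>\n<head><title>{}</title></head>\n<body>\n<table>\n"
            "{}\n</table>\n</body>\n</html>").format(escape(d["Title"]), rows)

if __name__ == "__main__":
    print(render(A_DOCUMENT))
```

The parsed document is just nested dictionaries and lists, which is exactly the convenience you give up with fixed struct types, in exchange for the compiler knowing what fields exist.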

Thursday, November 19, 2009

Installing Google Go on Mac OS X

I decided to have a go with Google Go since I'm an old fogey C/C++ programmer. Any new innovation in the C/C++ family gets me excited and Google Go has quite a few nice features (garbage collection is really nice to have and channels make me think of all the work I did in CSP).

I decided to go with the 6g compiler since gccgo doesn't have garbage collection implemented yet and hence there's no way to free memory. The only way to get 6g is to mirror its Mercurial repository. So...

Step 1. Install Mercurial

For that I used prebuilt packages from here and got Mercurial 1.4 for Mac OS X 10.5 (no, I haven't upgraded to Snow Leopard yet).

Step 2. Set GOROOT

I just did a quick cd ; mkdir go ; export GOROOT=$HOME/go to get me started.

Step 3. Clone the 6g repository

That was a quick hg clone -r https://go.googlecode.com/hg/ $GOROOT followed by the hard part: compiling it. You need to have gcc, make, bison and ed installed (which I do since I do development work on my Mac).

Step 4. Set GOBIN

This points to where the binaries will go; for me that's $HOME/bin since I'll be doing local development using Go. And I updated PATH to include $GOBIN.

Step 5. Compile 6g

You first need to set GOARCH and GOOS. For me that's amd64 for the architecture (the Intel Core 2 Duo in my Macbook Air is a 64-bit processor) and darwin for the OS (since this is a Mac).

$ export GOARCH=amd64
$ export GOOS=darwin

Then you can actually do the compile:

$ cd $GOROOT/src
$ ./all.bash

This does a build and test of 6g and it was very fast to build (although I'm used to building gcc which is a bit of a monster).

Step 6. Write a Hello, World! program

Here's my first little Google Go program (filename: hw.go) just to test the 6g compiler.

package main

import "fmt"

func main() {
    fmt.Printf( "Hello, World!\n" );
}

To simplify building I made a minimal Makefile:

all: hw
hw: hw.6 ; 6l -o $@ $^
%.6: %.go ; 6g $<

And then the magic moment:

$ make
6g hw.go
6l -o hw hw.6
$ ./hw
Hello, World!

And now for a real project... get SQLite to interface to it.

Thursday, November 12, 2009

Geek Weekend (Paris Edition), Day 4: Institut Pasteur

Leaving my SO in bed at the hotel with a nasty bacterial infection and some antibiotics, I went with timely irony to visit the home and laboratory of Louis Pasteur at the Institut Pasteur. (It's pretty easy to find since it has a conveniently named stop on the Paris metro: Pasteur).


At the Institut Pasteur there's a wonderful museum that covers the life and work of Louis Pasteur (and his wife). It's housed in the building (above) where the Pasteurs lived. There's a single room of Pasteur's science and the rest of the house is Pasteur's home; so a visit is partly scientific and partly like visiting any old home. I was mostly interested in the laboratory (although seeing how he lived---pretty darn well!---was also worth it).

Pasteur wrote standing up at a raised table (much like old bank clerks used to use) and his lab is full of specimens that he worked on. There's a nice display about chirality which Pasteur had initially worked on while studying tartaric acid in wine. (Pasteur determined that there were two forms of tartaric acid by painstakingly sorting tiny crystals by hand).

The rest of the lab covers immunization, pasteurization and the germ theory of disease. There was a nice display of Pasteur's bottles of chicken broth that he used to demonstrate the germ theory of disease. The bottles contain boiled broth and have a long tapering curved neck. Although the neck is open the shape prevents dust from entering and the broth sits undisturbed (as it has for 150 years).

In the same room there's also a big bottle of horse's blood that looks fresh despite its age, and there are detailed displays about immunization (and especially Pasteur's rabies vaccine).

The museum also has a lot of equipment used by Pasteur, such as vacuum pumps and autoclaves. It all has that lovely Victorian feel of wrought iron and brass.

The oddest part of the museum is the Pasteurs' burial chamber built beneath the house and in a totally over the top Byzantine style.

Note that the museum is only open in the afternoons during the week and that you must bring photo ID with you to get in since it is inside the Institut Pasteur.

Tuesday, November 10, 2009

Geek Weekend (Paris Edition), Day 3: The Arago Medallions

The old Paris Meridian (which was in use up until 1914) passes not far from The Pantheon which I visited to see Foucault's Pendulum. Its actual longitude today is 2°20′14.025″ E.

To mark the old meridian the French decided to install some art work and they commissioned an artist called Jan Dibbets to build something appropriate. What he did was embed brass disks in the streets of Paris marking the meridian and turning the whole city into a sort of treasure hunt.

These Arago medallions (which celebrate the meridian and the life of François Arago) cut through the very heart of Paris. They make a wonderful way to see Paris by going on a treasure hunt. And the meridian goes to the very heart of something important: the meter. The original definition of a meter was one ten-millionth of the length of the Paris meridian from the north pole to the equator. Arago surveyed the meridian and came up with a very precise definition for this fundamental unit of measure.

Here's a photo I took of one on Boulevard Saint-Germain:


There's a full list of the medallions (in French) here. And here's my English translation of the list (the numbers in parentheses give the number of medallions to be found there):

Position of the medallions along the meridian from north to south

  • XVIIIe arrondissement


    • 18 av. de la Porte de Montmartre, in front of the municipal library (1)

    • Intersection of rue René Binet and av. de la Porte de Montmartre (1)

    • 45/47 av. Junot (1)

    • 15 rue S. Dereure (1)

    • 3 and 10 av. Junot (2)

    • Mire du Nord, 1 av. Junot, in a private courtyard with controlled access (1)

    • 79 rue Lepic (1)


  • IXe arrondissement


    • 21 boulevard de Clichy, on the pavement (2)

    • 5 rue Duperré (1)

    • 69/71 rue Pigalle (2)

    • 34 rue de Châteaudun, inside the courtyard of the Ministry for National Education (2)

    • 34 rue de Châteaudun (1)

    • 18/16 and 9/11 boulevard Haussmann, in front of the restaurant (2)

    • Intersection of rue Taitbout, in front of the restaurant and 24 boulevard des Italiens (2)


  • IIe arrondissement


    • 16 rue du 4 septembre (1)

    • 15 rue saint Augustin


  • Ie arrondissement


    • 24 rue de Richelieu (1)

    • 9 rue de Montpensier (1)

    • At the Palais Royal: Montpensier and Chartres Colonnades, Nemours Gallery, passageway on place Colette and place Colette in front of the café (7)

    • Intersection of place Colette and Conseil d'État, rue saint Honoré (1)

    • place du Palais royal, on the rue de Rivoli side (1)

    • rue de Rivoli, at the entrance of the passageway (1)

    • At the Louvre, Richelieu Wing: French sculpture room and in front of the escalator (3)

    • At the Louvre, Napoléon Courtyard, behind the pyramid (5)

    • At the Louvre, Denon Wing: Roman antiquity room, stairs and corridor (3)

    • Quai du Louvre, near the entrance to the Daru pavilion (1)

    • port du Louvre, not far from the Pont des Arts (1)


  • VIe arrondissement


    • port des Saints-Pères (1)

    • quai Conti, near the place de l'Institut (2)

    • place de l'institut and rue de Seine (1)

    • 3 and 12 rue de Seine (4)

    • Intersection of rue de Seine and rue des Beaux-Arts (1)

    • 152 and 125-127 boulevard Saint-Germain (2)

    • 28 rue de Vaugirard, on the Sénat side (1)

    • In the Jardin de Luxembourg, on asphalt and cement surfaces (10)

    • rue Auguste Comte, at the entrance to the garden (1)

    • av. de l'Observatoire on the pavement near the garden (2)

    • Intersection of av. de l'Observatoire and rue Michelet (1)

    • jardin Marco Polo (3)

    • Intersection of av. de l'Observatoire and rue d'Assas (1)

    • place Camille Jullian (2)

    • On the ground at the intersection of av. Denfert Rochereau and av. de l'Observatoire, on the Observatoire side (1)

    • av. de l'Observatoire (2)


  • XIVe arrondissement


    • Courtyard of the Observatoire de Paris (2)

    • Inside the Observatoire (1)

    • Terrace and garden in the private area of the Observatoire (7)

    • boulevard Arago and place de l'Ile de Sein (6)

    • 81 rue du faubourg Saint Jacques (1)

    • place Saint Jacques (1)

    • parc Montsouris (9)

    • boulevard Jourdan (2)

    • Cité universitaire, on the axis from the pavillon Canadien to the pavillon Cambodgien; the final one is behind the pavilion (10)



This special Google Map has many of them on it; the rest you'll have to find by wandering:


Monday, November 09, 2009

Parsing HTML in Python with BeautifulSoup

I got into a spat with Eric Raymond the other day about some code he's written called ForgePlucker. I took a look at the source code and posted saying it looks like a total hack job by a poor programmer.

Raymond replied by posting a blog entry in which he called me a poor fool and snotty kid.

So far so good. However, he hadn't actually fixed the problems I was talking about (and which I still think are the work of a poor programmer). This morning I checked and he's removed two offending lines that I was talking about and done some code rearrangement. The function that had caught my eye initially was one to parse data from an HTML table which he does with this code:

def walk_table(text):
    "Parse out the rows of an HTML table."
    rows = []
    while True:
        oldtext = text
        # First, strip out all attributes for easier parsing
        text = re.sub('<TR[^>]+>', '<TR>', text, re.I)
        text = re.sub('<TD[^>]+>', '<TD>', text, re.I)
        # Case-smash all the relevant HTML tags, we won't be keeping them.
        text = text.replace("</table>", "</TABLE>")
        text = text.replace("<td>", "<TD>").replace("</td>", "</TD>")
        text = text.replace("<tr>", "<TR>").replace("</tr>", "</TR>")
        text = text.replace("<br>", "<BR>")
        # Yes, Berlios generated \r<BR> sequences with no \n
        text = text.replace("\r<BR>", "\r\n")
        # And Berlios generated doubled </TD>s
        # (This sort of thing is why a structural parse will fail)
        text = text.replace("</TD></TD>", "</TD>")
        # Now that the HTML table structure is canonicalized, parse it.
        if text == oldtext:
            break
    end = text.find("</TABLE>")
    if end > -1:
        text = text[:end]
    while True:
        m = re.search(r"<TR>\w*", text)
        if not m:
            break
        start_row = m.end(0)
        end_row = start_row + text[start_row:].find("</TR>")
        rowtxt = text[start_row:end_row]
        rowtxt = rowtxt.strip()
        if rowtxt:
            rowtxt = rowtxt[4:-5]  # Strip off <TD> and </TD>
            rows.append(re.split(r"</TD>\s*<TD>", rowtxt))
        text = text[end_row+5:]
    return rows

The problem with writing code like that is maintenance. It's got all sorts of little assumptions and special cases. Notice how it can't cope with a mixed case <TD> tag? Or how there's a special case for handling a doubled </TD>?

A much better approach is to use an HTML parser that knows all about the foibles of real HTML in the real world (Raymond's main argument in his blog posting is that you can't rely on the HTML structure to give you semantic information---I actually agree with that, but don't agree that throwing the baby out with the bath water is the right approach). If you use such an HTML parser you eliminate all the hassles you had maintaining regular expressions for all sorts of weird HTML situations, dealing with case, dealing with HTML attributes.

Here's the equivalent function written using the BeautifulSoup parser:

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3

def walk_table2(text):
    "Parse out the rows of an HTML table."
    soup = BeautifulSoup(text)
    return [ [ col.renderContents() for col in row.findAll('td') ]
             for row in soup.find('table').findAll('tr') ]

In Raymond's code above he includes a little jab at this style saying:

# And Berlios generated doubled </TD>s
# (This sort of thing is why a structural parse will fail)
text = text.replace("</TD></TD>", "</TD>")

But that doesn't actually stand up to scrutiny. Try it and see. BeautifulSoup handles the extra </TD> without any special cases.
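
If you'd rather see the idea without installing anything, here's a sketch of the same structural approach using only the html.parser module from Python's standard library. To be clear, this is a stand-in and not BeautifulSoup: the names TableRows and walk_table3 are invented for illustration, it collects plain cell text rather than rendered contents, and it assumes a single simple table with no nesting.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of each <td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.cell = None          # list of text pieces while inside a <td>

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag names and separates out attributes.
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.cell = []

    def handle_endtag(self, tag):
        # A doubled </td> arrives here with no open cell and is ignored.
        if tag == "td" and self.cell is not None:
            self.rows[-1].append("".join(self.cell).strip())
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

def walk_table3(text):
    "Parse out the rows of an HTML table (standard library sketch)."
    parser = TableRows()
    parser.feed(text)
    parser.close()
    return [row for row in parser.rows if row]
```

Mixed-case tags, attributes on the tags and the doubled </TD> all go through the same three callbacks, so none of them needs its own special case.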

Bottom line: parsing HTML is hard, don't make it harder on yourself by deciding to do it yourself.

Disclaimer: I am not an experienced Python programmer; there could be a nicer way to write my walk_table2 function above, although I think it's pretty clear what it's doing.

Friday, November 06, 2009

Geek Weekend (Paris Edition), Day 2: Foucault's Pendulum

Not very far from The Curie Museum is the former church and now burial place for the great and good men (and one woman) of France: The Pantheon. Inside the Pantheon is the original Foucault's Pendulum.

The pendulum was first mounted in the Pantheon in 1851 to demonstrate that the Earth is rotating. The pendulum swings back and forth in the same plane, but the Earth moves. Relative to the floor (and to the convenient hour scale provided) the pendulum appears to rotate.


The pendulum is on a 67m long cable hanging from the roof of the Pantheon. The bob at the end of the cable weighs 27kg. In the Pantheon the pendulum appears to rotate at 11 degrees per hour (which means it takes more than a day to return to its original position). If it were mounted at the North Pole it would 'rotate' once every 24 hours. The pendulum's rate of rotation depends on the latitude, diminishing to 0 degrees per hour at the equator (i.e. it doesn't 'rotate' at all).
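
The dependence on latitude follows the standard formula: the apparent rate is 360 degrees per day multiplied by the sine of the latitude. Here's a small Python sketch of that calculation. The latitude of about 48.85 degrees north for the Pantheon is my approximation, and a 24 hour day is used for simplicity.

```python
import math

def rotation_rate_deg_per_hour(latitude_deg, day_hours=24.0):
    """Apparent rotation rate of the swing plane at a given latitude."""
    return 360.0 / day_hours * math.sin(math.radians(latitude_deg))

PANTHEON_LATITUDE = 48.85  # degrees north, approximate (assumption)

rate = rotation_rate_deg_per_hour(PANTHEON_LATITUDE)
print(round(rate, 1))            # degrees per hour in Paris
print(round(360.0 / rate, 1))    # hours for one full apparent rotation
```

For Paris this comes out at a little over 11 degrees per hour, and a full apparent rotation takes a bit under 32 hours. At the pole (latitude 90) the function gives 15 degrees per hour and at the equator it gives 0.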


If you take a look at the photograph above you can see that I was there just after 1200. The scale shows the current time measured by the pendulum.

The actual movement of the pendulum is only hard to understand because the common sense assumption is that the floor is not moving, but of course it is. It appears that what we are observing is a pendulum swinging above a fixed floor.

But the floor is actually moving because of the rotation of the Earth. That makes understanding the pendulum's motion harder. The important factor is the Coriolis Effect (sometimes erroneously called the Coriolis Force).

The simplest way to visualize the Coriolis Effect is to imagine firing a gun at the Equator straight northwards along a meridian. Because the Earth rotates the bullet will not land on the meridian: it starts out with the Equator's faster eastward speed, while the ground further north moves eastward more slowly, so the bullet will land to the east of the meridian. It looks as though a force has acted on the bullet to push it sideways. Of course, there's no actual force; it's just that the frame of reference (i.e. where the observer is) is not stationary.

Essentially the same thing happens with Foucault's Pendulum. The observer and the floor are not stationary and so the pendulum has an apparent motion.

Security Now #221

I was a guest on Security Now this week and the podcast has now been released (as has a transcript). Steve Gibson and some other people asked me to provide the presentation in some relatively readable format.

The original presentation is here, but it, ironically, requires JavaScript and Adobe Flash. So here are two additional formats: old style Microsoft PowerPoint and PDF.

Tuesday, November 03, 2009

Geek Weekend (Paris Edition), Day 1: The Curie Museum

So, it was off to Paris for the weekend via Eurotunnel and I managed to fit in four places from The Geek Atlas in four days. I was staying in a hotel in the Latin Quarter which is a stone's throw from... The Curie Museum.

Here's Marie Curie's laboratory:


The museum covers the lives and works of two Nobel Prize-winning couples: Pierre and Marie Curie (they discovered Radium and Polonium) and their daughter Irene and her husband Frederic Joliot (they discovered artificial radioactivity: you could make a substance radioactive by bombarding it with alpha particles).

Their Nobel Prizes are on display as is the equipment that they used (including the apparatus for measuring radiation by measuring ionization of air---which itself had been discovered by Becquerel).

Here are the Nobel Prizes:


Although I love the science section of the museum (including the laboratory where they worked with a piece of paper from one of their notebooks with its radioactive thumb print---they weren't too careful about handling radioactive elements), the best bit is the section on the craze for radium products in the 1920s and 1930s.

Here's an ad for a beauty cream that contains radium and thorium. Gives you that special glow!


Here you'll find make up that contains thorium and radium, special radium wool to keep babies warm, a radium dispenser so you could have a radioactive soak in the bath and more...


Seems stupid now, but back then the dangers were either ignored or unknown, and radioactivity seemed like a wondrous thing (especially since it was discovered early on that it would kill or reduce tumors). I wonder what products we are feeding ourselves that in 70 years we'll consider downright dangerous.

There's a nice web site of radioactive quack cures which make my skin crawl. Yes, I'm going to take a radioactive suppository to boost my sex life tonight! Move over Viagra, here's Vita Radium.

Thursday, October 29, 2009

Der Geek Atlas

The Geek Atlas is now also available in German.


Buy it here.

The living history of science is all around us; you just need to know where to look. With this unique travel guide you can get to know 128 places around the world that stand for significant events in science and technology. Experience Foucault's Pendulum swinging in Paris; learn interesting things about the world's largest science museum, the "Deutsches Museum" in Munich; visit a descendant of Newton's apple tree at Trinity College in Cambridge, and much, much more...

Each place in Der Geek-Atlas centers on an extraordinary discovery or invention and also looks at the people and stories behind these inventions. All the places are presented with interesting photos and the topics are illustrated with numerous drawings. The book is organized by country, and for every place of interest the exact GPS coordinates are given alongside useful tourist information.

A small selection of the interesting places:

  • Bletchley Park in Great Britain, where the Enigma code was cracked

  • the Alan Turing Memorial in Manchester

  • the horn antenna in New Jersey, where the Big Bang theory was confirmed

  • the National Cryptologic Museum at Fort Meade in Maryland (USA)

  • the Trinity Test Site in New Mexico, where the first atomic bomb was detonated

  • the National Museum of Scotland in Edinburgh, where Dolly the sheep is on display, stuffed

Every place presented in Der Geek-Atlas has a particular mathematical, technical or scientific background. Places that make a geek's heart beat faster.

Thursday, October 22, 2009

Some real data about JavaScript tagging on web pages

Since March of this year I've been running a private web spider looking at the number of web tags on web pages belonging to the Fortune 1000 and the top 1,000 web sites by traffic. Using the spider I've been able to see which products are deployed where, and how those products are growing or shrinking.

The web tags being tracked are those used for ad serving, web analytics, A/B testing, audience measurement and similar.

The spider captures everything about the page, including screen shots, and I'm able to drill in to see the state of a page and all its includes at the time of spidering. Here's a shot of Apple with all the detail that the spider keeps.



The first interesting thing is to look at the top 1,000 web sites by traffic and see how many different tags are deployed per page. The average is 2.21, but if you exclude those that have no tags at all then the average is 3.10. Here's the distribution of number of tags against percentage of sites.
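
Those two averages also imply what fraction of sites carry no tags at all, since untagged sites add nothing to the total tag count. Here's the arithmetic as a tiny Python sketch. It assumes both averages were taken over the same set of sites and it inherits the rounding of the two published figures, so treat the result as approximate.

```python
mean_all_sites = 2.21      # average tags per page over every site
mean_tagged_sites = 3.10   # average over sites with at least one tag

# Untagged sites add nothing to the total tag count, so
# total = mean_all * N = mean_tagged * N_tagged.
tagged_fraction = mean_all_sites / mean_tagged_sites
untagged_fraction = 1.0 - tagged_fraction

print("tagged:   {:.0%}".format(tagged_fraction))
print("untagged: {:.0%}".format(untagged_fraction))
```

So roughly 71% of the sites have at least one tag and roughly 29% have none.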


And of course, it's possible to see the market share of various different products. Here are the top 10 that I am tracking. Google Analytics has an impressive 43% of the top 1,000 web sites by traffic.


Since I've been tracking over time it's also possible to watch the growth (and decline). Here's the growth in the average number of tags on a web page (excluding pages that have no tags) since March 2009.

Since I also keep all the JavaScript and HTML for a page it's a breeze to calculate page weights. Here's a chart showing the size of HTML and JavaScript for the top 1,000 web pages by traffic. The x-axis shows the size of the page (excluding images) in kilo- or megabytes. The y-axis is the percentage of sites in that band.


I was shocked when I saw that list and suspected a bug. How could there be web sites with megabytes of non-image content? It turned out that it wasn't a bug. For example, at the time of downloading the HTML and JavaScript for Gawker was over 1MB.

In a previous post I showed in detail the tagging on a site and that 29% of the non-graphic content was JavaScript used for web tagging. Here's another chart showing what percentage of web page markup is included JavaScript (this can include stuff like jQuery and web tagging products).


The really surprising thing there is how much JavaScript there is on pages. For many pages it's the majority of non-graphic content. Take for example Subscene where the home page HTML is about 18k but then masses of JavaScript are included (including over 200k from Facebook, a similar amount from UPS and various other bits of code).

If you delve into the tags actually used by various products you'll see that the sizes of JavaScript used for them varies a lot. comScore's Beacon is tiny (just 866 bytes)!



Finally, you might be asking yourself which site had 16 different tags on it. The winner is the celebrity gossip site TMZ.

Monday, October 12, 2009

Monopoly .com Edition

I love Monopoly and have a small collection of Monopoly games from around the world. The oddest one is Monopoly .com Edition which was released in 2000.



In it the streets are replaced with '30 of today's hottest web sites'. These are: Sportsline.com and FoxSports.com, Yahoo! Geocities, Oxygen and iVillage, shockwave.com, games.com and E! Online, Priceline, Expedia, and eBay, weather.com, about.com and cnet.com, ETrade, monster.com and MarketWatch.com, Ask Jeeves, AltaVista and Lycos, and Excite@Home and Yahoo! (Yes, there are only 22!)



The railway stations are replaced with telecom companies: MCI WorldCom, Nokia, Sprint and AT&T. The playing pieces are made of pewter and depict Mr Monopoly sitting at a computer, the Internet Explorer Hand, a surfboard, a computer screen, a web browser, a PC, an email, a mouse and a microchip. The Mr Monopoly piece is a special token that can take any web site 'offline' making it unavailable for purchase.



The buildings are houses and office blocks (instead of hotels), all the money is in millions of $ and Community Chest and Chance are replaced with Email and Download.



The back of the box (sorry about the poor quality of these images, had to use my iPhone camera):

Tuesday, September 29, 2009

Solving the XSS problem by signing <SCRIPT> tags

Last week I talked about JavaScript security at Virus Bulletin 2009. One of the security problems with JavaScript (probably the most insidious) is Cross-site Scripting (which is usually shortened to XSS).

The basic defense against XSS is to filter user input, but this has been repeatedly shown to be a nightmare. Just yesterday Reddit got hit by an XSS worm that created comments because of a bug in the implementation of markdown.

I believe the answer is for sites to sign the <SCRIPT> tags that they serve up. If they signed against a key that they control then injected JavaScript could be rejected by the browser because its signature would be missing or incorrect and the entire XSS problem would disappear.

For example, this site includes Google Analytics and here's the JavaScript:

<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ?
"https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost +
"google-analytics.com/ga.js' " +
"type='text/javascript'%3E%3C/script%3E"));
</script>

<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-402747-4");
pageTracker._trackPageview();
} catch(err) {}</script>

Since I chose to include that JavaScript I could also sign it to say that I made that decision. So I could modify it to something like this:

<script type="text/javascript"
sig="068dd60b18b6130420fed77417aa628b">
var gaJsHost = (("https:" == document.location.protocol) ?
"https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost +
"google-analytics.com/ga.js' " +
"type='text/javascript'%3E%3C/script%3E"));
</script>

<script type="text/javascript"
sig="17aa628b05b602e505b602e505b602e5">
try {
var pageTracker = _gat._getTracker("UA-402747-4");
pageTracker._trackPageview();
} catch(err) {}</script>

The browser could verify that everything between the <SCRIPT> and </SCRIPT> is correctly signed. To do that it would need access to some PK infrastructure. This could be achieved either by piggybacking on top of existing SSL for the site, or by a simple scheme similar to DKIM where a key would be looked up via a DNS query against the site serving the page.

For example, jgc.org could have a special TXT DNS entry for _scriptkey.jgc.org which would contain the key for signature verification.

To make this work correctly with externally sourced scripts it would be important to include the src attribute in the signature. Or alternatively an entirely new tag just used for signatures could be created to sign the HTML between the tags:

<sign sig="068dd60b18b6130420fed77417aa628b">
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ?
"https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost +
"google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>

<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-402747-4");
pageTracker._trackPageview();
} catch(err) {}</script>
</sign>

Either way, this would mean that JavaScript blocks could be signed against the site serving the JavaScript, completely eliminating XSS attacks.

Note that in the case of externally sourced scripts I am not proposing that their contents be signed, just that the site owner sign the decision to source a script from that URL. This means that an XSS attack isn't possible. Of course if the remotely sourced script itself is compromised there's still a problem, but it's a different problem.
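Here's a minimal sketch of what the sign and verify steps might look like, written for Node.js. Everything in it is an assumption made for illustration: the helper names (signedPayload, signScript, verifyScript), the choice of Ed25519, and the way the src attribute and the script body are combined before signing. In the scheme above the public key would come from the site's SSL certificate or a DNS TXT record; here a throwaway key pair is generated instead.

```javascript
const crypto = require("crypto");

// Hypothetical canonical form: the src attribute (empty for inline scripts)
// plus the inline script body, so that both are covered by the signature.
function signedPayload(src, body) {
  return Buffer.from(JSON.stringify({ src: src || "", body: body || "" }), "utf8");
}

// Site owner side: produce a hex signature to place in the sig attribute.
function signScript(privateKey, src, body) {
  return crypto.sign(null, signedPayload(src, body), privateKey).toString("hex");
}

// Browser side: check the sig attribute against the site's public key.
function verifyScript(publicKey, src, body, sigHex) {
  const signature = Buffer.from(sigHex, "hex");
  return crypto.verify(null, signedPayload(src, body), publicKey, signature);
}

// Throwaway key pair standing in for the site's real key.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");

const inlineBody = 'var pageTracker = _gat._getTracker("UA-402747-4");';
const inlineSig = signScript(privateKey, null, inlineBody);
const externalSig = signScript(privateKey, "http://www.google-analytics.com/ga.js", "");

// The script the site owner signed verifies.
const inlineOk = verifyScript(publicKey, null, inlineBody, inlineSig);
// An injected script reusing a valid signature does not.
const injectedOk = verifyScript(publicKey, null, 'alert("xss")', inlineSig);
// Neither does a signed external script whose src has been swapped.
const swappedSrcOk = verifyScript(publicKey, "http://evil.example/ga.js", "", externalSig);

console.log(inlineOk, injectedOk, swappedSrcOk);
```

Note that a real Ed25519 signature is 128 hex characters, so the 32 character sig values shown earlier are only placeholders.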

Friday, September 25, 2009

Geek Side Trip: CERN

While over in Geneva for the Virus Bulletin 2009 conference I managed to make a side trip to see CERN. It turned out to be a great afternoon because the tour was guided by actual physicists and I took a school trip.

I am a little old for it, but when I organized my trip I was told that I would be added to a group from Steyning Grammar School. There I was with 23 final year A-level students on a whirlwind trip to Switzerland. They were extremely nice kids, and I could easily imagine that teaching such a group would be incredibly rewarding.

The visit started with a talk and a film. This told the story of CERN itself (it's almost 55 years old) and described the operation of the Large Hadron Collider.

Here's what part of the LHC looks like (this is a mock-up). The large blue thing is one of the superconducting magnets. There are 1,232 of these in the 27 km ring, and each weighs about 27 tonnes.



After that we were bussed over to where the superconducting magnets used in the LHC are received and tested. This involves cooling them down to very close to 0 K (actually 1.7 K), turning on the pair of magnets and inserting a rotating rod inside the two tubes where the particle beam will pass.

Here's a view of a slice through one of the magnets. The two tubes in the middle are where the particle beams pass. The tubes contain a hard vacuum and are surrounded by super-conductors that form the magnet. The entire thing is bathed in liquid helium by a network of pipes.



The rotating rods inserted to test the magnets contain coils that have an electric current induced in them. By measuring that current it's possible to confirm that the magnetic field inside the tubes is perfect. The magnetic field is what bends the counter-rotating beams slightly so that they end up tracing out a circle.

This is a detail of one of the particle beam tubes with the valve used for maintaining the hard vacuum. I was surprised how small it was.



And here's a shot of a single dipole magnet ready to be attached to the test apparatus.



And if you are going to move one of those around you need a robot. This one floats around on an air cushion.



To join the magnets together in the circle you need a flexible coupling. The Bulgarian physicist who showed us this bit explained how the magnets were coupled and soldered together: 125,000 separate joints! This is where the LHC failure occurred.



As well as being bent by the magnets, the beam has to be accelerated. That's achieved by one of these:



And to keep the beam focussed you need another sort of magnet (I don't have a picture of those, but there are 392 of them).

After all that we headed over to the AMS, which is a satellite that will be attached to the International Space Station. The highlight of that part was that the designer of it (a friendly Italian man called Giovanni Ambrosi) was on hand to explain what he'd been up to for the last 15 years.

POPFile v1.1.1

The cool team that manages the POPFile project (that I started what seems like years ago...) have just released v1.1.1 with a bunch of improvements (especially for Windows users).

From the release notes:


1. New features

You can now customize Subject Header modification placement (head or tail)
by changing the new option 'bayes_subject_mod_pos'. (ticket #74)

NNTP module now caches articles received with the message number specified.

You can now jump to message header/message body/quick magnets/scores in the
single message view by clicking links on the head of the page. (ticket #77)

You can now filter messages shown in the history using 'reclassified' option.
(ticket #67)


2. Windows version improvements

The minimal Perl has been updated to the most recent 5.8 release. Since this
release of Perl only officially supports Windows 2000 or later POPFile 1.1.1
may not work on Windows 95, Windows 98, Windows Millennium or Windows NT. The
installer will display a warning message explaining that POPFile may not work
properly on these old systems.

The Windows system tray icon's menu now offers options to visit the support
website and check for new versions of POPFile.

If the automatic version check feature has been turned on (via the Security tab
in the User Interface) then the system tray icon will change and a message box
will be displayed. This check is performed once per day.

Now that all known problems with the system tray icon have been fixed it will
be enabled by default in new installations. (ticket #106)

The Windows installer now preselects the relevant components when upgrading or
modifying an existing installation. (tickets #13 and #26)

The Windows installer can now display the UI properly even if the database is
very large (tens of MB). (ticket #109)

Fixed a problem that POPFile does not work on Japanese Windows when the path
of the data directory contains non-ASCII characters (e.g. the user name is
written in Japanese). (ticket #111)

The installer is now compatible with Windows 7.


3. Mac OS X version improvements

The installer for Mac OS X 10.6 (Snow Leopard) has come.
Since Snow Leopard includes Perl v5.10.0, the Perl modules which are supplied
with the POPFile installer v1.1.0 or earlier aren't compatible with it.

Starting with this version, two versions of installer will be released.
One is for Snow Leopard, and another is for the former versions of Mac OS X.
The name of Snow Leopard installer will have '-sl' suffix.


4. Other improvements

The users who are using very large database (tens of MB) will be able to
reclassify messages faster. (ticket #108)

JavaScript must die

I've just completed my presentation at Virus Bulletin 2009 which was entitled JavaScript Security: The Elephant running in your browser.

My thesis is that the security situation with JavaScript is so poor that the only solution is to kill it. End users have very little in the way of protection against malicious JavaScript, major web sites suffer from XSS and CSRF flaws, the language itself allows appalling security holes, and as data moves to the cloud the 14 year old JavaScript security sandbox becomes more and more irrelevant.

Here are the slides:

Tuesday, September 22, 2009

The Geek Atlas: now on your iPhone

Today, O'Reilly released my book, The Geek Atlas, as an iPhone application. It's the complete text of the book on the iPhone. Since the book is organized as small chapters it's very readable on a small screen.



The neatest feature is that latitude and longitude given for each place in the book is clickable and takes you straight to that location on Google Maps.

And it's only $5.99 or £3.49.

Friday, September 11, 2009

"Hello John. It's Gordon Brown."

Last night the British Prime Minister Gordon Brown issued a long statement about my Alan Turing petition that included a clear apology for his treatment. Unfortunately, I've been in bed nursing the flu so it was only by chance that an amazing sequence of events occurred.

Yesterday evening I realized that I had to check my email (I'd been avoiding it while ill) because of a work commitment on Friday and so I logged in to find a message that read:

John - I wonder if you could call me as a matter of urgency, regarding your petition. Very many thanks!

Kirsty

Kirsty xxxxxxx
10 Downing St, SW1A 2AA
Tel: 020x xxxx xxxx

So, I called back. The telephone number was the Downing Street switchboard and after Kirsty told me that the government was planning to apologize for Alan Turing's treatment she then said "Gordon would like to talk to you".

A few minutes later the phone rang and a soft Scottish voice said: "Hello John. It's Gordon Brown. I think you know why I am calling you". And then he went on to tell me why. He thanked me for starting the campaign, spoke about a "wrong that had been left unrighted too long", said he thought I was "brave" (not sure why) and spoke about the terrible consequences of homophobic laws and all the people affected by them.

I was mostly speechless. The Prime Minister was calling me!

What no one saw was the work to make this happen. And what many don't realize is that the 'campaign' consisted of a staff of one: me. Although many people enthusiastically got the word out via Twitter, blogs and other means, I spent a great deal of time massaging the press, handling celebrities, and keeping the momentum to make it happen. One day, perhaps, I'll tell the story.

Most of the planning was done from the top deck of a London double-decker bus on the way to work. Amazing what you can do with 30 minutes of peace and an iPhone.

But what I must do is thank all 30,000 people who signed the petition, the media who ran with the story (especially the Manchester Evening News, BBC Radio Manchester, The Independent and BBC Newsnight) when it was still a small story. Thank you to all in the LGBT press that interviewed me and got the ball rolling in the first place. And thank you to the big names like Richard Dawkins and Stephen Fry who got the story out to a wide audience.

And thank you Gordon Brown. Your telephone conversation with me was heartfelt, and your apology clear and unambiguous. What a wonderful outcome!

For me, it's the end of my campaign.

But for others it is not. It's vital that Bletchley Park and the National Museum of Computing secure funding to keep them alive.

Tuesday, August 25, 2009

How to trick Apple Numbers into colouring individual bars on a bar chart

Suppose you have a bar chart like this:



and it's made from a data table like this:



And you are really proud of the sales of the XP1000 and want to change the colour of its bar to red. In Apple Numbers you can't do that because the bar colour is based on the data series.

But you can fool Apple Numbers by creating two data series like this:



Then choose a Stacking Bar chart after selecting the two series of data in the data table and you'll get a chart like this:



You can change the colour of any of the series by clicking on the Fill button on the toolbar. And you can extend that beyond two series to colour the individual bars as needed.
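The table layout behind the trick can be sketched in a few lines of JavaScript. This is only an illustration: the function name splitSeries and the sales figures are made up (only the XP1000 name comes from the example above), and I'm assuming the cells that shouldn't draw a bar hold 0.

```javascript
// Split one data series into two columns so that a stacked bar chart
// draws each bar in exactly one of the two series colours.
function splitSeries(rows, highlightedLabels) {
  const highlight = new Set(highlightedLabels);
  return rows.map(({ label, value }) => ({
    label,
    // Ordinary bars keep their value in the first series.
    normal: highlight.has(label) ? 0 : value,
    // Highlighted bars move their value to the second series.
    highlighted: highlight.has(label) ? value : 0,
  }));
}

// Hypothetical sales figures.
const sales = [
  { label: "XP900", value: 120 },
  { label: "XP1000", value: 340 },
  { label: "XP1100", value: 95 },
];

const table = splitSeries(sales, ["XP1000"]);
console.log(table);
```

Because every row has a zero in one of the two columns, each stacked bar ends up the same height as the original bar, just drawn in the colour of whichever series holds the value.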

Sunday, August 16, 2009

Geek Weekend, Day 2: The Brunel Museum

So after yesterday's trip to Bletchley Park I stayed in London and hopped over to a spot not far from Tower Bridge where Marc Brunel and his son Isambard built the first tunnel under a navigable river: the Thames Tunnel. The tunnel was dug out by hand using a tunnel shield (which is the basis of all tunnel building to the present day). Workers stood inside a metal cage pressed against the undug earth and removed boards, dug in a few inches and replaced the boards. Once the digging was done the entire structure was forced forwards a few centimeters and bricklayers would fill in behind.



The tunnel has a rich and varied history and is still in use today (read the Wikipedia link above to learn more). The entrance to the tunnel was through a massive circular tube (a caisson) which the Brunels built above ground and then sank into place. The entrance has been closed for about 140 years and is being renovated, but I was lucky enough to be taken into it by the curator of the Brunel Museum.

The museum displays works by the Brunels and runs tours through the tunnel itself. The grand entrance hall will be reopened to the public in September. Before that here's a shot of me standing in the interior of the entrance about 15 meters underground.


Image credit: Jonathan Histed

The diagonal line on the wall is the remains of where the grand staircase came down and brought visitors into the tunnel.

Saturday, August 15, 2009

Geek Weekend, Day 1: Bletchley Park

Left to my own devices for the weekend I decided to embark on a Geek Weekend with visits to two places within easy reach of London. Today I visited Bletchley Park which is simply wonderful for any geek out there.

Bletchley Park is where the cryptanalysts of the Second World War worked in great secrecy (including Alan Turing) to break the Nazi German Enigma and Lorenz ciphers. To break them they used a combination of intimate knowledge of language, mathematics and machines.

Here's a Nazi German Enigma machine:



And here's a look inside one of the rotors inside an Enigma machine to see the wiring:



Two of the code breaking machines have been reconstructed. One is the Turing Bombe, an electromechanical machine made to break the Enigma cipher. Here's a look at the wiring in the back of the Bombe:



The other machine is the Colossus, a binary computer built to decipher Lorenz. Enigma is far more famous than Lorenz, but I have a soft spot for the Lorenz code because of its close relationship to modern cryptography. Here's a Lorenz machine:



While I was there I signed a large stack of copies of my book, The Geek Atlas. If you are at Bletchley Park and pop into the shop you'll be able to buy a signed copy if that's your thing. Of course, Bletchley Park, Enigma, Lorenz and the National Museum of Computing (also on site) are covered.

50p from every copy of The Geek Atlas goes to Bletchley Park (if the book is bought in the UK) and so the folks at Bletchley treated me to a special geek moment: a chance to meet Tony Sale who worked at MI-5 and reconstructed the Lorenz breaking machine Colossus. He took me round the back of the machine, and past the No Admittance sign to see it in operation. A geek treat if ever there was one.

The Lorenz code is essentially binary. Letters were transmitted using the Baudot Code which is a five-bit code. To encrypt, the Lorenz machine created a pseudo-random sequence of Baudot codes and then XORed them with the message to be transmitted. If both transmitting and receiving machines generated the same pseudo-random sequence then decryption relied on a nice property of XOR: if you perform the same operation twice you get back to where you started. Thus XORing once with the pseudo-random sequence gave you the ciphertext to be transmitted, and XORing again gave you back the original message.
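That XOR round trip is easy to see in a few lines of JavaScript. This is a toy sketch and not a model of the Lorenz machine: the keystream generator here is an assumed stand-in (a small linear congruential generator) for the machine's wheels, and the message is just a list of arbitrary five-bit values rather than real Baudot codes.

```javascript
// Hypothetical keystream: a tiny linear congruential generator that
// yields five-bit values. It stands in for the Lorenz wheels.
function makeKeystream(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (Math.imul(state, 1103515245) + 12345) >>> 0;
    return (state >>> 16) & 0x1f; // keep five bits
  };
}

// XOR every five-bit code with the next keystream value.
// The same function both encrypts and decrypts.
function xorWithKeystream(codes, seed) {
  const next = makeKeystream(seed);
  return codes.map((code) => code ^ next());
}

const message = [1, 25, 14, 9, 3]; // arbitrary five-bit values
const ciphertext = xorWithKeystream(message, 1943);
const recovered = xorWithKeystream(ciphertext, 1943);

console.log(ciphertext, recovered);
```

Both ends have to start from the same seed (the same wheel settings, in Lorenz terms); with a different seed the second XOR produces garbage instead of the message.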

Breaking this binary code was achieved with a binary computer. After giving me a behind-the-scenes look at Colossus, Tony Sale stood for a portrait in front of the machine:



And behind the machine is where the valve-action is:



Standing and seeing the Turing Bombe, staring into Turing's office in Hut 8, being taken around the back of Colossus by the man who put it back together, and getting to see more Enigmas, Lorenzs and Typexs than anyone could ask for made it a real treat.

The National Museum of Computing is Britain's answer to the wonderful Computer History Museum in Mountain View, CA. It contains many machines from the mainframe through the 8-bit era of British computing. All the machines are working or being restored. If you've never seen core memory, massive hard disk packs the size of washing machines, or just a Commodore PET it's worth visiting (and it's right next door to Colossus).

Lastly, it's worth knowing that the National Museum of Computing despite being part of the ticket price to Bletchley Park actually receives no money from them. Please consider either donating money directly to them (I gladly emptied my pockets of change) or buying something in their shop.

And tomorrow it's a step back into the 19th century with a special visit to a place important in the life of Isambard Kingdom Brunel.

Tuesday, August 11, 2009

Regular expressions are hard, let's go shopping

After looking at a Tweet from Charles Arthur of The Guardian I decided to hunt down his blog. I typed "Charles Arthur" into Google and the first link was to his blog.

But there was something strange about it. All the letter t's following an apostrophe were highlighted. Here's a screen shot:



Yet, if I typed the exact same URL into Firefox the highlighted t's were not there. Odd. Since the URL was the same this had to be something inside the HTTP headers sent when I was clicking through from Google.

I fired up HTTPFox and watched the transaction. Here's a screen shot of the HTTP headers of the GET request for his page. The interesting thing to look at is the Referer header.



It immediately jumped out to me that one of the parameters was aq=t. Looked to me like something on his blog was reading that parameter and using it to highlight. Poking around I discovered that his site is written using WordPress and there's a plugin for WordPress (that hasn't been updated for years) that's intended to highlight search terms when the visitor comes from a search engine.

Looking into the source of his web page it looked from the CSS like he was using that plugin. So I downloaded the source of the plugin and took a look. There's a bug in the way in which it extracts the query parameters from the Referer header for Google.

Here's the code:

$query_terms = preg_replace('/^.*q=([^&]+)&?.*$/i','$1',
$referer);

That regular expression is buggy. It's looking for the right-most instance of a string that begins q= followed by anything other than the & symbol or the end of the Referer header. It's getting the right-most because the ^.* at the beginning means skip over anything from the start of the Referer header until you find q= and be greedy about it: skip over as much stuff as possible.

In the Referer string there are two parameters with q= in them. The first one is the correct one, the second one is the aq=. Since the regular expression isn't written to check that before the q= there's a ? or & it gets the wrong one.

I did a bunch of tests with wget to confirm that I'm right. It's a bug.

The aq=t parameter was added in 2006, here are the details. It's only present when you use the Firefox Google search box. Unfortunately, the plugin hasn't been updated since 2005.

It can be fixed by changing that line above to:

$query_terms = preg_replace('/^.*[\?&]q=([^&]+)&?.*$/i','$1',
$referer);
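The difference between the two patterns can be reproduced outside PHP. Here's a small JavaScript sketch using the same two regular expressions; the Referer value is a made-up example of the kind of URL Google sends from the Firefox search box, not the actual header from my session.

```javascript
// Hypothetical Referer header with both q= and aq= parameters.
const referer =
  "http://www.google.com/search?q=charles+arthur&ie=utf-8&oe=utf-8&aq=t&client=firefox-a";

// The plugin's original pattern: the greedy ^.* skips to the LAST "q=".
const buggy = /^.*q=([^&]+)&?.*$/i;

// The fixed pattern: "q=" must be preceded by ? or &.
const fixed = /^.*[\?&]q=([^&]+)&?.*$/i;

const buggyTerms = referer.replace(buggy, "$1");
const fixedTerms = referer.replace(fixed, "$1");

console.log(buggyTerms); // t
console.log(fixedTerms); // charles+arthur
```

The buggy pattern lands on the q= inside aq=t and so extracts the single letter t, which is exactly the "search term" that ends up highlighted.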

But the right thing to do here is to rewrite this so that it doesn't use regular expressions at all. After all, PHP has parse_url and parse_str functions that can do all the URL and query string parsing for you.
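To show what the regex-free approach looks like, here's a sketch in JavaScript, where the built-in URL and URLSearchParams classes play roughly the role that parse_url and parse_str play in PHP. It is not the plugin's code; the function name searchTermsFromReferer and the sample Referer are assumptions for illustration.

```javascript
// Return the value of the q parameter from a Referer URL, or null if the
// URL can't be parsed or has no q parameter. No regular expressions needed.
function searchTermsFromReferer(referer) {
  let url;
  try {
    url = new URL(referer);
  } catch (err) {
    return null; // not a valid absolute URL
  }
  // searchParams splits the query string and decodes each value,
  // so "aq" and "q" are never confused with each other.
  return url.searchParams.get("q");
}

// Hypothetical Referer header with both q= and aq= parameters.
const sampleReferer =
  "http://www.google.com/search?q=charles+arthur&ie=utf-8&oe=utf-8&aq=t&client=firefox-a";

const terms = searchTermsFromReferer(sampleReferer);
const missing = searchTermsFromReferer("http://www.google.com/search?aq=t");
const invalid = searchTermsFromReferer("not a url");

console.log(terms, missing, invalid);
```

A side benefit is that the parser also does the decoding, so the + in the query comes back as a space without any extra work.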

Monday, August 10, 2009

In which I resurrect a 13 year old 3.5" floppy disk and reprint my doctoral thesis

This is a follow up to a post from the weekend about playing with my old Sharp MZ-80K. Someone commented that they'd be more impressed if I resurrected a 15 year old floppy disk than a 30 year old cassette tape.

I don't have a 15 year old floppy disk to hand, but I do have this one that's 13 years old and according to the label contains a copy of my doctoral thesis. The disk was created in 1996 and the files on it date to 1994 for my doctoral thesis which I completed in 1992.



But would it still read?

The first step was finding a drive. I had an old-ish 3.5" USB disk drive kicking around, so I plugged it into my MacBook Air and fired up Windows XP under VMware. It happily recognized the drive and then, magically, it loaded up the floppy disk:



The disk contains a single ZIP file called oxford.zip. Unzipping it and poking around in the directories reveals that it contains my thesis, all the papers I wrote as a doctoral student, my CV and helpful READ.ME files: a gift to my future self.



That's all well and good, but are any of these files usable? Can I take the LaTeX based source files and produce a copy of my thesis? Or can I take the DVI file that I had saved and make that into a PDF?

A quick copy over to the main Mac machine and a download of LaTeX later I had a working LaTeX system again and all the files.

So to get started I grabbed the DVI file of my thesis and ran it through dvipdf. Apart from complaining about various missing fonts it produced a totally readable PDF file and suddenly I was staring at my thesis. You can download the PDF by clicking on: The Formal Development of Secure Systems. Here's a sample page (the code at the bottom is written in Occam):



But it's not enough to stop at a DVI file, what I wanted was to compile from sources. My first test was to start with something small: my CV. Magically, that worked:



And so on to my thesis. I'm not going to show all that I went through, but it worked after I'd got things in the right directories and tracked down a couple of additional style files.


BTW Does anyone have a Research Machines 380Z with working 8" drives? I have a couple of my really old floppies that it would be fun to read.