Friday, November 20, 2009

Parsing a JSON document and applying it to an HTML template in Google Go

Here's some simple code to parse a JSON document and then transform it into an HTML document using the Google Go packages json and template.

If you've done anything in a scripting language then you'll probably be surprised by the need to define fixed struct types that match the parsed JSON document (or at least match some subset of it). Also, because of the way reflection works in Google Go, the struct member names need to start with an uppercase letter (and for that reason I've capitalized the names everywhere).

package main

import (
    "fmt";
    "os";
    "json";
    "template"
)

type Row struct {
    Column1 string;
    Column2 string;
}

type Document struct {
    Title string;
    Rows []Row;
}

const a_document = `
{
"Title" : "This is the title",
"Rows" : [ { "Column1" : "A1", "Column2" : "B1" },
{ "Column1" : "A2", "Column2" : "B2" }
]
}`

const a_template = `
<html>
<head><title>{Title}</title></head>
<body>
<table>
{.repeated section Rows}
<tr><td>{Column1}</td><td>{Column2}</td></tr>
{.end}
</body>
</html>`

func main() {

    // The following code reads the JSON document in
    // a_document and turns it into the Document structure
    // stored in d

    var d Document;
    ok, e := json.Unmarshal( a_document, &d );

    if ok {

        // This code parses the template in a_template and places
        // it in t; then it applies the parsed JSON document in
        // d to the template and prints it out

        t, e := template.Parse( a_template, nil );
        if e == nil {
            t.Execute( d, os.Stdout );
        } else {
            fmt.Printf( e.String() );
        }
    } else {
        fmt.Printf( e );
    }
}

All the real work is done in main() by calling json.Unmarshal, template.Parse and then Execute.

Here's the Makefile and output:

$ cat Makefile
P := template
all: $P

$P: $P.6
	6l -o $@ $^

%.6: %.go
	6g $<
$ make
6g template.go
6l -o template template.6
$ ./template

<html>
<head><title>This is the title</title></head>
<body>
<table>
<tr><td>A1</td><td>B1</td></tr>
<tr><td>A2</td><td>B2</td></tr>
</body>
</html>
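
For contrast, here is roughly what the same job looks like in a scripting language, where no struct types are declared up front. This is a sketch in Python 3 using only the standard library; the render function is a hypothetical helper of mine that builds the HTML by hand rather than using a template package, and unlike the template above it also emits a closing </table> tag.

```python
import json
from html import escape

a_document = """
{
  "Title" : "This is the title",
  "Rows"  : [ { "Column1" : "A1", "Column2" : "B1" },
              { "Column1" : "A2", "Column2" : "B2" } ]
}"""

def render(doc):
    """Build the HTML page from the parsed JSON (a plain dict)."""
    rows = "\n".join(
        "<tr><td>%s</td><td>%s</td></tr>"
        % (escape(row["Column1"]), escape(row["Column2"]))
        for row in doc["Rows"]
    )
    return (
        "<html>\n"
        "<head><title>%s</title></head>\n"
        "<body>\n<table>\n%s\n</table>\n</body>\n</html>"
        % (escape(doc["Title"]), rows)
    )

# No struct types here: json.loads returns nested dicts, lists and strings
d = json.loads(a_document)
page = render(d)
print(page)
```

The trade-off is that a misspelled key only shows up at run time as a KeyError, whereas the Go version pins the expected shape down in the Row and Document types.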

Thursday, November 19, 2009

Installing Google Go on Mac OS X

I decided to have a go with Google Go since I'm an old fogey C/C++ programmer. Any new innovation in the C/C++ family gets me excited and Google Go has quite a few nice features (garbage collection is really nice to have and channels make me think of all the work I did in CSP).

I decided to go with the 6g compiler since gccgo doesn't have garbage collection implemented yet and hence there's no way to free memory. The only way to get 6g is to mirror its Mercurial repository. So...

Step 1. Install Mercurial

For that I used prebuilt packages from here and got Mercurial 1.4 for Mac OS X 10.5 (no, I haven't upgraded to Snow Leopard yet).

Step 2. Set GOROOT

I just did a quick cd ; mkdir go ; export GOROOT=$HOME/go to get me started.

Step 3. Clone the 6g repository

That was a quick hg clone -r release https://go.googlecode.com/hg/ $GOROOT followed by the hard part: compiling it. You need to have gcc, make, bison and ed installed (which I do since I do development work on my Mac).

Step 4. Set GOBIN

This points to where the binaries will go; for me that's $HOME/bin since I'll be doing local development using Go. And I updated PATH to include $GOBIN.

Step 5. Compile 6g

You first need to set GOARCH and GOOS. For me that's amd64 for the architecture (the Intel Core 2 Duo in my MacBook Air is a 64-bit processor) and darwin for the OS (since this is a Mac).

$ export GOARCH=amd64
$ export GOOS=darwin

Then you can actually do the compile:

$ cd $GOROOT/src
$ ./all.bash

This does a build and test of 6g and it was very fast to build (although I'm used to building gcc which is a bit of a monster).

Step 6. Write a Hello, World! program

Here's my first little Google Go program (filename: hw.go) just to test the 6g compiler.

package main

import "fmt"

func main() {
    fmt.Printf( "Hello, World!\n" );
}

To simplify building I made a minimal Makefile:

all: hw
hw: hw.6 ; 6l -o $@ $^
%.6: %.go ; 6g $<

And then the magic moment:

$ make
6g hw.go
6l -o hw hw.6
$ ./hw
Hello, World!

And now for a real project... get SQLite to interface to it.

Thursday, November 12, 2009

Geek Weekend (Paris Edition), Day 4: Institut Pasteur

Leaving my SO in bed at the hotel with a nasty bacterial infection and some antibiotics, I went with timely irony to visit the home and laboratory of Louis Pasteur at the Institut Pasteur. (It's pretty easy to find since it has a conveniently named stop on the Paris metro: Pasteur).


At the Institut Pasteur there's a wonderful museum that covers the life and work of Louis Pasteur (and his wife). It's housed in the building (above) where the Pasteurs lived. There's a single room of Pasteur's science and the rest of the house is Pasteur's home; so a visit is partly scientific and partly like visiting any old home. I was mostly interested in the laboratory (although seeing how he lived---pretty darn well!---was also worth it).

Pasteur wrote standing up at a raised table (much like old bank clerks used to use) and his lab is full of specimens that he worked on. There's a nice display about chirality which Pasteur had initially worked on while studying tartaric acid in wine. (Pasteur determined that there were two forms of tartaric acid by painstakingly sorting tiny crystals by hand).

The rest of the lab covers immunization, pasteurization and the germ theory of disease. There was a nice display of Pasteur's bottles of chicken broth that he used to demonstrate the germ theory of disease. The bottles contain boiled broth and have a long tapering curved neck. Although the neck is open the shape prevents dust from entering and the broth sits undisturbed (as it has for 150 years).

In the same room there's also a big bottle of horse's blood that looks fresh despite its age, and there are detailed displays about immunization (and especially Pasteur's rabies vaccine).

The museum also has a lot of equipment used by Pasteur, such as vacuum pumps and autoclaves. It all has that lovely Victorian feel of wrought iron and brass.

The oddest part of the museum is the Pasteurs' burial chamber built beneath the house and in a totally over the top Byzantine style.

Note that the museum is only open in the afternoons during the week and that you must bring photo ID with you to get in since it is inside the Institut Pasteur.

Tuesday, November 10, 2009

Geek Weekend (Paris Edition), Day 3: The Arago Medallions

The old Paris Meridian (which was in use up until 1914) passes not far from The Pantheon which I visited to see Foucault's Pendulum. Its actual longitude today is 2°20′14.025″ east.

To mark the old meridian the French decided to install some art work and they commissioned an artist called Jan Dibbets to build something appropriate. What he did was embed brass disks in the streets of Paris marking the meridian and turning the whole city into a sort of treasure hunt.

These Arago medallions (which celebrate the meridian and the life of François Arago) cut through the very heart of Paris. They make a wonderful way to see Paris by going on a treasure hunt. And the meridian goes to the very heart of something important: the meter. The original definition of a meter was one ten-millionth of the length of the Paris meridian from the north pole to the equator. Arago surveyed the meridian and came up with a very precise definition for this fundamental unit of measure.

Here's a photo I took of one on Boulevard Saint-Germain:


There's a full list of the medallions (in French) here. And here's my English translation of the list (the numbers in parentheses give the number of medallions to be found there):

Position of the medallions along the meridian from north to south

  • XVIIIe arrondissement


    • 18 av. de la Porte de Montmartre, in front of the municipal library (1)

    • Intersection of rue René Binet and av. de la Porte de Montmartre (1)

    • 45/47 av. Junot (1)

    • 15 rue S. Dereure (1)

    • 3 and 10 av. Junot (2)

    • Mire du Nord, 1 av. Junot, in a private courtyard with controlled access (1)

    • 79 rue Lepic (1)


  • IXe arrondissement


    • 21 boulevard de Clichy, on the pavement (2)

    • 5 rue Duperré (1)

    • 69/71 rue Pigalle (2)

    • 34 rue de Châteaudun, inside the courtyard of the Ministry for National Education (2)

    • 34 rue de Châteaudun (1)

    • 18/16 and 9/11 boulevard Haussmann, in front of the restaurant (2)

    • Intersection of rue Taitbout, in front of the restaurant and 24 boulevard des Italiens (2)


  • IIe arrondissement


    • 16 rue du 4 septembre (1)

    • 15 rue saint Augustin


  • Ier arrondissement


    • 24 rue de Richelieu (1)

    • 9 rue de Montpensier (1)

    • At the Palais Royal: Montpensier and Chartres Colonnades, Nemours Gallery, passageway on place Colette and place Colette in front of the café (7)

    • Intersection of place Colette and Conseil d'État, rue saint Honoré (1)

    • place du Palais royal, on the rue de Rivoli side (1)

    • rue de Rivoli, at the entrance of the passageway (1)

    • At the Louvre, Richelieu Wing: French sculpture room and in front of the escalator (3)

    • At the Louvre, Napoléon Courtyard, behind the pyramid (5)

    • At the Louvre, Denon Wing: Roman antiquity room, stairs and corridor (3)

    • Quai du Louvre, near the entrance to the Daru pavilion (1)

    • port du Louvre, not far from the Pont des Arts (1)


  • VIe arrondissement


    • port des Saints-Pères (1)

    • quai Conti, near the place de l'Institut (2)

    • place de l'institut and rue de Seine (1)

    • 3 and 12 rue de Seine (4)

    • Intersection of rue de Seine and rue des Beaux-Arts (1)

    • 152 and 125-127 boulevard Saint-Germain (2)

    • 28 rue de Vaugirard, on the Sénat side (1)

    • In the Jardin de Luxembourg, on asphalt and cement surfaces (10)

    • rue Auguste Comte, at the entrance to the garden (1)

    • av. de l'Observatoire on the pavement near the garden (2)

    • Intersection of av. de l'Observatoire and rue Michelet (1)

    • jardin Marco Polo (3)

    • Intersection of av. de l'Observatoire and rue d'Assas (1)

    • place Camille Jullian (2)

    • On the ground at the intersection of av. Denfert Rochereau and av. de l'Observatoire, on the Observatoire side (1)

    • av. de l'Observatoire (2)


  • XIVe arrondissement


    • Courtyard of the Observatoire de Paris (2)

    • Inside the Observatoire (1)

    • Terrace and garden in the private area of the Observatoire (7)

    • boulevard Arago and place de l'Ile de Sein (6)

    • 81 rue du faubourg Saint Jacques (1)

    • place Saint Jacques (1)

    • parc Montsouris (9)

    • boulevard Jourdan (2)

    • Cité universitaire, on the axis from the pavillon Canadien to the pavillon Cambodgien; the final one is behind the pavilion (10)



This special Google Map has many of them on it; the rest you'll have to find by wandering:


Monday, November 09, 2009

Parsing HTML in Python with BeautifulSoup

I got into a spat with Eric Raymond the other day about some code he's written called ForgePlucker. I took a look at the source code and posted saying it looks like a total hack job by a poor programmer.

Raymond replied by posting a blog entry in which he called me a poor fool and snotty kid.

So far so good. However, he hadn't actually fixed the problems I was talking about (and which I still think are the work of a poor programmer). This morning I checked and he's removed two offending lines that I was talking about and done some code rearrangement. The function that had caught my eye initially was one to parse data from an HTML table which he does with this code:

def walk_table(text):
    "Parse out the rows of an HTML table."
    rows = []
    while True:
        oldtext = text
        # First, strip out all attributes for easier parsing
        text = re.sub('<TR[^>]+>', '<TR>', text, re.I)
        text = re.sub('<TD[^>]+>', '<TD>', text, re.I)
        # Case-smash all the relevant HTML tags, we won't be keeping them.
        text = text.replace("</table>", "</TABLE>")
        text = text.replace("<td>", "<TD>").replace("</td>", "</TD>")
        text = text.replace("<tr>", "<TR>").replace("</tr>", "</TR>")
        text = text.replace("<br>", "<BR>")
        # Yes, Berlios generated \r<BR> sequences with no \n
        text = text.replace("\r<BR>", "\r\n")
        # And Berlios generated doubled </TD>s
        # (This sort of thing is why a structural parse will fail)
        text = text.replace("</TD></TD>", "</TD>")
        # Now that the HTML table structure is canonicalized, parse it.
        if text == oldtext:
            break
    end = text.find("</TABLE>")
    if end > -1:
        text = text[:end]
    while True:
        m = re.search(r"<TR>\w*", text)
        if not m:
            break
        start_row = m.end(0)
        end_row = start_row + text[start_row:].find("</TR>")
        rowtxt = text[start_row:end_row]
        rowtxt = rowtxt.strip()
        if rowtxt:
            rowtxt = rowtxt[4:-5]  # Strip off <TD> and </TD>
            rows.append(re.split(r"</TD>\s*<TD>", rowtxt))
        text = text[end_row+5:]
    return rows

The problem with writing code like that is maintenance. It's got all sorts of little assumptions and special cases. Notice how it can't cope with a mixed case <TD> tag? Or how there's a special case for handling a doubled </TD>?

A much better approach is to use an HTML parser that knows all about the foibles of real HTML in the real world (Raymond's main argument in his blog posting is that you can't rely on the HTML structure to give you semantic information---I actually agree with that, but don't agree that throwing the baby out with the bath water is the right approach). If you use such an HTML parser you eliminate all the hassles you had maintaining regular expressions for all sorts of weird HTML situations, dealing with case, dealing with HTML attributes.

Here's the equivalent function written using the BeautifulSoup parser:

def walk_table2(text):
    "Parse out the rows of an HTML table."
    soup = BeautifulSoup(text)
    return [ [ col.renderContents() for col in row.findAll('td') ]
             for row in soup.find('table').findAll('tr') ]

In Raymond's code above he includes a little jab at this style saying:

# And Berlios generated doubled </TD>s
# (This sort of thing is why a structural parse will fail)
text = text.replace("</TD></TD>", "</TD>")

But that doesn't actually stand up to scrutiny. Try it and see. BeautifulSoup handles the extra </TD> without any special cases.
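
To try the doubled </TD> case without installing anything, here is a sketch that swaps in the html.parser module from Python 3's standard library in place of BeautifulSoup. The class name TableRows, the function name walk_table3 and the sample table are my own illustration, not part of ForgePlucker or BeautifulSoup, and nested tables are not handled.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, None outside <tr>
        self._cell = None   # text pieces of the cell being read, None outside <td>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TD>, <Td> and <td> all arrive
        # as "td"; attributes are passed separately and simply ignored here
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None   # a doubled </td> finds no open cell and is skipped
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def walk_table3(text):
    "Parse out the rows of an HTML table."
    parser = TableRows()
    parser.feed(text)
    parser.close()
    return parser.rows

# Mixed-case tags, an attribute and a doubled </TD> in one small table
sample = ('<table><TR class="x"><Td>A1</TD></TD><td>B1</td></tr>'
          '<tr><td>A2</td><td>B2</td></tr></table>')
rows = walk_table3(sample)
print(rows)
```

None of the three oddities in the sample needs a special case of its own: the parser normalizes tag case, hands attributes over separately, and reports the stray closing tag as an ordinary event that the handler can ignore.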

Bottom line: parsing HTML is hard, don't make it harder on yourself by deciding to do it yourself.

Disclaimer: I am not an experienced Python programmer, there could be a nicer way to write my walk_table2 function above, although I think it's pretty clear what it's doing.

Friday, November 06, 2009

Geek Weekend (Paris Edition), Day 2: Foucault's Pendulum

Not very far from The Curie Museum is the former church and now burial place for the great and good men (and one woman) of France: The Pantheon. Inside the Pantheon is the original Foucault's Pendulum.

The pendulum was first mounted in the Pantheon in 1851 to demonstrate that the Earth is rotating. The pendulum swings back and forth in the same plane, but the Earth moves. Relative to the floor (and to the convenient hour scale provided) the pendulum appears to rotate.


The pendulum is on a 67m long cable hanging from the roof of the Pantheon. The bob at the end of the cable weighs 27kg. In the Pantheon the pendulum appears to rotate at 11 degrees per hour (which means it takes more than a day to return to its original position). If it were mounted at the North Pole it would 'rotate' once every 24 hours. The rate of rotation depends on the latitude, diminishing to 0 degrees per hour at the equator (i.e. it doesn't 'rotate' at all).
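
The figures above can be checked with the standard formula for a Foucault pendulum: the swing plane turns at the Earth's rotation rate multiplied by the sine of the latitude. Here is a small sketch in Python 3; the latitude I use for the Pantheon is an approximate value I have assumed, the function names are my own, and the swing period treats the pendulum as an ideal simple pendulum.

```python
import math

SIDEREAL_DAY_HOURS = 23.9345   # one full rotation of the Earth relative to the stars

def foucault_rate_deg_per_hour(latitude_deg):
    """Apparent rotation rate of the swing plane at the given latitude."""
    return 360.0 / SIDEREAL_DAY_HOURS * math.sin(math.radians(latitude_deg))

def swing_period_seconds(length_m, g=9.81):
    """Time for one back-and-forth swing of an ideal simple pendulum."""
    return 2 * math.pi * math.sqrt(length_m / g)

PANTHEON_LATITUDE = 48.846     # degrees north; approximate, assumed for this sketch

rate = foucault_rate_deg_per_hour(PANTHEON_LATITUDE)
print("Paris:      %.1f degrees per hour" % rate)
print("Full turn:  %.1f hours" % (360.0 / rate))
print("North Pole: %.1f degrees per hour" % foucault_rate_deg_per_hour(90))
print("Equator:    %.1f degrees per hour" % foucault_rate_deg_per_hour(0))
print("One swing of the 67m cable: %.1f seconds" % swing_period_seconds(67))
```

The Paris rate comes out a little over 11 degrees per hour, which is why a full turn takes more than a day there.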


If you take a look at the photograph above you can see that I was there just after 1200. The scale shows the current time measured by the pendulum.

The actual movement of the pendulum is only hard to understand because the common sense assumption is that the floor is not moving, but of course it is. It appears that what we are observing is a pendulum swinging above a fixed floor.

But the floor is actually moving because of the rotation of the Earth. That makes understanding the pendulum's motion harder. The important factor is the Coriolis Effect (sometimes erroneously called the Coriolis Force).

The simplest way to visualize the Coriolis Effect is to imagine firing a gun at the Equator straight northwards along a meridian. Because the Earth rotates, the bullet will not land on the meridian: it keeps the eastward speed of the ground at the Equator, the ground further north moves eastward more slowly, and so the bullet will land to the east of the meridian. It looks as though a force has acted on the bullet to push it sideways. Of course, there's no actual force; it's just that the frame of reference (i.e. where the observer is) is not stationary.
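
One way to see why the ground cannot be treated as stationary is to work out how fast it moves eastward at different latitudes. The sketch below is my own illustration in Python 3; it assumes a spherical Earth with the radius and rotation time given in the comments, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0     # equatorial radius; a spherical Earth is assumed
SIDEREAL_DAY_S = 86164.1       # time for one rotation relative to the stars

def ground_speed_m_per_s(latitude_deg):
    """Eastward speed of a point on the surface due to the Earth's rotation."""
    circle = 2 * math.pi * EARTH_RADIUS_M * math.cos(math.radians(latitude_deg))
    return circle / SIDEREAL_DAY_S

for name, lat in [("Equator", 0.0), ("Paris", 48.85), ("North Pole", 90.0)]:
    print("%-10s %6.1f m/s" % (name, ground_speed_m_per_s(lat)))
```

The ground at the Equator moves eastward considerably faster than the ground further north, so a bullet that keeps its launch-point speed does not stay over the meridian it was fired along.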

Essentially the same thing happens with Foucault's Pendulum. The observer and the floor are not stationary and so the pendulum has an apparent motion.

Security Now #221

I was a guest on Security Now this week and the podcast has now been released (as has a transcript). Steve Gibson and some other people asked me to provide the presentation in some relatively readable format.

The original presentation is here, but it, ironically, requires JavaScript and Adobe Flash. So here are two additional formats: old style Microsoft PowerPoint and PDF.

Tuesday, November 03, 2009

Geek Weekend (Paris Edition), Day 1: The Curie Museum

So, it was off to Paris for the weekend via Eurotunnel and I managed to fit in four places from The Geek Atlas in four days. I was staying in a hotel in the Latin Quarter which is a stone's throw from... The Curie Museum.

Here's Marie Curie's laboratory:


The museum covers the lives and works of two Nobel Prize-winning couples: Pierre and Marie Curie (they discovered Radium and Polonium) and their daughter Irene and her husband Frederic Joliot (they discovered artificial radioactivity: you could make a substance radioactive by bombarding it with alpha particles).

Their Nobel Prizes are on display as is the equipment that they used (including the apparatus for measuring radiation by measuring ionization of air---which itself had been discovered by Becquerel).

Here are the Nobel Prizes:


Although I love the science section of the museum (including the laboratory where they worked with a piece of paper from one of their notebooks with its radioactive thumb print---they weren't too careful about handling radioactive elements), the best bit is the section on the craze for radium products in the 1920s and 1930s.

Here's an ad for a beauty cream that contains radium and thorium. Gives you that special glow!


Here you'll find make up that contains thorium and radium, special radium wool to keep babies warm, a radium dispenser so you could have a radioactive soak in the bath and more...


Seems stupid now, but back then the dangers were either ignored or unknown, and radioactivity seemed like a wondrous thing (especially since it was discovered early on that it would kill or reduce tumors). I wonder what products we are feeding ourselves that in 70 years we'll consider downright dangerous.

There's a nice web site of radioactive quack cures which make my skin crawl. Yes, I'm going to take a radioactive suppository to boost my sex life tonight! Move over Viagra, here's Vita Radium.