Friday, January 29, 2010

New version of CRUTEM3 and HADCRUT3

There's a new version of the Met Office land surface temperature record out with lots and lots more stations.

Plus it includes corrections for all the problems I found with the data (they didn't make good on their promise to acknowledge me, sadly).

But my handiwork is shown by the points in green:

My two corrections: A and B.

I'll run these through my own programs to see what they produce.

Thursday, January 28, 2010

John's Amazing Diet Secrets Revealed!

Now, at last, I can reveal the top diet secrets that doctors have been keeping from you! Yes, this is how I lost an AMAZING 9.9kg (21.8 pounds) in just 6 months doing absolutely no exercise at all.

Put down those weights, step off the StairMaster and follows these amazing simple steps to a better figure:


In April 2008 I decided that my 82.5kg (181.9 pounds) was too much for my height. Ideally I should have been 72kg (158.7 pounds), so I needed to lose 10.5kg (23.1 pounds) to be my ideal weight.

It's this image of me that really decided me to lose weight. Nothing like the shape of that stomach in public.

So, I read up on dieting and digested The Hacker's Diet. I ignored almost all the advice except for one thing: you can stuff your face with calories far faster than you can burn it off. That revelation lead me to the Colarie.

A single can of Coke is about 147 calories. How much exercise does it take to burn that off? Well just running it would take something like 15 minutes. And I thought why run? Just don't stick the stuff in your face in the first place.

So, I eliminated all soda and drank cold water instead. I also stopped ever eating sweets (candies). But that wasn't enough, I decided that I needed to eat less.

So, I simply looked at whatever food I was eating and ate about 50% of it and left the rest. I didn't order desserts, I stopped having sugar in coffee.

And, boy, was I hungry at first. After a while I wasn't hungry any more and I think I enjoyed the taste of food more. After a while the amount of food I was eating was simply less than before and I was no longer forcing myself to stop.

That was helped by eating slowly. By doing that I discovered when I was full and stopped eating. I didn't stop eating particular foods (other than the particular nasties like Coke, desserts and sugary snacks), I ate a wide variety of things (including foie gras and other delicacies). I just ate less of it.

The weight fell off me. Here's the magic chart:

I never quite made it to my goal, my weight stabilized at around 72.5kg (158.7 pounds) and remained there. Now whenever it goes up a bit I know what to do.

(BTW I'm not saying don't exercise, there are lots of good reasons to do exercise, but I don't think weight loss is one of them. Do it because you enjoy it, do it because it clears your head, etc.)


Weighing yourself is vital. I made it a ritual so that I got consistent results. Your weight will vary throughout the day so I had a simple technique:

1. Same time, same day each week: I weighed myself on Saturday mornings, immediately after getting up (after visiting the bathroom) and before eating.

2. Same situation: I always weighed myself naked so that there were no inconsistencies.

3. Once: I weighed myself once on that Saturday morning and recorded the result. I didn't fret about the result. I didn't reweigh myself.


I kept all the weight data in a spreadsheet. From that I could draw an encouraging chart, and calculate how many more weeks were needed for me to reach my goal. And I could calculate my BMI (which seems like a bogus figure to me).

That's it: eat less, eat well, keep track.

PS I did do some exercise. During this period I didn't own a car and walked everywhere or took public transport. I was working at home and frequently would walk outside to get some sun and clear my mind.

Tuesday, January 26, 2010

£1,000 for Bletchley Park thanks to The Geek Atlas

When The Geek Atlas was published in June 2009, O'Reilly's UK arm decided to pledge to donate 50p per copy sold in the UK to help fund Bletchley Park.

O'Reilly has now made good on that pledge and with almost 2,000 copies of the book sold in the UK it has donated £1,000 to Bletchley Park.

And the 50p per copy pledge continues. All copies of The Geek Atlas sold in the UK result in a 50p donation to keep this wonderful place alive.

The squawking Squaw King was stabbed in a stab bed

Yesterday I tweeted: Realized that 'assisted' is 'ass is ted'. Are there other non-compound words in English which consist entirely of other words? and people replied with is land and cut lass.

Naturally, I couldn't resist writing a small amount of code to figure out other word sequences within words. Using a short program and a 57,000 word English dictionary of common words I had the answer: 12,870 words. That means 23% of English words have this property.

Of course, many are rather boring because they are just compound words. But others are more fun:

I have secretions of secret ions and a seepage (see page 21), but sematically my sematic ally says I am fatalistic and fatal is tic bite. But the fellow fell, ow! And asked, do we seal ants with sealants? I went to the palace to see my pal (ace) and said "Serge! Ants". He called for "Sergeants!". But with an antelope the ant elope.

I smelt an aroma: the rapist! Yet, it was just the aromatherapist.

You can get the full list here.

Update Here's the code

# ----------------------------------------------------------------------------
# Small program to find words that consist entirely of other words
# concatenated. An example is 'fatalistic' which is 'fatal is tic'
# Written by John Graham-Cumming
# ----------------------------------------------------------------------------

use strict;
use warnings;

# The first argument to the program is the filename of a dictionary of
# words, this dictionary will be searched for words consisting of word
# sequences. It should be simply one word per line.
# It is loaded into the %words hash.

my $dict = $ARGV[0];
my %words;

if ( open F, "<$dict" ) {
while (<F>) {
$words{$_} = 1;
close F;
} else {
die "Cannot open dictionary file $dict\n";

# Check every word in the dictionary using the recursive function
# check_word. Note that I don't sort the words here since that might
# take a long time. Sorting can be done on the output.

foreach my $w (keys %words) {
my $sub = check_word($w);

if ( $sub ne '' ) {
print "$w ($sub)\n";

# check_word extracts ever longer subsequences of the word to be
# checked and sees if they are themselves words (by checking in
# %words). If a word is found then the remainder of the word is sent
# to a recursive call to check_word.
# For example, suppose we do check_word( fatalistic ), the code will
# check the following:
# check_word: fatalistic; found so far:
# f?
# fa?
# fat?
# check_word: alistic; found so far: fat
# a?
# check_word: listic; found so far: fat a
# l?
# li?
# lis?
# list?
# check_word: ic; found so far: fat a list
# i?
# listi?
# al?
# ali?
# alis?
# alist?
# alisti?
# fata?
# fatal?
# check_word: istic; found so far: fatal
# i?
# is?
# check_word: tic; found so far: fatal is
# This function returns an empty string if the word does not consists
# of other words, or a string containing the word broken down into
# space separated words
# e.g. check_word('fatalistic') returns ' fatal is tic'
# check_word('potato') returns ''

sub check_word
my ( $w, # The word to check
$depth ) = @_; # Contains the words found so far, or
# undefined when first called

if ( !defined( $depth ) ) {
$depth = '';
} else {
if ( defined( $words{$w} ) ) {
return "$depth $w";

for my $i (1..length($w)-1) {
my $fragment = substr($w,0,$i);
if ( defined( $words{$fragment} ) ) {
my $sub = check_word(substr($w,$i), "$depth $fragment");
if ( $sub ne '' ) {
return $sub;

return '';

Saturday, January 23, 2010

Price drop on GNU Make Unleashed

I've dropped the price on GNU Make Unleashed to €15.00 (for the printed book) and €10.00 (for the PDF).

And I'm working on a version for the Kindle.

A not very illuminating reply from the Met Office

On the 15th I posted about six additional stations that can be used with the Met Office land surface temperature record. The Met Office kindly replied to my query about the six saying that they could be used despite the missing standard deviations.

I followed up with this query:

Thank you. I'll post on note on my blog with your reply.
Was there a particular rationale for using 16 instead of 15?

They have now replied. Unfortunately, the reply doesn't really shed any light on the situation because they don't say why 16 vs. 15 just that they were calculated separately:

Thank you for your email.

The normals and standard deviations were calculated separately and the limits (15 years and 16 years) were set independent of one another.

I wonder why? Brohan et al. 2006 clearly says 15 is the limit:

(the requirement now is simply to have at least 15 years of data in this period)

I just might be possible that the Met Office isn't telling my why because the why could be a bug. Since there are two programs this could be one of those classic off by one errors that crop up in programming all the time.

It's not hard to imagine the normals program doing

if ( number_of_years >= 15 )

and the standard deviation program doing

if ( number_of_years > 15 )

Equally that's almost groundless speculation on my part and perhaps there's some other good reason that the Met Office decided not, or hasn't taken the time, to tell me about.

Either way the six stations can be used.

Friday, January 22, 2010

Update list of my GNU Make articles

A reader pointed out that the GNU Make article list on by writing page is full of broken links because CM Crossroads has reorganized their site without providing backwards compatibility.

So you are faced with a choice: you could buy a copy of my book The GNU Make Book which contains all the articles (and more!), or you could use the following list (which I've newly updated):

May 2008: Usman's Law
March 2008: GNU Make user-defined functions, part 2
February 2008: GNU Make user-defined functions, part 1
December 2007: GNU Make path handling
October 2007: GMSL 1.09: A look inside the tweaks and updates
September 2007: Makefile Debugging: An introduction to remake
July 2007: GNU Make escaping: a walk on the wild side
June 2007: Painless non-recursive Make
May 2007: Atomic Rules in GNU Make
April 2007: GNU Make meets file names with spaces in them
February 2007: GNU Make's $(shell)/environment gotcha
December 2006: Makefile Optimization $(eval) and macro caching
November 2006: The pitfalls and benefits of GNU Make parallelization
October 2006: Tips and tricks from the automatic dependency generation masters
September 2006: Sorting and Searching
August 2006: Target-specific and Pattern-specific GNU Make macros
July 2006: Making directories in GNU Make
June 2006: Rebuilding when a file's checksum changes
May 2006: What's new in GNU Make 3.81
April 2006: Tracing rule execution in GNU Make
March 2006: Rebuilding when CPPFLAGS changes
February 2006: Dynamic Breakpoints in the GNU Make Debugger
December 2005: Adding set operations to GNU Make
November 2005: What's New in GMSL 1.0.2
October 2005: An Interactive GNU Make Debugger
August 2005: Makefile Assertions
July 2005: The Trouble with $(wildcard)
June 2005: GNU Make Gotcha ifndef and ?=
March 2005: The GNU Make Standard Library
February 2005: Learning GNU Make Functions with Arithmetic
January 2005: Self-documenting Makefiles
December 2004: Learning Make with the Towers of Hanoi
November 2004: Makefile Optimization $(shell) and := go together
October 2004: Makefile Debugging: Tracing Macro Values
September 2004: Setting a Makefile variable from outside the Makefile
August 2004: The Trouble with Hidden Targets
July 2004: Dumping every Makefile variable
June 2004: Printing the value of a Makefile variable

I'll update the list on my web site later.

Tuesday, January 19, 2010

Stay classy, SoftwareFX, stay classy

Seriously, the Haitian earthquake is not a lead generation exercise.

Friday, January 15, 2010

Met Office has confirmed that it's ok to use the extra six stations

I wrote the other day about files that have normals but no standard deviations in the Met Office land surface temperature record. In parallel, I asked the Met Office to double check my working:

It appears that there are a small number of stations in the file that have normals but no standard deviations. Looking at just those whose 'Normals source' is 'Data' it appears there are six where it should be possible to calculate the standard deviations (since the normals are being calculated from the 1961 to 1990 data): 104880, 606700, 680140, 829830, 835790, and 961090. In each case it would appear that there's enough data in the 1961 to 1990 period to meet the '15 years of data' criteria in Brohan et al. 2006. And given that there are normals generated from 'Data', it would appear that you could get the standard deviations.

Is there any reason why I can't calculate them and safely include these six files? Both your code and my code currently discard these stations because the standard deviations are missing.

They've been kind enough to reply:

When the station standard deviations were calculated, a slightly different criterion was used to judge what was a sufficient number of years: 16 years was used instead of 15 years. There is little harm in relaxing that criterion as it appears only to affect these stations.

I've followed up asking if there was a rationale for 15 for normals, but 16 for standard deviations. But looks like those stations can be safely included.

Update The Met Office has replied to the 16 vs. 15 question.

Thursday, January 14, 2010

Mendelian Randomization: getting genes to run randomized trials for you

One of the core elements of my day job is dealing with causal relations: we try to understand what cause caused an effect. An area where much work has been done in understanding causal relationships is medicine where randomized controlled trials are used to understand the relationship between taking a medicine and some outcome.

But some things are hard to perform a trial on. It's all very well if you have a medicine to try out, but what if you want to know if, for example, having low serum cholesterol is associated with an increased risk of cancer?

That's not an idle question, the Framingham Heart Study, was thought to have shown a relationship between serum cholesterol and cancer (e.g. The serum cholesterol-cancer relationship: an analysis of time trends in the Framingham Study). But the question is: given that there appears to be a relationship, is it causal? Does low serum cholesterol cause cancer?

It could be that it's the other way around (called reverse causation: Low Cholesterol May Be Marker of Undiagnosed Cancer): if you are likely to get cancer you are likely to have low serum cholesterol. Or it could be that there's a confounding factor: something causes both low serum cholesterol and cancer.

It turns out that genetics, and specifically the fact that genes are randomly assigned during meiosis (in humans, for example, half the genes come from the mother and half from the father). Gregor Mendel's law of independent assortment says that the alleles of genes are chosen randomly from the possible alleles when a baby is being formed from the genetic material of mother and father.

This Mendelian Randomization means that it's possible to have nature perform a randomized trial for you. If you can find an allele that affects the trait you are trying to understand you can use it to sample the population to look for a cause and effect relationship.

In the case of low serum cholesterol there's a specific allele associated with Apolipoprotein E. The variant Apo E2 is associated with low serum cholesterol. And because of Mendel's law of independent assortment it will be assigned randomly in the population.

In 1986 Martijn B. Katan published a letter in The Lancet pointing out that Apo E2 causes a rare disease where patients have almost zero serum cholesterol.

Since Apo E2 is randomly assigned by Mendel's laws it's enough to look at the population and examine cancer rates and their relationship to the presence of the Apo E2 gene. So a 'trial' can be run by selecting a control group from the population and examining the rate of Apo E2 in that control. Then a group with cancer is tested for Apo E2.

If there's really a connection between low serum cholesterol and cancer then the cancer group should have a higher prevalence of Apo E2 than the control. You can think of the presence of Apo E2 being random across the population, if it's less than random in the cancer group (i.e. there's more or less than expected) then a causal relationship can be inferred. One way to see that is to look at a causal diagram of the relationships.

The arrows in the diagram represent causal relationships.

1. There's an arrow from Apo E2 to serum cholesterol because it is known that this allele causes low serum cholesterol.

2. The hypothesis is expressed in the arrow from low serum cholesterol to cancer. It's that arrow that's being determined.

3. There are other factors (age, diet, location, illnesses) which could affect both serum cholesterol and cancer.

4. There's no arrow leading to Apo E2 because it is completely determined by Mendel's laws. There's also no arrow from Apo E2 directly to the other factors because they are not affected by Apo E2.

5. There's no arrow directly from Apo E2 to cancer because there's no known direct relationship between the two.

(Note that these assumptions have to be justified. For example, #1 needs biological justification, as does #5).

With those relationships in place it's just a matter of performing the statistical test on the control group and cancer group to see if more Apo E2 is present in the cancer group (there's more on that in Mendelian randomization as an instrumental variable approach to causal inference).

This technique has been used to show a causal relationship between alcohol intake and blood pressure (see Alcohol Intake and Blood Pressure: A Systematic Review Implementing a Mendelian Randomization Approach) and to show no causal relationship between a mother's BMI and the fatness of her offspring (see Exploring the Developmental Overnutrition Hypothesis Using Parental–Offspring Associations and FTO as an Instrumental Variable).

And what of low serum cholesterol and cancer? A study (Apolipoprotein E Genotype, Plasma Cholesterol, and Cancer: A Mendelian Randomization Study) from 2009 concludes: "These findings suggest that low cholesterol levels are not causally related to increased cancer risk."

Thanks, Mendel!

Wednesday, January 13, 2010

Utter crap reporting from The Daily Telegraph

Here's The Daily Telegraph on the Avatar depression story. The reporting is crapola.

On one website, Avatar Forums, the topic "Ways to cope with the depression of the dream of Pandora being intangible" has more than 1,000 posts.

Ivar Hill, a 17-year-old fan from Sweden, wrote on a similar site: “When I woke up this morning after watching Avatar for the first time yesterday, the world seemed grey. It was like my whole life, everything I’ve done and worked for, lost its meaning … It just seems so meaningless. I still don’t really see any reason to keep doing things at all. I live in a dying world.”

As reported before there are well less than 1,000 posts in that thread. Not sure why the subs changed 'gray' to 'grey'. That's a copy/paste quote from someone. Surely a (sic) would be more appropriate.

Also, when they say "another site", they actually mean "another thread".

But the real mistake is here:

Stacy Kaiser, a psychotherapist, said obsession with the film was masking more serious problems in the fans' lives. "They’re seeing Avatar, they're lonely people, a lot of them don’t have a lot going on in their lives right now," she said. "The movie opened up a portal for them to express their depression.”

Dear Heidi Blake (author of the Telegraph story), that quote is from Jo Piazza from (she used to be a gossip writer for the New York Daily News) who wrote the original story that got this ball rolling. When you were watching the report on CNN you managed to write down the wrong name. It was the GOSSIP JOURNALIST and not the psychotherapist who said that.

So basically this entire Daily Telegraph story is a rewrite of a story followed by a misquote from a CNN TV story. Wow! Actually involved zero reporting.

Monday, January 11, 2010 jumps the shark by writing a story about a forum containing 129 people

So has a stunningly ridiculous story called Audiences experience 'Avatar' blues. Now, I'm not saying being depressed is ridiculous, that's a very serious issue, but the story is rubbish.

Here's the critical paragraph:

On the fan forum site "Avatar Forums," a topic thread entitled "Ways to cope with the depression of the dream of Pandora being intangible," has received more than 1,000 posts from people experiencing depression and fans trying to help them cope. The topic became so popular last month that forum administrator [...] had to create a second thread so people could continue to post their confused feelings about the movie.

So, dig into that forum and you'll find three interesting things:

1. There are actually only 576 messages (not more than 1,000). Hey, what's an error of 40% between friends? I realize that 576 doesn't sound as impressive as 1,000.

2. 129 different people (or at least registered users) have posted to that thread. Hmm. I realize that 129 people in some forum discussing a topic doesn't sound as impressive as 1,000 posts.

3. Of the 129, 60% of the posts have been made by 16 people. In fact, 50% are the work of 10 people. Also 72 of the people (from 129) posted once. I realize that 16 people actively discussing a topic doesn't sound as good as 1,000 posts.

So a small number of people discussing feeling depressed after seeing Avatar is enough for a front-page story!?

Now, how many of the 129 people (or the 16 if you prefer) were already feeling depressed before they saw the movie?

UPDATE: I've updated the figures (see comments).

UPDATE: The story made it to CNN on TV. Most interesting part is where the story's author says that she talked to the people in the thread and they were "lonely to begin with" and they are "lonely people". They "don't have a lot going on in their lives right now" and that the movie "didn't create depression".

Six additional stations for the Met Office land surface climate station record

Running through some final diagnostics on the Met Office land surface climate station records I revisited a minor issue that had been in the back of my mind: some of the stations have 'normals' but 'no standard deviations'.

There are 39 stations that fall into that category. Of those 30 have normals from WMO data, 3 are 'extrapolated' and 6 come from the data itself ('Data'). The final six are interesting because if the normals are from the 1961-1990 data present in the station file itself it should be possible to calculate the standard deviations as well.

The six files that are affected are: 10/104880, 60/606700, 68/680140, 82/829830, 83/835790, 96/961090.

Since these files have no standard deviations they are not being used by my program, or the program released by the Met Office. However, it is possible to calculate the standard deviations for all those locations because there is 15 years of data for each month between 1961 and 1990 (that's the criteria in Brohan et al. 2006.

So here for your use are the standard deviations for those files (one decimal place of accuracy):

10/104880: 3.6 3.3 2.4 1.1 1.4 1.2 1.7 1.1 1.5 1.3 1.5 1.8
60/606700: 1.8 1.9 1.7 1.4 1.2 0.9 1.0 0.7 0.7 1.3 1.6 1.7
68/680140: 1.5 1.3 0.9 1.0 1.0 1.1 0.7 1.0 1.0 0.8 1.0 1.0
82/829830: 1.0 1.4 0.8 0.9 1.0 1.0 0.6 0.7 0.7 1.2 0.9 1.4
83/835790: 1.5 1.5 1.1 1.8 1.2 1.2 2.2 1.1 1.7 1.2 1.1 0.7
96/961090: 0.7 0.6 0.6 0.6 0.6 0.6 0.5 0.5 0.4 0.5 0.6 0.5

Update The Met Office says it's OK to use these stations.

Sunday, January 10, 2010

Blog Greatest Hits 2009

At the start of May 2009 I added Google Analytics to and so it's easy to find out the most popular stories on my site for 2009. Here are the top 11 by page views:

  1. Just give me a simple CPU and a few I/O ports

  2. Letter to Her Majesty The Queen: Strange that this one should be so high

  3. Tonight, I'm going to write myself an Aston Martin: An entry from 2008 that's a perennial favourite

  4. Parsing HTML in Python with BeautifulSoup

  5. Double-checking Dawkins: An entry from 2007 that gets constant traffic

  6. Data Visualization Disease

  7. Hello John, it's Gordon Brown: The end of the Alan Turing petition campaign

  8. How I got 50,000 page views by simply being me

  9. What is jsHub?

  10. If you build it, they will ignore it: How I marketed The Geek Atlas

  11. In which I switch on a 30 year old computer and it just works

Saturday, January 09, 2010

Simplifying my OLPC XO-1 Temperature Sensor

Back in 2008 I wrote about a little circuit to turn the OLPC XO-1 Measure application into a digital thermometer. That circuit required two 9 volt batteries, 11 components and a PCB (plus connectors)

A few weeks ago I got asked about making a commercial version of the probe which naturally led to thinking about how to simplify the circuit. I've now got the entire circuit down to a single component that costs 50p in bulk. I've eliminated all the rest (except the connectors) and the circuit is entirely powered by the OLPC itself.

I actually tried a total of four designs for this circuit.

Design 1: The Original The original circuit looked like this:

It works, but it's a bit awkward since it requires those external batteries.

Design 2: Dump the op-amp One simple thing to do is just make a parallel adder with a few resistors and a reference voltage (the original 0.45v from Design 1) from a voltage divider and not worry about all the stability that the op-amp brings.

Here's that circuit under test. It works, but it can be made even simpler.

Design 3: Diode bias After mentioning this project on Hacker News jacquesm suggested trying an even simpler circuit: the original LM35 temperature sensor with a diode inserted between its ground pin and ground to bias ground to the forward voltage drop of the diode (typically 0.6v) which would bring the output voltage for household temperatures into the range of the OLPC.

That worked nicely (I had a 1N4007 lying around which has a forward voltage of up to 1.1v) and here's the circuit.

Design 4: Switch the sensor And then I discovered that there's a pin compatible replacement for the LM35 called the TMP36 which output 0.75v at 25C with 0.01v per C. That means that at 0C it outputs 0.5v and at 100C it output 1.75v. That puts its output inside the range the OLPC can sense. And it can run on the 5V from the USB port. And it has low output impedance.

So the final circuit has a single component. Here it is under test:

And here it is soldered to a connector ready for connection to the OLPC via my original USB/Mic In connector cable.

So if you want to make a really simple temperature probe for the OLPC XO-1 then get a TMP36.

Now all I need to do is find a supplier of stainless steel probes to put the TMP36 in and I can start making them.

PS Ever since I had eye surgery and stopped wearing glasses I've been worried about splashing solder in my eyes. So I wear a pair of Panther Vision LED Lighted Safety Glasses which protect the eyes and let you see what you are doing at the same time.

Don't you wish your boyfriend was hot like me

Friday, January 08, 2010

How I got 50,000 page views by simply being me

Today on Hacker News there's a post about a user who 'engineered' two postings that resulted in 60,000 page views. That post annoyed me because the last thing people like is to know that they've been manipulated.

The other day I blogged about being a geek with an Ikea train set and for some reason that post really captured the imagination of a certain part of the Internet.

I hadn't expected that post to be so popular, and I certainly didn't tailor it to any community. It was just me being me.

But within hours it was on the top of Hacker News, on the front page of Reddit and Wired, and being tweeted widely.

I try to make my blog genuine, if you follow it then you'll be getting a raw feed of me and that could cover all sorts of topics. On the other hand there are many blogs that pander (to varying degrees) to different communities. Part of the reason I follow very few RSS feeds is that much blog writing is vapid self-promotion.

Thursday, January 07, 2010

Reply from the Met Office regarding the 'bug' I thought I had found

A few days ago I posted what I thought was a bug in code from the Met Office. It turns out that the code was not in error, the Met Office explanation of what the fields are was. Here's the Met Office reply:

The bug isn't a bug. That is what the code is supposed to do.
When we first put this web page out:

We mistakenly said that data before the "first good year" were
not used. This was not true and we have since amended the page.

So, what looked like a bug was an error in the documentation given by the Met Office.

PS At the time of writing the page does not appear to have been amended (it still talks about the First Good Year), nor does it seem to acknowledge the real dataset problem I found.

Wednesday, January 06, 2010

The Ikea Lillabo Processing code

By popular demand... here's the code, written in Processing that actually draws the train sets. I hadn't released it because I didn't think it was very interesting, but you are welcome to it.
// --------------------------------------------------------------------------
// Small program to draw pictures of Ikea Lillabo track layouts using
// instructions derived from my Perl program.
// Written by John Graham-Cumming (
// Released under the General Public License, version 2
// --------------------------------------------------------------------------

// This is the cursor position (x, y) coordinates and angle to the
// horizontal

float x, y, a;

// The length in pixels of a single straight piece

float len = 40;

// See the Perl program for a full explanation, but there are 8 curves
// in a circle and from that the radians of curve arc, the length of the
// straight line between the curve ends and the curve angle to the
// horizontal can be calculated.

float curves_in_circle = 8;
float radians_in_curve = 2 * PI / curves_in_circle;
float curve_angle = radians_in_curve / 2;
float curve_length = len * 2 * cos(PI/2 - radians_in_curve/2);

// The Processing equivalent of main()
void setup() {

// Set up the basic parameters for the drawing: a 1000x1000 canvas,
// with a white background. None of the elements drawn will be filled
// and the lines will be four pixels wide.


// These are the nine possible layouts discovered by the Perl program
// and were copy/pasted here. Clearly this would be better done
// dynamically with this program reading the Perl program's output.

int layouts = 9;
String s[] = new String[layouts];


// (row, col) is the row and column position of the next layout to draw
// starting from the top left of the canvas. Since we know there are
// 9 the loop below lays them out 3x3. h is the height of space
// reserved for a layout.

int row = 0;
int col = 0;
int h = 250;
int w = h + 50;

for ( int j = 0; j < layouts; j++ ) {

// Start 200 pixels from the top left corner and with an initial
// angle of 0

a = 0;
x = 200 + w * col;
y = 200 + h * row;

if ( col == 3 ) {
col = 0;

for ( int i = 0; i < s[j].length(); i++ ) {
switch(s[j].charAt(i)) {
case 'B':

case 'C':

case 'A':

case 'S':

// Function to draw a piece and update (x, y) and a
void draw_piece( float l, // Length of piece to be drawn
float ang ) // The angular change due to the piece
// If the ang is zero then this is a straight piece so use line(), if
// non-zero then it's a curve and so use arc()

if ( ang == 0 ) {

// (dx, dy) is the end of the piece truncated (the 0.8 multiplier)
// to leave a little gap between pieces.

float dx = x + l * 0.8 * cos(a + ang);
float dy = y + l * 0.8 * sin(a + ang);
line( x, y, dx, dy );
} else {
int h = (ang<0)?-1:1;

// (ox, oy) is the location of the centre of the circle on which the
// arc we are drawing lies. s and e are the starting and ending angles
// of arc to draw. Note that s must be less than e. Note the 1.5 here
// is used to shorten the arc to leave a small gap between pieces.

float ox = x - h * len * cos(PI/2-a);
float oy = y + h * len * sin(PI/2-a);
float s = a;
float e = a + ang * 1.5;
if ( s > e ) {
float t = e;
e = s;
s = t;

// The PI/2 adjustment here is needed because the angles in s and e are
// derived from a which is to the horizontal and the arc() function needs
// angles to the vertical

arc( ox, oy, len*2, len*2, s - h * PI/2, e - h * PI/2 );

// Update (x,y) and a to be at the end of the new piece that's been
// added and with the correct orientation.

x += l * cos(a + ang);
y += l * sin(a + ang);
a += 2 * ang;

// Four functions to draw the four pieces giving them different colours.

void bridge()

void straight()

void clockwise()

void anticlockwise()

Tuesday, January 05, 2010

More fun with toys: the Ikea LILLABO Train Set

As further proof of my unsuitability to be a child minder (see previous post) I found myself playing with an Ikea LILLABO 20-piece basic set train.

The train set has 16 pieces of track (12 curves, two straight pieces and a two part bridge) and 4 pieces of train. What I wondered was... how many possible looping train tracks can be made using all 16 pieces?

The answer is... 9. Here's a picture of the 9 different layouts.

The picture was generated using a little program written in Processing. The bridge is red, the straight pieces are green and the curves are blue or magenta depending on whether they are oriented clockwise or anticlockwise. The curved pieces can be oriented in either way.

To generate those layouts I wrote a small program which runs through all the possible layouts and determines which form a loop. The program eliminates duplicate layouts (such as those that are mirror images of each other).

It outputs a list of instructions for building loops. These instructions contain 15 characters C (clockwise curve), A (anticlockwise curve), S (straight piece) and B (the entire bridge). Here are the building instructions for all the shapes in the picture above:


PS You can actually create more possible layouts than these because the Ikea pieces have a lot of play in them and can be abused to form other shapes. But where's the fun in that?

PPS Ikea can actually double the fun by adding just two more pieces. If they added two more curved pieces the number of possible tracks doubles to 18. If they added four curved pieces then the number of possible tracks goes to 130.

PPPS Here's the code
use strict;
use warnings;

# ----------------------------------------------------------------------------
# Program to determine all the possible valid closed loop train tracks
# possible using the Ikea Lillabo 20-piece basic train set (see
# Note that
# although this is 20-piece there are only actually 16 pieces of track
# (the train itself is in four pieces).
# Written by John Graham-Cumming (
# Released under the General Public License, version 2
# v1.0 - First public release
# ----------------------------------------------------------------------------

# Universal constant derived from 1 Kings 7:23 :-)

my $PI = 3.1415927;

# The length of a single straight piece. For the purposes of the
# program this is set at one but could be made into a real value if
# you wanted to output actual track lengths.

my $unit = 1;

# The number of curved pieces that make a full circle. Determined
# empirically. And the angle of the segment determined by a single
# curved piece in a circle.

my $curves_in_circle = 8;
my $radians_in_curve = 2 * $PI / $curves_in_circle;

# The 15 possible pieces: a straight edge (which has $unit length) and
# is used as the measurement for everything else, a bridge (which is
# twice the length of the straight edge; it is actually supplied in
# two pieces but for the purposes of this program is considered to be
# a single piece) and a curve (using $radians_in_curve above the
# length of the straight line between the ends of the curve is
# calculated).
# Each entry consists of three parts:
# length: the length in a straight line between the ends of the piece
# at its centre.
# angle: the angle (in radians) between the straight line through the
# piece and a tangent to the curve at the piece's start. This only
# applies to curved pieces where the straight line is the line joining
# its two endpoints.
# count: how many of these pieces are supplied.
# Note that curves can be placed in either a clockwise or
# anticlockwise direction. The program detects this by looking at the
# angle. If it's non-zero then the piece has two orientations.

my %pieces = (
Bridge => { length => $unit * 2,
angle => 0,
count => 1 },

Straight => { length => $unit,
angle => 0,
count => 2 },

# Here's a curved piece, the angle a is $radians_in_curve, the
# length l of a side is $unit. So length is the distance between
# the points labelled s and f. Bisect the angle a you get a right
# angle triangle with hypotenuse of length $unit and angle at the
# vertex of $radians_in_curve/2. So the angle b is $PI/2 -
# $radians_in_curve/2. By simple trigonometry the length is twice
# $unit * cos($PI/2-$radians_in_curve/2).
# s
# C
# . . C
# . b C
# l . . C
# . C
# . . C
# . a C
# . . . . . . . C
# o f
# To calculate the angle to the tangent at point s (the angle c),
# note that the angle formed by os and the tangent is a right angle
# (since os comes from the centre of the circle). So b+c is $PI/2
# but b is $PI/2 - $radians_in_curve/2 and so c is
# $radians_in_curve/2
# s
# .
# . . .
# . b c .
# l . . .
# . .
# . . .
# . a .
# . . . . . . . . . . .
# o f
Curve => { length => 2 * $unit * cos($PI/2-$radians_in_curve/2),
angle => $radians_in_curve/2,
count => 16 }

# This structure contains the position of the end of the current
# placed track (the X and Y coordinates) and the angle of the end of
# the piece of track in radians to the horizontal. Initially this is
# defined as location (0,0) and with an angle of 0.
# The goal will be to place pieces such that the cursor ends up at the
# %origin.

my %origin = (
angle => 0,
x => 0,
y => 0

# Some basic test tracks to make sure that the code does what I
# expect. This hash contains track layouts and the expected final
# position of the cursor for tests

my %tests = (
  B => { x => 2 * $unit, y => 0, angle => 0 },
  S => { x => $unit, y => 0, angle => 0 },
  C => { x => 0.707, y => 0.293, angle => 0.785 },
  A => { x => 0.707, y => -0.293, angle => -0.785 },
  AA => { x => 1, y => -1, angle => -1.570 },
  CC => { x => 1, y => 1, angle => 1.570 },
  BSS => { x => 4 * $unit, y => 0, angle => 0 },
  CCCC => { x => 0, y => 2 * $unit, angle => 0 },
  AAAA => { x => 0, y => -2 * $unit, angle => -3.14 },
  CCCCCCCC => \%origin,
AAAAAAAA => \%origin,
CCCCSCCCCS => \%origin,

foreach my $t (sort keys %tests) {
my %ac = %origin;
my @as = split( '', $t );
build_track( \%ac, \@as );
if ( !cursor_match( \%ac, $tests{$t} ) ) {
die "Test $t failed: $ac{x} $ac{y} $ac{angle}\n";

# To avoid getting the same track in different directions we keep
# track of tracks that we've found for testing. The keys of this hash
# are the strings that describe the track layout (e.g. CCCCCCSSAAAAAA
# without the bridge piece).

my %found;

# Since there are only two interesting pieces of track (curve and
# straight) the algorithm for trying out all the combinations of
# pieces is broken into two parts. The outer loop runs through all
# combinations of curved pieces joined together with each piece
# getting to go clockwise or anticlockwise. This is done by
# generating all binary numbers between 0 and 2^12-1 (i.e. all 12 bit
# binary numbers) and using the bits to indicate whether a track piece
# goes clockwise (a 0) or anticlockwise (a 1).
# To avoid seeing the same track twice in symmetrical layouts the
# curved pieces are constrained to always finish with an anticlockwise
# piece. This is done by making the top bit in this loop be 0 by
# reducing the top number to 2^11-1

foreach my $curve (0..2**($pieces{Curve}{count}-1)-1) {

# Create an array containing the instructions for inserting
# pieces. The array is alpha and has entries B, C, A and S. A means
# add curved piece anticlockwise, S means add straight piece, C
# means a curved piece clockwise and B a bridge. This array will be
# fed into move_cursor to build the final track.
# The following piece of code uses the sprintf function to extract
# the bits from the $curve value into an array and then transform a
# 0 bit to A and a 1 bit to C.

my @instructions = map {$_?'C':'A'}
split( '', sprintf( "%0$pieces{Curve}{count}b", $curve ) );

# Then for each combination of curved pieces it's just a question of
# inserting the straight pieces in all possible positions. If there
# are 12 curved pieces then there are 13 possible places each
# straight piece can be inserted.
# To make it possible to modify the number of straight pieces the
# program doesn't assume that there are only two (which could just
# have been done with a pair of loops). So an array is initialized
# that will contain the positions of the n straight pieces and it is
# used to create the combinations.
# The initial position of the straight pieces is 0 and then 0
# (assuming just two) indicating that they are inserted before the
# first curved piece.

my @straight;

foreach my $i (0..$pieces{Straight}{count}-1) {
$straight[$i] = 0;

# Now run through all the possible valid straight positions and
# insert them into the curved positions

while (1) {
my @in = @instructions;

# Add the straight pieces to the list of 'instructions' at the
# appropriate places and then place the pieces. This creates a
# new array, @build, containing the track layout.

my @build;
my $s = 0;
my $c = 0;
while ( ( $s < $pieces{Straight}{count} ) ||
( $c < $pieces{Curve}{count} ) ) {
if ( $s < $pieces{Straight}{count} ) {
if ( $straight[$s] == $c ) {
push @build, ('S');
$s += 1;

if ( $c < $pieces{Curve}{count} ) {
push @build, ($in[$c]);
$c += 1;

# Now add the bridge at end. Since there's only one bridge this
# simplification is made and it makes no difference to the outcome
# since all other combinations of straights and curves will be
# generated.

push @build, ('B');

my %cursor = %origin;
build_track( \%cursor, \@build );

# Finally see if this track layout results in a loop. The test
# for this is simple: is the last piece (represented by the
# %cursor) at the %origin position and orientation?
# If it matches in one direction then pop off the bridge, reverse
# the track, push back the bridge and try to rebuild. This will
# eliminate tracks that don't actually match up because the final
# piece approaches the bridge from the wrong direction.

if ( cursor_match( \%cursor, \%origin ) ) {
pop @build;
%cursor = %origin;
my $build1 = join('',@build);
@build = reverse @build;
my $build2 = join('',@build);
push @build, ('B');
my $build1a = $build1;
$build1a =~ tr [AC] [CA];
my $build2a = $build2;
$build2a =~ tr [AC] [CA];

if ( !exists( $found{$build1} ) ) {
build_track( \%cursor, \@build );
if ( cursor_match( \%cursor, \%origin ) ) {
my $build3 = undef;
if ( $build1 =~ /^(.*)SS(.*)$/ ) {
$build3 = join('',reverse split('',$1)) .
'SS' . join('',reverse split('',$2));
if ( !defined( $build3 ) ||
!exists( $found{$build3} ) ) {
print @build, "\n";
$found{$build1} = 1;
$found{$build2} = 1;
$found{$build1a} = 1;
$found{$build2a} = 1;
if ( defined( $build3 ) ) {
$found{$build3} = 1;

# Move the straight pieces to their next positions, this is where
# the while loop will exit once all the positions have been tried.

foreach my $i (0..$pieces{Straight}{count}-1) {
my $j = $pieces{Straight}{count}-1-$i;
if ( $straight[$j] < $pieces{Curve}{count} ) {
$straight[$j] += 1;
} else {
$straight[$j] = -1;

foreach my $i (1..$pieces{Straight}{count}-1) {
if ( $straight[$i] == -1 ) {
$straight[$i] = $straight[$i-1];

my $terminate = 1;

foreach my $i (0..$pieces{Straight}{count}-1) {
if ( $straight[$i] != $pieces{Curve}{count} ) {
$terminate = 0;

if ( $terminate ) {

# Subroutine used to place a piece of track. It updates a %cursor
# hash passed to it
sub move_cursor
my ( $c, # Reference to the %cursor hash to be updated
$p, # Name of the piece be added
$h ) = @_; # Handedness for curved piece (1 or -1), omit if
# non-curved

if ( !defined( $h ) ) {
$h = 0;

$c->{x} += $pieces{$p}{length} * cos($c->{angle} +
$h * $pieces{$p}{angle});
$c->{y} += $pieces{$p}{length} * sin($c->{angle} +
$h * $pieces{$p}{angle});

if ( $h != 0 ) {
$c->{angle} += 2 * $h * $pieces{$p}{angle};

# Subroutine to build an entire track from a list of track pieces
# (with single character identifiers)
sub build_track
my ( $c, # Reference to the %cursor to update
$in ) = @_; # Reference to the list of build instructions

# This maps the characters in the @$in array into piece names and
# handedness (for the curved pieces). Note that direction is
# missing for the straight pieces and the undefined value is used in
# the following loop and passed to move_cursor for those
# pieces. move_cursor knows how to deal with this.

my %m = (
'A' => { piece => 'Curve', direction => -1 },
'B' => { piece => 'Bridge' },
'C' => { piece => 'Curve', direction => 1 },
'S' => { piece => 'Straight' } );

foreach my $i (@$in) {
move_cursor( $c, $m{$i}{piece}, $m{$i}{direction} );

# Determine whether two cursors represent the same position and
# orientation. Return 1 if so, 0 otherwise.
sub cursor_match
my ( $c, $d ) = @_; # References to two %cursor hashes

my $dx = round($$c{x}-$$d{x});
my $dy = round($$c{y}-$$d{y});
  my $da = round(($$c{angle}-$$d{angle})/$PI);
$da -= int($da);

return ( ( $dx == 0 ) &&
( $dy == 0 ) &&
( $da == 0 ) );

# Perform rounding on a number to two decimal places
sub round
my ( $a ) = @_;

return int($a*100 + 0.5*($a <=> 0))/100;

PPPS Here's how I determined that a full circle was made up of eight curved pieces with a radius of one straight piece:

Update I received a message from Ikea about this:

Dear John

Thank you for sharing our passion for childrens products. It is very good for us to get input like this. I will forward your idea to the product developer who is responsible for our toys and maybe this improvements will be something for us to concider.
Thanks again for taking your time to give us feed back.

Best regards
Maria Thörn
Range manager childrens IKEA