### Anatomy of a CRUTEM3 bug

Back in February 2010 I spotted (with the help of a reader) a bug in the software used to generate the CRUTEM3 data from the Met Office. This was subsequently confirmed by the Met Office as having a small effect of overestimating the error bounds on the temperature trend.

Back then I wrote:

One particular file called

First, the answer to my question "I also don't understand why when there are less than 30 years of data the number 30 appears to still be used." is simple. The number 30 is hard-coded. That's a small difference with the published paper.

The answer to why the correction (dividing the number 30 by the number of stations in the grid square) works requires following through the code and its related mathematics. Brace yourself from some forumlae. What I believe happened was that the programmer sensibly realized that there was no point doing all these square roots and then squaring the result and the forumla could be simplified as it was being turned into code. This sort of simplification is a good idea because it helps reduce floating point errors that might get introduced. But in the simplification process a term got lost.

Here's a review of the key terminology and the relationship with the code:

And here's a simplification of the formula that shows how to go from what's in the paper to what's in the code:

The

The effect is that rather than dividing by

What had puzzled me at the time was why the programmer would have introduced 'dividing the number of stations by

Back then I wrote:

I have no idea why the correction given in this blog post by Ilya and I works: perhaps it indicates a genuine bug in the software used to generate CRUTEM3, perhaps it means Ilya and I have failed to understand something, or perhaps it indicates a missing explanation from Brohan et al. I also don't understand why when there are less than 30 years of data the number 30 appears to still be used.I had no idea, because until today I had never seen the source code used for the generation of CRUTEM3. But the recent release of another batch of emails from CRU led to an intriguing web site that had been hidden from view, but publicly accessible if you knew the URL. In one of the emails a Met Office scientist called Philip Brogan proposes (on October 8, 2007) to release the CRUTEM3 source code having made a copy of it onto his personal site:

We could also put the source code online - I've copied the internal software and documentation to a temporary site so you can see it (http://xxxxxxxx/xxxxxx/xxxxxx/crutem3/xxxxxxx/ - please keep this URL secret) - this could be added to http://hadobs.metoffice.com/crutem3 if that would help to damp the conspiracy theories. What do you think?Amazingly, as reported here the 'secret URL' is still working four years later with a complete archive (apparently) of the CRUTEM3 code.

One particular file called

`grid_se.pm`contains the function that performs the 'station error' calculation I had looked at way back:# Calculate the station error for a grid cell sub station_error { my $Climatology = shift; my $Sd = shift; # Climatology error my $Clim_error = 0; for ( my $i = 0 ; $i < scalar(@$Climatology) ; $i++ ) { if ( $Climatology->[$i] eq 'D' ) { $Clim_error += ( $Sd->[$i]**2 ) / 30; } # Small error in data-based climatology elsif ( $Climatology->[$i] eq 'E' ) { $Clim_error += $Sd->[$i]**2 / 15; } # Modest error in extrapolated climatology elsif ( $Climatology->[$i] eq 'W' ) { $Clim_error += 0.3 * $Sd->[$i]**2; } # Largest error in WMO climatology else { die "Dud value for climatology flag: $Climatology->[$i]"; } } $Clim_error /= scalar(@$Climatology); # Observations error my $Obs_error = ( 0.03**2 ) / scalar(@$Climatology) ; # 0.03C per station observational error # Homogeneity error my $Hom_error = ( 0.4**2 ) / scalar(@$Climatology); # 0.4C per station homogeneity error # Return the total error my $Total_error = sqrt( $Obs_error + $Clim_error + $Hom_error ); return $Total_error; }The code is written in Perl (as was the code that the Met Office released for calculating partial CRUTEM3 results). It's satisfying to see this file because it answers my questions above and is a simple example of how errors happen in real code. I can easily imagine myself making a similar error.

First, the answer to my question "I also don't understand why when there are less than 30 years of data the number 30 appears to still be used." is simple. The number 30 is hard-coded. That's a small difference with the published paper.

The answer to why the correction (dividing the number 30 by the number of stations in the grid square) works requires following through the code and its related mathematics. Brace yourself from some forumlae. What I believe happened was that the programmer sensibly realized that there was no point doing all these square roots and then squaring the result and the forumla could be simplified as it was being turned into code. This sort of simplification is a good idea because it helps reduce floating point errors that might get introduced. But in the simplification process a term got lost.

Here's a review of the key terminology and the relationship with the code:

And here's a simplification of the formula that shows how to go from what's in the paper to what's in the code:

The

`$Hom_error`and the`$Obs_error`are correctly calculated by the code, but the`$Clim_error`is not. The critical line is`$Clim_error /= scalar(@$Climatology);`. The line either needs to be repeated or changed to`$Clim_error /= (scalar(@$Climatology)*scalar(@$Climatology));`.The effect is that rather than dividing by

`n^2`the formula is divided by`n`. And that explains what I saw all that time ago. It's as if the value of`m`(30 in the code) is divided by the number of stations`n`. If that's done the`1/n`cancels one of the`n`s in`n^2`.What had puzzled me at the time was why the programmer would have introduced 'dividing the number of stations by

`n`'. He didn't, it's just that the effect is the same.Labels: climate change

*If you enjoyed this blog post, you might enjoy my travel book for people interested in science and technology: The Geek Atlas. Signed copies of The Geek Atlas are available.*

<$BlogCommentBody$>

Create a Link

<< Home