## Thursday, November 24, 2011

### Anatomy of a CRUTEM3 bug

Back in February 2010 I spotted (with the help of a reader) a bug in the software used to generate the CRUTEM3 data from the Met Office. This was subsequently confirmed by the Met Office as having a small effect of overestimating the error bounds on the temperature trend.

Back then I wrote:
I have no idea why the correction given in this blog post by Ilya and I works: perhaps it indicates a genuine bug in the software used to generate CRUTEM3, perhaps it means Ilya and I have failed to understand something, or perhaps it indicates a missing explanation from Brohan et al. I also don't understand why when there are less than 30 years of data the number 30 appears to still be used.
I had no idea, because until today I had never seen the source code used for the generation of CRUTEM3. But the recent release of another batch of emails from CRU led to an intriguing web site that had been hidden from view, but publicly accessible if you knew the URL. In one of the emails a Met Office scientist called Philip Brogan proposes (on October 8, 2007) to release the CRUTEM3 source code having made a copy of it onto his personal site:
We could also put the source code online - I've copied the internal software and documentation to a temporary site so you can see it (http://xxxxxxxx/xxxxxx/xxxxxx/crutem3/xxxxxxx/ - please keep this URL secret) - this could be added to http://hadobs.metoffice.com/crutem3 if that would help to damp the conspiracy theories. What do you think?
Amazingly, as reported here the 'secret URL' is still working four years later with a complete archive (apparently) of the CRUTEM3 code.

One particular file called grid_se.pm contains the function that performs the 'station error' calculation I had looked at way back:
# Calculate the station error for a grid cell
sub station_error {

my $Climatology = shift; my$Sd          = shift;

# Climatology error
my $Clim_error = 0; for ( my$i = 0 ; $i < scalar(@$Climatology) ; $i++ ) { if ($Climatology->[$i] eq 'D' ) {$Clim_error += ( $Sd->[$i]**2 ) / 30;
}    # Small error in data-based climatology
elsif ( $Climatology->[$i] eq 'E' ) {
$Clim_error +=$Sd->[$i]**2 / 15; } # Modest error in extrapolated climatology elsif ($Climatology->[$i] eq 'W' ) {$Clim_error += 0.3 * $Sd->[$i]**2;
}    # Largest error in WMO climatology
else { die "Dud value for climatology flag: $Climatology->[$i]"; }
}
$Clim_error /= scalar(@$Climatology);

# Observations error
my $Obs_error = ( 0.03**2 ) / scalar(@$Climatology)
;      # 0.03C per station observational error

# Homogeneity error
my $Hom_error = ( 0.4**2 ) / scalar(@$Climatology);   # 0.4C per station homogeneity error

# Return the total error
my $Total_error = sqrt($Obs_error + $Clim_error +$Hom_error );