## Tuesday, December 22, 2009

### There's a hole in my bucket, dear Liza, dear Liza

So the Met Office released source code for analysis of the recently released land surface temperature data and there appears to be a bug in it.

If you take a look at lines 87-91 of station_gridder.perl there's a loop that reads data from the observation files and extract data between the start and end years. Unfortunately, this appears to be wrong and the program ends up reading suspect data and using it.
 # Push anomaly for each year and month to @GridA for (      my $i =$Station->{start_year} ;     $i <=$Station->{end_year} ;     $i++ ) The$Station hash (actually hash reference) contains entries for start_year, end_year and first_good_year. These correspond to the entries Start year, End year and First Good year in the observation files.

The First Good year is documented as "First Good Year — data before that year are suspect." (see here). My program uses that to remove any suspect data, the Met Office version does not.

This means that it adds suspect data to the averages. I don't know if this really matters, but since they say data before those years is suspect I was assuming that it shouldn't be used.

You can fix the program by changing the lines to:
 # Push anomaly for each year and month to @GridA for (      my $i =$Station->{first_good_year} ;     $i <=$Station->{end_year} ;     \$i++   )

If you run the two versions you'll see a slight change in the trend (which affects the early period when data was considered suspect). Here's a chart showing the original version (the red line) and the corrected version (the green line).

Notice something funny? There's a gap at 1855. In fact there's no data output for 1855. Digging in, this turns out to happen because there are no valid southern hemisphere measurements at all in 1855.

Hence there's no global average for 1855.

That's a bit odd, but not serious. But it makes me suspect something: I'll bet a mince pie that this code the Met Office has released is not the code they actually use to create CRUTEM3. I bet they wrote it especially for this release.

Of course, I could be wrong about that.

UPDATE. See this blog posting for the Met Office's reply. Looks like the bug is not a bug, it's an error in their documentation.

Labels:

If you enjoyed this blog post, you might enjoy my travel book for people interested in science and technology: The Geek Atlas. Signed copies of The Geek Atlas are available.

neiljg said...

Hi John,
Everyone treats this software and data as if they are marking a school maths paper . The question everyone must ask until it must be addressed is where is the Quality Assurance and Software Control, Quality Audit results etc on data and software used for trillions of dollars of expenditure. Businesses are forced to do this in dealing with the Government before supplying anything. Obviously there has been no Software Verification as you have shown.
I suggest to all you technically qualified bloggers that you hammer this issue.

1:59 AM
Bob said...

This code is using fairly recent Perl coding style. Did this flavor of Perl exist 10 years ago? What is the minimum age of the code, by the internal evidence of its coding style?

So, assume that the code has been evolving year-over-year, and has been occasionally refactored.

Probably a good thing, from a code quality perspective. Its ancestor might have been something in Fortran 77...

6:59 AM
Francis Turner said...

Um they wrote it in Perl? really? hmm given that everything else seems to have been written in Fortran and IDL I'm kinda suspicious too.

7:45 AM
John Graham-Cumming said...

@Bob

What makes you think this has a recent style?

The only thing that stood out to me when I read the code was that they have a function called dclone() which looks a lot like a special version of Storable's dclone() function. That's been around since 2000.

It's odd that they called the function dclone() and not copy_field() or something similar. My guess is that they were using Storable and decided to remove the dependency on it by writing a little function to replace the bit of functionality they need.

So, it's possible that this Perl script has been extracting from something pre-existing rather than having been written from scratch.

2:33 PM
steven said...

One thing to look for is the smoothing code. I don't have time now, but the issue cropped up a couple of years back.

6:03 PM
John Graham-Cumming said...

This code does simple annual non-rolling averages for smoothing. My code uses the published 21 point binomial filter for smoothing.

But I also wrote an annualizer for comparison purposes.

6:35 PM
Jeff said...

John,

I'm unfamiliar with Pearl. The code provided doesn't seem to save any of the gridded anomaly data.

Can you provide me with some instruction as to how you got the info from the gridder into the trend software and out of the trend.

I'm using strawberry pearl with perl express, any info on whether that's a good idea would be helpful also.

Jeffid1 at gmail.com

10:00 PM
Jeff said...

never mind, I figured it out.

Nice job BTW

10:18 PM