Thursday, February 25, 2010

Something a bit confusing from UEA/CRU

UEA and CRU have issued a document that they have submitted to the Parliamentary Select Committee on Science and Technology who are looking into the taking of email and documents from CRU. The document can be found here.

In it there are two interesting paragraphs concerning software:

3.4.7 CRU has been accused of the effective, if not deliberate, falsification of findings through deployment of “substandard” computer programs and documentation. But the criticized computer programs were not used to produce CRUTEM3 data, nor were they written for third-party users. They were written for/by researchers who understand their limitations and who inspect intermediate results to identify and solve errors.

3.4.8 The different computer program used to produce the CRUTEM3 dataset has now been released by the MOHC with the support of CRU.

It's 3.4.8 that's surprising. I assume that they are referring to the code released by the Met Office on this page (MOHC = Met Office Hadley Centre). On that page they say (my emphasis):

station_gridder.perl takes the station data files and makes gridded fields in the same way as used in CRUTEM3. The gridded fields are output in the ASCII format used for distributing CRUTEM3.

My reading of "in the same way as" has always been that this code is not the actual code that they used for CRUTEM3 but something written to operate in the same manner. In which case 3.4.8 is either incorrect, or referring to some other code that I can't lay my hands on.

Has anyone seen any other CRUTEM3 code released by the Met Office?

More information

Looking into this a bit further there's a description of the CRUTEM3 data format on the CRU site here. Here's what it says:

for year = 1850 to endyear
for month = 1 to 12 (or less in endyear)
format(2i6) year, month
for row = 1 to 36 (85-90N,80-85N,75-70N,...75-80S,80-85S,85-90S)
format(72(e10.3,1x)) 180W-175W,175W-170W,...,175-180E

In that the interesting thing is the format command. That is an IDL command (and not a Perl command). The first one pads the year and month to 6 characters, the second one outputs a row of 72 values each 10 characters wide in exponent format with three characters after the decimal point (the 1x gives a single space of separation).

The other oddness is that the NetCDF files that are available for download were not produced by Perl, they were produced by XConv (specifically, version 1.90 on Mon Feb 22 18:26:48 GMT 2010). And I've tested XConv and it can't read the output of the Perl program supplied by the Met Office.

It's not definitive, but all that points to the Perl programs released by the Met Office not being the actual programs used to produce CRUTEM3. Which leads me back to my original question: has anyone seen any other CRUTEM3 code released by the Met Office?

PS I think the Perl code released by the Met Office was likely written by Philip Brohan (he's the lead author on the CRUTEM3 paper), the style is very, very similar to this code. Given that he's written a lot of Perl code, perhaps I'm simply wrong and the Perl code released by the Met Office is the actual CRUTEM3 generating code.

Update Confusion cleared up by Phil Jones of CRU talking to the Parliamentary committee. He stated that CRU has not released their code for generating CRUTEM3 because it is written in Fortran. The code released by the Met Office (the Perl code) is their version that produces the same result.

Here's the relevant exchange (my transcript):

Graham Stringer MP: So have you now released the code, the actual code used for CRUTEM3?

Professor Jones: Uh, the Met Office has. They have released their version.

Stringer: Well, have you released your version?

Jones: We haven't released our version. But it produces exactly the same result.

Stringer: So you haven't released your version?

Jones: We haven't released our version, but I can assure you...

Stringer: But it's different.

Jones: It's different because the Met Office version is written in a computer language called Perl and they wrote it independently of us and ours is written in Fortran.

It's worth noting that above I said that the format command is present in IDL, it's also present in Fortran which jibes with Professor Jones' statement above.

Later the same day Graham Stringer asked a panel about scientific software and here's part of the response from Professor Julia Slingo representing the Met Office:

Slingo: I mean, around the UEA issue, of course, we did put the code out. Um, at Christmas time. Before Christmas, to, along with the data. Because, we, I felt very strongly that we needed to have the code out there so that it could be checked.

(The rest of her answer doesn't concern CRUTEM3. It was a discussion of code used for climate modeling; I'm going to ignore what she said as it seems to have little bearing on the code I've looked at).

3 comments:

Dave said...

3.4.8 The different computer program used to produce the CRUTEM3 dataset has now been released by the MOHC with the support of CRU.

"station_gridder.perl [the different computer program] takes the station data files and makes gridded fields [arrays]in the same way as used in CRUTEM3."

I read that to say that CRUTEM3 requires input data with (perhaps) a particular grid size or array shape. IOW, station_gridder.perl produces CRUTEM3-compatible input data files.

What am I missing?

MikeAinOz said...

That bit of code looks like pseudo-code, popular with software engineers for it's precision. I use it sometimes myself but mine looks more like pascal than FORTRAN. Pseudo-code is no language and all languages at the same tome.

apolytongp said...

Why not release the original? Who cares if it is Fortran? Release the perl too, if needed. But why retain the original? Is there some embarressing commenting or the like?