Ilya went on to say that he was having a hard time reproducing the Met Office's numbers. I encouraged him to write a blog post with an example. He did that (and it looks like he had to create a blog to do it). Sitting in the departures lounge at SFO I read through his blog post and Brohan et al. Ilya's reasoning seemed sound, his example was clear, and I checked his underlying data against that given by the Met Office.

The trouble was Ilya's numbers didn't match the Met Office's. And his numbers weren't off by a constant factor or a constant difference. They followed a similar pattern to the Met Office's, but they did not match. At first I assumed Ilya was wrong and so I checked and double-checked his calculations. His calculations looked right; the Met Office numbers looked wrong.

Then I wrote out the mathematics from the Brohan et al. paper and looked for where the error could be. And I found the source. I quickly emailed Ilya and boarded the plane to dream of CRUTEM and HadCRUT as I tried to sleep upright.

*Mathematical Interlude*

The station error consists of three components: the measurement error, the homogenisation error and the normal error. The first two are estimated in the paper as 0.03°C and 0.4°C respectively. The normal error is calculated from the standard deviation information in the station files.

The formula for the normal error for a single month, *i*, is as follows:

N_i = σ_i / √N

Unfortunately, the paper uses rather sloppy mathematical language: the *N* on the left is not the *N* on the right, and the subscript *i* isn't defined. So I am going to express this a bit more clearly as follows:

N_i = σ_i / √m_i

This means that the normal error for month *i* is the standard deviation for month *i* (that's *σ_i*) divided by the square root of the number of years used to generate the normal values in the station files (which I call *m_i*). Typically we have:

m_i = 30

because 30 years of data from 1961 to 1990 are used. In cases where fewer than 30 years are available (because of missing data) a number less than 30 is used.

Now to get the station error, *ε_i*, the three error components are joined together in quadrature as follows:

ε_i = √(0.03² + 0.4² + σ_i² / m_i)

That works for any grid square where, for any month, there's just a single station reporting a temperature, but in general there are more. So when there are many station errors they are averaged using a root mean square and then divided by the square root of the number of stations.
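As a sanity check on my reading of the two formulas above, here's a short Python sketch. This is my own illustration, not the Met Office's code, and the σ value is made up:

```python
from math import sqrt

MEASUREMENT_ERROR = 0.03    # deg C, estimated in Brohan et al.
HOMOGENISATION_ERROR = 0.4  # deg C, estimated in Brohan et al.

def station_error(sigma_i, m_i=30):
    """Station error for month i: measurement, homogenisation and
    normal errors combined in quadrature. sigma_i is the month's
    standard deviation from the station file; m_i is the number of
    years behind the normals (30 for a full 1961-1990 record)."""
    normal_error = sigma_i / sqrt(m_i)
    return sqrt(MEASUREMENT_ERROR**2
                + HOMOGENISATION_ERROR**2
                + normal_error**2)

# With a hypothetical sigma_i of 1.2 degC and the full 30 years:
print(round(station_error(1.2), 4))  # 0.4571
```

Note that with fewer years of data behind the normals the normal error, and hence the station error, gets bigger.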

Suppose there are *n* stations, each with a station error *ε_{i,j}* (to which I've added the subscript *j* to differentiate them); then the final station error for month *i* is as follows:

ε_i = √((ε_{i,1}² + … + ε_{i,n}²) / n) / √n

*Return to narrative*

What Ilya had discovered was that the formula above (from the paper) works only when there is a single station in a grid square. When there were two or more it failed; that's when he approached me asking for help.

What I discovered at the airport was that if you replaced the number 30 with 15 the formula worked and the values for station errors for grid squares containing exactly two stations were now correct.

Ilya and I came to the same conclusion: the number 15 wasn't plucked from thin air; it was 30 divided by 2 (the number of stations in the grid square). We both tested this hypothesis on squares with more than two stations and found that it worked.

So it appears that the normal error used as part of the calculation of the station error is being scaled by the number of stations in the grid square. This leads to an odd situation that Ilya noted: the more stations in a square, the worse the error range. That's counterintuitive: you'd expect more observations to give a better estimate.
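To make that effect concrete, here is a small Python sketch of both variants: the formula as published, and the same formula with the 30 years divided by the number of stations, which is the behaviour the numbers suggest. Again, this is my own illustration with made-up σ values, not the Met Office's code:

```python
from math import sqrt

def station_error(sigma, m):
    # measurement (0.03) and homogenisation (0.4) errors plus the
    # normal error sigma/sqrt(m), combined in quadrature
    return sqrt(0.03**2 + 0.4**2 + sigma**2 / m)

def grid_square_error(sigmas, years=30, scale_years=False):
    """RMS of the station errors divided by sqrt(n). With
    scale_years=True the years are divided by the station count,
    mimicking the behaviour we infer from the Met Office numbers."""
    n = len(sigmas)
    m = years / n if scale_years else years
    rms = sqrt(sum(station_error(s, m) ** 2 for s in sigmas) / n)
    return rms / sqrt(n)

sigmas = [1.2, 1.2, 1.2, 1.2]  # hypothetical station sigmas
paper = grid_square_error(sigmas)                  # as published
met = grid_square_error(sigmas, scale_years=True)  # with 30/n
print(round(paper, 4), round(met, 4))  # 0.2285 0.297
```

With four stations the 30/n variant produces a larger error than the published formula: adding stations makes the quoted error range worse rather than better, which is exactly the pattern Ilya noticed.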

*Examples*

Ilya had shown me an example from 1947, but I didn't want to take his word for it (although he later showed me a program to check all the station errors, so I should have believed him), and so I took a look at three locations in January 1850. For these three locations all the data underlying CRUTEM3 had been released:

1. The grid square which consists of the single station **723660**: this corresponds to the grid square with corner 35N, 105W. Here the Met Office data gives station errors of:

`0.5072 0.5424 0.4857 0.4962 -1e+30 -1e+30 0.4407 0.4407 -1e+30 0.4756 -1e+30 0.5186`. The strange negative numbers are missing data (it's missing because in the underlying file there are no normals for 1850 in those months, although the actual normals aren't needed for the station error calculation so it doesn't matter). Using the formula from the paper reproduces the Met Office's published values:

`0.5072 0.5424 0.4857 0.4962 0.4486 0.4756 0.4407 0.4407 0.4661 0.4756 0.5072 0.5186`. This makes sense since our correction value of 1 for 1 station in the square doesn't change the formula.

There is, however, something else wrong with this. The paper says that if fewer than 30 years of data are available the number *m_i* should be set to the number of years. In 723660 there are only 17 years of data, so this station error appears to have been incorrectly calculated based on 30 years.

2. The grid square which consists of the two stations **753041, 756439**: this corresponds to the grid square with corner 35N, 80W. Here the Met Office data gives station errors of:

`0.6168 0.569 0.5452 0.4008 0.4345 0.3642 0.3373 0.353 0.3881 0.4624 0.4076 0.5767` and using the formula from the paper (without our correction):

`0.4801 0.4496 0.4346 0.3472 0.3669 0.3264 0.3116 0.3202 0.3399 0.3836 0.3511 0.4545`. If a correction of 2 is used, so that each *σ_i* is divided by the square root of 15 instead of 30, the Met Office's published values are reproduced.

3. The grid square which consists of the four stations **720388, 724080, 756192, 756490**: this corresponds to the grid square with corner 35N, 75W. Here the Met Office data gives station errors of:

`0.5073 0.4409 0.4329 0.3361 0.3286 0.2905 0.2712 0.2807 0.2973 0.3739 0.3325 0.4613` and using the formula from the paper (without our correction):

`0.3074 0.2807 0.2775 0.2417 0.2391 0.2264 0.2204 0.2233 0.2286 0.2552 0.2404 0.2887`. If a correction of 4 is used, so that each *σ_i* is divided by the square root of 7.5 instead of 30, the Met Office's published values are reproduced.

*Conclusion*

I have no idea why the correction given in this blog post by Ilya and me works: perhaps it indicates a genuine bug in the software used to generate CRUTEM3, perhaps it means Ilya and I have failed to understand something, or perhaps it indicates a missing explanation in Brohan et al. I also don't understand why, when there are fewer than 30 years of data, the number 30 still appears to be used.

If these are bugs then it indicates that CRUTEM3 will need to be reissued because the error ranges will be all wrong.

I've emailed the Met Office asking them to help. If you see an error in our working please let us know!

## 9 comments:

Have you considered metadata? As in, did you include the altitude, proximity to buildings, vegetation, other obstacles and other known factors that affect the raw data sources and are taken into account in analysis of such data for tracking change.

Or did you just use the raw data?

/ Per

"..Ilya Goz.. correctly pointed out that although a subset had been released, for some years and some locations on the globe that subset was in fact the entire set of data and so the errors could be checked..."

The apparent errors Ilya and you have revealed are indeed interesting. But what is also interesting is that "a subset" which only applies to "some locations on the Globe" is in fact "the entire set of data"? Am I to understand, then, that HADCRUT3, which is supposed to represent the temperatures for the entire Globe, only represents those of some locations on the Globe, and that inaccurately?

The question arises, which locations? What parts of the Globe are left out and not represented? Why is this? If they were represented how would that affect the temperatures?

@Per: all Ilya and I have done is try to reproduce the CRUTEM3 station errors from the data published by the Met Office using the method in Brohan et al. 2006. I'm not sure how or why I would use metadata.

@Richard: you make it sound like it's a revelation that the global temperature is derived from sampling. But that's no surprise given that there are a finite number of thermometers and the number of them has changed with time. To deal with the 'subset' problem Brohan et al. 2006 uses sampling of another set of temperatures (the NCEP/NCAR set) to come up with an error range for the true temperatures.

I'm slightly confused by your examples - isn't it correct that more samples in a month/grid square would give a smaller error value? Or are you saying that in this case the standard deviation will be smaller, so the error needs to be corrected to match the smaller samples where N=30?

@Sean: You'd expect the error to be smaller with more samples in a grid square, but because of the division by the number of stations it's actually getting larger.

Your post is a bit confusing in that you use the word "correct" in several places to mean "matches the published Met number".

Or at least that's my understanding of your post. I think this is what is confusing others such as Sean Houlinhane.

A slight rewrite would clarify things a lot. I suggest using terms like "Met", "Brohan 2006 method" and "what I believe to be the correct method" in various spots to clarify what you are saying.

Yes, it does appear to be a major error. Good work!

Cheers

John:

I'm confused why the standard error of the mean (i.e. "normal error") should be included at all as a source of error. These samples that are used to calculate the standard deviation for the month are not pulled from the same population of temperatures. They are pulled from separate days. Why should this error be added at all? I'm not saying there isn't a good reason, but, I don't see it.

Excellent work.

Let me suggest this. I suggest you visit several blogs and post a comment. Try some of the pro global warming sites.

Just a nice polite note

I suggest "in it for the gold", tamino's open mind, eli rabbetts blogspot.
