Ilya went on to say that he was having a hard time reproducing the Met Office's numbers. I encouraged him to write a blog post with an example. He did that (and it looks like he had to create a blog to do it). Sitting in the departures lounge at SFO I read through his blog post and Brohan et al.. Ilya's reasoning seemed sound, his example was clear and I checked his underlying data against that given by the Met Office.
The trouble was Ilya's numbers didn't match the Met Office's. And his numbers weren't off by a constant factor or constant difference. They followed a similar pattern to the Met Office's, but they were not correct. At first I assumed Ilya was wrong and so I checked and double checked has calculations. His calculations looked right; the Met Office numbers looked wrong.
Then I wrote out the mathematics from the Brohan et al. paper and looked for where the error could be. And I found the source. I quickly emailed Ilya and boarded the plane to dream of CRUTEM and HadCRUT as I tried to sleep upright.
The station error consists of three components: the measurement error, the homogenisation error and the normal error. The first two are estimated in the paper as 0.03°C and 0.4°C respectively. The normal error is calculated from the standard deviation information in the station files.
The formula for the normal error for a single month, i, is as follows:
Unfortunately, the paper uses rather sloppy mathematical language because the N on the left is not the N on the right, the subscript i isn't defined, and so I am going to express this a bit more clearly as follows:
This means that normal error for month i is the standard deviation for month i (that's σi) divided by the square root of the number of years used to generate the normal values in the station files (which I call mi). Typically we have:
because 30 years of data from 1961 to 1990 are used. In cases where less than 30 years are available (because of missing data) then a number less than 30 is used.
Now to get the station error, εi, the three error components are joined together by quadrature as follows:
That works for any grid square where for any month there's just a single station reporting a temperature, but in general there are more. So when there are many station errors they are averaged using a root mean square and then divided by the square root of the number of stations.
Suppose there are n stations each with a station error εi,j (to which I've added the subscript j to differentiate them) then the final station error for a month i is as follows:
Return to narrative
What Ilya had discovered was that the formula above (from the paper) works only when there is a single station in a grid square. When there were two or more it failed; that's when he approached me asking for help.
What I discovered at the airport was that if you replaced the number 30 with 15 the formula worked and the values for station errors for grid squares containing exactly two stations were now correct.
Both Ilya and I came to the same conclusion that in fact the number 15 wasn't picked from thin air, but in fact was 30 divided by 2 (the number of stations in the grid square). We both tested this hypothesis on squares with more than two stations and found that it worked.
So it appears that the normal error used as part of the calculation of the station error is being scaled by the number of stations in the grid square. This leads to an odd situation that Ilya noted: the more stations in a square the worse the error range. That's counterintuitive, you'd expect the more observations the better estimate you'd have.
Ilya had shown me an example in 1947, but I didn't want to take his word for it (although he later showed me a program to check all the stations errors so I should have believed him), and so I took a look at three locations in January 1850. For these three locations all the data underlying CRUTEM3 had been released:
1. The grid square which consists of the single station 723660: this corresponds to the grid square with corner 35N, 105W. Here the Met Office data gives station errors of: 0.5072 0.5424 0.4857 0.4962 -1e+30 -1e+30 0.4407 0.4407 -1e+30 0.4756 -1e+30 0.5186. The strange negative numbers are missing data (it's missing because in the underlying file there are no normals for 1850 in those months, although the actual normals aren't needed for the station error calculation so it doesn't matter). Using the formula from the paper give the correct answer: 0.5072 0.5424 0.4857 0.4962 0.4486 0.4756 0.4407 0.4407 0.4661 0.4756 0.5072 0.5186. This makes sense since our correction value of 1 for 1 station in the square doesn't change the formula.
There is, however, something else wrong with this. The paper says that if less than 30 years of data are available the number mi should be set to the number of years. In 723660 there are only 17 years of data, so this station error appears to have been incorrectly calculated based on 30 years.
2. The grid square which consists of the two stations 753041, 756439: this corresponds to the grid square with corner 35N, 80W. Here the Met Office data gives station errors of: 0.6168 0.569 0.5452 0.4008 0.4345 0.3642 0.3373 0.353 0.3881 0.4624 0.4076 0.5767 and using the formula from the paper (without our correction): 0.4801 0.4496 0.4346 0.3472 0.3669 0.3264 0.3116 0.3202 0.3399 0.3836 0.3511 0.4545. If a correction of 2 is used so that each σi is divided by the square root of 15 instead of 30 the correct values are generated.
3. The grid square which consists of the four stations 720388, 724080, 756192, 756490: this corresponds to the grid square with corner 35N, 75W. Here the Met Office data gives station errors of: 0.5073 0.4409 0.4329 0.3361 0.3286 0.2905 0.2712 0.2807 0.2973 0.3739 0.3325 0.4613 and using the formula from the paper (without our correction): 0.3074 0.2807 0.2775 0.2417 0.2391 0.2264 0.2204 0.2233 0.2286 0.2552 0.2404 0.2887. If a correction of 4 is used so that each σi is divided by the square root of 7.5 instead of 30 the correct values are generated.
I have no idea why the correction given in this blog post by Ilya and I works: perhaps it indicates a genuine bug in the software used to generate CRUTEM3, perhaps it means Ilya and I have failed to understand something, or perhaps it indicates a missing explanation from Brohan et al. I also don't understand why when there are less than 30 years of data the number 30 appears to still be used.
If these are bugs then it indicates that CRUTEM3 will need to be reissued because the error ranges will be all wrong.
I've emailed the Met Office asking them to help. If you see an error in our working please let us know!