As I've worked through Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850 to reproduce the work done by the Met Office, I've come up against something I don't understand. I've written to the Met Office about it, but until I get a reply this blog post asks for opinions from any of my dear readers.

In section 6.1 Brohan et al. talk about the problem of coverage bias. If you read this blog post you'll see that in the 1800s there weren't many temperature stations operating and so only a small fraction of the Earth's surface was being observed. There was a very big jump in the number of stations operating in the 1950s.

That means that when using data to estimate the global (or hemispheric) temperature anomaly you need to take into account some error based on how well a small number of stations act as a proxy for the actual temperature over the whole globe. I'm calling this the coverage bias.

To estimate that, Brohan et al. use the NCEP/NCAR 40-Year Reanalysis Project data to get an estimate of the error for the group of stations operating in any given year. Using that data it's possible to calculate, year by year, the mean error caused by limited coverage and its standard deviation (assuming a normal distribution).
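The sub-sampling idea can be sketched as follows. Everything here is a stand-in for illustration: random numbers play the role of the reanalysis field, and a random mask plays the role of the station coverage (the ~5% figure is an assumption, not the paper's number). The point is just the mechanics: compare the area-weighted mean over the observed cells with the mean over all cells, year by year.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 'field' is a (year, lat, lon) grid of
# reanalysis anomalies and 'observed' marks the grid cells that had
# stations in a given historical year (~5% coverage, as in the 1800s).
field = rng.normal(0.0, 1.0, size=(50, 36, 72))
observed = rng.random((36, 72)) < 0.05

# Area weights: grid cells shrink with the cosine of latitude.
lats = np.deg2rad(np.linspace(-87.5, 87.5, 36))
weights = np.repeat(np.cos(lats)[:, None], 72, axis=1)

def weighted_mean(values, w):
    return (values * w).sum() / w.sum()

# For each reanalysis year, the coverage error is the difference
# between the sub-sampled mean and the true (full-coverage) mean.
errors = np.array([
    weighted_mean(year[observed], weights[observed]) - weighted_mean(year, weights)
    for year in field
])
print(errors.mean(), errors.std())
```

The mean and standard deviation of `errors` are exactly the two quantities discussed above: the systematic bias from limited coverage, and its spread.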

I've now done the same analysis and I have two problems:

1. I get a much wider error range for the 1800s than is seen in the paper.

2. I don't understand why the mean error isn't taken into account.

Note that in the rest of this entry I am using smoothed data as described by the Met Office here. I am applying the same 21 point filter to the data to smooth it. My data starts at 1860 because the first 10 years are being used to 'prime' the filter. I extend the data as described on that page.
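For anyone following along at home, here's a minimal sketch of how such a 21-point smoothing filter could work. This is my reading of the Met Office description (a binomial filter, with the series extended at each end), so the exact weights and padding are assumptions:

```python
import numpy as np
from math import comb

# 21-point binomial filter: weights from row 20 of Pascal's triangle,
# normalised to sum to 1.
weights = np.array([comb(20, k) for k in range(21)], dtype=float)
weights /= weights.sum()

def smooth(series):
    # Extend each end by repeating the end value 10 times (one
    # half-width of the filter), then convolve; 'valid' mode returns
    # exactly one smoothed value per original point.
    padded = np.concatenate([np.full(10, series[0]),
                             series,
                             np.full(10, series[-1])])
    return np.convolve(padded, weights, mode="valid")

x = np.ones(30)
print(smooth(x)[:5])  # a constant series passes through unchanged
```

Because the weights sum to 1, a constant input comes out unchanged, which is a handy sanity check on any smoothing code.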

First here's the smooth trend line for the northern hemisphere temperature anomaly derived from the Met Office data as I have done in other blog posts and without taking into account the coverage bias.

And here's the chart showing the number of stations reporting temperatures by year (again this is smoothed using the same process).

Just looking at that chart you can see that there were very few stations reporting temperature in the mid-1800s and so you'd expect a large error when trying to extrapolate to the entire northern hemisphere.

This chart shows the number of stations by year (as in the previous chart) as the green line, and the mean error because of the coverage bias as the red line. For example, in 1860 the coverage bias error is just under 0.4C (meaning that if you use the 1860 stations to estimate the northern hemisphere anomaly you'll be too hot by about 0.4C). You can see that as the number of stations increases and global coverage improves, the error drops.

And more interesting still is the coverage bias error with error bars showing one standard deviation. As you might expect the error is much greater when there are fewer stations and settles down as the number increases. With lots of stations you get a mean error near 0 with very little variation: i.e. it's a good sample.

Now, to put all this together I take the mean coverage bias error for each year and use it to adjust the values from the Met Office data. This causes a small downward change which emphasizes that warming appears to have started around 1900. The adjusted data is the green line.

Now if you plot just the adjusted data but put back in the error bars (and this time the error bars are 1.96 standard deviations since the published literature uses a 95% confidence) you get the following picture:

And now I'm worried because something's wrong, or at least something's different.

1. The published paper on HadCRUT3 doesn't show error bars anything like this for the 1800s. In fact the picture (below) shows almost no difference in the error range (green area) when the coverage is very, very small.

2. The paper doesn't talk about adjusting using the mean.

So I think there are two possibilities:

A. There's an error in the paper and I've managed to find it. I consider this a remote possibility and I'd be astonished if I'm actually right and the peer reviewed paper is wrong.

B. There's something wrong in my program in calculating the error range from the sub-sampling data.

If I am right and the paper is wrong there's a scary conclusion... take a look at the error bars for 1860 and scan your eyes right to the present day. The current temperature is within the error range for 1860 making it difficult to say that we know that it's hotter today than 150 years ago. The trend is clearly upwards but the limited coverage appears to say that we can't be sure.

So, dear readers, is there someone else out there who can double check my work? Go do the sub-sampling yourself and see if you can reproduce the published data. Read the paper and tell me the error of my ways.

UPDATE It suddenly occurred to me that the adjustment that they are probably using isn't the standard deviation but the standard error. I'll need to rerun the numbers to see what the shape looks like, but it should reduce the error bounds a lot.
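The difference matters a lot numerically. A toy calculation (the numbers are made up for illustration) shows why swapping the standard deviation for the standard error shrinks the error bounds so much:

```python
import math

def std_error(sd, n):
    # Standard error of the mean: the spread of the sampling
    # distribution shrinks with the square root of the sample size.
    return sd / math.sqrt(n)

# A hypothetical 0.4C standard deviation across 100 samples gives a
# standard error ten times smaller.
sd = 0.4
print(std_error(sd, 100))  # 0.04
```

So with even a modest number of samples the 1.96-sigma bars collapse dramatically, which matches the change in the graph's shape.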

UPDATE Here's what the last graph looks like if I swap out the standard deviation for the standard error.

That's more like it. I'm going to guess that this is what Brohan et al. are doing (without saying it explicitly). But that doesn't explain why their error seems to remain constant. Can anyone help with that?

UPDATE The Met Office has replied to my email with an explanation of what's going on with the mean and standard deviation and I'll post it shortly.

UPDATE Please read this post which shows that my code contained an error in interpreting longitude which results in a chart that looks like the one from the Met Office.

## 21 comments:

In the 1800s temperature measurement was indeed an uncertain affair. The standardized Stevenson screen for thermometer exposure did not come along until a couple of years after the creation of the U.S. Weather Bureau by an act of Congress in 1890, signed into law by President Benjamin Harrison.

Deployment worldwide of the new standard screen took up to 10 years.

Prior to this, thermometer exposure was a haphazard affair, with some being in screens, some placed on the north side of buildings, some in tree shade, and some in various locally created shelter designs. The magnitude of error of exposure often exceeded the resolution of the thermometer.

So yes, during the 1800s, uncertainty was high.

Another issue I have seen discussed is the distribution of stations with latitude vs time and how that is addressed in the averaging.

It may be that the errors are computed by grid cell, then added. I.e. if you have a 1 C standard deviation in two grid cells and 1000 thermometers, the total standard error will depend on the exact proportion of thermometers in each cell. If the cells are uncorrelated the errors should add in quadrature; if not, there will be a cross-correlation term.

SE_total = (SE1^2/N1 + SE2^2/N2)^0.5
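A direct transcription of that formula (taken as given, without averaging weights) shows how the split of the thermometers between the cells changes the combined error:

```python
import math

def combined_se(sds, ns):
    # Uncorrelated per-cell errors add in quadrature:
    # SE_total = sqrt(sd1^2/n1 + sd2^2/n2 + ...)
    return math.sqrt(sum(sd * sd / n for sd, n in zip(sds, ns)))

# Two cells with a 1 C standard deviation each and 1000 thermometers:
# an even split gives a much smaller combined error than a lopsided one.
print(combined_se([1.0, 1.0], [500, 500]))  # ~0.063
print(combined_se([1.0, 1.0], [990, 10]))   # ~0.318
```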

@David L. That's addressed in two ways in the paper: first by gridding and second by taking into account the number of stations in each grid square.

I'd urge anyone who's really interested in this topic to actually read the linked paper because it goes into a lot of detail about how this stuff is done.

It is simply nonsense to use standard error as a means of gauging uncertainty when talking about historical data.

1. The observed numbers are the observed numbers. There is no random selection out of the set of all possible temperatures at any given point in time.

2. The observed locations are the observed locations. Locations where temperature was measured were not chosen randomly and it is not possible to now randomly select another set of locations to observe.

In conclusion, it is simply nonsense to use standard error (i.e. the estimate of the standard deviation of the sampling distribution of mean temperature) to gauge uncertainty. One has to use the standard deviation of temperatures, at a minimum, while keeping in mind that locations were not selected randomly. We simply do not know what we do not know, so the actual uncertainty is even greater than what the standard deviation indicates.

Unless CRU has released their unadjusted data, you're wasting your time with the homogenized stuff. I understand the value of replicating scientific work, and focusing on one step at a time.

A nice repository of raw data (land only) is ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily, which I'm currently processing. Feel free to drop by and leave a comment or suggestion. I understand the inherent limitations of a simple average, but it does come in handy as a quick initial look at your data. An average also provides a baseline against which any subsequent analysis can be compared to catch programming/logic errors.

John,

Don't be so hasty to dismiss your results. After all, your resulting 95% interval graph is the answer that is more intuitive to anyone who works with a changing number of data sets. It only makes sense that you can know a "global" temperature with less confidence if you have fewer, less spread out instruments across the globe than, in the limit, an infinitely large global network of instruments. That the peer-reviewed report doesn't have larger confidence intervals in the past only adds to questions about its validity.

John, read your post via WUWT and thought you might be interested in a brief analysis I did some time back.

Basically, I modelled temperature as a function of the station numbers (coverage is similar, but may well give somewhat different results; I’ve not tried). It is then possible to remove the effect of the station number variation from the temperature record. Then I compared the result against satellite records for the overlap period using linear fits and the corrected version fell between the two satellite measurements.

Could it be coincidence? It seems unlikely.

My notes are here: http://www.trevoole.co.uk/Questioning_Climate/userfiles/Influence_of_Station_Numbers_on_Temperature_v2.pdf

Hi,

You might want to hit up RomanM at his blog. Have you posted up your code?

hey john, head over here

http://statpad.wordpress.com/

RomanM, retired stats prof, regular at CA. he might be able to help out

Don't laugh at using a picture of a book cover, but: http://bishophill.squarespace.com/storage/Hockey%20Stick%20Jacket3.png?__SQUARESPACE_CACHEVERSION=1260826454141

The error bars do seem to be of a magnitude that puts modern temperatures just within the 95% confidence interval of 1800s temps.

Maybe you are doing it right and they started with what they wanted to see and made the data fit? thank heavens for the internet, the CRU would have gotten away with the lies without it.

Fact is, measurement uncertainty, sampling effects and the like plague all measurements and the statistical analysis that follows. In the 1800s the statistical theories you're talking about were unknown to pretty much everyone, and nobody had any clear systematic framework or protocols for measurement; everyone did their own thing. (Fisher didn't publish his book until 1925... though I'd say it's hard to tell whether that made things better or worse.)

Estimating the error bars for measurements made in the 1800s is, for practical purposes, impossible. Only for recent measurements could a sensible debate be had - and we don't seem to be able to even get there.

The whole thing's a crock - and that's even if you ignore the elephant in the room - the questionable nature of the choosing of statistical methods on a population of data of which we still don't in any real sense understand the dimensions of.

Looking forward to the cru response

Sometimes you get to the point where anecdotal evidence seems more accurate. The bees, flowers, trees etc. live and die by temperature fluctuations.

The following work I did some time ago might help you visualize how much data was available in the 1800s:

http://www.unur.com/climate/ghcn-v2/

http://www.unur.com/climate/hadcrut3-1850-2007.html

It would seem that estimating a global temperature from incomplete coverage data would produce extremely large error bars. However, it should be possible to estimate a fairly accurate "temperature anomaly" even with relatively poor surface coverage.

If the original measurements were done in a consistent manner (regardless of whether they accurately reflect the true local temperature) you should be able to accurately detect a change (anomaly) from the mean at each measurement station.

If the Met graph is showing temperature anomaly then you would expect the error range (or the standard deviation) to be rather small.

In contrast, any attempt to use ancient measurements (let alone proxies) to estimate global average temperature (not temperature anomaly) would necessarily produce huge error bars. In addition to the need for representative surface coverage, much would depend upon the accuracy of the original measurements (not to mention the effectiveness of the "adjustments" applied to the original data).
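A quick sketch of that point, using made-up station records: a consistent instrument bias affects the absolute temperatures but drops out of the anomaly entirely.

```python
import numpy as np

# Hypothetical station records (deg C). Station B has the same climate
# as station A but a thermometer with a consistent +2 C warm bias.
station_a = np.array([10.0, 10.5, 11.0, 11.5])
station_b = station_a + 2.0

# Anomalies: each station's deviation from its own mean. The constant
# bias cancels, so the two anomaly series agree exactly.
anom_a = station_a - station_a.mean()
anom_b = station_b - station_b.mean()
print(np.allclose(anom_a, anom_b))  # True
```

This is why anomaly series can be more trustworthy than absolute temperatures even when the individual instruments were poorly calibrated, so long as each was poorly calibrated in a consistent way.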

JGC,

I have some posts at my place on data homogenization that might be useful to you. They are about how to properly account for uncertainty.

http://wmbriggs.com

Of what import is it, in the 3rd chart, that the coverage bias error is now 0.2C, vs -0.15C in the mid-to-late 1950s when the number of met stations was at its peak?

One would think that the coverage bias error being half what it was in 1860 would not be a good thing. Am I wrong?

I guess the small standard deviation may argue that the coverage bias error isn't important, but I don't know.

The issue of errors has been troubling me since I read JGC's blog and I'm still not sure I have a firm enough grip on it. My inclination is to say that we should be quoting the standard deviation and not the standard error when computing the global average. It has been argued that since one is computing a mean from a set of grid cell temperatures (or anomalies) one should use the standard error. However, these grid cell temperatures are not all measurements of the same grid cell. They are measurements of different grid cells that exhibit a range of different temperatures. Accordingly we should propagate the standard deviation associated with each grid cell through the summation and then divide by the number of grid cells. This will give us the standard deviation of the 'global average temperature'. It tells us we are 95% (or whatever interval one cares to use) confident that the estimate we have come up with lies within (say) 2 sigma of the true global temperature.

For us to use the standard error we would have to take n independent estimates of the global mean temperature, average them and divide the standard deviation of these estimates by sqrt(n). I don't think this is done with the global average temperature measurements. Therefore I conclude that the appropriate measure is the standard deviation and not the standard error.

If my conclusion is correct then our estimate of the global temperature in 1880 is not significantly different from our estimate of the 2008 global temperature.

What we need is a statistician to review the Brohan paper. Others also make some very relevant points about sampling distributions not being random that need to be considered. There is also the issue of propagating errors through to cells with missing values etc. I think there are something like 2592 5 x 5 degree cells yet in 1880 there were less than 200 stations reporting temperature. No doubt these were not randomly distributed either.

@Paul Dennis

I agree. The standard error doesn't seem right since there's no random sampling going on.
