Monday, November 30, 2009

The 'very artificial correction' flap looks like much ado about nothing to me

The other day I posted that some code from the CRU Hack that people were shouting about was actually dead code. It was subsequently pointed out to me that there was another version in which the code was not dead.

So, I figured I'd have a look at it and see what was going on and try to read the scientific papers behind this. I've come to the conclusion "move along, there's nothing to see here".

Firstly, here's couple of segments of the code that does the 'artificial correction':
2.5,2.6,2.6,2.6,2.6,2.6]*0.75 ; fudge factor
if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!'

Since I don't have the datasets that underlie this code I can't actually execute it, but I can follow it. Here's a line-by-line explanation of what's happening:

yrloc=[1400,findgen(19)*5.+1904]: this creates a list of numbers which are stored in yrloc. The list starts with the number 1400 and then consists of 1904, 1909, 1914, 1919, etc. all the way up to 1994. findgen(19) creates the list 0, 1, 2, up to 18. The *5. multiplies these numbers by 5 to obtain the list 0, 5, 10, up to 90. The +1904 adds 1904 to them to get the list 1904, 1909, 1914 up to 1994.

So the final value of yrloc is [1400, 1904, 1909, 1914, 1919, 1924, 1929, 1934, 1939, 1944, 1949, 1954, 1959, 1964, 1969, 1974, 1979, 1984, 1989, 1994]. The square brackets around the numbers make it into a list (which is called a vector in this computer language).

valadj=[0., 0., 0., 0., 0., -0.1, -0.25, -0.3 ,0. ,-0.1 ,0.3 ,0.8 , 1.2, 1.7, 2.5, 2.6, 2.6, 2.6, 2.6, 2.6]*0.75 creates a list called valadj where each element of the list in square brackets is multipled by 0.75 (that's the *0.75 bit).

So the final value stored in valadj is [0, 0, 0, 0, 0, -0.075, -0.1875, -0.225, 0, -0.075, 0.225, 0.6, 0.9, 1.275, 1.875, 1.95, 1.95, 1.95, 1.95, 1.95]

if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!' This is checking that yrloc and valadj have the same number of elements (i.e. the two lists are the same length). This is important because the rest of the code relies on lining these two lists up and treating them as pairs.

For example, the 1400 in yrloc corresponds to the first 0 in valadj and 1994 in yrloc corresponds to the final 1.95 in valadj. Here's a simple plot of the data with yrloc on the horizontal (x-axis) and valadj:

Now it gets a little bit more complicated.

yearlyadj=interpol(valadj,yrloc,x). The interpol function is doing a linear interpolation of the data in valadj and yrloc and it uses that to look up the adjustment necessary for the years that are mention in x. You'll have to go read the original code to see where x comes from but basically there's code to read maximum latewood density (MXD) values from a file and transform that into two lists: x (which contains the years) and densall. As above these have the same number of entries and there's a correspondence between the entries similar to the yrloc and valadj explained above.

What the interpol function is doing is saying given the linear interpolation of the datapoints we know about (the 'aritificial correction') make a list of the corrections necessary for the years in x. It works like this; here's a graph of the linear interpolation from 1400 to 1994.

Now, suppose that x contains data for 1653. To find the adjustment necessary the program essentially looks at the graph above, finds 1653 on the x-axis and then the corresponding value on the y-axis (which, in this case, is 0 since there's no adjustment between 1400 and 1924). Or if it needs a value for 1992 it finds 1992 on the x-axis and find the corresponding point on the y-axis (which in this case will be between the datapoints for 1989 and 1992 along the line that joins them: in this case it will find an adjustment of 1.95 since the graph is flat at that point. Final example, suppose x contains data for the year 1975. Follow the x-axis to 1965 and then up to the line between the datapoints at 1964 and 1969. Here the line is a slope and the value at the intersection is 1.395.

So the program ends up with yearlyadj filled with the adjustments necessary for the data in densall corresponding to the years in x. Finally, the program makes the adjustment with the line densall=densall+yearlyadj. On a year-by-year basis the values in yearlyadj are added to the values in densall. The first value of yearlyadj is added to the first value of densall, the second value of yearlyadj is added to the second value of densall and so on.

Now this program is dated at September 1998 by Ian Harris at CRU. And if we look there's a paper by Harris (with others) published in Philosophical Transactions of the Royal Society B. The paper is Trees tell of past climates: but are they speaking less clearly today?. The entire paper is about how the record in tree-rings doesn't match up in recent decades with actual temperatures, but it did in the past.

There's a nice picture of this in the paper:

Ignore the dotted line. The thin line shows that average summer temperatures, the thick line the predicted temperature from the maximum latewood density. They are going along fine until around 1930 when the actual temperature starts to exceed the predicted temperature and the gap gets (mostly) greater and greater. Real divergence occurs in the mid-1960s.

Now pop back to my first graph. There's a rough correspondence between the correction being made and the gap in CRU's graph. It's certainly not perfect but it looks to me like what's happening here is a scientist trying to make two datasets correspond to each other starting with a rough estimate of what it would take to make them work correctly.

And the paper in fact cautions that such divergence is a cause for concern about relying on the historical tree-ring record until we understand why there is such divergence. To quote the paper:
Long-term alteration in the response of tree growth to climate forcing must, at least to some extent, negate the underlying assumption of uniformitarianism which underlies the use of twentieth century- derived tree growth climate equations for retrodiction of earlier climates.

And the conclusion calls for understanding what's happening here.

So, if there's a smoking gun it was published in Philosophical Transactions of the Royal Society B where the authors point out clearly their concerns about the data and the need to understand why. In doing so they are bound to want to remove the divergence by understanding what happens. A first start is a stab at that with this 'artificial correction'.

In later years, they went looking for the divergence using principal component analysis. this can be seen in this file and this file). Their code tells you what it's doing:
; Reads in site-by-site MXD and temperature series in
; 5 yr blocks, all correctly normalised etc. Rotated PCA
; is performed to obtain the 'decline' signal!

So, they appear to have replaced their hacked-up adjustment above with actual analysis to try to understand what part of the MXD data is caused by some unknown 'decline' causing the difference. The 1998 paper speculates on what might be causing this difference, but the PCA is done just to find it statistically without knowing why.

So, given that I'm a total climate change newbie and haven't been involved in what looks like a great deal of political back and forth I'm going to take this with a grain of salt and say this looks OK to me and not like a conspiracy.

But all the talk of not releasing data and hiding from FOIA requests does make me uneasy. Science is better when other scientists can reproduce what you are doing, just as I once did.


If you enjoyed this blog post, you might enjoy my travel book for people interested in science and technology: The Geek Atlas. Signed copies of The Geek Atlas are available.


<$BlogCommentDateTime$> <$BlogCommentDeleteIcon$>

Post a Comment

Links to this post:

<$BlogBacklinkControl$> <$BlogBacklinkTitle$> <$BlogBacklinkDeleteIcon$>
Create a Link

<< Home