Thursday, February 25, 2010

Something a bit confusing from UEA/CRU

UEA and CRU have issued a document that they have submitted to the Parliamentary Select Committee on Science and Technology who are looking into the taking of email and documents from CRU. The document can be found here.

In it there are two interesting paragraphs concerning software:

3.4.7 CRU has been accused of the effective, if not deliberate, falsification of findings through deployment of “substandard” computer programs and documentation. But the criticized computer programs were not used to produce CRUTEM3 data, nor were they written for third-party users. They were written for/by researchers who understand their limitations and who inspect intermediate results to identify and solve errors.

3.4.8 The different computer program used to produce the CRUTEM3 dataset has now been released by the MOHC with the support of CRU.

It's 3.4.8 that's surprising. I assume that they are referring to the code released by the Met Office on this page (MOHC = Met Office Hadley Centre). On that page they say (my emphasis):

station_gridder.perl takes the station data files and makes gridded fields in the same way as used in CRUTEM3. The gridded fields are output in the ASCII format used for distributing CRUTEM3.

My reading of "in the same way as" has always been that this code is not the actual code that they used for CRUTEM3 but something written to operate in the same manner. In which case 3.4.8 is either incorrect, or referring to some other code that I can't lay my hands on.

Has anyone seen any other CRUTEM3 code released by the Met Office?

More information

Looking into this a bit further there's a description of the CRUTEM3 data format on the CRU site here. Here's what it says:

for year = 1850 to endyear
for month = 1 to 12 (or less in endyear)
format(2i6) year, month
for row = 1 to 36 (85-90N,80-85N,75-70N,...75-80S,80-85S,85-90S)
format(72(e10.3,1x)) 180W-175W,175W-170W,...,175-180E

The interesting thing there is the format command. That is an IDL command (and not a Perl command). The first one pads the year and month to 6 characters each; the second outputs a row of 72 values, each 10 characters wide in exponent format with three digits after the decimal point (the 1x gives a single space of separation).

The other oddness is that the NetCDF files that are available for download were not produced by Perl, they were produced by XConv (specifically, version 1.90 on Mon Feb 22 18:26:48 GMT 2010). And I've tested XConv and it can't read the output of the Perl program supplied by the Met Office.

It's not definitive, but all that points to the Perl programs released by the Met Office not being the actual programs used to produce CRUTEM3. Which leads me back to my original question: has anyone seen any other CRUTEM3 code released by the Met Office?

PS I think the Perl code released by the Met Office was likely written by Philip Brohan (he's the lead author on the CRUTEM3 paper); the style is very, very similar to this code. Given that he's written a lot of Perl code, perhaps I'm simply wrong and the Perl code released by the Met Office is the actual CRUTEM3 generating code.

Update Confusion cleared up by Phil Jones of CRU talking to the Parliamentary committee. He stated that CRU has not released their code for generating CRUTEM3 because it is written in Fortran. The code released by the Met Office (the Perl code) is their version that produces the same result.

Here's the relevant exchange (my transcript):

Graham Stringer MP: So have you now released the code, the actual code used for CRUTEM3?

Professor Jones: Uh, the Met Office has. They have released their version.

Stringer: Well, have you released your version?

Jones: We haven't released our version. But it produces exactly the same result.

Stringer: So you haven't released your version?

Jones: We haven't released our version, but I can assure you...

Stringer: But it's different.

Jones: It's different because the Met Office version is written in a computer language called Perl and they wrote it independently of us and ours is written in Fortran.

It's worth noting that above I said that the format command is present in IDL; it's also present in Fortran, which jibes with Professor Jones' statement.

Later the same day Graham Stringer asked a panel about scientific software and here's part of the response from Professor Julia Slingo representing the Met Office:

Slingo: I mean, around the UEA issue, of course, we did put the code out. Um, at Christmas time. Before Christmas, to, along with the data. Because, we, I felt very strongly that we needed to have the code out there so that it could be checked.

(The rest of her answer doesn't concern CRUTEM3. It was a discussion of code used for climate modeling; I'm going to ignore what she said as it seems to have little bearing on the code I've looked at).

Wednesday, February 24, 2010

The station errors in CRUTEM3 and HadCRUT3 are incorrect

I'm told by a BBC journalist that the Met Office has said through its press office that the errors pointed out by Ilya Goz and me have been confirmed. The station errors are being calculated incorrectly (almost certainly because of a bug in the software), and the Met Office is rechecking all the error data.

I haven't heard directly from the Met Office yet; apparently the Met Office is waiting to write to me when they have rechecked their entire dataset.

The outcome is likely to be a small reduction in the error bars surrounding the temperature trend. The trend itself should stay the same, but the uncertainty about the trend will be slightly less.

Tuesday, February 16, 2010

The magic of sub-editors

In the print version of my Times article today there's been significant cutting to get it to fit into the space available. This is the magic work of sub-editors.

Here's the full text of the article with the words that remained in the sub-edited version (which appeared in the paper):

The history of science is filled with stories of amateur scientists who made significant contributions. In 1937 the American amateur astronomer Grote Reber built a pioneering dish-shaped radio telescope in his back garden and produced the first radio map of the sky. And in the 19th century the existence of dominant and recessive genes was described by a priest, Gregor Mendel, after years of experimentation with pea plants.

But with the advent of powerful home computers, even the humble amateur like myself can make a contribution.

Using my laptop and my knowledge of computer programming I accidentally uncovered errors in temperature data released by the Met Office that form part of the vital records used to show that the climate is changing. Although the errors don’t change the basic message of global warming, they do illustrate how open access to data means that many hands make light work of replicating and checking the work of professional scientists.

After e-mails and documents were taken from the Climatic Research Unit at the University of East Anglia late last year, the Met Office decided to release global thermometer readings stretching back to 1850 that they use to show the rise in land temperatures. These records hadn’t been freely available to the public before, although graphs drawn using them had.

Apart from seeing Al Gore’s film An Inconvenient Truth I’d paid little attention to the science of global warming until the e-mail leaks from UEA last year.

I trusted the news stories about the work of the IPCC, but I thought it would be a fun hobby project to write a program to read the Met Office records on global temperature readings and draw the sort of graphs (a graph) that show(ing) how it’s hotter now than ever before.

Since my training is in mathematics and computing I thought it best to write self-checking code: I’m unfamiliar with the science of climate change(climate science) and so having my program perform internal checks for consistency was vital to making sure I didn’t make a mistake.

To my surprise the program complained about average temperatures in Australia and New Zealand. At first I assumed I’d made a mistake in the code and used (having checked the results with) a pocket calculator to double check the calculations.

The result was unequivocal: something was wrong with the average temperature data in Oceania. And I also stumbled upon other small errors in calculations.

About a week after I’d told the Met Office about these problems I received a response confirming that I was correct: a problem in the process of updating Met Office records had caused the wrong average temperatures to be reported. Last month the Met Office updated their public temperature records to include my corrections.

Monday, February 15, 2010

The Times writes up my Met Office discoveries

Here's a major newspaper writing about what I found in the Met Office data:

A science blogger has uncovered a catalogue of errors in Met Office records that form a central part of the scientific evidence for global warming.

The mistakes, which led to the data from a large number of weather stations being discarded or misused, had been overlooked by professional scientists and were only discovered when the Met Office’s Hadley Centre made data publicly available in December after the “climategate” e-mail row.

Thanks, Hannah Devlin.

And here's the bit I wrote:

The history of science is filled with stories of amateur scientists who made significant contributions. In 1937 the American amateur astronomer Grote Reber built a pioneering dish-shaped radio telescope in his back garden and produced the first radio map of the sky. And in the 19th century the existence of dominant and recessive genes was described by a priest, Gregor Mendel, after years of experimentation with pea plants.

But with the advent of powerful home computers, even the humble amateur like myself can make a contribution.

Update And now the story has been picked up by Nature. And Fox News.

Climate Change Skepticism: You're doing it wrong

The following is a popular picture used by climate change skeptics to attempt to show that there's something seriously wrong with the surface temperature record which is used to show that the world is getting hotter.

It appears to show that two weather stations with Stevenson screens are situated right at the end of the runway of Rome Ciampino. It's not hard to put two and two together and see that the wash from the engines of departing jets would cause the temperature indicated by the thermometers to be much too high.

Between the two Stevenson screened boxes is an automated weather observation station used by the airport. So it too would be affected by aircraft wash.

Now, any pilot will tell you that knowing the barometric pressure at the airport (as reported by the QFE code) and the local temperature are vital data in setting the altimeter correctly when landing. Aircraft altimeters work on atmospheric pressure and when approaching the airport the pilot is told the current pressure so they can set their instrument correctly.

The temperature also matters because it can affect the altimeter reading when the weather is cold. Pilots need to know both accurately to land safely. So does it seem likely that at Rome airport the weather station is heated by aircraft wash?

Of course not, and if you zoom out from that picture and orient it North/South (rather than South/North) you'll see a different picture:

You can make out the stations near the top left-middle of the picture. They are far from the runway and positioned near an aircraft parking area.

If you are going to be a skeptic go with the Wikipedia definition: A scientific (or empirical) skeptic is one who questions the reliability of certain kinds of claims by subjecting them to a systematic investigation. One picture does not a systematic investigation make.

Update That picture was alluded to in a Sunday Times article this weekend:

Watts has also found examples overseas, such as the weather station at Rome airport, which catches the hot exhaust fumes emitted by taxiing jets.

And The Daily Telegraph has a similar story:

A weather station at Rome airport was found to catch the hot exhaust fumes emitted by taxiing jets.

That statement is inaccurate. The weather station is not near a taxiway, it's near a parking area. And not even a parking area next to the terminal building.

And even if it was, one bad thermometer doesn't mean climate change can be thrown out the window.

Sunday, February 14, 2010

A bad workman blames his tools

One of the most depressing things about being a programmer is the realization that your time is not entirely spent creating new and exciting programs, but is actually spent eliminating all the problems that you yourself introduced.

This process is called debugging. And on a daily basis every programmer must face the fact that as they write code, they write bugs. And when they find that their code doesn't work, they have to go looking for the problems they created for themselves.

To deal with this problem the computer industry has built up an enormous amount of scar tissue around programs to make sure that they do work. Programmers use continuous integration, unit tests, assertions, static code analysis, memory checkers and debuggers to help prevent and help find bugs. But bugs remain and must be eliminated by human reasoning.

Some programming languages, such as C, are particularly susceptible to certain types of bugs that appear and disappear at random, and once you try figuring out what's causing them they disappear. These are sometimes called heisenbugs because as soon as you go searching for them they vanish.

These bugs can appear in any programming language (and especially when writing multi-threaded code where small changes in timing can uncover or cover race conditions). But in C there's another problem: memory corruption.

Whatever the cause of a bug, the key steps in finding and eliminating it are:

  1. Find the smallest possible test case that tickles the bug. The aim is to find the smallest and fastest way to reproduce the bug reliably. With heisenbugs this can be hard, but even a fast way to reproduce it some percentage of the time is valuable.

  2. Automate that test case. It's best if the test case can be automated so that it can be run again and again. This also means that the test case can become part of your program's test suite once the bug is eliminated. This'll stop it coming back.

  3. Debug until you find the root cause. The root cause is vital. Unless you fully understand why the bug occurred you can't be sure that you've actually fixed it. It's very easy to get fooled with heisenbugs into thinking that you've eliminated them, when all you've done is covered them up.

  4. Fix it and verify using #2.

Yesterday, a post appeared on Hacker News entitled When you see a heisenbug in C, suspect your compiler’s optimizer. This is, simply put, appalling advice.

The compiler you are using is likely used by thousands or hundreds of thousands of people. Your code is likely used by you. Which is more likely to have been shaken out and stabilized?

In fact, it's a sign of a very poor or inexperienced programmer if their first thought on encountering a bug is to blame someone else. It's tempting to blame the compiler, the library, or the operating system. But the best programmers are those who control their ego and are able to face the fact that it's likely their fault.

Of course, bugs in other people's code do exist. There's no doubt that libraries are faulty, operating systems do weird things and compilers do generate odd code. But most of the time it's your fault, the programmer's. And that applies even if the bug appears to be really weird.

Debugging is often a case of banging your head against your own code repeating to yourself all of the impossible things that can't ever happen in your code until one of those impossible things turns out to be possible and you've got the bug.

The linked article contains an example of exactly what not to conclude:

“OK, set your optimizer to -O0,”, I told Jay, “and test. If it fails to segfault, you have an optimizer bug. Walk the optimization level upwards until the bug reproduces, then back off one.”

All you know from changing optimization levels is that optimization changes whether the bug appears or not. That doesn't tell you the optimizer is wrong. You haven't found the root cause of your bug.

Since optimizers perform all sorts of code rearrangement and speed-ups, changing optimization levels is very likely to change the presence or absence of a heisenbug. That doesn't make it the optimizer's fault; it's still almost certainly yours.

Here's a concrete example of a simple C program that contains a bug that appears and disappears when optimization level is changed, and exhibits other odd behavior. First, here's the program:

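#include <stdlib.h>
#include <unistd.h>

int a()
{
  int ar[16];

  /* Deliberate bug: index 20 is past the end of the 16-element array */
  ar[20] = (getpid()%19==0);
}

int main( int argc, char * argv[] )
{
  int rc[16];

  rc[0] = 0;

  a();

  return rc[0];
}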

Build this with gcc under Mac OS X with the following simple Makefile (I saved the program in a file called odd.c):


odd: odd.o

And here's a simple test program to run it 21 times and print the return code each time:

#!/bin/bash

for i in {0..20}
do
  ./odd ; echo -n "$? "
done

If you run that test program you'd expect a string of zeroes, because rc[0] is never set to anything other than zero in the program. Yet here's sample output:

$ ./test
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

If you are an experienced C programmer you'll see how I made that 1 appear (and why it appears at different places), but let's try to debug with a quick printf:

rc[0] = 0;

printf( "[%d]", rc[0] );

a();

Now when you run the test program the bug is gone:

$ ./test
[0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0
[0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0 [0]0

Weird, so you move the printf:

rc[0] = 0;

a();

printf( "[%d]", rc[0] );

and get the same odd result of a disappearing bug. And the same thing happens if you turn the optimizer on even without the printfs (this is the opposite of the situation in the linked article):

$ make CFLAGS=-O3
gcc -O3 -c -o odd.o odd.c
gcc odd.o -o odd
$ ./test
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

This all came about because the function a() allocates a 16 integer array called ar and then promptly writes past the end of it either 1 or 0 depending on whether the PID of the process is divisible by 19 or not. It ends up writing on top of rc[0] because of the arrangement of the stack.

Adding printfs or changing optimization level changes the layout of the code and causes the bad write to not hit rc[0]. But beware! The bug hasn't gone, it's just writing on some other bit of memory.

Because C programs are susceptible to this sort of error it's vital that good tools are used to check for problems. For example, the static code checker splint and the memory analyzer valgrind help eliminate tons of nasty C bugs. And you should build your software with the maximum warning level (I prefer warn-as-error) and eliminate every warning.

Only once you've done all that should you start to suspect someone else's code. And even when you do, you need to follow the same steps to reproduce the bug and get to the root cause. Most of the time, unfortunately, bugs are your fault.

Saturday, February 13, 2010

If you're searching remember your TF-IDF

Some people seem to be very good at searching the web, others seem to be very poor at it. What differentiates them? I think it's unconscious knowledge of something called TF-IDF (or term frequency-inverse document frequency).

If you clicked through to that Wikipedia link you were probably confronted by a bunch of mathematics, and since you are reading this you probably hit the back button as quickly as possible. But knowing about TF-IDF requires no mathematical knowledge at all. All you need is some common sense.

Put yourself in the shoes of a search engine. Sitting on the hard disks of its vast collection of computers are all the web pages in existence (or almost). Along comes a query from a human.

The first thing the search engine does is discard words that are too common. For example, if the search query contained the word 'the' there's almost no point using it to try to distinguish web pages. All the English ones almost certainly contain the word 'the' (just look at this one and count them).

With the common words removed the search engine goes looking for pages that match the remaining terms and ranks them in some useful order (a lot of Google's success is based on their ranking algorithm). One thing the search engine can take into account is how common the remaining words in the search query are.

For example, suppose the query was "the first imagineer". The search engine ignores 'the' and looks for pages containing "first imagineer". Obviously the results returned need to contain both words, but 'imagineer' is special: it's a rare word. And relatively rare words are a human's best searching friends.

A rare word allows the search engine to cut down the number of pages it needs to examine enormously, and that ends up giving the user better results. The ideal rare word is one that appears almost only on the sort of pages the end user is looking for, and appears in those pages frequently.

In nerdy terms 'appears in those pages frequently' is the TF (or term frequency), and 'almost only in the sort of pages the end user is looking for' is the IDF (inverse document frequency).

Since 'imagineer' is a rare word, if the search engine finds a page on which that word occurs many times it's more likely to be relevant to the person searching than a page where 'imagineer' appears only a few times.

Since 'first' is fairly common its contribution to the search results is less clear. If 'first' appears many times on the page, but 'imagineer' only once then it's likely that the page is of lesser interest.

When you are searching give a few seconds thought to TF-IDF and ask yourself 'what words are most likely to appear only in the sort of pages I am looking for?' You'll likely get to where you wanted to go much faster that way.

PS If you find out who the first imagineer was, drop me a line.

Friday, February 12, 2010

So you think machine learning is boring...

Here's something I wrote for the company blog.

If you say the words 'machine learning' to people they either look confused or bored. Since the promise of Artificial Intelligence evaporated in the 1970s, machine intelligence seems to be one of those things that's a perpetual 20 years away.

But computer industry insiders know that many forms of machine learning are at work all the time. The most common and visible are recommendation systems like the one on that comes up with suggestions for other books you might like. But even that doesn't express the true power of state of the art algorithms.

But a helicopter doing backflips and a walking robot do.

Read the rest.

Thursday, February 11, 2010

Is your new technology crappy enough?

Email developed from a file transfer based protocol to the widely used SMTP (which was created in 1982). As a protocol it leaves much to be desired: senders can be forged, it's a playground for spammers, the ability to send anything other than plain text (ASCII) messages had to be added with duct tape at a later date, its error messages are cryptic and it has few facilities to deal with the complexities of back and forth exchanges of messages on the same subject. But email has one redeeming feature: its store and forward nature. The very essence of email is the ability to send a message, get it delivered to the recipient and allow that recipient to read the message when they choose to.

Instant messaging (starting with services like talk on Unix and other systems) is similarly poor. It allows two users (and sometimes more) to exchange text-based messages in real time. In general it does a poor job of doing anything else. Sending files or pictures often ends in disappointment as firewalls between recipients block the transaction. But for all its poor functionality IM is wildly popular, precisely because it allows simple communication in real time.

The same goes for SMS messaging. It's too short, character based and difficult to type (yes, Steve, even on an iPhone). Yet it took off. Although it is credited as having taken off because SMS charges were much lower than calling between mobile phones in Europe, I think there's another reason. SMS allows the recipient to decide what to do with the message. In that way, SMS is a polite technology. Unlike the telephone that interrupts the person answering it, SMS allows the essence of communication to get through quickly. (Is it any wonder that MMS isn't similarly popular? If you send me a picture of your child doing something cute, we almost certainly want to have a conversation about it to satisfy human needs for affirmation.)

Skype is poor. For a long time I tried to use it for business purposes and it was too unreliable. Yet it's very successful at allowing people to talk cheaply for personal calls. The secret of its success, IMHO, is its ability to make the firewall problem that plagues IM go away. Even though calls vary wildly in quality, the user interface is difficult to use (and Skype insists on adding functionality that distracts from its core purpose), solving the firewall problem is its key value. Doing that made free voice and video calling possible.

Twitter is much derided, yet very, very popular. Why? It serves a need: asymmetric communication. Unlike Skype, IM, SMS or email, Twitter allows a person to graphcast: to send messages to people who have chosen to be attached to them. This functionality didn't exist in any previous form. This turns out to be very useful for marketing, and very useful for casual connections between people (without the people agreeing on the connection). In fact, Twitter shares an important characteristic with the web: it's permissionless.

Similarly, the web is rubbish. The web's greatest problem is that links rot, yet, in fact, that's the web's greatest triumph. Without trying to enforce connections between sites, the web allowed anyone to publish whatever they want... fast. Other competing hypertext systems were too complex and required too much central control to be successful. Twitter and the web work because there's no asking permission to follow or link.

Which brings me to Google Wave. What's Google Wave's single greatest piece of functionality that solves a real need? I can't see one. Google Wave looks like exactly what the designers said it was: "what email would look like if we designed it today". The trouble with designing systems after bathing in the mineral waters of Silicon Valley is you get functionality overload (as an aside, this is what makes Apple so successful: it's not what Apple puts in their products as much as what they leave out).

I wonder what the telephone would look like if it were designed in Mountain View today. "Well, Mr Bell, I'm sorry but your invention will never take off: the person on the other end is indistinct, I have to remember a number to get to them, I can't tell if they are even there to answer the call, I can't see them when I'm speaking, and how are we going to exchange pictures? Sorry, come back when you've invented telepresence".

So, next time you are inventing a communication technology (or, possibly, any technology), ask yourself: "Is this crappy enough to be successful?"

Wednesday, February 10, 2010

A year without TV

So it's been a year. A year ago I moved house and didn't unpack the TV. It was a nice TV: a 42" flat panel display which when connected to the right sort of cable receiver could be used to watch broadcast TV. With the TV in a box, I never subscribed to any cable or satellite service. My house is without a TV receiver of any kind.

And I don't miss it. Only occasionally do I get that urge to switch on the goggle box and just watch some nonsense to se changer les idées (take my mind off things).

The real revelation came when I was staying in a hotel and for the first time ever simply didn't switch the hotel TV on. I knew that there was nothing worth seeing on it.

That's not to say that I haven't watched TV programmes. I'm an avid follower of 24, which I've watched by buying a season pass via the iTunes Store. I've made use on occasion of the BBC iPlayer, watching a total of 11 programmes in the last year (thank you iPlayer for keeping a record). The most interesting of those programmes were The Secret Life of Chaos and Chemistry: A Volatile History.

And I've watched lots of movies through the DVD subscription service LoveFilm (the most recent of which was Skin).

Amusingly, I succumbed to the idea of watching rubbish on TV and decided to watch the most recent episode of So you think you can dance. It was a wonderful reminder of why I don't have a TV. In fact, it was like being parachuted into a strange world filled with consistently ugly, shallow people wearing too much make up. It was only by being away from TV for so long that I saw it like that, I'm sure if I'd been watching TV all along I would never have noticed (just like the proverbial frog being slowly heated in a pan of water).

But the thing I miss the least is TV news. It's all about panic and fear and not about analysis. I seriously wonder how much harm TV news is doing to society.

The nice thing is that the Internet can kill TV without killing TV programmes. I'm very happy to pay to rent DVDs and pay to buy individual programs. If the BBC hadn't made the programmes cited above available freely I would have been very tempted to pay for them individually.

I realize that some readers will wonder why I would pay for content when I could probably download it and violate copyright for free. The main reason is guaranteed quality. I probably could spend my time searching for those programmes on some torrent site, but just as I don't want to waste time channel hopping for something good to watch, I don't want to waste my time downloading torrents only to find they are corrupt, incomplete or overdubbed in Urdu.

It's a simple trade-off: I'll give the copyright holder $X if they'll guarantee that I get a high quality copy of their programme when I want it.

Another proposed solution is the PVR. This is a bizarre solution which when compared to paying to download programmes from the Internet seems almost ridiculous. It works like this. You pay to receive a random selection of TV programmes broadcast at times you do not select. You then pay to have a device that you must programme to wake up and record those programmes so that you can then watch them when you want. You have to tell the PVR what programmes you want to watch before you know about them; you can't subsequently decide to watch something.

The Rank Amateur

In 1937 an amateur American astronomer named Grote Reber completed construction of a 9 meter radio telescope in his back garden. By 1940 he had verified that there were radio signals coming from the heavens and by 1943 he had completed a radio frequency map of the sky. Reber, with his enormous hand-built dish, kick started radio astronomy and eventually sold his invention to the US government.

One of the biggest advances in the understanding of genetic inheritance was made by Gregor Mendel. Mendel was an Augustinian priest. In 1866 his paper on plant hybridization (Mendel had spent years observing and experimenting with pea plants) showed the existence of dominant and recessive genes. Mendel's discoveries went largely unnoticed until rediscovered by two professional scientists.

Amateurs like Reber and Mendel have made enormous contributions to science ever since science was called natural philosophy. So it's dismaying to see a professor from Oxford University write in The Guardian: "The most effective people at finding errors in scientific research are scientists: it was professional glaciologists, after all, who exposed the error in the IPCC 2007 case study of Himalayan glaciers." To exclude the amateur is to deny a large part of the history of science.

Another amateur who made a big impact on science is Albert Einstein. Although Einstein had received a scientific education (after having been refused entry to the prestigious ETH Zurich) he was unable to find a research post and did all his pioneering work while working for the Swiss Patent Office.

I'm no Reber, Mendel or Einstein, but don't rule us amateurs out. In December 2009 the Met Office released thousands of records of temperature readings from around the globe stretching from the present day to 1850. These records form a vital part of the evidence that the globe is warming and the climate changing.

I thought it would be a fun hobby project to use those records to reproduce the worrying charts that show the increase in global temperatures. Since I'm a professional computer programmer I wrote software to process the Met Office data. You can see the result in this YouTube video.

Because I was working with unfamiliar data I put special functions into my program to ensure that I wasn't making any mistakes. To my surprise these functions began reporting that there was something wrong with temperature data in Australia and New Zealand. I whipped out a pocket calculator and checked that my program wasn't mistaken and then reported the problems to the Met Office. They quickly acknowledged that I was right.

Last month the Met Office released an update to CRUTEM3 and HADCRUT3, the critical data sets used to track global warming. The new version contains corrections for all the errors I reported.

Making a distinction between professional and amateur in science is artificial: what matters is the 'what' of science not the 'who'. And amateurs have by their very nature something that professionals don't need to have: passion. Without the comfort of a tenured position, a subsidized bed in an ivory tower, or a well funded laboratory, the only thing keeping amateurs going is a love for their subject.

PS In the comments a professional scientist wrote to object to my final paragraph. I urge you to read his comment since it makes good points. In my defense I didn't mean to say that professional scientists lack passion, just that that's all the amateurs have got. His point is that professionals need to have passion because the funding environment for science is so bad that they're certainly not in it for the money or security!

Tuesday, February 09, 2010

Wing Kong

Lufthansa is running a competition to name one of their A380 jets. You can submit your own entry here.

My suggestion is Wing Kong. If you like it you can vote for me on their site.

Wing Kong: it's big, it's powerful, it comes from a strange, dark place (well, Toulouse). It's also just a little bit romantic.

Monday, February 08, 2010

24 years of email

I first got an email address with an Internet @ in it in 1986. It was [email protected], or for those of you on JANET it was [email protected] (happily I only briefly used bang paths). In 24 years I think there have been three major end-user innovations: address books, MIME and email searching.

Address Books

Initially, I didn't need an email address book. Most of the people I was emailing were on the same domain (often the same machine) and so everything after the @ was irrelevant. And the number of people on email world-wide was so small that remembering their email addresses was easy (I don't mean remembering them all, just remembering the ones I needed to talk to).

And most people's domains hadn't reached the point where just using initials was unworkable. So most email addresses consisted of their initials. That made them short and memorable. I don't recall anyone with a ridiculous address like [email protected]

But things changed: the Internet got bigger, people's addresses got more complex, I was communicating with more and more people. Hence address books.


MIME

The ability to send more than just plain text inside an email (even if it is actually being transmitted as 7-bit ASCII) was big. Prior to the introduction of MIME in 1992 there were some limited ways to send binary content in email (mostly using uuencode), but it was an ugly mess: mail clients often didn't know what to do with the contents, and you were forced to save the mail to a file and manually unpack it.

Happily, MIME made that problem go away.
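To make the "binary content over 7-bit ASCII" point concrete, here is a minimal sketch using Python's standard email package. The addresses, subject and filename are made-up placeholders, and this only illustrates the idea rather than how any particular mail client does it: a payload containing every possible byte value is attached, the message is serialised, and the serialised form contains nothing but ASCII.

```python
import email
from email import policy
from email.message import EmailMessage

# Build a message with a plain-text body (placeholder addresses).
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "Binary attachment"
msg.set_content("The attachment is binary, but the wire format is ASCII.")

# Attach a payload containing every byte value from 0 to 255.
payload = bytes(range(256))
msg.add_attachment(
    payload,
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

# Serialise the message as it would be handed to a mail server.
wire = msg.as_bytes()
is_seven_bit = all(byte < 128 for byte in wire)

# Parse it back and recover the original bytes from the attachment.
parsed = email.message_from_bytes(wire, policy=policy.default)
attachment = next(parsed.iter_attachments())
recovered = attachment.get_content()

print("7-bit clean:", is_seven_bit)
print("round trip ok:", recovered == payload)
```

The library picks base64 for the binary part and writes the headers that tell the receiving client how to unpack it, which is exactly the bookkeeping that had to be done by hand with uuencode.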

Email Searching

As email got considerably more widespread it became necessary to put it into folders to try and keep a handle on the volume. This led to the sort of trees of folders that are seen in programs like Microsoft Outlook. This is, IMHO, a less than optimal solution. The right solution is the sort of high-speed email searching offered by Google Mail. With it folders are completely irrelevant.

In fact foldering was such a pain that it was part of the reason I invented POPFile.

The Bad

Two bad things have happened since I started using email: spam (the first spam was in 1978 on ARPANET, but I don't recall any unwanted messages during the late 1980s at all) and HTML email. HTML email has been a spammers' playground and for messages I want to receive (i.e. everything other than marketing) it's almost useless.

Minor irritations are vacation responders and people who don't edit replies, sending me gigantic threads embedded in a message.

8 years

That's one major innovation every 8 years. With Google Mail being released in 2004 we've got another 2 years to wait for the next one. What do you think it will be? For me it has to be something to do with threading. That's still pretty messy, and Google Wave doesn't seem to have improved it. I don't think the little > is cutting it anymore.

Sunday, February 07, 2010

Something odd in the CRUTEM3 station errors

Out of the blue I got a comment on my blog about CRUTEM3 station errors. The commenter wanted to know if I'd tried to verify them: I said I hadn't since not all the underlying data for CRUTEM3 had been released. The commenter (who I now know to be someone called Ilya Goz) correctly pointed out that although a subset had been released, for some years and some locations on the globe that subset was in fact the entire set of data and so the errors could be checked.

Ilya went on to say that he was having a hard time reproducing the Met Office's numbers. I encouraged him to write a blog post with an example. He did that (and it looks like he had to create a blog to do it). Sitting in the departures lounge at SFO I read through his blog post and Brohan et al. Ilya's reasoning seemed sound, his example was clear and I checked his underlying data against that given by the Met Office.

The trouble was Ilya's numbers didn't match the Met Office's. And his numbers weren't off by a constant factor or constant difference. They followed a similar pattern to the Met Office's, but they were not correct. At first I assumed Ilya was wrong and so I checked and double checked his calculations. His calculations looked right; the Met Office numbers looked wrong.

Then I wrote out the mathematics from the Brohan et al. paper and looked for where the error could be. And I found the source. I quickly emailed Ilya and boarded the plane to dream of CRUTEM and HadCRUT as I tried to sleep upright.

Mathematical Interlude

The station error consists of three components: the measurement error, the homogenisation error and the normal error. The first two are estimated in the paper as 0.03°C and 0.4°C respectively. The normal error is calculated from the standard deviation information in the station files.

The formula for the normal error for a single month, i, is given in the paper as follows:

$$\epsilon_N = \frac{\sigma_i}{\sqrt{N}}$$

Unfortunately, the paper uses rather sloppy mathematical language: the N on the left is not the N on the right, and the subscript i isn't defined. So I am going to express this a bit more clearly, writing $\epsilon_{N,i}$ for the normal error in month i, as follows:

$$\epsilon_{N,i} = \frac{\sigma_i}{\sqrt{m_i}}$$

This means that the normal error for month i is the standard deviation for month i (that's σi) divided by the square root of the number of years used to generate the normal values in the station files (which I call mi). Typically we have:

$$m_i = 30$$

because 30 years of data from 1961 to 1990 are used. In cases where fewer than 30 years are available (because of missing data), a number less than 30 is used.

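Now to get the station error, εi, the three error components are combined in quadrature as follows:

$$\epsilon_i = \sqrt{\epsilon_M^2 + \epsilon_H^2 + \epsilon_{N,i}^2}$$

where $\epsilon_M$ = 0.03°C is the measurement error, $\epsilon_H$ = 0.4°C is the homogenisation error and $\epsilon_{N,i}$ is the normal error for month i.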

That works for any grid square where for any month there's just a single station reporting a temperature, but in general there are more. So when there are many station errors they are averaged using a root mean square and then divided by the square root of the number of stations.

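Suppose there are n stations, each with a station error εi,j (to which I've added the subscript j to differentiate them). Then the final station error for a month i, written here as $\bar{\epsilon}_i$, is as follows:

$$\bar{\epsilon}_i = \frac{1}{\sqrt{n}} \sqrt{\frac{1}{n} \sum_{j=1}^{n} \epsilon_{i,j}^2}$$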

Return to narrative

What Ilya had discovered was that the formula above (from the paper) works only when there is a single station in a grid square. When there were two or more it failed; that's when he approached me asking for help.

What I discovered at the airport was that if you replaced the number 30 with 15 the formula worked and the values for station errors for grid squares containing exactly two stations were now correct.

Both Ilya and I came to the same conclusion: the number 15 wasn't picked from thin air, but was in fact 30 divided by 2 (the number of stations in the grid square). We both tested this hypothesis on squares with more than two stations and found that it worked.

So it appears that the normal error used as part of the calculation of the station error is being scaled by the number of stations in the grid square. This leads to an odd situation that Ilya noted: the more stations in a square the worse the error range. That's counterintuitive: you'd expect that the more observations you have, the better the estimate.
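To make the two versions of the calculation concrete, here is a minimal Python sketch of the station error as I understand it from the description above. It is an illustration only, not the Met Office's code: the function names and the scale_years_by_station_count flag are invented for this sketch, the 0.03°C and 0.4°C constants are the ones quoted above, and the standard deviation of 1.7 is just an illustrative input.

```python
import math

MEASUREMENT_ERROR = 0.03     # degrees C, as quoted above
HOMOGENISATION_ERROR = 0.4   # degrees C, as quoted above
NORMAL_YEARS = 30            # years in the 1961-1990 normal period


def normal_error(sigma, years=NORMAL_YEARS):
    """Standard deviation divided by the square root of the years used."""
    return sigma / math.sqrt(years)


def station_error(sigma, years=NORMAL_YEARS):
    """Combine the three error components in quadrature."""
    return math.sqrt(
        MEASUREMENT_ERROR ** 2
        + HOMOGENISATION_ERROR ** 2
        + normal_error(sigma, years) ** 2
    )


def grid_square_error(sigmas, scale_years_by_station_count=False):
    """Root mean square of the station errors, divided by sqrt(n).

    With scale_years_by_station_count=True the number of years is divided
    by the number of stations (the hypothesised behaviour); otherwise the
    formula from the paper is used unchanged.
    """
    n = len(sigmas)
    years = NORMAL_YEARS / n if scale_years_by_station_count else NORMAL_YEARS
    errors = [station_error(sigma, years) for sigma in sigmas]
    rms = math.sqrt(sum(e * e for e in errors) / n)
    return rms / math.sqrt(n)


# One station: both versions agree, since 30 / 1 is still 30.
print(grid_square_error([1.7]))
print(grid_square_error([1.7], scale_years_by_station_count=True))

# Two stations: the scaled version uses 15 years and gives a larger error.
print(grid_square_error([1.7, 1.7]))
print(grid_square_error([1.7, 1.7], scale_years_by_station_count=True))
```

Keeping the scaling behind a flag makes it easy to run both versions over the same inputs and see which one matches the published numbers.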


Ilya had shown me an example in 1947, but I didn't want to take his word for it (although he later showed me a program to check all the station errors so I should have believed him), and so I took a look at three locations in January 1850. For these three locations all the data underlying CRUTEM3 had been released:

1. The grid square which consists of the single station 723660: this corresponds to the grid square with corner 35N, 105W. Here the Met Office data gives station errors of: 0.5072 0.5424 0.4857 0.4962 -1e+30 -1e+30 0.4407 0.4407 -1e+30 0.4756 -1e+30 0.5186. The strange negative numbers are missing data (they're missing because in the underlying file there are no normals for 1850 in those months, although the actual normals aren't needed for the station error calculation so it doesn't matter). Using the formula from the paper gives the correct answer: 0.5072 0.5424 0.4857 0.4962 0.4486 0.4756 0.4407 0.4407 0.4661 0.4756 0.5072 0.5186. This makes sense since our correction value of 1 for 1 station in the square doesn't change the formula.

There is, however, something else wrong with this. The paper says that if less than 30 years of data are available the number mi should be set to the number of years. In 723660 there are only 17 years of data, so this station error appears to have been incorrectly calculated based on 30 years.

2. The grid square which consists of the two stations 753041, 756439: this corresponds to the grid square with corner 35N, 80W. Here the Met Office data gives station errors of: 0.6168 0.569 0.5452 0.4008 0.4345 0.3642 0.3373 0.353 0.3881 0.4624 0.4076 0.5767 and using the formula from the paper (without our correction): 0.4801 0.4496 0.4346 0.3472 0.3669 0.3264 0.3116 0.3202 0.3399 0.3836 0.3511 0.4545. If a correction of 2 is used so that each σi is divided by the square root of 15 instead of 30 the correct values are generated.

3. The grid square which consists of the four stations 720388, 724080, 756192, 756490: this corresponds to the grid square with corner 35N, 75W. Here the Met Office data gives station errors of: 0.5073 0.4409 0.4329 0.3361 0.3286 0.2905 0.2712 0.2807 0.2973 0.3739 0.3325 0.4613 and using the formula from the paper (without our correction): 0.3074 0.2807 0.2775 0.2417 0.2391 0.2264 0.2204 0.2233 0.2286 0.2552 0.2404 0.2887. If a correction of 4 is used so that each σi is divided by the square root of 7.5 instead of 30 the correct values are generated.
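The two multi-station examples can be cross-checked using only the numbers quoted above. Assuming every station in a square uses 30 years, a little algebra (my own, not from the paper) relates the value P from the paper's formula to the value C with the correction for a square of n stations, whatever the individual standard deviations are: C² = nP² − (1 − 1/n)(0.03² + 0.4²). The Python sketch below applies that relation to the quoted values and compares the result with the Met Office figures; the function and variable names are invented for the illustration.

```python
import math

# Measurement and homogenisation errors, squared and summed.
BASE_VARIANCE = 0.03 ** 2 + 0.4 ** 2

# (number of stations, paper-formula values, Met Office values), from the text.
SQUARES = {
    "35N, 80W": (
        2,
        [0.4801, 0.4496, 0.4346, 0.3472, 0.3669, 0.3264,
         0.3116, 0.3202, 0.3399, 0.3836, 0.3511, 0.4545],
        [0.6168, 0.569, 0.5452, 0.4008, 0.4345, 0.3642,
         0.3373, 0.353, 0.3881, 0.4624, 0.4076, 0.5767],
    ),
    "35N, 75W": (
        4,
        [0.3074, 0.2807, 0.2775, 0.2417, 0.2391, 0.2264,
         0.2204, 0.2233, 0.2286, 0.2552, 0.2404, 0.2887],
        [0.5073, 0.4409, 0.4329, 0.3361, 0.3286, 0.2905,
         0.2712, 0.2807, 0.2973, 0.3739, 0.3325, 0.4613],
    ),
}


def corrected_from_paper(paper_value, n_stations):
    """Convert a paper-formula grid error to the scaled-years version."""
    squared = (n_stations * paper_value ** 2
               - (1 - 1 / n_stations) * BASE_VARIANCE)
    return math.sqrt(squared)


def max_discrepancy(n_stations, paper_values, met_office_values):
    """Largest absolute gap between converted and published values."""
    return max(
        abs(corrected_from_paper(p, n_stations) - m)
        for p, m in zip(paper_values, met_office_values)
    )


for corner, (n, paper, met_office) in SQUARES.items():
    gap = max_discrepancy(n, paper, met_office)
    print(f"{corner}: {n} stations, largest difference {gap:.4f}")
```

Any remaining difference should be no bigger than the rounding of the four-decimal inputs if the hypothesis holds for these squares.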


I have no idea why the correction given in this blog post by Ilya and me works: perhaps it indicates a genuine bug in the software used to generate CRUTEM3, perhaps it means Ilya and I have failed to understand something, or perhaps it indicates a missing explanation from Brohan et al. I also don't understand why, when there are fewer than 30 years of data, the number 30 appears to still be used.

If these are bugs then CRUTEM3 will need to be reissued because the error ranges will be all wrong.

I've emailed the Met Office asking them to help. If you see an error in our working please let us know!

Wednesday, February 03, 2010

A compliment from The Times

The Times has kindly mentioned this blog as one of its Top 30 Science Blogs saying:

John Graham-Cumming is one of the few people out there who makes the nuts and bolts of computer programming actually sound interesting. Expect anything from an analysis of the statistical likelihood of election fraud in the last Iranian election to the unveiling of flaws in the Met Office’s global climate models.