Thursday, April 18, 2013

The importance of open code

Last February Professor Darrel Ince, Professor Les Hatton and I had a paper published in Nature arguing for openness in the code used for scientific papers. The paper is called The Case for Open Computer Programs.

In a coda to that piece Darrel wrote the following:
Our intent was not to criticise; indeed we have admiration for scientists who have to cope with the difficult problems we describe. One thesis of the article is that errors occur within many systems and does not arise from incompetence or lack of care. Developing a computer system is a complex process and problems will, almost invariably, occur. By providing the ability for code to be easily perused improvement will happen. This is the result detailed in both the boxes in the article: the Met Office data is more accurate, admittedly by a small amount, and because of feedback to developers the geophysical software was considerably improved.
Recently, an important paper in economics has been in the news because its conclusions turn out to be inaccurate for a number of reasons. One of those reasons is a programming error using the popular Microsoft Excel program. This error, in an unreleased spreadsheet, highlights just how easy it is to make a mistake in a 'simple' program and how closed programs make reproducing results difficult.

The original paper by Reinhart and Rogoff is Growth in a Time of Debt and it concludes the following:
[...] the relationship between government debt and real GDP growth is weak for debt/GDP ratios below a threshold of 90 percent of GDP. Above 90 percent, median growth rates fall by one percent, and average growth falls considerably more.
They point to a serious problem with growth rates once the debt/GDP ratio is above 90%. As this is an important economic topic at the moment, other economists have attempted to replicate their findings from the original data. One such reproduction is Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff, which finds:
Herndon, Ash and Pollin replicate Reinhart and Rogoff and find that coding errors, selective exclusion of available data, and unconventional weighting of summary statistics lead to serious errors that inaccurately represent the relationship between public debt and GDP growth among 20 advanced economies in the post-war period. They find that when properly calculated, the average real GDP growth rate for countries carrying a public-debt-to-GDP ratio of over 90 percent is actually 2.2 percent, not -0.1 percent as published in Reinhart and Rogoff. That is, contrary to RR, average GDP growth at public debt/GDP ratios over 90 percent is not dramatically different than when debt/GDP ratios are lower.
The coding error referred to there is a mistake in an Excel spreadsheet that excluded data for certain countries. And the original authors have admitted that this reproduction is correct:

On the first point, we reiterate that Herndon, Ash and Pollin accurately point out the coding error that omits several countries from the averages in figure 2. Full stop. HAP are on point. The authors show our accidental omission has a fairly marginal effect on the 0-90% buckets in figure 2. However, it leads to a notable change in the average growth rate for the over 90% debt group.
All this brought to mind my own discovery of errors in code (first error, second error) written by the Met Office. Code that was not released publicly.

There's a striking similarity between the two situations. The errors made by the Met Office and by Reinhart and Rogoff were trivial and in the same type of code. The Met Office made mistakes calculating averages, as did Reinhart and Rogoff. Here's the latter's spreadsheet with the error:

The reality of programming is that it is very easy to make mistakes like this. I'll repeat that: very easy. Professional programmers do it all the time (their defense against this type of mistake is to have suites of tests that double check what they are doing). We should expect errors like this to be occurring all the time.
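As a minimal sketch of the kind of test meant here, the following Python averages a range of rows the way a spreadsheet formula would, and adds a check that the range covers every row. The growth figures, country labels and function names are all made up for illustration; none of this is taken from the Reinhart and Rogoff spreadsheet.

```python
from statistics import mean

# Hypothetical growth figures, purely illustrative; not the real data.
GROWTH = [("A", 2.0), ("B", 3.0), ("C", 1.0), ("D", 4.0), ("E", 2.5)]


def average_growth(rows, first=0, last=None):
    """Average the values in rows[first:last], like a spreadsheet range."""
    selected = rows[first:last]
    return mean(value for _, value in selected)


def range_covers_all(rows, first=0, last=None):
    """Return True only if the chosen range includes every row."""
    return len(rows[first:last]) == len(rows)


if __name__ == "__main__":
    # A range that stops too early silently changes the average.
    print("full range:", average_growth(GROWTH))
    print("truncated range:", average_growth(GROWTH, last=3))
    # The check flags the truncated range before the number is used.
    print("truncated range covers all rows:", range_covers_all(GROWTH, last=3))
```

The check is trivial, and that is the point: a test this small is enough to flag a range that stops a few rows short.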

What's vital is that scientists (including the dismal kind) consider their code (be it in Excel or another language) as an important product of their work. Publishing of data and code must become the norm for the simple reason that it makes spotting errors like this very, very quick.

If Herndon, Ash and Pollin had had access to the original Excel spreadsheet along with the data they would have very quickly been able to see the original authors' error. In this case Excel even highlights for you the cells involved in the average calculation. Without it they are forced to do a ground-up reproduction. In this particular case they couldn't get the same results as Reinhart and Rogoff and had to ask them for the original code.

An argument against openness in code is that bad code may propagate. I call this the 'scientists protecting other scientists from themselves' argument and believe it is a bad argument. It is certainly the case that it's possible to take existing code and copy it and in doing so copy its errors, but I believe that the net result of open code will be better science not worse. Errors like those created by the Met Office and Reinhart and Rogoff can be quickly seen and stamped out while others are reproducing their work. 

A good scientist will do their own reproduction of a result (including writing new code); if they can't reproduce a result then, with open code, they can quickly find out why (if the reason is a coding error). With closed code they cannot and science is slowed.
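A rough sketch of that workflow in Python, under stated assumptions: `published_average` is a hypothetical stand-in for the original author's code (with a deliberately planted bug that drops the last two values), `independent_average` is the fresh reimplementation, and the data are invented.

```python
import math


def published_average(values):
    """Hypothetical stand-in for the original code; it drops the last two values."""
    used = values[:-2]
    return sum(used) / len(used)


def independent_average(values):
    """A from-scratch reimplementation written without looking at the original."""
    return math.fsum(values) / len(values)


def results_agree(a, b, tolerance=1e-9):
    """Compare two results within an absolute tolerance."""
    return math.isclose(a, b, abs_tol=tolerance)


if __name__ == "__main__":
    values = [2.0, 3.0, 1.0, 4.0, 2.5]  # invented data
    original = published_average(values)
    reproduced = independent_average(values)
    if not results_agree(original, reproduced):
        # With open code, the next step is simply to read the original.
        print("Results differ:", original, "vs", reproduced)
```

The comparison only says that something is wrong somewhere. Finding out what is wrong requires being able to read `published_average`, which is where closed code gets in the way.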

It is vital that papers be published with data and code for the simple reason that even the best organizations and scientists make rudimentary errors in code that are hard to track down when the code is closed.

PS It's a pity that one year after the Met Office argued for open data and code, the code to reproduce CRUTEM4 is yet to be released. I hope, one day, that when papers are published the code and data will be available at the same time. We have the networking technology and storage space to do this.


Alex Kashko said...

It sounds to me like it was a DATA error not a CODE Error. If the mistakes had been in the CODE tests could have been run with simplified data.

I repeat. Omitting DATA is not a code Error. In a real science it would often be classed as scientific fraud. In today's climate the suspicion arises that the data were omitted in order to further an agenda. I will however give the researchers the benefit of the doubt for now.

The error was using EXCEL not giving the data to a professional data analyst outside the organisation.

Apart from that I agree the spreadsheet should have been reviewed before publication.

I would suggest that all such critical documents be peer reviewed.

Frank T. Clark said...

I read the statement "Developing a computer system is a complex process and problems will, almost invariably, occur." with a smile. I propose that problems will always occur because after any process has been developed changes always occur in the process. Since it is impossible to predict what may change in the future and what unexpected use cases may already exist, failure will occur. It is only a matter of when and how badly.

Alex Kashko said...

The same holds for any complex system. Things will change. Even the Imperial Chinese civil service had to evolve.

People do tend to assume the future will mirror the past. This is why we have a recession.

Frank T. Clark said...

That is why support, review, revision, and maintenance are a continuous requirement for any complex system. :-)

Alex Kashko said...

Including the mechanisms of government. Like Parliament or Congress/Senate.

Robert Sewell said...

On the other hand, if HAP had RR's Excel code from the beginning, would they have examined it for errors before using it to reproduce their results? If not, they would not have gotten a different answer that led them to question the original code.

Alex Kashko said...

Hmmm..... Two separate clean implementations of a requirement that produce results that differ significantly from each other suggest that one or both are wrong.

So I would say that while open code is a good way to find bugs, coding twice from scratch with different algorithms might be a very strong test if the code were not available or extremely obscurely written. I understand safety critical code is written three times in different ways and a majority vote on the results is taken.

Open code is fine but needs open tests with well defined data as well. I would also wonder how open code would reveal problems that only arise in integration, when multiple components interact.