Wednesday, August 04, 2010

On the release of scientific source code

I was asked by The Times for commentary on the idea that releasing scientific source code could have negative effects:

This view is countered by programmer John Graham-Cumming, who found coding errors after trying to reproduce the CRU/Met Office's CRUTEM and HadCRUT global warming datasets. Working from the raw data released by the Met Office and the description of their process for generating the datasets in a scientific paper, he decided to validate their work - a considerable effort that required writing code to implement the algorithm described in the paper. In doing so, he found a problem with the way the error ranges were calculated (amongst other errors), stemming from a bug in their code.

He says: "You could say that by not releasing their buggy code they forced me to find the bug in it by writing my own validation. But actually, if they'd released their code I would have been able to quickly compare the code and the paper and find the bug without the massive effort to write new code. And no one else had actually done this validation (including the Muir Russell review) and as a result the Met Office has been releasing incorrect data for a long time. Perhaps that's because the validation was so hard in the first place, whereas having code to check would have been easy."

The rest is here.


Michael said...

The article itself is hidden behind the Times paywall. Thank you for posting your relevant portion.

Anonymous said...

Have you tried accessing the CCSM4/CESM1 source code repository?