Over on the Watts Up With That blog there was a story entitled "Analysis: CRU tosses valid 5 sigma climate data". When I saw the headline I thought, "Yes, that's right, it's specifically mentioned in 'Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850' that outlier anomalies above (and below) 5 sigma are removed. What's the big deal?"
Of course, to the readers of WUWT the big deal is the term 'valid' data. It's a tenet of climate skeptics that CRU was up to all sorts of shenanigans with data, and that no stone must be left unturned in the pursuit of perfection. For WUWT readers, the fact that the blog post highlighted an exceptionally cold year also adds fuel to the fire. Since 1936 was really, really cold, its anomaly exceeded 5 sigma and so was excluded. It's not hard to see how a climate skeptic could take that to imply that the temperature trend is biased warm.
But let's ignore the subtext and look at some of the claims. The blog post states: "When they toss 5 sigma events it appears that the tossing happens November through February". That's not hard to check, so let's use the Met Office's own program to verify it. I made a small modification to the code available here.
Here's a chart showing the number of anomalies dropped by month.
For November through February a total of 219 anomalies are removed, but many more are dropped in the summer. So the claim seems odd. It turns out to rest on a different observation: that during those months the size of the removed anomalies is greater (oddly, the analysis only looks at the top 100 months; I'm not sure why they couldn't just look at them all).
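The per-month count can be sketched roughly as follows. This is not the Met Office's code: the record layout (month, anomaly, standard deviation), the function name `count_dropped_by_month`, and the strict "greater than" comparison are all assumptions for illustration, and the sample values are made up.

```python
from collections import Counter


def count_dropped_by_month(records, n_sigma=5.0):
    """Count anomalies that a simple n-sigma cutoff would drop, per month.

    `records` is an iterable of (month, anomaly, sigma) tuples. This layout
    is hypothetical, not the format used by the Met Office's program.
    """
    dropped = Counter()
    for month, anomaly, sigma in records:
        # Drop anomalies above (and below) the cutoff, hence abs().
        if abs(anomaly) > n_sigma * sigma:
            dropped[month] += 1
    return dropped


# Made-up illustrative values, not real station data.
sample = [
    (1, -12.0, 2.0),  # magnitude 12 exceeds 5 * 2.0, so it is dropped
    (1, 3.0, 2.0),    # kept
    (7, 6.0, 1.0),    # dropped
    (7, -5.5, 1.0),   # dropped
    (12, 4.9, 1.0),   # kept
]

print(count_dropped_by_month(sample))
```

Summing the counts for months 11, 12, 1 and 2 would give the kind of November-through-February total quoted above.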
Here's a chart that shows the average anomaly excluded by month.
So clearly the excluded anomalies are largest in the winter. But how much difference do these exclusions make? There are 805 removed anomalies and 3,274,355 used. So roughly 0.02% of the data is not used.
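As a quick check of that percentage, here is the arithmetic using the two counts above. The variable names are just for illustration.

```python
removed = 805        # anomalies excluded by the 5 sigma check
used = 3_274_355     # anomalies that were kept

# Fraction of all anomalies (removed + used) that were excluded, in percent.
percent_removed = 100 * removed / (removed + used)

print(f"{percent_removed:.4f}% of the data is not used")
```

That comes out at a little under 0.025%, which is consistent with the rounded 0.02% figure.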
Now, using the same program, it's possible to plot two trends on one graph to see whether dumping this tiny fraction of data makes any real difference:
1. The original trend with the 5 sigma removal
2. The trend with all the data included (i.e. nothing excluded).
Here's that picture.
mo_original is the output of the Met Office's program untouched; mo_all is the output with the 805 previously excluded data points put back in. So, as might be expected from such a small amount of data, including it doesn't make much difference at all.