After all the Benford's Law posts I've made, I read an interesting article by Alexandra Scacco and Bernd Beber in the Washington Post about analyzing the last two digits of election results.
Their thesis is that in genuine results the last two digits should be random (i.e. each digit from 0 to 9 equally likely), whereas in faked results they would be non-random, because people are biased in the numbers they pick when trying to invent 'random' ones.
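To make the idea concrete, here is a minimal Python sketch of pulling out the two digits being tested from a list of vote counts. The function names are my own (hypothetical), the vote counts in the example are made up, and reporting the tens digit of a count below 10 as 0 is an assumption. You might prefer to drop such small counts instead.

```python
from collections import Counter

def last_digit(count: int) -> int:
    """Units digit of a vote count."""
    return count % 10

def second_to_last_digit(count: int) -> int:
    """Tens digit of a vote count.

    Counts below 10 have no real tens digit and come out as 0 here;
    dropping such small counts instead is a reasonable alternative (assumption).
    """
    return (count // 10) % 10

def digit_tallies(counts):
    """How often each digit 0-9 appears in the last and second-to-last positions."""
    last = Counter(last_digit(c) for c in counts)
    second = Counter(second_to_last_digit(c) for c in counts)
    # Index by digit so that digits that never appear are reported as 0
    return [last[d] for d in range(10)], [second[d] for d in range(10)]

if __name__ == "__main__":
    votes = [12345, 6789, 402, 57, 1910]  # made-up counts, for illustration only
    last, second = digit_tallies(votes)
    print(last)    # [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]
    print(second)  # [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
```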
Scacco and Beber have also made an annotated version of the article available, along with their actual paper.
I thought this was pretty interesting, so I decided to run my own analysis on the 2001 UK General Election and the 2009 Iranian Presidential Election. Scacco and Beber used a smaller set of Iranian data (the first set that I used), so I ran my reanalysis against the larger set of per-county returns.
Start with the UK. The following chart shows the expected distribution of last and second-to-last digits (i.e. a uniform distribution: all digits are equally likely) and the actual counts. A quick application of the chi-squared test shows that there's a good fit: we can't reject the hypothesis that the UK digits are uniformly distributed (i.e. random).
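A minimal Python sketch of the test, assuming Pearson's chi-squared goodness-of-fit statistic against a uniform expectation of N/10 per digit and a 5% significance level (critical value 16.92 for 9 degrees of freedom). The helper names are hypothetical and not taken from Scacco and Beber's paper, and the digit counts in the example are made up.

```python
# Critical value of the chi-squared distribution with 9 degrees of freedom
# (10 possible digits minus 1) at the 5% significance level.
CRITICAL_9DF_5PCT = 16.92

def chi_squared_uniform(observed):
    """Pearson chi-squared statistic for digit counts against a uniform expectation."""
    n = sum(observed)
    expected = n / len(observed)  # every digit equally likely: N / 10 each
    return sum((o - expected) ** 2 / expected for o in observed)

def looks_uniform(observed):
    """True if uniformity cannot be rejected at the 5% level."""
    return chi_squared_uniform(observed) <= CRITICAL_9DF_5PCT

if __name__ == "__main__":
    counts = [12, 8, 10, 10, 9, 11, 10, 10, 10, 10]  # made-up digit counts
    print(round(chi_squared_uniform(counts), 3))  # 1.0
    print(looks_uniform(counts))                  # True
```

If SciPy is available, `scipy.stats.chisquare(observed)` computes the same statistic (its default expected frequencies are uniform) and also returns a p-value.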
Now switch to Iran. Here is exactly the same analysis applied to the vote counts for all candidates across the country, and once again the chi-squared test cannot reject the hypothesis that the digits are random.
Both digit positions look uniformly distributed. The chi-squared statistics are 11.125 for the last digit and 4.875 for the second-to-last digit. With 9 degrees of freedom the critical value at the 5% significance level is 16.92; neither statistic exceeds it, so we cannot reject the hypothesis that the Iranian digits are uniformly distributed.
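Written out, the calculation behind those numbers is Pearson's statistic over the ten digits. Here $O_d$ is the observed count of digit $d$, $N$ is the number of vote counts, $E_d$ is the count expected under uniformity and $\nu$ is the degrees of freedom:

```latex
\begin{aligned}
\chi^2 &= \sum_{d=0}^{9} \frac{(O_d - E_d)^2}{E_d}, \qquad E_d = \frac{N}{10}, \qquad \nu = 10 - 1 = 9 \\
\chi^2_{\text{last}} &= 11.125 < 16.92 \approx \chi^2_{0.95,\,9} \\
\chi^2_{\text{second-to-last}} &= 4.875 < 16.92
\end{aligned}
```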
It's an intriguing idea that election fraud could be detected just by looking at the numbers, but it also seems to me that you could cherry-pick your data to support whatever viewpoint you already hold.
For example, in my analysis the UK election results do not follow Benford's Law but the Iranian ones do. Which is fraudulent? Either, both, neither?
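For comparison, a first-digit Benford's Law check can be sketched the same way. This hedged Python sketch is not necessarily the exact method behind my earlier posts. It assumes Benford's expected proportion log10(1 + 1/d) for leading digits 1 to 9, drops zero counts, and uses the 5% critical value of 15.51 for 8 degrees of freedom. The function names are hypothetical.

```python
import math
from collections import Counter

# Benford's Law: leading digit d (1-9) appears with probability log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Chi-squared critical value, 8 degrees of freedom (9 digits - 1), 5% level
CRITICAL_8DF_5PCT = 15.51

def first_digit(count: int) -> int:
    """Leading digit of a positive vote count."""
    return int(str(count)[0])

def benford_chi_squared(counts):
    """Chi-squared statistic comparing leading digits of counts to Benford's Law."""
    positive = [c for c in counts if c > 0]  # zero has no leading digit 1-9
    n = len(positive)
    observed = Counter(first_digit(c) for c in positive)
    return sum((observed[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())

def looks_benford(counts):
    """True if a Benford distribution cannot be rejected at the 5% level."""
    return benford_chi_squared(counts) <= CRITICAL_8DF_5PCT
```

Powers of two are a classic Benford-distributed sequence, so `looks_benford([2 ** k for k in range(1000)])` returns True, while a list of identical counts such as `[5] * 100` fails.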
Also, my analysis shows that both the UK and the Iranian elections have randomly distributed last digits. Is either of them fraudulent? Or neither?
I think what's needed is a large-scale analysis of election results to see where and when different mathematical tests work. Otherwise the preconditions for a given test aren't established (e.g. in the UK election Benford's Law probably fails because of the redistribution of constituencies) and you can end up finding your favorite conclusion in the data.