Today BBC News made me aware of an upcoming book and associated blog about data visualization. Flicking through the blog, I was instantly repelled by the horrible data visualizations presented as something to be proud of.
For example, consider the following snippet:
It purports to show you which countries are the most dangerous to fly to by charting the 'density' of fatal aircraft accidents by destination country. This visualization is wrong on so many levels that it's hard not to laugh:
1. The term 'density' is not defined, and in fact the author isn't showing a density at all, just the raw totals by country.
2. The underlying data is incorrect. The diagram specifically says that it uses 'fatal accidents' drawn from this database. Unfortunately, the person doing the visualization has used the total number of 'incidents' (fatal and non-fatal accidents). For example, he gives a value of 75 for the number of fatal accidents in Ecuador, whereas the database gives 38. The same applies for all the other countries.
3. He uses circles with dots in the middle to represent the value for each country. So is the value being plotted proportional to the radius of these circles or to the area? Not clear. For example, try to compare Russia and Canada; they look about the same. Now look at Russia and India: India looks smaller to me. So what's the truth?
Using his incorrect figures we have Russia with 626 accidents, India with 456 and Canada with 452. So, by the chart's own logic, Russia is about 1.38x as dangerous a destination as Canada, while India and Canada are nearly identical. Can you spot that from the diagram?
4. Take a look at Europe. Can you figure out which countries he's indicating in the diagram? It's almost impossible.
5. The US comes out as the top country in terms of fatal accidents with 2613 (actually, it's 1088), but that fails to take into account the one important thing: how many flights are there, or even how many people actually fly? The US could be the most dangerous if there are few flights and lots of them are fatal, but it could even be the safest if there are lots and lots of flights. What you actually want to answer is 'what's the probability of me dying if I fly to the US?' One way to calculate that would be total number of flights with a fatality / total number of flights; another way would be total number of fatalities / total number of passengers. Either way, just knowing the total number of crashes doesn't tell you much.
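To put rough numbers on points 3 and 5, here is a minimal Python sketch. The accident counts are the ones quoted above. The flight totals and the function names are my own assumptions, invented purely for illustration; they are not taken from the database or from the diagram.

```python
import math

# Accident counts quoted above (the chart's own, incorrect, figures).
counts = {"Russia": 626, "India": 456, "Canada": 452}


def value_ratio(a, b):
    """How many times larger country a's count is than country b's."""
    return counts[a] / counts[b]


def radius_ratio_if_area_scaled(a, b):
    """Ratio of circle radii if circle AREA is proportional to the count."""
    return math.sqrt(counts[a] / counts[b])


def fatal_accident_rate(fatal_accidents, total_flights):
    """Fatal accidents per flight: the first ratio suggested in point 5."""
    if total_flights <= 0:
        raise ValueError("total_flights must be positive")
    return fatal_accidents / total_flights


# Point 3: the same data gives very different circles depending on
# whether the radius or the area carries the value.
print(f"Russia vs Canada, count ratio:            {value_ratio('Russia', 'Canada'):.2f}")
print(f"Russia vs Canada, radius if area-scaled:  {radius_ratio_if_area_scaled('Russia', 'Canada'):.2f}")

# Point 5: raw totals versus rates. The accident counts (1088 and 38) are
# the database figures quoted above; the flight totals are HYPOTHETICAL,
# made up only to show that the ranking can flip.
big_country = fatal_accident_rate(1088, 100_000_000)
small_country = fatal_accident_rate(38, 1_000_000)
print(f"Many accidents, many flights: {big_country:.2e} per flight")
print(f"Few accidents, few flights:   {small_country:.2e} per flight")
```

If the circles are scaled by area, a 1.38x difference in the count shrinks to roughly a 1.18x difference in radius, which may be why Russia and Canada look about the same. And with these made-up flight totals, the country with far fewer fatal accidents comes out as the riskier one per flight, which is exactly why the raw totals on their own say so little.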
So the diagram charts the wrong statistics, uses the wrong underlying data, and then presents it in a way that's hard to interpret. And this is what gets a book published?