Tuesday, August 31, 2010

A Geek Atlas reader writes from Austria

I received the following mail from one Thomas Kober from Vienna:

First of all, thank you for writing the Geek Atlas. I just returned from a journey to Paris with the Geek Atlas in my bag and I was not disappointed. Without it, I probably would not have spent many exciting and cheerful hours wandering through the city and looking for more Arago Medallions.

Your book also inspired me to look for places myself, that were not in the Geek Atlas, and publish them on my blog. I already found some interesting places, e.g. the grave of Urbain Le Verrier in the cemetery of Montparnasse or the statue of 16th century Flemish mathematician Simon Stevin in the old town of Bruges.

I am also writing hands-on reviews of the places I visited with the Geek Atlas. I also wanted to give you some practical information I found out about them which you might consider helpful or interesting. The Observatory in Paris is closed through the whole of August (which was when I was visiting Paris, my fault as I did not check the home page beforehand).

And the Musee des Arts et Metiers is just amazing. It's one of the best museums I have ever been to and, best of all, it was free (it's free for every EU citizen aged 25 or below). To see Pascal's machines was a very special moment.

I would be happy if you find the time to visit my blog and maybe I can inspire you to visit one of the places I discovered as the Geek Atlas is inspiring me (The Simon Stevin statue will be my first post, which I will publish soon).

My next travels will be to Brno to visit the Mendel Museum and to Bratislava to perhaps discover a geeky place that you have missed :-)

Kind regards from Vienna,

Thomas Kober

PS: I rushed to the grave of Ludwig Boltzmann right on the day I bought the Geek Atlas, as although I am living in Vienna for almost 10 years, I didn't know about the equation on his grave...

I love to get mail like that. Anyone else had a great Geek Atlas experience?

Also, here's Thomas' first blog entry on the statue of Simon Stevin. Looks marvelous!

Friday, August 20, 2010

Keynote Speaker. What should I talk about?

So, I'm going to crowd source this one and see if there are any good topics you'd like to hear about.

I've just been invited to be a keynote speaker at one of the major technology conferences (I'd say the name but I don't want to upset the organizers until I've cleared it with them). It's on the West Coast next summer and is a conference I've attended and really respect (i.e. it's not an industry-fest where marketers talk to each other).

The topic of my talk is completely open.

So, dear readers. What should I talk about? If you think back on all the blog content what would you most like to hear more about?

Thursday, August 19, 2010

Solving Kevin Rose's email problem

Yesterday, Kevin Rose wrote about his email problem.

My stats:
938 unread work emails.
1002 unread personal emails.

The madness has to stop. What was once a 30 minute annoyance is now my full-time job.

And he followed up with suggestions on how to deal with this problem. I faced a similar problem about 10 years ago when I was receiving a ton of varied email that needed sorting. My solution was to create POPFile.

POPFile became popular as a spam filter, but what I made it for was to sort my flood of mail into arbitrary categories of my choosing using the same technology (Naive Bayesian text classification) that became popular for spam filtering. It's POPFile that got me invited by Paul Graham to the MIT Spam Conference.

Here's what the POPFile site has to say:

POPFile is an automatic mail classification tool. Once properly set up and trained, it will scan all email as it arrives and classify it based on your training. You can give it a simple job, like separating out junk e-mail, or a complicated one - like filing mail into a dozen folders. Think of it as a personal assistant for your inbox.

I've been using it for years. Most recently I have it connect to my GMail account using IMAP and automatically label my messages. By changing labels I can teach POPFile when it makes mistakes. I've toyed with the idea of making this a paid service (especially since I have a really fast version of POPFile written in C that I sell to people), but I think the market's too small.
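To make the idea concrete, here is a minimal sketch of Naive Bayesian sorting into arbitrary categories. It is not POPFile's code: the class name, the tokenizer, the add-one smoothing and the four training messages are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesSorter:
    """Toy multi-category naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.message_counts = Counter()          # category -> messages seen

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, category, text):
        """Record one message as belonging to the given category."""
        self.message_counts[category] += 1
        self.word_counts[category].update(self.tokenize(text))

    def classify(self, text):
        """Return the category with the highest log-probability."""
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_messages = sum(self.message_counts.values())
        best_category, best_score = None, -math.inf
        for category, seen in self.message_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior: how common this category is overall.
            score = math.log(seen / total_messages)
            for word in self.tokenize(text):
                # Add-one smoothing so an unseen word doesn't zero the score.
                score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


sorter = NaiveBayesSorter()
sorter.train("work", "meeting agenda for the quarterly budget review")
sorter.train("work", "please review the budget spreadsheet before the meeting")
sorter.train("personal", "dinner on saturday with the family")
sorter.train("personal", "family holiday photos from saturday")

print(sorter.classify("budget meeting moved to monday"))   # work
print(sorter.classify("photos from the family dinner"))    # personal
```

Correcting a mistake is just another call to `train` with the right category, which is roughly what changing a label does in the setup described above.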

Just how many people have a POPFile-sized problem?

Wednesday, August 18, 2010

Tony Blair is overpaying for blood

The average human body contains about 5 litres of blood. When you donate blood a total of 470 ml is removed. From the 470 ml a single unit of blood (which consists of 450 ml) is extracted and made available to save someone else's life. So, a single human body contains roughly 11 units of blood.

According to the National Blood Service a unit of blood costs about £130. So a single human body contains about £1,430 worth of blood. As I write the BBC says that 331 British servicemen and women have died in Afghanistan and 179 in Iraq.

This week Tony Blair announced that he was planning to donate the £4.6m advance for his memoirs to the Royal British Legion. So, that's about £9,000 per serviceman or woman ordered into war by Blair who has given their life to protect Britain. He's seriously overpaying. Perhaps it's a penance ahead of the Pope's upcoming visit.
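For anyone who wants to check the sums so far, here is a quick sketch of the arithmetic using only the figures quoted above. The variable names are mine, and rounding the units per body down to 11 follows the "roughly 11" in the text.

```python
BLOOD_IN_BODY_ML = 5000      # "about 5 litres"
UNIT_ML = 450                # usable unit extracted from a 470 ml donation
COST_PER_UNIT_GBP = 130      # National Blood Service figure quoted above
ADVANCE_GBP = 4_600_000      # reported memoir advance
DEATHS = 331 + 179           # Afghanistan + Iraq, per the BBC figures above

units_per_body = round(BLOOD_IN_BODY_ML / UNIT_ML)    # 11.1 rounds to 11
blood_value_gbp = units_per_body * COST_PER_UNIT_GBP
donation_per_death_gbp = ADVANCE_GBP / DEATHS

print(units_per_body, blood_value_gbp, round(donation_per_death_gbp))
# prints: 11 1430 9020
```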

If Blair really wanted to make a gesture he'd walk into a National Blood Service location and donate a unit of his own blood, instead of a drop of his fortune. Of course, Blair has already made a double donation to the Royal British Legion: first the injured soldiers, then the money to help them.

For my part I'll be doing what I've done for years: giving blood as often as I can and buying a poppy this winter to support the Royal British Legion.

And if you want to support the same charity consider the following. Blair's book costs £25 of which the publisher will probably get 40% of which Blair will likely get 15%. So, instead of buying the book, donate £1.50.

That's a good amount to give when buying a remembrance poppy as well. Come this November I'll look at those poppies and think of them as little signs that say "I donated to help one of Tony's injured soldiers instead of buying his autohagiography."

Clunk, click, every trip

Working in Mayfair in London I get to see a lot of wealthy people in the back seats of their Bentleys. And they are never, ever wearing a seat belt. That's illegal in the UK, but I imagine that these wealthy scofflaws understand that they are unlikely to get into trouble that actually hurts them (the fine for not wearing the belt is £500).

But you can't be a law of physics scofflaw: if you are unrestrained in a car then inertia's going to get you. Just ask Princess Diana. You do not want to be flying around inside a metal box.

Which brings me to taxis. Why do people not wear seat belts in the back of a taxi? When entering a taxi recall Scotty saying "A canna' change the laws of physics, Captain". Put the seat belt on.

And having flown a lot and seen a stewardess go flying when we hit unexpected turbulence: wear the seat belt on a plane all the time. In fact, I think there ought to be three signs in a plane: seat belt (always illuminated), no smoking (always illuminated) and a new "it's safe to get up" sign that goes on and off and is a replacement for the current seat belt sign.

And if you need any more convincing, just ask Jimmy Savile.

An interview with me about The Geek Atlas

John Baichtal interviewed me recently about The Geek Atlas:

John Baichtal: What is the most notable omission from the first book?

John Graham-Cumming: Hard to say because there's not one glaring omission that people bring up consistently. Everyone's got a favorite place and some aren't in the book. Lots of people wanted more NASA locations, but that would probably fill an entire Geek Atlas! Lots of people wanted more sites in Asia. If there's a second book I'd include some optical telescopes because for some reason (a mystery even to the author) there are none in the first book.

You can read the full interview here.

Monday, August 16, 2010

Is The Times making you stupid?

On Saturday I made a horrible mistake and bought a copy of The Times. In it was one of the stupidest articles I have ever read in a major newspaper. Its title? Is the internet making us stupid? It's thousands of words of utter drivel claiming that:

For the past five centuries, ever since Gutenberg’s printing press made book reading a popular pursuit, the linear, literary mind has been at the centre of art, science and society. As supple as it is subtle, it’s been the imaginative mind of the Renaissance, the rational mind of the Enlightenment, the inventive mind of the Industrial Revolution, even the subversive mind of Modernism. It may soon be yesterday’s mind.

Let me begin with a detailed, thoughtful critique: bollocks. Seriously though, this 'linear mind' (which we apparently got from books) is the source of the Renaissance, Enlightenment and Industrial Revolution? Surely, it's a complete lack of linearity, as in lateral thinking, that's given us the world we live in.

But the article is far worse than that simple paragraph. Let's start at the beginning:

Over the past few years I’ve had an uncomfortable sense that someone, or something, has been tinkering with my brain, remapping the neural circuitry, reprogramming the memory. My mind isn’t going — as far as I can tell — but it’s changing.

I’m not thinking the way I used to think. I feel it most strongly when I’m reading. I used to find it easy to immerse myself in a book or a lengthy article. My mind would get caught up in the twists of the narrative or the turns of the argument and I’d spend hours strolling through long stretches of prose. That’s rarely the case any more. Now my concentration starts to drift after a page or two. I get fidgety, lose the thread, begin looking for something else to do. I feel as though I’m always dragging my wayward brain back to the text. The deep reading that used to come naturally has become a struggle.

The author is Nicholas Carr. According to Wikipedia Mr Carr was born in 1959 and thus is now 51. Like me, Mr Carr is aging; unlike me, Mr Carr seems to be blaming changes in the operation of his mind on the Internet. I understand this: it's a way of avoiding talking about death and deterioration. And Mr Carr has found a wonderful way of dealing with this denial: he's written an entire book shouting at the Internet. (The Times article is pre-publicity for his new book called The Shallows, which develops the "Internet is messing up your brain" theme.)

He follows up his personal problems with the following survey:

Maybe I’m an aberration. But it doesn’t seem that way. When I mention my troubles with reading to friends, many say that they are suffering from similar afflictions.

Great, that's what's called homophily. You surround yourself with people with the same opinions and tastes. His friends are also likely his age and starting to see the effects of aging as well.

After quoting various people who've noticed that the Internet is used differently from a book, we get the following, stunningly wrong insight:

The net engages all our senses — except, so far, those of smell and taste — and it engages them simultaneously.

This is where I begin to wonder if something has fried Mr Carr's brain's ability to think clearly. "The net engages all our senses" (of which we have 5), "except, so far, those of smell and taste" (so just 3 then). The three senses are: sight, hearing and touch. Yes, the Internet engages sight all the time, but hearing is only part of the time and touch... well I guess if you count the feeling of the mouse in my hand.

Compare that with TV which engages sight and hearing all the time and touch through the remote control.

But he continues:

The net commands our attention with far greater insistency than our television or radio or morning newspaper ever did. Watch a kid texting his friends or a college student looking over the roll of new messages and requests on her Facebook page or a businessman scrolling through his e-mails on his BlackBerry. What you see is a mind consumed by a medium.

No, that's what Mr Carr sees. I see a kid communicating with his friends (via text message which has nothing to do with the Internet), a college student doing what college students do (organizing her social life) and a businessman worrying about a deal, or office politics, or what the day will bring.

But don't stop him now:

When we’re online we’re often oblivious to everything else going on around us.

Just like when we get into a good book. Oh sorry, that negates the argument about the Internet killing the book. I'll shut up.

As the psychotherapist Michael Hausauer notes, teenagers and other young adults have a “terrific interest in knowing what’s going on in the lives of their peers, coupled with a terrific anxiety about being out of the loop.”

And? That's hardly new. Teenagers have been doing that forever. When I was growing up it was yacking for hours on the phone tying up the one line the family had.

The net is, by design, an interruption system, a machine geared for dividing our attention.

I don't even know what this means, but it sounds great. I guess he'll use this line over and over again to pump the book. The net (by which he means the web, I assume) is not an interruption system (by design). Go grab Tim Berners-Lee and ask him what he was designing at CERN. He was designing an environment for scientists to navigate one of the most difficult and rich areas of science: particle physics.

Websites routinely collect detailed data on visitor behaviour, and a 2008 study found that in most countries people spend, on average, between 19 and 27 seconds looking at a page before moving on.

Funny how he doesn't compare that to how long it takes to read a page in a book and how many words there are on a web page. He continues (and I'll stop quoting) about the dangers of multitasking. These dangers seem real to me, but they aren't the Internet's fault.

And then he finishes:

What the net diminishes is the ability to know, in depth, a subject for ourselves, to construct within our own minds the rich and idiosyncratic set of connections that give rise to a singular intelligence.

It's a horrible thought. But is it true? Mr Carr has failed to convince me; perhaps I'll have to buy his book.

UPDATE. In a comment below it's pointed out that the article in The Times is very, very similar to one that appeared in the Atlantic Monthly. Happily, that article is not behind a paywall. You can read it here.

UPDATE. A reader reminded me that Socrates was worried about the impact of written arguments many centuries ago. People have been wailing about technology spoiling everything for a long time.

Friday, August 13, 2010

The best bit in "Cyber War"

Cyber War: The Next Threat to National Security and What To Do About It is Richard A. Clarke's fascinating book about threats to the US that come from computer networks.

Here's a shocking section:

In November 2008, a Russian-origin piece of spyware began looking around cyberspace for dot-mil addresses, the unclassified NIPRNET. Once the spyware hacked into NIPRNET computers, it began looking for thumb drives and downloaded itself onto them. Then the "sneakernet effect" kicked in. Some of those thumb drives were then inserted by their users into classified computers on the SIPRNET.

[...] Because the secret network is not supposed to be connected to the Internet, it is not supposed to get viruses and worms. Therefore, most of the computers on the network had no antivirus protection, no desktop firewalls or similar security software. In short, computers on DoD's most important network had less protection than you probably have on your home computer.

Within hours, the spyware had infected thousands of secret-level US military computers in Afghanistan, Iraq, Qatar, and elsewhere in the Central Command.

Recent nonsensical song lyrics

Seriously, what's the world coming to? Or at least the popular music world? Used to be that the most nonsensical lyrics came from the likes of Frank Sinatra going "doobie doo". But what about these three monstrosities?

Ke$ha in Tik Tok: pedicure on our toes. Where the hell else are you going to have a pedicure? Also, who gets up in the morning and brushes their teeth with a bottle of Jack? Just because they are not coming back. Huh? Brush your teeth properly if you're not coming back.

Alicia Keys in Empire state of mind (Part II) Broken Down: Concrete jungle where dreams are made of. Made of what? Or did she mean that dreams are made of? Dangling preposition as well. You can't just string clichés together you know. Oh wait...

Lady Gaga in Telephone: And I cannot text you with a drink in my hand, eh…. Huh? What, she can hold the phone in one hand, but not text? Does she have a claw or something?

Look if you're going to have ridiculous lyrics at least be Meat Loaf and claim to do almost anything for love except that. Ok?

Thursday, August 12, 2010

iPhone girls are easy

(and the boys, too).

Yesterday, a story did the rounds about how iPhone users have more sex than Android or Blackberry users. The story was repeated as iPhone users have more sex, when the reality is the survey says more sexual partners (but they, too, headlined it as more sex).

There's a vast difference between the two.

The data shows that a woman with an iPhone has twice as many sexual partners as a woman with an Android phone by age 30. For men it's a 66% increase in sexual partners.

Wednesday, August 11, 2010

How many clicks does it take?

To take over someone's life?

Not many. If you look at a number of recent high profile hacks (Sarah Palin, Twitter) you'll notice that the key vector of attack is via email. Email is the Achilles' heel of all our virtual lives. If you can control someone's email, you can control a large part of their life. And since email has moved from living on a single computer to living in the cloud, email attacks are easier than before.

If it's not obvious why, consider the following uses of email:

1. Private communications - email is used where letters used to be and for scheduling. If you could access someone's email you'd have a lot of information about where they'd be at what time. Also, if they are having an affair it is quite likely that it'll leave a trace in their email.

2. Social network - the contact information in the person's email gives you who they communicate with. This is vital information for social engineering attacks against the person. It's also useful information for spear-phishing since you could fake a mail from a trusted person.

3. Passwords - Many (most?) companies provide password reset by sending an email to the user. If you control the person's email you can reset their password on almost any other service and obtain access to those services.

4. Financial Services, Utilities - Most companies that used to mail you paper statements are now offering electronic versions. With access to their statements you have a lot of information to mess with them in real life.

So, how credible is an email takeover?

I recently discovered an interesting attack vector for GMail and Google Apps for Your Domain. Google has a feature that allows a user to link their Google accounts together for single sign in. This feature works with GMail and Google Apps for Your Domain.

If a user has a personal GMail account and a company Google Apps account and opts for single sign in they create a vulnerability where a malicious sysadmin could take over their personal account. Sysadmins are able to control the passwords of users in Google Apps. A sysadmin could change the password of any user they wish to target through the Google Apps control panel.


Clearly, they'd need a pretext, but I'm sure they could find one (such as 'company policy' that passwords be changed every n-weeks).

Once changed they can log in as the user. This will arouse little suspicion because Google Apps will report a log in from the same IP as the user normally logs in. Google would only say in the footer that they were logged in more than once from the same IP (and only during the attack).

From the corporate account the evil sysadmin can jump to the personal account. Clearly, resetting this password will raise suspicion so the attacker sets up an automatic forward of all mail to an email address they control (such as another GMail account). This can be done using a filter (say forwarding all mail containing the word 'the'):


Or using the Forwarding feature:


Conveniently, automatic forwarding doesn't leave a trace in the user's Sent Items. When a message is forwarded manually, a copy of the forwarded message is attached to the original in a standard GMail thread, so a record is visible to the attacked user; automatic forwarding leaves no such record. Unless the user examined their settings carefully, this attack would likely go unnoticed for a long time.

The attacker is then free to read the attacked person's mail through the account it is forwarded to, or even forward it on to create a chain to frustrate attempts to figure out who was behind the attack.

My take: protect your email with your life (almost). Get a really good password and do not link your accounts. If your email is vulnerable, so are you.

UPDATE: A number of people have told me that this problem (the Google issue) isn't real because they can't reproduce it. I originally saw this happen before the recent introduction of the new single sign in from Google, so perhaps it has been addressed and I am incorrect about the specific details.

UPDATE: Now verified why this was happening. The user had explicitly used their Google Apps account and added GMail to it from that account. That caused the new GMail account to be logged in when going to Google Apps. Thus it's incorrect to say that this is caused by the linking feature. Apologies.

Tuesday, August 10, 2010

Percentage of top grossing US films by decade that depict killing

I obtained a list of top grossing US films from this site and then figured out whether they depict killing of people (or of animals if the animals can speak) and charted it by decade. Here's the chart:


Draw your own conclusions.
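The tabulation behind a chart like this is simple enough to sketch. The function name and input shape below are assumptions of mine, and the sample rows are placeholders to show the format, not the real film list or my actual classifications.

```python
from collections import defaultdict


def percent_killing_by_decade(films):
    """films: iterable of (release_year, depicts_killing) pairs."""
    totals = defaultdict(int)
    killing = defaultdict(int)
    for year, depicts_killing in films:
        decade = (year // 10) * 10   # e.g. 1978 -> 1970
        totals[decade] += 1
        if depicts_killing:
            killing[decade] += 1
    return {d: 100.0 * killing[d] / totals[d] for d in sorted(totals)}


# Placeholder rows purely to show the shape of the input; NOT the real data.
sample = [(1971, True), (1975, False), (1978, True), (1979, True),
          (1982, False), (1988, True)]
print(percent_killing_by_decade(sample))
# prints: {1970: 75.0, 1980: 50.0}
```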

Monday, August 09, 2010

The Bayesian / Fibonacci VC

For a while I worked for a venture capital firm in California. The partners there were fantastic and I learnt a great deal from them and about how venture capital deals come together. Since I was a geek at a major VC firm I got access to all the startups in the area I was covering (at the time this was web services).

But I observed one amusing piece of VC behavior that you should watch out for: they sometimes act like a Fibonacci sequence in motion. Although VCs do a lot of technical and financial analysis on the startups they are considering investing in, and they spend a great deal of time digging into the team, they also, by nature, have to change with the times.

Part of changing with the times is continually updating their knowledge of the computer software and hardware marketplace and the trends that are being created in the minds of leading thinkers in Silicon Valley. That means they are constantly updating their view of the world by some hidden Bayesian process. They have some prior view of the world which is updated by talking to people. And since they are VCs they get to speak to the best people.

But they also sometimes go Fibonacci: the thing they are talking about now turns out to be some combination of things they recently heard. I saw this happen once in an amusing fashion where the CEO of a company told me that VC X had been asking him about whether he had 1/1 meetings with his team members. It seemed like an odd question.

I happened to know that the CTO of a different firm had told VC X that in his company 1/1s were non-existent and that this was creating a team morale problem. VC X had no reason to think that there was a problem in the other firm, he was just acting on f(n-1) (or perhaps f(n-1) + f(n-2)).
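The contrast between the two modes can be sketched in a few lines. This is a toy under stated assumptions: the function names, the probabilities and the example topics are invented for illustration, and nobody is claiming a VC's head literally runs either function.

```python
def bayesian_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior belief in a hypothesis after one piece of evidence (Bayes' rule)."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1.0 - prior))


def fibonacci_topic(heard):
    """'Going Fibonacci': f(n) = f(n-1) + f(n-2), the last two things heard, combined."""
    return heard[-1] + " + " + heard[-2]


# Bayesian: a prior of 0.5 nudged by evidence twice as likely if the idea is true.
belief = bayesian_update(0.5, 0.8, 0.4)
print(round(belief, 3))   # 0.667

# Fibonacci: no prior at all, just whatever came up in the last two meetings.
print(fibonacci_topic(["cloud pricing", "1/1 meetings"]))
# prints: 1/1 meetings + cloud pricing
```

The Bayesian version weighs new evidence against everything believed so far; the Fibonacci version has no memory beyond the last two inputs, which is why its output can seem to come out of left field.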

So, next time you are in front of a VC and he says something totally out of left field consider that he might have gone Fibonacci instead of Bayesian on you. You can probably ignore what he's saying. Or you might have just got an unexpected insight into one of the other firms that VC is dealing with.

Sunday, August 08, 2010

Shut up and ship

Over the weekend I got to hear about an attempt to avoid Internet censorship called Haystack. I thought on a technical level it might be interesting to read about how they want to get around the Iranian government's web filtering. It's an interesting topic because evading the Chinese government's firewall has been discussed in some technical circles for a while.

Alas, the Haystack web site has zero technical details. Worse, they plan to keep their software closed source. So, there's no way of evaluating their claim that their amazing software will help Iranian citizens evade Internet filtering in Iran. That hasn't stopped them getting in Newsweek and asking you to send them donations.

Now, it may well be the case that these folks are onto something, but I wouldn't trust a closed source piece of vaporware if I were trying to evade a government (any government). IMHO, the gold standard for hiding stuff from prying governmental eyes is PGP. It's open source and its design was discussed heavily in public and has been vetted. Or how about TrueCrypt? Open source, publicly vetted.

Worryingly, Haystack's only 'technical' detail is the following: "We use state-of-the-art elliptic curve cryptography to ensure that these communications cannot be read." Fair enough, but frankly that means nothing. They could be using AES, or RSA, or pretty much any good algorithm and I still wouldn't care. Two reasons: their implementation might be rubbish and enable attacks or their cryptography might be irrelevant because another technique (traffic analysis?) might make breaking Haystack possible. After all, all the Iranian government needs is a list of people running the software.

(Actually, using ECC might be a net negative. You don't really want to be messing around with something that's relatively (in crypto-years) new, patent encumbered, and slow. Using ECC indicates that the people behind Haystack are either incredibly knowledgeable about cryptography or the opposite.)

And then there's the 'genius' (at least that's what Newsweek makes him out to be) who designed this software. His CV touts his degree in marketing and extensive experience with PHP. I guess he might have a hidden crypto background but I'm also guessing he's no Phil Zimmermann. I realize readers might be uncomfortable with an ad hominem criticism, but without any code or technical details all I can go on is the technical chops of the person behind Haystack.

Of course, there's a simple solution to my criticisms: shut up and ship. Ship an open source version of your code and let's take a look at it. Let the Iranian government have a look at it. Then we'll know if it's vaporware or regime-changing ware.

I had similar feelings about Diaspora who raised $200k in donations without showing a line of code. All they had to do was aspire to take on Facebook (with a privacy angle).

If it isn't clear, I detest this "get lots of press for my vaporware project, get people to donate, then work on something (or not)" approach.

Shut up and ship.

But perhaps I should give Austin Heap (Haystack's mastermind) the final word:

“I hope we are ready to take on the next country,” he replied. “We will systematically take on each repressive country that censors its people. We have a list. Don’t piss off hackers who will have their way with you. A mischievous kid will show you how the Internet works.”

I think I just threw up in my mouth a little.

Friday, August 06, 2010

An unexpected benefit of HN's noprocrast option

Some time ago I decided to enable Hacker News' noprocrast option with the default settings. I am allowed 20 minutes of time on Hacker News every three hours. I did this because I was spending too much time on the site, especially in the evenings in London when the conversation was active in Pacific Time.
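The rule is easy to model. Here is a minimal sketch, not Hacker News' actual implementation: the class, the parameter names `maxvisit` and `minaway`, and the assumption that the away period starts when the visit window closes are all mine.

```python
class Noprocrast:
    """Toy visit limiter: maxvisit minutes of browsing, then minaway minutes locked out."""

    def __init__(self, maxvisit=20, minaway=180):
        self.maxvisit = maxvisit
        self.minaway = minaway
        self.visit_started = None   # minute at which the current visit began

    def may_browse(self, now):
        """now: current time in minutes. Returns True if the page may be shown."""
        cycle = self.maxvisit + self.minaway
        if self.visit_started is None or now - self.visit_started >= cycle:
            self.visit_started = now   # a fresh visit window opens
            return True
        return now - self.visit_started < self.maxvisit


limiter = Noprocrast()
print([limiter.may_browse(t) for t in (0, 10, 25, 199, 200)])
# prints: [True, True, False, False, True]
```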

What I expected to happen was that when I could read Hacker News, I'd be desperate to get through as many stories as possible to consume as much Hacker News as I could before my time was up. But the opposite happened.

I visit Hacker News and then I can't find anything interesting to read. Or one or two things will be worth it and the rest will be dross. I truly was procrastinating reading stuff that wasn't really interesting after all.

The other effect is that I've found myself reading conversations before articles. Since I know that I can read an article at my leisure, but that I have 20 minutes to take part in a conversation I'll quickly go to the conversations that interest me, and leave the article for another time.

Overall, noprocrast has been a winner for me. The only thing that's suffered is my karma value.

Thursday, August 05, 2010

Within DIY reach: flying killer robots

It seems to me that it should be possible for a smart hobbyist to build a flying robot that recognizes people and kills them. Now, that might be a horrific idea, but not thinking about horrific ideas doesn't make them go away. What I'd like to illustrate in this blog post is how a number of technologies could be brought together in a lethal fashion by someone with good computer skills.

First, flying autonomous robots. There's been a tremendous amount of work done on DIY drone aircraft (for example, there's a whole community over at DIY Drones building pacifist flying robots). The key to these is a combination of GPS and IMU. Add to that laser range finding and you've got the makings of a flying robot that can navigate indoors and outdoors.

GPS gives you location and altitude (and other sensors can give you close range altitude if you need to land on a floor, for example). GPS modules are cheap. A cheap IMU gives you six axes of movement (three axes from a gyroscope giving you orientation and three axes from an accelerometer giving you acceleration). So with that you can navigate outdoors. All you need is an autopilot and those are also cheap.

Now, if you want to fly indoors then you are going to need a laser range finder so you don't bump into things. Put all that together with a quadcopter and now you've got something that can fly around indoors and navigate. Here's a little video of that.



Next you need to be able to spot people. Anyone with a recent digital camera or who's used Facebook will have noticed that it's possible to recognize people's faces quite easily. If we want to attack people then we can go with a simple face detection algorithm such as Viola Jones and it would be possible to find the middle of the person's face using a cheap camera mounted on the flying robot. Luckily, there's open source software that can do the face recognition for you.

It seems to me that it's a small step from that to shooting them.

So you could probably build a 'smart grenade' that could be tossed into a room and have it shoot people without blowing everything up. Of course, a grenade might be simpler and wouldn't be fooled by someone hiding their face from the robot.

And, of course, if I can imagine building such a device someone with time and greater resources can make this a reality. That means that the days of robots that kill us will soon be upon us.

PS Yes, this was a rather grim post. Regular readers will know that I post on a wide variety of topics (image forgery detection, Ikea train sets, Alan Turing, the Fermilab code, breaking Facebook's crypto keys, climate change and more). I can't prevent an active mind from thinking about some dark things amongst all the rest.

Wednesday, August 04, 2010

On the release of scientific source code

I was asked by The Times for commentary on the idea that releasing scientific source code could have negative effects:

This view is countered by programmer John Graham-Cumming, who found coding errors after trying to reproduce the CRU/Met Office's CRUTEM and HadCRUT global warming datasets. Working from the raw data released by the Met Office and the description of their process for generating the datasets in a scientific paper he decided to validate their work - a considerable effort that required writing code to implement the algorithm described in the paper. In doing so, he found a problem with the way the error ranges were calculated (amongst other errors), stemming from a bug in their code.

He says: "You could say that by not releasing their buggy code they forced me to find the bug in it by writing my own validation. But actually, if they'd released their code I would have been able to quickly compare the code and the paper and find the bug without the massive effort to write new code. And no one else had actually done this validation (including the Muir Russell review) and as a result the Met Office has been releasing incorrect data for a long time. Perhaps that's because the validation was so hard in the first place, whereas having code to check would have been easy."

The rest is here.

Tuesday, August 03, 2010

Off continent backups

Over at the day job I'm responsible for the IT side of things (inter alia), and one of the things I care about greatly is backups. Lots of them.

At previous startups where I've worked backups have been a total pain. It used to be the case that I'd end up with some DAT tape machine and a service company to rotate the tapes, and getting at backed-up information was difficult and slow. I wanted these backups taken off site so that in the case where the building was destroyed by fire, we'd still have a copy of our code.

No longer. Now I have my backups off continent.

I'm using two separate services for backup: JungleDisk and tarsnap. Both back up to Amazon S3 so my backups made in London end up in the US somewhere encrypted using keys that I've chosen.

I'm using JungleDisk for desktop backups. Not everyone does these, but some people, like the CEO, generate a lot of documents and presentations that are in their My Documents folders on their laptops. It's essential that these roaming machines get backed up as that material isn't stored on our servers.

For the really heavy lifting I use tarsnap. Since the engineers are committing their code to a central repository we back up all our servers into a single tarsnap account from a simple cron job. For server backup tarsnap is ideal. It's a typical UNIX command line application that transmits and stores information both efficiently and securely.
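As a rough idea of what such a cron-driven job looks like, here is a minimal sketch that builds a `tarsnap -c -f <archive> <paths>` command line. The helper name, the host-plus-date archive naming and the example paths are assumptions for illustration, not our actual setup.

```python
import datetime
import shlex
import socket


def tarsnap_command(paths, host=None, today=None):
    """Build a tarsnap create-archive command: tarsnap -c -f <archive> <paths...>."""
    host = host or socket.gethostname()
    today = today or datetime.date.today()
    # One archive per host per day; tarsnap archive names must be unique.
    archive = f"{host}-{today.isoformat()}"
    return ["tarsnap", "-c", "-f", archive, *paths]


command = tarsnap_command(["/etc", "/srv/repos"], host="web1",
                          today=datetime.date(2010, 8, 3))
print(shlex.join(command))
# prints: tarsnap -c -f web1-2010-08-03 /etc /srv/repos
# A script run nightly from cron would then execute it, for example with
# subprocess.run(command, check=True).
```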

I've recovered files from the JungleDisk backup when a laptop got damaged, and done a test recovery from tarsnap. Both are good at what they do. I can't recommend JungleDisk for servers because its UNIX configuration is a nightmare; I can't recommend tarsnap for the average desktop because it's too UNIXy. But the combination meets all our needs.

And in the worst case where the UK is destroyed, we'll move to the US and recover our backups from there.

Sunday, August 01, 2010