Thursday, June 01, 2006

CAPTCHAs fool humans 20% of the time

Over at SpamOrHam I use a CAPTCHA from to prevent malpeople from using bots to mess up the results of the web site.

There's only one problem with this plan.

People enter the CAPTCHA wrongly about 20% of the time.

Looking at the error logs for SpamOrHam shows that the site has offered 27,468 CAPTCHAs, of which 5,326 (19.39%) have been entered incorrectly. I'm not tracking whether "incorrectly" means that the text typed was wrong or that the person didn't even bother to enter anything, but, nevertheless, a 20% error rate is very high.
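A minimal sketch of how blank and wrong answers could be tallied separately, along with the arithmetic behind the 19.39% figure. The function names are hypothetical and this is not SpamOrHam's actual code; it also assumes an exact, case-sensitive comparison after trimming whitespace.

```python
from collections import Counter
from typing import Optional


def classify(expected: str, typed: Optional[str]) -> str:
    """Sort one CAPTCHA submission into 'blank', 'wrong' or 'correct'."""
    if typed is None or typed.strip() == "":
        return "blank"  # nothing entered at all
    # Exact, case-sensitive comparison after trimming surrounding whitespace.
    return "correct" if typed.strip() == expected else "wrong"


def failure_rate(offered: int, failed: int) -> float:
    """Percentage of offered CAPTCHAs that were not answered correctly."""
    return 100.0 * failed / offered


# The figures from the post: 5,326 failures out of 27,468 CAPTCHAs offered.
print(f"{failure_rate(27468, 5326):.2f}%")  # 19.39%

# Made-up submissions, showing blank and wrong answers counted separately.
tally = Counter(
    classify(expected, typed)
    for expected, typed in [("f1O0d", "f1O0d"), ("f1O0d", "flO0d"), ("f1O0d", "")]
)
print(tally)
```

Keeping "blank" as its own bucket is what would let the 20% be split into people who mistyped and people (or bots) who never answered.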

For me it shows up as complaints and as people who give up on the site. That sucks, but it's currently the only way to protect the site against bots.

What's needed is a comparative study of the different ways of generating CAPTCHAs to figure out which ones are both effective against bots and easy for humans!


Michael Clark said...

How about switching over to a math problem in place of the very-very-slow CAPTCHA? Have a few different mathematical questions, and use images for the numbers and the symbol.
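A rough sketch of the question-and-answer half of this idea, assuming single-digit operands and the operators +, - and *. Rendering the numbers and the symbol as images, as suggested, is not shown; make_question and check_answer are hypothetical names.

```python
import operator
import random

# Operators offered in questions; kept small so answers stay easy for people.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_question(rng: random.Random) -> tuple:
    """Return a question such as '7 + 3' together with its integer answer."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    symbol = rng.choice(list(OPS))
    if symbol == "-" and b > a:
        a, b = b, a  # avoid negative answers
    return f"{a} {symbol} {b}", OPS[symbol](a, b)


def check_answer(expected: int, typed: str) -> bool:
    """Accept the answer only if it parses as the expected integer."""
    try:
        return int(typed.strip()) == expected
    except ValueError:
        return False  # non-numeric input such as 'ten'


rng = random.Random()
question, answer = make_question(rng)
print("What is", question, "?")
```

Because the answer is a plain number, there are no look-alike letters to confuse, though a bot that can read the images could of course solve the sum.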

I stopped contributing because the CAPTCHA system was taking many minutes to load. That could be affecting your stats if a failure to enter anything counts as a failure.

Newtronic said...

I can't rise above about 90% accuracy because of stupid things like thinking the digit one is a lower-case l, or not being able to tell an upper-case O from a lower-case one.

Manni said...

These are shocking results. I always thought that CAPTCHAs suck in terms of usability and, especially, accessibility, but 20% is a catastrophic result (even though I'd estimate that I fail to answer CAPTCHAs correctly some 10% of the time).

The Oddmuse wiki is now using the QuestionAsker-Extension and it seems to do pretty well.

Shaper said...

It's just a thought, but you don't differentiate anywhere between "people who enter CAPTCHAs wrongly" and "bots that fail CAPTCHA tests".

Could it be that most of those "failures" are actually the CAPTCHA doing its job?

Shalmanese said...

Wouldn't the obvious thing be to make the CAPTCHA more lenient?

i.e., if you had f1O0d, it would also accept the same string with look-alike characters swapped (1 for l, O for 0, and so on).
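A minimal sketch of such a lenient check. The look-alike groups 1/l/I/| and 0/O/o are an assumption, chosen to cover the confusions Newtronic describes; both the expected and the typed string are mapped to a canonical form before comparing.

```python
# Hypothetical groups of characters that people commonly confuse.
# Every member of a group is mapped to the group's first character.
CONFUSABLE_GROUPS = ["1lI|", "0Oo"]

_CANON = {ch: group[0] for group in CONFUSABLE_GROUPS for ch in group}


def normalise(text: str) -> str:
    """Map look-alike characters to one representative, ignoring outer spaces."""
    return "".join(_CANON.get(ch, ch) for ch in text.strip())


def lenient_match(expected: str, typed: str) -> bool:
    """True if the typed text equals the CAPTCHA once look-alikes are merged."""
    return normalise(typed) == normalise(expected)


print(lenient_match("f1O0d", "fl00d"))  # True
print(lenient_match("f1O0d", "fIood"))  # True
print(lenient_match("f1O0d", "flOOb"))  # False
```

The trade-off is that merging look-alikes shrinks the set of distinct answers, making blind guessing slightly easier for a bot; an alternative is for the generator never to use those characters in the first place.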

Alex Schroeder said...

I'm running the Oddmuse wiki Manni linked to. I just looked at yesterday's log files, where I tracked all the answers. I'm only protecting a few well-known pages, not all the pages on my wiki, which explains the low numbers.

grep Q: error.log.1 | wc -l              # 239 questions asked
grep "A: ''" error.log.1 | wc -l         # 10 blank answers
grep "A: 'http://" error.log.1 | wc -l   # 229 answers that were URLs

So only 10 of the 239 attempts did not answer the question (could be humans or bots), and 229 of the 239 pasted a URL into the field: clearly bots at work.

Thus my stats seem to be looking good.
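The same counts can be reproduced in a few lines of Python. This sketch assumes, as the grep patterns suggest, that each question is logged on a line containing "Q:" and each answer on a line containing "A: '...'"; the sample lines are made up and the real log format may differ. Anything that is neither blank nor a URL is what would deserve a closer look as a possible human mistake.

```python
from collections import Counter


def tally_log(lines) -> Counter:
    """Count matching lines, mirroring the three grep | wc -l commands."""
    counts = Counter()
    for line in lines:
        if "Q:" in line:
            counts["questions"] += 1
        if "A: ''" in line:
            counts["blank"] += 1  # empty answer
        if "A: 'http://" in line:
            counts["url"] += 1  # answer starts with a URL
    return counts


def other_answers(counts: Counter) -> int:
    """Answers that were neither blank nor a URL.

    Assumes exactly one logged answer per logged question.
    """
    return counts["questions"] - counts["blank"] - counts["url"]


# Made-up log lines for illustration only.
sample = [
    "Q: hypothetical question",
    "A: ''",
    "Q: hypothetical question",
    "A: 'http://spam.example/'",
]
counts = tally_log(sample)
print(counts, other_answers(counts))

# To run it on a real file:
# with open("error.log.1", encoding="utf-8", errors="replace") as f:
#     print(tally_log(f))
```

With the figures quoted above, 239 - 10 - 229 leaves 0 such answers.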

JoeChongq said...

I am glad you posted this. I have been wondering about this data for a while. Is there any way you can separately track the blank ones from the incorrect ones in the future?

I believe 20% is probably pretty accurate, at least for this implementation, since some letters are too similar. But unless you separate the blank ones and obvious bot attempts, people are going to question the stats.

General spam bot attacks are unlikely to affect the results since there is no text entry form and the form page doesn't appear in search engines. But as you have already shown, there are idiots that want to damage this project by using a bot.