Turing test

Why CAPTCHAs are evil

Have you ever signed up for an internet forum or web app? Chances are, you've seen a CAPTCHA: a little image of distorted letters demanding that you prove you are human. Or is that really what it's asking? Perhaps, instead, it's asking you to prove that you're sighted.

Phoney Security

CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) are supposed to prevent automated programs from posting spam messages on public forums. The idea is a kind of Turing test that computers are assumed to be unable to pass.

Advantage: OCR

This assumption is far from the truth. Optical character recognition (OCR) has gotten pretty good at reading distorted text, and the various attempts at making it more difficult usually cause more problems for humans than for computers. Noise added to the image can be filtered out, and colour tricks can be normalized away.
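To make that concrete, here's a minimal sketch of the kind of preprocessing an attacker might run before handing the image to an OCR engine. It assumes the Python Pillow imaging library, and the file names are placeholders:

    from PIL import Image, ImageFilter

    # Open the challenge image and flatten its colours to greyscale.
    img = Image.open("captcha.png").convert("L")

    # Hard threshold: anything darker than mid-grey is treated as ink.
    img = img.point(lambda p: 0 if p < 128 else 255)

    # A small median filter wipes out isolated speckles of noise.
    img = img.filter(ImageFilter.MedianFilter(3))

    img.save("cleaned.png")  # ready to hand to an OCR engine

A real attack would tune these steps to the specific CAPTCHA, but the point stands: the distortions are easier for software to undo than for a human with low vision to see through.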

Let's assume, then, that the standard "what letters do you see?" CAPTCHAs can be broken by computers with a little work. As an article in WIRED discussing Ticketmaster's CAPTCHAs makes clear, large-scale automated circumvention has been lucrative for years now.

How easy is it for humans?

There are a number of cases where CAPTCHAs actively discriminate. Are you blind? Tough luck. We can forward you to an audio version, which is even worse. Are you dyslexic? If you have trouble reading words in a consistent typeface, how easy is it to read deformed letters? Not so easy. What about people with poor motor control? They may need to use a toggle switch to enter the letters one at a time. That's a major barrier just to sign up for an account.

The Arms Race

The next step in the arms race is to present the user with a set of images and ask them to categorize them. This steps out of the realm of OCR, a mapping of image to characters, and into knowledge systems. It requires domain-specific, and potentially culture-specific, knowledge.

This brings additional problems, like when users are asked to select all the phone images. Telephones from the 1980s and earlier, especially early cordless and mobile phones, look nothing like telephones today. Iconography relies heavily on cultural knowledge, and can easily lead to problems.

The Mechanical Turk

The other problem, of course, is that there may be little distinction between malicious computer programs and systems that employ humans. An automated system can serve up CAPTCHA images in real time to human agents, who solve them and pass the answers back. Perhaps this doesn't scale as well as a purely automated solution, but the fundamental weakness is still there.
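A toy version of that relay shows how little machinery it needs. Everything here, from the queue layout to the 30-second timeout, is invented for illustration:

    import queue

    jobs = queue.Queue()     # challenge images waiting for a human
    answers = queue.Queue()  # solved text coming back

    def bot_submit(image_bytes):
        # Called by the automated system whenever it hits a CAPTCHA.
        jobs.put(image_bytes)
        return answers.get(timeout=30)  # block until a worker answers

    def human_worker():
        # Run by each human agent in the pool.
        while True:
            image = jobs.get()  # display the image to the worker here
            answers.put(input("Type the letters you see: "))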

The Recommended Solutions

The W3C Web Content Accessibility Guidelines recommend, very broadly, a text-based cognitive test, offered as an alternative to either a visual or an audio test. I tried using an audio CAPTCHA once, but it interfered with the JAWS screen reader I was testing at the time.
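For what it's worth, a text-based cognitive test can be as simple as the sketch below. The question format is my own invention, not anything the guidelines prescribe:

    import random

    def make_question():
        # Generate a simple arithmetic question and its expected answer.
        a, b = random.randint(1, 9), random.randint(1, 9)
        return f"What is {a} plus {b}?", str(a + b)

    question, expected = make_question()
    # Render the question as plain text in the form,
    # then compare the visitor's reply against expected.

Of course, this inherits the same flaws: it's trivial for software to solve, and it simply shifts the discrimination onto a different group of users.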

Oh, the irony!

The irony is that many users who rely on assistive technologies also use browser plugins to help bypass CAPTCHAs. They take advantage of the fact that CAPTCHAs don't work in order to gain access.

I would argue that the concept of CAPTCHAs is fundamentally flawed. As CAPTCHA technology becomes more sophisticated, so do the attacks, and with each stage of the arms race, successfully completing these tasks becomes more difficult for humans. CAPTCHAs don't address the real problem. The problem isn't determining whether a human is physically present at the end of a web browser; the problem is behaviour-based. What particular behaviour is a CAPTCHA intended to prevent? Spam messages on a forum? Unsolicited bulk email? Why not address those issues directly, rather than with a CAPTCHA?

Solving the Real Problem

There is existing technology for grading messages with heuristics; spam filters have gotten quite good. If messages from new members of your site are also held in a moderation queue, a human can make an additional check for appropriate content.
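As a sketch of what that might look like, here the suspect-word list, the weights, and the threshold are all invented for illustration:

    SUSPECT_WORDS = {"viagra", "casino", "lottery"}  # illustrative only

    def spam_score(message, link_count):
        # Crude heuristic: proportion of suspect words, plus a penalty per link.
        words = message.lower().split()
        hits = sum(w.strip(".,!?") in SUSPECT_WORDS for w in words)
        return hits / max(len(words), 1) + 0.1 * link_count

    def handle_post(message, link_count, author_is_new):
        if spam_score(message, link_count) > 0.3:
            return "reject"
        if author_is_new:
            return "hold for moderation"  # a human reviews new members' content
        return "publish"

Real spam filters use far better statistics than this, but note what's absent: at no point does the user have to prove anything about their eyesight.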

You can also analyze browsing behaviour. What patterns emerge? For a social site, how much time is spent on existing content? For an auction site, how much time has been spent looking at different items? For certain types of sites, such as local auction sites, there may not be enough behaviour to analyze adequately, especially if users don't need to create an account in order to participate.
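A behavioural check could start as simply as the sketch below. The event names and the 20-second threshold are hypothetical stand-ins for whatever patterns a real site would learn from its own traffic:

    from datetime import timedelta

    def suspicious(events):
        # events: list of (action, timestamp) pairs recorded for one visitor.
        views = [t for action, t in events if action == "view_content"]
        if not views:
            return True  # went straight to the form without reading anything
        dwell = max(t for _, t in events) - min(t for _, t in events)
        return dwell < timedelta(seconds=20)  # posted within seconds of arriving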

Why are CAPTCHAs so prevalent, then? They're available as third-party libraries, and they don't require any changes to system architecture. They're a band-aid on the wrong problem.

Language Matters and Computational Linguistics

A cover of Computational Linguistics

I'm rather disappointed with the second edition of Language Matters, by Donna Jo Napoli and Vera Lee-Schoenfeld. Published in 2010, the second edition makes some minor updates to the earlier 2003 edition, along with adding some new material.

Chapter 7, "Can computers learn language?", received only minor edits, updating the references in the examples: the term VCR becomes DVR. The examples themselves, however, have not changed, and neither has the conclusion.

The two examples they use are:

1) Record "Law and Order" at 9 P.M. on Channel 10.

2) If there's a movie on tonight with Harrison Ford in it, then record it. But if it's American Graffiti, then don't bother because I already have a copy of that.

As Napoli notes (Lee-Schoenfeld was not involved in the first edition), the second task would involve asking the computer "to scan a list of TV programs, recognize which ones are movies, filter out the particular movie American Graffiti, determine whether Harrison Ford is an actor in the remaining movies, and then activate the 'record' function on the DVR at all the appropriate times on all of the appropriate channels" (Language Matters, 2nd ed., p. 99). Napoli goes on to suggest that "we'd be asking the computer to work from ordinary sentences, extracting the operations and then properly associating them with the correct vocabulary items, a much harder task" (Language Matters, 2nd ed., p. 99).

Of interest here is that Napoli's summary does not follow the lexical and linguistic structure of the command. In particular, Napoli filters out American Graffiti before performing any search for Harrison Ford. This seems strange to me, as the first step in parsing the statement would be the same whether done by a linguist or a software parser: parse the first sentence before attempting to add context from the second.

While Napoli and Lee-Schoenfeld make several bold, definitive statements throughout the text that I found lacking in support, in this case they seem to dismiss the problem as a "much harder task". That statement may have gotten a bare pass in 2003, but in 2010 it's a harder sell. Admittedly, the Jeopardy! showdown with IBM's Watson had not yet occurred, but in a text revision I would expect some level of research to validate these claims. There are several journals on computational linguistics available, such as Computational Linguistics itself, which has been Open Access since March of 2009.

In particular, the example given above is domain-specific. It deals with television-specific language, for which there are databases of the relevant terms, such as movie titles and casting information.
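Given such databases, the filtering itself reduces to a few lines once the request has been parsed. The data shapes below are invented, but the steps mirror the ones Napoli lists:

    listings = [
        {"title": "Witness", "kind": "movie", "channel": 10, "start": "21:00"},
        {"title": "American Graffiti", "kind": "movie", "channel": 4, "start": "20:00"},
        {"title": "Evening News", "kind": "news", "channel": 2, "start": "18:00"},
    ]
    cast_db = {
        "Witness": {"Harrison Ford", "Kelly McGillis"},
        "American Graffiti": {"Richard Dreyfuss", "Harrison Ford"},
    }

    to_record = [
        p for p in listings
        if p["kind"] == "movie"                                 # recognize the movies
        and p["title"] != "American Graffiti"                   # skip the copy we own
        and "Harrison Ford" in cast_db.get(p["title"], set())   # check the cast
    ]
    for prog in to_record:
        print("record channel", prog["channel"], "at", prog["start"])

The genuinely hard part is mapping the ordinary English request onto these steps, and that is exactly where the limited domain helps.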

Even before Watson, I would not have considered a problem of this scope extraordinarily difficult, primarily because of the limited domain. While a more general domain would increase the difficulty considerably, current research looks increasingly hopeful. Computers are still not ready to pass the Turing test, but there are indications that this may happen in the relatively near future.

Language Matters is a very accessible text, introducing many aspects of language and linguistics to readers without much experience in the field. Aside from the chapter on computers and language, the book provides a good introduction to a number of topics. I only wish that, in the revision process, the authors had revisited some of their conclusions about an active field of research.