Have you ever bought tickets online and had to decipher one of those scrambled, hard-to-read words before you could check out? That funny-looking word is called a "captcha," and it is meant to confirm that you are a human and not a machine programmed to game the system. The distortion is designed so that a person can still read the word and type it back on a keyboard, while software has a hard time recognizing it.
It turns out that this process does more than separate you from a machine. This article in today's New York Times explains that these captchas serve a second purpose: each one is a word from an old text that an OCR (Optical Character Recognition) program was unable to recognize. Such words are siphoned off and shown to people as captchas. When you type the word, your answer is funneled into a computer program that compares the letters you typed with the letters other people typed for the same word. After a few other quality-control checks (such as looking at the words just before and after the unknown one for context), the program settles on the identity of the previously unreadable word. According to the article, this method is more accurate than a single typist deliberately checking each word, and it costs almost nothing. Millions of words are deciphered this way every day.
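For the curious, here is a rough sketch of the "compare what everyone typed" idea as a simple majority vote. This is my own illustration, not the actual program the article describes: the function name `resolve_word` and the `min_votes` and `min_agreement` thresholds are made up, it compares whole words rather than individual letters, and it skips the context check entirely.

```python
from collections import Counter


def resolve_word(answers, min_votes=3, min_agreement=0.75):
    """Hypothetical consensus step: pick the transcription most people agree on.

    Returns the agreed-upon word, or None if there are too few answers
    or no single answer is popular enough yet.
    """
    # Normalize each answer so "Upon " and "upon" count as the same vote.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough people have seen this word yet

    # Find the most common answer and how many people typed it.
    word, count = Counter(cleaned).most_common(1)[0]
    if count / len(cleaned) >= min_agreement:
        return word
    return None  # answers disagree too much; keep showing the word


print(resolve_word(["upon", "upon", "Upon ", "apon"]))  # upon
print(resolve_word(["upon", "apon", "opon"]))  # None
```

The point of waiting for agreement is that any one person may mistype, but when several strangers independently type the same thing, that answer is very likely right.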