Made more changes to try and deter SPAMMERS
kal
Forum Administrator



Joined: 06 Mar 2006
Posts: 17850
Location: Ottawa, Canada

TV/Projector: JVC DLA-NZ7


Posted: Wed Apr 07, 2010 8:11 pm


I've changed our CAPTCHA to an even better one called "reCAPTCHA" which has better security but seems easier to read, has a refresh button in case the words are hard to read, and an audio playback option. See: http://recaptcha.net/

It's the same one that AVS uses. reCAPTCHA is owned by Google!

Best part is that it's helping digitize old books. The words are scanned out of old books and by having people enter them they're doing automatic OCR'ing on them! (Pretty cool if you ask me).




Quote:

Teaching computers to read: Google acquires reCAPTCHA
9/16/2009 09:20:00 AM



The image above is a CAPTCHA — you can read it, but computers have a harder time interpreting the letters. We tried to make it hard for computers to recognize because we wanted to give humans the scoop first, but we're happy to announce to everybody now that Google has acquired reCAPTCHA, a company that provides CAPTCHAs to help protect more than 100,000 websites from spam and fraud.

Since computers have trouble reading squiggly words like these, CAPTCHAs are designed to allow humans in but prevent malicious programs from scalping tickets or obtain millions of email accounts for spamming. But there’s a twist — the words in many of the CAPTCHAs provided by reCAPTCHA come from scanned archival newspapers and old books. Computers find it hard to recognize these words because the ink and paper have degraded over time, but by typing them in as a CAPTCHA, crowds teach computers to read the scanned text.

In this way, reCAPTCHA’s unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR). This technology also powers large scale text scanning projects like Google Books and Google News Archive Search. Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users. So we'll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process.

That's why we're excited to welcome the reCAPTCHA team to Google, and we're committed to delivering the same high level of performance that websites using reCAPTCHA have come to expect. Improving the availability and accessibility of all the information on the Internet is really important to us, so we're looking forward to advancing this technology with the reCAPTCHA team.

Posted by Luis von Ahn, co-founder of reCAPTCHA, and Will Cathcart, Google Product Manager


Kal

_________________

Support our site by using our affiliate links. We thank you!
My basement/HT/bar/brewery build 2.0
kschmit2




Joined: 09 Mar 2006
Posts: 1141
Location: Heidelberg, Germany


Posted: Thu Apr 08, 2010 7:56 am

How exactly does that help them recognize computer-illegible text?

If a computer cannot decipher the word, then it cannot be used as a captcha, because the computer won't know if what you typed is actually what can be seen on screen.

Captchas only work when the computer already knows the answer.
huggy




Joined: 02 Aug 2008
Posts: 927
Location: Melbourne,Australia


Posted: Thu Apr 08, 2010 8:17 am

Kal
The 5 post thingy is great. May I suggest that, to avoid new users clogging up threads with "post count" posts, you start a new "sticky" thread just for that purpose?
This is how it's done in our local DTV forum and works well.


Dave
AnalogRocks
Forum Moderator



Joined: 08 Mar 2006
Posts: 26690
Location: Toronto, Ontario, Canada

TV/Projector: Sony 1252Q, AMPRO 4000G


Posted: Thu Apr 08, 2010 1:34 pm

huggy wrote:
Kal
The 5 post thingy is great. May I suggest that, to avoid new users clogging up threads with "post count" posts, you start a new "sticky" thread just for that purpose?
This is how it's done in our local DTV forum and works well.


Dave


Can you elaborate on this?

_________________
Tech support for nothing

CRT.

HD done right!
kal
Forum Administrator



Joined: 06 Mar 2006
Posts: 17850
Location: Ottawa, Canada

TV/Projector: JVC DLA-NZ7


Posted: Thu Apr 08, 2010 2:10 pm

kschmit2 wrote:
How exactly does that help them recognize computer-illegible text?

If a computer cannot decipher the word, then it cannot be used as a captcha, because the computer won't know if what you typed is actually what can be seen on screen.

Captchas only work when the computer already knows the answer.

Good point. I don't know! It does seem like a catch-22, doesn't it? Something here obviously works (it is Google after all... :)). We're just missing the piece of the puzzle that explains how it works...

Ok, here's the explanation:

Quote:
Our apparatus, called “reCAPTCHA,” is used by more than 40,000 Web sites (6) and demonstrates that old print material can be transcribed, word by word, by having people solve CAPTCHAs throughout the World Wide Web. Whereas standard CAPTCHAs display images of random characters rendered by a computer, reCAPTCHA displays words taken from scanned texts. The solutions entered by humans are used to improve the digitization process. To increase efficiency and security, only the words that automated OCR programs cannot recognize are sent to humans. However, to meet the goal of a CAPTCHA (differentiating between humans and computers), the system needs to be able to verify the user’s answer. To do this, reCAPTCHA gives the user two words, the one for which the answer is not known and a second “control” word for which the answer is known. If users correctly type the control word, the system assumes they are human and gains confidence that they also typed the other word correctly (Fig. 1). We describe the exact process below.
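
The control-word check described above can be sketched roughly as follows. This is only an illustration, not reCAPTCHA's actual code: the `Challenge` class, the `check_response` function, and the case-insensitive comparison are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Challenge:
    # Hypothetical structure: the known spelling of the control word and
    # whether it is shown before the unknown word (order is randomized).
    control_answer: str
    control_first: bool


def check_response(challenge: Challenge, typed: str) -> Optional[str]:
    """Return the user's guess for the unknown word if the control word
    was typed correctly, otherwise None (the user is not let through)."""
    words = typed.split()
    if len(words) != 2:
        return None
    if challenge.control_first:
        control, unknown = words
    else:
        unknown, control = words
    # Assumption: comparison ignores letter case.
    if control.lower() != challenge.control_answer.lower():
        return None
    return unknown
```

A returned guess is only a "plausible guess" for the unknown word. It still has to be confirmed by other users before it is trusted.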

We start with an image of a scanned page. Two different OCR programs analyze the image; their respective outputs are then aligned with each other by standard string matching algorithms (7) and compared to each other and to an English dictionary. Any word that is deciphered differently by both OCR programs or that is not in the English dictionary is marked as “suspicious.” These are typically the words that the OCR programs failed to decipher correctly. According to our analysis, about 96% of these suspicious words are recognized incorrectly by at least one of the OCR programs; conversely, 99.74% of the words not marked as suspicious are deciphered correctly by both programs. Each suspicious word is then placed in an image along with another word for which the answer is already known, the two words are distorted further to ensure that automated programs cannot decipher them, and the resulting image is used as a CAPTCHA. Users are asked to type both words correctly before being allowed through. We refer to the word whose answer is already known as the “control word” and to the new word as the “unknown word.” Each reCAPTCHA challenge, then, has an unknown word and a control word, presented in random order. To lower the probability of automated programs randomly guessing the correct answer, the control words are normalized in frequency; for example, the more common word “today” and the less common word “abridged” have the same probability of being served. The vocabulary of control words contains more than 100,000 items, so a program that randomly guesses a word would only succeed 1/100,000 of the time (8). Additionally, only words that both OCR programs failed to recognize are used as control words. Thus, any program that can recognize these words with nonnegligible probability would represent an improvement over state of the art OCR programs.
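
The "suspicious word" rule from this paragraph could look something like the sketch below. It assumes the two OCR outputs have already been aligned word by word (the paper uses string matching algorithms for that step, which is skipped here); `mark_suspicious` and its parameters are made-up names for illustration.

```python
def mark_suspicious(ocr_a, ocr_b, dictionary):
    """Return the positions of words that the two OCR passes read
    differently, or that are not in the dictionary."""
    if len(ocr_a) != len(ocr_b):
        # Assumption of this sketch: alignment was done beforehand.
        raise ValueError("OCR outputs must be aligned word by word")
    suspicious = []
    for i, (a, b) in enumerate(zip(ocr_a, ocr_b)):
        if a != b or a.lower() not in dictionary:
            suspicious.append(i)
    return suspicious


words = {"the", "quick", "brown", "fox"}
first_pass = ["the", "qu1ck", "brown", "fxo"]
second_pass = ["the", "quick", "brown", "fxo"]
print(mark_suspicious(first_pass, second_pass, words))  # [1, 3]
```

Position 1 is flagged because the two passes disagree, and position 3 because both passes agree on a word that is not in the dictionary.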

To account for human error in the digitization process, reCAPTCHA sends every suspicious word to multiple users, each time with a different random distortion. At first, it is displayed as an unknown word. If a user enters the correct answer to the associated control word, the user’s other answer is recorded as a plausible guess for the unknown word. If the first three human guesses match each other, but differ from both of the OCRs’ guesses, then (and only then) the word becomes a control word in other challenges. In case of discrepancies among human answers, reCAPTCHA sends the word to more humans as an “unknown word” and picks the answer with the highest number of “votes,” where each human answer counts as one vote and each OCR guess counts as one half of a vote (recall that these words all have been previously processed by OCR). In practice, these weights seem to yield the best results, though our accuracy is not very sensitive to them (as long as more weight is given to human guesses than OCR guesses). A guess must obtain at least 2.5 votes before it is chosen as the correct spelling of the word for the digitization process. Hence, if the first two human guesses match each other and one of the OCRs, they are considered a correct answer; if the first three guesses match each other but do not match either of the OCRs, they are considered a correct answer, and the word becomes a control word. To account for words that are unreadable, reCAPTCHA has a button that allows users to request a new pair of words. When six users reject a word before any correct spelling is chosen, the word is discarded as unreadable. After all suspicious words in a text have been deciphered, we apply a postprocessing step because human users make a variety of predictable mistakes (see supporting online text). From analysis of our data, 67.87% of the words required only two human responses to be considered correct, 17.86% required three, 7.10% required four, 3.11% required five, and only 4.06% required six or more (this includes words discarded as unreadable).
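
The vote counting can be illustrated with a small sketch. The weights (1 per human guess, 0.5 per OCR guess) and the 2.5 threshold come from the text above; the function name `resolve_word`, its return shape, and the simplification of stopping at the first guess to reach the threshold are assumptions. The "six rejections" rule and the postprocessing step are not modeled.

```python
from collections import defaultdict

HUMAN_VOTE = 1.0
OCR_VOTE = 0.5
THRESHOLD = 2.5


def resolve_word(ocr_guesses, human_guesses):
    """Return (spelling, becomes_control_word) once some guess reaches
    THRESHOLD votes, or None if more human answers are still needed."""
    votes = defaultdict(float)
    for guess in ocr_guesses:
        votes[guess] += OCR_VOTE
    for i, guess in enumerate(human_guesses):
        votes[guess] += HUMAN_VOTE
        if votes[guess] >= THRESHOLD:
            # Promotion to control word: the first three human guesses
            # agree with each other and with neither OCR program.
            becomes_control = guess not in ocr_guesses and i == 2
            return guess, becomes_control
    return None


# Two humans agree with one OCR program: 0.5 + 1 + 1 = 2.5 votes.
print(resolve_word(["tbe", "the"], ["the", "the"]))
# Three humans agree, both OCR programs were wrong: 3 votes.
print(resolve_word(["abndged", "abrldged"], ["abridged"] * 3))
```

The first call resolves the word without promoting it, while the second both resolves it and marks it as a new control word, matching the two cases spelled out in the paragraph.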


Read the whole article here: http://recaptcha.net/reCAPTCHA_Science.pdf

It's quite ingenious because not only is it a more secure CAPTCHA, it can also be used for good. This is probably why the PhDs at Google took note and bought the company. ;)

Kal

AnalogRocks
Forum Moderator



Joined: 08 Mar 2006
Posts: 26690
Location: Toronto, Ontario, Canada

TV/Projector: Sony 1252Q, AMPRO 4000G


Posted: Thu Apr 08, 2010 2:22 pm

Those annoying spammers still sign up though. I've nuked 3 posts since last night. Pricks!
kschmit2




Joined: 09 Mar 2006
Posts: 1141
Location: Heidelberg, Germany


Posted: Thu Apr 08, 2010 6:07 pm

thx Kal, interesting read.

I should have come up with that :)
kal
Forum Administrator



Joined: 06 Mar 2006
Posts: 17850
Location: Ottawa, Canada

TV/Projector: JVC DLA-NZ7


Posted: Thu Apr 08, 2010 6:15 pm

kschmit2 wrote:
thx Kal, interesting read.

I should have come up with that :)


That's what I always say too. ;) Usually the best ideas are so deceptively simple that you wonder why nobody thought of them first.

Kal

garyfritz




Joined: 08 Apr 2006
Posts: 12024
Location: Fort Collins, CO


Posted: Thu Apr 08, 2010 7:02 pm

That is so cool!! What a brilliant use of resources.
ecrabb
Forum Moderator



Joined: 13 Mar 2006
Posts: 15909
Location: Utah

TV/Projector: JVC RS40, Epson 5010


Posted: Thu Apr 08, 2010 7:19 pm

Absolutely amazing. Brilliant is right.

SC
huggy




Joined: 02 Aug 2008
Posts: 927
Location: Melbourne,Australia


Posted: Thu Apr 08, 2010 8:17 pm

AnalogRocks wrote:
huggy wrote:
Kal
The 5 post thingy is great. May I suggest that, to avoid new users clogging up threads with "post count" posts, you start a new "sticky" thread just for that purpose?
This is how it's done in our local DTV forum and works well.


Dave


Can you elaborate on this?


One thread with the sole purpose of letting new users get their post count up to 5; they can fill it up with whatever.

This is what I mean:
http://www.dtvforum.info/index.php?showtopic=44129&hl=red+text+thread


Dave
All times are GMT