[captcha] How does Google reCAPTCHA v2 work behind the scenes?

A new paper has been released with several tests against reCAPTCHA:

https://www.blackhat.com/docs/asia-16/materials/asia-16-Sivakorn-Im-Not-a-Human-Breaking-the-Google-reCAPTCHA-wp.pdf

Some highlights:

  • By keeping a cookie active for +9 days (by browsing sites with Google resources), you can then pass reCAPTCHA by only clicking the checkbox;
  • There are no restrictions based on requests per IP;
  • The browser's user agent must be real, and Google run tests against your environment to ensure it matches the user agent;
  • Google tests if the browser can render a Canvas;
  • Screen resolution and mouse events don't affect the results;

Google has already fixed the cookie vulnerability and is probably restricting some behaviors based on IPs.

Another interesting finding is that Google runs a VM in JavaScript that obfuscates much of reCAPTCHA code and behavior. This VM is known as botguard and is used to protect other services besides reCAPTCHA:

https://github.com/neuroradiology/InsideReCaptcha

UPDATE 2017

A recent paper (from August) was published on WOOT 2017 achieving 85% accuracy in solving noCAPTCHA reCAPTCHA audio challenges:

http://uncaptcha.cs.umd.edu/papers/uncaptcha_woot17.pdf

UPDATE 2018

Google is introducing reCAPTCHA v3, which looks like a "human score prediction engine" that is calibrated per website. It can be installed into different pages of a website (working like a Google Analytics script) to help reCAPTCHA and the website owner to understand the behaviour of humans vs. bots before filling a reCAPTCHA.

https://www.google.com/recaptcha/intro/v3beta.html