CloudFlare, Tor, and eliminating CAPTCHAs


By Nathan Willis
October 5, 2016

The Tor project has long had to struggle for legitimacy against outsiders' presumptions that the network is used mostly for nefarious or unsavory purposes. One of the bigger problems stemming from such assumptions in recent months has been the ostensibly protective measures imposed by content-delivery networks (CDNs) when they encounter traffic from a Tor exit node. The Tor user is forced to complete a CAPTCHA puzzle, often multiple times, causing inconvenience and privacy concerns. The main offender in such scenarios has been the CloudFlare CDN, which even went so far as to publicly claim that the majority of Tor traffic is malicious. But now it seems that CloudFlare and Tor may be close to reaching an accord; the CDN company has published a specification for a new authentication scheme it claims will protect users' privacy while still protecting sites against harmful traffic.

There is no shortage of web sites that discriminate against visitors arriving through a Tor exit node; even high-profile sites like Wikipedia often block certain features (such as account creation) for Tor users. But CloudFlare's treatment of Tor traffic is different in several respects. First, CloudFlare blocks even read access to the sites on its CDN, at least until the user completes the CAPTCHA challenge. Second, because CloudFlare is, essentially, an invisible service layer, users have no way of knowing in advance that the link they click on will take them to a CloudFlare site. The CAPTCHA interruption is, therefore, a surprise annoyance. Third, CloudFlare developed a reputation for indiscriminately blocking all Tor traffic, which other CDNs (such as Akamai) did not.

Being forced to solve a CAPTCHA challenge before even viewing a web site is frustrating enough, of course, but Tor users reported that CloudFlare's CAPTCHA system would all too frequently trap them in an endless loop. It posed other problems as well, such as requiring that JavaScript be enabled and that the user be able to read English.

Flaring up

The CloudFlare problem has been irksome for several years, but matters became somewhat more heated in early 2016. In March, CloudFlare ran a post on the company blog that made some dubious claims (at least, in the eyes of Tor fans) about the legitimacy of Tor users. Most notable was the claim that:

Based on data across the CloudFlare network, 94% of requests that we see across the Tor network are per se malicious. That doesn’t mean they are visiting controversial content, but instead that they are automated requests designed to harm our customers.

The post goes on to say that CloudFlare does not treat Tor users any differently than it does other users whose requests originate from an IP address that had previously been tagged with a high "threat score." The company also has a dedicated FAQ entry reiterating this position, noting that Tor exit-node IPs "may earn a bad reputation"—essentially, framing the issue as just another part of the CDN's larger reputation-based filtering system.

For its part, the Tor project was not amused by the "94%" claim; a day after the CloudFlare post, Tor ran a response challenging the number on several grounds. First, the Tor post noted, CloudFlare has not described its reputation system in any detail, although it alluded to pulling data from Project Honey Pot, which uses test machines rather than collecting data from real web servers. Second, the post pointed to research showing that CloudFlare blocks more than 80% of Tor exit nodes and seems to put those nodes on a block list that is either permanent or is reevaluated only on a time frame longer than the study could determine. Finally, it pointed to research from Akamai that concluded that the traffic through Tor exit nodes was virtually indistinguishable from that of other IP addresses.

The project also published a fact sheet [PDF] outlining the impact that CloudFlare's blocking policy has on Tor users, citing examples such as a user in Vietnam who had to complete 30 CAPTCHAs before CloudFlare allowed him to view a particular web site.

Technical approaches

But even as the war of words was raging, there were parties from both sides working toward a technical solution. In January 2016, developers George Tankersley and Filippo Valsorda published an initial proposal for a scheme that would allow Tor users to skip the CloudFlare CAPTCHA challenges. Instead, the user agent (such as Tor Browser) would send a cryptographic CAPTCHA-bypass token from a supply that it acquires in a secure out-of-band manner.

In February, CloudFlare CTO John Graham-Cumming joined the discussion on the Tor issue tracker related to CloudFlare's policy. The debate that followed included quite a bit of back-and-forth; Tor developers, for instance, repeatedly pointed out that CloudFlare's statistics seemed to track the number of Tor exit nodes that had ever been associated with one or more malicious actions—a number that does not indicate what percentage of Tor traffic is malicious.

The Tor project members also pressed for more detail on the rather vague "threat" that CloudFlare claimed was posed by Tor exit nodes. Graham-Cumming and other CloudFlare employees were able to elaborate; spam commenting and attempts to harvest email addresses have been detected, it seems, and the company regards Tor exit nodes as a potential distributed-denial-of-service (DDoS) vector. There was pushback against that latter claim, however, with Tor developers noting that the entire Tor network's bandwidth is considerably smaller than that of most ISPs, making it unlikely to be chosen as a good DDoS platform. For his part, Graham-Cumming also responded to a number of technical requests, adding an option for CloudFlare customers to whitelist Tor for their own site, and tackling the CAPTCHA-loop problem.

Nevertheless, those changes were short-term fixes. Longer term, the team at CloudFlare decided to pursue the bypass-token proposal developed by Tankersley and Valsorda. On September 30, Alex Davidson, an intern at CloudFlare, announced the initial release of an implementation on the tor-access mailing list.

The technique originally described in January has two components. First, the client-side code involved is distributed as a browser extension rather than being delivered in the requested page as JavaScript. That enables a more robust code review, and it sidesteps several criticisms of CloudFlare's use of JavaScript CAPTCHAs (requiring users to enable JavaScript, for example, troubled many Tor users who prefer to browse without JavaScript).

The second and more significant component is the authentication scheme itself. Each user would first collect a set of anonymous bypass tokens by visiting a "challenge service" and completing a single CAPTCHA-like challenge. Subsequently, whenever the user's browser encounters a CDN proxy, it would automatically send a token that would grant it access to the CDN-provided site. The original proposal suggests batches of 10,000 tokens be distributed at once.
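The client side of that flow can be pictured as a small "wallet" of single-use tokens. The following Python sketch is illustrative only; the class name TokenWallet and its methods are hypothetical and are not taken from the proposal or from CloudFlare's code. Tokens are treated as opaque byte strings.

```python
from collections import deque


class TokenWallet:
    """Hypothetical client-side store of single-use bypass tokens."""

    def __init__(self):
        self._tokens = deque()

    def refill(self, tokens):
        """Add a batch of tokens obtained from the challenge service."""
        self._tokens.extend(tokens)

    def spend(self):
        """Return one token to send to a CDN proxy, or None if empty.

        A None result means the browser has to fall back to solving
        a CAPTCHA (and, presumably, collecting a fresh batch).
        """
        return self._tokens.popleft() if self._tokens else None

    def __len__(self):
        return len(self._tokens)


wallet = TokenWallet()
wallet.refill([b"token-1", b"token-2"])  # placeholder tokens
first = wallet.spend()
print(first, len(wallet))
```

Because each token is removed as soon as it is handed out, a page that triggers many token requests drains the wallet quickly, which is why the batch size matters.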

Each token would be in JSON Web Token format; it would include a nonce and would be signed by the private key of the challenge service, so that the proxy can verify its authenticity. Furthermore, when the browser sends the token to the CDN proxy, the token is encrypted with a different public key belonging to the proxy, to prevent eavesdroppers from intercepting valid tokens and using them. The tokens are also meant to be blinded before they are sent to browsers, a measure meant to prevent the challenge service from tracking users, although Tankersley and Valsorda did not detail the blinding algorithm in their original document.

The proposal suggests that CDN proxies include a <meta> tag of type "blind-captcha-bypass-request" in the requested page HEAD to alert clients that bypass tokens are accepted, along with a key fingerprint for the appropriate challenge service. Clients with tokens can then send one in an HTTP header when requesting the page BODY; clients without tokens could be served the usual CAPTCHA or handled with some other fallback mechanism. When the CDN proxy receives a valid token, it would mark the nonce value as used to prevent later replays.
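The proxy-side decision described above can be sketched in a few lines of Python. This is a minimal illustration, not the proposal's actual logic: the BypassProxy class, its handle() method, and the "allow"/"captcha" return values are all hypothetical, and signature checking is passed in as a callable so that the sketch makes no assumption about the real token format.

```python
from typing import Callable, Optional


class BypassProxy:
    """Hypothetical CDN-proxy check: valid signature plus unspent nonce."""

    def __init__(self, signature_is_valid: Callable[[bytes, bytes], bool]):
        # Assumed to verify the challenge service's signature on a nonce.
        self._signature_is_valid = signature_is_valid
        self._spent_nonces = set()

    def handle(self, nonce: Optional[bytes], signature: Optional[bytes]) -> str:
        # Clients without a token get the usual fallback.
        if nonce is None or signature is None:
            return "captcha"
        # Forged or corrupted tokens are rejected.
        if not self._signature_is_valid(nonce, signature):
            return "captcha"
        # A nonce that was already redeemed is a replay.
        if nonce in self._spent_nonces:
            return "captcha"
        # Mark the nonce as used before granting access.
        self._spent_nonces.add(nonce)
        return "allow"


# Stand-in verifier for demonstration purposes only.
proxy = BypassProxy(lambda nonce, signature: signature == b"good")
print(proxy.handle(b"nonce-1", b"good"))
print(proxy.handle(b"nonce-1", b"good"))
```

In this sketch the second request with the same nonce is refused, which is the replay protection the proposal describes. A real deployment would need to share the set of spent nonces among all of the CDN's proxies.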

The CloudFlare implementation adds a specific blinding protocol built on RSA signatures, although it suggests that an elliptic-curve blinding protocol described by Matthew Green might be preferable. Davidson also describes the token format with more specificity (namely that the nonce used is a random 30-byte sequence) and describes a JSON-based request-and-response protocol.
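For readers unfamiliar with blinding, the general idea behind RSA blind signatures can be shown with textbook RSA. The Python sketch below is not CloudFlare's protocol: the key is a toy built from two small Mersenne primes, there is no padding, and hashing the nonce with SHA-256 is an assumption made for illustration. All function names are hypothetical. It does show how a challenge service can sign a token without ever seeing the nonce that the proxy will later verify.

```python
import hashlib
import secrets
from math import gcd

# Toy RSA key built from two Mersenne primes; far too small for real use.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def hash_to_int(nonce: bytes) -> int:
    """Map a token nonce to an integer modulo N."""
    return int.from_bytes(hashlib.sha256(nonce).digest(), "big") % N


def blind(message: int):
    """Client: hide the message behind a random blinding factor r."""
    while True:
        r = secrets.randbelow(N - 2) + 2  # 2 <= r < N
        if gcd(r, N) == 1:
            return (message * pow(r, E, N)) % N, r


def sign_blinded(blinded: int) -> int:
    """Challenge service: sign without seeing the underlying message."""
    return pow(blinded, D, N)


def unblind(blind_signature: int, r: int) -> int:
    """Client: strip the blinding factor, leaving a plain RSA signature."""
    return (blind_signature * pow(r, -1, N)) % N


def verify(nonce: bytes, signature: int) -> bool:
    """Proxy: check the signature using only the public key (N, E)."""
    return pow(signature, E, N) == hash_to_int(nonce)


nonce = secrets.token_bytes(30)  # random 30-byte nonce, as in the token format
blinded, r = blind(hash_to_int(nonce))
signature = unblind(sign_blinded(blinded), r)
print(verify(nonce, signature))
```

The service only ever sees the blinded value, so it cannot later link the redeemed token to the session in which it was issued; the unblinded signature nevertheless verifies against the service's public key.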

The CloudFlare work also reportedly includes an extension implementation for Tor Browser (referred to erroneously in the documentation as a browser plugin), although that code has not yet been released. In his email, Davidson said it "is not completely finished yet" but that it would be released as open-source software when ready.

Naturally, the scheme needs to be subjected to scrutiny before being deployed in the wild. Both the original proposal and Davidson's update note several potential security considerations. For example, malicious users could potentially stockpile large numbers of tokens for later reuse, or a man-in-the-middle attacker could inject <meta> tags causing a browser to exhaust its token supply.

What countermeasures could be added to thwart such attacks is an open question. Still, Davidson, Tankersley, and Valsorda have expressed their intent to submit the protocol as an IETF draft. Whether or not that process is successful, it seems likely that a version of the scheme will be made public in the somewhat near future, enabling testing under real-world conditions by Tor and CloudFlare. While the protocol could prove useful to many others, the major players in this current standoff both seem to be on board, which is good news for many Tor users.


CloudFlare, Tor, and eliminating CAPTCHAs

Posted Oct 6, 2016 9:02 UTC (Thu) by riteshsarraf (subscriber, #11138) [Link]

"..............protecting sites against harmful traffic."

Given how slow tor operates, I still wonder what "harmful traffic" can a tor user generate. The one thing I recollect is using garbage http traffic, targeted against sites, using tor as a proxy. But then again, given how slow tor operates, I don't see that practical.

CloudFlare, Tor, and eliminating CAPTCHAs

Posted Oct 6, 2016 11:02 UTC (Thu) by pabs (subscriber, #43278) [Link]

Tor is often faster than just the DNS lookups on my system.

CloudFlare, Tor, and eliminating CAPTCHAs

Posted Oct 7, 2016 1:15 UTC (Fri) by flussence (subscriber, #85566) [Link]

Harmful traffic isn't limited to automated things like DDoS and cracking attempts: it can also mean direct attacks against the human on the other end, the sort of abuses of telecommunications that would have been sorted out in the past by law enforcement making calls to the perp's ISP and/or VPN provider.

The trouble is that there's no technological solution to that problem that won't also affect Tor's original well-intentioned uses negatively in equal amount. The best they can do is to make automated attacks harder while letting humans (of any disposition) through.