This is a very clever attack. I wonder if the same attack can be used on other sites or if it exploits something about Github.
Github's data is difficult to cache and many pages load piecemeal using turbolinks which itself creates lots of un-cacheable requests (cacheable only until someone pushes a new commit).
So it would appear to be next to impossible to stop a distributed attack.
> I wonder if the same attack can be used on other sites or if it exploits something about Github.
It doesn't exploit anything GitHub-specific! The way it's done is applicable to any site. The reason is that it uses the <script> loophole around the Same Origin Policy (<script> can be loaded cross-domain, thanks Eich...). They basically just inject this every two seconds:
The browser will request that page expecting a script. It'll get HTML instead, which isn't valid JS, so it just ignores it, but the request has already hit the server, so the DDoS is still successful.
However, this also makes all sites with the malicious JS vulnerable to an XSS attack by GitHub, like GitHub is currently doing. If you visit that URL, you get this:
alert("WARNING: malicious javascript detected on this domain")
Though I think the same trick could be done with, say, <img> or <style>, and those wouldn't allow XSS (though <style> could fuck with the page, certainly). Sloppy coding, Chinese Government employee...
I was wondering if you could prevent this by looking at the "Accept" header in the request, but it seems Accept is "*/*" for scripts. I'd rather expect browsers to send "application/javascript".
Then the answer from my server would be 400, because I don't have a JavaScript representation of this URL.
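To make the Accept idea concrete, here is a rough sketch of the check a server could run for a URL that only has an HTML representation. The helper names (parseAccept, accepts, statusForHtmlOnlyResource) are made up for illustration, q-values are ignored for simplicity, and it returns 406 (the status HTTP defines for "no acceptable representation") rather than the 400 mentioned above. It also shows why the check doesn't help today: a wildcard Accept matches everything.

```javascript
// Hypothetical helper: split an Accept header into media ranges.
// Assumption: q-values and other parameters are ignored for simplicity.
function parseAccept(header) {
  return (header || "*/*")
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase())
    .filter((range) => range.length > 0);
}

// True if any media range in the header matches the given media type.
function accepts(header, mediaType) {
  const [type] = mediaType.split("/");
  return parseAccept(header).some(
    (range) => range === "*/*" || range === `${type}/*` || range === mediaType
  );
}

// Status for a URL that only exists as HTML (no JavaScript representation).
function statusForHtmlOnlyResource(acceptHeader) {
  return accepts(acceptHeader, "text/html") ? 200 : 406;
}
```

If script loads sent "application/javascript", statusForHtmlOnlyResource would reject them cheaply; with a wildcard Accept it returns 200 and the full page gets rendered anyway.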
> Though I think the same trick could be done with, say, <img> or <style>, and those wouldn't allow XSS
You can XSS with SVG "images" [1]. Though up-to-date browsers should be patched against this.
The other option is having an image which says the same as the alert() message. Again, using SVG, this needn't be a much bigger file than the JS response [2].
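For a rough feel of the size difference, here is a small sketch that builds both responses from the same message and measures them. The SVG layout (width, height, text position) and all function names are assumptions for illustration, not what GitHub actually serves.

```javascript
// The message text is taken from the alert() response quoted earlier in the thread.
const MESSAGE = "WARNING: malicious javascript detected on this domain";

// Escape the characters that are special in XML text content.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical minimal SVG carrying the warning as visible text.
// Assumption: the 480x20 size and y offset are arbitrary illustrative values.
function warningSvg(message) {
  return (
    '<svg xmlns="http://www.w3.org/2000/svg" width="480" height="20">' +
    '<text y="15">' + escapeXml(message) + "</text></svg>"
  );
}

// The JS response in the same shape as the one quoted in the thread.
function warningScript(message) {
  return "alert(" + JSON.stringify(message) + ")";
}

// Size of a string in bytes when encoded as UTF-8.
function byteLength(text) {
  return new TextEncoder().encode(text).length;
}
```

Comparing byteLength(warningSvg(MESSAGE)) with byteLength(warningScript(MESSAGE)) shows the SVG wrapper adds a fixed overhead on top of the message itself.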
Sadly you can't guarantee that. If it was an advert, then yes, most likely it would be visible (barring ad blockers, but then they should hopefully block the attack anyway so that's a non-issue). But if it was a tracking image, then the dimensions would likely only be 1px^2.
I use those two specific examples (ad and tracking) because those seem to be the two instances in which this JS was MITM'ed.
> They basically just inject this every two seconds:
Isn't that the difficult part, i.e. intercepting a request to a server and injecting malicious code? In this case, since all requests go through the Great Firewall, the hack was trivial.
The retaliation from GitHub is quite clever although not great for the user or Baidu.
Github's website is very easy to cache: read heavy, non-realtime. Especially if you consider they can go into a cache-friendly mode where they disable push notifications ("you just pushed to this branch", etc).
They look real time now, but that is best effort: they don't have to be. Nobody will lament Github suddenly saying "we're under heavy load, changes will take a minute to propagate and real time notifications are turned off".
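A minimal sketch of what "changes will take a minute to propagate" could look like: a TTL cache that keeps serving a rendered page until it is older than the TTL, so repeated reads never reach the backend. The TtlCache class and its injectable clock are assumptions for illustration, not anything GitHub is known to run.

```javascript
// Minimal TTL cache: serves a stored value until it is older than ttlMs.
class TtlCache {
  // `now` is injectable so the expiry logic can be exercised without waiting.
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  // Returns the cached value for key; calls render() only on a miss or expiry.
  get(key, render) {
    const hit = this.entries.get(key);
    if (hit && this.now() - hit.storedAt < this.ttlMs) {
      return hit.value;
    }
    const value = render();
    this.entries.set(key, { value, storedAt: this.now() });
    return value;
  }
}
```

With a 60 second TTL, any number of requests for the same URL cost one backend render per minute; a push made in between simply shows up after the entry expires.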
Note that this attack is read-only; there's no creating new issues or PRs or any other write operation (that would be different).
It's comparable to Wikipedia, which has close to 100% cache hits on popular pages. (sorry can't find the source for this right now)
The git repositories are another story, but that's not so easily attacked through JS.
Still, with an attack of this magnitude, no matter how cacheable, you're going to feel it.
PS: I forgot about one thing; their HTTP interface to diffs. That's a huge surface of fresh data to request which will have to go to the backend. Like you could do with Wikipedia history diffs. Perhaps they would have to cut that off for users who do not have a cookie set from a project or user home page... Okay, I spoke too soon. Github has a huge amount of fresh data to request and a targeted attack on things like git diffs (let every user request a different diff) can't just be solved by HTTP caches.
> I forgot about one thing; their HTTP interface to diffs.
Right -- pretty much any page that uses pjax / turbolinks to load segments of the page: each is an expensive query going to the backend.
GH recently added a timeout for the diff page if it's too large which probably also caches the "too large" status. The sweet spot for an attack would be the pages that don't time out but still create a 95th percentile request.