Things leading up to this point (and obligatory cat picture)
As discussed in our previous post, our research has shown that while the XSSAuditor in Chrome is able to catch quite a few exploits targeting server-side reflected XSS, it is also prone to being bypassed in situations where the attacker does not need to inject full script tags. One of the underlying problems that the Auditor has to face is the fact that it has no idea how the user-provided data (i.e. the GET or POST parameters) is used on the server side. The strategy therefore is to approximate the flow of data by finding matching pieces that exist both in request and response.
Our proposed filter approach
Before we go into detail on our filter, let’s briefly abstract what a Cross-Site Scripting attack is. We argue that it actually is an attack in which the attacker provides data to an application. Since the application does not properly sanitize the data, at some point it ends up being interpreted as code. In general, this is the case for all injection attacks (think about SQL injections or command injections).
Also, in order to ensure that an attacker could not inject a script tag pointing to a remote URL (where all the code contained in the response would be untainted), we implemented rules in the HTML parser that did not allow tainted protocols or domains for such scripts. Similarly, we implemented checks in the DOM API that forbid assignment to such dangerous sinks with tainted values.
Evaluation of our approach
Blocking legitimate sites
In our compatibility crawl, we analyzed a total of 981,453 URLs, consisting of 9,304,036 frames. The following table shows the results of our crawl, the percentages being relative to the number of frames and total domains crawled, respectively.
|bLOCKING COMPONENT||Documents||DOmains||exploitable DOMAINS|
|HTML Parser||8,805 (0.095%)||73 (0.73%)||60|
|DOM API||182 (0.002%)||60 (0.60%)||8|
|SUM||14,966 (0.016%)||183 (1.83%)||90|
As the table clearly shows, there are a number of documents that use programming paradigms which are potentially insecure, such as passing user-provided JSON to eval. In some cases, the flows only occur when a certain criteria is matched. As an example, Google applies a regular expression to JSON-like input to verify that only JSON is passed and an attack is not possible. Interestingly enough, about half of all the domains on which we found policy-violation (our proposed policy, that is) usage of user-provided data could be exploited in exactly those blocked flows. Depending on the definition of a false positive – blockage of a secure component or blockage of any component – this cuts down our FP rate to 0.9% with respect to the domains we analysed. More importantly, blocking of a single functionality does not mean that the page is no longer usable – it rather means that a single component does not function as intended.
To allow pages that properly check the provided data before passing it to eval to continue using this pattern, our prototyped implementation supports and explicit untainting API. Calling untaint() on a string will return a completely taint-free version of the string. While this sounds dangerous in terms of an attacker being able to untaint a string, he is faced with a bootstrapping problem: in order to call the untaint function, he first has to execute his own code – which is blocked by our filter.
The last piece of the puzzle is the performance of the filter. Obviously, the added code causes additional operations that need to be carried out and thus, adds some overhead. The following figure shows the results of running the well-known benchmarks Kraken, Dromaeo, Sunspider and Octane against a vanilla Chromium, our patched version as well as Firefox and IE as comparisons. Note that since strings in these benchmarks do not originate from tainted sources, “Patched Chrome” depicts the overhead that occurs if no string is tainted (just by the added logic and memory overhead), whereas “Patched Chrome (worst)” assumes every string to be tainted to give an upper bound of the overhead.
Summarizing the results of the benchmark, our implementation adds about 7 to 17% overhead. This leaves us miles in front of IE and still faster than Firefox. We do believe that additional performance can be gained quite easily. V8 draws much of its superior speed from using assembler macros rather than C++ code to get “the easy path” on certain functions such as substrings. Changing these is, however, more error-prone which is why we opted to always jump of out the ASM code into the runtime when handling tainted strings. Extending the taint-passing logic into the ASM space would – in our minds – significantly improve the performance of our approach.
Since we wrote the conclusion already for the paper, let’s just c&p it here:
In case of client-side vulnerabilities, our approach reliably and precisely detects injected syntactic content and, thus, is superior in blocking DOM-based XSS. Although our current implementation induces a runtime overhead between 7 and 17%, we believe that an efficient native integration of our approach is feasible. If adopted, our technique would effectively lead to an extinction of DOM-based XSS and, thus, significantly improve the security properties of the Web browser overall.