Don't just sanitize but also escape – A fable of sanitize_text_field

Hey infosec fam!! Recently I was testing an image upload functionality in a WordPress plugin that led me to a stored XSS. In this blog post, I will tell the interesting story of how I got XSS even though a filter (sanitize_text_field) was in place.

Let’s get started!


The plugin has an image upload functionality where an image can be picked either from the default list or from the WordPress uploads folder. While testing other parameters for XSS, I found that once an image is taken from wp-uploads, its path is reflected in the response as a URL pointing to wp-content/uploads.

This was interesting, and the first thing that came to my mind was to break out of the src attribute to insert an XSS payload. I inserted " to close the src attribute and added an onerror attribute so as to execute JavaScript.

I used this payload while intercepting the request through Burp: http://x.x.x.x/image"onerror=alert(1);//.png

When I opened the list of the added items, the malformed HTTP URL of the image got loaded and I got an XSS popup. Yayyyyy!!

But did you notice one thing? The double quote in the payload got escaped with a backslash.

Wondering why XSS happened in the first place?! Let's get to the code review and find out.

Code Review

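Since I was testing a WordPress plugin, I had the option to see what's happening behind the scenes.

So here comes the interesting part! The image input is passed through sanitize_text_field before it is stored. The backslash (\) you see, however, is most likely not added by that function: WordPress adds slashes to everything in $_POST, and since the plugin never unslashes the value, \" ends up in the database as it is.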

$results = $wpdb->insert(
    $table_name, // placeholder: the table name is not shown in the original excerpt
    array(
        'title'       => sanitize_text_field($_POST['title']),
        'url'         => sanitize_text_field($_POST['url']),
        'image_url'   => sanitize_text_field($_POST['image_file']), // the image input
        'sortorder'   => sanitize_text_field($_POST['sortorder']),
        'date_upload' => time(),
        'target'      => sanitize_text_field($_POST['target']),
        // ... the excerpt ends here
    )
);

Let’s talk about what this filter is and what it sanitizes!


From the official WordPress documentation:

Sanitizes a string from user input or from the database.

Basic usage

<?php sanitize_text_field( $str ) ?>

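According to the documentation, the function checks for invalid UTF-8, converts single < characters to entities, strips all tags, removes line breaks, tabs and extra whitespace, and strips percent-encoded characters (octets). Notice what is missing from that list: quotes. A " in $str is neither removed nor HTML-encoded.

So when I inserted " as part of the payload, it stayed in the value. The only change was the backslash in front of it (WordPress slashes the contents of $_POST), which turned it into \"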
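To make the slashing step concrete, here is a minimal JavaScript sketch. `addSlashes` is a hypothetical helper written for this post that imitates PHP-style slashing (a backslash in front of quotes, backslashes and NUL bytes); it is a model for illustration, not WordPress code.

```javascript
// Hypothetical model of PHP-style slashing: prefix ', " and \ with a backslash,
// and turn a NUL byte into \0.
function addSlashes(input) {
  return input.replace(/[\\"']/g, "\\$&").replace(/\u0000/g, "\\0");
}

const payload = 'http://x.x.x.x/image"onerror=alert(1);//.png';
const stored = addSlashes(payload);

console.log(stored); // http://x.x.x.x/image\"onerror=alert(1);//.png
```

The double quote is still there after slashing. It only gained a backslash in front of it.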

But the thing to note here is that sanitize_text_field alone is not enough when the reflection/sink of your input is going to be part of a src attribute.

Still wondering why??

Let’s simplify. Here comes the concept of XSS contexts. You as an attacker need to understand and identify where in the response the user-controlled data appears. The next thing to identify is the filters or any other processing that is in place.

The user-controlled data can be reflected/stored in:

  • HTML tags: When the context is between HTML tags, you need to introduce new HTML tags that can execute JavaScript.
  • HTML tag attributes: When the context is inside an HTML tag attribute, you can terminate the attribute value with a double quote and insert a new attribute that can execute JavaScript. There’s also a possibility to close the current tag with "> and insert <script>alert(1)</script>. But you can expect <> to be escaped or blocked most of the time, so you will mostly have to play around with double quotes and single quotes.
  • In JavaScript: When the context is inside a script block, you can close the script and introduce new HTML tags that trigger the execution of JavaScript. Example:
</script><img src=x onerror=prompt(1)>

If the user-controlled input is inside a string literal, you can break out by terminating the string with its own quote character.

  • JavaScript template literals: These are string literals that allow embedded JavaScript expressions. These expressions are evaluated and then concatenated with the surrounding text.

Consider a script that takes in a username and prints Hi <username>. If the server writes the user-controlled value directly between the backticks, the script looks like this:

document.getElementById('username').innerText = `Hi there!, USER_INPUT.`;

With template literals, there is no need to break out of any quotes. You simply have to write your payload inside ${..}

${alert(document.domain)}  //user controlled input

And the alert will work, because the expression inside ${..} is evaluated when the string is built.
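The template-literal case can be tried safely outside a browser. The sketch below assumes a hypothetical server-side function `renderScript` that pastes the username straight between backticks, and it swaps `alert` for a recording stub (`fakeAlert`) so that it runs under Node.js. Since there is no `document` in Node.js, the payload alerts a plain string instead of document.domain.

```javascript
// Hypothetical server-side rendering: user input is pasted inside a template literal.
function renderScript(userName) {
  return "return `Hi there!, " + userName + ".`;";
}

// Stand-in for the browser's alert(): records calls instead of showing a popup.
const alertCalls = [];
function fakeAlert(message) {
  alertCalls.push(message);
  return "";
}

function run(userName) {
  // new Function evaluates the generated source, much like a browser running the script.
  return new Function("alert", renderScript(userName))(fakeAlert);
}

const harmless = run("alice");
const malicious = run("${alert('xss')}");

console.log(harmless);   // Hi there!, alice.
console.log(malicious);  // Hi there!, .
console.log(alertCalls); // [ 'xss' ]
```

No quote was needed in the payload. The ${..} expression ran simply because it landed inside the backticks.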

That was the gist of what contexts are and how they work.

Let’s get back to the context that we’re dealing with.

In our scenario, the context is inside an HTML tag attribute. HTML does not treat a backslash as an escape character, so the backslash (\) in front of the quote simply becomes a part of the URL. It is unable to escape the double quote, and hence the src attribute is closed. The other part of the payload, onerror=alert(1);//, now becomes another valid attribute. The // is added so that whatever is left after it (.png and the original closing quote) is treated as a JavaScript comment and doesn't break the handler.
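Here is a small sketch of why the backslash does not help. `renderImg` is a hypothetical template standing in for the plugin's output code, and `readSrc` is a deliberately tiny reader that follows the HTML rule for double-quoted attribute values: the value ends at the next ", whatever comes before it.

```javascript
// Hypothetical template: the stored URL is printed inside a double-quoted src attribute.
function renderImg(storedUrl) {
  return '<img src="' + storedUrl + '">';
}

// Read a double-quoted attribute value the way an HTML parser does:
// everything up to the next ", with no special meaning for backslashes.
function readSrc(html) {
  const start = html.indexOf('src="') + 'src="'.length;
  const end = html.indexOf('"', start);
  return { src: html.slice(start, end), rest: html.slice(end + 1) };
}

// The value as stored, with the backslash in front of the quote.
const stored = 'http://x.x.x.x/image\\"onerror=alert(1);//.png';
const parsed = readSrc(renderImg(stored));

console.log(parsed.src);  // http://x.x.x.x/image\
console.log(parsed.rest); // onerror=alert(1);//.png">
```

The src value stops right after the backslash, and the rest of the payload is left over for the parser to read as a new attribute.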

And tada!!! It's XSS 🙂

Research on other methods

In the recital above, what we learn is: just having a filter in place doesn't necessarily mean that your code is attack-proof. A proper understanding of what the filters actually filter out is important.

I tested the payload against the other available filters and found 2 more that have the same effect as sanitize_text_field.

  • wp_strip_all_tags
  • strip_tags

Here the outcome is pretty obvious from the names of the filters: they just strip tags.

So if by any chance these methods are used for a parameter whose sink is in a src attribute, you are still vulnerable to the above scenario.
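A quick way to see this is with a rough stand-in for a tag stripper. `stripTags` below is a hypothetical one-line approximation (it removes anything shaped like <...>), not the real wp_strip_all_tags or strip_tags, but it shows the idea: a payload without angle brackets has nothing to strip.

```javascript
// Rough, hypothetical stand-in for a tag stripper: remove anything shaped like <...>.
function stripTags(input) {
  return input.replace(/<[^>]*>/g, "");
}

const tagPayload = "<script>alert(1)</script>";
const attrPayload = 'http://x.x.x.x/image"onerror=alert(1);//.png';

console.log(stripTags(tagPayload));                  // alert(1)
console.log(stripTags(attrPayload) === attrPayload); // true
```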

Methods that can be used

These are a bunch of filters that can do the work for you. In this section, I have collated a list of filters that behave differently for a single payload: " onerror=alert(1);//

  • urlencode : URL-encodes string

If the method is changed from sanitize_text_field to urlencode, the XSS is gone, because the " is stored as %22 and can no longer close the src attribute.

  • rawurlencode: URL-encode according to RFC 3986
  • esc_textarea: Escaping for textarea values.
  • esc_html: Escaping for HTML blocks.
  • esc_js: Escapes single quotes, htmlspecialchar " < > &, and fixes line endings.
  • sanitize_key: Sanitizes a string key. Keys are used as internal identifiers. Lowercase alphanumeric characters, dashes, and underscores are allowed.
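The effect of URL-encoding on the test payload can be approximated in JavaScript with the built-in encodeURIComponent. This is only an analogue of PHP's urlencode and rawurlencode: the exact output differs (the PHP functions also encode parentheses, and urlencode turns a space into +), but the part that matters is the same, the " becomes %22.

```javascript
const payload = '" onerror=alert(1);//';
const encoded = encodeURIComponent(payload);

console.log(encoded);               // %22%20onerror%3Dalert(1)%3B%2F%2F
console.log(encoded.includes('"')); // false

// Printed inside a src attribute, the encoded value cannot close the attribute.
const html = '<img src="' + encoded + '">';
console.log(html); // <img src="%22%20onerror%3Dalert(1)%3B%2F%2F">
```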

Every filter method is created for a reason and developers need to understand which one to use when.
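For an attribute sink like the one in this story, the matching fix is escaping at output time. WordPress ships esc_attr() and esc_url() for that. The sketch below does not reproduce either of them. `escapeAttr` is a hypothetical helper in the same spirit, encoding the characters that matter inside a quoted attribute.

```javascript
// Hypothetical escaper in the spirit of HTML-escaping helpers: encode & < > " '.
function escapeAttr(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// The value as stored, with the backslash in front of the quote.
const stored = 'http://x.x.x.x/image\\"onerror=alert(1);//.png';
const html = '<img src="' + escapeAttr(stored) + '">';

console.log(html);
// <img src="http://x.x.x.x/image\&quot;onerror=alert(1);//.png">
```

The only raw double quotes left in the tag are the two that delimit src, so the payload stays inside the attribute value.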

Key takeaways

Finally, in this last section I want to list a few takeaways

  • Always go through the documentation of the methods and read about what exactly they are filtering and whether that matches your use case.
  • If you are still unsure whether the filter can protect you from XSS, try using filters that are nested. For example $str = esc_attr(esc_js($_GET['param'])). The combination can be any according to the case in point. This is a nice approach I found while doing source-code review of different plugins.
  • As a pentester, do not skip testing a functionality just because you have already found filters in the source code review. Coz stories like that of sanitize_text_field can happen to anyone.

That’s all for this blog post. Thanks for reading! Hope you enjoyed reading my research.

See you in the next one! Until then, happy hunting 🙂


I am Shreya Pohekar. I love to build and break stuff. Currently, I'm working as an iOS and Angular developer. I am also a contributor to the CodeVigilant project. My blogs are focused on Infosec and Dev and their how-tos.
