In technical literature, the terms URL encoding, UTF encoding, escape-encoding, percent-encoding, and Web encoding are often used interchangeably. To better understand malicious attacks such as XSS or SQL injection, you need some insight into URL encoding techniques.
Web applications transfer data between the client and the server using the HTTP or HTTPS protocols. User input is normally passed to the server in the HTTP headers (for example, in the cookie field), in the request body (the POST data), or in the query portion of the requested URL. If the data is transferred in a URL, it has to be specially encoded to obey the syntax rules of URLs.
The standard (RFC 2396) distinguishes between two character classes:
● The unreserved class comprises the characters:
○ a-z, A-Z, 0-9 - _ . ! ~ * ' ( )
● The reserved class contains the following characters:
○ ; / ? : @ & = + $ ,
Characters from the reserved class have a special meaning within a URL, so using them as data could conflict with the correct interpretation of the URL. Escape-encoding allows these characters to be transferred as data without breaking the syntax. A character is encoded as a triplet consisting of the percent character (%) followed by the two hexadecimal digits that represent the octet code of the original character.
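The triplet rule described above can be sketched in a few lines of Python. This is only an illustration: `percent_encode_char` is a hypothetical helper written for this example, and it assumes the character is first converted to its UTF-8 octets. `quote` and `unquote` are the standard-library functions from `urllib.parse`.

```python
from urllib.parse import quote, unquote


def percent_encode_char(ch: str) -> str:
    """Return one %XX triplet per UTF-8 octet of the given character."""
    return "".join(f"%{octet:02X}" for octet in ch.encode("utf-8"))


# Reserved characters and the backslash, encoded by hand.
print(percent_encode_char("&"))   # %26
print(percent_encode_char("\\"))  # %5C

# The standard library applies the same rule to whole strings.
# safe="" asks quote() to encode "/" as well, which it leaves alone by default.
encoded = quote("a&b=c/d", safe="")
print(encoded)           # a%26b%3Dc%2Fd
print(unquote(encoded))  # a&b=c/d
```

Hexadecimal digits in a triplet are case-insensitive, so "%5C" and "%5c" denote the same octet.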
The percent character acts as the escape indicator within a URL and therefore has to be escaped itself as "%25" in order to be used as data in a URL. For URI encoding, ensure that you do not escape or un-escape the same string more than once. Un-escaping a string that has already been un-escaped can cause a percent character that is data to be misinterpreted as the start of an escape sequence. Conversely, escaping an already-escaped string encodes its percent characters a second time.
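The following Python sketch illustrates why a string should be escaped and un-escaped exactly once. The sample value `100%41` is made up for this example. It stands for data that happens to contain a percent character followed by two hexadecimal digits.

```python
from urllib.parse import quote, unquote

# Literal data: the six characters 1 0 0 % 4 1.
data = "100%41"

# Escaping once turns the data percent character into %25.
encoded = quote(data, safe="")
print(encoded)  # 100%2541

# Un-escaping once restores the original data.
once = unquote(encoded)
print(once)     # 100%41

# Un-escaping a second time misreads "%41" as an escaped "A".
twice = unquote(once)
print(twice)    # 100A

# The converse: escaping the already-escaped string encodes "%" again,
# so a single un-escape no longer yields the original data.
double_escaped = quote(encoded, safe="")
print(double_escaped)           # 100%252541
print(unquote(double_escaped))  # 100%2541
```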
Escape-encoding data multiple times can circumvent security checks that run after the initial decoding pass, because a later layer of the application decodes the data again. An example of multiple escape-encoding of this type is shown below using the character sequence “\” or “..\”.
The backslash “\” can be represented as “%5c” or the following permutations:
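As an illustration of such permutations, the Python sketch below takes a few commonly cited multiple encodings of “%5c” and shows what remains after one and after two decoding passes. The `candidates` list and the helper `decode_passes` are assumptions made for this example, not a complete enumeration.

```python
from urllib.parse import unquote

# Illustrative encodings of the backslash. All but the first hide the
# triplet "%5c" behind a second layer of escape-encoding.
candidates = ["%5c", "%255c", "%%35c", "%%35%63", "%25%35%63"]


def decode_passes(value: str, passes: int) -> str:
    """Apply URL un-escaping the given number of times."""
    for _ in range(passes):
        value = unquote(value)
    return value


for candidate in candidates:
    print(candidate, "->", decode_passes(candidate, 1), "->", decode_passes(candidate, 2))

# A check for "\" after the first pass does not see the traversal sequence.
first_pass = decode_passes("..%255c", 1)
print("\\" in first_pass)         # False: the value is still "..%5c"
print(decode_passes(first_pass, 1))  # ..\
```

Every multiply encoded candidate still reads “%5c” after the first pass and only becomes “\” after the second, which is why a check placed between the two passes can be bypassed.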
Examples of Possible URL Attacks
These different escape-encoding sequences are possible entry points for the following URL attacks.
URL Attack as a Multiple Decoding Attack
Example URL Attack
The directory list of C:\ is revealed.
URL Attack as an XSS Attack
Example URL Attack
URL Attack as an SQL Injection Attack
Example URL Attack
Executed database query:
SELECT preferences FROM logintable WHERE userid='bob'; update logintable set password='0wn3d'
What Do I Need to Do?
The different character encoding schemes and the many ways in which they can be combined allow a practically unlimited number of malicious encodings. As the developer, you are therefore responsible for securing your application against encoding attacks of this type, in accordance with the following rules:
● Carefully read Request for Comments (RFC) 3986, Uniform Resource Identifier (URI): Generic Syntax, for the correct syntax processing of URLs (search at www.rfc-editor.org).
● User input has to be regarded as potentially malicious code.
● Avoid submitting data using the ‘GET’ method, because the data is appended to the URL and can be easily manipulated. It is better to use the ‘POST’ method instead.
● Do not rely on client-side content checks.
● Validate and sanitize all data on the server side.
● Always restrict the type of acceptable data as much as possible using strict validation rules.
● Always perform independent validation and sanity checks of the supplied data.
● Ensure that the application does not repeat any character-decoding process. Decoding should be done by the operating system. If the data remains encoded or contains unacceptable characters, treat the data as malicious and reject the input.
● Thoroughly test your application for system behavior on encoded and incorrect data formats.
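The server-side rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: `validate_parameter`, the `ALLOWED` pattern, and the 64-character limit are hypothetical, and the sketch assumes that `raw` is the still-encoded value exactly as it arrived in the URL, so that this function performs the only decoding pass.

```python
import re
from urllib.parse import unquote

# Hypothetical allow-list: letters, digits, underscore, dot, and hyphen only.
ALLOWED = re.compile(r"[A-Za-z0-9_.-]{1,64}")
# A remaining escape triplet means the input was encoded more than once.
ESCAPE_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def validate_parameter(raw: str) -> str:
    """Decode a URL parameter exactly once and reject anything suspicious."""
    # errors="strict" raises UnicodeDecodeError (a ValueError) on invalid UTF-8.
    decoded = unquote(raw, errors="strict")
    # The allow-list below would also reject "%"; this check only gives a clearer reason.
    if ESCAPE_TRIPLET.search(decoded):
        raise ValueError("input is still encoded after one decoding pass")
    if not ALLOWED.fullmatch(decoded):
        raise ValueError("input contains unacceptable characters")
    return decoded


print(validate_parameter("report%2E2024"))  # report.2024

for suspicious in ["..%255c", "bob%27%3B"]:
    try:
        validate_parameter(suspicious)
    except ValueError as error:
        print(f"rejected {suspicious!r}: {error}")
```

The sketch rejects input instead of trying to repair it, in line with the rule that data which remains encoded or contains unacceptable characters is to be treated as malicious.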