[http] Is a slash ("/") equivalent to an encoded slash ("%2F") in the path portion of an HTTP URL

I have a site that treats "/" and "%2F" in the path portion (not the query string) of a URL differently. Is this a bad thing to do according to either the RFC or the real world?

I ask because I keep running into little surprises with the web framework I'm using (Ruby on Rails) as well as the layers below that (Passenger, Apache, e.g., I had to enable "ALLOW_ENCODED_SLASHES" for Apache). I am now leaning toward getting rid of the encoded slashes completely, but I wonder if I should be filing bug reports where I see weird behavior involving the encoded slashes.

As to why I have the encoded slashes in the first place, basically I have routes such as this:

:controller/:foo/:bar

where :foo is something like a path that can contain slashes. I thought the most straightforward thing to do would be to just URL escape foo so the slashes are ignored by the routing mechanism. Now I am having doubts, and it's pretty clear that the frameworks don't really support this, but according to the RFC is it wrong to do it this way?

Here is some information I have gathered:

RFC 1738 (URLs):

Usually a URL has the same interpretation when an octet is represented by a character and when it encoded. However, this is not true for reserved characters: encoding a character reserved for a particular scheme may change the semantics of a URL.

RFC 2396 (URIs):

These characters are called "reserved", since their usage within the URI component is limited to their reserved purpose. If the data for a URI component would conflict with the reserved purpose, then the conflicting data must be escaped before forming the URI.

(does escaping here mean something other than encoding the reserved character?)

RFC 2616 (HTTP/1.1):

Characters other than those in the "reserved" and "unsafe" sets (see RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.

There is also this bug report for Rails, where they seem to expect the encoded slash to behave differently:

Right, I'd expect different results because they're pointing at different resources.

It's looking for the literal file 'foo/bar' in the root directory. The non escaped version is looking for the file bar within directory foo.

It's clear from the RFCs that raw vs. encoded is the equivalent for unreserved characters, but what is the story for reserved characters?

This question is related to http url encoding

The answer is


The story of %2F vs / was that, according to the initial W3C recommendations, slashes «must imply a hierarchical structure»:

The slash ("/", ASCII 2F hex) character is reserved for the delimiting of substrings whose relationship is hierarchical. This enables partial forms of the URI.

Example 2

The URIs

http://www.w3.org/albert/bertram/marie-claude

and

http://www.w3.org/albert/bertram%2Fmarie-claude

are NOT identical, as in the second case the encoded slash does not have hierarchical significance.


If you use Tomcat, add '-Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true' in VM properties.

https://tomcat.apache.org/tomcat-7.0-doc/config/systemprops.html#Security


encodeURI()/decodeURI and encodeURIComponent()/decodeURIComponent are utility functions to handle this. Read more here https://stackabuse.com/javascripts-encodeuri-function/


I also have a site that has numerous urls with urlencoded characters. I am finding that many web APIs (including Google webmaster tools and several Drupal modules) trip over urlencoded characters. Many APIs automatically decode urls at some point in their process and then use the result as a URL or HTML. When I find one of these problems, I usually double encode the results (which turns %2f into %252f) for that API. However, this will break other APIs which are not expecting double encoding, so this is not a universal solution.

Personally I am getting rid of as many special characters in my URLs as possible.

Also, I am using id numbers in my URLs which do not depend on urldecoding:

example.com/blog/my-amazing-blog%2fstory/yesterday

becomes:

example.com/blog/12354/my-amazing-blog%2fstory/yesterday

in this case, my code only uses 12354 to look for the article, and the rest of the URL gets ignored by my system (but is still used for SEO.) Also, this number should appear BEFORE the unused URL components. that way, the url will still work, even if the %2f gets decoded incorrectly.

Also, be sure to use canonical tags to ensure that url mistakes don't translate into duplicate content.


What to do if :foo in its natural form contains slashes? You wouldn't want it to Isn't that the distinction the recommendation is attempting to preserve? It specifically notes,

The similarity to unix and other disk operating system filename conventions should be taken as purely coincidental, and should not be taken to indicate that URIs should be interpreted as file names.

If one was building an online interface to a backup program, and wished to express the path as a part of the URL path, it would make sense to encode the slashes in the file path, as that is not really part of the hierarchy of the resource - and more importantly, the route. /backups/2016-07-28content//home/dan/ loses the root of the filesystem in the double slash. Escaping the slashes is the appropriate way to distinguish, as I read it.


Examples related to http

Access blocked by CORS policy: Response to preflight request doesn't pass access control check Axios Delete request with body and headers? Read response headers from API response - Angular 5 + TypeScript Android 8: Cleartext HTTP traffic not permitted Angular 4 HttpClient Query Parameters Load json from local file with http.get() in angular 2 Angular 2: How to access an HTTP response body? What is HTTP "Host" header? Golang read request body Angular 2 - Checking for server errors from subscribe

Examples related to url

What is the difference between URL parameters and query strings? Allow Access-Control-Allow-Origin header using HTML5 fetch API File URL "Not allowed to load local resource" in the Internet Browser Slack URL to open a channel from browser Getting absolute URLs using ASP.NET Core How do I load an HTTP URL with App Transport Security enabled in iOS 9? Adding form action in html in laravel React-router urls don't work when refreshing or writing manually URL for public Amazon S3 bucket How can I append a query parameter to an existing URL?

Examples related to encoding

How to check encoding of a CSV file UnicodeEncodeError: 'ascii' codec can't encode character at special name Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? The character encoding of the plain text document was not declared - mootool script UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) How to encode text to base64 in python UTF-8 output from PowerShell Set Encoding of File to UTF8 With BOM in Sublime Text 3 Replace non-ASCII characters with a single space