URL encoding the space character: + or %20?

Question

When is a space in a URL encoded to "+", and when is it encoded to "%20"?

User · Answer

The confusion arises because URLs are still 'broken' to this day.

Take "http://www.google.com", for instance. This is a URL. A URL is a Uniform Resource Locator and is really a pointer to a web page (in most cases). URLs have actually had a very well-defined structure since the first specification in 1994.

We can extract detailed information about the "http://www.google.com" URL:

+---------------+-------------------+
|      Part     |      Data         |
+---------------+-------------------+
|  Scheme       | http              |
|  Host         | www.google.com    |
+---------------+-------------------+

If we look at a more complex URL such as:

"https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"

we can extract the following information:

+-------------------+---------------------+
|        Part       |       Data          |
+-------------------+---------------------+
|  Scheme           | https               |
|  User             | bob                 |
|  Password         | bobby               |
|  Host             | www.lunatech.com    |
|  Port             | 8080                |
|  Path             | /file;p=1           |
|  Path parameter   | p=1                 |
|  Query            | q=2                 |
|  Fragment         | third               |
+-------------------+---------------------+

https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third
\___/   \_/ \___/ \______________/ \__/\_______/ \_/ \___/
  |      |    |          |          |      | \_/  |    |
Scheme User Password    Host       Port  Path |   | Fragment
        \_____________________________/       | Query
                       |               Path parameter
                   Authority
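The same breakdown can be reproduced with a parser instead of by eye. This is a minimal sketch assuming Python 3 and its standard urllib.parse module (the answer itself does not name a tool); note that urlsplit() keeps ";p=1" inside the path, as the table does, while urlparse() splits it off into a separate params field.

```python
from urllib.parse import urlparse, urlsplit

url = "https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"

# urlsplit() returns the components listed in the table; ";p=1" stays in the path.
parts = urlsplit(url)
print(parts.scheme)    # https
print(parts.username)  # bob
print(parts.password)  # bobby
print(parts.hostname)  # www.lunatech.com
print(parts.port)      # 8080 (an int)
print(parts.path)      # /file;p=1
print(parts.query)     # q=2
print(parts.fragment)  # third

# urlparse() additionally separates the parameters of the last path segment.
print(urlparse(url).params)  # p=1
```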

The reserved characters are different for each part.

For HTTP URLs, a space in a path fragment part has to be encoded to "%20" (absolutely not "+"), while the "+" character in the path fragment part can be left unencoded.
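A minimal sketch of the path rule, assuming Python 3's urllib.parse.quote. Passing safe="+" leaves "+" literal (only "+" and the always-safe unreserved characters are kept, so a "/" inside the segment would become %2F); the space still becomes %20. The default call also encodes "+", which is valid in a path, just not required.

```python
from urllib.parse import quote

segment = "blue+light blue"

# Keep "+" literal in the path segment; the space must become %20, never "+".
print(quote(segment, safe="+"))  # blue+light%20blue

# Default quote() encodes "+" too; more conservative, still correct in a path.
print(quote(segment))            # blue%2Blight%20blue
```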

Now in the query part, spaces may be encoded to either "+" (for backwards compatibility: do not try to search for it in the URI standard) or "%20" while the "+" character (as a result of this ambiguity) has to be escaped to "%2B".
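To see the query-part rule in action, here is a sketch assuming Python 3's urllib.parse: quote_plus() produces the form-style "+" for spaces and escapes a literal "+" as %2B, while quote(..., safe="") produces %20. A form-style decoder (unquote_plus) maps both spellings back to the same value, which is why either is accepted in the query.

```python
from urllib.parse import quote, quote_plus, unquote_plus

value = "blue+light blue"

# Form-style: space -> "+", literal "+" -> "%2B".
form_style = quote_plus(value)
print(form_style)     # blue%2Blight+blue

# Plain percent-encoding: space -> "%20", literal "+" -> "%2B".
percent_style = quote(value, safe="")
print(percent_style)  # blue%2Blight%20blue

# A form-style decoder recovers the original value from either spelling.
print(unquote_plus(form_style) == unquote_plus(percent_style) == value)  # True
```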

This means that the "blue+light blue" string has to be encoded differently in the path and query parts:

"http://example.com/blue+light%20blue?blue%2Blight+blue".

From there you can deduce that encoding a fully constructed URL is impossible without a syntactical awareness of the URL structure.
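The following sketch illustrates why, again assuming Python 3's urllib.parse. The raw string and the query key "q" are hypothetical; the intent is for both the path segment and the query value to be "blue+light blue". A single pass over the whole string has to treat every "+" the same way, so the query's literal "+" survives unescaped and the server's form decoder later reads it as a space.

```python
from urllib.parse import parse_qs, quote

# Hypothetical raw URL: both "+" characters are meant to be literal data.
raw = "http://example.com/blue+light blue?q=blue+light blue"

# Naive whole-URL encoding: it cannot tell path from query, so it keeps
# every "+" and turns every space into %20.
naive = quote(raw, safe=":/?=+")
print(naive)  # http://example.com/blue+light%20blue?q=blue+light%20blue

# The form decoder on the server reads the query "+" as a space: data lost.
print(parse_qs(naive.partition("?")[2]))  # {'q': ['blue light blue']}
```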

This boils down to:

You should have %20 before the ? and + after.
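One way to follow that rule is to encode each component separately and only then assemble the URL. This is a minimal sketch assuming Python 3; build_url is a hypothetical helper, not a standard-library function. urlencode() uses quote_plus() by default, which gives "+" for spaces and %2B for a literal "+" in the query.

```python
from urllib.parse import quote, urlencode, urlunsplit


def build_url(scheme, host, segments, params):
    """Hypothetical helper: encode path and query with their own rules."""
    # Path: spaces become %20; "+" may stay literal inside a segment.
    path = "/" + "/".join(quote(s, safe="+") for s in segments)
    # Query: urlencode() defaults to quote_plus, so spaces become "+"
    # and a literal "+" becomes %2B.
    query = urlencode(params)
    return urlunsplit((scheme, host, path, query, ""))


print(build_url("https", "example.com", ["docs", "a b+c"], {"q": "a b+c"}))
# https://example.com/docs/a%20b+c?q=a+b%2Bc
```

Encoding before assembly sidesteps the ambiguity entirely: each encoder sees only the data for its own component.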

Source

User · Answer

I would recommend %20.

Are you hard-coding them?

This is not very consistent across languages, though. If I'm not mistaken, in PHP urlencode() treats spaces as +, whereas Python's urlencode() treats them as %20.

EDIT: It seems I'm mistaken. Python's urlencode() (at least in 2.7.2) uses quote_plus() instead of quote() and thus encodes spaces as "+". It seems also that the W3C recommendation is the "+", as per here: http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

And in fact, you can follow this interesting debate on Python's own issue tracker about what to use to encode spaces: http://bugs.python.org/issue13866.

EDIT #2: I understand that the most common way of encoding " " is as "+", but just a note, it may be just me, but I find this a bit confusing:

    import urllib  # Python 2; in Python 3 this function is urllib.parse.urlencode
    print(urllib.urlencode({' ': '+ '}))
    # output: +=%2B+
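For readers on Python 3, a sketch of the same experiment: urlencode() now lives in urllib.parse and still defaults to quote_plus(), so the output matches the Python 2 one. Its quote_via parameter (available since Python 3.5) lets you switch to quote() and get %20 instead.

```python
from urllib.parse import quote, urlencode

# Default: quote_plus(), so spaces become "+" and a literal "+" becomes %2B.
print(urlencode({" ": "+ "}))                   # +=%2B+

# quote_via=quote: spaces become %20 instead.
print(urlencode({" ": "+ "}, quote_via=quote))  # %20=%2B%20
```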

User · Answer

A space may only be encoded to "+" in the "application/x-www-form-urlencoded" content-type key-value pairs query part of a URL. In my opinion, this is a MAY, not a MUST. In the rest of URLs, it is encoded as %20.

In my opinion, it's better to always encode spaces as %20, not as "+", even in the query part of a URL, because it is the HTML specification (RFC 1866) that specified that space characters should be encoded as "+" in "application/x-www-form-urlencoded" content-type key-value pairs (see paragraph 8.2.1, subparagraph 1).

This way of encoding form data is also given in later HTML specifications. For example, look for relevant paragraphs about application/x-www-form-urlencoded in the HTML 4.01 Specification, and so on.

Here is a sample string in a URL where the HTML specification allows encoding spaces as pluses: "http://example.com/over/there?name=foo+bar". So, only after "?" can spaces be replaced by pluses. In other cases, spaces should be encoded to %20. But since it's hard to correctly determine the context, it's best practice to never encode spaces as "+".

I would recommend percent-encoding all characters except the "unreserved" ones defined in RFC 3986, section 2.3:

    unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"

The implementation depends on the programming language that you chose.

If your URL contains national characters, first encode them to UTF-8, and then percent-encode the result.
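A minimal sketch of that recommendation in Python 3, written out by hand so the rule is visible: encode the text as UTF-8 first, then keep only RFC 3986 unreserved bytes and percent-encode everything else. percent_encode is a hypothetical name; in Python 3.7+ urllib.parse.quote(text, safe="") should give the same result.

```python
import string

# RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def percent_encode(text: str) -> str:
    """Encode text as UTF-8, then percent-encode every non-unreserved byte."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        # Unreserved ASCII characters pass through; all other bytes become %XX.
        out.append(ch if ch in UNRESERVED else f"%{byte:02X}")
    return "".join(out)


print(percent_encode("foo bar+baz"))  # foo%20bar%2Bbaz
print(percent_encode("café"))         # caf%C3%A9
```

Because "+" is not unreserved, it is always escaped as %2B here, so the output reads the same in a path or a query.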

User · Answer

From Wikipedia (emphasis and link added):

When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20". The MIME type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications.

So, the real percent encoding uses %20, while form data in URLs is in a modified form that uses +. So you're most likely to only see + in URLs in the query string after a ?.
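The decoding side shows the same split, sketched here with Python 3's urllib.parse. The URL is a hypothetical variation on the earlier "over/there" example: plain percent-decoding (unquote) of the path leaves "+" alone, while the form decoder (parse_qs) applied to the query treats both "+" and "%20" as a space.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Hypothetical URL: "+" in the path, "+" and "%20" in the query.
u = urlsplit("http://example.com/over/there+here?name=foo+bar&alt=foo%20bar")

# Path: real percent-decoding, so "+" stays a plus sign.
print(unquote(u.path))    # /over/there+here

# Query: form decoding, so "+" and "%20" both become a space.
print(parse_qs(u.query))  # {'name': ['foo bar'], 'alt': ['foo bar']}
```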
