urlencode vs rawurlencode

Question

If I want to create a URL using a variable, I have two choices to encode the string: `urlencode()` and `rawurlencode()`. What exactly are the differences, and which is preferred?

User · Answer

Spaces encoded as `%20` vs. `+`

The biggest reason I've seen to use `rawurlencode()` in most cases is that `urlencode` encodes text spaces as `+` (plus signs), whereas `rawurlencode` encodes them as the commonly seen `%20`:

    echo urlencode("red shirt");    // red+shirt
    echo rawurlencode("red shirt"); // red%20shirt

I have specifically seen certain API endpoints that accept encoded text queries expect to see `%20` for a space and, as a result, fail if a plus sign is used instead. Obviously this is going to differ between API implementations and your mileage may vary.

User · Answer

I believe spaces must be encoded as:

- `%20` when used inside the URL path component
- `+` when used inside the URL query string component or form data (see 17.13.4 Form content types)

The following example shows the correct use of `rawurlencode` and `urlencode`:

    echo "http://example.com"
        . "/category/" . rawurlencode("latest songs")
        . "/search?q=" . urlencode("lady gaga");

Output:

    http://example.com/category/latest%20songs/search?q=lady+gaga

What happens if you encode path and query string components the other way round? For the following example:

    http://example.com/category/latest+songs/search?q=lady%20gaga

- The web server will look for the directory `latest+songs` instead of `latest songs`
- The query string parameter `q` will contain `lady gaga`

User · Answer

Simple:

- `rawurlencode` the path: the path is the part before the `?`, and spaces must be encoded as `%20`
- `urlencode` the query string: the query string is the part after the `?`, and spaces are better encoded as `+`

`rawurlencode` is more compatible generally.
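To make the path/query split concrete, here is a minimal sketch. `build_url` is a hypothetical helper name of my own (it is not part of PHP and does not come from the answers above); it assumes PHP 7 or later and uses `http_build_query` with an explicit `&` separator for the query part.

```php
<?php
// Hypothetical helper for illustration: encodes path segments and
// query parameters separately, following the rule of thumb above.
function build_url(string $base, array $segments, array $query = []): string
{
    // Path segments go through rawurlencode, so a space becomes %20
    // and a slash inside a segment becomes %2F.
    $path = implode('/', array_map('rawurlencode', $segments));
    $url = rtrim($base, '/') . '/' . $path;

    if ($query !== []) {
        // PHP_QUERY_RFC1738 is the form-style encoding (space becomes +),
        // matching urlencode. Pass PHP_QUERY_RFC3986 instead to get %20,
        // matching rawurlencode.
        $url .= '?' . http_build_query($query, '', '&', PHP_QUERY_RFC1738);
    }

    return $url;
}

echo build_url(
    'http://example.com',
    ['category', 'latest songs', 'search'],
    ['q' => 'lady gaga']
), PHP_EOL;
// http://example.com/category/latest%20songs/search?q=lady+gaga
```

If the receiving API insists on `%20` in the query string, switching the last argument to `PHP_QUERY_RFC3986` should be the only change needed.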

User · Answer

The difference is in the return values, i.e.:

urlencode():

"Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 1738 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs."

rawurlencode():

"Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits. This is the encoding described in RFC 1738 for protecting literal characters from being interpreted as special URL delimiters, and for protecting URLs from being mangled by transmission media with character conversions (like some email systems)."

The two are very similar, but the latter (`rawurlencode`) will replace spaces with a `%` and two hex digits, which is suitable for encoding passwords or such, where a `+` is not, e.g.:

    echo '<a href="ftp://user:', rawurlencode('foo @+%/'),
         '@ftp.example.com/x.txt">';
    // Outputs: <a href="ftp://user:foo%20%40%2B%25%2F@ftp.example.com/x.txt">

User · Answer

I believe `urlencode` is for query parameters, whereas `rawurlencode` is for the path segments. This is mainly due to `%20` for path segments vs. `+` for query parameters. See this answer which talks about the spaces: When to encode space to plus (+) or %20?

However, `%20` now works in query parameters as well, which is why `rawurlencode` is always safer. The plus sign tends to be used where user experience of editing and readability of query parameters matter.

Note that this means `rawurldecode` does not decode `+` into spaces (http://au2.php.net/manual/en/function.rawurldecode.php). This is why `$_GET` is always automatically passed through `urldecode`, which means that `+` and `%20` are both decoded into spaces.

If you want the encoding and decoding to be consistent between inputs and outputs, and you have selected to always use `+` and not `%20` for query parameters, then `urlencode` is fine for query parameters (key and value).

The conclusion is:

- Path segments: always use `rawurlencode`/`rawurldecode`
- Query parameters: for decoding, always use `urldecode` (done automatically); for encoding, both `rawurlencode` and `urlencode` are fine, just choose one to be consistent, especially when comparing URLs
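The decoding asymmetry described above can be checked with a short script. This is only a sketch: the sample strings are made up, and `parse_str` stands in for the way PHP fills `$_GET`, since it applies the same form-style decoding.

```php
<?php
$encoded = 'lady+gaga%20live';

// rawurldecode leaves a literal + alone; urldecode turns it into a space.
echo rawurldecode($encoded), PHP_EOL; // lady+gaga live
echo urldecode($encoded), PHP_EOL;    // lady gaga live

// Form-style decoding treats + and %20 in a query string the same way.
parse_str('q=lady+gaga&r=lady%20gaga', $params);
var_dump($params['q'] === $params['r']); // bool(true)

// A literal plus sign has to travel as %2B to survive urldecode.
$roundTrip = urldecode(rawurlencode('1+1')); // "1+1"
$lossy     = urldecode('1+1');               // "1 1"
echo $roundTrip, ' / ', $lossy, PHP_EOL;
```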

User · Answer

urlencode: "This differs from the RFC 1738 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs."

User · Answer

    echo rawurlencode('http://www.google.com/index.html?id=asd asd');

yields

    http%3A%2F%2Fwww.google.com%2Findex.html%3Fid%3Dasd%20asd

while

    echo urlencode('http://www.google.com/index.html?id=asd asd');

yields

    http%3A%2F%2Fwww.google.com%2Findex.html%3Fid%3Dasd+asd

The difference being the `asd%20asd` vs. `asd+asd`.

`urlencode` differs from RFC 1738 by encoding spaces as `+` instead of `%20`.

User · Answer

It will depend on your purpose. If interoperability with other systems is important, then it seems `rawurlencode` is the way to go. The one exception is legacy systems which expect the query string to follow the form-encoding style of spaces encoded as `+` instead of `%20` (in which case you need `urlencode`).

`rawurlencode` follows RFC 1738 prior to PHP 5.3.0 and RFC 3986 afterwards (see http://us2.php.net/manual/en/function.rawurlencode.php):

"Returns a string in which all non-alphanumeric characters except -_.~ have been replaced with a percent (%) sign followed by two hex digits. This is the encoding described in RFC 3986 for protecting literal characters from being interpreted as special URL delimiters, and for protecting URLs from being mangled by transmission media with character conversions (like some email systems)."

Note on RFC 3986 vs. 1738: `rawurlencode` prior to PHP 5.3 encoded the tilde character (`~`) according to RFC 1738. As of PHP 5.3, however, `rawurlencode` follows RFC 3986, which does not require encoding tilde characters.

`urlencode` encodes spaces as plus signs (not as `%20` as done in `rawurlencode`) (see http://us2.php.net/manual/en/function.urlencode.php):

"Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 3986 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs."

This corresponds to the definition for application/x-www-form-urlencoded in RFC 1866.

Additional reading:

You may also want to see the discussion at http://bytes.com/groups/php/5624-urlencode-vs-rawurlencode.

Also, RFC 2396 is worth a look. RFC 2396 defines valid URI syntax. The main part we're interested in is from 3.4 Query Component:

"Within a query component, the characters ";", "/", "?", ":", "@", "&", "=", "+", ",", and "$" are reserved."

As you can see, the `+` is a reserved character in the query string and thus would need to be encoded as per RFC 3986 (as in `rawurlencode`).
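As a quick check of the tilde and space behaviour described above, the following sketch compares the two functions over printable ASCII. It assumes PHP 5.4 or later (for the short array syntax); the sample string is arbitrary.

```php
<?php
$sample = 'a b~c+d-e_f.g';

echo urlencode($sample), PHP_EOL;    // a+b%7Ec%2Bd-e_f.g
echo rawurlencode($sample), PHP_EOL; // a%20b~c%2Bd-e_f.g

// Collect every printable ASCII character the two functions treat differently.
$diff = [];
for ($i = 32; $i < 127; $i++) {
    $ch = chr($i);
    if (urlencode($ch) !== rawurlencode($ch)) {
        $diff[] = $ch;
    }
}

// On PHP 5.3 and later this should list only the space and the tilde.
var_dump($diff);
```

Everything else, including the reserved `+`, is percent-encoded identically by both functions.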

User · Answer

One practical reason to choose one over the other is if you're going to use the result in another environment, for example JavaScript.

In PHP, `urlencode('test 1')` returns `'test+1'` while `rawurlencode('test 1')` returns `'test%201'`.

But if you need to "decode" this in JavaScript using the `decodeURI()` function, then `decodeURI("test+1")` will give you `"test+1"` while `decodeURI("test%201")` will give you `"test 1"`.

In other words, the space (" ") encoded by `urlencode` to plus ("+") in PHP will not be properly decoded by `decodeURI` in JavaScript.

In such cases the `rawurlencode` PHP function should be used.

User · Answer

Proof is in the source code of PHP.

I'll take you through a quick process of how to find out this sort of thing on your own in the future any time you want. Bear with me, there'll be a lot of C source code you can skim over (I explain it). If you want to brush up on some C, a good place to start is our SO wiki.

Download the source (or use http://lxr.php.net/ to browse it online), grep all the files for the function name, and you'll find something such as this:

PHP 5.3.6 (most recent at time of writing) describes the two functions in their native C code in the file url.c.

RawUrlEncode()

    PHP_FUNCTION(rawurlencode)
    {
        char *in_str, *out_str;
        int in_str_len, out_str_len;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &in_str,
                                  &in_str_len) == FAILURE) {
            return;
        }

        out_str = php_raw_url_encode(in_str, in_str_len, &out_str_len);
        RETURN_STRINGL(out_str, out_str_len, 0);
    }

UrlEncode()

    PHP_FUNCTION(urlencode)
    {
        char *in_str, *out_str;
        int in_str_len, out_str_len;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &in_str,
                                  &in_str_len) == FAILURE) {
            return;
        }

        out_str = php_url_encode(in_str, in_str_len, &out_str_len);
        RETURN_STRINGL(out_str, out_str_len, 0);
    }

Okay, so what's different here?

They both are in essence calling two different internal functions respectively: php_raw_url_encode and php_url_encode. So go look for those functions!

Let's look at php_raw_url_encode:

    PHPAPI char *php_raw_url_encode(char const *s, int len, int *new_length)
    {
        register int x, y;
        unsigned char *str;

        str = (unsigned char *) safe_emalloc(3, len, 1);
        for (x = 0, y = 0; len--; x++, y++) {
            str[y] = (unsigned char) s[x];
    #ifndef CHARSET_EBCDIC
            if ((str[y] < '0' && str[y] != '-' && str[y] != '.') ||
                (str[y] < 'A' && str[y] > '9') ||
                (str[y] > 'Z' && str[y] < 'a' && str[y] != '_') ||
                (str[y] > 'z' && str[y] != '~')) {
                str[y++] = '%';
                str[y++] = hexchars[(unsigned char) s[x] >> 4];
                str[y] = hexchars[(unsigned char) s[x] & 15];
    #else /*CHARSET_EBCDIC*/
            if (!isalnum(str[y]) && strchr("_-.~", str[y]) != NULL) {
                str[y++] = '%';
                str[y++] = hexchars[os_toascii[(unsigned char) s[x]] >> 4];
                str[y] = hexchars[os_toascii[(unsigned char) s[x]] & 15];
    #endif /*CHARSET_EBCDIC*/
            }
        }
        str[y] = '\0';
        if (new_length) {
            *new_length = y;
        }
        return ((char *) str);
    }

And of course, php_url_encode:

    PHPAPI char *php_url_encode(char const *s, int len, int *new_length)
    {
        register unsigned char c;
        unsigned char *to, *start;
        unsigned char const *from, *end;

        from = (unsigned char *)s;
        end = (unsigned char *)s + len;
        start = to = (unsigned char *) safe_emalloc(3, len, 1);

        while (from < end) {
            c = *from++;

            if (c == ' ') {
                *to++ = '+';
    #ifndef CHARSET_EBCDIC
            } else if ((c < '0' && c != '-' && c != '.') ||
                       (c < 'A' && c > '9') ||
                       (c > 'Z' && c < 'a' && c != '_') ||
                       (c > 'z')) {
                to[0] = '%';
                to[1] = hexchars[c >> 4];
                to[2] = hexchars[c & 15];
                to += 3;
    #else /*CHARSET_EBCDIC*/
            } else if (!isalnum(c) && strchr("_-.", c) == NULL) {
                /* Allow only alphanumeric chars and '_', '-', '.'; escape the rest */
                to[0] = '%';
                to[1] = hexchars[os_toascii[c] >> 4];
                to[2] = hexchars[os_toascii[c] & 15];
                to += 3;
    #endif /*CHARSET_EBCDIC*/
            } else {
                *to++ = c;
            }
        }
        *to = 0;
        if (new_length) {
            *new_length = to - start;
        }
        return (char *) start;
    }

One quick bit of knowledge before I move forward: EBCDIC is another character set, similar to ASCII, but a total competitor. PHP attempts to deal with both. But basically, this means byte 0x4C in EBCDIC isn't the L it is in ASCII, it's actually a <. I'm sure you see the confusion here.

Both of these functions manage EBCDIC if the web server has defined it (the `#else` branches are compiled only when CHARSET_EBCDIC is defined; the `#ifndef CHARSET_EBCDIC` branches are the ASCII ones).

Also, they both use an array of chars (think string type), hexchars, as a look-up to get some values. The array is described as such:

    /* rfc1738:

       ...The characters ";",
       "/", "?", ":", "@", "=" and "&" are the characters which may be
       reserved for special meaning within a scheme...

       ...Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
       reserved characters used for their reserved purposes may be used
       unencoded within a URL...

       For added safety, we only leave -_. unencoded.
     */

    static unsigned char hexchars[] = "0123456789ABCDEF";

Beyond that, the functions are really different, and I'm going to explain them in ASCII and EBCDIC.

Differences in ASCII:

URLENCODE:

- Calculates the start and end of the input string, allocates memory.
- Walks through a while-loop, increments until we reach the end of the string.
- Grabs the present character.
- If the character is equal to ASCII char 0x20 (i.e., a "space"), adds a + sign to the output string.
- If it's not a space, it checks if the present char is a char before 0, with the exception of being a . or -, OR less than A but greater than char 9, OR greater than Z and less than a but not a _, OR greater than z. If it matches any of those, we output a % sign to array position 0, then look up the high four bits of c (the present character, bitwise shifted right by 4) in the hexchars array and assign that to position 1, and to position 2 we assign the same lookup for the low four bits (c bitwise-ANDed with 15, i.e. 0xF). At the end, you'll end up with something encoded.
- If it ends up it's not a space and it's alphanumeric or one of the _-. chars, it outputs exactly what it is.

RAWURLENCODE:

- Allocates memory for the string.
- Iterates over it based on the length provided in the function call (not calculated in the function as with URLENCODE).

  Note: Many programmers have probably never seen a for loop iterate this way; it's somewhat hackish and not the standard convention used with most for-loops. Pay attention: it assigns x and y, checks for exit on len reaching 0, and increments both x and y. I know, it's not what you'd expect, but it's valid code.
- Assigns the present character to a matching character position in str.
- It runs the same range check as URLENCODE, with the exception that if the character is greater than z, it excludes ~ from the URL encode. If the character needs encoding, we do almost the same assignment as with URLENCODE where it performs lookups; however, we increment differently, using y++ rather than to[1]. This is because the strings are being built in different ways, but they reach the same goal at the end anyway.
- When the loop's done and the length's gone, it terminates the string, assigning the \0 byte.
- It returns the encoded string.

Differences:

- UrlEncode checks for a space and assigns a + sign; RawUrlEncode does not, so the space falls through to the generic branch and becomes %20.
- RawUrlEncode leaves ~ unencoded; UrlEncode encodes it.
- Both terminate the string with a \0 byte (UrlEncode via `*to = 0`, RawUrlEncode via `str[y] = '\0'`).
- They iterate differently; one may be prone to overflow with malformed strings. I'm merely suggesting this and I haven't actually investigated.

They basically iterate differently, and one assigns a + sign in the event of ASCII 0x20.

Differences in EBCDIC:

URLENCODE:

- Same iteration setup as with ASCII.
- Still translating the "space" character to a + sign. Note: the check compares against the ' ' character literal, so I think this needs to be compiled in EBCDIC or you'll end up with a bug. Can someone edit and confirm this?
- Instead of the range checks, it tests that the character is not alphanumeric (isalnum(c)) and isn't a _, -, or . character. If so, it does a similar lookup as found in the ASCII version, except that the character first goes through the os_toascii array (an array from Apache that translates EBCDIC to ASCII) so the hex digits written are those of the ASCII code.

RAWURLENCODE:

- Same iteration setup as with ASCII.
- Same kind of check as described in the EBCDIC version of UrlEncode, with ~ added to the list of characters in the strchr call.
- Same assignment as the ASCII RawUrlEncode, with the extra os_toascii lookup.
- Still appending the \0 byte to the string before return.

Grand Summary:

- Both use the same hexchars lookup table.
- Both terminate the string with \0.
- I'd suggest using RawUrlEncode, as it manages the ~ that UrlEncode does not (this is a reported issue). It's worth noting that the space is 0x20 in ASCII but 0x40 in EBCDIC, which is why the code compares against the ' ' literal.
- They iterate differently; one may be faster, one may be prone to memory or string based exploits.
- UrlEncode makes a space into +, RawUrlEncode makes a space into %20 via array lookups.

Disclaimer: I haven't touched C in years, and I haven't looked at EBCDIC in a really really long time. If I'm wrong somewhere, let me know.

Suggested implementations

Based on all of this, rawurlencode is the way to go most of the time. As you see in Jonathan Fingland's answer, stick with it in most cases. It deals with the modern scheme for URI components, whereas urlencode does things the old-school way, where + meant "space."

If you're trying to convert between the old format and new formats, be sure that your code doesn't goof up and turn something that's a decoded + sign into a space by accidentally double-encoding, or similar "oops" scenarios around this space/%20/+ issue.

If you're working on an older system with older software that doesn't prefer the new format, stick with urlencode. However, I believe %20 will actually be backwards compatible, as under the old standard %20 worked, it just wasn't preferred. Give it a shot if you're up for playing around, and let us know how it worked out for you.

Basically, you should stick with raw, unless your EBCDIC system really hates you. Most programmers will never run into EBCDIC on any system made after the year 2000, maybe even 1990 (that's pushing it, but still likely in my opinion).
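For readers who would rather not read C, the ASCII branch of the two encoders can be mirrored in userland PHP. This is only an illustrative sketch under my own assumptions: `sketch_url_encode` is a hypothetical name, the `$raw` flag is my way of folding both functions into one, and it targets PHP 7 or later. It is not how PHP itself is implemented beyond following the range checks shown above.

```php
<?php
// Hypothetical userland port of the ASCII branch, for illustration only.
// $raw = false mimics urlencode, $raw = true mimics rawurlencode.
function sketch_url_encode(string $s, bool $raw): string
{
    $hexchars = '0123456789ABCDEF';
    $out = '';
    $len = strlen($s);

    for ($i = 0; $i < $len; $i++) {
        $ch = $s[$i];
        $c = ord($ch);

        // Only the non-raw variant special-cases the space.
        if (!$raw && $ch === ' ') {
            $out .= '+';
            continue;
        }

        // Same range checks as the C code: anything outside
        // 0-9, A-Z, a-z and -_. is escaped (raw also keeps ~).
        $needsEscape = ($c < 0x30 && $ch !== '-' && $ch !== '.')
            || ($c < 0x41 && $c > 0x39)
            || ($c > 0x5A && $c < 0x61 && $ch !== '_')
            || ($c > 0x7A && !($raw && $ch === '~'));

        if ($needsEscape) {
            // High nibble, then low nibble, looked up in hexchars.
            $out .= '%' . $hexchars[$c >> 4] . $hexchars[$c & 15];
        } else {
            $out .= $ch;
        }
    }

    return $out;
}

echo sketch_url_encode('red shirt~', false), PHP_EOL; // red+shirt%7E
echo sketch_url_encode('red shirt~', true), PHP_EOL;  // red%20shirt~
```

Comparing the sketch against the built-ins over all 256 byte values is a reasonable sanity check; on PHP 5.3 and later I would expect both comparisons to match.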

User · Answer

1. What exactly are the differences, and

The main difference is in the way spaces are treated:

- urlencode: based on the legacy implementation, converts spaces to +
- rawurlencode: based on RFC 1738, translates spaces to %20

The reason for the difference is that + is reserved and valid (unencoded) in URLs.

2. which is preferred?

"I'd really like to see some reasons for choosing one over the other ... I want to be able to just pick one and use it forever with the least fuss."

Fair enough, I have a simple strategy that I follow when making these decisions, which I will share with you in the hope that it may help.

I think it was the HTTP/1.1 specification, RFC 2616, which called for "Tolerant applications":

"Clients SHOULD be tolerant in parsing the Status-Line and servers tolerant when parsing the Request-Line."

When faced with questions like these, the best strategy is always to consume as much as possible and produce what is standards compliant.

So my advice is to use rawurlencode to produce standards-compliant RFC 1738 encoded strings and use urldecode to be backward compatible and accommodate anything you may come across to consume.

Now you could just take my word for it, but let's prove it, shall we...

    php > $url = <<<'EOD'
    <<< > "Which, % of Alice's tasks saw $s @ earnings?"
    <<< > EOD;
    php > echo $url, PHP_EOL;
    "Which, % of Alice's tasks saw $s @ earnings?"
    php > echo urlencode($url), PHP_EOL;
    %22Which%2C+%25+of+Alice%27s+tasks+saw+%24s+%40+earnings%3F%22
    php > echo rawurlencode($url), PHP_EOL;
    %22Which%2C%20%25%20of%20Alice%27s%20tasks%20saw%20%24s%20%40%20earnings%3F%22
    php > echo rawurldecode(urlencode($url)), PHP_EOL;
    "Which,+%+of+Alice's+tasks+saw+$s+@+earnings?"
    php > // oops that's not right???
    php > echo urldecode(rawurlencode($url)), PHP_EOL;
    "Which, % of Alice's tasks saw $s @ earnings?"
    php > // now that's more like it!

It would appear that PHP had exactly this in mind. Even though I've never come across anyone refusing either of the two formats, I can't think of a better strategy to adopt as your de facto strategy, can you?

nJoy!
