Java equivalent to JavaScript's encodeURIComponent that produces identical output?

Question

I've been experimenting with various bits of Java code, trying to come up with something that will encode a string containing quotes, spaces and "exotic" Unicode characters and produce output that's identical to JavaScript's encodeURIComponent function.

My torture test string is: "A" B ± "

If I enter the following JavaScript statement in Firebug:

    encodeURIComponent('"A" B ± "');

Then I get:

    "%22A%22%20B%20%C2%B1%20%22"

Here's my little test Java program:

    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    public class EncodingTest
    {
      public static void main(String[] args) throws UnsupportedEncodingException
      {
        String s = "\"A\" B ± \"";
        System.out.println("URLEncoder.encode returns "
          + URLEncoder.encode(s, "UTF-8"));

        System.out.println("getBytes returns "
          + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
      }
    }

This program outputs:

    URLEncoder.encode returns %22A%22+B+%C2%B1+%22
    getBytes returns "A" B Â± "

Close, but no cigar! What is the best way of encoding a UTF-8 string using Java so that it produces the same output as JavaScript's encodeURIComponent?

EDIT: I'm using Java 1.4, moving to Java 5 shortly.

User · Accepted Answer

Looking at the implementation differences, I see that:

MDC on encodeURIComponent():

literal characters (regex representation): [-a-zA-Z0-9._*~'()!]

Java 1.5.0 documentation on URLEncoder:

literal characters (regex representation): [-a-zA-Z0-9._*]
the space character " " is converted into a plus sign "+".

So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:

replace all occurrences of "+" with "%20"
replace all occurrences of "%xx" that represent any of [~'()!] (that is, %7E, %27, %28, %29 and %21) with their literal counterparts
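The two post-processing steps above can be sketched roughly as follows. This is a minimal sketch rather than the answerer's own code: the class name `JsCompatibleEncoder` is hypothetical, and `String.replace(CharSequence, CharSequence)` assumes Java 5 or later (on Java 1.4 you would need `replaceAll` with escaped patterns instead).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class; the name is an assumption, not a library API.
final class JsCompatibleEncoder {

    private JsCompatibleEncoder() { }

    // Encodes with URLEncoder, then undoes the differences from encodeURIComponent.
    static String encodeURIComponent(String s) {
        if (s == null) {
            return null;
        }
        try {
            return URLEncoder.encode(s, "UTF-8")
                    .replace("+", "%20")   // URLEncoder writes a space as "+"
                    .replace("%21", "!")   // the five characters below are left
                    .replace("%27", "'")   // literal by encodeURIComponent
                    .replace("%28", "(")
                    .replace("%29", ")")
                    .replace("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }
}
```

A literal plus sign in the input is not affected by the first replacement, because URLEncoder has already turned it into %2B by then.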

User · Answer

I used

    String encodedUrl = new URI(null, url, null).toASCIIString();

to encode URLs. To add parameters after the existing ones in the URL I use UriComponentsBuilder.

User · Answer

The Guava library has PercentEscaper:

    Escaper percentEscaper = new PercentEscaper("-_.*", false);

"-_.*" are the safe characters. The false argument tells PercentEscaper to escape a space as "%20", not "+".

User · Answer

This is a straightforward example of Ravi Wallau's solution:

    public String buildSafeURL(String partialURL, String documentName)
            throws ScriptException {
        ScriptEngineManager scriptEngineManager = new ScriptEngineManager();
        ScriptEngine scriptEngine = scriptEngineManager
                .getEngineByName("JavaScript");

        // Note: a single quote or backslash in documentName would break this expression.
        String urlSafeDocumentName = String.valueOf(scriptEngine
                .eval("encodeURIComponent('" + documentName + "')"));
        String safeURL = partialURL + urlSafeDocumentName;

        return safeURL;
    }

    public static void main(String[] args) {
        EncodeURIComponentDemo demo = new EncodeURIComponentDemo();
        String partialURL = "https://www.website.com/document/";
        String documentName = "Tom & Jerry Manuscript.pdf";

        try {
            System.out.println(demo.buildSafeURL(partialURL, documentName));
        } catch (ScriptException se) {
            se.printStackTrace();
        }
    }

Output:

    https://www.website.com/document/Tom%20%26%20Jerry%20Manuscript.pdf

It also answers the hanging question in the comments by Loren Shqipognja on how to pass a String variable to encodeURIComponent(). The method scriptEngine.eval() returns an Object, so it can be converted to a String via String.valueOf(), among other methods.

User · Answer

This is what I'm using:

    private static final String HEX = "0123456789ABCDEF";

    public static String encodeURIComponent(String str) {
        if (str == null) return null;

        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        StringBuilder builder = new StringBuilder(bytes.length);

        for (byte c : bytes) {
            if (c >= 'a' ? c <= 'z' || c == '~' :
                c >= 'A' ? c <= 'Z' || c == '_' :
                c >= '0' ? c <= '9' : c == '-' || c == '.')
                builder.append((char)c);
            else
                builder.append('%')
                       .append(HEX.charAt(c >> 4 & 0xf))
                       .append(HEX.charAt(c & 0xf));
        }

        return builder.toString();
    }

It goes beyond JavaScript's by percent-encoding every character that is not an unreserved character according to RFC 3986.

This is the opposite conversion:

    public static String decodeURIComponent(String str) {
        if (str == null) return null;

        int length = str.length();
        byte[] bytes = new byte[length / 3];
        StringBuilder builder = new StringBuilder(length);

        for (int i = 0; i < length; ) {
            char c = str.charAt(i);
            if (c != '%') {
                builder.append(c);
                i += 1;
            } else {
                int j = 0;
                do {
                    char h = str.charAt(i + 1);
                    char l = str.charAt(i + 2);
                    i += 3;

                    h -= '0';
                    if (h >= 10) {
                        h |= ' ';
                        h -= 'a' - '0';
                        if (h >= 6) throw new IllegalArgumentException();
                        h += 10;
                    }

                    l -= '0';
                    if (l >= 10) {
                        l |= ' ';
                        l -= 'a' - '0';
                        if (l >= 6) throw new IllegalArgumentException();
                        l += 10;
                    }

                    bytes[j++] = (byte)(h << 4 | l);
                    if (i >= length) break;
                    c = str.charAt(i);
                } while (c == '%');
                builder.append(new String(bytes, 0, j, StandardCharsets.UTF_8));
            }
        }

        return builder.toString();
    }
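The remark that this encoder goes beyond JavaScript's can be made concrete: encodeURIComponent leaves ! * ' ( ) literal, whereas RFC 3986 lists only letters, digits and - . _ ~ as unreserved. As a rough alternative sketch (not the answerer's code), the same stricter output can be approximated on top of URLEncoder. The class and method names here are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; names are assumptions made for illustration only.
final class Rfc3986Encoder {

    private Rfc3986Encoder() { }

    // Percent-encodes everything except the RFC 3986 unreserved characters.
    static String encodeStrict(String s) {
        if (s == null) {
            return null;
        }
        try {
            return URLEncoder.encode(s, "UTF-8")
                    .replace("+", "%20")   // space: "+" in form encoding, "%20" here
                    .replace("*", "%2A")   // URLEncoder leaves "*" literal; RFC 3986 does not
                    .replace("%7E", "~");  // "~" is unreserved, so restore it
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }
}
```

With this sketch, `!`, `'`, `(` and `)` stay percent-encoded, which is the main visible difference from encodeURIComponent.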

User · Answer

I have successfully used the java.net.URI class like so:

    public static String uriEncode(String string) {
        String result = string;
        if (null != string) {
            try {
                String scheme = null;
                String ssp = string;
                int es = string.indexOf(':');
                if (es > 0) {
                    scheme = string.substring(0, es);
                    ssp = string.substring(es + 1);
                }
                result = (new URI(scheme, ssp, null)).toString();
            } catch (URISyntaxException usex) {
                // ignore and use string that has syntax error
            }
        }
        return result;
    }

User · Answer

I came up with my own version of encodeURIComponent, because the posted solution has one problem: if there was a + present in the String, which should be encoded, it will be converted to a space.

So here is my class:

    import java.io.UnsupportedEncodingException;
    import java.util.BitSet;

    public final class EscapeUtils
    {
        /** used for the encodeURIComponent function */
        private static final BitSet dontNeedEncoding;

        static
        {
            dontNeedEncoding = new BitSet(256);

            // a-z
            for (int i = 97; i <= 122; ++i)
            {
                dontNeedEncoding.set(i);
            }
            // A-Z
            for (int i = 65; i <= 90; ++i)
            {
                dontNeedEncoding.set(i);
            }
            // 0-9
            for (int i = 48; i <= 57; ++i)
            {
                dontNeedEncoding.set(i);
            }

            // '()*
            for (int i = 39; i <= 42; ++i)
            {
                dontNeedEncoding.set(i);
            }
            dontNeedEncoding.set(33); // !
            dontNeedEncoding.set(45); // -
            dontNeedEncoding.set(46); // .
            dontNeedEncoding.set(95); // _
            dontNeedEncoding.set(126); // ~
        }

        /**
         * A Utility class should not be instantiated.
         */
        private EscapeUtils()
        {
        }

        /**
         * Escapes all characters except the following: alphabetic, decimal digits, - _ . ! ~ * ' ( )
         *
         * @param input
         *            A component of a URI
         * @return the escaped URI component
         */
        public static String encodeURIComponent(String input)
        {
            if (input == null)
            {
                return input;
            }

            StringBuilder filtered = new StringBuilder(input.length());
            char c;
            for (int i = 0; i < input.length(); ++i)
            {
                c = input.charAt(i);
                if (dontNeedEncoding.get(c))
                {
                    filtered.append(c);
                }
                else
                {
                    // Encodes one char at a time, so a surrogate pair
                    // (a character outside the BMP) is not encoded correctly.
                    final byte[] b = charToBytesUTF(c);

                    for (int j = 0; j < b.length; ++j)
                    {
                        filtered.append('%');
                        filtered.append("0123456789ABCDEF".charAt(b[j] >> 4 & 0xF));
                        filtered.append("0123456789ABCDEF".charAt(b[j] & 0xF));
                    }
                }
            }
            return filtered.toString();
        }

        private static byte[] charToBytesUTF(char c)
        {
            try
            {
                return new String(new char[] { c }).getBytes("UTF-8");
            }
            catch (UnsupportedEncodingException e)
            {
                return new byte[] { (byte) c };
            }
        }
    }

User · Answer

For me this worked:

    import org.apache.http.client.utils.URIBuilder;

    String encodedString = new URIBuilder()
      .setParameter("i", stringToEncode)
      .build()
      .getRawQuery() // output: i=encodedString
      .substring(2);

or with a different UriBuilder:

    import javax.ws.rs.core.UriBuilder;

    String encodedString = UriBuilder.fromPath("")
      .queryParam("i", stringToEncode)
      .toString()   // output: ?i=encodedString
      .substring(3);

In my opinion, using a standard library is a better idea than post-processing manually. Also, Chris's answer looked good, but it doesn't work for URLs like "http://a+b c.html".

User · Answer

I came up with another implementation, documented at http://blog.sangupta.com/2010/05/encodeuricomponent-and.html. The implementation can also handle Unicode bytes.

User · Answer

I have found the PercentEscaper class in the google-http-java-client library, which can be used to implement encodeURIComponent quite easily.

Links: PercentEscaper javadoc (google-http-java-client), google-http-java-client home.

User · Answer

I use java.net.URI#getRawPath(), e.g.

    String s = "a+b c.html";
    String fixed = new URI(null, null, s, null).getRawPath();

The value of fixed will be a+b%20c.html, which is what you want.

Post-processing the output of URLEncoder.encode() will not preserve any pluses that are supposed to be in the URI. For example,

    URLEncoder.encode("a+b c.html").replaceAll("\\+", "%20");

will give you a%2Bb%20c.html: URLEncoder escapes the literal plus as %2B before the replacement runs, so the plus that was meant to stay in the path is lost.
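To check how the two approaches treat a plus and a space, a small comparison can help. This is only a sketch under stated assumptions: the class and method names are hypothetical, and the input is assumed to be a relative path with no colon in its first segment.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical comparison helper; names are assumptions for illustration.
final class PlusHandlingDemo {

    private PlusHandlingDemo() { }

    // Quotes only characters that are illegal in a URI path; "+" is legal there.
    static String viaRawPath(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid path: " + path, e);
        }
    }

    // Form-style encoding of a single value, with spaces rewritten to "%20".
    static String viaUrlEncoder(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }
}
```

For the input `a+b c.html`, `viaRawPath` keeps the plus and returns `a+b%20c.html`, while `viaUrlEncoder` escapes it and returns `a%2Bb%20c.html`. The second form matches what encodeURIComponent produces for a single component; the first suits a path whose plus signs must stay literal.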

User · Answer

Using the JavaScript engine that is shipped with Java 6:

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;

    public class Wow
    {
        public static void main(String[] args) throws Exception
        {
            ScriptEngineManager factory = new ScriptEngineManager();
            ScriptEngine engine = factory.getEngineByName("JavaScript");
            engine.eval("print(encodeURIComponent('\"A\" B ± \"'))");
        }
    }

Output:

    %22A%22%20B%20%c2%b1%20%22

The case is different, but it's closer to what you want.

User · Answer

This is the class I came up with in the end:

    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;
    import java.net.URLEncoder;

    /**
     * Utility class for JavaScript compatible UTF-8 encoding and decoding.
     *
     * @see http://stackoverflow.com/questions/607176/java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-output
     * @author John Topley
     */
    public class EncodingUtil
    {
      /**
       * Decodes the passed UTF-8 String using an algorithm that's compatible with
       * JavaScript's <code>decodeURIComponent</code> function. Returns
       * <code>null</code> if the String is <code>null</code>.
       *
       * @param s The UTF-8 encoded String to be decoded
       * @return the decoded String
       */
      public static String decodeURIComponent(String s)
      {
        if (s == null)
        {
          return null;
        }

        String result = null;

        try
        {
          // Note: URLDecoder also turns a literal "+" into a space,
          // which JavaScript's decodeURIComponent does not do.
          result = URLDecoder.decode(s, "UTF-8");
        }

        // This exception should never occur.
        catch (UnsupportedEncodingException e)
        {
          result = s;
        }

        return result;
      }

      /**
       * Encodes the passed String as UTF-8 using an algorithm that's compatible
       * with JavaScript's <code>encodeURIComponent</code> function. Returns
       * <code>null</code> if the String is <code>null</code>.
       *
       * @param s The String to be encoded
       * @return the encoded String
       */
      public static String encodeURIComponent(String s)
      {
        // URLEncoder.encode throws a NullPointerException for null input,
        // so handle it here to match the documented contract.
        if (s == null)
        {
          return null;
        }

        String result = null;

        try
        {
          result = URLEncoder.encode(s, "UTF-8")
                             .replaceAll("\\+", "%20")
                             .replaceAll("\\%21", "!")
                             .replaceAll("\\%27", "'")
                             .replaceAll("\\%28", "(")
                             .replaceAll("\\%29", ")")
                             .replaceAll("\\%7E", "~");
        }

        // This exception should never occur.
        catch (UnsupportedEncodingException e)
        {
          result = s;
        }

        return result;
      }

      /**
       * Private constructor to prevent this class from being instantiated.
       */
      private EncodingUtil()
      {
        super();
      }
    }
