[java] HTTP URL Address Encoding in Java

My Java standalone application gets a URL (which points to a file) from the user and I need to hit it and download it. The problem I am facing is that I am not able to encode the HTTP URL address properly...

Example:

URL:  http://search.barnesandnoble.com/booksearch/first book.pdf

java.net.URLEncoder.encode(url.toString(), "ISO-8859-1");

returns me:

http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst+book.pdf

But, what I want is

http://search.barnesandnoble.com/booksearch/first%20book.pdf

(space replaced by %20)

I guess URLEncoder is not designed to encode HTTP URLs... The JavaDoc says "Utility class for HTML form encoding"... Is there any other way to do this?

This question is related to java http urlencode

The answer is


If you have a URL, you can pass url.toString() into this method. First decode, to avoid double encoding (for example, encoding a space results in %20 and encoding a percent sign results in %25, so double encoding will turn a space into %2520). Then, use the URI as explained above, adding in all the parts of the URL (so that you don't drop the query parameters).

public URL convertToURLEscapingIllegalCharacters(String string){
    try {
        String decodedURL = URLDecoder.decode(string, "UTF-8");
        URL url = new URL(decodedURL);
        URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef()); 
        return uri.toURL(); 
    } catch (Exception ex) {
        ex.printStackTrace();
        return null;
    }
}

I had the same problem. Solved this by unsing:

android.net.Uri.encode(urlString, ":/");

It encodes the string but skips ":" and "/".


I read the previous answers to write my own method because I could not have something properly working using the solution of the previous answers, it looks good for me but if you can find URL that does not work with this, please let me know.

public static URL convertToURLEscapingIllegalCharacters(String toEscape) throws MalformedURLException, URISyntaxException {
            URL url = new URL(toEscape);
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            //if a % is included in the toEscape string, it will be re-encoded to %25 and we don't want re-encoding, just encoding
            return new URL(uri.toString().replace("%25", "%"));
}

I took the content above and changed it around a bit. I like positive logic first, and I thought a HashSet might give better performance than some other options, like searching through a String. Although, I'm not sure if the autoboxing penalty is worth it, but if the compiler optimizes for ASCII chars, then the cost of boxing will be low.

/***
 * Replaces any character not specifically unreserved to an equivalent 
 * percent sequence.
 * @param s
 * @return
 */
public static String encodeURIcomponent(String s)
{
    StringBuilder o = new StringBuilder();
    for (char ch : s.toCharArray()) {
        if (isSafe(ch)) {
            o.append(ch);
        }
        else {
            o.append('%');
            o.append(toHex(ch / 16));
            o.append(toHex(ch % 16));
        }
    }
    return o.toString();
}

private static char toHex(int ch)
{
    return (char)(ch < 10 ? '0' + ch : 'A' + ch - 10);
}

// https://tools.ietf.org/html/rfc3986#section-2.3
public static final HashSet<Character> UnreservedChars = new HashSet<Character>(Arrays.asList(
        'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
        'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
        '0','1','2','3','4','5','6','7','8','9',
        '-','_','.','~'));
public static boolean isSafe(char ch)
{
    return UnreservedChars.contains(ch);
}

I agree with Matt. Indeed, I've never seen it well explained in tutorials, but one matter is how to encode the URL path, and a very different one is how to encode the parameters which are appended to the URL (the query part, behind the "?" symbol). They use similar encoding, but not the same.

Specially for the encoding of the white space character. The URL path needs it to be encoded as %20, whereas the query part allows %20 and also the "+" sign. The best idea is to test it by ourselves against our Web server, using a Web browser.

For both cases, I ALWAYS would encode COMPONENT BY COMPONENT, never the whole string. Indeed URLEncoder allows that for the query part. For the path part you can use the class URI, although in this case it asks for the entire string, not a single component.

Anyway, I believe that the best way to avoid these problems is to use a personal non-conflictive design. How? For example, I never would name directories or parameters using other characters than a-Z, A-Z, 0-9 and _ . That way, the only need is to encode the value of every parameter, since it may come from an user input and the used characters are unknown.


URLEncoding can encode HTTP URLs just fine, as you've unfortunately discovered. The string you passed in, "http://search.barnesandnoble.com/booksearch/first book.pdf", was correctly and completely encoded into a URL-encoded form. You could pass that entire long string of gobbledigook that you got back as a parameter in a URL, and it could be decoded back into exactly the string you passed in.

It sounds like you want to do something a little different than passing the entire URL as a parameter. From what I gather, you're trying to create a search URL that looks like "http://search.barnesandnoble.com/booksearch/whateverTheUserPassesIn". The only thing that you need to encode is the "whateverTheUserPassesIn" bit, so perhaps all you need to do is something like this:

String url = "http://search.barnesandnoble.com/booksearch/" + 
       URLEncoder.encode(userInput,"UTF-8");

That should produce something rather more valid for you.


You can also use GUAVA and path escaper: UrlEscapers.urlFragmentEscaper().escape(relativePath)


Unfortunately, org.apache.commons.httpclient.util.URIUtil is deprecated, and the replacement org.apache.commons.codec.net.URLCodec does coding suitable for form posts, not in actual URL's. So I had to write my own function, which does a single component (not suitable for entire query strings that have ?'s and &'s)

public static String encodeURLComponent(final String s)
{
  if (s == null)
  {
    return "";
  }

  final StringBuilder sb = new StringBuilder();

  try
  {
    for (int i = 0; i < s.length(); i++)
    {
      final char c = s.charAt(i);

      if (((c >= 'A') && (c <= 'Z')) || ((c >= 'a') && (c <= 'z')) ||
          ((c >= '0') && (c <= '9')) ||
          (c == '-') ||  (c == '.')  || (c == '_') || (c == '~'))
      {
        sb.append(c);
      }
      else
      {
        final byte[] bytes = ("" + c).getBytes("UTF-8");

        for (byte b : bytes)
        {
          sb.append('%');

          int upper = (((int) b) >> 4) & 0xf;
          sb.append(Integer.toHexString(upper).toUpperCase(Locale.US));

          int lower = ((int) b) & 0xf;
          sb.append(Integer.toHexString(lower).toUpperCase(Locale.US));
        }
      }
    }

    return sb.toString();
  }
  catch (UnsupportedEncodingException uee)
  {
    throw new RuntimeException("UTF-8 unsupported!?", uee);
  }
}

I've created a new project to help construct HTTP URLs. The library will automatically URL encode path segments and query parameters.

You can view the source and download a binary at https://github.com/Widen/urlbuilder

The example URL in this question:

new UrlBuilder("search.barnesandnoble.com", "booksearch/first book.pdf").toString()

produces

http://search.barnesandnoble.com/booksearch/first%20book.pdf


In addition to the Carlos Heuberger's reply: if a different than the default (80) is needed, the 7 param constructor should be used:

URI uri = new URI(
        "http",
        null, // this is for userInfo
        "www.google.com",
        8080, // port number as int
        "/ig/api",
        "weather=São Paulo",
        null);
String request = uri.toASCIIString();

You can use a function like this. Complete and modify it to your need :

/**
     * Encode URL (except :, /, ?, &, =, ... characters)
     * @param url to encode
     * @param encodingCharset url encoding charset
     * @return encoded URL
     * @throws UnsupportedEncodingException
     */
    public static String encodeUrl (String url, String encodingCharset) throws UnsupportedEncodingException{
            return new URLCodec().encode(url, encodingCharset).replace("%3A", ":").replace("%2F", "/").replace("%3F", "?").replace("%3D", "=").replace("%26", "&");
    }

Example of use :

String urlToEncode = ""http://www.growup.com/folder/intérieur-à_vendre?o=4";
Utils.encodeUrl (urlToEncode , "UTF-8")

The result is : http://www.growup.com/folder/int%C3%A9rieur-%C3%A0_vendre?o=4


i use this

org.apache.commons.text.StringEscapeUtils.escapeHtml4("my text % & < >");

add this dependecy

 <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-text</artifactId>
        <version>1.8</version>
    </dependency>

Yeah URL encoding is going to encode that string so that it would be passed properly in a url to a final destination. For example you could not have http://stackoverflow.com?url=http://yyy.com. UrlEncoding the parameter would fix that parameter value.

So i have two choices for you:

  1. Do you have access to the path separate from the domain? If so you may be able to simply UrlEncode the path. However, if this is not the case then option 2 may be for you.

  2. Get commons-httpclient-3.1. This has a class URIUtil:

    System.out.println(URIUtil.encodePath("http://example.com/x y", "ISO-8859-1"));

This will output exactly what you are looking for, as it will only encode the path part of the URI.

FYI, you'll need commons-codec and commons-logging for this method to work at runtime.


There is still a problem if you have got an encoded "/" (%2F) in your URL.

RFC 3986 - Section 2.2 says: "If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed." (RFC 3986 - Section 2.2)

But there is an Issue with Tomcat:

http://tomcat.apache.org/security-6.html - Fixed in Apache Tomcat 6.0.10

important: Directory traversal CVE-2007-0450

Tomcat permits '\', '%2F' and '%5C' [...] .

The following Java system properties have been added to Tomcat to provide additional control of the handling of path delimiters in URLs (both options default to false):

  • org.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH: true|false
  • org.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH: true|false

Due to the impossibility to guarantee that all URLs are handled by Tomcat as they are in proxy servers, Tomcat should always be secured as if no proxy restricting context access was used.

Affects: 6.0.0-6.0.9

So if you have got an URL with the %2F character, Tomcat returns: "400 Invalid URI: noSlash"

You can switch of the bugfix in the Tomcat startup script:

set JAVA_OPTS=%JAVA_OPTS% %LOGGING_CONFIG%   -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true 

Maybe can try UriUtils in org.springframework.web.util

UriUtils.encodeUri(input, "UTF-8")

Use the following standard Java solution (passes around 100 of the testcases provided by Web Plattform Tests):

0. Test if URL is already encoded.

1. Split URL into structural parts. Use java.net.URL for it.

2. Encode each structural part properly!

3. Use IDN.toASCII(putDomainNameHere) to Punycode encode the host name!

4. Use java.net.URI.toASCIIString() to percent-encode, NFC encoded unicode - (better would be NFKC!).

Find more here: https://stackoverflow.com/a/49796882/1485527


I develop a library that serves this purpose: galimatias. It parses URL the same way web browsers do. That is, if a URL works in a browser, it will be correctly parsed by galimatias.

In this case:

// Parse
io.mola.galimatias.URL.parse(
    "http://search.barnesandnoble.com/booksearch/first book.pdf"
).toString()

Will give you: http://search.barnesandnoble.com/booksearch/first%20book.pdf. Of course this is the simplest case, but it'll work with anything, way beyond java.net.URI.

You can check it out at: https://github.com/smola/galimatias


a solution i developed and much more stable than any other:

public class URLParamEncoder {

    public static String encode(String input) {
        StringBuilder resultStr = new StringBuilder();
        for (char ch : input.toCharArray()) {
            if (isUnsafe(ch)) {
                resultStr.append('%');
                resultStr.append(toHex(ch / 16));
                resultStr.append(toHex(ch % 16));
            } else {
                resultStr.append(ch);
            }
        }
        return resultStr.toString();
    }

    private static char toHex(int ch) {
        return (char) (ch < 10 ? '0' + ch : 'A' + ch - 10);
    }

    private static boolean isUnsafe(char ch) {
        if (ch > 128 || ch < 0)
            return true;
        return " %$&+,/:;=?@<>#%".indexOf(ch) >= 0;
    }

}

Please be warned that most of the answers above are INCORRECT.

The URLEncoder class, despite is name, is NOT what needs to be here. It's unfortunate that Sun named this class so annoyingly. URLEncoder is meant for passing data as parameters, not for encoding the URL itself.

In other words, "http://search.barnesandnoble.com/booksearch/first book.pdf" is the URL. Parameters would be, for example, "http://search.barnesandnoble.com/booksearch/first book.pdf?parameter1=this&param2=that". The parameters are what you would use URLEncoder for.

The following two examples highlights the differences between the two.

The following produces the wrong parameters, according to the HTTP standard. Note the ampersand (&) and plus (+) are encoded incorrectly.

uri = new URI("http", null, "www.google.com", 80, 
"/help/me/book name+me/", "MY CRZY QUERY! +&+ :)", null);

// URI: http://www.google.com:80/help/me/book%20name+me/?MY%20CRZY%20QUERY!%20+&+%20:)

The following will produce the correct parameters, with the query properly encoded. Note the spaces, ampersands, and plus marks.

uri = new URI("http", null, "www.google.com", 80, "/help/me/book name+me/", URLEncoder.encode("MY CRZY QUERY! +&+ :)", "UTF-8"), null);

// URI: http://www.google.com:80/help/me/book%20name+me/?MY+CRZY+QUERY%2521+%252B%2526%252B+%253A%2529

If anybody doesn't want to add a dependency to their project, these functions may be helpful.

We pass the 'path' part of our URL into here. You probably don't want to pass the full URL in as a parameter (query strings need different escapes, etc).

/**
 * Percent-encodes a string so it's suitable for use in a URL Path (not a query string / form encode, which uses + for spaces, etc)
 */
public static String percentEncode(String encodeMe) {
    if (encodeMe == null) {
        return "";
    }
    String encoded = encodeMe.replace("%", "%25");
    encoded = encoded.replace(" ", "%20");
    encoded = encoded.replace("!", "%21");
    encoded = encoded.replace("#", "%23");
    encoded = encoded.replace("$", "%24");
    encoded = encoded.replace("&", "%26");
    encoded = encoded.replace("'", "%27");
    encoded = encoded.replace("(", "%28");
    encoded = encoded.replace(")", "%29");
    encoded = encoded.replace("*", "%2A");
    encoded = encoded.replace("+", "%2B");
    encoded = encoded.replace(",", "%2C");
    encoded = encoded.replace("/", "%2F");
    encoded = encoded.replace(":", "%3A");
    encoded = encoded.replace(";", "%3B");
    encoded = encoded.replace("=", "%3D");
    encoded = encoded.replace("?", "%3F");
    encoded = encoded.replace("@", "%40");
    encoded = encoded.replace("[", "%5B");
    encoded = encoded.replace("]", "%5D");
    return encoded;
}

/**
 * Percent-decodes a string, such as used in a URL Path (not a query string / form encode, which uses + for spaces, etc)
 */
public static String percentDecode(String encodeMe) {
    if (encodeMe == null) {
        return "";
    }
    String decoded = encodeMe.replace("%21", "!");
    decoded = decoded.replace("%20", " ");
    decoded = decoded.replace("%23", "#");
    decoded = decoded.replace("%24", "$");
    decoded = decoded.replace("%26", "&");
    decoded = decoded.replace("%27", "'");
    decoded = decoded.replace("%28", "(");
    decoded = decoded.replace("%29", ")");
    decoded = decoded.replace("%2A", "*");
    decoded = decoded.replace("%2B", "+");
    decoded = decoded.replace("%2C", ",");
    decoded = decoded.replace("%2F", "/");
    decoded = decoded.replace("%3A", ":");
    decoded = decoded.replace("%3B", ";");
    decoded = decoded.replace("%3D", "=");
    decoded = decoded.replace("%3F", "?");
    decoded = decoded.replace("%40", "@");
    decoded = decoded.replace("%5B", "[");
    decoded = decoded.replace("%5D", "]");
    decoded = decoded.replace("%25", "%");
    return decoded;
}

And tests:

@Test
public void testPercentEncode_Decode() {
    assertEquals("", percentDecode(percentEncode(null)));
    assertEquals("", percentDecode(percentEncode("")));

    assertEquals("!", percentDecode(percentEncode("!")));
    assertEquals("#", percentDecode(percentEncode("#")));
    assertEquals("$", percentDecode(percentEncode("$")));
    assertEquals("@", percentDecode(percentEncode("@")));
    assertEquals("&", percentDecode(percentEncode("&")));
    assertEquals("'", percentDecode(percentEncode("'")));
    assertEquals("(", percentDecode(percentEncode("(")));
    assertEquals(")", percentDecode(percentEncode(")")));
    assertEquals("*", percentDecode(percentEncode("*")));
    assertEquals("+", percentDecode(percentEncode("+")));
    assertEquals(",", percentDecode(percentEncode(",")));
    assertEquals("/", percentDecode(percentEncode("/")));
    assertEquals(":", percentDecode(percentEncode(":")));
    assertEquals(";", percentDecode(percentEncode(";")));

    assertEquals("=", percentDecode(percentEncode("=")));
    assertEquals("?", percentDecode(percentEncode("?")));
    assertEquals("@", percentDecode(percentEncode("@")));
    assertEquals("[", percentDecode(percentEncode("[")));
    assertEquals("]", percentDecode(percentEncode("]")));
    assertEquals(" ", percentDecode(percentEncode(" ")));

    // Get a little complex
    assertEquals("[]]", percentDecode(percentEncode("[]]")));
    assertEquals("a=d%*", percentDecode(percentEncode("a=d%*")));
    assertEquals(")  (", percentDecode(percentEncode(")  (")));
    assertEquals("%21%20%2A%20%27%20%28%20%25%20%29%20%3B%20%3A%20%40%20%26%20%3D%20%2B%20%24%20%2C%20%2F%20%3F%20%23%20%5B%20%5D%20%25",
                    percentEncode("! * ' ( % ) ; : @ & = + $ , / ? # [ ] %"));
    assertEquals("! * ' ( % ) ; : @ & = + $ , / ? # [ ] %", percentDecode(
                    "%21%20%2A%20%27%20%28%20%25%20%29%20%3B%20%3A%20%40%20%26%20%3D%20%2B%20%24%20%2C%20%2F%20%3F%20%23%20%5B%20%5D%20%25"));

    assertEquals("%23456", percentDecode(percentEncode("%23456")));

}

I'm going to add one suggestion here aimed at Android users. You can do this which avoids having to get any external libraries. Also, all the search/replace characters solutions suggested in some of the answers above are perilous and should be avoided.

Give this a try:

String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();

You can see that in this particular URL, I need to have those spaces encoded so that I can use it for a request.

This takes advantage of a couple features available to you in Android classes. First, the URL class can break a url into its proper components so there is no need for you to do any string search/replace work. Secondly, this approach takes advantage of the URI class feature of properly escaping components when you construct a URI via components rather than from a single string.

The beauty of this approach is that you can take any valid url string and have it work without needing any special knowledge of it yourself.


How about:

public String UrlEncode(String in_) {

String retVal = "";

try {
    retVal = URLEncoder.encode(in_, "UTF8");
} catch (UnsupportedEncodingException ex) {
    Log.get().exception(Log.Level.Error, "urlEncode ", ex);
}

return retVal;

}


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to http

Access blocked by CORS policy: Response to preflight request doesn't pass access control check Axios Delete request with body and headers? Read response headers from API response - Angular 5 + TypeScript Android 8: Cleartext HTTP traffic not permitted Angular 4 HttpClient Query Parameters Load json from local file with http.get() in angular 2 Angular 2: How to access an HTTP response body? What is HTTP "Host" header? Golang read request body Angular 2 - Checking for server errors from subscribe

Examples related to urlencode

How to URL encode in Python 3? How to encode a URL in Swift Swift - encode URL Java URL encoding of query string parameters Objective-C and Swift URL encoding How to URL encode a string in Ruby Should I URL-encode POST data? How to set the "Content-Type ... charset" in the request header using a HTML link Parsing GET request parameters in a URL that contains another URL How to urlencode a querystring in Python?