[java] Java - Convert String to valid URI object

I am trying to get a java.net.URI object from a String. The string has some characters which need to be replaced by their percent escape sequences. But when I use URLEncoder to encode the String with UTF-8 encoding, even the / characters are replaced with their escape sequences.

How can I get a valid encoded URL from a String object?

For example, http://www.google.com?q=a b gives http%3A%2F%2Fwww.google.com... whereas I want the output to be http://www.google.com?q=a%20b

Can someone please tell me how to achieve this?

I am trying to do this in an Android app, so I have access to only a limited number of libraries.


The answer is


If you don't like libraries, how about this?

Note that you should not use this function on the whole URL. Use it on the individual components (e.g. just the "a b" value) as you build up the URL; otherwise there is no way to tell which characters are meant as delimiters and which are meant literally.

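/** Converts a string into something you can safely insert into a URL. */
public static String encodeURIcomponent(String s)
{
    StringBuilder o = new StringBuilder();
    // Escape the UTF-8 bytes rather than the UTF-16 chars, so that
    // non-ASCII characters become valid two-digit %XX sequences.
    for (byte b : s.getBytes(java.nio.charset.StandardCharsets.UTF_8)) {
        int ch = b & 0xFF;
        if (isUnsafe(ch)) {
            o.append('%');
            o.append(toHex(ch / 16));
            o.append(toHex(ch % 16));
        }
        else o.append((char) ch);
    }
    return o.toString();
}

/** Maps a value in 0..15 to its uppercase hex digit. */
private static char toHex(int ch)
{
    return (char)(ch < 10 ? '0' + ch : 'A' + ch - 10);
}

/** Control bytes, non-ASCII bytes and URL delimiters must be escaped. */
private static boolean isUnsafe(int ch)
{
    if (ch < 32 || ch > 126)
        return true;
    return " %$&+,/:;=?@<>#".indexOf(ch) >= 0;
}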

I ended up using the httpclient-4.3.6:

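import org.apache.http.client.utils.URIBuilder;

// The class name is arbitrary; main just needs an enclosing class to compile.
public class Example {
    public static void main(String[] args) {
        // Each setter escapes its own component, so raw values can be passed in.
        URIBuilder uri = new URIBuilder();
        uri.setScheme("http")
           .setHost("www.example.com")
           .setPath("/somepage.php")
           .setParameter("username", "Hello Günter")
           .setParameter("p1", "parameter 1");
        System.out.println(uri.toString());
    }
}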

Output will be:

http://www.example.com/somepage.php?username=Hello+G%C3%BCnter&p1=parameter+1

I had a similar problem in one of my projects when creating a URI object from a string. I couldn't find any clean solution either. Here's what I came up with:

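import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public static URI encodeURL(String url) throws MalformedURLException, URISyntaxException
{
    // URL splits the string into components without validating their characters.
    URL urlLink = new URL(url);
    // Building the URI from components escapes illegal characters in each one.
    // The scheme is taken from the input instead of being fixed to "http";
    // the port and user info are not carried over by this constructor call.
    return new URI(urlLink.getProtocol(), urlLink.getHost(), urlLink.getPath(), urlLink.getQuery(), urlLink.getRef());
}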

You can use the following URI constructor instead to specify a port if needed:

URI uri = new URI(scheme, userInfo, host, port, path, query, fragment);

Well I tried using

String converted = URLDecoder.decode("toconvert","UTF-8");

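I hope this is what you were actually looking for? (Note that URLDecoder.decode turns percent escapes back into plain characters, so on its own it does not produce an encoded URL.)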


The java.net blog had a class the other day that might have done what you want (but it is down right now so I cannot check).

This code here could probably be modified to do what you want:

http://svn.apache.org/repos/asf/incubator/shindig/trunk/java/common/src/main/java/org/apache/shindig/common/uri/UriBuilder.java

Here is the one I was thinking of from java.net: https://urlencodedquerystring.dev.java.net/


Even if this is an old post with an already accepted answer, I post my alternative answer because it works well for the present issue and it seems nobody mentioned this method.

With the java.net.URI class:

URI uri = URI.create(URLString);

And if you want a URL-formatted string corresponding to it:

String validURLString = uri.toASCIIString();

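Unlike many other methods (e.g. java.net.URLEncoder), this one replaces only non-ASCII characters (like ç, é...) and leaves delimiters such as / and ? alone. Be aware that URI.create throws an IllegalArgumentException if the string contains characters that are illegal in a URI, such as a space, so it does not help when those still need escaping.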


In the above example, if URLString is the following String:

"http://www.domain.com/façon+word"

the resulting validURLString will be:

"http://www.domain.com/fa%C3%A7on+word"

which is a well-formatted URL.
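
As a rough illustration of the behaviour described above, here is a minimal sketch for a plain JVM. The class and method names (AsciiUriDemo, toAscii, parses) are made up for this example. It also shows the case this approach does not cover: the space in the question's example string is rejected rather than escaped.

```java
import java.net.URI;

// Illustrative class; the names here are hypothetical and not part of any library.
final class AsciiUriDemo {

    /** Parses the string as a URI and returns it with non-ASCII characters percent-encoded as UTF-8. */
    static String toAscii(String urlString) {
        return URI.create(urlString).toASCIIString();
    }

    /** Returns true if URI.create accepts the string as-is. */
    static boolean parses(String urlString) {
        try {
            URI.create(urlString);
            return true;
        } catch (IllegalArgumentException e) {
            // URI.create wraps the URISyntaxException raised for illegal characters.
            return false;
        }
    }

    public static void main(String[] args) {
        // \u00e7 is the character ç, written as an escape to avoid source-encoding issues.
        System.out.println(toAscii("http://www.domain.com/fa\u00e7on+word"));
        // prints http://www.domain.com/fa%C3%A7on+word
        System.out.println(parses("http://www.google.com?q=a b"));
        // prints false
    }
}
```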


Android has always had the Uri class as part of the SDK: http://developer.android.com/reference/android/net/Uri.html

You can simply do something like:

String requestURL = String.format("http://www.example.com/?a=%s&b=%s", Uri.encode("foo bar"), Uri.encode("100% fubar'd"));

Or perhaps you could use this class:

http://developer.android.com/reference/java/net/URLEncoder.html

Which is present in Android since API level 1.

Annoyingly however, it treats spaces specially (replacing them with + instead of %20). To get round this we simply use this fragment:

URLEncoder.encode(value, "UTF-8").replace("+", "%20");
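
Putting that fragment to work, a minimal sketch of building a query string component by component might look like the following. The class and helper names (QueryEncodingDemo, encodeComponent, buildUrl) are invented for the example, and it uses only java.net.URLEncoder, so the output for some characters may differ slightly from what Uri.encode produces.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative class; the names here are hypothetical.
final class QueryEncodingDemo {

    /** Percent-encodes a single query value; spaces become %20 rather than '+'. */
    static String encodeComponent(String value) {
        try {
            // URLEncoder turns a literal '+' into %2B, so any '+' left in its
            // output stands for a space and can safely be replaced.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    /** Builds the example URL from two raw (unencoded) parameter values. */
    static String buildUrl(String a, String b) {
        return "http://www.example.com/?a=" + encodeComponent(a) + "&b=" + encodeComponent(b);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("foo bar", "100% fubar'd"));
        // prints http://www.example.com/?a=foo%20bar&b=100%25%20fubar%27d
    }
}
```

Only the values are encoded; the delimiters (://, ?, =, &) are written literally, which is what keeps them meaningful.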


I'm going to add one suggestion here aimed at Android users. You can do this which avoids having to get any external libraries. Also, all the search/replace characters solutions suggested in some of the answers above are perilous and should be avoided.

Give this a try:

String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();

You can see that in this particular URL, I need to have those spaces encoded so that I can use it for a request.

This takes advantage of a couple features available to you in Android classes. First, the URL class can break a url into its proper components so there is no need for you to do any string search/replace work. Secondly, this approach takes advantage of the URI class feature of properly escaping components when you construct a URI via components rather than from a single string.

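The beauty of this approach is that you can take a URL string that has not been escaped yet and have it work without needing any special knowledge of it yourself. A string that already contains percent escapes is a different matter, because these constructors always quote the '%' character.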
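
To make this concrete, here is a small sketch that wraps the same steps in a helper and runs it on two inputs. The class and method names (UrlToUriDemo, escape) and the example.com URLs are made up for illustration. The second call shows what happens when the input is already escaped: the multi-argument constructors always quote '%', so the existing escape is escaped again.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Illustrative class; the names here are hypothetical.
final class UrlToUriDemo {

    /** Splits the string with URL, then lets the multi-argument URI constructor escape each part. */
    static String escape(String urlStr) {
        try {
            URL url = new URL(urlStr);
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(),
                    url.getPath(), url.getQuery(), url.getRef());
            return uri.toString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable URL: " + urlStr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(escape("http://example.com/a b"));
        // prints http://example.com/a%20b
        System.out.println(escape("http://example.com/a%20b"));
        // prints http://example.com/a%2520b
    }
}
```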


You can use the multi-argument constructors of the URI class. From the URI javadoc:

The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character ('%') is always quoted by these constructors. Any other characters are preserved.

So if you use

URI uri = new URI("http", "www.google.com?q=a b", null);

Then you get http:www.google.com?q=a%20b which isn't quite right, but it's a little closer.

If you know that your string will not have URL fragments (e.g. http://example.com/page#anchor), then you can use the following code to get what you want:

String s = "http://www.google.com?q=a b";
String[] parts = s.split(":",2);
URI uri = new URI(parts[0], parts[1], null);

To be safe, you should scan the string for # characters, but this should get you started.
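
One way to handle the # case is sketched below, assuming the first # in the string starts the fragment. The class and method names (SchemeSplitDemo, toUri) are made up for this example. As with the other multi-argument constructors, any '%' already in the input will be quoted again.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative class; the names here are hypothetical.
final class SchemeSplitDemo {

    /** Splits off the fragment (if any) and the scheme, then lets the three-argument constructor escape the rest. */
    static URI toUri(String s) {
        String fragment = null;
        int hash = s.indexOf('#');
        if (hash >= 0) {
            // Everything after the first '#' is treated as the fragment.
            fragment = s.substring(hash + 1);
            s = s.substring(0, hash);
        }
        String[] parts = s.split(":", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("No scheme found in: " + s);
        }
        try {
            // Arguments are scheme, scheme-specific part, fragment.
            return new URI(parts[0], parts[1], fragment);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUri("http://www.google.com?q=a b"));
        // prints http://www.google.com?q=a%20b
        System.out.println(toUri("http://example.com/page#my anchor"));
        // prints http://example.com/page#my%20anchor
    }
}
```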

