[java] string decode utf-8

How can I decode a UTF-8 string on Android? I tried these commands, but the output is the same as the input:

URLDecoder.decode("hello&//à", "UTF-8");

new String("hello&//à", "UTF-8");

EntityUtils.toString("hello&//à", "utf-8");

This question is related to java android

The answer is


A string needs no encoding. It is simply a sequence of Unicode characters.

You need to encode when you want to turn a String into a sequence of bytes. The charset that you choose (UTF-8, cp1255, etc.) determines the character-to-byte mapping. Note that a character is not necessarily translated into a single byte. In the Unicode encodings (UTF-8, UTF-16, UTF-32), most characters are translated to at least two bytes.

Encoding of a String is carried out by:

String s1 = "some text";
byte[] bytes = s1.getBytes("UTF-8"); // Charset to encode into
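
As a rough illustration of how the chosen charset determines the number of bytes per character, the sketch below encodes a few one-character strings with different charsets. The class EncodingSizes and the helper encodedLength are names made up for this example; StandardCharsets assumes Java 7 or later (Android API 19 or later).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Hypothetical helper: number of bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String ascii = "a";
        String accented = "\u00E0"; // à
        String euro = "\u20AC";     // €

        System.out.println(encodedLength(ascii, StandardCharsets.UTF_8));         // 1
        System.out.println(encodedLength(accented, StandardCharsets.UTF_8));      // 2
        System.out.println(encodedLength(euro, StandardCharsets.UTF_8));          // 3
        System.out.println(encodedLength(accented, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(encodedLength(accented, StandardCharsets.UTF_16BE));   // 2
    }
}
```

UTF_16BE is used instead of UTF_16 because the latter prepends a two-byte byte order mark when encoding, which would inflate the counts.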

You need to decode when you have a sequence of bytes and you want to turn them into a String. When you do that you need to specify, again, the charset with which the bytes were originally encoded (otherwise you'll end up with garbled text).

Decoding:

String s2 = new String(bytes, "UTF-8"); // Charset with which bytes were encoded 
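
To make the "garbled text" warning concrete, here is a minimal sketch that encodes the string from the question as UTF-8 and then decodes it once with the right charset and once with the wrong one. The class DecodeRoundTrip and the helper decode are illustrative names, not part of any library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeRoundTrip {
    // Hypothetical helper: interpret the bytes using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "hello&//\u00E0"; // "hello&//à"
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset used for encoding restores the text.
        String correct = decode(utf8, StandardCharsets.UTF_8);
        System.out.println(correct.equals(original)); // true

        // Decoding with a different charset garbles it: the two UTF-8 bytes
        // of 'à' (0xC3 0xA0) are read as two separate characters.
        String garbled = decode(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(original)); // false
        System.out.println(garbled.length());         // 10, one more than the original
    }
}
```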

If you want to understand this better, a great text is "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"


Try looking at the question "decode string encoded in utf-8 format in android", but it doesn't look like your string is encoded in any particular way. What do you think the output should be?
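
A small sketch of why the output equals the input for that string: URLDecoder.decode only rewrites percent escapes and '+' signs, and the string in the question contains neither. The class UrlDecodeDemo, the helper urlDecode, and the percent-encoded sample are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecodeDemo {
    // Hypothetical helper wrapping URLDecoder.decode with UTF-8.
    static String urlDecode(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on the Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String plain = "hello&//\u00E0"; // "hello&//à"

        // Nothing to decode here, so the result is unchanged.
        System.out.println(urlDecode(plain).equals(plain)); // true

        // A percent-encoded form of the same text does get decoded.
        String encoded = "hello%26%2F%2F%C3%A0";
        System.out.println(urlDecode(encoded).equals(plain)); // true
    }
}
```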


The core functions are getBytes(String charset) and new String(byte[] data). You can use these functions to do UTF-8 decoding.

UTF-8 decoding is actually a string-to-string conversion, with a byte array as the intermediate buffer. Since the target is a UTF-8 string, the only parameter that new String() needs is the byte array: on Android the default charset is UTF-8, so the call is equivalent to new String(bytes, "UTF-8"). On a platform with a different default charset, pass "UTF-8" explicitly.

The key, then, is the charset passed to getBytes() to recover the underlying byte array from the encoded input string, which you should know beforehand. If you don't, guess the most likely one; "ISO-8859-1" is a good guess for English-speaking users.

The decoding statement should be

String decoded = new String(encoded.getBytes("ISO-8859-1"));
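
The same idea can be sketched with both charsets spelled out, so the result does not depend on the platform default. This assumes the input was UTF-8 data that got decoded as ISO-8859-1 somewhere upstream; the class MojibakeRepair and the helper latin1ToUtf8 are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Hypothetical helper: re-interpret a string that was decoded as ISO-8859-1
    // although its bytes were really UTF-8.
    static String latin1ToUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "hello&//à" as it looks after its UTF-8 bytes were read as ISO-8859-1.
        String misdecoded = "hello&//\u00C3\u00A0";
        String repaired = latin1ToUtf8(misdecoded);
        System.out.println(repaired.equals("hello&//\u00E0")); // true
    }
}
```

This only works if the guess about the wrong charset is right. If the text was actually misread with a different charset, such as windows-1252, the recovered bytes may not match the original ones.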