[java] How do I convert between ISO-8859-1 and UTF-8 in Java?

Does anyone know how to convert a string from ISO-8859-1 to UTF-8 and back in Java?

I'm getting a string from the web and saving it in the RMS (J2ME), but I want to preserve the special chars and get the string from the RMS but with the ISO-8859-1 encoding. How do I do this?

This question is related to java java-me utf-8 character-encoding iso-8859-1

The answer is


The easiest way to convert an ISO-8859-1 string to UTF-8 string.

private static String convertIsoToUTF8(String example) throws UnsupportedEncodingException {
    return new String(example.getBytes("ISO-8859-1"), "utf-8");
}

If we want to convert an UTF-8 string to ISO-8859-1 string.

private static String convertUTF8ToISO(String example) throws UnsupportedEncodingException {
    return new String(example.getBytes("utf-8"), "ISO-8859-1");
}

Moreover, a method that converts an ISO-8859-1 string to UTF-8 string without using the constructor of class String.

public static String convertISO_to_UTF8_personal(String strISO_8859_1) {
    String res = "";
    int i = 0;
    for (i = 0; i < strISO_8859_1.length() - 1; i++) {
        char ch = strISO_8859_1.charAt(i);
        char chNext = strISO_8859_1.charAt(i + 1);
        if (ch <= 127) {
            res += ch;
        } else if (ch == 194 && chNext >= 128 && chNext <= 191) {
            res += chNext;
        } else if(ch == 195 && chNext >= 128 && chNext <= 191){
            int resNum = chNext + 64;
            res += (char) resNum;
        } else if(ch == 194){
            res += (char) 173;
        } else if(ch == 195){
            res += (char) 224;
        }
    }
    char ch = strISO_8859_1.charAt(i);
    if (ch <= 127 ){
        res += ch;
    }
    return res;
}

}

That method is based on enconding utf-8 to iso-8859-1 of this website. Encoding utf-8 to iso-8859-1


Regex can also be good and be used effectively (Replaces all UTF-8 characters not covered in ISO-8859-1 with space):

String input = "€Tes¶ti©ng [§] al€l o€f i¶t _ - À ÆÑ with some 9umbers as"
            + " w2921**#$%!@# well Ü, or ü, is a chaŒracte?";
String output = input.replaceAll("[^\\u0020-\\u007e\\u00a0-\\u00ff]", " ");
System.out.println("Input = " + input);
System.out.println("Output = " + output);

If you have a String, you can do that:

String s = "test";
try {
    s.getBytes("UTF-8");
} catch(UnsupportedEncodingException uee) {
    uee.printStackTrace();
}

If you have a 'broken' String, you did something wrong, converting a String to a String in another encoding is defenetely not the way to go! You can convert a String to a byte[] and vice-versa (given an encoding). In Java Strings are AFAIK encoded with UTF-16 but that's an implementation detail.

Say you have a InputStream, you can read in a byte[] and then convert that to a String using

byte[] bs = ...;
String s;
try {
    s = new String(bs, encoding);
} catch(UnsupportedEncodingException uee) {
    uee.printStackTrace();
}

or even better (thanks to erickson) use InputStreamReader like that:

InputStreamReader isr;
try {
     isr = new InputStreamReader(inputStream, encoding);
} catch(UnsupportedEncodingException uee) {
    uee.printStackTrace();
}

Apache Commons IO Charsets class can come in handy:

String utf8String = new String(org.apache.commons.io.Charsets.ISO_8859_1.encode(latinString).array())

Here is an easy way with String output (I created a method to do this):

public static String (String input){
    String output = "";
    try {
        /* From ISO-8859-1 to UTF-8 */
        output = new String(input.getBytes("ISO-8859-1"), "UTF-8");
        /* From UTF-8 to ISO-8859-1 */
        output = new String(input.getBytes("UTF-8"), "ISO-8859-1");
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return output;
}
// Example
input = "Música";
output = "Música";

Which worked for me: ("üzüm baglari" is the correct written in Turkish)

Convert ISO-8859-1 to UTF-8:

String encodedWithISO88591 = "üzüm baÄları";
String decodedToUTF8 = new String(encodedWithISO88591.getBytes("ISO-8859-1"), "UTF-8");
//Result, decodedToUTF8 --> "üzüm baglari"

Convert UTF-8 to ISO-8859-1

String encodedWithUTF8 = "üzüm baglari";
String decodedToISO88591 = new String(encodedWithUTF8.getBytes("UTF-8"), "ISO-8859-1");
//Result, decodedToISO88591 --> "üzüm baÄları"

Here is a function to convert UNICODE (ISO_8859_1) to UTF-8

public static String String_ISO_8859_1To_UTF_8(String strISO_8859_1) {
final StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < strISO_8859_1.length(); i++) {
  final char ch = strISO_8859_1.charAt(i);
  if (ch <= 127) 
  {
      stringBuilder.append(ch);
  }
  else 
  {
      stringBuilder.append(String.format("%02x", (int)ch));
  }
}
String s = stringBuilder.toString();
int len = s.length();
byte[] data = new byte[len / 2];
for (int i = 0; i < len; i += 2) {
    data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
                         + Character.digit(s.charAt(i+1), 16));
}
String strUTF_8 =new String(data, StandardCharsets.UTF_8);
return strUTF_8;
}

TEST

String strA_ISO_8859_1_i = new String("??????".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

System.out.println("ISO_8859_1 strA est = "+ strA_ISO_8859_1_i + "\n String_ISO_8859_1To_UTF_8 = " + String_ISO_8859_1To_UTF_8(strA_ISO_8859_1_i));

RESULT

ISO_8859_1 strA est = اÙغÙا٠String_ISO_8859_1To_UTF_8 = ??????


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to java-me

Extract source code from .jar file Difference between volatile and synchronized in Java Difference between Java SE/EE/ME? How to get access to raw resources that I put in res folder? J2ME/Android/BlackBerry - driving directions, route between two locations Convert a JSON string to object in Java ME? How Long Does it Take to Learn Java for a Complete Newbie? How do I convert between ISO-8859-1 and UTF-8 in Java?

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8

Examples related to character-encoding

Changing PowerShell's default output encoding to UTF-8 JsonParseException : Illegal unquoted character ((CTRL-CHAR, code 10) Change the encoding of a file in Visual Studio Code What is the difference between utf8mb4 and utf8 charsets in MySQL? How to open html file? All inclusive Charset to avoid "java.nio.charset.MalformedInputException: Input length = 1"? UTF-8 output from PowerShell ERROR 1115 (42000): Unknown character set: 'utf8mb4' "for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte How to make php display \t \n as tab and new line instead of characters

Examples related to iso-8859-1

What is the difference between UTF-8 and ISO-8859-1? C# Convert string from UTF-8 to ISO-8859-1 (Latin1) H HTML encoding issues - "Â" character showing up instead of "&nbsp;" Converting UTF-8 to ISO-8859-1 in Java - how to keep it as single byte How do I convert between ISO-8859-1 and UTF-8 in Java? Convert utf8-characters to iso-88591 and back in PHP How do I write out a text file in C# with a code page other than UTF-8? Setting the character encoding in form submit for Internet Explorer