How do I convert between ISO-8859-1 and UTF-8 in Java?

Question

Does anyone know how to convert a string from ISO-8859-1 to UTF-8 and back in Java? I'm getting a string from the web and saving it in the RMS (J2ME), but I want to preserve the special chars and get the string from the RMS with the ISO-8859-1 encoding. How do I do this?

User · Answer

The easiest way to convert an ISO-8859-1 string to a UTF-8 string:

private static String convertIsoToUTF8(String example) throws UnsupportedEncodingException {
    return new String(example.getBytes("ISO-8859-1"), "utf-8");
}

To convert a UTF-8 string to an ISO-8859-1 string:

private static String convertUTF8ToISO(String example) throws UnsupportedEncodingException {
    return new String(example.getBytes("utf-8"), "ISO-8859-1");
}
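Both methods re-interpret the bytes behind a String rather than change how the String itself is stored, and the round trip is lossless because ISO-8859-1 maps every byte value 0 to 255 to a character. A minimal sketch of that round trip, assuming Java 7+ (for StandardCharsets, which avoids the checked exception but is not available in J2ME); the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the answer above.
class RoundTripSketch {

    // Same idea as convertUTF8ToISO: UTF-8 bytes read back as ISO-8859-1.
    static String utf8ToIso(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Same idea as convertIsoToUTF8: ISO-8859-1 bytes read back as UTF-8.
    static String isoToUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";       // "caf" + U+00E9 (e with acute accent)
        String asIso = utf8ToIso(original);  // the accented letter becomes two chars
        System.out.println(asIso.length());  // prints 5
        System.out.println(isoToUtf8(asIso).equals(original)); // prints true
    }
}
```

The intermediate string is the familiar "mojibake" form; it is only useful as a repair step when bytes were decoded with the wrong charset in the first place.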

Moreover, here is a method that converts an ISO-8859-1 string to a UTF-8 string without using the String constructor:

public static String convertISO_to_UTF8_personal(String strISO_8859_1) {
    String res = "";
    int i = 0;
    for (i = 0; i < strISO_8859_1.length() - 1; i++) {
        char ch = strISO_8859_1.charAt(i);
        char chNext = strISO_8859_1.charAt(i + 1);
        if (ch <= 127) {
            res += ch;
        } else if (ch == 194 && chNext >= 128 && chNext <= 191) {
            res += chNext;
        } else if(ch == 195 && chNext >= 128 && chNext <= 191){
            int resNum = chNext + 64;
            res += (char) resNum;
        } else if(ch == 194){
            res += (char) 173;
        } else if(ch == 195){
            res += (char) 224;
        }
    }
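    // Handle the final character, which the loop above does not visit.
    // The bounds check avoids an exception on an empty input string.
    if (i < strISO_8859_1.length()) {
        char ch = strISO_8859_1.charAt(i);
        if (ch <= 127) {
            res += ch;
        }
    }
    return res;
}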

That method is based on the UTF-8 to ISO-8859-1 encoding described on this website: Encoding utf-8 to iso-8859-1

User · Answer

If you have a String, you can do this:

String s = "test";
try {
    s.getBytes("UTF-8");
} catch (UnsupportedEncodingException uee) {
    uee.printStackTrace();
}

If you have a "broken" String, you did something wrong: converting a String to a String in another encoding is definitely not the way to go! You can convert a String to a byte[] and vice-versa (given an encoding). In Java, Strings are AFAIK encoded with UTF-16, but that's an implementation detail.

Say you have an InputStream; you can read in a byte[] and then convert that to a String using

byte[] bs = ...;
String s;
try {
    s = new String(bs, encoding);
} catch (UnsupportedEncodingException uee) {
    uee.printStackTrace();
}

or, even better (thanks to erickson), use InputStreamReader like this:

InputStreamReader isr;
try {
    isr = new InputStreamReader(inputStream, encoding);
} catch (UnsupportedEncodingException uee) {
    uee.printStackTrace();
}
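The InputStreamReader approach extends naturally to transcoding a whole stream: decode with one encoding on the way in and encode with another on the way out. A minimal sketch, assuming the source bytes are ISO-8859-1 and the target is UTF-8; the class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper; class and method names are illustrative only.
class StreamTranscodeSketch {

    // Reads characters from `in` using `fromEncoding` and writes them
    // to `out` using `toEncoding`.
    static void transcode(InputStream in, String fromEncoding,
                          OutputStream out, String toEncoding) throws IOException {
        Reader reader = new InputStreamReader(in, fromEncoding);
        Writer writer = new OutputStreamWriter(out, toEncoding);
        char[] buffer = new char[1024];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, n);
        }
        writer.flush(); // push any buffered characters through the encoder
    }

    public static void main(String[] args) throws IOException {
        // "Musica" with u-acute (byte 0xFA) encoded in ISO-8859-1
        byte[] latin1 = { 'M', (byte) 0xFA, 's', 'i', 'c', 'a' };
        ByteArrayOutputStream utf8 = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(latin1), "ISO-8859-1", utf8, "UTF-8");
        System.out.println(utf8.size()); // prints 7: the accented letter takes two bytes
    }
}
```

The String in the middle never needs an "encoding" of its own; the charset names only matter at the two byte boundaries.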

User · Answer

Here is a function to convert UNICODE (ISO_8859_1) to UTF-8:

public static String String_ISO_8859_1To_UTF_8(String strISO_8859_1) {
    final StringBuilder stringBuilder = new StringBuilder();
    for (int i = 0; i < strISO_8859_1.length(); i++) {
        final char ch = strISO_8859_1.charAt(i);
        // Hex-encode every character as two digits so the decoding loop
        // below always sees complete pairs. Appending ASCII characters
        // unencoded would corrupt the hex stream.
        stringBuilder.append(String.format("%02x", (int) ch));
    }
    String s = stringBuilder.toString();
    int len = s.length();
    byte[] data = new byte[len / 2];
    for (int i = 0; i < len; i += 2) {
        data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
                + Character.digit(s.charAt(i + 1), 16));
    }
    String strUTF_8 = new String(data, StandardCharsets.UTF_8);
    return strUTF_8;
}

TEST:

String strA_ISO_8859_1_i = new String("...".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
System.out.println("ISO_8859_1 strA est = " + strA_ISO_8859_1_i
        + "\n String_ISO_8859_1To_UTF_8 = " + String_ISO_8859_1To_UTF_8(strA_ISO_8859_1_i));

RESULT:

ISO_8859_1 strA est = ...
String_ISO_8859_1To_UTF_8 = ...

User · Answer

Apache Commons IO Charsets class can come in handy:

String utf8String = new String(org.apache.commons.io.Charsets.ISO_8859_1.encode(latinString).array());

User · Answer

Which worked for me ("üzüm bağları" is the correct spelling in Turkish):

Convert ISO-8859-1 to UTF-8:

String encodedWithISO88591 = "Ã¼zÃ¼m baÄ\u009FlarÄ±";
String decodedToUTF8 = new String(encodedWithISO88591.getBytes("ISO-8859-1"), "UTF-8");
// Result, decodedToUTF8 --> "üzüm bağları"

Convert UTF-8 to ISO-8859-1:

String encodedWithUTF8 = "üzüm bağları";
String decodedToISO88591 = new String(encodedWithUTF8.getBytes("UTF-8"), "ISO-8859-1");
// Result, decodedToISO88591 --> "Ã¼zÃ¼m baÄ\u009FlarÄ±"

User · Answer

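Here is an easy way with String output (I created a method to do this):

// Placeholder method name; the original snippet omitted one.
public static String convert(String input) {
    String output = "";
    try {
        // Keep only the direction you need; as written, the second assignment wins.
        /* From ISO-8859-1 to UTF-8 */
        output = new String(input.getBytes("ISO-8859-1"), "UTF-8");
        /* From UTF-8 to ISO-8859-1 */
        output = new String(input.getBytes("UTF-8"), "ISO-8859-1");
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return output;
}

Example input: "Música", output: "MÃºsica"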

User · Answer

In general, you can't do this. UTF-8 is capable of encoding any Unicode code point. ISO-8859-1 can handle only a tiny fraction of them. So, transcoding from ISO-8859-1 to UTF-8 is no problem. Going backwards from UTF-8 to ISO-8859-1 will cause replacement characters to appear in your text when unsupported characters are found. (In Java, encoding to ISO-8859-1 with getBytes substitutes "?" for each unencodable character; U+FFFD is the replacement used when decoding malformed bytes.)

To transcode text:

byte[] latin1 = ...
byte[] utf8 = new String(latin1, "ISO-8859-1").getBytes("UTF-8");

or

byte[] utf8 = ...
byte[] latin1 = new String(utf8, "UTF-8").getBytes("ISO-8859-1");

You can exercise more control by using the lower-level Charset APIs. For example, you can raise an exception when an un-encodable character is found, or use a different character for replacement text.
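The lower-level control mentioned above could look roughly like this with java.nio.charset.CharsetEncoder. This is a sketch, assuming Java 7+ (not J2ME); the class and method names are hypothetical, and the choice of '_' as a replacement byte is only for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; class and method names are illustrative only.
class StrictLatin1Sketch {

    // Encodes to ISO-8859-1 and throws if any character cannot be encoded.
    static byte[] encodeStrict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return toArray(encoder.encode(CharBuffer.wrap(text)));
    }

    // Encodes to ISO-8859-1, writing `replacement` for each unencodable character.
    static byte[] encodeWithReplacement(String text, byte replacement)
            throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] { replacement });
        return toArray(encoder.encode(CharBuffer.wrap(text)));
    }

    // Copies only the bytes between position and limit out of the buffer.
    private static byte[] toArray(ByteBuffer buf) {
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "5 \u20ac"; // "5 " followed by the euro sign, which ISO-8859-1 lacks
        byte[] replaced = encodeWithReplacement(text, (byte) '_');
        System.out.println(new String(replaced, StandardCharsets.ISO_8859_1)); // prints 5 _
        try {
            encodeStrict(text);
        } catch (CharacterCodingException e) {
            System.out.println("not encodable in ISO-8859-1");
        }
    }
}
```

Reporting is usually the safer default when data loss matters, since a silent "?" cannot be undone later.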

User · Answer

Regex can also be good and be used effectively (replaces all UTF-8 characters not covered in ISO-8859-1 with a space):

String input = "Tes  ti  ng      al   l o   f i  t   -         with some 9umbers as                  w2921         well     or     is a cha  racte";
String output = input.replaceAll("[^\\u0020-\\u007e\\u00a0-\\u00ff]", " ");
System.out.println("Input = " + input);
System.out.println("Output = " + output);
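For a run with a known input, here is a small self-contained sketch of the same character-class idea: keep U+0020 to U+007E and U+00A0 to U+00FF, replace everything else with a space. The class and method names are hypothetical:

```java
// Hypothetical demo class; the character ranges follow the answer above.
class Latin1FilterSketch {

    // Replaces every character outside printable ASCII (U+0020-U+007E) and
    // the printable ISO-8859-1 range (U+00A0-U+00FF) with a space.
    static String replaceNonLatin1(String input) {
        return input.replaceAll("[^\\u0020-\\u007e\\u00a0-\\u00ff]", " ");
    }

    public static void main(String[] args) {
        // "caf" + e-acute (kept), then "5" + euro sign (replaced)
        String input = "caf\u00e9 5\u20ac";
        String output = replaceNonLatin1(input);
        System.out.println(output.equals("caf\u00e9 5 ")); // prints true
    }
}
```

Note that the ranges also exclude control characters, so tabs and newlines are replaced with spaces as well.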
