Does anyone know how to convert a string from ISO-8859-1 to UTF-8 and back in Java?
I'm getting a string from the web and saving it in the RMS (J2ME). I want to preserve the special characters, and when I read the string back from the RMS I want it in the ISO-8859-1 encoding. How do I do this?
Tags: java, java-me, utf-8, character-encoding, iso-8859-1
The easiest way to repair an "ISO-8859-1 string" that really holds UTF-8 data (UTF-8 bytes that were decoded as ISO-8859-1, so "Música" shows up as "MÃºsica"):
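import java.io.UnsupportedEncodingException;
// Re-encodes the chars as ISO-8859-1 bytes, then decodes those bytes as UTF-8.
private static String convertIsoToUTF8(String example) throws UnsupportedEncodingException {
return new String(example.getBytes("ISO-8859-1"), "UTF-8");
}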
To go the other way, from UTF-8 to ISO-8859-1 (this decodes the UTF-8 bytes as ISO-8859-1, so "Música" becomes "MÃºsica"):
private static String convertUTF8ToISO(String example) throws UnsupportedEncodingException {
return new String(example.getBytes("utf-8"), "ISO-8859-1");
}
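Despite the names, neither method changes how a Java String is stored; each one reinterprets bytes. A minimal sketch of both directions, assuming Java 7+ for java.nio.charset.StandardCharsets (not available on J2ME/CLDC, where the String-name overloads above remain the option). The class name MojibakeDemo is hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Same idea as convertIsoToUTF8 above: re-encode as ISO-8859-1 to
    // recover the original bytes, then decode those bytes as UTF-8.
    static String isoToUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Same idea as convertUTF8ToISO above: produces the garbled form.
    static String utf8ToIso(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String musica = "M\u00fasica";       // "Música"
        String garbled = utf8ToIso(musica);  // "MÃºsica"
        System.out.println(garbled.equals("M\u00c3\u00basica")); // true
        System.out.println(isoToUtf8(garbled).equals(musica));    // true
        // Repairing text that was never garbled damages it: the lone
        // byte 0xFA is not valid UTF-8 and is replaced by U+FFFD.
        System.out.println(isoToUtf8(musica).indexOf('\uFFFD') >= 0); // true
    }
}
```

The last line is the usual pitfall: apply the repair only to strings you know were decoded with the wrong charset.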
In addition, here is a method that converts an ISO-8859-1 string to a UTF-8 string by hand, without using the String(byte[], String) constructor:
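public static String convertISO_to_UTF8_personal(String strISO_8859_1) {
// Decodes only two-byte UTF-8 sequences led by 0xC2/0xC3 (U+0080..U+00FF);
// any other non-ASCII char is dropped.
StringBuffer res = new StringBuffer(); // StringBuffer also exists on J2ME (CLDC)
int len = strISO_8859_1.length();
for (int i = 0; i < len; i++) {
char ch = strISO_8859_1.charAt(i);
char chNext = (i + 1 < len) ? strISO_8859_1.charAt(i + 1) : '\0';
if (ch <= 127) {
res.append(ch); // plain ASCII
} else if (ch == 194 && chNext >= 128 && chNext <= 191) {
res.append(chNext); // 0xC2 xx -> U+00xx
i++; // skip the continuation byte just used
} else if (ch == 195 && chNext >= 128 && chNext <= 191) {
res.append((char) (chNext + 64)); // 0xC3 xx -> U+00xx + 0x40
i++; // skip the continuation byte just used
} else if (ch == 194) {
res.append((char) 173); // fallback kept from the original: soft hyphen
} else if (ch == 195) {
res.append((char) 224); // fallback kept from the original: 'à'
}
}
return res.toString();
}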
That method is based on the UTF-8 to ISO-8859-1 encoding table from the website "Encoding utf-8 to iso-8859-1".
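The numbers 194 (0xC2), 195 (0xC3) and the +64 offset come from UTF-8's two-byte form 110xxxxx 10yyyyyy, which carries the code point xxxxxyyyyyy. A minimal sketch of that general rule, with hypothetical class and method names:

```java
public class Utf8TwoByte {
    // Decodes one two-byte UTF-8 sequence (lead byte 0xC2..0xDF,
    // continuation byte 0x80..0xBF) into its code point.
    static int decodeTwoByte(int lead, int cont) {
        if (lead < 0xC2 || lead > 0xDF || cont < 0x80 || cont > 0xBF) {
            throw new IllegalArgumentException("not a two-byte UTF-8 sequence");
        }
        // Low 5 bits of the lead byte, then low 6 bits of the continuation byte.
        return ((lead & 0x1F) << 6) | (cont & 0x3F);
    }

    public static void main(String[] args) {
        // 0xC2 contributes 0x80, so the result equals the continuation
        // byte (the "res += chNext" branch).
        System.out.println(Integer.toHexString(decodeTwoByte(0xC2, 0xA9))); // a9
        // 0xC3 contributes 0xC0, i.e. continuation byte + 64
        // (the "chNext + 64" branch).
        System.out.println(Integer.toHexString(decodeTwoByte(0xC3, 0xBA))); // fa
    }
}
```

Lead bytes 0xC4 to 0xDF produce code points above U+00FF, which have no ISO-8859-1 equivalent; that is why the hand-written method handles only 0xC2 and 0xC3.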
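A regular expression can also be used effectively. This one replaces every character outside printable ASCII (U+0020 to U+007E) and printable Latin-1 (U+00A0 to U+00FF) with a space, so besides every character that ISO-8859-1 cannot represent, control characters such as tabs and newlines are replaced too: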
String input = "€Tes¶ti©ng [§] al€l o€f i¶t _ - À ÆÑ with some 9umbers as"
+ " w2921**#$%!@# well Ü, or ü, is a chaŒracte?";
String output = input.replaceAll("[^\\u0020-\\u007e\\u00a0-\\u00ff]", " ");
System.out.println("Input = " + input);
System.out.println("Output = " + output);
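If the goal is exactly "characters ISO-8859-1 cannot encode", without touching control characters, a different technique is to ask the JDK's own ISO-8859-1 encoder through CharsetEncoder.canEncode. A sketch assuming Java 7+; the class and method names are hypothetical:

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Latin1Filter {
    // Replaces every char that ISO-8859-1 cannot represent with a space.
    // Control characters such as '\n' and '\t' are kept, because they
    // are part of ISO-8859-1.
    static String replaceUnmappable(String input) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            out.append(encoder.canEncode(c) ? c : ' ');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "\u20acTes\u00b6t\n\u0152"; // "€Tes¶t", newline, "Œ"
        System.out.println(replaceUnmappable(in).equals(" Tes\u00b6t\n ")); // true
    }
}
```

A character outside the Basic Multilingual Plane (an emoji, for example) is a surrogate pair of two chars, so it becomes two spaces here.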
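If you have a String, you can get its bytes in a given encoding like this:
String s = "test";
byte[] utf8 = null;
try {
utf8 = s.getBytes("UTF-8");
} catch (UnsupportedEncodingException uee) {
uee.printStackTrace(); // should not happen: every Java SE platform must support UTF-8
}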
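To see what the encoding actually changes, compare the bytes of the same String in both encodings. A small sketch, assuming Java 7+ (the class and hex helper are hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class ByteLengths {
    // Formats bytes as space-separated lowercase hex, e.g. "4d c3 ba".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "M\u00fasica"; // "Música"
        // 'ú' (U+00FA) is two bytes in UTF-8 but one byte in ISO-8859-1.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // 4d c3 ba 73 69 63 61
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // 4d fa 73 69 63 61
    }
}
```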
If you have a 'broken' String, something already went wrong earlier: converting a String into a String "in another encoding" is definitely not the way to go! Encodings apply to bytes, so you convert a String to a byte[] and vice versa, always naming the encoding. A Java String is a sequence of UTF-16 code units; how the JVM stores it internally is an implementation detail.
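A sketch of the String to byte[] to String round trip, assuming Java 7+ (class and method names are hypothetical). It also shows the limit that matters for the RMS question: ISO-8859-1 covers only U+0000 to U+00FF, and String.getBytes replaces anything else with '?'.

```java
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // String -> byte[] -> String, using the same charset on both sides.
    static String roundTripLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Every char is in U+0000..U+00FF, so nothing is lost.
        System.out.println(roundTripLatin1("M\u00fasica").equals("M\u00fasica")); // true
        // The euro sign (U+20AC) has no ISO-8859-1 byte and comes back as '?'.
        System.out.println(roundTripLatin1("5 \u20ac").equals("5 ?"));            // true
    }
}
```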
Say you have an InputStream: you can read it into a byte[] and then convert that to a String using
byte[] bs = ...;
String s;
try {
s = new String(bs, encoding);
} catch(UnsupportedEncodingException uee) {
uee.printStackTrace();
}
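The snippet leaves out how the byte[] gets filled. One common way is to copy the stream into a ByteArrayOutputStream; a sketch using only basic java.io classes, with a hypothetical readAll helper:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadBytes {
    // Reads the stream to the end and returns everything as one byte[].
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {0x4d, (byte) 0xfa});
        // Decode with the charset the bytes were actually written in.
        String s = new String(readAll(in), "ISO-8859-1");
        System.out.println(s.equals("M\u00fa")); // true
    }
}
```

On Java 9+ InputStream.readAllBytes() does the same job. Either way, the charset passed to new String must be the one the bytes were written in; the code cannot detect it.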
or, even better (thanks to erickson), let an InputStreamReader do the decoding:
InputStreamReader isr;
try {
isr = new InputStreamReader(inputStream, encoding);
} catch(UnsupportedEncodingException uee) {
uee.printStackTrace();
}
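With a Reader, decoding happens while reading, so a multi-byte UTF-8 sequence split across two reads is still decoded correctly. A sketch that reads a whole stream into a String (the class and helper names are hypothetical; StringBuffer is used because older platforms such as CLDC lack StringBuilder):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class ReadText {
    // Reads the whole stream as text in the given encoding
    // ("UTF-8", "ISO-8859-1", ...) and returns it as one String.
    static String readAll(InputStream in, String encoding) throws IOException {
        Reader reader = new InputStreamReader(in, encoding);
        try {
            StringBuffer sb = new StringBuffer();
            char[] buffer = new char[512];
            int n;
            while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
                sb.append(buffer, 0, n);
            }
            return sb.toString();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {0x4d, (byte) 0xc3, (byte) 0xba}; // UTF-8 bytes of "Mú"
        System.out.println(readAll(new ByteArrayInputStream(bytes), "UTF-8").equals("M\u00fa"));            // true
        System.out.println(readAll(new ByteArrayInputStream(bytes), "ISO-8859-1").equals("M\u00c3\u00ba")); // true
    }
}
```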
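Apache Commons IO's Charsets class can come in handy (since Java 7, java.nio.charset.StandardCharsets offers the same constants). Pass the target charset explicitly, because new String(byte[]) without one decodes with the platform default charset:
String utf8String = new String(org.apache.commons.io.Charsets.ISO_8859_1.encode(latinString).array(), org.apache.commons.io.Charsets.UTF_8);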
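Charset.encode returns a ByteBuffer, and ByteBuffer.array() hands back the entire backing array, whose length is not guaranteed to match the encoded data (and it throws if the buffer has no accessible array). A sketch that copies exactly the remaining bytes, assuming Java 7+ StandardCharsets in place of the Commons IO constants (class and method names are hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class EncodeToBytes {
    // Copies exactly the encoded bytes (position..limit) out of the buffer.
    static byte[] latin1Bytes(String s) {
        ByteBuffer buf = StandardCharsets.ISO_8859_1.encode(s);
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        String latinString = "M\u00c3\u00basica"; // "MÃºsica": UTF-8 bytes misread as Latin-1
        String repaired = new String(latin1Bytes(latinString), StandardCharsets.UTF_8);
        System.out.println(repaired.equals("M\u00fasica")); // true
    }
}
```

For this use case, latinString.getBytes(StandardCharsets.ISO_8859_1) yields the same bytes more simply.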
Here is an easy way that returns a String (I wrapped it in a method):
public static String (String input){
String output = "";
try {
// Keep only ONE of the two conversions below: if both run, the second
// overwrites the first, so the method always does UTF-8 -> ISO-8859-1.
/* From ISO-8859-1 to UTF-8 */
output = new String(input.getBytes("ISO-8859-1"), "UTF-8");
/* From UTF-8 to ISO-8859-1 */
output = new String(input.getBytes("UTF-8"), "ISO-8859-1");
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
return output;
}
// Example
input = "Música";
output = "Música";
What worked for me ("üzüm baglari" is the correct spelling in Turkish):
Convert ISO-8859-1 to UTF-8:
String encodedWithISO88591 = "üzüm baÄları";
String decodedToUTF8 = new String(encodedWithISO88591.getBytes("ISO-8859-1"), "UTF-8");
//Result, decodedToUTF8 --> "üzüm baglari"
Convert UTF-8 to ISO-8859-1
String encodedWithUTF8 = "üzüm baglari";
String decodedToISO88591 = new String(encodedWithUTF8.getBytes("UTF-8"), "ISO-8859-1");
//Result, decodedToISO88591 --> "üzüm baÄları"
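This works even though ğ and ı are not in ISO-8859-1: the intermediate string holds one char per UTF-8 byte, and ISO-8859-1 maps all 256 byte values, so nothing is lost as long as that string is kept exactly as is. Some of those chars are invisible (ğ is 0xC4 0x9F, and 0x9F decodes to the control character U+009F), so passing the garbled text through a UI can silently break the repair. A sketch of the underlying property, assuming Java 7+ (class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1Lossless {
    // Decoding any byte[] as ISO-8859-1 and encoding it back yields the
    // same bytes, because each of the 256 byte values maps to exactly one
    // char in U+0000..U+00FF.
    static boolean roundTripsExactly(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.ISO_8859_1);
        return Arrays.equals(bytes, s.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        System.out.println(roundTripsExactly(all)); // true
        // The UTF-8 bytes of "üzüm bağları" are just one such byte[].
        byte[] turkish = "\u00fcz\u00fcm ba\u011flar\u0131".getBytes(StandardCharsets.UTF_8);
        System.out.println(roundTripsExactly(turkish)); // true
    }
}
```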
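Here is a function that repairs a string whose UTF-8 bytes were decoded as ISO-8859-1. It writes each char out as a two-digit hex byte, parses the hex back into a byte[], and decodes that as UTF-8:
import java.nio.charset.StandardCharsets;
public static String String_ISO_8859_1To_UTF_8(String strISO_8859_1) {
final StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < strISO_8859_1.length(); i++) {
final char ch = strISO_8859_1.charAt(i);
// Every char, ASCII included, must become exactly two hex digits;
// appending ASCII chars unchanged would misalign the pairs parsed below.
// Assumes ch <= 0xFF, which holds for any ISO-8859-1-decoded string.
stringBuilder.append(String.format("%02x", (int) ch));
}
String s = stringBuilder.toString();
int len = s.length();
byte[] data = new byte[len / 2];
for (int i = 0; i < len; i += 2) {
// Combine two hex digits into one byte.
data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
+ Character.digit(s.charAt(i + 1), 16));
}
return new String(data, StandardCharsets.UTF_8);
}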
TEST
String strA_ISO_8859_1_i = new String("??????".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
System.out.println("ISO_8859_1 strA est = "+ strA_ISO_8859_1_i + "\n String_ISO_8859_1To_UTF_8 = " + String_ISO_8859_1To_UTF_8(strA_ISO_8859_1_i));
RESULT
ISO_8859_1 strA est = اÙغÙا٠
String_ISO_8859_1To_UTF_8 = ??????
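The hex detour in String_ISO_8859_1To_UTF_8 is not strictly needed: each char of an ISO-8859-1-decoded string already is one original byte, so it can be narrowed straight into the byte[]. A sketch of that simplification (hypothetical names; Java 7+ for StandardCharsets, while on J2ME new String(data, "UTF-8") would take its place):

```java
import java.nio.charset.StandardCharsets;

public class DirectBytes {
    // Narrows each char (U+0000..U+00FF) to the byte it was decoded from,
    // then decodes the recovered bytes as UTF-8.
    static String isoToUtf8(String strISO_8859_1) {
        byte[] data = new byte[strISO_8859_1.length()];
        for (int i = 0; i < data.length; i++) {
            char ch = strISO_8859_1.charAt(i);
            if (ch > 0xFF) {
                throw new IllegalArgumentException("not an ISO-8859-1 char at index " + i);
            }
            data[i] = (byte) ch;
        }
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String arabic = "\u0645\u0631\u062d\u0628\u0627"; // Arabic "marhaba"
        String garbled = new String(arabic.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(isoToUtf8(garbled).equals(arabic)); // true
        // Mixed ASCII and non-ASCII text works as well.
        System.out.println(isoToUtf8("ab\u00c3\u00bc").equals("ab\u00fc")); // true
    }
}
```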
Source: Stackoverflow.com