[java] Setting the default Java character encoding

How do I properly set the default character encoding used by the JVM (1.5.x) programmatically?

I have read that -Dfile.encoding=whatever used to be the way to go for older JVMs. I don't have that luxury for reasons I wont get into.

I have tried:

System.setProperty("file.encoding", "UTF-8");

And the property gets set, but it doesn't seem to cause the final getBytes call below to use UTF8:

System.setProperty("file.encoding", "UTF-8");

byte inbytes[] = new byte[1024];

FileInputStream fis = new FileInputStream("response.txt");
fis.read(inbytes);
FileOutputStream fos = new FileOutputStream("response-2.txt");
String in = new String(inbytes, "UTF8");
fos.write(in.getBytes());

This question is related to java utf-8 character-encoding

The answer is


I think a better approach than setting the platform's default character set, especially as you seem to have restrictions on affecting the application deployment, let alone the platform, is to call the much safer String.getBytes("charsetName"). That way your application is not dependent on things beyond its control.

I personally feel that String.getBytes() should be deprecated, as it has caused serious problems in a number of cases I have seen, where the developer did not account for the default charset possibly changing.


Not clear on what you do and don't have control over at this point. If you can interpose a different OutputStream class on the destination file, you could use a subtype of OutputStream which converts Strings to bytes under a charset you define, say UTF-8 by default. If modified UTF-8 is suffcient for your needs, you can use DataOutputStream.writeUTF(String):

byte inbytes[] = new byte[1024];
FileInputStream fis = new FileInputStream("response.txt");
fis.read(inbytes);
String in = new String(inbytes, "UTF8");
DataOutputStream out = new DataOutputStream(new FileOutputStream("response-2.txt"));
out.writeUTF(in); // no getBytes() here

If this approach is not feasible, it may help if you clarify here exactly what you can and can't control in terms of data flow and execution environment (though I know that's sometimes easier said than determined). Good luck.


I think a better approach than setting the platform's default character set, especially as you seem to have restrictions on affecting the application deployment, let alone the platform, is to call the much safer String.getBytes("charsetName"). That way your application is not dependent on things beyond its control.

I personally feel that String.getBytes() should be deprecated, as it has caused serious problems in a number of cases I have seen, where the developer did not account for the default charset possibly changing.


Following @Caspar comment on accepted answer, the preferred way to fix this according to Sun is :

"change the locale of the underlying platform before starting your Java program."

http://bugs.java.com/view_bug.do?bug_id=4163515

For docker see:

http://jaredmarkell.com/docker-and-locales/


I can't answer your original question but I would like to offer you some advice -- don't depend on the JVM's default encoding. It's always best to explicitly specify the desired encoding (i.e. "UTF-8") in your code. That way, you know it will work even across different systems and JVM configurations.


Following @Caspar comment on accepted answer, the preferred way to fix this according to Sun is :

"change the locale of the underlying platform before starting your Java program."

http://bugs.java.com/view_bug.do?bug_id=4163515

For docker see:

http://jaredmarkell.com/docker-and-locales/


I have a hacky way that definitely works!!

System.setProperty("file.encoding","UTF-8");
Field charset = Charset.class.getDeclaredField("defaultCharset");
charset.setAccessible(true);
charset.set(null,null);

This way you are going to trick JVM which would think that charset is not set and make it to set it again to UTF-8, on runtime!


In case you are using Spring Boot and want to pass the argument file.encoding in JVM you have to run it like that:

mvn spring-boot:run -Drun.jvmArguments="-Dfile.encoding=UTF-8"

this was needed for us since we were using JTwig templates and the operating system had ANSI_X3.4-1968 that we found out through System.out.println(System.getProperty("file.encoding"));

Hope this helps someone!


I can't answer your original question but I would like to offer you some advice -- don't depend on the JVM's default encoding. It's always best to explicitly specify the desired encoding (i.e. "UTF-8") in your code. That way, you know it will work even across different systems and JVM configurations.


We were having the same issues. We methodically tried several suggestions from this article (and others) to no avail. We also tried adding the -Dfile.encoding=UTF8 and nothing seemed to be working.

For people that are having this issue, the following article finally helped us track down describes how the locale setting can break unicode/UTF-8 in Java/Tomcat

http://www.jvmhost.com/articles/locale-breaks-unicode-utf-8-java-tomcat

Setting the locale correctly in the ~/.bashrc file worked for us.


From the JVM™ Tool Interface documentation…

Since the command-line cannot always be accessed or modified, for example in embedded VMs or simply VMs launched deep within scripts, a JAVA_TOOL_OPTIONS variable is provided so that agents may be launched in these cases.

By setting the (Windows) environment variable JAVA_TOOL_OPTIONS to -Dfile.encoding=UTF8, the (Java) System property will be set automatically every time a JVM is started. You will know that the parameter has been picked up because the following message will be posted to System.err:

Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8


mvn clean install -Dfile.encoding=UTF-8 -Dmaven.repo.local=/path-to-m2

command worked with exec-maven-plugin to resolve following error while configuring a jenkins task.

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Error occurred during initialization of VM
java.nio.charset.IllegalCharsetNameException: "UTF-8"
    at java.nio.charset.Charset.checkName(Charset.java:315)
    at java.nio.charset.Charset.lookup2(Charset.java:484)
    at java.nio.charset.Charset.lookup(Charset.java:464)
    at java.nio.charset.Charset.defaultCharset(Charset.java:609)
    at sun.nio.cs.StreamEncoder.forOutputStreamWriter(StreamEncoder.java:56)
    at java.io.OutputStreamWriter.<init>(OutputStreamWriter.java:111)
    at java.io.PrintStream.<init>(PrintStream.java:104)
    at java.io.PrintStream.<init>(PrintStream.java:151)
    at java.lang.System.newPrintStream(System.java:1148)
    at java.lang.System.initializeSystemClass(System.java:1192)

I think a better approach than setting the platform's default character set, especially as you seem to have restrictions on affecting the application deployment, let alone the platform, is to call the much safer String.getBytes("charsetName"). That way your application is not dependent on things beyond its control.

I personally feel that String.getBytes() should be deprecated, as it has caused serious problems in a number of cases I have seen, where the developer did not account for the default charset possibly changing.


Recently I bumped into a local company's Notes 6.5 system and found out the webmail would show unidentifiable characters on a non-Zhongwen localed Windows installation. Have dug for several weeks online, figured it out just few minutes ago:

In Java properties, add the following string to Runtime Parameters

-Dfile.encoding=MS950 -Duser.language=zh -Duser.country=TW -Dsun.jnu.encoding=MS950

UTF-8 setting would not work in this case.


I think a better approach than setting the platform's default character set, especially as you seem to have restrictions on affecting the application deployment, let alone the platform, is to call the much safer String.getBytes("charsetName"). That way your application is not dependent on things beyond its control.

I personally feel that String.getBytes() should be deprecated, as it has caused serious problems in a number of cases I have seen, where the developer did not account for the default charset possibly changing.


Not clear on what you do and don't have control over at this point. If you can interpose a different OutputStream class on the destination file, you could use a subtype of OutputStream which converts Strings to bytes under a charset you define, say UTF-8 by default. If modified UTF-8 is suffcient for your needs, you can use DataOutputStream.writeUTF(String):

byte inbytes[] = new byte[1024];
FileInputStream fis = new FileInputStream("response.txt");
fis.read(inbytes);
String in = new String(inbytes, "UTF8");
DataOutputStream out = new DataOutputStream(new FileOutputStream("response-2.txt"));
out.writeUTF(in); // no getBytes() here

If this approach is not feasible, it may help if you clarify here exactly what you can and can't control in terms of data flow and execution environment (though I know that's sometimes easier said than determined). Good luck.


Recently I bumped into a local company's Notes 6.5 system and found out the webmail would show unidentifiable characters on a non-Zhongwen localed Windows installation. Have dug for several weeks online, figured it out just few minutes ago:

In Java properties, add the following string to Runtime Parameters

-Dfile.encoding=MS950 -Duser.language=zh -Duser.country=TW -Dsun.jnu.encoding=MS950

UTF-8 setting would not work in this case.


My team encountered the same issue in machines with Windows.. then managed to resolve it in two ways:

a) Set enviroment variable (even in Windows system preferences)

JAVA_TOOL_OPTIONS
-Dfile.encoding=UTF8

b) Introduce following snippet to your pom.xml:

 -Dfile.encoding=UTF-8 

WITHIN

 <jvmArguments>
 -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8001
 -Dfile.encoding=UTF-8
 </jvmArguments>

Not clear on what you do and don't have control over at this point. If you can interpose a different OutputStream class on the destination file, you could use a subtype of OutputStream which converts Strings to bytes under a charset you define, say UTF-8 by default. If modified UTF-8 is suffcient for your needs, you can use DataOutputStream.writeUTF(String):

byte inbytes[] = new byte[1024];
FileInputStream fis = new FileInputStream("response.txt");
fis.read(inbytes);
String in = new String(inbytes, "UTF8");
DataOutputStream out = new DataOutputStream(new FileOutputStream("response-2.txt"));
out.writeUTF(in); // no getBytes() here

If this approach is not feasible, it may help if you clarify here exactly what you can and can't control in terms of data flow and execution environment (though I know that's sometimes easier said than determined). Good luck.


Solve this problem in my project. Hope it helps someone.

I use LIBGDX java framework and also had this issue in my android studio project. In Mac OS encoding is correct, but in Windows 10 special characters and symbols and also russian characters show as questions like: ????? and other incorrect symbols.

  1. Change in android studio project settings: File->Settings...->Editor-> File Encodings to UTF-8 in all three fields (Global Encoding, Project Encoding and Default below).

  2. In any java file set:

    System.setProperty("file.encoding","UTF-8");

  3. And for test print debug log:

    System.out.println("My project encoding is : "+ Charset.defaultCharset());


I can't answer your original question but I would like to offer you some advice -- don't depend on the JVM's default encoding. It's always best to explicitly specify the desired encoding (i.e. "UTF-8") in your code. That way, you know it will work even across different systems and JVM configurations.


I'm using Amazon (AWS) Elastic Beanstalk and successfully changed it to UTF-8.

In Elastic Beanstalk, go to Configuration > Software, "Environment properties". Add (name) JAVA_TOOL_OPTIONS with (value) -Dfile.encoding=UTF8

After saving, the environment will restart with the UTF-8 encoding.


We were having the same issues. We methodically tried several suggestions from this article (and others) to no avail. We also tried adding the -Dfile.encoding=UTF8 and nothing seemed to be working.

For people that are having this issue, the following article finally helped us track down describes how the locale setting can break unicode/UTF-8 in Java/Tomcat

http://www.jvmhost.com/articles/locale-breaks-unicode-utf-8-java-tomcat

Setting the locale correctly in the ~/.bashrc file worked for us.


In case you are using Spring Boot and want to pass the argument file.encoding in JVM you have to run it like that:

mvn spring-boot:run -Drun.jvmArguments="-Dfile.encoding=UTF-8"

this was needed for us since we were using JTwig templates and the operating system had ANSI_X3.4-1968 that we found out through System.out.println(System.getProperty("file.encoding"));

Hope this helps someone!


I have tried a lot of things, but the sample code here works perfect. Link

The crux of the code is:

String s = "?? ??? ??? ?? ?????";
String out = new String(s.getBytes("UTF-8"), "ISO-8859-1");

We set there two system properties together and it makes the system take everything into utf8

file.encoding=UTF8
client.encoding.override=UTF-8

I have tried a lot of things, but the sample code here works perfect. Link

The crux of the code is:

String s = "?? ??? ??? ?? ?????";
String out = new String(s.getBytes("UTF-8"), "ISO-8859-1");

I can't answer your original question but I would like to offer you some advice -- don't depend on the JVM's default encoding. It's always best to explicitly specify the desired encoding (i.e. "UTF-8") in your code. That way, you know it will work even across different systems and JVM configurations.


Not clear on what you do and don't have control over at this point. If you can interpose a different OutputStream class on the destination file, you could use a subtype of OutputStream which converts Strings to bytes under a charset you define, say UTF-8 by default. If modified UTF-8 is suffcient for your needs, you can use DataOutputStream.writeUTF(String):

byte inbytes[] = new byte[1024];
FileInputStream fis = new FileInputStream("response.txt");
fis.read(inbytes);
String in = new String(inbytes, "UTF8");
DataOutputStream out = new DataOutputStream(new FileOutputStream("response-2.txt"));
out.writeUTF(in); // no getBytes() here

If this approach is not feasible, it may help if you clarify here exactly what you can and can't control in terms of data flow and execution environment (though I know that's sometimes easier said than determined). Good luck.


Try this :

    new OutputStreamWriter( new FileOutputStream("Your_file_fullpath" ),Charset.forName("UTF8"))

From the JVM™ Tool Interface documentation…

Since the command-line cannot always be accessed or modified, for example in embedded VMs or simply VMs launched deep within scripts, a JAVA_TOOL_OPTIONS variable is provided so that agents may be launched in these cases.

By setting the (Windows) environment variable JAVA_TOOL_OPTIONS to -Dfile.encoding=UTF8, the (Java) System property will be set automatically every time a JVM is started. You will know that the parameter has been picked up because the following message will be posted to System.err:

Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8


mvn clean install -Dfile.encoding=UTF-8 -Dmaven.repo.local=/path-to-m2

command worked with exec-maven-plugin to resolve following error while configuring a jenkins task.

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Error occurred during initialization of VM
java.nio.charset.IllegalCharsetNameException: "UTF-8"
    at java.nio.charset.Charset.checkName(Charset.java:315)
    at java.nio.charset.Charset.lookup2(Charset.java:484)
    at java.nio.charset.Charset.lookup(Charset.java:464)
    at java.nio.charset.Charset.defaultCharset(Charset.java:609)
    at sun.nio.cs.StreamEncoder.forOutputStreamWriter(StreamEncoder.java:56)
    at java.io.OutputStreamWriter.<init>(OutputStreamWriter.java:111)
    at java.io.PrintStream.<init>(PrintStream.java:104)
    at java.io.PrintStream.<init>(PrintStream.java:151)
    at java.lang.System.newPrintStream(System.java:1148)
    at java.lang.System.initializeSystemClass(System.java:1192)

We set there two system properties together and it makes the system take everything into utf8

file.encoding=UTF8
client.encoding.override=UTF-8

I'm using Amazon (AWS) Elastic Beanstalk and successfully changed it to UTF-8.

In Elastic Beanstalk, go to Configuration > Software, "Environment properties". Add (name) JAVA_TOOL_OPTIONS with (value) -Dfile.encoding=UTF8

After saving, the environment will restart with the UTF-8 encoding.


Try this :

    new OutputStreamWriter( new FileOutputStream("Your_file_fullpath" ),Charset.forName("UTF8"))

My team encountered the same issue in machines with Windows.. then managed to resolve it in two ways:

a) Set enviroment variable (even in Windows system preferences)

JAVA_TOOL_OPTIONS
-Dfile.encoding=UTF8

b) Introduce following snippet to your pom.xml:

 -Dfile.encoding=UTF-8 

WITHIN

 <jvmArguments>
 -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8001
 -Dfile.encoding=UTF-8
 </jvmArguments>

Solve this problem in my project. Hope it helps someone.

I use LIBGDX java framework and also had this issue in my android studio project. In Mac OS encoding is correct, but in Windows 10 special characters and symbols and also russian characters show as questions like: ????? and other incorrect symbols.

  1. Change in android studio project settings: File->Settings...->Editor-> File Encodings to UTF-8 in all three fields (Global Encoding, Project Encoding and Default below).

  2. In any java file set:

    System.setProperty("file.encoding","UTF-8");

  3. And for test print debug log:

    System.out.println("My project encoding is : "+ Charset.defaultCharset());


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8

Examples related to character-encoding

Changing PowerShell's default output encoding to UTF-8 JsonParseException : Illegal unquoted character ((CTRL-CHAR, code 10) Change the encoding of a file in Visual Studio Code What is the difference between utf8mb4 and utf8 charsets in MySQL? How to open html file? All inclusive Charset to avoid "java.nio.charset.MalformedInputException: Input length = 1"? UTF-8 output from PowerShell ERROR 1115 (42000): Unknown character set: 'utf8mb4' "for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte How to make php display \t \n as tab and new line instead of characters