[java] Write a file in UTF-8 using FileWriter (Java)?

I have the following code however, I want it to write as a UTF-8 file to handle foreign characters. Is there a way of doing this, is there some need to have a parameter?

I would really appreciate your help with this. Thanks.

try {
  BufferedReader reader = new BufferedReader(new FileReader("C:/Users/Jess/My Documents/actresses.list"));
  writer = new BufferedWriter(new FileWriter("C:/Users/Jess/My Documents/actressesFormatted.csv"));
  while( (line = reader.readLine()) != null) {
    //If the line starts with a tab then we just want to add a movie
    //using the current actor's name.
    if(line.length() == 0)
      continue;
    else if(line.charAt(0) == '\t') {
      readMovieLine2(0, line, surname.toString(), forename.toString());
    } //Else we've reached a new actor
    else {
      readActorName(line);
    }
  }
} catch (IOException e) {
  e.printStackTrace();
}

This question is related to java file-io unicode utf-8 file-format

The answer is


OK it's 2019 now, and from Java 11 you have a constructor with Charset:

FileWriter?(String fileName, Charset charset)

Unfortunately, we still cannot modify the byte buffer size, and it's set to 8192. (https://www.baeldung.com/java-filewriter)


In my opinion

If you wanna write follow kind UTF-8.You should create a byte array.Then,you can do such as the following: byte[] by=("<?xml version=\"1.0\" encoding=\"utf-8\"?>"+"Your string".getBytes();

Then, you can write each byte into file you created. Example:

OutputStream f=new FileOutputStream(xmlfile);
    byte[] by=("<?xml version=\"1.0\" encoding=\"utf-8\"?>"+"Your string".getBytes();
    for (int i=0;i<by.length;i++){
    byte b=by[i];
    f.write(b);

    }
    f.close();

Safe Encoding Constructors

Getting Java to properly notify you of encoding errors is tricky. You must use the most verbose and, alas, the least used of the four alternate contructors for each of InputStreamReader and OutputStreamWriter to receive a proper exception on an encoding glitch.

For file I/O, always make sure to always use as the second argument to both OutputStreamWriter and InputStreamReader the fancy encoder argument:

  Charset.forName("UTF-8").newEncoder()

There are other even fancier possibilities, but none of the three simpler possibilities work for exception handing. These do:

 OutputStreamWriter char_output = new OutputStreamWriter(
     new FileOutputStream("some_output.utf8"),
     Charset.forName("UTF-8").newEncoder() 
 );

 InputStreamReader char_input = new InputStreamReader(
     new FileInputStream("some_input.utf8"),
     Charset.forName("UTF-8").newDecoder() 
 );

As for running with

 $ java -Dfile.encoding=utf8 SomeTrulyRemarkablyLongcLassNameGoeShere

The problem is that that will not use the full encoder argument form for the character streams, and so you will again miss encoding problems.

Longer Example

Here’s a longer example, this one managing a process instead of a file, where we promote two different input bytes streams and one output byte stream all to UTF-8 character streams with full exception handling:

 // this runs a perl script with UTF-8 STD{IN,OUT,ERR} streams
 Process
 slave_process = Runtime.getRuntime().exec("perl -CS script args");

 // fetch his stdin byte stream...
 OutputStream
 __bytes_into_his_stdin  = slave_process.getOutputStream();

 // and make a character stream with exceptions on encoding errors
 OutputStreamWriter
   chars_into_his_stdin  = new OutputStreamWriter(
                             __bytes_into_his_stdin,
         /* DO NOT OMIT! */  Charset.forName("UTF-8").newEncoder()
                         );

 // fetch his stdout byte stream...
 InputStream
 __bytes_from_his_stdout = slave_process.getInputStream();

 // and make a character stream with exceptions on encoding errors
 InputStreamReader
   chars_from_his_stdout = new InputStreamReader(
                             __bytes_from_his_stdout,
         /* DO NOT OMIT! */  Charset.forName("UTF-8").newDecoder()
                         );

// fetch his stderr byte stream...
 InputStream
 __bytes_from_his_stderr = slave_process.getErrorStream();

 // and make a character stream with exceptions on encoding errors
 InputStreamReader
   chars_from_his_stderr = new InputStreamReader(
                             __bytes_from_his_stderr,
         /* DO NOT OMIT! */  Charset.forName("UTF-8").newDecoder()
                         );

Now you have three character streams that all raise exception on encoding errors, respectively called chars_into_his_stdin, chars_from_his_stdout, and chars_from_his_stderr.

This is only slightly more complicated that what you need for your problem, whose solution I gave in the first half of this answer. The key point is this is the only way to detect encoding errors.

Just don’t get me started about PrintStreams eating exceptions.


Ditch FileWriter and FileReader, which are useless exactly because they do not allow you to specify the encoding. Instead, use

new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)

and

new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8);


Since Java 7 there is an easy way to handle character encoding of BufferedWriter and BufferedReaders. You can create a BufferedWriter directly by using the Files class instead of creating various instances of Writer. You can simply create a BufferedWriter, which considers character encoding, by calling:

Files.newBufferedWriter(file.toPath(), StandardCharsets.UTF_8);

You can find more about it in JavaDoc:


With Chinese text, I tried to use the Charset UTF-16 and lucklily it work.

Hope this could help!

PrintWriter out = new PrintWriter( file, "UTF-16" );

You need to use the OutputStreamWriter class as the writer parameter for your BufferedWriter. It does accept an encoding. Review javadocs for it.

Somewhat like this:

BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
    new FileOutputStream("jedis.txt"), "UTF-8"
));

Or you can set the current system encoding with the system property file.encoding to UTF-8.

java -Dfile.encoding=UTF-8 com.jediacademy.Runner arg1 arg2 ...

You may also set it as a system property at runtime with System.setProperty(...) if you only need it for this specific file, but in a case like this I think I would prefer the OutputStreamWriter.

By setting the system property you can use FileWriter and expect that it will use UTF-8 as the default encoding for your files. In this case for all the files that you read and write.

EDIT

  • Starting from API 19, you can replace the String "UTF-8" with StandardCharsets.UTF_8

  • As suggested in the comments below by tchrist, if you intend to detect encoding errors in your file you would be forced to use the OutputStreamWriter approach and use the constructor that receives a charset encoder.

    Somewhat like

    CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
    encoder.onMalformedInput(CodingErrorAction.REPORT);
    encoder.onUnmappableCharacter(CodingErrorAction.REPORT);
    BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("jedis.txt"),encoder));
    

    You may choose between actions IGNORE | REPLACE | REPORT

Also, this question was already answered here.


Since Java 11 you can do:

FileWriter fw = new FileWriter("filename.txt", Charset.forName("utf-8"));

use OutputStream instead of FileWriter to set encoding type

// file is your File object where you want to write you data 
OutputStream outputStream = new FileOutputStream(file);
OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream, "UTF-8");
outputStreamWriter.write(json); // json is your data 
outputStreamWriter.flush();
outputStreamWriter.close();

Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to file-io

Python, Pandas : write content of DataFrame into text File Saving response from Requests to file How to while loop until the end of a file in Python without checking for empty line? Getting "java.nio.file.AccessDeniedException" when trying to write to a folder How do I add a resources folder to my Java project in Eclipse Read and write a String from text file Python Pandas: How to read only first n rows of CSV files in? Open files in 'rt' and 'wt' modes How to write to a file without overwriting current contents? Write objects into file with Node.js

Examples related to unicode

How to resolve TypeError: can only concatenate str (not "int") to str (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape UnicodeEncodeError: 'ascii' codec can't encode character at special name Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) HTML for the Pause symbol in audio and video control Javascript: Unicode string to hex Concrete Javascript Regex for Accented Characters (Diacritics) Replace non-ASCII characters with a single space UTF-8 in Windows 7 CMD NameError: global name 'unicode' is not defined - in Python 3

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8

Examples related to file-format

Write a file in UTF-8 using FileWriter (Java)? Text file with 0D 0D 0A line breaks Can a CSV file have a comment? What is the difference between "JPG" / "JPEG" / "PNG" / "BMP" / "GIF" / "TIFF" Image?