[java] Java FileReader encoding issue

I tried to use java.io.FileReader to read some text files and convert them into a string, but I found the result is wrongly encoded and not readable at all.

Here's my environment:

  • Windows 2003, OS encoding: CP1252

  • Java 5.0

My files are UTF-8 encoded or CP1252 encoded, and some of them (UTF-8 encoded files) may contain Chinese (non-Latin) characters.

I use the following code to do my work:

   private static String readFileAsString(String filePath)
    throws java.io.IOException{
        StringBuffer fileData = new StringBuffer(1000);
        FileReader reader = new FileReader(filePath);
        //System.out.println(reader.getEncoding());
        BufferedReader reader = new BufferedReader(reader);
        char[] buf = new char[1024];
        int numRead=0;
        while((numRead=reader.read(buf)) != -1){
            String readData = String.valueOf(buf, 0, numRead);
            fileData.append(readData);
            buf = new char[1024];
        }
        reader.close();
        return fileData.toString();
    }

The above code doesn't work. I found the FileReader's encoding is CP1252 even if the text is UTF-8 encoded. But the JavaDoc of java.io.FileReader says that:

The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate.

Does this mean that I am not required to set character encoding by myself if I am using FileReader? But I did get wrongly encoded data currently, what's the correct way to deal with my situtaion? Thanks.

This question is related to java file unicode encoding

The answer is


FileReader uses Java's platform default encoding, which depends on the system settings of the computer it's running on and is generally the most popular encoding among users in that locale.

If this "best guess" is not correct then you have to specify the encoding explicitly. Unfortunately, FileReader does not allow this (major oversight in the API). Instead, you have to use new InputStreamReader(new FileInputStream(filePath), encoding) and ideally get the encoding from metadata about the file.


FileInputStream with InputStreamReader is better than directly using FileReader, because the latter doesn't allow you to specify encoding charset.

Here is an example using BufferedReader, FileInputStream and InputStreamReader together, so that you could read lines from a file.

List<String> words = new ArrayList<>();
List<String> meanings = new ArrayList<>();
public void readAll( ) throws IOException{
    String fileName = "College_Grade4.txt";
    String charset = "UTF-8";
    BufferedReader reader = new BufferedReader(
        new InputStreamReader(
            new FileInputStream(fileName), charset)); 

    String line; 
    while ((line = reader.readLine()) != null) { 
        line = line.trim();
        if( line.length() == 0 ) continue;
        int idx = line.indexOf("\t");
        words.add( line.substring(0, idx ));
        meanings.add( line.substring(idx+1));
    } 
    reader.close();
}

Since Java 11 you may use that:

public FileReader(String fileName, Charset charset) throws IOException;

For another as Latin languages for example Cyrillic you can use something like this:

FileReader fr = new FileReader("src/text.txt", StandardCharsets.UTF_8);

and be sure that your .txt file is saved with UTF-8 (but not as default ANSI) format. Cheers!


For Java 7+ doc you can use this:

BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);

Here are all Charsets doc

For example if your file is in CP1252, use this method

Charset.forName("windows-1252");

Here is other canonical names for Java encodings both for IO and NIO doc

If you do not know with exactly encoding you have got in a file, you may use some third-party libs like this tool from Google this which works fairly neat.


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to file

Gradle - Move a folder from ABC to XYZ Difference between opening a file in binary vs text Angular: How to download a file from HttpClient? Python error message io.UnsupportedOperation: not readable java.io.FileNotFoundException: class path resource cannot be opened because it does not exist Writing JSON object to a JSON file with fs.writeFileSync How to read/write files in .Net Core? How to write to a CSV line by line? Writing a dictionary to a text file? What are the pros and cons of parquet format compared to other formats?

Examples related to unicode

How to resolve TypeError: can only concatenate str (not "int") to str (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape UnicodeEncodeError: 'ascii' codec can't encode character at special name Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) HTML for the Pause symbol in audio and video control Javascript: Unicode string to hex Concrete Javascript Regex for Accented Characters (Diacritics) Replace non-ASCII characters with a single space UTF-8 in Windows 7 CMD NameError: global name 'unicode' is not defined - in Python 3

Examples related to encoding

How to check encoding of a CSV file UnicodeEncodeError: 'ascii' codec can't encode character at special name Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? The character encoding of the plain text document was not declared - mootool script UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) How to encode text to base64 in python UTF-8 output from PowerShell Set Encoding of File to UTF8 With BOM in Sublime Text 3 Replace non-ASCII characters with a single space