Java FileReader encoding issue

Question

I tried to use java.io.FileReader to read some text files and convert them into a string, but I found the result is wrongly encoded and not readable at all.

Here's my environment: Windows 2003 (OS encoding: CP1252), Java 5.0.

My files are UTF-8 encoded or CP1252 encoded, and some of them (the UTF-8 encoded files) may contain Chinese (non-Latin) characters.

I use the following code to do my work:

    private static String readFileAsString(String filePath)
            throws java.io.IOException {
        StringBuffer fileData = new StringBuffer(1000);
        // FileReader(String) decodes with the platform default encoding
        FileReader fileReader = new FileReader(filePath);
        // System.out.println(fileReader.getEncoding());
        BufferedReader reader = new BufferedReader(fileReader);
        char[] buf = new char[1024];
        int numRead = 0;
        while ((numRead = reader.read(buf)) != -1) {
            String readData = String.valueOf(buf, 0, numRead);
            fileData.append(readData);
            buf = new char[1024];
        }
        reader.close();
        return fileData.toString();
    }

The above code doesn't work. I found that the FileReader's encoding is CP1252 even if the text is UTF-8 encoded. But the JavaDoc of java.io.FileReader says:

"The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate."

Does this mean that I am not required to set the character encoding myself if I am using FileReader? I am currently getting wrongly encoded data, so what's the correct way to deal with my situation? Thanks!

User · Answer

FileReader uses Java's platform default encoding, which depends on the system settings of the computer it's running on and is generally the most popular encoding among users in that locale.

If this "best guess" is not correct, then you have to specify the encoding explicitly. Unfortunately, FileReader did not allow this until Java 11 (a major oversight in the API). On earlier versions you have to use new InputStreamReader(new FileInputStream(filePath), encoding) instead, and ideally get the encoding from metadata about the file.
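As a rough illustration of the InputStreamReader approach described above, here is a minimal sketch of the question's readFileAsString with an explicit encoding. The class name ExplicitCharsetReader and the charsetName parameter are placeholders introduced for this example, not part of the original code. It sticks to try/finally rather than newer language features, on the assumption that it may need to compile on an old JDK like the asker's.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical helper class; the name is an assumption for illustration.
final class ExplicitCharsetReader {

    /** Reads the whole file, decoding its bytes with the named charset. */
    static String readFileAsString(String filePath, String charsetName) throws IOException {
        StringBuilder fileData = new StringBuilder(1000);
        // The charset is passed explicitly instead of relying on the platform default.
        Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(filePath), charsetName));
        try {
            char[] buf = new char[1024];
            int numRead;
            while ((numRead = reader.read(buf)) != -1) {
                fileData.append(buf, 0, numRead);
            }
        } finally {
            reader.close();
        }
        return fileData.toString();
    }
}
```

A call such as ExplicitCharsetReader.readFileAsString(path, "UTF-8") would then decode the file as UTF-8 regardless of the OS setting; the caller still has to know which encoding each file actually uses.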

User · Answer

Yes, you need to specify the encoding of the file you want to read.

Yes, this means that you have to know the encoding of the file you want to read.

No, there is no general way to guess the encoding of any given "plain text" file.

The one-argument constructors of FileReader always use the platform default encoding, which is generally a bad idea.

Since Java 11, FileReader has also gained constructors that accept an encoding: new FileReader(file, charset) and new FileReader(fileName, charset).

In earlier versions of Java, you need to use new InputStreamReader(new FileInputStream(pathToFile), <encoding>).
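To make the consequence of a wrong encoding concrete, the following small sketch encodes a string as UTF-8 and then decodes the same bytes as windows-1252, which is roughly what happens when a UTF-8 file is read with a CP1252 platform default. The class and method names are made up for this example, and it assumes the JDK in use ships the windows-1252 charset (mainstream JDKs do, but only a handful of charsets are strictly guaranteed).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are assumptions for illustration.
final class WrongCharsetDemo {

    /** Encodes text with one charset and decodes the resulting bytes with another. */
    static String recode(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e

        // UTF-8 stores U+00E9 as the two bytes C3 A9; read as CP1252,
        // each byte becomes its own character (U+00C3 followed by U+00A9).
        String garbled = recode(original, StandardCharsets.UTF_8, cp1252);
        System.out.println(garbled.length()); // one character longer than the original

        // Using the same charset on both sides round-trips the text.
        String intact = recode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(intact.equals(original));
    }
}
```

Chinese text fares worse, since each character takes three bytes in UTF-8 and turns into three unrelated CP1252 characters.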

User · Answer

For Java 7+ you can use this:

    BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);

StandardCharsets lists the charsets that are always available. For example, if your file is in CP1252, obtain the charset with this method:

    Charset.forName("windows-1252");

The Java documentation also lists the other canonical names for Java encodings, both for IO and NIO.

If you do not know exactly which encoding a file uses, you may use a third-party library such as a tool from Google, which works fairly well.
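When the candidates are limited to UTF-8 and CP1252, as in the question, one common heuristic (not a general detection method, and not what the third-party tools mentioned above necessarily do) is to try a strict UTF-8 decode first and fall back to CP1252 if the bytes are not valid UTF-8. The sketch below assumes Java 7 or later and files small enough to load into memory; the class name Utf8OrCp1252 is invented for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the name is an assumption for illustration.
final class Utf8OrCp1252 {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decodes as strict UTF-8; falls back to CP1252 if the bytes are not valid UTF-8. */
    static String decode(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException notValidUtf8) {
            return new String(bytes, CP1252);
        }
    }

    /** Reads the whole file into memory and applies the heuristic above. */
    static String readFile(Path path) throws IOException {
        return decode(Files.readAllBytes(path));
    }
}
```

This works in practice because CP1252 text containing accented characters is rarely valid UTF-8, but a pure-ASCII file decodes identically either way and a short CP1252 file could, in principle, happen to be valid UTF-8. Treat the result as a best guess.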

User · Answer

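FileInputStream with InputStreamReader is better than using FileReader directly, because the one-argument FileReader constructors don't allow you to specify the encoding charset.

Here is an example using BufferedReader, FileInputStream and InputStreamReader together, so that you can read lines from a file:

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    // The class name is a placeholder; the original snippet showed only the members.
    class WordFileReader {
        List<String> words = new ArrayList<>();
        List<String> meanings = new ArrayList<>();

        public void readAll() throws IOException {
            String fileName = "College_Grade4.txt";
            String charset = "UTF-8";
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(
                        new FileInputStream(fileName), charset))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    line = line.trim();
                    if (line.length() == 0) continue;
                    // each line is expected to be "word<TAB>meaning"
                    int idx = line.indexOf("\t");
                    if (idx < 0) continue; // skip lines without a tab separator
                    words.add(line.substring(0, idx));
                    meanings.add(line.substring(idx + 1));
                }
            }
        }
    }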

User · Answer

For languages that use a non-Latin script, for example Cyrillic, you can use something like this (Java 11 or later):

    FileReader fr = new FileReader("src/text.txt", StandardCharsets.UTF_8);

and make sure that your .txt file is saved as UTF-8 (not in the default ANSI format). Cheers!

User · Answer

Since Java 11 you may use this constructor:

    public FileReader(String fileName, Charset charset) throws IOException
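For completeness, here is a short sketch of how that constructor might be used to read a whole file into a string. It assumes Java 11 or later (it also relies on Reader.transferTo, added in Java 10); the class name FileReaderWithCharset is a placeholder for this example.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.Charset;

// Hypothetical helper class; the name is an assumption for illustration.
final class FileReaderWithCharset {

    /** Reads the whole file using the FileReader(String, Charset) constructor. */
    static String readFileAsString(String fileName, Charset charset) throws IOException {
        try (Reader reader = new FileReader(fileName, charset)) {
            StringWriter out = new StringWriter();
            reader.transferTo(out); // copies all decoded characters to the writer
            return out.toString();
        }
    }
}
```

For the files in the question this could be called with StandardCharsets.UTF_8 or Charset.forName("windows-1252"), depending on how each file was saved.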
