All-inclusive Charset to avoid "java.nio.charset.MalformedInputException: Input length = 1"?

Question

I'm creating a simple word-count program in Java that reads through a directory's text-based files. However, I keep getting the error:

java.nio.charset.MalformedInputException: Input length = 1

from this line of code:

BufferedReader reader = Files.newBufferedReader(file, Charset.forName("UTF-8"));

I know I probably get this because I used a Charset that didn't include some of the characters in the text files, some of which included characters of other languages. But I want to include those characters.

I later learned from the JavaDocs that the Charset is optional and only used for a more efficient reading of the files, so I changed the code to:

BufferedReader reader = Files.newBufferedReader(file);

But some files still throw the MalformedInputException. I don't know why.

I was wondering if there is an all-inclusive Charset that will allow me to read text files with many different types of characters.

Thanks.

User · Answer

I also encountered this exception with error message,

java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(Unknown Source)
at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)
at sun.nio.cs.StreamEncoder.write(Unknown Source)
at java.io.OutputStreamWriter.write(Unknown Source)
at java.io.BufferedWriter.flushBuffer(Unknown Source)
at java.io.BufferedWriter.write(Unknown Source)
at java.io.Writer.write(Unknown Source)

and found that some strange bug occurs when trying to use

BufferedWriter writer = Files.newBufferedWriter(Paths.get(filePath));

to write a String "orazg 54" cast from a generic type in a class.

//key is of generic type <Key extends Comparable<Key>>
writer.write(item.getKey() + "\t" + item.getValue() + "\n");

This String is of length 9 containing chars with the following code points:

111 114 97 122 103 9 53 52 10

However, if the BufferedWriter in the class is replaced with:

FileOutputStream outputStream = new FileOutputStream(filePath);
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(outputStream));

it can successfully write this String without exceptions. In addition, if I write the same String created from the characters, it still works OK.

String string = new String(new char[] {111, 114, 97, 122, 103, 9, 53, 52, 10});
BufferedWriter writer = Files.newBufferedWriter(Paths.get("a.txt"));
writer.write(string);
writer.close();

Previously I had never encountered any exception when using the first BufferedWriter to write Strings. It's a strange bug that occurs with a BufferedWriter created from java.nio.file.Files.newBufferedWriter(path, options).

User · Answer

ISO-8859-1 is an all-inclusive charset, in the sense that it's guaranteed not to throw MalformedInputException. So it's good for debugging, even if your input is not in this charset. So:

req.setCharacterEncoding("ISO-8859-1");

I had some double-right-quote/double-left-quote characters in my input, and both US-ASCII and UTF-8 threw MalformedInputException on them, but ISO-8859-1 worked.
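To illustrate the claim above, here is a minimal sketch, not taken from the original answer. It assumes the curly quotes arrived as the single bytes 0x93 and 0x94 (their Windows-1252 encoding); the class and method names are made up for the example. It decodes the same bytes strictly with three charsets and shows that ISO-8859-1 accepts them, although the result is a pair of control characters rather than real quotes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1NeverFails {

    /** Returns true if the bytes decode under the charset's default (reporting) decoder. */
    public static boolean decodesStrictly(byte[] bytes, Charset charset) {
        try {
            // A fresh decoder reports malformed input by throwing.
            charset.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: curly quotes stored as Windows-1252 bytes 0x93 and 0x94.
        byte[] curlyQuotes = {(byte) 0x93, 'h', 'i', (byte) 0x94};

        System.out.println("UTF-8:      " + decodesStrictly(curlyQuotes, StandardCharsets.UTF_8));
        System.out.println("US-ASCII:   " + decodesStrictly(curlyQuotes, StandardCharsets.US_ASCII));
        System.out.println("ISO-8859-1: " + decodesStrictly(curlyQuotes, StandardCharsets.ISO_8859_1));

        // ISO-8859-1 maps every byte to the code point with the same value,
        // so 0x93 becomes the control character U+0093, not a quotation mark.
        String decoded = new String(curlyQuotes, StandardCharsets.ISO_8859_1);
        System.out.println((int) decoded.charAt(0)); // 147
    }
}
```

So "never throws" does not mean "decodes correctly": it only guarantees that reading will not fail, which is why it is handy for debugging.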

User · Answer

You can try something like this, or just copy and paste the piece below:

boolean exception = true;
Charset charset = Charset.defaultCharset(); // Try the default one first.
int index = 0;

while (exception) {
    try {
        lines = Files.readAllLines(f.toPath(), charset);
        for (String line : lines) {
            line = line.trim();
            if (line.contains(keyword)) {
                values.add(line);
            }
        }
        // No exception, just returns
        exception = false;
    } catch (IOException e) {
        exception = true;
        // Try the next charset
        if (index < Charset.availableCharsets().values().size()) {
            charset = (Charset) Charset.availableCharsets().values().toArray()[index];
        } else {
            break; // every available charset failed; stop instead of looping forever
        }
        index++;
    }
}

User · Answer

Well, the problem is that Files.newBufferedReader(Path path) is implemented like this:

public static BufferedReader newBufferedReader(Path path) throws IOException {
    return newBufferedReader(path, StandardCharsets.UTF_8);
}

so basically there is no point in specifying UTF-8 unless you want to be descriptive in your code. If you want to try a "broader" charset you could try with StandardCharsets.UTF_16, but you can't be 100% sure to get every possible character anyway.
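A small sketch that reproduces this behaviour, as an illustration rather than part of the original answer. The class name, method name, and temp-file prefix are made up. It writes the same text once as UTF-8 and once as ISO-8859-1, then reads each file with the one-argument Files.newBufferedReader(Path).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DefaultCharsetDemo {

    /** Writes the bytes to a temp file and reports whether the one-argument reader accepts them. */
    public static boolean defaultReaderAccepts(byte[] content) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("charset-demo", ".txt");
            Files.write(tmp, content);
            // No charset argument: the reader decodes as UTF-8 and reports errors.
            try (BufferedReader reader = Files.newBufferedReader(tmp)) {
                while (reader.readLine() != null) {
                    // Drain the file; only the decoding outcome matters here.
                }
            }
            return true;
        } catch (MalformedInputException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // Best-effort cleanup of the temp file.
                }
            }
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00E9\n".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00E9\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(defaultReaderAccepts(utf8));   // true
        System.out.println(defaultReaderAccepts(latin1)); // false
    }
}
```

The second file contains the single byte 0xE9 for "é", which is not a complete UTF-8 sequence, so dropping the charset argument changes nothing.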

User · Answer

ISO_8859_1 worked for me. I was reading a text file with comma-separated values.

User · Answer

UTF-8 works for me with Polish characters

User · Answer

Try this. I had the same issue, and the implementation below worked for me:

Reader reader = Files.newBufferedReader(Paths.get(<yourfilewithpath>), StandardCharsets.ISO_8859_1);

Then use the Reader wherever you want, for example:

CsvToBean<anyPojo> csvToBean = null;
try {
    Reader reader = Files.newBufferedReader(Paths.get(csvFilePath),
            StandardCharsets.ISO_8859_1);
    csvToBean = new CsvToBeanBuilder(reader)
            .withType(anyPojo.class)
            .withIgnoreLeadingWhiteSpace(true)
            .withSkipLines(1)
            .build();
} catch (IOException e) {
    e.printStackTrace();
}

User · Answer

You probably want to have a list of supported encodings. For each file, try each encoding in turn, maybe starting with UTF-8. Every time you catch the MalformedInputException, try the next encoding.
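One possible way to sketch that idea follows. The candidate list, class name, and method names are assumptions made for this example, not something the answer specifies. It reads the raw bytes once and decodes them strictly with each candidate until one succeeds.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class CharsetFallback {

    // Hypothetical candidate list. Order matters: ISO-8859-1 goes last
    // because it accepts every byte sequence and would mask the others.
    public static final List<Charset> CANDIDATES = Arrays.asList(
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1);

    /** Returns the text decoded by the first candidate that does not report an error. */
    public static Optional<String> decodeWithFirstMatch(byte[] bytes, List<Charset> candidates) {
        for (Charset charset : candidates) {
            try {
                // newDecoder() reports malformed or unmappable input by throwing.
                return Optional.of(charset.newDecoder().decode(ByteBuffer.wrap(bytes)).toString());
            } catch (CharacterCodingException e) {
                // This charset does not fit; fall through to the next one.
            }
        }
        return Optional.empty();
    }

    /** Convenience wrapper for a file on disk. */
    public static Optional<String> readFile(Path file) throws IOException {
        return decodeWithFirstMatch(Files.readAllBytes(file), CANDIDATES);
    }
}
```

A charset that decodes without errors is not necessarily the right one, so the order of the list encodes which guess you trust most.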

User · Answer

I wrote the following to print a list of results to standard out based on available charsets. Note that it also tells you what line fails from a 0 based line number in case you are troubleshooting what character is causing issues.

public static void testCharset(String fileName) {
    SortedMap<String, Charset> charsets = Charset.availableCharsets();
    for (String k : charsets.keySet()) {
        int line = 0;
        boolean success = true;
        try (BufferedReader b = Files.newBufferedReader(Paths.get(fileName), charsets.get(k))) {
            while (b.ready()) {
                b.readLine();
                line++;
            }
        } catch (IOException e) {
            success = false;
            System.out.println(k + " failed on line " + line);
        }
        if (success) {
            System.out.println("Success " + k);
        }
    }
}

User · Answer

Creating a BufferedReader from Files.newBufferedReader:

Files.newBufferedReader(Paths.get("a.txt"), StandardCharsets.UTF_8);

When running the application it may throw the following exception:

java.nio.charset.MalformedInputException: Input length = 1

But

new BufferedReader(new InputStreamReader(new FileInputStream("a.txt"), "utf-8"));

works well. The difference is that the former uses the CharsetDecoder default action:

The default action for malformed-input and unmappable-character errors is to report them.

while the latter uses the REPLACE action:

cs.newDecoder().onMalformedInput(CodingErrorAction.REPLACE).onUnmappableCharacter(CodingErrorAction.REPLACE)
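If you want the REPLACE behaviour explicitly rather than as a side effect of the older constructor, one option is to configure the decoder yourself and pass it to InputStreamReader. This is a minimal sketch with made-up class and method names. For a real file you would pass Files.newInputStream(path) instead of the in-memory stream used here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class LenientReader {

    /** Wraps the stream in a UTF-8 reader that replaces bad bytes instead of throwing. */
    public static BufferedReader newLenientUtf8Reader(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new BufferedReader(new InputStreamReader(in, decoder));
    }

    /** Helper for the demo: decodes the bytes leniently and returns the first line. */
    public static String firstLine(byte[] bytes) {
        try (BufferedReader reader = newLenientUtf8Reader(new ByteArrayInputStream(bytes))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 0x93 on its own is not valid UTF-8; it is replaced with U+FFFD.
        byte[] bytes = {'a', (byte) 0x93, 'b', '\n'};
        System.out.println(firstLine(bytes).equals("a\uFFFDb")); // true
    }
}
```

The trade-off is that bad bytes are silently turned into the replacement character, so the read never fails but the original bytes are lost.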
