[java] Check line for unprintable characters while reading text file

My program must read text files - line by line. Files in UTF-8. I am not sure that files are correct - can contain unprintable characters. Is possible check for it without going to byte level? Thanks.

This question is related to java file file-io

The answer is


If you want to check a string has unprintable characters you can use a regular expression

[^\p{Print}]

The answer by @T.J.Crowder is Java 6 - in java 7 the valid answer is the one by @McIntosh - though its use of Charset for name for UTF -8 is discouraged:

List<String> lines = Files.readAllLines(Paths.get("/tmp/test.csv"),
    StandardCharsets.UTF_8);
for(String line: lines){ /* DO */ }

Reminds a lot of the Guava way posted by Skeet above - and of course same caveats apply. That is, for big files (Java 7):

BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
for (String line = reader.readLine(); line != null; line = reader.readLine()) {}

While it's not hard to do this manually using BufferedReader and InputStreamReader, I'd use Guava:

List<String> lines = Files.readLines(file, Charsets.UTF_8);

You can then do whatever you like with those lines.

EDIT: Note that this will read the whole file into memory in one go. In most cases that's actually fine - and it's certainly simpler than reading it line by line, processing each line as you read it. If it's an enormous file, you may need to do it that way as per T.J. Crowder's answer.


Open the file with a FileInputStream, then use an InputStreamReader with the UTF-8 Charset to read characters from the stream, and use a BufferedReader to read lines, e.g. via BufferedReader#readLine, which will give you a string. Once you have the string, you can check for characters that aren't what you consider to be printable.

E.g. (without error checking), using try-with-resources (which is in vaguely modern Java version):

String line;
try (
    InputStream fis = new FileInputStream("the_file_name");
    InputStreamReader isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
    BufferedReader br = new BufferedReader(isr);
) {
    while ((line = br.readLine()) != null) {
        // Deal with the line
    }
}

How about below:

 FileReader fileReader = new FileReader(new File("test.txt"));

 BufferedReader br = new BufferedReader(fileReader);

 String line = null;
 // if no more lines the readLine() returns null
 while ((line = br.readLine()) != null) {
      // reading lines until the end of the file

 }

Source: http://devmain.blogspot.co.uk/2013/10/java-quick-way-to-read-or-write-to-file.html


If every char in the file is properly encoded in UTF-8, you won't have any problem reading it using a reader with the UTF-8 encoding. Up to you to check every char of the file and see if you consider it printable or not.


I can find following ways to do.

private static final String fileName = "C:/Input.txt";

public static void main(String[] args) throws IOException {
    Stream<String> lines = Files.lines(Paths.get(fileName));
    lines.toArray(String[]::new);

    List<String> readAllLines = Files.readAllLines(Paths.get(fileName));
    readAllLines.forEach(s -> System.out.println(s));

    File file = new File(fileName);
    Scanner scanner = new Scanner(file);
    while (scanner.hasNext()) {
        System.out.println(scanner.next());
    }

Just found out that with the Java NIO (java.nio.file.*) you can easily write:

List<String> lines=Files.readAllLines(Paths.get("/tmp/test.csv"), StandardCharsets.UTF_8);
for(String line:lines){
  System.out.println(line);
}

instead of dealing with FileInputStreams and BufferedReaders...


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to file

Gradle - Move a folder from ABC to XYZ Difference between opening a file in binary vs text Angular: How to download a file from HttpClient? Python error message io.UnsupportedOperation: not readable java.io.FileNotFoundException: class path resource cannot be opened because it does not exist Writing JSON object to a JSON file with fs.writeFileSync How to read/write files in .Net Core? How to write to a CSV line by line? Writing a dictionary to a text file? What are the pros and cons of parquet format compared to other formats?

Examples related to file-io

Python, Pandas : write content of DataFrame into text File Saving response from Requests to file How to while loop until the end of a file in Python without checking for empty line? Getting "java.nio.file.AccessDeniedException" when trying to write to a folder How do I add a resources folder to my Java project in Eclipse Read and write a String from text file Python Pandas: How to read only first n rows of CSV files in? Open files in 'rt' and 'wt' modes How to write to a file without overwriting current contents? Write objects into file with Node.js