[java] How can I get the count of line in a file in an efficient way?

I have a big file. It includes approximately 3.000-20.000 lines. How can I get the total count of lines in the file using Java?

This question is related to java file

The answer is


Try the unix "wc" command. I don't mean use it, I mean download the source and see how they do it. It's probably in c, but you can easily port the behavior to java. The problem with making your own is to account for the ending cr/lf problem.


Quick and dirty, but it does the job:

import java.io.*;

public class Counter {

    public final static void main(String[] args) throws IOException {
        if (args.length > 0) {
            File file = new File(args[0]);
            System.out.println(countLines(file));
        }
    }

    public final static int countLines(File file) throws IOException {
        ProcessBuilder builder = new ProcessBuilder("wc", "-l", file.getAbsolutePath());
        Process process = builder.start();
        InputStream in = process.getInputStream();
        LineNumberReader reader = new LineNumberReader(new InputStreamReader(in));
        String line = reader.readLine();
        if (line != null) {
            return Integer.parseInt(line.trim().split(" ")[0]);
        } else {
            return -1;
        }
    }

}

Read the file through and count the number of newline characters. An easy way to read a file in Java, one line at a time, is the java.util.Scanner class.


Files.lines

Java 8+ has a nice and short way using NIO using Files.lines. Note that you have to close the stream using try-with-resources:

long lineCount;
try (Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)) {
  lineCount = stream.count();
}

If you don't specify the character encoding, the default one used is UTF-8. You may specify an alternate encoding to match your particular data file as shown in the example above.


All previous answers suggest to read though the whole file and count the amount of newlines you find while doing this. You commented some as "not effective" but thats the only way you can do that. A "line" is nothing else as a simple character inside the file. And to count that character you must have a look at every single character within the file.

I'm sorry, but you have no choice. :-)


The buffered reader is overkill

Reader r = new FileReader("f.txt");

int count = 0;
int nextchar = 0;
while (nextchar != -1){
        nextchar = r.read();
        if (nextchar == Character.getNumericValue('\n') ){
            count++;
        }
    }

My search for a simple example has createde one thats actually quite poor. calling read() repeadedly for a single character is less than optimal. see here for examples and measurements.


Read the file line by line and increment a counter for each line until you have read the entire file.


This solution is about 3.6× faster than the top rated answer when tested on a file with 13.8 million lines. It simply reads the bytes into a buffer and counts the \n characters. You could play with the buffer size, but on my machine, anything above 8KB didn't make the code faster.

private int countLines(File file) throws IOException {
    int lines = 0;

    FileInputStream fis = new FileInputStream(file);
    byte[] buffer = new byte[BUFFER_SIZE]; // BUFFER_SIZE = 8 * 1024
    int read;

    while ((read = fis.read(buffer)) != -1) {
        for (int i = 0; i < read; i++) {
            if (buffer[i] == '\n') lines++;
        }
    }

    fis.close();

    return lines;
}

This is about as efficient as it can get, buffered binary read, no string conversion,

FileInputStream stream = new FileInputStream("/tmp/test.txt");
byte[] buffer = new byte[8192];
int count = 0;
int n;
while ((n = stream.read(buffer)) > 0) {
    for (int i = 0; i < n; i++) {
        if (buffer[i] == '\n') count++;
    }
}
stream.close();
System.out.println("Number of lines: " + count);

Do You need exact number of lines or only its approximation? I happen to process large files in parallel and often I don't need to know exact count of lines - I then revert to sampling. Split the file into ten 1MB chunks and count lines in each chunk, then multiply it by 10 and You'll receive pretty good approximation of line count.


Probably the fastest solution in pure Java would be to read the file as bytes using a NIO Channel into large ByteBuffer. Then using your knowledge of the file encoding scheme(s) count the encoded CR and/or NL bytes, per the relevant line separator convention.

The keys to maximising throughput will be:

  • make sure that you read the file in large chunks,
  • avoid copying the bytes from one buffer to another,
  • avoid copying / converting bytes into characters, and
  • avoid allocating objects to represent the file lines.

The actual code is too complicated for me to write on the fly. Besides, the OP is not asking for the fastest solution.


use LineNumberReader

something like

public static int countLines(File aFile) throws IOException {
    LineNumberReader reader = null;
    try {
        reader = new LineNumberReader(new FileReader(aFile));
        while ((reader.readLine()) != null);
        return reader.getLineNumber();
    } catch (Exception ex) {
        return -1;
    } finally { 
        if(reader != null) 
            reader.close();
    }
}

I found some solution for this, it might useful for you

Below is the code snippet for, count the no.of lines from the file.

  File file = new File("/mnt/sdcard/abc.txt");
  LineNumberReader lineNumberReader = new LineNumberReader(new FileReader(file));
  lineNumberReader.skip(Long.MAX_VALUE);
  int lines = lineNumberReader.getLineNumber();
  lineNumberReader.close();

If the already posted answers aren't fast enough you'll probably have to look for a solution specific to your particular problem.

For example if these text files are logs that are only appended to and you regularly need to know the number of lines in them you could create an index. This index would contain the number of lines in the file, when the file was last modified and how large the file was then. This would allow you to recalculate the number of lines in the file by skipping over all the lines you had already seen and just reading the new lines.


Old post, but I have a solution that could be usefull for next people. Why not just use file length to know what is the progression? Of course, lines has to be almost the same size, but it works very well for big files:

public static void main(String[] args) throws IOException {
    File file = new File("yourfilehere");
    double fileSize = file.length();
    System.out.println("=======> File size = " + fileSize);
    InputStream inputStream = new FileInputStream(file);
    InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "iso-8859-1");
    BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
    int totalRead = 0;
    try {
        while (bufferedReader.ready()) {
            String line = bufferedReader.readLine();
            // LINE PROCESSING HERE
            totalRead += line.length() + 1; // we add +1 byte for the newline char.
            System.out.println("Progress ===> " + ((totalRead / fileSize) * 100) + " %");
        }
    } finally {
        bufferedReader.close();
    }
}

It allows to see the progression without doing any full read on the file. I know it depends on lot of elements, but I hope it will be usefull :).

[Edition] Here is a version with estimated time. I put some SYSO to show progress and estimation. I see that you have a good time estimation errors after you have treated enough line (I try with 10M lines, and after 1% of the treatment, the time estimation was exact at 95%). I know, some values has to be set in variable. This code is quickly written but has be usefull for me. Hope it will be for you too :).

long startProcessLine = System.currentTimeMillis();
    int totalRead = 0;
    long progressTime = 0;
    double percent = 0;
    int i = 0;
    int j = 0;
    int fullEstimation = 0;
    try {
        while (bufferedReader.ready()) {
            String line = bufferedReader.readLine();
            totalRead += line.length() + 1;
            progressTime = System.currentTimeMillis() - startProcessLine;
            percent = (double) totalRead / fileSize * 100;
            if ((percent > 1) && i % 10000 == 0) {
                int estimation = (int) ((progressTime / percent) * (100 - percent));
                fullEstimation += progressTime + estimation;
                j++;
                System.out.print("Progress ===> " + percent + " %");
                System.out.print(" - current progress : " + (progressTime) + " milliseconds");
                System.out.print(" - Will be finished in ===> " + estimation + " milliseconds");
                System.out.println(" - estimated full time => " + (progressTime + estimation));
            }
            i++;
        }
    } finally {
        bufferedReader.close();
    }
    System.out.println("Ended in " + (progressTime) + " seconds");
    System.out.println("Estimative average ===> " + (fullEstimation / j));
    System.out.println("Difference: " + ((((double) 100 / (double) progressTime)) * (progressTime - (fullEstimation / j))) + "%");

Feel free to improve this code if you think it's a good solution.