[java] How can I read a large text file line by line using Java?

I need to read a large text file of around 5-6 GB line by line using Java.

How can I do this quickly?

This question is related to java performance file-io io garbage-collection

The answer is


You can use streams to do it more precisely:

Files.lines(Paths.get("input.txt")).forEach(s -> stringBuffer.append(s);

For reading a file with Java 8

package com.java.java8;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

/**
 * The Class ReadLargeFile.
 *
 * @author Ankit Sood Apr 20, 2017
 */
public class ReadLargeFile {

    /**
     * The main method.
     *
     * @param args
     *            the arguments
     */
    public static void main(String[] args) {
        try {
            Stream<String> stream = Files.lines(Paths.get("C:\\Users\\System\\Desktop\\demoData.txt"));
            stream.forEach(System.out::println);
        }
        catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

Java 9:

try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    stream.forEach(System.out::println);
}

In Java 8, there is also an alternative to using Files.lines(). If your input source isn't a file but something more abstract like a Reader or an InputStream, you can stream the lines via the BufferedReaders lines() method.

For example:

try (BufferedReader reader = new BufferedReader(...)) {
  reader.lines().forEach(line -> processLine(line));
}

will call processLine() for each input line read by the BufferedReader.


I documented and tested 10 different ways to read a file in Java and then ran them against each other by making them read in test files from 1KB to 1GB. Here are the fastest 3 file reading methods for reading a 1GB test file.

Note that when running the performance tests I didn't output anything to the console since that would really slow down the test. I just wanted to test the raw reading speed.

1) java.nio.file.Files.readAllBytes()

Tested in Java 7, 8, 9. This was overall the fastest method. Reading a 1GB file was consistently just under 1 second.

import java.io..File;
import java.io.IOException;
import java.nio.file.Files;

public class ReadFile_Files_ReadAllBytes {
  public static void main(String [] pArgs) throws IOException {
    String fileName = "c:\\temp\\sample-1GB.txt";
    File file = new File(fileName);

    byte [] fileBytes = Files.readAllBytes(file.toPath());
    char singleChar;
    for(byte b : fileBytes) {
      singleChar = (char) b;
      System.out.print(singleChar);
    }
  }
}

2) java.nio.file.Files.lines()

This was tested successfully in Java 8 and 9 but it won't work in Java 7 because of the lack of support for lambda expressions. It took about 3.5 seconds to read in a 1GB file which put it in second place as far as reading larger files.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines {
  public static void main(String[] pArgs) throws IOException {
    String fileName = "c:\\temp\\sample-1GB.txt";
    File file = new File(fileName);

    try (Stream linesStream = Files.lines(file.toPath())) {
      linesStream.forEach(line -> {
        System.out.println(line);
      });
    }
  }
}

3) BufferedReader

Tested to work in Java 7, 8, 9. This took about 4.5 seconds to read in a 1GB test file.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_BufferedReader_ReadLine {
  public static void main(String [] args) throws IOException {
    String fileName = "c:\\temp\\sample-1GB.txt";
    FileReader fileReader = new FileReader(fileName);

    try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
      String line;
      while((line = bufferedReader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }

You can find the complete rankings for all 10 file reading methods here.


FileReader won't let you specify the encoding, use InputStreamReaderinstead if you need to specify it:

try {
    BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), "Cp1252"));         

    String line;
    while ((line = br.readLine()) != null) {
        // process the line.
    }
    br.close();

} catch (IOException e) {
    e.printStackTrace();
}

If you imported this file from Windows, it might have ANSI encoding (Cp1252), so you have to specify the encoding.


In Java 8, you could do:

try (Stream<String> lines = Files.lines (file, StandardCharsets.UTF_8))
{
    for (String line : (Iterable<String>) lines::iterator)
    {
        ;
    }
}

Some notes: The stream returned by Files.lines (unlike most streams) needs to be closed. For the reasons mentioned here I avoid using forEach(). The strange code (Iterable<String>) lines::iterator casts a Stream to an Iterable.


I usually do the reading routine straightforward:

void readResource(InputStream source) throws IOException {
    BufferedReader stream = null;
    try {
        stream = new BufferedReader(new InputStreamReader(source));
        while (true) {
            String line = stream.readLine();
            if(line == null) {
                break;
            }
            //process line
            System.out.println(line)
        }
    } finally {
        closeQuiet(stream);
    }
}

static void closeQuiet(Closeable closeable) {
    if (closeable != null) {
        try {
            closeable.close();
        } catch (IOException ignore) {
        }
    }
}

What you can do is scan the entire text using Scanner and go through the text line by line. Of course you should import the following:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public static void readText throws FileNotFoundException {
    Scanner scan = new Scanner(new File("samplefilename.txt"));
    while(scan.hasNextLine()){
        String line = scan.nextLine();
        //Here you can manipulate the string the way you want
    }
}

Scanner basically scans all the text. The while loop is used to traverse through the entire text.

The .hasNextLine() function is a boolean that returns true if there are still more lines in the text. The .nextLine() function gives you an entire line as a String which you can then use the way you want. Try System.out.println(line) to print the text.

Side Note: .txt is the file type text.


The clear way to achieve this,

For example:

If you have dataFile.txt on your current directory

import java.io.*;
import java.util.Scanner;
import java.io.FileNotFoundException;

public class readByLine
{
    public readByLine() throws FileNotFoundException
    {
        Scanner linReader = new Scanner(new File("dataFile.txt"));

        while (linReader.hasNext())
        {
            String line = linReader.nextLine();
            System.out.println(line);
        }
        linReader.close();

    }

    public static void main(String args[])  throws FileNotFoundException
    {
        new readByLine();
    }
}

The output like as below, enter image description here


BufferedReader br;
FileInputStream fin;
try {
    fin = new FileInputStream(fileName);
    br = new BufferedReader(new InputStreamReader(fin));

    /*Path pathToFile = Paths.get(fileName);
    br = Files.newBufferedReader(pathToFile,StandardCharsets.US_ASCII);*/

    String line = br.readLine();
    while (line != null) {
        String[] attributes = line.split(",");
        Movie movie = createMovie(attributes);
        movies.add(movie);
        line = br.readLine();
    }
    fin.close();
    br.close();
} catch (FileNotFoundException e) {
    System.out.println("Your Message");
} catch (IOException e) {
    System.out.println("Your Message");
}

It works for me. Hope It will help you too.


You can also use Apache Commons IO:

File file = new File("/home/user/file.txt");
try {
    List<String> lines = FileUtils.readLines(file);
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

You can use Scanner class

Scanner sc=new Scanner(file);
sc.nextLine();

Once Java 8 is out (March 2014) you'll be able to use streams:

try (Stream<String> lines = Files.lines(Paths.get(filename), Charset.defaultCharset())) {
  lines.forEachOrdered(line -> process(line));
}

Printing all the lines in the file:

try (Stream<String> lines = Files.lines(file, Charset.defaultCharset())) {
  lines.forEachOrdered(System.out::println);
}

You need to use the readLine() method in class BufferedReader. Create a new object from that class and operate this method on him and save it to a string.

BufferReader Javadoc


You can use this code:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class ReadTextFile {

    public static void main(String[] args) throws IOException {

        try {

            File f = new File("src/com/data.txt");

            BufferedReader b = new BufferedReader(new FileReader(f));

            String readLine = "";

            System.out.println("Reading file using Buffered Reader");

            while ((readLine = b.readLine()) != null) {
                System.out.println(readLine);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}

In Java 7:

String folderPath = "C:/folderOfMyFile";
Path path = Paths.get(folderPath, "myFileName.csv"); //or any text file eg.: txt, bat, etc
Charset charset = Charset.forName("UTF-8");

try (BufferedReader reader = Files.newBufferedReader(path , charset)) {
  while ((line = reader.readLine()) != null ) {
    //separate all csv fields into string array
    String[] lineVariables = line.split(","); 
  }
} catch (IOException e) {
    System.err.println(e);
}

By using the org.apache.commons.io package, it gave more performance, especially in legacy code which uses Java 6 and below.

Java 7 has a better API with fewer exceptions handling and more useful methods:

LineIterator lineIterator = null;
try {
    lineIterator = FileUtils.lineIterator(new File("/home/username/m.log"), "windows-1256"); // The second parameter is optionnal
    while (lineIterator.hasNext()) {
        String currentLine = lineIterator.next();
        // Some operation
    }
}
finally {
    LineIterator.closeQuietly(lineIterator);
}

Maven

<!-- https://mvnrepository.com/artifact/commons-io/commons-io -->
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.6</version>
</dependency>

Look at this blog:

The buffer size may be specified, or the default size may be used. The default is large enough for most purposes.

// Open the file
FileInputStream fstream = new FileInputStream("textfile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));

String strLine;

//Read File Line By Line
while ((strLine = br.readLine()) != null)   {
  // Print the content on the console
  System.out.println (strLine);
}

//Close the input stream
fstream.close();

You can read file data line by line as below:

String fileLoc = "fileLocationInTheDisk";

List<String> lines = Files.lines(Path.of(fileLoc), StandardCharsets.UTF_8).collect(Collectors.toList());

A common pattern is to use

try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = br.readLine()) != null) {
       // process the line.
    }
}

You can read the data faster if you assume there is no character encoding. e.g. ASCII-7 but it won't make much difference. It is highly likely that what you do with the data will take much longer.

EDIT: A less common pattern to use which avoids the scope of line leaking.

try(BufferedReader br = new BufferedReader(new FileReader(file))) {
    for(String line; (line = br.readLine()) != null; ) {
        // process the line.
    }
    // line is not visible here.
}

UPDATE: In Java 8 you can do

try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
        stream.forEach(System.out::println);
}

NOTE: You have to place the Stream in a try-with-resource block to ensure the #close method is called on it, otherwise the underlying file handle is never closed until GC does it much later.


Here is a sample with full error handling and supporting charset specification for pre-Java 7. With Java 7 you can use try-with-resources syntax, which makes the code cleaner.

If you just want the default charset you can skip the InputStream and use FileReader.

InputStream ins = null; // raw byte-stream
Reader r = null; // cooked reader
BufferedReader br = null; // buffered for readLine()
try {
    String s;
    ins = new FileInputStream("textfile.txt");
    r = new InputStreamReader(ins, "UTF-8"); // leave charset out for default
    br = new BufferedReader(r);
    while ((s = br.readLine()) != null) {
        System.out.println(s);
    }
}
catch (Exception e)
{
    System.err.println(e.getMessage()); // handle exception
}
finally {
    if (br != null) { try { br.close(); } catch(Throwable t) { /* ensure close happens */ } }
    if (r != null) { try { r.close(); } catch(Throwable t) { /* ensure close happens */ } }
    if (ins != null) { try { ins.close(); } catch(Throwable t) { /* ensure close happens */ } }
}

Here is the Groovy version, with full error handling:

File f = new File("textfile.txt");
f.withReader("UTF-8") { br ->
    br.eachLine { line ->
        println line;
    }
}

Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to performance

Why is 2 * (i * i) faster than 2 * i * i in Java? What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check if a key exists in Json Object and get its value Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Most efficient way to map function over numpy array The most efficient way to remove first N elements in a list? Fastest way to get the first n elements of a List into an Array Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3? pandas loc vs. iloc vs. at vs. iat? Android Recyclerview vs ListView with Viewholder

Examples related to file-io

Python, Pandas : write content of DataFrame into text File Saving response from Requests to file How to while loop until the end of a file in Python without checking for empty line? Getting "java.nio.file.AccessDeniedException" when trying to write to a folder How do I add a resources folder to my Java project in Eclipse Read and write a String from text file Python Pandas: How to read only first n rows of CSV files in? Open files in 'rt' and 'wt' modes How to write to a file without overwriting current contents? Write objects into file with Node.js

Examples related to io

Reading file using relative path in python project How to write to a CSV line by line? Getting "java.nio.file.AccessDeniedException" when trying to write to a folder Exception: Unexpected end of ZLIB input stream How to get File Created Date and Modified Date Printing Mongo query output to a file while in the mongo shell Load data from txt with pandas Writing File to Temp Folder How to get resources directory path programmatically ValueError : I/O operation on closed file

Examples related to garbage-collection

When to create variables (memory management) Difference between Xms and Xmx and XX:MaxPermSize Java GC (Allocation Failure) How to handle :java.util.concurrent.TimeoutException: android.os.BinderProxy.finalize() timed out after 10 seconds errors? Implementing IDisposable correctly How to check heap usage of a running JVM from the command line? how to destroy an object in java? How to force deletion of a python object? How can I read a large text file line by line using Java? GC overhead limit exceeded