Fastest way to write huge data to a text file in Java

I have to write huge amounts of data to a text [CSV] file. I used a BufferedWriter to write the data, and it took around 40 seconds to write 174 MB of data. Is this the fastest speed Java can offer?

bufferedWriter = new BufferedWriter(new FileWriter("fileName.csv"));

Note: these 40 seconds include the time for iterating over and fetching the records from the ResultSet as well. :) 174 MB is for 400,000 rows in the ResultSet.

This question is related to: java, file, resultset

Answers:


For those who want to improve the time for retrieving records and dumping them into the file (i.e., with no processing on the records): instead of putting them into an ArrayList, append the records to a StringBuffer. Apply the toString() method to get a single String and write it to the file at once; see the sketch below.

For me, the retrieval time was reduced from 22 seconds to 17 seconds.
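
A minimal sketch of that approach, assuming an open ResultSet and a single-column dump (the column index and file name are placeholders; StringBuilder is the unsynchronized cousin of StringBuffer and works the same way here):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.sql.ResultSet;
import java.sql.SQLException;

// Collect every row into one StringBuilder, then write the whole
// payload with a single call instead of one write per record.
static void dumpResultSet(ResultSet rs, String fileName) throws SQLException, IOException {
    StringBuilder sb = new StringBuilder();
    while (rs.next()) {
        sb.append(rs.getString(1)).append('\n'); // column 1 is a placeholder
    }
    try (BufferedWriter writer = new BufferedWriter(new FileWriter(fileName))) {
        writer.write(sb.toString()); // one write of the accumulated string
    }
}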


Only for the sake of statistics:

The machine is an old Dell with a new SSD

CPU: Intel Pentium D 2.8 GHz

SSD: Patriot Inferno 120 GB SSD

4000000 'records'
175.47607421875 MB

Iteration 0
Writing raw... 3.547 seconds
Writing buffered (buffer size: 8192)... 2.625 seconds
Writing buffered (buffer size: 1048576)... 2.203 seconds
Writing buffered (buffer size: 4194304)... 2.312 seconds

Iteration 1
Writing raw... 2.922 seconds
Writing buffered (buffer size: 8192)... 2.406 seconds
Writing buffered (buffer size: 1048576)... 2.015 seconds
Writing buffered (buffer size: 4194304)... 2.282 seconds

Iteration 2
Writing raw... 2.828 seconds
Writing buffered (buffer size: 8192)... 2.109 seconds
Writing buffered (buffer size: 1048576)... 2.078 seconds
Writing buffered (buffer size: 4194304)... 2.015 seconds

Iteration 3
Writing raw... 3.187 seconds
Writing buffered (buffer size: 8192)... 2.109 seconds
Writing buffered (buffer size: 1048576)... 2.094 seconds
Writing buffered (buffer size: 4194304)... 2.031 seconds

Iteration 4
Writing raw... 3.093 seconds
Writing buffered (buffer size: 8192)... 2.141 seconds
Writing buffered (buffer size: 1048576)... 2.063 seconds
Writing buffered (buffer size: 4194304)... 2.016 seconds

As we can see, the raw method is slower than the buffered one.
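
For reference, the buffer sizes above correspond to the two-argument BufferedWriter constructor; a minimal sketch of the buffered variant (the record content and counts are made up to match the benchmark shape):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

// Write `count` copies of `record` through a BufferedWriter with an explicit
// buffer size (1 MB, i.e. 1048576 bytes, was the best performer above).
static void writeBuffered(String fileName, String record, int count, int bufferSize) throws IOException {
    try (BufferedWriter writer = new BufferedWriter(new FileWriter(fileName), bufferSize)) {
        for (int i = 0; i < count; i++) {
            writer.write(record);
        }
    }
}

// e.g. writeBuffered("test.txt", "some record of roughly 44 bytes...\n", 4000000, 1048576);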


Try memory-mapped files (it takes about 300 ms to write 174 MB on my machine: Core 2 Duo, 2.5 GB RAM):

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

byte[] buffer = "Help I am trapped in a fortune cookie factory\n".getBytes();
int numberOfLines = 400000;

// Map the whole target region into memory and let the OS handle the actual I/O.
try (RandomAccessFile raf = new RandomAccessFile("textfile.txt", "rw");
     FileChannel rwChannel = raf.getChannel()) {
    // Cast to long so the mapped size does not overflow int for large files.
    ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0,
            (long) buffer.length * numberOfLines);
    for (int i = 0; i < numberOfLines; i++) {
        wrBuf.put(buffer);
    }
}

For these bulky reads from the database, you may want to tune your Statement's fetch size; it can save a lot of round trips to the database. See the sketch after the link below.

http://download.oracle.com/javase/1.5.0/docs/api/java/sql/Statement.html#setFetchSize%28int%29
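
A minimal sketch, assuming an open Connection conn; the fetch size of 1000 and the query are placeholders, and the right value depends on row size and driver:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Ask the driver to fetch rows from the server in batches of 1000 instead of
// its default, which can cut the number of network round trips dramatically.
static ResultSet queryWithFetchSize(Connection conn, String sql) throws SQLException {
    Statement stmt = conn.createStatement();
    stmt.setFetchSize(1000); // a hint; drivers may cap or ignore it
    return stmt.executeQuery(sql);
}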


package all.is.well;

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import junit.framework.TestCase;

/**
 * @author Naresh Bhabat
 *
 * The following implementation helps to deal with extra-large files in Java.
 * This program has been tested with a 2 GB input file. There are some points
 * where extra logic can be added in the future.
 *
 * Please note: to deal with a binary input file, read bytes from the file
 * object instead of reading lines.
 *
 * It uses a RandomAccessFile, which is almost like a streaming API.
 *
 * ****************************************
 * Notes regarding the executor framework and its timings.
 * Please note: ExecutorService executor = Executors.newFixedThreadPool(10);
 *
 *   For 10 threads: total time required for reading and writing the text: 349.317 seconds
 *   For 100 threads: 464.042 seconds
 *   For 1000 threads: 466.538 seconds
 *   For 10000 threads: 479.701 seconds
 */
public class DealWithHugeRecordsinFile extends TestCase {

 static final String FILEPATH = "C:\\springbatch\\bigfile1.txt.txt";
 static final String FILEPATH_WRITE = "C:\\springbatch\\writinghere.txt";
 static volatile RandomAccessFile fileToWrite;
 static volatile RandomAccessFile file;
 static volatile int position = 0;

 public static void main(String[] args) throws IOException, InterruptedException {
  long start = System.currentTimeMillis();

  try {
   fileToWrite = new RandomAccessFile(FILEPATH_WRITE, "rw"); // for random writes, independent of thread obstacles
   file = new RandomAccessFile(FILEPATH, "r");               // for random reads, independent of thread obstacles
   seriouslyReadProcessAndWriteAsynch();
  } catch (IOException e) {
   e.printStackTrace();
  }
  System.out.println(Thread.currentThread().getName());
  long end = System.currentTimeMillis();
  double timeSeconds = (end - start) / 1000.0;
  System.out.println("Total time required for reading the text in seconds " + timeSeconds);
 }

 /**
  * Reads the input file line by line and hands each line to a worker thread.
  *
  * @throws IOException
  */
 public static void seriouslyReadProcessAndWriteAsynch() throws IOException {
  ExecutorService executor = Executors.newFixedThreadPool(10); // see the timing notes in the class comment
  while (true) {
   final String readLine = file.readLine();
   if (readLine == null) {
    break;
   }
   Runnable genuineWorker = new Runnable() {
    @Override
    public void run() {
     // Do the hard processing here in this thread; this demo burns some
     // time and swallows an exception inside the write method.
     writeToFile(FILEPATH_WRITE, readLine);
    }
   };
   executor.execute(genuineWorker);
  }
  executor.shutdown();
  while (!executor.isTerminated()) {
   // busy-wait until every worker has finished
  }
  System.out.println("Finished all threads");
  file.close();
  fileToWrite.close();
 }

 /**
  * Synchronized so that concurrent workers do not interleave their writes
  * to the shared RandomAccessFile.
  *
  * @param filePath
  * @param data
  */
 private static synchronized void writeToFile(String filePath, String data) {
  try {
   data = "\n" + data;
   if (!data.contains("Randomization")) {
    return;
   }
   System.out.println("Let us do something time consuming to make this thread busy " + (position++) + "   :" + data);
   System.out.println("Let's consume time through this loop");
   int i = 1000;
   while (i > 0) {
    i--;
   }
   fileToWrite.write(data.getBytes());
   throw new Exception(); // thrown deliberately to show that a failing record does not stop the run
  } catch (Exception exception) {
   System.out.println("An exception was thrown but we are still able to proceed further."
     + "\n This can be used for marking failure of the records");
  }
 }
}


Your transfer speed is unlikely to be limited by Java. Instead, I would suspect (in no particular order):

  1. the speed of transfer from the database
  2. the speed of transfer to the disk

If you read the complete dataset and then write it out to disk, that will take longer, since the JVM will have to allocate memory, and the DB read and disk write will happen sequentially. Instead, write to the buffered writer for every read that you make from the DB, so the operation is closer to a concurrent one (I don't know whether you're already doing that). A sketch of that interleaved style follows.
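
A sketch of that interleaved style, assuming a ResultSet rs and plain string columns (the column count is a parameter, and this does no CSV quoting or escaping):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.sql.ResultSet;
import java.sql.SQLException;

// Write each row as soon as it is fetched, so the DB read and the disk write
// overlap instead of running strictly one after the other.
static void streamToCsv(ResultSet rs, String fileName, int columnCount) throws SQLException, IOException {
    try (BufferedWriter writer = new BufferedWriter(new FileWriter(fileName))) {
        while (rs.next()) {
            for (int col = 1; col <= columnCount; col++) {
                if (col > 1) {
                    writer.write(',');
                }
                String value = rs.getString(col);
                writer.write(value == null ? "" : value); // guard against NULL columns
            }
            writer.newLine();
        }
    }
}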

