I have to write a huge amount of data to a text (CSV) file. I used a BufferedWriter, and it took around 40 seconds to write 174 MB of data. Is this the fastest speed Java can offer?
bufferedWriter = new BufferedWriter(new FileWriter("fileName.csv"));
Note: These 40 seconds include the time spent iterating over and fetching the records from the ResultSet as well. :) The 174 MB corresponds to 400,000 rows in the ResultSet.
For those who just want to retrieve the records and dump them into a file (i.e., no processing on the records): instead of putting them into an ArrayList, append the records to a StringBuffer, call toString() to get a single String, and write that to the file at once.
For me, the retrieval time dropped from 22 seconds to 17 seconds.
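A minimal sketch of that approach, assuming a plain JDBC connection (the connection URL, query, and column names are placeholders; StringBuilder is the unsynchronized equivalent of StringBuffer and is preferable in single-threaded code):

import java.io.FileWriter;
import java.io.IOException;
import java.sql.*;

public class DumpAtOnce {
    public static void main(String[] args) throws SQLException, IOException {
        try (Connection con = DriverManager.getConnection("jdbc:yourdb://..."); // placeholder URL
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT col1, col2 FROM big_table")) {

            StringBuilder sb = new StringBuilder();
            while (rs.next()) {
                // accumulate every row in memory as one big CSV string
                sb.append(rs.getString(1)).append(',').append(rs.getString(2)).append('\n');
            }
            try (FileWriter out = new FileWriter("fileName.csv")) {
                out.write(sb.toString()); // one single write call
            }
        }
    }
}

The trade-off is that the whole file must fit in memory before that single write.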
Only for the sake of statistics:
The machine is an old Dell with a new SSD
CPU: Intel Pentium D 2.8 GHz
SSD: Patriot Inferno 120GB SSD
4000000 'records'
175.47607421875 MB
Iteration 0
Writing raw... 3.547 seconds
Writing buffered (buffer size: 8192)... 2.625 seconds
Writing buffered (buffer size: 1048576)... 2.203 seconds
Writing buffered (buffer size: 4194304)... 2.312 seconds
Iteration 1
Writing raw... 2.922 seconds
Writing buffered (buffer size: 8192)... 2.406 seconds
Writing buffered (buffer size: 1048576)... 2.015 seconds
Writing buffered (buffer size: 4194304)... 2.282 seconds
Iteration 2
Writing raw... 2.828 seconds
Writing buffered (buffer size: 8192)... 2.109 seconds
Writing buffered (buffer size: 1048576)... 2.078 seconds
Writing buffered (buffer size: 4194304)... 2.015 seconds
Iteration 3
Writing raw... 3.187 seconds
Writing buffered (buffer size: 8192)... 2.109 seconds
Writing buffered (buffer size: 1048576)... 2.094 seconds
Writing buffered (buffer size: 4194304)... 2.031 seconds
Iteration 4
Writing raw... 3.093 seconds
Writing buffered (buffer size: 8192)... 2.141 seconds
Writing buffered (buffer size: 1048576)... 2.063 seconds
Writing buffered (buffer size: 4194304)... 2.016 seconds
As we can see, the raw method is slower than the buffered ones, and a larger buffer helps slightly beyond the 8192-byte default.
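For anyone who wants to reproduce the buffered variants: BufferedWriter takes the buffer size as a second constructor argument (the file name and row content below are illustrative):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class BufferSizeDemo {
    public static void main(String[] args) throws IOException {
        String line = "some,csv,row\n";
        // 1048576-byte (1 MiB) buffer instead of the 8192-byte default
        try (BufferedWriter out = new BufferedWriter(new FileWriter("out.csv"), 1 << 20)) {
            for (int i = 0; i < 4000000; i++) {
                out.write(line);
            }
        } // try-with-resources flushes and closes the writer
    }
}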
Try memory-mapped files (takes ~300 ms to write 174 MB on my machine: Core 2 Duo, 2.5 GB RAM):
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class MappedFileWrite {
    public static void main(String[] args) throws IOException {
        byte[] buffer = "Help I am trapped in a fortune cookie factory\n".getBytes();
        int numberOfLines = 400000;
        try (FileChannel rwChannel = new RandomAccessFile("textfile.txt", "rw").getChannel()) {
            // Map one region big enough for all lines; the OS writes the pages back to disk.
            ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0,
                    (long) buffer.length * numberOfLines);
            for (int i = 0; i < numberOfLines; i++) {
                wrBuf.put(buffer);
            }
        }
    }
}
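Note that map() fixes the size of the region when you create it, so you need to know (or overestimate) the total output size up front; that is easy here because every record has the same length.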
For bulky reads like this from the DB, you may want to tune your Statement's fetch size; it can save a lot of round trips to the DB.
http://download.oracle.com/javase/1.5.0/docs/api/java/sql/Statement.html#setFetchSize%28int%29
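A minimal sketch, assuming a plain JDBC Statement (the connection URL, query, and fetch size of 1000 are illustrative; most drivers treat setFetchSize as a hint):

import java.sql.*;

public class FetchSizeDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:yourdb://..."); // placeholder URL
             Statement stmt = con.createStatement()) {
            stmt.setFetchSize(1000); // ask the driver to fetch 1000 rows per round trip
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                while (rs.next()) {
                    // process each row as it streams in
                }
            }
        }
    }
}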
package all.is.well;

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

import junit.framework.TestCase;

/**
 * @author Naresh Bhabat
 *
 * The following implementation helps to deal with extra-large files in Java.
 * This program has been tested with a 2 GB input file.
 * There are some points where extra logic can be added in the future.
 *
 * Please note: to deal with a binary input file, read bytes instead of lines
 * from the file object.
 *
 * It uses RandomAccessFile, which is almost like a streaming API.
 *
 * ****************************************
 * Notes regarding the executor framework, with timings for
 * ExecutorService executor = Executors.newFixedThreadPool(n):
 *
 * For 10 threads:    total time for reading and writing the text: 349.317 seconds
 * For 100 threads:   total time for reading and writing the text: 464.042 seconds
 * For 1000 threads:  total time for reading and writing the text: 466.538 seconds
 * For 10000 threads: total time for reading and writing the text: 479.701 seconds
 */
public class DealWithHugeRecordsinFile extends TestCase {

    static final String FILEPATH = "C:\\springbatch\\bigfile1.txt.txt";
    static final String FILEPATH_WRITE = "C:\\springbatch\\writinghere.txt";
    static volatile RandomAccessFile fileToWrite;
    static volatile RandomAccessFile file;
    static final AtomicInteger position = new AtomicInteger(0); // incremented from many threads, so it must be atomic

    public static void main(String[] args) throws IOException, InterruptedException {
        long start = System.currentTimeMillis();
        try {
            fileToWrite = new RandomAccessFile(FILEPATH_WRITE, "rw"); // for random writes
            file = new RandomAccessFile(FILEPATH, "r");               // for random reads
            seriouslyReadProcessAndWriteAsynch();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(Thread.currentThread().getName());
        double timeSeconds = (System.currentTimeMillis() - start) / 1000.0;
        System.out.println("Total time required for reading the text in seconds " + timeSeconds);
    }

    /**
     * Reads the input file line by line and hands each line to a pool thread.
     *
     * @throws IOException          on read failure
     * @throws InterruptedException if waiting for the pool is interrupted
     */
    public static void seriouslyReadProcessAndWriteAsynch() throws IOException, InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(10); // see the timing notes in the class comment
        while (true) {
            final String readLine = file.readLine(); // final so the anonymous Runnable can capture it
            if (readLine == null) {
                break;
            }
            executor.execute(new Runnable() {
                @Override
                public void run() {
                    // Do the hard processing here in this thread; this demo burns some
                    // time and swallows an exception inside the write method.
                    writeToFile(FILEPATH_WRITE, readLine);
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS); // wait for the workers instead of busy-spinning
        System.out.println("Finished all threads");
        file.close();
        fileToWrite.close();
    }

    /**
     * @param filePath target file path (unused here; the shared fileToWrite is used instead)
     * @param data     line to write
     */
    private static void writeToFile(String filePath, String data) {
        try {
            data = "\n" + data;
            if (!data.contains("Randomization")) { // demo filter: skip lines without this marker
                return;
            }
            System.out.println("Let us do something time consuming to make this thread busy "
                    + position.getAndIncrement() + " :" + data);
            System.out.println("Let's consume some time through this loop");
            int i = 1000;
            while (i > 0) { // artificial busy work
                i--;
            }
            synchronized (fileToWrite) { // RandomAccessFile is not safe for unsynchronized concurrent writes
                fileToWrite.write(data.getBytes());
            }
            throw new Exception(); // thrown deliberately to show that a failure does not stop the pool
        } catch (Exception exception) {
            System.out.println("Exception was thrown but we are still able to proceed further."
                    + "\nThis can be used for marking failure of the records");
        }
    }
}
Your transfer speed is likely not limited by Java. Instead I would suspect (in no particular order) the speed of the database read and the speed of the disk write.
If you read the complete dataset and then write it out to disk, that will take longer, since the JVM has to allocate the memory and the DB read and disk write happen sequentially. Instead, I would write to the buffered writer on every read you make from the DB, so the operation becomes closer to a concurrent one (I don't know if you're doing that or not).
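A minimal sketch of that streaming approach (the connection URL, query, and column names are placeholders):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.sql.*;

public class StreamRowsToCsv {
    public static void main(String[] args) throws SQLException, IOException {
        try (Connection con = DriverManager.getConnection("jdbc:yourdb://..."); // placeholder URL
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT col1, col2 FROM big_table");
             BufferedWriter out = new BufferedWriter(new FileWriter("fileName.csv"))) {
            while (rs.next()) {
                // write each row as soon as it is fetched, so the DB read and the disk write overlap
                out.write(rs.getString(1));
                out.write(',');
                out.write(rs.getString(2));
                out.newLine();
            }
        }
    }
}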
Source: Stackoverflow.com