Two questions we must ask ourselves while doing such operations are:
Solutions like require('fs').readFileSync()
load the whole file into memory. That means the amount of memory required to perform the operation is roughly equivalent to the file size. We should avoid this approach for anything larger than about 50 MB.
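For example, the following sketch (using the big.txt sample file described later in this post) has to buffer the entire file before a single line can be read:

const fs = require('fs');
// Anti-pattern for large files: readFileSync buffers the whole file,
// so memory usage grows with the file size.
const lines = fs.readFileSync('big.txt', 'utf8').split('\n');
console.log(lines[163844]); // line 163845, 0-based index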
We can easily track the amount of memory used by a function by placing these lines of code after the function invocation:
// heapUsed is reported in bytes; convert it to megabytes
const used = process.memoryUsage().heapUsed / 1024 / 1024;
console.log(
  `The script uses approximately ${Math.round(used * 100) / 100} MB`
);
Right now, the best way to read particular lines from a large file is to use Node's built-in readline module. The documentation has excellent examples.
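As a minimal sketch of that approach (the readLine helper and its arguments are hypothetical, not part of the readline API), we can stream the file and stop as soon as the target line is reached:

const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: resolves with the contents of a 1-based line number,
// streaming the file so only one line is held in memory at a time.
function readLine(filePath, lineNumber) {
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream(filePath);
    stream.on('error', reject);

    const rl = readline.createInterface({ input: stream, crlfDelay: Infinity });
    let current = 0;
    let found = false;

    rl.on('line', line => {
      current += 1;
      if (current === lineNumber) {
        found = true;
        rl.close(); // stop reading once the target line is found
        resolve(line);
      }
    });

    rl.on('close', () => {
      if (!found) reject(new Error(`File has fewer than ${lineNumber} lines`));
    });
  });
}

readLine('big.txt', 163845).then(console.log).catch(console.error);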
So we don't need any third-party module to do this. But if you are writing enterprise code, you have to handle lots of edge cases. I wrote a very lightweight module called Apick File Storage to handle those edge cases.
Apick File Storage module: https://www.npmjs.com/package/apickfs
Documentation: https://github.com/apickjs/apickFS#readme
Example file: https://1drv.ms/t/s!AtkMCsWInsSZiGptXYAFjalXOpUx
Example: install the module
npm i apickfs
// import modules
const path = require('path');
const apickFileStorage = require('apickfs');

// invoke the readByLineNumbers() method
apickFileStorage
  .readByLineNumbers(path.join(__dirname), 'big.txt', [163845])
  .then(d => {
    console.log(d);
  })
  .catch(e => {
    console.log(e);
  });
This method has been successfully tested with dense files of up to 4 GB.
big.txt is a dense text file with 163,845 lines and a size of 124 MB. A script that reads 10 different lines from this file uses only about 4.63 MB of memory, and it parses valid JSON into objects or arrays for free. Awesome!
We can read a single line or hundreds of lines of the file with very little memory consumption.
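As a sketch of that multi-line case (assuming readByLineNumbers accepts an array of several line numbers, as the single-element array above suggests; the specific line numbers here are made up for illustration), we can combine it with the memory check from earlier:

const path = require('path');
const apickFileStorage = require('apickfs');

// Read 10 scattered lines from big.txt in one call (hypothetical line numbers)
apickFileStorage
  .readByLineNumbers(path.join(__dirname), 'big.txt', [
    1, 10, 500, 1000, 5000, 10000, 50000, 100000, 150000, 163845
  ])
  .then(lines => {
    console.log(lines);
    // Check how much heap the script used, in MB
    const used = process.memoryUsage().heapUsed / 1024 / 1024;
    console.log(`The script uses approximately ${Math.round(used * 100) / 100} MB`);
  })
  .catch(console.error);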