[javascript] Parse large JSON file in Node.js

I have a file which stores many JavaScript objects in JSON form and I need to read the file, create each of the objects, and do something with them (insert them into a db in my case). The JavaScript objects can be represented in one of two formats:

Format A:

[{name: 'thing1'},
....
{name: 'thing999999999'}]

or Format B:

{name: 'thing1'}         // <== My choice.
...
{name: 'thing999999999'}

Note that the ... indicates a lot of JSON objects. I am aware I could read the entire file into memory and then use JSON.parse() like this:

fs.readFile(filePath, 'utf-8', function (err, fileContents) {
  if (err) throw err;
  console.log(JSON.parse(fileContents));
});

However, the file could be really large, so I would prefer to use a stream to accomplish this. The problem I see with a stream is that the file contents could be broken into data chunks at any point, so how can I use JSON.parse() on such objects?

Ideally, each object would be read as a separate data chunk, but I am not sure on how to do that.

var importStream = fs.createReadStream(filePath, {flags: 'r', encoding: 'utf-8'});
importStream.on('data', function(chunk) {

    var pleaseBeAJSObject = JSON.parse(chunk);           
    // insert pleaseBeAJSObject in a database
});
importStream.on('end', function(item) {
   console.log("Woot, imported objects into the database!");
});

Note that I wish to avoid reading the entire file into memory. Time efficiency does not matter to me. Yes, I could try to read a number of objects at once and insert them all at once, but that's a performance tweak - I need a way that is guaranteed not to cause a memory overload, no matter how many objects are contained in the file.

I can choose to use Format A or Format B, or maybe something else entirely; just please specify in your answer. Thanks!

This question is related to: javascript, json, file, node.js

The answer is


I solved this problem using the split npm module. Pipe your stream into split, and it will "Break up a stream and reassemble it so that each line is a chunk".

Sample code:

var fs = require('fs')
  , split = require('split')
  ;

var stream = fs.createReadStream(filePath, {flags: 'r', encoding: 'utf-8'});
var lineStream = stream.pipe(split());
lineStream.on('data', function(chunk) {
    var json = JSON.parse(chunk);           
    // ...
});
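
One caveat I'd add (not in the original answer): if the file ends with a trailing newline, split will emit a final empty chunk and JSON.parse will throw on it, so it is worth guarding the parse. A sketch of the same handler with that guard:

lineStream.on('data', function(chunk) {
    if (chunk.trim() === '') {
        return;                          // skip blank lines (e.g. a trailing newline)
    }
    var json = JSON.parse(chunk);
    // insert json into the database here
});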

I think you need to use a database. MongoDB is a good choice in this case because it is JSON compatible.

UPDATE: You can use the mongoimport tool to import JSON data into MongoDB.

mongoimport --collection collection --file collection.json
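
A note I'd add (not part of the original answer): by default mongoimport expects one JSON document per line, which matches Format B. If the file is Format A, i.e. one large JSON array, pass the --jsonArray flag:

mongoimport --collection collection --file collection.json --jsonArray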

As of October 2014, you can just do something like the following (using JSONStream) - https://www.npmjs.org/package/JSONStream

var fs = require('fs'),
    JSONStream = require('JSONStream');

var getStream = function () {
    var jsonData = 'myData.json',
        stream = fs.createReadStream(jsonData, { encoding: 'utf8' }),
        parser = JSONStream.parse('*');
    return stream.pipe(parser);
};

getStream().pipe(MyTransformToDoWhateverProcessingAsNeeded).on('error', function (err) {
    // handle any errors
});
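
MyTransformToDoWhateverProcessingAsNeeded above is a placeholder for your own object-mode transform stream. A minimal sketch of what it might look like (the transform below and the insertIntoDb() helper are my illustration, not part of the original answer):

var Transform = require('stream').Transform;

var MyTransformToDoWhateverProcessingAsNeeded = new Transform({
    objectMode: true,                    // we receive parsed objects, not buffers
    transform: function (object, encoding, callback) {
        // insertIntoDb(object, callback);   // hypothetical async insert
        callback(null, object);          // pass the object downstream when done
    }
});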

To demonstrate with a working example:

npm install JSONStream event-stream

data.json:

{
  "greeting": "hello world"
}

hello.js:

var fs = require('fs'),
    JSONStream = require('JSONStream'),
    es = require('event-stream');

var getStream = function () {
    var jsonData = 'data.json',
        stream = fs.createReadStream(jsonData, { encoding: 'utf8' }),
        parser = JSONStream.parse('*');
    return stream.pipe(parser);
};

getStream()
    .pipe(es.mapSync(function (data) {
        console.log(data);
    }));

$ node hello.js
// hello world
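
The same pattern handles Format A from the question: JSONStream.parse('*') emits each element of a top-level array one at a time, so all that changes is the input file. A quick sketch (the file name is mine):

// things.json contains: [{"name": "thing1"}, {"name": "thing2"}]
fs.createReadStream('things.json', { encoding: 'utf8' })
    .pipe(JSONStream.parse('*'))
    .pipe(es.mapSync(function (data) {
        console.log(data.name);          // "thing1", then "thing2"
    }));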

I had a similar requirement: I needed to read a large JSON file in Node.js, process the data in chunks, call an API, and save the results in MongoDB. inputFile.json looks like:

{
 "customers":[
       { /*customer data*/},
       { /*customer data*/},
       { /*customer data*/}....
      ]
}

I used JSONStream and event-stream to achieve this, pausing the stream while each customer record is processed:

var fs = require("fs");
var JSONStream = require("JSONStream");
var es = require("event-stream");

var fileStream = fs.createReadStream(filePath, { encoding: "utf8" });
fileStream.pipe(JSONStream.parse("customers.*")).pipe(
  es.through(
    function write(data) {
      console.log("printing one customer object read from file ::");
      console.log(data);
      this.pause();                      // pause the stream until this record is saved
      processOneCustomer(data, this);
      return data;
    },
    function end() {
      console.log("stream reading ended");
      this.emit("end");
    }
  )
);

function processOneCustomer(data, stream) {
  DataModel.save(function(err, dataModel) {
    stream.resume();                     // resume reading once the save completes
  });
}

I realize that you want to avoid reading the whole JSON file into memory if possible; however, if you have the memory available, it may not be a bad idea performance-wise. Using Node.js's require() on a JSON file loads the data into memory very quickly.

I ran two tests to see what the performance looked like when printing out an attribute from each feature in an 81 MB GeoJSON file.

In the first test, I read the entire GeoJSON file into memory using var data = require('./geo.json'). That took 3330 milliseconds, and then printing out an attribute from each feature took 804 milliseconds, for a grand total of 4134 milliseconds. However, it appeared that Node.js was using 411 MB of memory.

In the second test, I used @arcseldon's answer with JSONStream + event-stream. I modified the JSONPath query to select only what I needed. This time the memory never went higher than 82 MB; however, the whole thing now took 70 seconds to complete!


I wrote a module that can do this, called BFJ. Specifically, the method bfj.match can be used to break up a large stream into discrete chunks of JSON:

const bfj = require('bfj');
const fs = require('fs');

const stream = fs.createReadStream(filePath);

bfj.match(stream, (key, value, depth) => depth === 0, { ndjson: true })
  .on('data', object => {
    // do whatever you need to do with object
  })
  .on('dataError', error => {
    // a syntax error was found in the JSON
  })
  .on('error', error => {
    // some kind of operational error occurred
  })
  .on('end', error => {
    // finished processing the stream
  });

Here, bfj.match returns a readable, object-mode stream that will receive the parsed data items. It is called with three arguments:

  1. A readable stream containing the input JSON.

  2. A predicate that indicates which items from the parsed JSON will be pushed to the result stream.

  3. An options object indicating that the input is newline-delimited JSON (this is needed to process Format B from the question; it is not required for Format A).

Upon being called, bfj.match will parse JSON from the input stream depth-first, calling the predicate with each value to determine whether or not to push that item to the result stream. The predicate is passed three arguments:

  1. The property key or array index (this will be undefined for top-level items).

  2. The value itself.

  3. The depth of the item in the JSON structure (zero for top-level items).

Of course, a more complex predicate can also be used as necessary. You can also pass a string or a regular expression instead of a predicate function, if you want to perform simple matches against property keys.
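
For example, to handle Format A from the question (a top-level array of objects), a predicate can select every element of that array by its depth. The snippets below are a sketch based on the description above, not code from the original answer:

// Format A: the top-level array is at depth 0, so its elements are at depth 1.
bfj.match(stream, (key, value, depth) => depth === 1)
  .on('data', object => {
    // each array element arrives here as a fully parsed object
  });

// Key matching: push the value of every property named "name".
bfj.match(stream, 'name')
  .on('data', value => {
    // value is 'thing1', 'thing2', and so on
  });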


If you have control over the input file, and it's an array of objects, you can solve this more easily. Arrange to output the file with each record on one line, like this:

[
   {"key": value},
   {"key": value},
   ...
]

This is still valid JSON.

Then, use the node.js readline module to process them one line at a time.

var fs = require("fs");

var lineReader = require('readline').createInterface({
    input: fs.createReadStream("input.txt")
});

lineReader.on('line', function (line) {
    line = line.trim();

    // Strip the trailing comma that separates records
    if (line.charAt(line.length - 1) === ',') {
        line = line.substr(0, line.length - 1);
    }

    // Only parse lines that look like records (skips the opening "[" and closing "]")
    if (line.charAt(0) === '{') {
        processRecord(JSON.parse(line));
    }
});

function processRecord(record) {
    // Process the records one at a time here! 
}

Just as I was thinking that it would be fun to write a streaming JSON parser, I also thought that maybe I should do a quick search to see if there's one already available.

Turns out there is.

Since I just found it, I've obviously not used it, so I can't comment on its quality, but I'll be interested to hear if it works.

It does work. Consider the following JavaScript, using _.isString (from Lodash or Underscore):

var fs = require('fs'),
    JSONStream = require('JSONStream'),
    _ = require('lodash');              // or require('underscore'); either provides _.isString

var stream = fs.createReadStream(filePath, { encoding: 'utf8' });
stream.pipe(JSONStream.parse('*'))
  .on('data', (d) => {
    console.log(typeof d);
    console.log("isString: " + _.isString(d));
  });

This will log objects as they come in if the stream is an array of objects. Therefore the only thing being buffered is one object at a time.

