[node.js] Read file from AWS S3 bucket using node fs

I am attempting to read a file that is in an AWS S3 bucket using

fs.readFile(file, function (err, contents) {
  var myLines = contents.Body.toString().split('\n')
})

I've been able to download and upload a file using the node aws-sdk, but I am at a loss as to how to simply read it and parse the contents.

Here is an example of how I am reading the file from S3:

var s3 = new AWS.S3();
var params = {Bucket: 'myBucket', Key: 'myKey.csv'}
var s3file = s3.getObject(params)

Tags: node.js, amazon-web-services, amazon-s3, fs

The answers:


var AWS = require('aws-sdk');
var fs = require('fs');
var s3 = new AWS.S3();

var fileStream = fs.createWriteStream('/path/to/file.jpg');
var s3Stream = s3.getObject({Bucket: 'myBucket', Key: 'myImageFile.jpg'}).createReadStream();

// Listen for errors returned by the service
s3Stream.on('error', function(err) {
    // NoSuchKey: The specified key does not exist
    console.error(err);
});

s3Stream.pipe(fileStream).on('error', function(err) {
    // capture any errors that occur when writing data to the file
    console.error('File Stream:', err);
}).on('close', function() {
    console.log('Done.');
});

Reference: https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/requests-using-stream-objects.html
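
Not from the original answer, but worth noting: on Node 10+ the built-in stream.pipeline helper forwards errors from both streams to a single callback, which avoids wiring up the two separate error handlers above. A minimal sketch, reusing the same placeholder bucket, key, and path:

const { pipeline } = require('stream');
const fs = require('fs');
const AWS = require('aws-sdk');

const s3 = new AWS.S3();

pipeline(
    s3.getObject({Bucket: 'myBucket', Key: 'myImageFile.jpg'}).createReadStream(),
    fs.createWriteStream('/path/to/file.jpg'),
    function(err) {
        // fires once, with the first error from either stream, or null on success
        if (err) console.error('Pipeline failed:', err);
        else console.log('Done.');
    }
);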


Since you seem to want to process an S3 text file line by line, here is a Node version that uses the standard readline module and AWS's createReadStream():

const readline = require('readline');

const rl = readline.createInterface({
    // s3 and params as defined in the question
    input: s3.getObject(params).createReadStream()
});

rl.on('line', function(line) {
    console.log(line);
})
.on('close', function() {
    // all lines have been read
});
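
On newer Node versions (11.4+) the readline interface is also async-iterable, so the same loop can be written with for await...of. A minimal sketch, assuming s3 and params are set up as in the question:

const readline = require('readline');

async function printLines() {
    const rl = readline.createInterface({
        input: s3.getObject(params).createReadStream()
    });

    // each iteration yields one line of the S3 object
    for await (const line of rl) {
        console.log(line);
    }
}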

This will do it:

new AWS.S3().getObject({ Bucket: this.awsBucketName, Key: keyName }, function(err, data)
{
    if (!err)
        console.log(data.Body.toString()); // data.Body is a Buffer
});

I had exactly the same issue when downloading very large files from S3.

The example solution from AWS docs just does not work:

var file = fs.createWriteStream(options.filePath);
file.on('close', function(){
    if(self.logger) self.logger.info("S3Dataset file download saved to %s", options.filePath );
    return callback(null,done);
});
s3.getObject({ Bucket: this._options.s3.Bucket, Key: documentKey }).createReadStream().on('error', function(error) {
    if(self.logger) self.logger.error("S3Dataset download error key:%s error:%@", options.fileName, error);
    return callback(error);
}).pipe(file);

While this solution will work:

var file = fs.createWriteStream(options.filePath);
s3.getObject({ Bucket: this._options.s3.Bucket, Key: documentKey })
.on('error', function(error) {
    if(self.logger) self.logger.error("S3Dataset download error key:%s error:%@", options.fileName, error);
    return callback(error);
})
.on('httpData', function(chunk) { file.write(chunk); })
.on('httpDone', function() {
    file.end();
    if(self.logger) self.logger.info("S3Dataset file download saved to %s", options.filePath );
    return callback(null,done);
})
.send();

The createReadStream attempt just does not fire the end, close, or error callback for some reason.

I'm also using that solution for downloading and unpacking gzip archives, since the first one (the AWS example) does not work in this case either:

var gunzip = zlib.createGunzip();
var file = fs.createWriteStream( options.filePath );

s3.getObject({ Bucket: this._options.s3.Bucket, Key: documentKey })
.on('error', function (error) {
    if(self.logger) self.logger.error("%@",error);
    return callback(error);
})
.on('httpData', function (chunk) {
    file.write(chunk);
})
.on('httpDone', function () {

    file.end();

    if(self.logger) self.logger.info("downloadArchive downloaded %s", options.filePath);

    fs.createReadStream( options.filePath )
    .on('error', (error) => {
        return callback(error);
    })
    .on('end', () => {
        if(self.logger) self.logger.info("downloadArchive unarchived %s", options.fileDest);
        return callback(null, options.fileDest);
    })
    .pipe(gunzip)
    .pipe(fs.createWriteStream(options.fileDest))
})
.send();
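
Not part of the original answer, but since the httpData/httpDone pattern is what fires reliably here, the chunks can also be written straight into the gunzip stream, skipping the intermediate file. A sketch under the same assumptions (options, documentKey, self, and callback come from the author's surrounding code):

var gunzip = zlib.createGunzip();

gunzip.on('error', function (error) { return callback(error); });

// decompress while downloading; no temporary archive on disk
gunzip.pipe(fs.createWriteStream(options.fileDest))
.on('error', function (error) { return callback(error); })
.on('finish', function () {
    if(self.logger) self.logger.info("downloadArchive unarchived %s", options.fileDest);
    return callback(null, options.fileDest);
});

s3.getObject({ Bucket: this._options.s3.Bucket, Key: documentKey })
.on('error', function (error) { return callback(error); })
.on('httpData', function (chunk) { gunzip.write(chunk); })
.on('httpDone', function () { gunzip.end(); })
.send();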

I prefer Buffer.from(data.Body).toString('utf8'), since it supports encoding parameters. With other AWS services (e.g. Kinesis Streams) someone may want to replace the 'utf8' encoding with 'base64'.

new AWS.S3().getObject(
  { Bucket: this.awsBucketName, Key: keyName }, 
  function(err, data) {
    if (!err) {
      const body = Buffer.from(data.Body).toString('utf8');
      console.log(body);
    }
  }
);

I haven't figured out why yet, but the createReadStream/pipe approach didn't work for me. I was trying to download a large CSV file (300MB+) and got duplicated lines. It seemed to be a random issue; the final file size varied on each attempt to download it.

I ended up using another way, based on AWS JS SDK examples:

var AWS = require('aws-sdk');
var s3 = new AWS.S3();
var params = {Bucket: 'myBucket', Key: 'myImageFile.jpg'};
var file = require('fs').createWriteStream('/path/to/file.jpg');

s3.getObject(params).
    on('httpData', function(chunk) { file.write(chunk); }).
    on('httpDone', function() { file.end(); }).
    send();

This way, it worked like a charm.


If you are looking to avoid the callbacks, you can take advantage of the SDK's .promise() function like this:

const s3 = new AWS.S3();
const params = {Bucket: 'myBucket', Key: 'myKey.csv'};
const response = await s3.getObject(params).promise(); // await the promise (must be inside an async function)
const fileContent = response.Body.toString('utf-8');   // can also do 'base64' here if desired

I'm sure the other ways mentioned here have their advantages but this works great for me. Sourced from this thread (see the last response from AWS): https://forums.aws.amazon.com/thread.jspa?threadID=116788
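
Since await is only valid inside an async function, a complete minimal sketch of the above might look like this (the bucket and key are placeholders):

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function readFileFromS3() {
    try {
        const response = await s3.getObject({Bucket: 'myBucket', Key: 'myKey.csv'}).promise();
        // Body is a Buffer; pick whichever encoding you need
        return response.Body.toString('utf-8');
    } catch (err) {
        console.error('getObject failed:', err);
        throw err;
    }
}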


Here is an example I used to retrieve and parse JSON data from S3.

var params = {Bucket: BUCKET_NAME, Key: KEY_NAME};
new AWS.S3().getObject(params, function(err, json_data)
{
    if (!err) {
        var json = JSON.parse(Buffer.from(json_data.Body).toString("utf8"));

        // PROCESS JSON DATA
        ......
    }
});

With the new version of the SDK, the accepted answer does not work: it does not wait for the object to be downloaded. The following code snippet will help with the new version:

// dependencies
const AWS = require('aws-sdk');

// get reference to S3 client
const s3 = new AWS.S3();

exports.handler = async (event, context, callback) => {

    var bucket = "TestBucket";
    var key = "TestKey";

    try {
        const params = {
            Bucket: bucket,
            Key: key
        };

        var theObject = await s3.getObject(params).promise();
    } catch (error) {
        console.log(error);
        return;
    }
};
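
For completeness, there is also the v3 JavaScript SDK (@aws-sdk/client-s3), where getObject becomes a GetObjectCommand and Body comes back as a stream you consume yourself. A hedged sketch, assuming a recent v3 release (the region, bucket, and key are placeholders):

const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');

const client = new S3Client({ region: 'us-east-1' });

async function getObjectAsString(bucket, key) {
    const response = await client.send(new GetObjectCommand({ Bucket: bucket, Key: key }));

    // in Node, response.Body is a readable stream; collect it into one Buffer
    const chunks = [];
    for await (const chunk of response.Body) {
        chunks.push(chunk);
    }
    return Buffer.concat(chunks).toString('utf-8');
}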

If you want to save memory and obtain each row as a JSON object, you can use fast-csv to create a read stream over the S3 object, as follows:

const csv = require('fast-csv');
const AWS = require('aws-sdk');

const credentials = new AWS.Credentials("ACCESSKEY", "SECRETKEY", "SESSIONTOKEN");
AWS.config.update({
    credentials: credentials, // credentials required for local execution
    region: 'your_region'
});
const dynamoS3Bucket = new AWS.S3();
const stream = dynamoS3Bucket.getObject({ Bucket: 'your_bucket', Key: 'example.csv' }).createReadStream();

var parser = csv.fromStream(stream, { headers: true }).on("data", function (data) {
    parser.pause();  //can pause reading using this at a particular row
    parser.resume(); // to continue reading
    console.log(data);
}).on("end", function () {
    console.log('process finished');
});
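
Note that csv.fromStream is the fast-csv 2.x API; in fast-csv 3 and later the entry point is csv.parseStream. A sketch of the same idea against the newer API (assuming a recent fast-csv release, reusing the stream variable from above):

const csv = require('fast-csv');

// reuse the same S3 read stream as above
csv.parseStream(stream, { headers: true })
    .on('error', error => console.error(error))
    .on('data', row => console.log(row)) // each row arrives as a JSON object
    .on('end', rowCount => console.log(`process finished, ${rowCount} rows`));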
