[amazon-web-services] Downloading an entire S3 bucket?

I noticed that there doesn't seem to be an option to download an entire S3 bucket from the AWS Management Console.

Is there an easy way to grab everything in one of my buckets? I was thinking about making the root folder public, using wget to grab it all, and then making it private again but I don't know if there's an easier way.

This question is related to amazon-web-services amazon-s3 aws-cli

As @layke said, downloading the file through the S3 CLI is the best practice: it is safe and secure. But in some cases people need to use wget to download a file, and here is the solution:

aws s3 presign s3://<your_bucket_name>/<your_object_key>

This will get you a temporary, presigned public URL which you can use to download content from S3, in your case using wget or any other download client.
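
For example, assuming a hypothetical bucket my-bucket and object key backup.tar.gz, you could generate a URL that stays valid for one hour and feed it to wget:

aws s3 presign s3://my-bucket/backup.tar.gz --expires-in 3600
wget -O backup.tar.gz "<paste_the_presigned_url_here>"

Note that presign works on a single object, so to fetch a whole bucket this way you would need to generate a URL for each key.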


  1. Windows users need to download S3 Browser from this link, which also has installation instructions: http://s3browser.com/download.aspx

  2. Then provide your AWS credentials (access key, secret key and region) to S3 Browser; this link contains the configuration instructions: s3browser.com/s3browser-first-run.aspx

  3. Now all your S3 buckets will be visible in the left panel of S3 Browser.

  4. Simply select the bucket, click the Buckets menu in the top left corner, and then select the Download all files to option from the menu.

  5. Then browse to the folder where you want to download the bucket.

  6. Click OK and your download will begin.


If you use Firefox with S3Fox, that DOES let you select all files (shift-select first and last), right-click, and download all... I've done it with 500+ files without a problem.


As Neel Bhaat has explained in this blog, there are many different tools that can be used for this purpose. Some are provided by AWS, while most are third-party tools. All of these tools require you to save your AWS account key and secret in the tool itself. Be very cautious when using third-party tools, as the credentials you save in them could be compromised and cost you dearly.

Therefore, I always recommend using the AWS CLI for this purpose. You can simply install it from this link. Next, run the following command and save your access key and secret key in the AWS CLI.

aws configure

And use the following command to sync your AWS S3 Bucket to your local machine. (The local machine should have AWS CLI installed)

aws s3 sync <source> <destination>

Examples:

1) For AWS S3 to Local Storage

aws s3 sync <S3Uri> <LocalPath>

2) From Local Storage to AWS S3

aws s3 sync <LocalPath> <S3Uri>

3) From one AWS S3 bucket to another bucket

aws s3 sync <S3Uri> <S3Uri> 

It's always better to use the AWS CLI for downloading/uploading files to S3. Sync will help you resume without any hassle.

aws s3 sync s3://bucketname/ .

AWS CLI is the best option to download an entire S3 bucket locally.

  1. Install AWS CLI.

  2. Configure AWS CLI for using default security credentials and default AWS Region.

  3. To download the entire S3 bucket, use the command

    aws s3 sync s3://yourbucketname localpath

Reference for using the AWS CLI with different AWS services: https://docs.aws.amazon.com/cli/latest/reference/


You can do this with https://github.com/minio/mc :

mc cp -r https://s3-us-west-2.amazonaws.com/bucketName/ localdir

mc also supports sessions, resumable downloads, uploads and more. mc supports Linux, OS X and Windows. It is written in Golang and released under the Apache License, Version 2.0.
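
For a private bucket, a minimal sketch (assuming mc is already installed; the alias name s3 and the bucket name are placeholders) is to register your credentials once and then mirror the bucket. Newer mc releases use mc alias set, while older ones used mc config host add:

mc alias set s3 https://s3.amazonaws.com <ACCESS_KEY> <SECRET_KEY>
mc mirror s3/bucketName/ localdir/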


Another option that could help some OS X users is Transmit.

It's an FTP program that also lets you connect to your S3 files. And it has an option to mount any FTP or S3 storage as a folder in the Finder, but only for a limited time.


The AWS CLI is the best option for uploading an entire folder to S3 and for downloading an entire S3 bucket locally.

To upload a whole folder to S3:

aws s3 sync . s3://BucketName

To download a whole S3 bucket locally:

aws s3 sync s3://BucketName .

You can also specify a path like BucketName/Path to download only a particular folder from S3, as in the example below.
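
For example, with a hypothetical bucket name and prefix:

aws s3 sync s3://BucketName/Path ./local-folder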


You can use s3cmd to download your bucket:

s3cmd --configure
s3cmd sync s3://bucketnamehere/folder /destination/folder

There is another tool you can use called rclone. This is a code sample in the Rclone documentation:

rclone sync /home/local/directory remote:bucket
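
Note that the example above copies from local to remote; to download a bucket, the arguments go the other way around. A minimal sketch, assuming you have already created an S3 remote named remote with rclone config:

rclone sync remote:bucket /home/local/directory

If you do not want sync to delete local files that are missing from the bucket, use rclone copy instead.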

Try this command:

aws s3 sync s3://yourBucketName yourLocalDirectory

For example, if your bucket name is myBucket and local directory is c:\local, then:

aws s3 sync s3://myBucket c:\local

For more information about the AWS CLI, check this AWS CLI installation guide.


You have many options to do that, but the best one is using the AWS CLI.

Here's a walk-through:

  1. Download and install the AWS CLI on your machine:

  2. Configure AWS CLI:

    Make sure you input valid access and secret keys, which you received when you created the account.

  3. Sync the S3 bucket using:

    aws s3 sync s3://yourbucket /local/path
    

    In the above command, replace the following fields:

    • yourbucket >> your S3 bucket that you want to download.
    • /local/path >> path in your local system where you want to download all the files.

You just need to pass --recursive & --include "*"

aws --region "${BUCKET_REGION}" s3 cp s3://${BUCKET}${BUCKET_PATH}/ ${LOCAL_PATH}/tmp --recursive --include "*" 2>&1


I've used a few different methods to copy Amazon S3 data to a local machine, including s3cmd, and by far the easiest is Cyberduck.

All you need to do is enter your Amazon credentials and use the simple interface to download, upload, sync any of your buckets, folders or files.


If you have only files there (no subdirectories) a quick solution is to select all the files (click on the first, Shift+click on the last) and hit Enter or right click and select Open. For most of the data files this will download them straight to your computer.


You can simply get it with the s3cmd command:

s3cmd get --recursive --continue s3://test-bucket local-directory/

You can use this AWS CLI command to download the entire S3 bucket content to a local folder:

aws s3 sync s3://your-bucket-name "Local Folder Path"

If you see an error like this:

fatal error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)

--no-verify-ssl (boolean)

By default, the AWS CLI uses SSL when communicating with AWS services. For each SSL connection, the AWS CLI will verify SSL certificates. This option overrides the default behavior of verifying SSL certificates. reference

Use the --no-verify-ssl flag with the command:

aws s3 sync s3://your-bucket-name "Local Folder Path" --no-verify-ssl

To download using AWS S3 CLI:

aws s3 cp s3://WholeBucket LocalFolder --recursive
aws s3 cp s3://Bucket/Folder LocalFolder --recursive

To download using code, use the AWS SDK.

To download using GUI, use Cyberduck.


The answer by @Layke is good, but if you have a ton of data and don't want to wait forever, you should read "AWS CLI S3 Configuration".

The following commands will tell the AWS CLI to use 1,000 threads to execute jobs (each a small file or one part of a multipart copy) and look ahead 100,000 jobs:

aws configure set default.s3.max_concurrent_requests 1000
aws configure set default.s3.max_queue_size 100000
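
If you prefer, the same settings can be kept in ~/.aws/config instead of being set on the command line; a sketch of the relevant section of that file:

[default]
s3 =
  max_concurrent_requests = 1000
  max_queue_size = 100000

Either way, the CLI picks the settings up on the next run.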

After running these, you can use the simple sync command:

aws s3 sync s3://source-bucket/source-path s3://destination-bucket/destination-path

or

aws s3 sync s3://source-bucket/source-path c:\my\local\data\path

On a system with 4 CPU cores and 16 GB RAM, for cases like mine (3-50 GB files), the sync/copy speed went from about 9.5 MiB/s to 700+ MiB/s, a speed increase of 70x over the default configuration.


I've done a bit of development for S3 and I have not found a simple way to download a whole bucket.

If you want to code in Java the jets3t lib is easy to use to create a list of buckets and iterate over that list to download them.

First, get a public/private key set from the AWS management console so you can create an S3Service object:

AWSCredentials awsCredentials = new AWSCredentials(YourAccessKey, YourAwsSecretKey);
s3Service = new RestS3Service(awsCredentials);

Then, get an array of your buckets objects:

S3Object[] objects = s3Service.listObjects(YourBucketNameString);

Finally, iterate over that array to download the objects one at a time with:

S3Object obj = s3Service.getObject(bucket, fileName);
InputStream file = obj.getDataInputStream();

I put the connection code in a threadsafe singleton. The necessary try/catch syntax has been omitted for obvious reasons.

If you'd rather code in Python you could use Boto instead.

Also, after looking around a bit, BucketExplorer's "Download the whole bucket" option may do what you want.


On Windows, my preferred GUI tool for this is CloudBerry Explorer for S3 (http://www.cloudberrylab.com/free-amazon-s3-explorer-cloudfront-IAM.aspx). It has a fairly polished, FTP-like file explorer interface.


aws s3 sync is the perfect solution. It does not do a two-way sync; it is one way, from source to destination. Also, if you have lots of items in the bucket, it is a good idea to create an S3 VPC endpoint first (when you are downloading to an EC2 instance in your VPC), so that the download happens faster, because the traffic does not go over the internet but over the AWS internal network, and incurs no data transfer charges. See the sketch below.
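
A minimal sketch of creating a gateway endpoint for S3 with the AWS CLI, assuming hypothetical VPC and route table IDs and the us-east-1 region:

aws ec2 create-vpc-endpoint --vpc-id vpc-0abc1234 --service-name com.amazonaws.us-east-1.s3 --route-table-ids rtb-0abc1234

Once the endpoint is in place, S3 traffic from instances in that VPC is routed through it automatically.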


For Windows, S3 Browser is the easiest way I have found. It is excellent software, and it is free for non-commercial use.


Here is some code to download all buckets, list them, and list their contents.

    //connection string
    private static void dBConnection() {
        app.setAwsCredentials(CONST.getAccessKey(), CONST.getSecretKey());
        conn = new AmazonS3Client(app.getAwsCredentials());
        app.setListOfBuckets(conn.listBuckets());
        System.out.println(CONST.getConnectionSuccessfullMessage());
    }

    private static void downloadBucket() {

    do {
        for (S3ObjectSummary objectSummary : app.getS3Object().getObjectSummaries()) {
            app.setBucketKey(objectSummary.getKey());
            app.setBucketName(objectSummary.getBucketName());
            if(objectSummary.getKey().contains(CONST.getDesiredKey())){
                //DOWNLOAD
                try 
                {
                    s3Client = new AmazonS3Client(new ProfileCredentialsProvider());
                    s3Client.getObject(
                            new GetObjectRequest(app.getBucketName(),app.getBucketKey()),
                            new File(app.getDownloadedBucket())
                            );
                } catch (AmazonClientException e) {
                    e.printStackTrace();
                }

                do
                {
                     if(app.getBackUpExist() == true){
                        System.out.println("Converting back up file");
                        app.setCurrentPacsId(objectSummary.getKey());
                        passIn = app.getDataBaseFile();
                        CONVERT= new DataConversion(passIn);
                        System.out.println(CONST.getFileDownloadedMessage());
                    }
                }
                while(app.getObjectExist()==true);

                if(app.getObjectExist()== false)
                {
                    app.setNoObjectFound(true);
                }
            }
        }
        app.setS3Object(conn.listNextBatchOfObjects(app.getS3Object()));
    } 
    while (app.getS3Object().isTruncated());
}

//---------------------------- Extension Methods -------------------------------------

//Unzip bucket after download 
public static void unzipBucket() throws IOException {
    unzip = new UnZipBuckets();
    unzip.unZipIt(app.getDownloadedBucket());
    System.out.println(CONST.getFileUnzippedMessage());
}

//list all S3 buckets
public static void listAllBuckets(){
    for (Bucket bucket : app.getListOfBuckets()) {
        String bucketName = bucket.getName();
        System.out.println(bucketName + "\t" + StringUtils.fromDate(bucket.getCreationDate()));
    }
}

//Get the contents from the auto back up bucket
public static void listAllBucketContents(){     
    do {
        for (S3ObjectSummary objectSummary : app.getS3Object().getObjectSummaries()) {
            if(objectSummary.getKey().contains(CONST.getDesiredKey())){
                System.out.println(objectSummary.getKey() + "\t" + objectSummary.getSize() + "\t" + StringUtils.fromDate(objectSummary.getLastModified()));
                app.setBackUpCount(app.getBackUpCount() + 1);   
            }
        }
        app.setS3Object(conn.listNextBatchOfObjects(app.getS3Object()));
    } 
    while (app.getS3Object().isTruncated());
    System.out.println("There are a total of : " + app.getBackUpCount() + " buckets.");
}

}


If you use Visual Studio, download "AWS Toolkit for Visual Studio".

After installing it, go to Visual Studio - AWS Explorer - S3 - your bucket - double-click.

In the window you will be able to select all files. Right-click and download the files.


If you only want to download the bucket from AWS, first install the AWS CLI on your machine. In a terminal, change directory to where you want to download the files and run this command:

aws s3 sync s3://bucket-name .

If you also want to sync in the other direction, from local to S3 (in case you added some files to the local folder), run this command:

aws s3 sync . s3://bucket-name

Use this command with the AWS CLI:

aws s3 cp s3://bucketname . --recursive

To add another GUI option, we use WinSCP's S3 functionality. It's very easy to connect, only requiring your access key and secret key in the UI. You can then browse and download whatever files you require from any accessible buckets, including recursive downloads of nested folders.

Since it can be a challenge to clear new software through security and WinSCP is fairly prevalent, it can be really beneficial to just use it rather than try to install a more specialized utility.


If the bucket is quite big, there is a command called s4cmd which makes parallel connections and improves the download time.

To install it on Debian-like systems:

apt install s4cmd

If you have pip:

pip install s4cmd

It will read the ~/.s3cfg file if present (if not, install s3cmd and run s3cmd --configure), or you can specify --access-key=ACCESS_KEY --secret-key=SECRET_KEY on the command line.

The CLI is similar to s3cmd. In your case a sync is recommended, as you can cancel the download and start it again without having to re-download the files.

s4cmd [--access-key=ACCESS_KEY --secret-key=SECRET_KEY] sync s3://<your-bucket> /some/local/dir

Be careful: if you download a lot of data (>1 TB), this may impact your bill; calculate first what the cost will be, for example by checking the total size of the bucket as shown below.
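
One way to estimate the transfer size before starting is to let the AWS CLI summarize the bucket (the bucket name is a placeholder):

aws s3 ls s3://your-bucket --recursive --human-readable --summarize

The last lines of the output report the total number of objects and their total size.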

