[amazon-s3] Best way to move files between S3 buckets?

I'd like to copy some files from a production bucket to a development bucket daily.

For example: Copy productionbucket/feed/feedname/date to developmentbucket/feed/feedname/date

Because the files I want are so deep in the folder structure, it's too time consuming to go to each folder and copy/paste.

I've played around with mounting drives to each bucket and writing a windows batch script, but that is very slow and it unnecessarily downloads all the files/folders to the local server and back up again.

This question is related to amazon-s3

The answer is


We had this exact problem with our ETL jobs at Snowplow, so we extracted our parallel file-copy code (Ruby, built on top of Fog), into its own Ruby gem, called Sluice:

https://github.com/snowplow/sluice

Sluice also handles S3 file delete, move and download; all parallelised and with automatic re-try if an operation fails (which it does surprisingly often). I hope it's useful!


For me the following command just worked:

aws s3 mv s3://bucket/data s3://bucket/old_data --recursive

Here is a ruby class for performing this: https://gist.github.com/4080793

Example usage:

$ gem install aws-sdk
$ irb -r ./bucket_sync_service.rb
> from_creds = {aws_access_key_id:"XXX",
                aws_secret_access_key:"YYY",
                bucket:"first-bucket"}
> to_creds = {aws_access_key_id:"ZZZ",
              aws_secret_access_key:"AAA",
              bucket:"first-bucket"}
> syncer = BucketSyncService.new(from_creds, to_creds)
> syncer.debug = true # log each object
> syncer.perform

I spent days writing my own custom tool to parallelize the copies required for this, but then I ran across documentation on how to get the AWS S3 CLI sync command to synchronize buckets with massive parallelization. The following commands will tell the AWS CLI to use 1,000 threads to execute jobs (each a small file or one part of a multipart copy) and look ahead 100,000 jobs:

aws configure set default.s3.max_concurrent_requests 1000
aws configure set default.s3.max_queue_size 100000

After running these, you can use the simple sync command as follows:

aws s3 sync s3://source-bucket/source-path s3://destination-bucket/destination-path

On an m4.xlarge machine (in AWS--4 cores, 16GB RAM), for my case (3-50GB files) the sync/copy speed went from about 9.5MiB/s to 700+MiB/s, a speed increase of 70x over the default configuration.

Update: Note that S3CMD has been updated over the years and these changes are now only effective when you're working with lots of small files. Also note that S3CMD on Windows (only on Windows) is seriously limited in overall throughput and can only achieve about 3Gbps per process no matter what instance size or settings you use. Other systems like S5CMD have the same problem. I've spoken to the S3 team about this and they're looking into it.


If you have a unix host within AWS, then use s3cmd from s3tools.org. Set up permissions so that your key as read access to your development bucket. Then run:

s3cmd cp -r s3://productionbucket/feed/feedname/date s3://developmentbucket/feed/feedname

To move/copy from one bucket to another or the same bucket I use s3cmd tool and works fine. For instance:

s3cmd cp --recursive s3://bucket1/directory1 s3://bucket2/directory1
s3cmd mv --recursive s3://bucket1/directory1 s3://bucket2/directory1

For new version aws2.

aws2 s3 sync s3://SOURCE_BUCKET_NAME s3://NEW_BUCKET_NAME

I know this is an old thread but for others who reach there my suggestion is to create a scheduled job to copy content from production bucket to development one.

You can use If you use .NET this article might help you

https://edunyte.com/2015/03/aws-s3-copy-object-from-one-bucket-or/


Actually as of recently I just use the copy+paste action in the AWS s3 interface. Just navigate to the files you want to copy, click on "Actions" -> "Copy" then navigate to the destination bucket and "Actions" -> "Paste"

It transfers the files pretty quick and it seems like a less convoluted solution that doesn't require any programming, or over the top solutions like that.


.NET Example as requested:

using (client)
{
    var existingObject = client.ListObjects(requestForExisingFile).S3Objects; 
    if (existingObject.Count == 1)
    {
        var requestCopyObject = new CopyObjectRequest()
        {
            SourceBucket = BucketNameProd,
            SourceKey = objectToMerge.Key,
            DestinationBucket = BucketNameDev,
            DestinationKey = newKey
        };
        client.CopyObject(requestCopyObject);
    }
}

with client being something like

var config = new AmazonS3Config { CommunicationProtocol = Protocol.HTTP, ServiceURL = "s3-eu-west-1.amazonaws.com" };
var client = AWSClientFactory.CreateAmazonS3Client(AWSAccessKey, AWSSecretAccessKey, config);

There might be a better way, but it's just some quick code I wrote to get some files transferred.


The new official AWS CLI natively supports most of the functionality of s3cmd. I'd previously been using s3cmd or the ruby AWS SDK to do things like this, but the official CLI works great for this.

http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

aws s3 sync s3://oldbucket s3://newbucket