Best way to move files between S3 buckets

Question

I'd like to copy some files from a production bucket to a development bucket daily.

For example: copy productionbucket/feed/feedname/date to developmentbucket/feed/feedname/date

Because the files I want are so deep in the folder structure, it's too time-consuming to go to each folder and copy/paste.

I've played around with mounting drives to each bucket and writing a Windows batch script, but that is very slow and it unnecessarily downloads all the files/folders to the local server and back up again.

User · Answer

The new official AWS CLI natively supports most of the functionality of s3cmd. I'd previously been using s3cmd or the ruby AWS SDK to do things like this, but the official CLI works great for this.

http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

aws s3 sync s3://oldbucket s3://newbucket
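For the daily copy in the question, the sync can be limited to one dated prefix instead of the whole bucket. Here is a rough Python sketch that only wraps the CLI command above. It assumes the bucket names and the feed/feedname/date layout from the question, that the date folder is named YYYY-MM-DD (a guess, the question does not say), and that the aws CLI is installed and configured. The function names are made up for illustration.

```python
import datetime
import subprocess


def build_sync_command(feed_name, day,
                       src_bucket="productionbucket",
                       dst_bucket="developmentbucket"):
    """Return the `aws s3 sync` argument list for one feed and one day."""
    # Assumed layout: <bucket>/feed/<feedname>/<date>/ ; the date format is a guess.
    prefix = f"feed/{feed_name}/{day.isoformat()}/"
    return ["aws", "s3", "sync",
            f"s3://{src_bucket}/{prefix}",
            f"s3://{dst_bucket}/{prefix}"]


def sync_feed(feed_name, day=None):
    """Run the sync for one feed; defaults to today's date folder."""
    day = day or datetime.date.today()
    # check=True raises CalledProcessError if the CLI exits with an error.
    subprocess.run(build_sync_command(feed_name, day), check=True)
```

Calling sync_feed("feedname") from a scheduled task would then cover the daily part.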

User · Answer

Update: As pointed out by alberge (+1), nowadays the excellent AWS Command Line Interface provides the most versatile approach for interacting with (almost) all things AWS. It meanwhile covers most services' APIs and also features higher-level S3 commands for dealing with your use case specifically; see the AWS CLI reference for S3:

sync - Syncs directories and S3 prefixes. Your use case is covered by Example 2 (more fine-grained usage with --exclude, --include, prefix handling, etc. is also available):

    The following sync command syncs objects under a specified prefix and bucket to objects under another specified prefix and bucket by copying s3 objects.

        aws s3 sync s3://from_my_bucket s3://to_my_other_bucket

For completeness, I'll mention that the lower-level S3 commands are also still available via the s3api sub-command, which would allow you to directly translate any SDK-based solution to the AWS CLI before eventually adopting its higher-level functionality.

Initial Answer: Moving files between S3 buckets can be achieved by means of the PUT Object - Copy API (followed by DELETE Object):

    This implementation of the PUT operation creates a copy of an object that is already stored in Amazon S3. A PUT copy operation is the same as performing a GET and then a PUT. Adding the request header, x-amz-copy-source, makes the PUT operation copy the source object into the destination bucket. (Source: the PUT Object - Copy documentation)

There are respective samples for all existing AWS SDKs available; see Copying Objects in a Single Operation. Naturally, a scripting-based solution would be the obvious first choice here, so Copy an Object Using the AWS SDK for Ruby might be a good starting point. If you prefer Python instead, the same can be achieved via boto as well, of course; see the method copy_key() within boto's S3 API documentation.

PUT Object only copies files, so you'll still need to explicitly delete a file via DELETE Object after a successful copy operation, but that will be just another few lines once the overall script handling the bucket and file names is in place (there are respective examples as well; see e.g. Deleting One Object Per Request).
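To make the copy-then-delete sequence concrete, here is a minimal Python sketch. It is written against the boto3 client interface (boto3 is the successor to the boto library mentioned above, so this is a swap, not the copy_key() call itself). The client is passed in as a parameter, and move_object is a made-up helper name.

```python
def move_object(client, src_bucket, key, dst_bucket, dst_key=None):
    """Copy one object to another bucket, then delete the source object."""
    dst_key = dst_key or key
    # Server-side copy (PUT Object - Copy): the data is not downloaded locally.
    client.copy_object(
        Bucket=dst_bucket,
        Key=dst_key,
        CopySource={"Bucket": src_bucket, "Key": key},
    )
    # Only reached if copy_object did not raise.
    client.delete_object(Bucket=src_bucket, Key=key)
    return dst_key
```

With real credentials this would be called as move_object(boto3.client("s3"), "productionbucket", "some/key", "developmentbucket"). Leave out the delete_object call if you only want a copy, as in the question.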

User · Answer

Here is a Ruby class for performing this: https://gist.github.com/4080793

Example usage:

    $ gem install aws-sdk
    $ irb -r ./bucket_sync_service.rb
    > from_creds = {aws_access_key_id:"XXX",
                    aws_secret_access_key:"YYY",
                    bucket:"first-bucket"}
    > to_creds = {aws_access_key_id:"ZZZ",
                  aws_secret_access_key:"AAA",
                  bucket:"first-bucket"}
    > syncer = BucketSyncService.new(from_creds, to_creds)
    > syncer.debug = true # log each object
    > syncer.perform

User · Answer

For the new version, aws2:

    aws2 s3 sync s3://SOURCE_BUCKET_NAME s3://NEW_BUCKET_NAME

User · Answer

We had this exact problem with our ETL jobs at Snowplow, so we extracted our parallel file-copy code (Ruby, built on top of Fog) into its own Ruby gem, called Sluice:

https://github.com/snowplow/sluice

Sluice also handles S3 file delete, move and download, all parallelised and with automatic re-try if an operation fails (which it does surprisingly often). I hope it's useful!

User · Answer

For me the following command just worked:

    aws s3 mv s3://bucket/data s3://bucket/old_data --recursive

User · Answer

I know this is an old thread, but for others who reach here, my suggestion is to create a scheduled job to copy content from the production bucket to the development one.

If you use .NET, this article might help you:

https://edunyte.com/2015/03/aws-s3-copy-object-from-one-bucket-or

User · Answer

To move/copy from one bucket to another, or within the same bucket, I use the s3cmd tool and it works fine. For instance:

    s3cmd cp --recursive s3://bucket1/directory1 s3://bucket2/directory1
    s3cmd mv --recursive s3://bucket1/directory1 s3://bucket2/directory1

User · Answer

.NET Example as requested:

    using (client)
    {
        var existingObject = client.ListObjects(requestForExisingFile).S3Objects;
        if (existingObject.Count == 1)
        {
            var requestCopyObject = new CopyObjectRequest()
            {
                SourceBucket = BucketNameProd,
                SourceKey = objectToMerge.Key,
                DestinationBucket = BucketNameDev,
                DestinationKey = newKey
            };
            client.CopyObject(requestCopyObject);
        }
    }

with client being something like:

    var config = new AmazonS3Config { CommunicationProtocol = Protocol.HTTP, ServiceURL = "s3-eu-west-1.amazonaws.com" };
    var client = AWSClientFactory.CreateAmazonS3Client(AWSAccessKey, AWSSecretAccessKey, config);

There might be a better way, but it's just some quick code I wrote to get some files transferred.
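The same list-then-copy idea, extended to every object under a prefix, could be sketched like this. Note this is Python against the boto3 client interface, not a translation of the .NET SDK calls above. The client is passed in, and copy_prefix is a hypothetical helper name.

```python
def copy_prefix(client, src_bucket, dst_bucket, prefix):
    """Server-side copy of every object under `prefix`; returns the copied keys."""
    copied = []
    # The paginator follows continuation tokens, so large listings are covered.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            client.copy_object(
                Bucket=dst_bucket,
                Key=obj["Key"],
                CopySource={"Bucket": src_bucket, "Key": obj["Key"]},
            )
            copied.append(obj["Key"])
    return copied
```

For example, copy_prefix(boto3.client("s3"), "productionbucket", "developmentbucket", "feed/feedname/") would keep the keys identical in both buckets.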

User · Answer

I spent days writing my own custom tool to parallelize the copies required for this, but then I ran across documentation on how to get the AWS S3 CLI sync command to synchronize buckets with massive parallelization. The following commands will tell the AWS CLI to use 1,000 threads to execute jobs (each a small file or one part of a multipart copy) and look ahead 100,000 jobs:

    aws configure set default.s3.max_concurrent_requests 1000
    aws configure set default.s3.max_queue_size 100000

After running these, you can use the simple sync command as follows:

    aws s3 sync s3://source-bucket/source-path s3://destination-bucket/destination-path

On an m4.xlarge machine (in AWS--4 cores, 16GB RAM), for my case (3-50GB files), the sync/copy speed went from about 9.5MiB/s to 700MiB/s, a speed increase of 70x over the default configuration.

Update: Note that S3CMD has been updated over the years and these changes are now only effective when you're working with lots of small files. Also note that S3CMD on Windows (only on Windows) is seriously limited in overall throughput and can only achieve about 3Gbps per process no matter what instance size or settings you use. Other systems like S5CMD have the same problem. I've spoken to the S3 team about this and they're looking into it.
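For anyone curious what a hand-rolled parallel copier looks like, here is a small Python sketch using a thread pool. It is not the custom tool mentioned at the start of this answer, just an illustration of the idea. It assumes a boto3-style client is passed in, and copy_keys_parallel and the default of 32 workers are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_keys_parallel(client, src_bucket, dst_bucket, keys, max_workers=32):
    """Copy `keys` between buckets concurrently; returns {key: exception} for failures."""
    def copy_one(key):
        client.copy_object(
            Bucket=dst_bucket,
            Key=key,
            CopySource={"Bucket": src_bucket, "Key": key},
        )

    failures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(copy_one, key): key for key in keys}
        for future, key in futures.items():
            error = future.exception()  # blocks until that copy has finished
            if error is not None:
                failures[key] = error
    return failures
```

The returned dict makes it easy to retry only the keys that failed. In practice the CLI settings above get you the same effect without maintaining any code.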

User · Answer

Actually, as of recently I just use the copy+paste action in the AWS S3 interface. Just navigate to the files you want to copy, click on "Actions" -> "Copy", then navigate to the destination bucket and "Actions" -> "Paste".

It transfers the files pretty quickly and it seems like a less convoluted solution that doesn't require any programming, or over-the-top solutions like that.

User · Answer

If you have a Unix host within AWS, then use s3cmd from s3tools.org. Set up permissions so that your key has read access to your development bucket. Then run:

    s3cmd cp -r s3://productionbucket/feed/feedname/date s3://developmentbucket/feed/feedname
