Boto3 to download all files from a S3 Bucket

Question

I'm using boto3 to get files from an S3 bucket. I need functionality similar to aws s3 sync.

My current code is:

    #!/usr/bin/python
    import boto3
    s3 = boto3.client('s3')
    list = s3.list_objects(Bucket='my_bucket_name')['Contents']
    for key in list:
        s3.download_file('my_bucket_name', key['Key'], key['Key'])

This works fine as long as the bucket has only files. If a folder is present inside the bucket, it throws an error:

    Traceback (most recent call last):
      File "./test", line 6, in <module>
        s3.download_file('my_bucket_name', key['Key'], key['Key'])
      File "/usr/local/lib/python2.7/dist-packages/boto3/s3/inject.py", line 58, in download_file
        extra_args=ExtraArgs, callback=Callback)
      File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 651, in download_file
        extra_args, callback)
      File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 666, in _download_file
        self._get_object(bucket, key, filename, extra_args, callback)
      File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 690, in _get_object
        extra_args, callback)
      File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 707, in _do_get_object
        with self._osutil.open(filename, 'wb') as f:
      File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 323, in open
        return open(filename, mode)
    IOError: [Errno 2] No such file or directory: 'my_folder/.8Df54234'

Is this a proper way to download a complete S3 bucket using boto3? How do I download folders?

User · Answer

    import os
    import boto3

    # initiate s3 resource
    s3 = boto3.resource('s3')

    # select bucket
    my_bucket = s3.Bucket('my_bucket_name')

    # download files into the current directory (the folder structure is flattened)
    for s3_object in my_bucket.objects.all():
        # Need to split s3_object.key into path and file name,
        # else it will give a "file not found" error.
        path, filename = os.path.split(s3_object.key)
        # "Folder" placeholder keys end in "/" and have no file name to write.
        if not filename:
            continue
        my_bucket.download_file(s3_object.key, filename)

User · Answer

From the AWS S3 docs (How do I use folders in an S3 bucket?):

"In Amazon S3, buckets and objects are the primary resources, and objects are stored in buckets. Amazon S3 has a flat structure instead of a hierarchy like you would see in a file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. Amazon S3 does this by using a shared name prefix for objects (that is, objects have names that begin with a common string). Object names are also referred to as key names.

For example, you can create a folder on the console named photos and store an object named myphoto.jpg in it. The object is then stored with the key name photos/myphoto.jpg, where photos/ is the prefix."

To download all files from "mybucket" into the current directory, respecting the bucket's emulated directory structure (creating the folders from the bucket if they don't already exist locally):

    import boto3
    import os

    bucket_name = "mybucket"
    s3 = boto3.client("s3")
    objects = s3.list_objects(Bucket=bucket_name)["Contents"]
    for s3_object in objects:
        s3_key = s3_object["Key"]
        path, filename = os.path.split(s3_key)
        if len(path) != 0 and not os.path.exists(path):
            os.makedirs(path)
        if not s3_key.endswith("/"):
            download_to = path + '/' + filename if path else filename
            s3.download_file(bucket_name, s3_key, download_to)
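
To make the prefix idea above concrete, here is a minimal sketch of how a key could be mapped to a local path. The helper name local_path_for_key and the "downloads" root are hypothetical, chosen only for illustration; the sketch does not talk to S3 at all.

```python
import os


def local_path_for_key(key, root="."):
    """Map an S3 object key to a local file path under root.

    Returns None for "folder" placeholder keys (those ending in "/"),
    because there is no file content to write for them.
    """
    if key.endswith("/"):
        return None
    # S3 keys always use "/" as the separator, regardless of the local OS.
    parts = [p for p in key.split("/") if p]
    return os.path.join(root, *parts)


if __name__ == "__main__":
    for key in ["photos/", "photos/myphoto.jpg", "readme.txt"]:
        print(key, "->", local_path_for_key(key, "downloads"))
```

Splitting on "/" rather than os.sep keeps the mapping correct on Windows, where the local separator differs from the one used in keys.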

User · Answer

It is a very bad idea to get all files in one go; you should rather get them in batches.

One implementation which I use to fetch a particular folder (directory) from S3 is:

    import os
    from boto3.session import Session

    # aws_access_key_id, aws_secret_access_key, region_name and bucket_name
    # are assumed to be defined elsewhere.
    def get_directory(directory_path, download_path, exclude_file_names):
        # prepare session (region_name must be passed by keyword)
        session = Session(aws_access_key_id, aws_secret_access_key,
                          region_name=region_name)

        # get instances for resource and bucket
        resource = session.resource('s3')
        bucket = resource.Bucket(bucket_name)

        response = resource.meta.client.list_objects(Bucket=bucket_name, Prefix=directory_path)
        for s3_key in response.get('Contents', []):
            s3_object = s3_key['Key']
            # skip "folder" placeholder keys and excluded files
            if s3_object.endswith('/') or s3_object in exclude_file_names:
                continue
            bucket.download_file(s3_object, os.path.join(download_path, s3_object.split('/')[-1]))

And if you still want to get the whole bucket, use the CLI as @John Rotenstein mentioned:

    aws s3 cp --recursive s3://bucket_name download_path

User · Answer

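    import os
    import boto3

    # 'my_bucket_name' is the bucket name used in the question
    my_bucket = boto3.resource('s3').Bucket('my_bucket_name')

    for objs in my_bucket.objects.all():
        print(objs.key)
        # skip "folder" placeholder keys
        if objs.key.endswith('/'):
            continue
        # S3 keys always use '/', so take the directory part of the key directly
        path = os.path.join('/tmp', os.path.dirname(objs.key))
        os.makedirs(path, exist_ok=True)
        my_bucket.download_file(objs.key, os.path.join('/tmp', objs.key))

This code will download the content into the /tmp/ directory. You can change the directory if you want.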

User · Answer

Better late than never :) The previous answer with the paginator is really good. However, it is recursive, and you might end up hitting Python's recursion limits. Here's an alternate approach, with a couple of extra checks.

    import os
    import errno
    import boto3


    def assert_dir_exists(path):
        """
        Checks if the directory tree in path exists. If not, it creates it.
        :param path: the path to check if it exists
        """
        try:
            os.makedirs(path)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise


    def download_dir(client, bucket, path, target):
        """
        Downloads recursively the given S3 path to the target directory.
        :param client: S3 client to use.
        :param bucket: the name of the bucket to download from
        :param path: The S3 directory to download.
        :param target: the local directory to download the files to.
        """

        # Handle missing / at end of prefix
        if not path.endswith('/'):
            path += '/'

        paginator = client.get_paginator('list_objects_v2')
        for result in paginator.paginate(Bucket=bucket, Prefix=path):
            # Download each file individually (an empty page has no 'Contents')
            for key in result.get('Contents', []):
                # Calculate relative path
                rel_path = key['Key'][len(path):]
                # Skip paths ending in /
                if not key['Key'].endswith('/'):
                    local_file_path = os.path.join(target, rel_path)
                    # Make sure directories exist
                    local_file_dir = os.path.dirname(local_file_path)
                    assert_dir_exists(local_file_dir)
                    client.download_file(bucket, key['Key'], local_file_path)


    client = boto3.client('s3')

    download_dir(client, 'bucket-name', 'path/to/data', 'downloads')

User · Answer

I have a workaround for this that runs the AWS CLI in the same process.

Install awscli as a Python lib:

    pip install awscli

Then define this function:

    import os

    from awscli.clidriver import create_clidriver

    def aws_cli(*cmd):
        old_env = dict(os.environ)
        try:

            # Environment
            env = os.environ.copy()
            env['LC_CTYPE'] = u'en_US.UTF'
            os.environ.update(env)

            # Run awscli in the same process
            exit_code = create_clidriver().main(*cmd)

            # Deal with problems
            if exit_code > 0:
                raise RuntimeError('AWS CLI exited with code {}'.format(exit_code))
        finally:
            # Restore the original environment
            os.environ.clear()
            os.environ.update(old_env)

To execute:

    aws_cli(['s3', 'sync', '/path/to/source', 's3://bucket/destination', '--delete'])

User · Answer

I had a similar requirement and got help from reading a few of the above solutions and other websites. I came up with the script below. Just wanted to share it in case it helps anyone.

    from boto3.session import Session
    import os

    def sync_s3_folder(access_key_id, secret_access_key, bucket_name, folder, destination_path):
        session = Session(aws_access_key_id=access_key_id, aws_secret_access_key=secret_access_key)
        s3 = session.resource('s3')
        your_bucket = s3.Bucket(bucket_name)
        for s3_file in your_bucket.objects.all():
            # skip "folder" placeholder keys
            if s3_file.key.endswith('/'):
                continue
            if folder in s3_file.key:
                # convert the '/' in the key to the local path separator
                file = os.path.join(destination_path, s3_file.key.replace('/', os.sep))
                if not os.path.exists(os.path.dirname(file)):
                    os.makedirs(os.path.dirname(file))
                your_bucket.download_file(s3_file.key, file)

    sync_s3_folder(access_key_id, secret_access_key, bucket_name, folder, destination_path)

User · Answer

I'm currently achieving the task by using the following:

    #!/usr/bin/python
    import os
    import boto3

    s3 = boto3.client('s3')
    objects = s3.list_objects(Bucket='bucket')['Contents']
    for s3_key in objects:
        s3_object = s3_key['Key']
        if not s3_object.endswith("/"):
            s3.download_file('bucket', s3_object, s3_object)
        else:
            # a key ending in "/" is a folder placeholder: create it locally
            if not os.path.exists(s3_object):
                os.makedirs(s3_object)

Although it does the job, I'm not sure it's good to do it this way. I'm leaving it here to help other users and to invite further answers with a better way of achieving this.

User · Answer

When working with buckets that have 1000+ objects, it's necessary to implement a solution that uses the NextContinuationToken on sequential sets of, at most, 1000 keys. This solution first compiles a list of objects, then iteratively creates the specified directories and downloads the existing objects.

    import boto3
    import os

    s3_client = boto3.client('s3')

    def download_dir(prefix, local, bucket, client=s3_client):
        """
        params:
        - prefix: pattern to match in s3
        - local: local path to folder in which to place files
        - bucket: s3 bucket with target contents
        - client: initialized s3 client object
        """
        keys = []
        dirs = []
        next_token = ''
        base_kwargs = {
            'Bucket': bucket,
            'Prefix': prefix,
        }
        while next_token is not None:
            kwargs = base_kwargs.copy()
            if next_token != '':
                kwargs.update({'ContinuationToken': next_token})
            results = client.list_objects_v2(**kwargs)
            # 'Contents' is absent when nothing matches the prefix
            contents = results.get('Contents', [])
            for i in contents:
                k = i.get('Key')
                if k[-1] != '/':
                    keys.append(k)
                else:
                    dirs.append(k)
            next_token = results.get('NextContinuationToken')
        for d in dirs:
            dest_pathname = os.path.join(local, d)
            if not os.path.exists(os.path.dirname(dest_pathname)):
                os.makedirs(os.path.dirname(dest_pathname))
        for k in keys:
            dest_pathname = os.path.join(local, k)
            if not os.path.exists(os.path.dirname(dest_pathname)):
                os.makedirs(os.path.dirname(dest_pathname))
            client.download_file(bucket, k, dest_pathname)
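
The continuation-token loop described above can also be isolated into a small generator, which makes it easy to try out without AWS access. This is only a sketch: iter_keys is a hypothetical helper, and FakeClient is a made-up stand-in that imitates the Contents and NextContinuationToken fields of a list_objects_v2 response with two keys per page.

```python
def iter_keys(client, bucket, prefix=""):
    """Yield every object key under prefix, following continuation tokens."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            yield obj["Key"]
        token = page.get("NextContinuationToken")
        if not token:
            break
        kwargs["ContinuationToken"] = token


class FakeClient:
    """Hypothetical stand-in for boto3.client("s3"); not part of boto3."""

    def __init__(self, keys, page_size=2):
        self.keys = sorted(keys)
        self.page_size = page_size

    def list_objects_v2(self, Bucket, Prefix="", ContinuationToken=None):
        matching = [k for k in self.keys if k.startswith(Prefix)]
        start = int(ContinuationToken) if ContinuationToken else 0
        end = start + self.page_size
        page = {"Contents": [{"Key": k} for k in matching[start:end]]}
        # Only advertise a token when more keys remain, as S3 does.
        if end < len(matching):
            page["NextContinuationToken"] = str(end)
        return page


if __name__ == "__main__":
    fake = FakeClient(["a/1", "a/2", "a/3", "b/1"])
    print(list(iter_keys(fake, "any-bucket", "a/")))
```

With a real client, the same generator should work by passing boto3.client("s3") in place of the fake.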

User · Answer

Amazon S3 does not have folders/directories. It is a flat file structure.

To maintain the appearance of directories, path names are stored as part of the object Key (filename). For example:

    images/foo.jpg

In this case, the whole Key is images/foo.jpg, rather than just foo.jpg.

I suspect that your problem is that boto is returning a file called my_folder/.8Df54234 and is attempting to save it to the local filesystem. However, your local filesystem interprets the my_folder/ portion as a directory name, and that directory does not exist on your local filesystem.

You could either truncate the filename to only save the .8Df54234 portion, or you would have to create the necessary directories before writing files. Note that it could be multi-level nested directories.

An easier way would be to use the AWS Command-Line Interface (CLI), which will do all this work for you, e.g.:

    aws s3 cp --recursive s3://my_bucket_name local_folder

There's also a sync option that will only copy new and modified files.
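
If you stay in Python, the "only new and modified files" idea can be approximated with a size and modification-time check before each download. This is a rough sketch and not the CLI's actual algorithm: needs_download is a hypothetical helper, and the assumption is that the remote size and last-modified time come from the listing (for example the Size and LastModified fields of each listed object, with LastModified converted via .timestamp()).

```python
import os
import tempfile


def needs_download(local_path, remote_size, remote_mtime):
    """Return True if the local copy is missing, a different size, or older.

    remote_size is in bytes; remote_mtime is a POSIX timestamp in seconds.
    """
    if not os.path.isfile(local_path):
        return True
    stat = os.stat(local_path)
    if stat.st_size != remote_size:
        return True
    return stat.st_mtime < remote_mtime


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo.jpg")
        print(needs_download(path, 3, 0))  # no local copy yet
        with open(path, "wb") as handle:
            handle.write(b"abc")
        print(needs_download(path, 3, 0))  # same size, local copy is newer
```

A check like this skips unchanged files cheaply, but it cannot detect content changes that keep both the size and the timestamp ordering intact.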

User · Answer

If you want to call a shell command from Python, here is a simple method to load files from a folder in an S3 bucket to a local folder (on a Linux machine):

    import boto3
    import subprocess
    import os

    ### TOEDIT ###
    my_bucket_name = "your_my_bucket_name"
    bucket_folder_name = "your_bucket_folder_name"
    local_folder_path = "your_local_folder_path"
    ### TOEDIT ###

    # 1. Load the list of files existing in the bucket folder
    #    (assumes the files sit directly under the folder, not in subfolders)
    FILES_NAMES = []
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket(my_bucket_name)
    for object_summary in my_bucket.objects.filter(Prefix="{}/".format(bucket_folder_name)):
        # Keep only the file name so it can be compared with os.listdir()
        file_name = os.path.basename(object_summary.key)
        if file_name:
            FILES_NAMES.append(file_name)

    # 2. List only new files that do not exist in the local folder (to not copy everything!)
    new_filenames = list(set(FILES_NAMES) - set(os.listdir(local_folder_path)))

    # 3. Time to load files into your destination folder
    for new_filename in new_filenames:
        upload_S3files_CMD = "aws s3 cp s3://{}/{}/{} {}".format(
            my_bucket_name, bucket_folder_name, new_filename, local_folder_path)

        subprocess_call = subprocess.call(upload_S3files_CMD, shell=True)
        if subprocess_call != 0:
            print("ALERT: loading files not working correctly, please re-check new loaded files")
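
Step 2 above compares names in the bucket with names on disk, which only works if both sides use the bare file name. A minimal sketch of that comparison, assuming a hypothetical helper keys_missing_locally and a flat local folder:

```python
import os
import tempfile


def keys_missing_locally(keys, local_dir):
    """Return the keys whose base name is not already present in local_dir."""
    existing = set(os.listdir(local_dir))
    missing = []
    for key in keys:
        # The base name is whatever follows the last "/" in the key.
        name = key.rsplit("/", 1)[-1]
        # An empty name means a "folder" placeholder key; skip it.
        if name and name not in existing:
            missing.append(key)
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "a.csv"), "w").close()
        print(keys_missing_locally(["data/", "data/a.csv", "data/b.csv"], tmp))
```

Note that this only checks for presence by name; a file that changed in the bucket after it was copied would not be picked up.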

User · Answer

I have been running into this problem for a while, and across all of the different forums I've been through I haven't seen a full end-to-end snippet of what works. So I went ahead and took all the pieces, added some stuff of my own, and created a full end-to-end S3 downloader.

This will not only download files automatically, but if the S3 files are in subdirectories, it will create them on the local storage. In my application's instance, I need to set permissions and owners, so I have added that too (it can be commented out if not needed).

This has been tested and works in a Docker environment (K8), but I have added the environment variables in the script in case you want to test run it locally. Note that it uses the legacy boto package rather than boto3, and that it deletes each object from the bucket after downloading it.

I hope this helps someone out in their quest for S3 download automation. I also welcome any advice, info, etc. on how this can be better optimized if needed.

    #!/usr/bin/python3
    import gc
    import logging
    import os
    import signal
    import sys
    import time
    from datetime import datetime

    import boto
    from boto.exception import S3ResponseError
    from pythonjsonlogger import jsonlogger

    formatter = jsonlogger.JsonFormatter('%(message)%(levelname)%(name)%(asctime)%(filename)%(lineno)%(funcName)')

    json_handler_out = logging.StreamHandler()
    json_handler_out.setFormatter(formatter)

    # Manual Testing Variables If Needed
    #os.environ["DOWNLOAD_LOCATION_PATH"] = "some_path"
    #os.environ["BUCKET_NAME"] = "some_bucket"
    #os.environ["AWS_ACCESS_KEY"] = "some_access_key"
    #os.environ["AWS_SECRET_KEY"] = "some_secret"
    #os.environ["LOG_LEVEL_SELECTOR"] = "DEBUG, INFO, or ERROR"

    # Setting Log Level Test
    logger = logging.getLogger('json')
    logger.addHandler(json_handler_out)
    logger_levels = {
        'ERROR': logging.ERROR,
        'INFO': logging.INFO,
        'DEBUG': logging.DEBUG
    }
    logger_level_selector = os.environ["LOG_LEVEL_SELECTOR"]
    logger.setLevel(logger_level_selector)

    # Getting Date/Time
    now = datetime.now()
    logger.info("Current date and time : ")
    logger.info(now.strftime("%Y-%m-%d %H:%M:%S"))

    # Establishing S3 Variables and Download Location
    download_location_path = os.environ["DOWNLOAD_LOCATION_PATH"]
    bucket_name = os.environ["BUCKET_NAME"]
    aws_access_key_id = os.environ["AWS_ACCESS_KEY"]
    aws_access_secret_key = os.environ["AWS_SECRET_KEY"]
    logger.debug("Bucket: %s" % bucket_name)
    logger.debug("Key: %s" % aws_access_key_id)
    # The secret key is deliberately not written to the logs.
    logger.debug("Download location path: %s" % download_location_path)

    # Creating Download Directory
    if not os.path.exists(download_location_path):
        logger.info("Making download directory")
        os.makedirs(download_location_path)

    # Signal Hooks are fun
    class GracefulKiller:
        kill_now = False

        def __init__(self):
            signal.signal(signal.SIGINT, self.exit_gracefully)
            signal.signal(signal.SIGTERM, self.exit_gracefully)

        def exit_gracefully(self, signum, frame):
            self.kill_now = True

    # Downloading from S3 Bucket
    def download_s3_bucket():
        conn = boto.connect_s3(aws_access_key_id, aws_access_secret_key)
        logger.debug("Connection established: ")
        bucket = conn.get_bucket(bucket_name)
        logger.debug("Bucket: %s" % str(bucket))
        bucket_list = bucket.list()
        # logger.info("Number of items to download: {0}".format(len(bucket_list)))

        for s3_item in bucket_list:
            key_string = str(s3_item.key)
            logger.debug("S3 Bucket Item to download: %s" % key_string)
            s3_path = download_location_path + "/" + key_string
            logger.debug("Downloading to: %s" % s3_path)
            local_dir = os.path.dirname(s3_path)

            if not os.path.exists(local_dir):
                logger.info("Local directory doesn't exist, creating it... %s" % local_dir)
                os.makedirs(local_dir)
                logger.info("Updating local directory permissions to %s" % local_dir)
                # Comment or Uncomment Permissions based on Local Usage
                os.chmod(local_dir, 0o775)
                os.chown(local_dir, 60001, 60001)
            logger.debug("Local directory for download: %s" % local_dir)
            try:
                logger.info("Downloading File: %s" % key_string)
                s3_item.get_contents_to_filename(s3_path)
                logger.info("Successfully downloaded File: %s" % s3_path)
                # Updating Permissions
                logger.info("Updating Permissions for %s" % str(s3_path))
                # Comment or Uncomment Permissions based on Local Usage
                os.chmod(s3_path, 0o664)
                os.chown(s3_path, 60001, 60001)
            except (OSError, S3ResponseError) as e:
                logger.error("Fatal error in s3_item.get_contents_to_filename", exc_info=True)
                # logger.error("Exception in file download from S3: {}".format(e))
                continue
            logger.info("Deleting %s from S3 Bucket" % str(s3_item.key))
            s3_item.delete()

    def main():
        killer = GracefulKiller()
        while not killer.kill_now:
            logger.info("Checking for new files on S3 to download...")
            download_s3_bucket()
            logger.info("Done checking for new files, will check in 120s...")
            gc.collect()
            sys.stdout.flush()
            time.sleep(120)

    if __name__ == '__main__':
        main()

User · Answer

Reposting @glefait's answer with an if condition at the end to avoid OS error 20. The first key it gets is the folder name itself, which cannot be written to the destination path.

    import os

    def download_dir(client, resource, dist, local='/tmp', bucket='your_bucket'):
        paginator = client.get_paginator('list_objects')
        for result in paginator.paginate(Bucket=bucket, Delimiter='/', Prefix=dist):
            if result.get('CommonPrefixes') is not None:
                for subdir in result.get('CommonPrefixes'):
                    download_dir(client, resource, subdir.get('Prefix'), local, bucket)
            for file in result.get('Contents', []):
                print("Content: ", result)
                dest_pathname = os.path.join(local, file.get('Key'))
                print("Dest path: ", dest_pathname)
                if not os.path.exists(os.path.dirname(dest_pathname)):
                    print("here last if")
                    os.makedirs(os.path.dirname(dest_pathname))
                print("else file key: ", file.get('Key'))
                # skip the key that is the folder prefix itself
                if not file.get('Key') == dist:
                    print("Key not equal? ", file.get('Key'))
                    resource.meta.client.download_file(bucket, file.get('Key'), dest_pathname)

User · Answer

I have the same needs and created the following function that downloads the files recursively.

The directories are created locally only if they contain files.

    import boto3
    import os

    def download_dir(client, resource, dist, local='/tmp', bucket='your_bucket'):
        paginator = client.get_paginator('list_objects')
        for result in paginator.paginate(Bucket=bucket, Delimiter='/', Prefix=dist):
            # descend into each "subdirectory" (common prefix) first
            if result.get('CommonPrefixes') is not None:
                for subdir in result.get('CommonPrefixes'):
                    download_dir(client, resource, subdir.get('Prefix'), local, bucket)
            for file in result.get('Contents', []):
                # skip "folder" placeholder keys, which cannot be written as files
                if file.get('Key').endswith('/'):
                    continue
                dest_pathname = os.path.join(local, file.get('Key'))
                if not os.path.exists(os.path.dirname(dest_pathname)):
                    os.makedirs(os.path.dirname(dest_pathname))
                resource.meta.client.download_file(bucket, file.get('Key'), dest_pathname)

The function is called this way:

    def _start():
        client = boto3.client('s3')
        resource = boto3.resource('s3')
        download_dir(client, resource, 'clientconf/', '/tmp', bucket='my-bucket')

User · Answer

    import os

    import boto3

    s3 = boto3.client('s3')

    def download_bucket(bucket):
        paginator = s3.get_paginator('list_objects_v2')
        pages = paginator.paginate(Bucket=bucket)
        for page in pages:
            if 'Contents' in page:
                for obj in page['Contents']:
                    # create the local directory for the key, if it has one
                    if os.path.dirname(obj['Key']):
                        os.makedirs(os.path.dirname(obj['Key']), exist_ok=True)
                    try:
                        s3.download_file(bucket, obj['Key'], obj['Key'])
                    except NotADirectoryError:
                        # the key is a "folder" placeholder, nothing to download
                        pass

    # Change bucket_name to the name of the bucket that you want to download
    download_bucket(bucket_name)

This should work for any number of objects (also when there are more than 1000). Each paginator page can contain up to 1000 objects. Notice the extra parameter in the os.makedirs call, exist_ok=True, which prevents an error from being raised when the path already exists.
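
The exist_ok=True behaviour mentioned above can be checked in isolation. The sketch below uses a hypothetical helper, ensure_parent_dir, and a temporary directory so it does not touch S3 or any real download folder.

```python
import os
import tempfile


def ensure_parent_dir(path):
    """Create the parent directory of path if it has one; safe to call repeatedly."""
    parent = os.path.dirname(path)
    if parent:
        # exist_ok=True makes a second call a no-op instead of raising FileExistsError.
        os.makedirs(parent, exist_ok=True)
    return parent


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "a", "b", "file.txt")
        ensure_parent_dir(target)
        ensure_parent_dir(target)  # second call does not raise
        print(os.path.isdir(os.path.join(tmp, "a", "b")))
```

The check for an empty parent matters because os.makedirs("") raises an error, which is what would happen for keys at the top level of the bucket.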
