Download large file in python with requests

Question

Requests is a really nice library. I'd like to use it for downloading big files (>1GB). The problem is it's not possible to keep the whole file in memory; I need to read it in chunks. And this is a problem with the following code:

    import requests

    def DownloadFile(url):
        local_filename = url.split('/')[-1]
        r = requests.get(url)
        f = open(local_filename, 'wb')
        for chunk in r.iter_content(chunk_size=512 * 1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
        f.close()
        return

For some reason it doesn't work this way: it still loads the response into memory before it is saved to a file.

UPDATE

If you need a small client (Python 2.x / 3.x) which can download big files from FTP, you can find it here. It supports multithreading & reconnects (it does monitor connections), and it also tunes socket params for the download task.

User · Answer

Use the wget module for Python instead. Here is a snippet:

    import wget
    wget.download(url)

User · Answer

It's much easier if you use Response.raw and shutil.copyfileobj():

    import requests
    import shutil

    def download_file(url):
        local_filename = url.split('/')[-1]
        with requests.get(url, stream=True) as r:
            with open(local_filename, 'wb') as f:
                shutil.copyfileobj(r.raw, f)
        return local_filename

This streams the file to disk without using excessive memory, and the code is simple.
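
One caveat worth sketching: Response.raw is the underlying urllib3 stream, and it does not undo a Content-Encoding such as gzip by default, so copyfileobj would write the still-compressed bytes for such responses. A minimal variant, assuming that case matters for your server, sets decode_content on the raw stream before copying. The function name download_file_decoded, the local_filename parameter, and the timeout value are illustrative and not part of the answer above.

```python
import shutil

import requests


def download_file_decoded(url, local_filename=None, timeout=30):
    """Stream url to disk via Response.raw, decoding gzip/deflate bodies."""
    if local_filename is None:
        local_filename = url.split('/')[-1]
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        # Response.raw leaves Content-Encoding untouched by default;
        # this asks urllib3 to decompress while reading.
        r.raw.decode_content = True
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename
```

For servers that send the file uncompressed, this behaves the same as the shorter version.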

User · Answer

Your chunk size could be too large. Have you tried dropping that, maybe to 1024 bytes at a time? (Also, you could use with to tidy up the syntax.)

    def DownloadFile(url):
        local_filename = url.split('/')[-1]
        r = requests.get(url)
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)
        return

Incidentally, how are you deducing that the response has been loaded into memory?

It sounds as if Python isn't flushing the data to file. From other SO questions, you could try f.flush() and os.fsync() to force the file write and free memory:

    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())
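
To check empirically whether the response really is being held in memory, one option is to record peak memory with the standard tracemalloc module while the download runs. This is a rough sketch that only sees allocations made through Python's allocators; peak_memory_of is a hypothetical helper name, and it accepts any zero-argument callable.

```python
import tracemalloc


def peak_memory_of(func):
    """Run func() and return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = func()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Example (hypothetical URL):
#   _, peak = peak_memory_of(lambda: DownloadFile('https://example.com/big.iso'))
#   print(f'peak: {peak / 2**20:.1f} MiB')
```

If the peak grows with the size of the file, the body is being buffered. With requests, that is the default behaviour of requests.get unless stream=True is passed.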

User · Answer

Based on Roman's most upvoted comment above, here is my implementation, including "download as" and "retries" mechanisms:

    import logging
    import os
    import time
    from urllib.parse import urlparse

    import requests

    logger = logging.getLogger(__name__)

    def download(url: str, file_path='', attempts=2):
        """Downloads a URL content into a file (with large file support by streaming)

        :param url: URL to download
        :param file_path: Local file name to contain the data downloaded
        :param attempts: Number of attempts
        :return: New file path. Empty string if the download failed
        """
        if not file_path:
            file_path = os.path.realpath(os.path.basename(url))
        logger.info(f'Downloading {url} content to {file_path}')
        url_sections = urlparse(url)
        if not url_sections.scheme:
            logger.debug('The given url is missing a scheme. Adding http scheme')
            url = f'http://{url}'
            logger.debug(f'New url: {url}')
        for attempt in range(1, attempts + 1):
            try:
                if attempt > 1:
                    time.sleep(10)  # 10 seconds wait time between downloads
                with requests.get(url, stream=True) as response:
                    response.raise_for_status()
                    with open(file_path, 'wb') as out_file:
                        for chunk in response.iter_content(chunk_size=1024 * 1024):  # 1MB chunks
                            out_file.write(chunk)
                    logger.info('Download finished successfully')
                    return file_path
            except Exception as ex:
                logger.error(f'Attempt {attempt} failed with error: {ex}')
        return ''
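
One gap in a retry loop like this is that a failed final attempt can leave a truncated file at file_path. A possible refinement, sketched below, writes to a temporary sibling file and renames it into place only once every chunk has been written. The helper name save_chunks_atomically and the .part suffix are assumptions for illustration; chunks can be any iterable of bytes, for example response.iter_content(chunk_size=1024 * 1024).

```python
import os


def save_chunks_atomically(chunks, file_path):
    """Write an iterable of bytes to file_path, exposing the file only on success."""
    part_path = file_path + '.part'  # hypothetical naming convention
    try:
        with open(part_path, 'wb') as out_file:
            for chunk in chunks:
                out_file.write(chunk)
        # os.replace overwrites an existing destination and is atomic
        # when both paths are on the same filesystem.
        os.replace(part_path, file_path)
    except BaseException:
        # Remove the partial file so a failed attempt leaves nothing behind.
        if os.path.exists(part_path):
            os.remove(part_path)
        raise
    return file_path
```

Because the exception is re-raised, a surrounding retry loop can still log the failure and try again.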

User · Answer

With the following streaming code, the Python memory usage is restricted regardless of the size of the downloaded file:

    import requests

    def download_file(url):
        local_filename = url.split('/')[-1]
        # NOTE the stream=True parameter below
        with requests.get(url, stream=True) as r:
            r.raise_for_status()
            with open(local_filename, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    # If you have chunk encoded response uncomment if
                    # and set chunk_size parameter to None.
                    #if chunk:
                    f.write(chunk)
        return local_filename

Note that the number of bytes returned by iter_content is not necessarily exactly chunk_size. chunk_size is the number of bytes read from the connection at a time; the length of each yielded chunk can differ (it is often larger) and may vary between iterations, for example when the response is decoded on the fly.

See body-content-workflow and Response.iter_content for further reference.
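
To observe the chunk behaviour for yourself, the sketch below is a variant of the streaming function that also reports how many bytes were written and which chunk lengths were seen. The name download_with_stats, the dest parameter, and the timeout are illustrative assumptions.

```python
import requests


def download_with_stats(url, dest, chunk_size=8192, timeout=30):
    """Stream url to dest; return (total bytes written, sorted distinct chunk lengths)."""
    total = 0
    lengths = set()
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        with open(dest, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                total += len(chunk)
                lengths.add(len(chunk))
    return total, sorted(lengths)
```

Comparing the returned total with the Content-Length header, when the server sends one and the body is not compressed, is a cheap way to spot a truncated download.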

User · Answer

Not exactly what OP was asking, but... it's ridiculously easy to do that with urllib:

    from urllib.request import urlretrieve

    url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
    dst = 'ubuntu-16.04.2-desktop-amd64.iso'
    urlretrieve(url, dst)

Or this way, if you want to save it to a temporary file:

    from urllib.request import urlopen
    from shutil import copyfileobj
    from tempfile import NamedTemporaryFile

    url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
    with urlopen(url) as fsrc, NamedTemporaryFile(delete=False) as fdst:
        copyfileobj(fsrc, fdst)

I watched the process:

    watch 'ps -p 18647 -o pid,ppid,pmem,rsz,vsz,comm,args; ls -al *.iso'

And I saw the file growing, but memory usage stayed at 17 MB. Am I missing something?
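
That flat memory figure is consistent with how copyfileobj works: it never holds more than one buffer of data at a time. Conceptually it behaves like the loop below, which is a simplified sketch rather than the actual standard-library source; copy_in_blocks and the 64 KiB default are illustrative.

```python
def copy_in_blocks(fsrc, fdst, length=64 * 1024):
    """Copy between file-like objects one bounded block at a time; return bytes copied."""
    copied = 0
    while True:
        block = fsrc.read(length)  # at most `length` bytes are held here
        if not block:              # empty bytes means end of stream
            break
        fdst.write(block)
        copied += len(block)
    return copied
```

The same loop works for any source with a read(n) method, including the object returned by urlopen.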
