How to download a file over HTTP?

Question

I have a small utility that I use to download an MP3 file from a website on a schedule and then builds/updates a podcast XML file which I've added to iTunes.

The text processing that creates/updates the XML file is written in Python. However, I use wget inside a Windows .bat file to download the actual MP3 file. I would prefer to have the entire utility written in Python.

I struggled to find a way to actually download the file in Python, which is why I resorted to using wget.

So, how do I download the file using Python?

User · Answer

This may be a little late, but I saw pabloG's code and couldn't help adding an os.system('cls') to make it look AWESOME! Check it out (Python 2):

    import urllib2, os

    url = "http://download.thinkbroadband.com/10MB.zip"

    file_name = url.split('/')[-1]
    u = urllib2.urlopen(url)
    f = open(file_name, 'wb')
    meta = u.info()
    file_size = int(meta.getheaders("Content-Length")[0])
    print "Downloading: %s Bytes: %s" % (file_name, file_size)
    os.system('cls')
    file_size_dl = 0
    block_sz = 8192
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break

        file_size_dl += len(buffer)
        f.write(buffer)
        status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
        # Backspaces move the cursor back so the next status overwrites this one
        status = status + chr(8) * (len(status) + 1)
        print status,

    f.close()

If running in an environment other than Windows, you will have to use something other than 'cls'. On Mac OS X and Linux it should be 'clear'.

User · Answer

Use the wget module:

    import wget
    wget.download('url')

User · Answer

A simple yet Python 2 & Python 3 compatible way comes with the six library:

    from six.moves import urllib
    urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

User · Answer

    import urllib2
    mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3")
    with open('test.mp3', 'wb') as output:
        output.write(mp3file.read())

The wb in open('test.mp3', 'wb') opens a file (and erases any existing file) in binary mode so you can save data with it instead of just text.

User · Answer

One more, using urlretrieve:

    import urllib
    urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

(For Python 3+, use import urllib.request and urllib.request.urlretrieve.)

Yet another one, with a "progressbar" (Python 2):

    import urllib2

    url = "http://download.thinkbroadband.com/10MB.zip"

    file_name = url.split('/')[-1]
    u = urllib2.urlopen(url)
    f = open(file_name, 'wb')
    meta = u.info()
    file_size = int(meta.getheaders("Content-Length")[0])
    print "Downloading: %s Bytes: %s" % (file_name, file_size)

    file_size_dl = 0
    block_sz = 8192
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break

        file_size_dl += len(buffer)
        f.write(buffer)
        status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
        status = status + chr(8) * (len(status) + 1)
        print status,

    f.close()

User · Answer

I wrote the wget library in pure Python just for this purpose. It is a pumped-up urlretrieve with additional features as of version 2.0.

User · Answer

    import requests

    def download(url):
        # stream=True defers the body so it can be written chunk by chunk
        get_response = requests.get(url, stream=True)
        file_name = url.split("/")[-1]
        with open(file_name, 'wb') as f:
            for chunk in get_response.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)

    download("https://example.com/example.jpg")

User · Answer

Use urllib.request.urlopen():

    import urllib.request
    with urllib.request.urlopen('http://www.example.com/') as f:
        html = f.read().decode('utf-8')

This is the most basic way to use the library, minus any error handling. You can also do more complex stuff such as changing headers.

On Python 2, the method is in urllib2:

    import urllib2
    response = urllib2.urlopen('http://www.example.com/')
    html = response.read()

User · Answer

Source code can be (Python 2):

    import urllib
    sock = urllib.urlopen("http://diveintopython.org/")
    htmlSource = sock.read()
    sock.close()
    print htmlSource

User · Answer

An improved version of the PabloG code for Python 2/3:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    from __future__ import (division, absolute_import, print_function, unicode_literals)

    import sys, os, tempfile, logging

    if sys.version_info >= (3,):
        import urllib.request as urllib2
        import urllib.parse as urlparse
    else:
        import urllib2
        import urlparse

    def download_file(url, dest=None):
        """
        Download and save a file specified by url to dest directory.
        """
        u = urllib2.urlopen(url)

        scheme, netloc, path, query, fragment = urlparse.urlsplit(url)
        filename = os.path.basename(path)
        if not filename:
            filename = 'downloaded.file'
        if dest:
            filename = os.path.join(dest, filename)

        with open(filename, 'wb') as f:
            meta = u.info()
            meta_func = meta.getheaders if hasattr(meta, 'getheaders') else meta.get_all
            meta_length = meta_func("Content-Length")
            file_size = None
            if meta_length:
                file_size = int(meta_length[0])
            print("Downloading: {0} Bytes: {1}".format(url, file_size))

            file_size_dl = 0
            block_sz = 8192
            while True:
                buffer = u.read(block_sz)
                if not buffer:
                    break

                file_size_dl += len(buffer)
                f.write(buffer)

                status = "{0:16}".format(file_size_dl)
                if file_size:
                    status += "   [{0:6.2f}%]".format(file_size_dl * 100 / file_size)
                status += chr(13)
                print(status, end="")
            print()

        return filename

    if __name__ == "__main__":  # Only run if this file is called directly
        print("Testing with 10MB download")
        url = "http://download.thinkbroadband.com/10MB.zip"
        filename = download_file(url)
        print(filename)
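
The file-name step in download_file can be tried on its own. Here is a minimal Python 3 sketch, assuming a hypothetical helper called filename_from_url (it is not part of the script above). Unlike the original, it also percent-decodes the name, which may or may not be what you want:

```python
import os
from urllib.parse import urlsplit, unquote


def filename_from_url(url, default="downloaded.file"):
    """Return the last path segment of url, ignoring query and fragment.

    Hypothetical helper for illustration; falls back to `default` when the
    URL path has no final segment (for example "http://example.com/").
    """
    path = urlsplit(url).path
    # unquote turns percent-escapes such as %20 back into characters
    name = os.path.basename(unquote(path))
    return name or default
```

Splitting with urlsplit first keeps a query string such as ?x=1 out of the saved name, which a plain url.split('/')[-1] would not.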

User · Answer

Python 3

urllib.request.urlopen

    import urllib.request
    response = urllib.request.urlopen('http://www.example.com/')
    html = response.read()

urllib.request.urlretrieve

    import urllib.request
    urllib.request.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3')

Note: According to the documentation, urllib.request.urlretrieve is a "legacy interface" and "might become deprecated in the future" (thanks gerrit).

Python 2

urllib2.urlopen (thanks Corey)

    import urllib2
    response = urllib2.urlopen('http://www.example.com/')
    html = response.read()

urllib.urlretrieve (thanks PabloG)

    import urllib
    urllib.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3')

User · Answer

I wanted to download all the files from a webpage. I tried wget but it was failing, so I decided on the Python route and I found this thread.

After reading it, I have made a little command line application, soupget, expanding on the excellent answers of PabloG and Stan and adding some useful options.

It uses BeautifulSoup to collect all the URLs of the page and then downloads the ones with the desired extension(s). Finally, it can download multiple files in parallel.

Here it is:

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    from __future__ import (division, absolute_import, print_function, unicode_literals)
    import sys, os, argparse
    from bs4 import BeautifulSoup

    # --- insert Stan's script here ---
    # if sys.version_info >= (3,):
    # ...
    # ...
    # def download_file(url, dest=None):
    # ...
    # ...

    # --- new stuff ---
    def collect_all_url(page_url, extensions):
        """
        Recovers all links in page_url checking for all the desired extensions
        """
        conn = urllib2.urlopen(page_url)
        html = conn.read()
        soup = BeautifulSoup(html, 'lxml')
        links = soup.find_all('a')

        results = []
        for tag in links:
            link = tag.get('href', None)
            if link is not None:
                for e in extensions:
                    if e in link:
                        # Fallback for badly defined links:
                        # checks for missing scheme or netloc
                        if bool(urlparse.urlparse(link).scheme) and bool(urlparse.urlparse(link).netloc):
                            results.append(link)
                        else:
                            new_url = urlparse.urljoin(page_url, link)
                            results.append(new_url)
        return results

    if __name__ == "__main__":  # Only run if this file is called directly
        # Command line arguments
        parser = argparse.ArgumentParser(
            description='Download all files from a webpage.')
        parser.add_argument(
            '-u', '--url',
            help='Page url to request')
        parser.add_argument(
            '-e', '--ext',
            nargs='+',
            help='Extension(s) to find')
        parser.add_argument(
            '-d', '--dest',
            default=None,
            help='Destination where to save the files')
        parser.add_argument(
            '-p', '--par',
            action='store_true', default=False,
            help="Turns on parallel download")
        args = parser.parse_args()

        # Recover files to download
        all_links = collect_all_url(args.url, args.ext)

        # Download
        if not args.par:
            for l in all_links:
                try:
                    filename = download_file(l, args.dest)
                    print(l)
                except Exception as e:
                    print("Error while downloading: {}".format(e))
        else:
            from multiprocessing.pool import ThreadPool
            results = ThreadPool(10).imap_unordered(
                lambda x: download_file(x, args.dest), all_links)
            for p in results:
                print(p)

An example of its usage is:

    python3 soupget.py -p -e <list of extensions> -d <destination folder> -u <target webpage>

And an actual example if you want to see it in action:

    python3 soupget.py -p -e .xlsx .pdf .csv -u https://healthdata.gov/dataset/chemicals-cosmetics
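
If BeautifulSoup and lxml are not available, the link-collection step can be approximated with the standard library. This is only a sketch: it swaps BeautifulSoup for html.parser, the names LinkCollector and collect_links are hypothetical, and it matches extensions against the end of the URL path (assuming lower-case extensions) instead of the substring test used above:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags (hypothetical helper, stdlib only)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_links(html, page_url, extensions):
    """Return absolute URLs whose path ends with one of `extensions`.

    `extensions` is assumed to hold lower-case suffixes such as ".pdf".
    """
    parser = LinkCollector()
    parser.feed(html)
    results = []
    for href in parser.hrefs:
        # urljoin resolves relative links and leaves absolute ones unchanged
        absolute = urljoin(page_url, href)
        if urlsplit(absolute).path.lower().endswith(tuple(extensions)):
            results.append(absolute)
    return results
```

Calling urljoin on every link replaces the explicit scheme/netloc check, since it only fills in the parts that are missing.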

User · Answer

Another way is to call an external process such as curl.exe. Curl by default displays a progress bar, average download speed, time left, and more, all formatted neatly in a table. Put curl.exe in the same directory as your script.

    from subprocess import call

    url = ""  # fill in the address of the file to download
    call(["curl", url, "--output", "song.mp3"])

Note: --output accepts a path, but curl only creates missing directories when --create-dirs is also given; otherwise download into the current directory and do an os.rename afterwards.

User · Answer

If speed matters to you, I made a small performance test for the modules urllib and wget, and regarding wget I tried once with the status bar and once without. I took three different 500MB files to test with (different files, to eliminate the chance that there is some caching going on under the hood). Tested on a Debian machine, with Python 2.

First, these are the results (they are similar in different runs):

    $ python wget_test.py
    urlretrive_test : starting
    urlretrive_test : 6.56
    ==============
    wget_no_bar_test : starting
    wget_no_bar_test : 7.20
    ==============
    wget_with_bar_test : starting
    100% [......................................................................] 541335552 / 541335552
    wget_with_bar_test : 50.49
    ==============

The way I performed the test is using a "profile" decorator. This is the full code:

    import wget
    import urllib
    import time
    from functools import wraps

    def profile(func):
        @wraps(func)
        def inner(*args):
            print func.__name__, ": starting"
            start = time.time()
            ret = func(*args)
            end = time.time()
            print func.__name__, ": {:.2f}".format(end - start)
            return ret
        return inner

    url1 = 'http://host.com/500a.iso'
    url2 = 'http://host.com/500b.iso'
    url3 = 'http://host.com/500c.iso'

    def do_nothing(*args):
        pass

    @profile
    def urlretrive_test(url):
        return urllib.urlretrieve(url)

    @profile
    def wget_no_bar_test(url):
        return wget.download(url, out='/tmp/', bar=do_nothing)

    @profile
    def wget_with_bar_test(url):
        return wget.download(url, out='/tmp/')

    urlretrive_test(url1)
    print '=============='
    time.sleep(1)

    wget_no_bar_test(url2)
    print '=============='
    time.sleep(1)

    wget_with_bar_test(url3)
    print '=============='
    time.sleep(1)

urllib seems to be the fastest.
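
For anyone repeating this on Python 3, the timing decorator could look roughly like the sketch below. It is not the code used for the numbers above: it uses time.perf_counter instead of time.time, also forwards keyword arguments, and busy_wait is a made-up stand-in for a real download call:

```python
import time
from functools import wraps


def profile(func):
    """Print how long each call to func takes, then return its result."""
    @wraps(func)
    def inner(*args, **kwargs):
        print(func.__name__, ": starting")
        start = time.perf_counter()  # monotonic, high-resolution clock
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too
            elapsed = time.perf_counter() - start
            print(func.__name__, ": {:.2f}".format(elapsed))
    return inner


@profile
def busy_wait(seconds):
    """Stand-in workload (assumption); replace with a real download call."""
    time.sleep(seconds)
    return seconds
```

Decorating a function that calls urllib.request.urlretrieve or wget.download in place of busy_wait would reproduce the structure of the test.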

User · Answer

Following are the most commonly used calls for downloading files in Python:

    urllib.urlretrieve('url_to_file', file_name)
    urllib2.urlopen('url_to_file')
    requests.get(url)
    wget.download('url', file_name)

Note: urlopen and urlretrieve are found to perform relatively badly when downloading large files (size > 500 MB). requests.get stores the file in memory until the download is complete, unless stream=True is passed.

User · Answer

Late answer, but for Python >= 3.6 you can use:

    import dload
    dload.save(url)

Install dload with:

    pip3 install dload

User · Answer

I wrote the following, which works in vanilla Python 2 or Python 3.

    import sys
    try:
        import urllib.request
        python3 = True
    except ImportError:
        import urllib2
        python3 = False


    def progress_callback_simple(downloaded, total):
        sys.stdout.write(
            "\r" +
            (len(str(total)) - len(str(downloaded))) * " " + str(downloaded) + "/%d" % total +
            " [%3.2f%%]" % (100.0 * float(downloaded) / float(total))
        )
        sys.stdout.flush()

    def download(srcurl, dstfilepath, progress_callback=None, block_size=8192):
        def _download_helper(response, out_file, file_size):
            if progress_callback is not None: progress_callback(0, file_size)
            if block_size is None:
                # No block size: read the whole body in one call
                buffer = response.read()
                out_file.write(buffer)

                if progress_callback is not None: progress_callback(file_size, file_size)
            else:
                file_size_dl = 0
                while True:
                    buffer = response.read(block_size)
                    if not buffer: break

                    file_size_dl += len(buffer)
                    out_file.write(buffer)

                    if progress_callback is not None: progress_callback(file_size_dl, file_size)
        with open(dstfilepath, "wb") as out_file:
            if python3:
                with urllib.request.urlopen(srcurl) as response:
                    file_size = int(response.getheader("Content-Length"))
                    _download_helper(response, out_file, file_size)
            else:
                response = urllib2.urlopen(srcurl)
                meta = response.info()
                file_size = int(meta.getheaders("Content-Length")[0])
                _download_helper(response, out_file, file_size)

    import traceback
    try:
        download(
            "https://geometrian.com/data/programming/projects/glLib/glLib%20Reloaded%200.5.9/0.5.9.zip",
            "output.zip",
            progress_callback_simple
        )
    except:
        traceback.print_exc()
        input()

Notes:

- Supports a "progress bar" callback.
- Download is a 4 MB test .zip from my website.

User · Answer

In Python 3 you can use the urllib.request and shutil modules. Both are part of the standard library, so there is nothing to install with pip (urllib3 is a separate third-party package and is not what this code uses).

    import urllib.request
    import shutil

    url = "http://www.somewebsite.com/something.pdf"
    output_file = "save_this_name.pdf"
    with urllib.request.urlopen(url) as response, open(output_file, 'wb') as out_file:
        # Copies the response body to disk in chunks
        shutil.copyfileobj(response, out_file)

User · Answer

urlretrieve and requests.get are simple; reality, however, is not. I have fetched data from a couple of sites, including text and images, and the above two probably solve most of the tasks. But for a more universal solution I suggest the use of urlopen. As it is included in the Python 3 standard library, your code can run on any machine that runs Python 3 without pre-installing a site-package.

    import urllib.request
    url_request = urllib.request.Request(url, headers=headers)
    url_connect = urllib.request.urlopen(url_request)

    # remember to open file in bytes mode
    with open(filename, 'wb') as f:
        while True:
            buffer = url_connect.read(buffer_size)
            if not buffer: break

            # an integer value of size of written data
            data_wrote = f.write(buffer)

    # you could probably use with-open-as manner
    url_connect.close()

This answer provides a solution to HTTP 403 Forbidden when downloading a file over HTTP using Python. I have tried only the requests and urllib modules; other modules may provide something better, but this is the one I used to solve most of the problems.
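
The snippet above leaves url, headers, filename and buffer_size undefined. A self-contained sketch might look like this, assuming a hypothetical function name download_with_headers and an example User-Agent value (some servers reportedly answer 403 to urllib's default User-Agent; whether a given site accepts this one is not guaranteed):

```python
import urllib.request

# Assumed header value for illustration; adjust for the site you are fetching.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-downloader)"}


def download_with_headers(url, filename, headers=None, buffer_size=64 * 1024):
    """Stream url to filename in buffer_size chunks; return bytes written."""
    request = urllib.request.Request(url, headers=headers or DEFAULT_HEADERS)
    written = 0
    # Both the response and the file are closed when the with-block exits
    with urllib.request.urlopen(request) as response, open(filename, "wb") as f:
        while True:
            chunk = response.read(buffer_size)
            if not chunk:
                break
            written += f.write(chunk)
    return written
```

Because only buffer_size bytes are held at a time, memory use stays flat regardless of the file size.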

User · Answer

I agree with Corey: urllib2 is more complete than urllib and should likely be the module used if you want to do more complex things. But to make the answers more complete, urllib is a simpler module if you want just the basics:

    import urllib
    response = urllib.urlopen('http://www.example.com/sound.mp3')
    mp3 = response.read()

Will work fine. Or, if you don't want to deal with the "response" object, you can call read() directly:

    import urllib
    mp3 = urllib.urlopen('http://www.example.com/sound.mp3').read()

User · Answer

Just for the sake of completeness, it is also possible to call any program for retrieving files using the subprocess package. Programs dedicated to retrieving files are more powerful than Python functions like urlretrieve. For example, wget can download directories recursively (-r), can deal with FTP, redirects, and HTTP proxies, and can avoid re-downloading existing files (-nc); aria2 can do multi-connection downloads, which can potentially speed up your downloads.

    import subprocess
    subprocess.check_output(['wget', '-O', 'example_output_file.html', 'https://example.com'])

In Jupyter Notebook, one can also call programs directly with the ! syntax:

    !wget -O example_output_file.html https://example.com

User · Answer

If you have wget installed, you can use parallel_sync.

    pip install parallel_sync

    from parallel_sync import wget
    urls = ['http://something.png', 'http://somthing.tar.gz', 'http://somthing.zip']
    wget.download('/tmp', urls)
    # or a single file:
    wget.download('/tmp', urls[0], filenames='x.zip', extract=True)

Doc: https://pythonhosted.org/parallel_sync/pages/examples.html

This is pretty powerful. It can download files in parallel, retry upon failure, and it can even download files on a remote machine.

User · Answer

You can get the progress feedback with urlretrieve as well (Python 2):

    import sys
    import urllib

    def report(blocknr, blocksize, size):
        # Called by urlretrieve after each block is fetched
        current = blocknr * blocksize
        sys.stdout.write("\r{0:.2f}%".format(100.0 * current / size))

    def downloadFile(url):
        print "\n", url
        fname = url.split('/')[-1]
        print fname
        urllib.urlretrieve(url, fname, report)
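
On Python 3 the same hook works with urllib.request.urlretrieve. A rough sketch follows; percent_done and download_file are names chosen here for illustration, and the cap at 100% and the check for an unknown size are additions that the snippet above does not have:

```python
import sys
import urllib.request


def percent_done(blocknr, blocksize, size):
    """Return progress in percent, or None when the total size is unknown."""
    if size <= 0:  # urlretrieve passes -1 if there is no Content-Length
        return None
    # The last block is usually partial, so cap the estimate at 100
    return min(100.0, 100.0 * blocknr * blocksize / size)


def report(blocknr, blocksize, size):
    """Reporthook for urlretrieve: rewrite one status line in place."""
    pct = percent_done(blocknr, blocksize, size)
    if pct is not None:
        sys.stdout.write("\r{0:.2f}%".format(pct))
        sys.stdout.flush()


def download_file(url, fname):
    """Download url to fname, printing progress; return the local path."""
    path, _headers = urllib.request.urlretrieve(url, fname, report)
    return path
```

Keeping the arithmetic in percent_done separate from the printing makes it easy to check without touching the network.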

User · Answer

You can use PycURL on Python 2 and 3.

    import pycurl

    FILE_DEST = 'pycurl.html'
    FILE_SRC = 'http://pycurl.io/'

    with open(FILE_DEST, 'wb') as f:
        c = pycurl.Curl()
        c.setopt(c.URL, FILE_SRC)
        c.setopt(c.WRITEDATA, f)
        c.perform()
        c.close()

User · Answer

In 2012, use the Python requests library:

    >>> import requests
    >>>
    >>> url = "http://download.thinkbroadband.com/10MB.zip"
    >>> r = requests.get(url)
    >>> print len(r.content)
    10485760

You can run pip install requests to get it.

Requests has many advantages over the alternatives because the API is much simpler. This is especially true if you have to do authentication. urllib and urllib2 are pretty unintuitive and painful in this case.

2015-12-30

People have expressed admiration for the progress bar. It's cool, sure. There are several off-the-shelf solutions now, including tqdm:

    from tqdm import tqdm
    import requests

    url = "http://download.thinkbroadband.com/10MB.zip"
    response = requests.get(url, stream=True)

    with open("10MB", "wb") as handle:
        for data in tqdm(response.iter_content()):
            handle.write(data)

This is essentially the implementation @kvance described 30 months ago.
