Download and save PDF file with Python requests module

Question

I am trying to download a PDF file from a website and save it to disk. My attempts either fail with encoding errors or result in blank PDFs.

```
In [1]: import requests

In [2]: url = 'http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf'

In [3]: response = requests.get(url)

In [4]: with open('/tmp/metadata.pdf', 'wb') as f:
   ...:     f.write(response.text)
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-4-4be915a4f032> in <module>
      1 with open('/tmp/metadata.pdf', 'wb') as f:
----> 2     f.write(response.text)
      3 

UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-14: ordinal not in range(128)

In [5]: import codecs

In [6]: with codecs.open('/tmp/metadata.pdf', 'wb', encoding='utf8') as f:
   ...:     f.write(response.text)
   ...: 
```

I know it is a codec problem of some kind, but I can't seem to get it to work.
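The blank-PDF symptom can be reproduced offline. This sketch assumes that decoding with `errors="replace"` approximates what requests does when building `response.text` from the raw bytes; the sample bytes are made up to resemble the start of a PDF.

```python
# Made-up bytes resembling the start of a PDF: an ASCII header line followed by
# a line of high (0x80+) bytes, as PDF writers often include to mark a binary file.
pdf_bytes = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

# Decoding to str replaces byte sequences that are invalid in the chosen
# encoding with U+FFFD, so the original bytes are lost.
as_text = pdf_bytes.decode("utf-8", errors="replace")

# Re-encoding the text does not give the original bytes back.
round_trip = as_text.encode("utf-8")

print(round_trip == pdf_bytes)  # False: the binary part is corrupted
print("\ufffd" in as_text)      # True: replacement characters were inserted
```

Writing the undecoded bytes (`response.content`) skips this lossy step entirely.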

User · Answer

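Generally, this should work in Python 3:

```python
import urllib.request

data = urllib.request.urlopen(url).read()  # the response body as bytes
```

Remember that the Python 2 `urllib` and `urllib2` modules were reorganized in Python 3 (for example into `urllib.request`), so Python 2 code that uses them won't run unchanged.

If in some mysterious cases requests doesn't work (it happened to me), you can also try the third-party `wget` package:

```python
import wget

wget.download(url)
```

Related: here's a decent explanation/solution to find and download all PDF files on a webpage: https://medium.com/@dementorwriter/notesdownloader-use-web-scraping-to-download-all-pdfs-with-python-511ea9f55e48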
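A minimal standard-library sketch of the full download-and-save step with `urlopen()`, copying the bytes straight to disk without ever decoding them. The helper name `download_with_urllib` is hypothetical, and the demo uses a `data:` URL (which `urlopen` supports) only so it runs without network access.

```python
import shutil
import urllib.request


def download_with_urllib(url: str, dest: str) -> None:
    """Hypothetical helper: save the body of `url` to `dest` as raw bytes."""
    # urlopen returns a binary file-like object; nothing is decoded to str.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as f:
        # Copy in chunks so a large file is not held in memory all at once.
        shutil.copyfileobj(response, f)


# Offline demo: a data: URL carrying the bytes b"%PDF-1.4\n" (base64-encoded).
download_with_urllib("data:application/pdf;base64,JVBERi0xLjQK", "demo.pdf")
```

For the question's file, the call would be `download_with_urllib(url, '/tmp/metadata.pdf')`.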

User · Answer

Please note I'm a beginner. If my solution is wrong, please feel free to correct it and/or let me know; I may learn something new too.

My solution:

Change `downloadPath` to wherever you want your file to be saved. Feel free to use an absolute path too.

Save the code below as `downloadFile.py`.

Usage: `python downloadFile.py url-of-the-file-to-download new-file-name.extension`

Remember to add an extension!

Example usage: `python downloadFile.py http://www.google.co.uk google.html`

```python
import os
import sys

import requests


def downloadFile(url, fileName):
    # Request first, so a failed download doesn't leave an empty file behind
    response = requests.get(url)
    response.raise_for_status()
    # 'wb' + response.content writes the raw bytes (needed for PDFs, images, etc.)
    with open(fileName, "wb") as file:
        file.write(response.content)


scriptPath = sys.path[0]  # directory containing this script
downloadPath = os.path.join(scriptPath, "../Downloads/")  # change as needed
url = sys.argv[1]
fileName = sys.argv[2]
print("path of the script: " + scriptPath)
print("downloading file to: " + downloadPath)
downloadFile(url, os.path.join(downloadPath, fileName))
print("file downloaded...")
print("exiting program...")
```
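Because the script needs the output name (with its extension) on the command line, one possible refinement is to fall back to the last segment of the URL's path. `filename_from_url` and the `download.bin` fallback are hypothetical names chosen for this sketch.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str, default: str = "download.bin") -> str:
    """Hypothetical helper: guess a local file name from the URL's path."""
    # urlsplit separates the path from the scheme, host, query and fragment.
    path = urlsplit(url).path
    # Take the last path segment and decode %XX escapes (e.g. %20 -> space).
    name = posixpath.basename(unquote(path))
    # Fall back when there is no usable segment, e.g. "http://host/".
    if name in ("", ".", ".."):
        return default
    return name


print(filename_from_url("http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf"))
# HRTVBSH.Metadata.pdf
print(filename_from_url("http://www.google.co.uk"))
# download.bin
```

In the script, `fileName = sys.argv[2] if len(sys.argv) > 2 else filename_from_url(url)` would make the second argument optional.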

User · Answer

You should use `response.content` in this case:

```python
with open('/tmp/metadata.pdf', 'wb') as f:
    f.write(response.content)
```

From the documentation:

> You can also access the response body as bytes, for non-text requests:
>
>     >>> r.content
>     b'[{"repository":{"open_issues":0,"url":"https://github.com/...

So `response.text` returns the body as a string (`str`) object; use it when you're downloading a text file, such as an HTML file. `response.content` returns the body as a `bytes` object; use it when you're downloading a binary file, such as a PDF, an audio file, or an image.

You can also stream the download instead (pass `stream=True` and read the body through `iter_content()`, or the lower-level `response.raw`). Use that when the file you're about to download is large. Below is a basic example, which you can also find in the documentation:

```python
import requests

url = 'http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf'
chunk_size = 2000  # bytes per chunk; pick a size that suits you
r = requests.get(url, stream=True)

with open('/tmp/metadata.pdf', 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)
```

`chunk_size` is the size of each chunk you want to read. If you set it to 2000, requests reads the first 2000 bytes of the file, writes them to disk, and repeats until the whole file has been written.

This saves RAM, but I'd still prefer `response.content` in this case since your file is small; as you can see, streaming is more complex.

Related:

- How to download large file in python with requests.py?
- How to download image using requests
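Building on the streaming example, a sketch that writes the chunks to a temporary `.part` file and renames it only once every chunk has been written, so an interrupted transfer never leaves a truncated file at the final path. `save_chunks` and the `.part` suffix are hypothetical; the function accepts any iterable of bytes, such as `r.iter_content(chunk_size=8192)`.

```python
import os
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], dest: str) -> int:
    """Hypothetical helper: stream byte chunks to `dest`, return bytes written."""
    tmp_path = dest + ".part"
    written = 0
    try:
        with open(tmp_path, "wb") as fd:
            for chunk in chunks:
                if chunk:  # skip empty chunks
                    fd.write(chunk)
                    written += len(chunk)
        # os.replace overwrites dest if it exists; on POSIX the rename is
        # atomic when both paths are on the same filesystem.
        os.replace(tmp_path, dest)
    except BaseException:
        # Remove the partial file if anything went wrong, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return written


# Offline demo with in-memory chunks standing in for r.iter_content(...).
print(save_chunks([b"%PDF-1.4\n", b"", b"...rest of file..."], "streamed.pdf"))  # 27
```

With requests, the call would follow `r = requests.get(url, stream=True)` and `r.raise_for_status()`, as `save_chunks(r.iter_content(chunk_size=8192), '/tmp/metadata.pdf')`.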

User · Answer

In Python 3, I find `pathlib` is the easiest way to do this. Requests' `response.content` marries up nicely with pathlib's `write_bytes`:

```python
from pathlib import Path
import requests

filename = Path('metadata.pdf')
url = 'http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf'
response = requests.get(url)
filename.write_bytes(response.content)
```
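One pitfall with writing `response.content` blindly is that a server may send back an HTML error page instead of the PDF. A sketch, under the assumption that a simple signature check is good enough: PDF files normally begin with a `%PDF-` header, so test for it before calling `write_bytes`. `save_pdf` is a hypothetical name.

```python
from pathlib import Path


def save_pdf(data: bytes, filename: Path) -> int:
    """Hypothetical helper: write `data` only if it looks like a PDF."""
    # Heuristic: PDF files normally begin with the "%PDF-" header.
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"not a PDF (starts with {data[:20]!r})")
    # write_bytes opens the file in binary mode, writes, closes it, and
    # returns the number of bytes written.
    return filename.write_bytes(data)


print(save_pdf(b"%PDF-1.4\n", Path("metadata.pdf")))  # 9
try:
    save_pdf(b"<!DOCTYPE html><html>...", Path("error.pdf"))
except ValueError as exc:
    print(exc)
```

With the answer's variables this becomes `save_pdf(response.content, filename)`; calling `response.raise_for_status()` first also catches HTTP errors that come with a proper status code.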

User · Answer

You can use `urllib`:

```python
import urllib.request

urllib.request.urlretrieve(url, "filename.pdf")
```
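`urlretrieve` also accepts an optional `reporthook`, which it calls with the block number, the block size, and the total size (or -1 when the size is unknown). A minimal progress sketch; `print_progress` is a hypothetical name, and the `data:` URL only stands in for a real download. Note that the Python docs list `urlretrieve` under the legacy interface carried over from Python 2.

```python
import urllib.request


def print_progress(block_num: int, block_size: int, total_size: int) -> None:
    """Hypothetical reporthook: print how much has been downloaded so far."""
    downloaded = block_num * block_size
    if total_size > 0:
        # The last block may overshoot, so cap the value at the total size.
        downloaded = min(downloaded, total_size)
        print(f"{downloaded}/{total_size} bytes")
    else:
        print(f"{downloaded} bytes (total size unknown)")


# urlretrieve returns a (local_filename, headers) tuple.
path, headers = urllib.request.urlretrieve(
    "data:application/pdf;base64,JVBERi0xLjQK",  # stands in for a real URL
    "filename.pdf",
    reporthook=print_progress,
)
print(path)  # filename.pdf
```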

User · Answer

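Regarding Kevin's answer: to write into a `tmp` folder in the current working directory, it should be like this:

```python
with open('./tmp/metadata.pdf', 'wb') as f:
    f.write(response.content)
```

He left out the `.` before the path (`/tmp` is the absolute system temp directory on Unix-like systems), and of course your `tmp` folder must already exist.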
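To avoid creating the folder by hand, a sketch can create it before writing. `save_into_folder` is a hypothetical helper; keep in mind that a relative folder such as `./tmp` is resolved against the current working directory, not the script's location, so the function returns the absolute path it actually wrote to.

```python
import os


def save_into_folder(data: bytes, folder: str, name: str) -> str:
    """Hypothetical helper: write `data` to folder/name, creating folder if needed."""
    # exist_ok=True makes this a no-op when the folder is already there.
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, name)
    with open(path, "wb") as f:
        f.write(data)
    # Report where a relative path actually ended up.
    return os.path.abspath(path)


print(save_into_folder(b"%PDF-1.4\n", "./tmp", "metadata.pdf"))
```

With a real response this would be `save_into_folder(response.content, "./tmp", "metadata.pdf")`.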
