[python] Download and save PDF file with Python requests module

I am trying to download a PDF file from a website and save it to disk. My attempts either fail with encoding errors or result in blank PDFs.

In [1]: import requests

In [2]: url = 'http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf'

In [3]: response = requests.get(url)

In [4]: with open('/tmp/metadata.pdf', 'wb') as f:
   ...:     f.write(response.text)
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-4-4be915a4f032> in <module>()
      1 with open('/tmp/metadata.pdf', 'wb') as f:
----> 2     f.write(response.text)
      3 

UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-14: ordinal not in range(128)

In [5]: import codecs

In [6]: with codecs.open('/tmp/metadata.pdf', 'wb', encoding='utf8') as f:
   ...:     f.write(response.text)
   ...: 

I know it is a codec problem of some kind but I can't seem to get it to work.

This question is related to python python-2.7 python-requests

The answer is


regarding Kevin answer to write in a folder tmp, it should be like this:

with open('./tmp/metadata.pdf', 'wb') as f:
    f.write(response.content)

he forgot . before the address and of-course your folder tmp should have been created already


Please note I'm a beginner. If My solution is wrong, please feel free to correct and/or let me know. I may learn something new too.

My solution:

Change the downloadPath accordingly to where you want your file to be saved. Feel free to use the absolute path too for your usage.

Save the below as downloadFile.py.

Usage: python downloadFile.py url-of-the-file-to-download new-file-name.extension

Remember to add an extension!

Example usage: python downloadFile.py http://www.google.co.uk google.html

import requests
import sys
import os

def downloadFile(url, fileName):
    with open(fileName, "wb") as file:
        response = requests.get(url)
        file.write(response.content)


scriptPath = sys.path[0]
downloadPath = os.path.join(scriptPath, '../Downloads/')
url = sys.argv[1]
fileName = sys.argv[2]      
print('path of the script: ' + scriptPath)
print('downloading file to: ' + downloadPath)
downloadFile(url, downloadPath + fileName)
print('file downloaded...')
print('exiting program...')

Generally, this should work in Python3:

import urllib.request 
..
urllib.request.get(url)

Remember that urllib and urllib2 don't work properly after Python2.

If in some mysterious cases requests don't work (happened with me), you can also try using

wget.download(url)

Related:

Here's a decent explanation/solution to find and download all pdf files on a webpage:

https://medium.com/@dementorwriter/notesdownloader-use-web-scraping-to-download-all-pdfs-with-python-511ea9f55e48


In Python 3, I find pathlib is the easiest way to do this. Request's response.content marries up nicely with pathlib's write_bytes.

from pathlib import Path
import requests
filename = Path('metadata.pdf')
url = 'http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf'
response = requests.get(url)
filename.write_bytes(response.content)

You can use urllib:

import urllib.request
urllib.request.urlretrieve(url, "filename.pdf")

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to python-2.7

Numpy, multiply array with scalar Not able to install Python packages [SSL: TLSV1_ALERT_PROTOCOL_VERSION] How to create a new text file using Python Could not find a version that satisfies the requirement tensorflow Python: Pandas pd.read_excel giving ImportError: Install xlrd >= 0.9.0 for Excel support Display/Print one column from a DataFrame of Series in Pandas How to calculate 1st and 3rd quartiles? How can I read pdf in python? How to completely uninstall python 2.7.13 on Ubuntu 16.04 Check key exist in python dict

Examples related to python-requests

Requests (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.") Error in PyCharm requesting website Use python requests to download CSV Download and save PDF file with Python requests module ImportError: No module named 'Queue' How to use cookies in Python Requests Saving response from Requests to file How to get Python requests to trust a self signed SSL certificate? How to install requests module in Python 3.4, instead of 2.7 Upload Image using POST form data in Python-requests SSL InsecurePlatform error when using Requests package