I'd like to offer an updated Python 3 version of Vishal's excellent answer, which used Python 2, along with some explanation of the adaptations and changes, some of which may already have been mentioned.
    from io import BytesIO
    from zipfile import ZipFile
    import urllib.request

    url = urllib.request.urlopen("http://www.unece.org/fileadmin/DAM/cefact/locode/loc162txt.zip")

    with ZipFile(BytesIO(url.read())) as my_zip_file:
        for contained_file in my_zip_file.namelist():
            # with open(("unzipped_and_read_" + contained_file + ".file"), "wb") as output:
            for line in my_zip_file.open(contained_file).readlines():
                print(line)
                # output.write(line)
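The lines printed by the snippet above are bytes objects. If decoded text is wanted instead, one possible approach is to wrap each archive member in `io.TextIOWrapper`. This is only a minimal sketch: the helper name `read_text_lines` is hypothetical, the UTF-8 encoding is an assumption (the real archive may use a different one), and a tiny in-memory zip stands in for the download so the example runs without a network connection.

```python
import io
from zipfile import ZipFile


def read_text_lines(zip_bytes, encoding="utf-8"):
    """Return {member name: list of decoded lines} for a zip held in memory.

    The encoding is an assumption; pass the one your archive actually uses.
    """
    result = {}
    with ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            with archive.open(name) as member:
                # TextIOWrapper decodes the byte stream into str lines.
                text = io.TextIOWrapper(member, encoding=encoding)
                result[name] = [line.rstrip("\n") for line in text]
    return result


# Build a tiny archive in memory so the example runs offline.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "first line\nsecond line\n")

lines = read_text_lines(buffer.getvalue())
print(lines["hello.txt"])  # ['first line', 'second line']
```

With the real download, `url.read()` from the snippet above would take the place of `buffer.getvalue()`.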
Necessary changes:
- There is no `StringIO` module in Python 3 (it's been moved to `io.StringIO`). Instead, I use `io.BytesIO`, because we will be handling a bytestream (see the Docs, also this thread).
- "The legacy `urllib.urlopen` function from Python 2.6 and earlier has been discontinued; `urllib.request.urlopen()` corresponds to the old `urllib2.urlopen`." (see the Docs and this thread).

Note:
- The printed output lines will look like `b'some text'`. This is expected, as they aren't strings. Remember, we're reading a bytestream. Have a look at Dan04's excellent answer.

A few minor changes I made:
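- I use `with ... as` instead of `zipfile = ...`, according to the Docs.
- The script now uses `.namelist()` to cycle through all the files in the zip and print their contents.
- I moved the creation of the `ZipFile` object into the `with` statement, although I'm not sure if that's better.
- I added (and commented out) an option to write the bytestream to a file, one per file in the zip. It adds `"unzipped_and_read_"` to the beginning of the filename and a `".file"` extension (I prefer not to use `".txt"` for files with bytestrings). The indenting of the code will, of course, need to be adjusted if you want to use it.
- Because we are writing a bytestring, the output file has to be opened in binary mode, hence `"wb"`; I have a feeling that writing binary opens a can of worms anyway...

What I didn't do: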
- Save the downloaded file itself to disk. Here's a way:
    import urllib.request
    import shutil

    # The response is a byte stream, so the output file must be opened in binary mode ("wb").
    with urllib.request.urlopen("http://www.unece.org/fileadmin/DAM/cefact/locode/2015-2_UNLOCODE_SecretariatNotes.pdf") as response, open("downloaded_file.pdf", 'wb') as out_file:
        shutil.copyfileobj(response, out_file)
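For the commented-out option of writing each file in the zip to disk, here is a minimal sketch of how it might look as a standalone function. The helper name `write_members` and its parameters are hypothetical, it keeps only the base name of each member (an assumption, made so that nested paths in the archive don't escape the target directory), and it again uses a small in-memory zip plus a temporary directory so it runs without a download.

```python
import io
import os
import shutil
import tempfile
from zipfile import ZipFile


def write_members(zip_bytes, target_dir, prefix="unzipped_and_read_", suffix=".file"):
    """Write every file in an in-memory zip to target_dir; return the paths written."""
    written = []
    with ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries have no content to write
            # Keep only the base name so the output stays inside target_dir.
            out_name = prefix + os.path.basename(info.filename) + suffix
            out_path = os.path.join(target_dir, out_name)
            # Binary mode ("wb"), because archive members are read as bytes.
            with archive.open(info) as member, open(out_path, "wb") as output:
                shutil.copyfileobj(member, output)
            written.append(out_path)
    return written


# Build a tiny archive in memory so the example runs offline.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "first line\nsecond line\n")

with tempfile.TemporaryDirectory() as tmp:
    paths = write_members(buffer.getvalue(), tmp)
    with open(paths[0], "rb") as f:
        content = f.read()

print(os.path.basename(paths[0]))  # unzipped_and_read_hello.txt.file
print(content)                     # b'first line\nsecond line\n'
```

Copying with `shutil.copyfileobj` instead of looping over `readlines()` avoids splitting binary data on newline bytes, which matters for members that aren't text.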