[python] How to unzip gz file using Python

I need to extract a gz file that I have downloaded from an FTP site to a local Windows file server. I have the variables set for the local path of the file, and I know the gzip module can be used for this.

How can I do this? The file inside the GZ file is an XML file.

This question is related to python python-2.7 gzip

The answer is


# Requires the third-party "sh" package and a gunzip binary on the PATH (Unix-like systems only)
from sh import gunzip

gunzip('/tmp/file1.gz')

If you are parsing the file after unzipping it, don't forget to use the decode() method; it is necessary when you open the file in binary mode, because each line is then read as bytes.

import gzip
with gzip.open('file.gz', 'rb') as f:
    for line in f:
        print(line.decode().strip())
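
As an alternative sketch, assuming Python 3: opening the file in text mode ('rt') with an explicit encoding lets gzip do the decoding, so no decode() call is needed. The file name and contents below are made up so the example is self-contained.

```python
import gzip
import os
import tempfile

# Hypothetical sample file, created only so the example can run on its own.
path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

# 'rt' opens the compressed stream in text mode, so each line is already a str.
with gzip.open(path, "rt", encoding="utf-8") as f:
    lines = [line.strip() for line in f]

print(lines)
```

This does not apply to Python 2.7, where gzip.open has no text mode with an encoding argument.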

It is very simple. Here you go:

import gzip

# Path to the file to be extracted
ip = "sample.gzip"

# Decompress, decode as UTF-8, and write the text to the output file
with gzip.open(ip, "rb") as ip_byte, open("output_file", "w", encoding="utf-8") as op:
    op.write(ip_byte.read().decode("utf-8"))

You may also want to pass it to pandas.

import gzip
import pandas as pd

with gzip.open('features_train.csv.gz') as f:
    features_train = pd.read_csv(f)

features_train.head()

From the documentation:

import gzip
f = gzip.open('file.txt.gz', 'rb')
file_content = f.read()
f.close()

import gzip
import shutil
with gzip.open('file.txt.gz', 'rb') as f_in:
    with open('file.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
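
The same copy can be wrapped in a small helper. This is only a sketch: gunzip_file is a hypothetical name, not part of the gzip module, and deriving the output name by dropping the .gz suffix is an assumption about how the files are named.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_file(gz_path, out_path=None):
    """Decompress gz_path and return the path of the file that was written."""
    gz_path = Path(gz_path)
    if out_path is None:
        if gz_path.suffix != ".gz":
            raise ValueError("cannot derive an output name without a .gz suffix")
        out_path = gz_path.with_suffix("")  # 'data.xml.gz' -> 'data.xml'
    out_path = Path(out_path)
    # Stream in chunks so a large archive is never held fully in memory.
    with gzip.open(gz_path, "rb") as f_in, open(out_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return out_path


# Usage with a throwaway file (hypothetical name and contents).
src = Path(tempfile.mkdtemp()) / "data.xml.gz"
with gzip.open(src, "wb") as f:
    f.write(b"<root/>")

out = gunzip_file(src)
print(out.name, out.read_bytes())
```

Using pathlib keeps the path handling the same on Windows and Unix-like systems.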

Not an exact answer, because you're using XML data and there is currently no pd.read_xml() function (as of v0.23.4), but pandas (starting with v0.21.0) can uncompress the file for you! Thanks Wes!

import pandas as pd
import os
fn = '../data/file_to_load.json.gz'
print(os.path.isfile(fn))
df = pd.read_json(fn, lines=True, compression='gzip')
df.tail()
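
For the XML file in the question, one option that needs only the standard library is to hand the gzip stream straight to xml.etree.ElementTree, without writing an uncompressed copy first. A minimal sketch, with a made-up file name and document:

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample archive, created only so the example can run on its own.
path = os.path.join(tempfile.mkdtemp(), "sample.xml.gz")
with gzip.open(path, "wb") as f:
    f.write(b"<items><item id='1'>a</item><item id='2'>b</item></items>")

# ET.parse accepts a file object, so the decompressed stream is parsed directly.
with gzip.open(path, "rb") as f:
    root = ET.parse(f).getroot()

ids = [item.get("id") for item in root.iter("item")]
print(root.tag, ids)
```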
