[python] Extract Google Drive zip from Google colab notebook

I already have a zip of (2K images) dataset on a google drive. I have to use it in a ML training algorithm. Below Code extracts the content in a string format:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import io
import zipfile
# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Download a file based on its file ID.
#
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
file_id = '1T80o3Jh3tHPO7hI5FBxcX-jFnxEuUE9K' #-- Updated File ID for my zip
downloaded = drive.CreateFile({'id': file_id})
#print('Downloaded content "{}"'.format(downloaded.GetContentString(encoding='cp862')))

But I have to extract and store it in a separate directory as it would be easier for processing (as well as for understanding) of the dataset.

I tried to extract it further, but getting "Not a zipfile error"

dataset = io.BytesIO(downloaded.encode('cp862'))
zip_ref = zipfile.ZipFile(dataset, "r")
zip_ref.extractall()
zip_ref.close()

Google Drive Dataset

Note: Dataset is just for reference, I have already downloaded this zip to my google drive, and I'm referring to file in my drive only.

This question is related to python google-drive-api google-colaboratory zipfile

The answer is


Mount GDrive:

from google.colab import drive
drive.mount('/content/gdrive')

Open the link -> copy authorization code -> paste that into the prompt and press "Enter"

Check GDrive access:

!ls "/content/gdrive/My Drive"

Unzip (q stands for "quiet") file from GDrive:

!unzip -q "/content/gdrive/My Drive/dataset.zip"

Try this:

!unpack file.zip

If its now working or file is 7z try below

!apt-get install p7zip-full
!p7zip -d file_name.tar.7z
!tar -xvf file_name.tar

Or

!pip install pyunpack
!pip install patool

from pyunpack import Archive
Archive(‘file_name.tar.7z’).extractall(‘path/to/’)
!tar -xvf file_name.tar

First create a new directory:

!mkdir file_destination

Now, it's the time to inflate the directory with the unzipped files with this:

!unzip file_location -d file_destination

For Python

Connect to drive,

from google.colab import drive
drive.mount('/content/drive')

Check for directory

!ls and !pwd

For unzip

!unzip drive/"My Drive"/images.zip

First, install unzip on colab:

!apt install unzip

then use unzip to extract your files:

!unzip  source.zip -d destination.zip

TO unzip a file to a directory:

!unzip path_to_file.zip -d path_to_directory

Instead of GetContentString(), use GetContentFile() instead. It will save the file instead of returning the string.

downloaded.GetContentFile('images.zip') 

Then you can unzip it later with unzip.


SIMPLE WAY TO CONNECT

1) You'll have to verify authentication

from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()

2)To fuse google drive

!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

3)To verify credentials

import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

4)Create a drive name to use it in colab ('gdrive') and check if it's working

!mkdir gdrive
!google-drive-ocamlfuse gdrive
!ls gdrive
!cd gdrive

To extract Google Drive zip from a Google colab notebook:

import zipfile
from google.colab import drive

drive.mount('/content/drive/')

zip_ref = zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r')
zip_ref.extractall("/tmp")
zip_ref.close()

You can simply use this

!unzip file_location

After mounting on drive, use shutil.unpack_archive. It works with almost all archive formats (e.g., “zip”, “tar”, “gztar”, “bztar”, “xztar”) and it's simple:

import shutil
shutil.unpack_archive("filename", "path_to_extract")

in my idea, you must go to a certain path for example:

from google.colab import drive drive.mount('/content/drive/') cd drive/MyDrive/f/

then :

!apt install unzip !unzip zip_folder.zip -d unzip_folder enter image description here


Colab research team has a notebook for helping you out.

Still, in short, if you are dealing with a zip file, like for me it is mostly thousands of images and I want to store them in a folder within drive then do this --

!unzip -u "/content/drive/My Drive/folder/example.zip" -d "/content/drive/My Drive/folder/NewFolder"

-u part controls extraction only if new/necessary. It is important if suddenly you lose connection or hardware switches off.

-d creates the directory and extracted files are stored there.

Of course before doing this you need to mount your drive

from google.colab import drive 
drive.mount('/content/drive')

I hope this helps! Cheers!!


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to google-drive-api

Extract Google Drive zip from Google colab notebook How to automatically import data from uploaded CSV or XLS file into Google Sheets wget/curl large file from google drive How to embed a Google Drive folder in a website Direct download from Google Drive using Google Drive API Is there a Google Keep API? Google drive limit number of download How do I display images from Google Drive on a website? How to get the file ID so I can perform a download of a file from Google Drive API on Android? SVN Repository on Google Drive or DropBox

Examples related to google-colaboratory

How to prevent Google Colab from disconnecting? How do I install Python packages in Google's Colab? Extract Google Drive zip from Google colab notebook Importing .py files in Google Colab Google Colab: how to read data from my google drive? Changing directory in Google colab (breaking out of the python interpreter) Import data into Google Colaboratory wget/curl large file from google drive

Examples related to zipfile

Extract Google Drive zip from Google colab notebook Python: Open file in zip without temporarily extracting it Unzipping files in Python How to create a zip archive of a directory in Python?