[python] Pandas: Looking up the list of sheets in an excel file

I have tried xlrd, pandas, openpyxl and other such libraries and all of them seem to take exponential time as the file size increase as it reads the entire file. The other solutions mentioned above where they used 'on_demand' did not work for me. If you just want to get the sheet names initially, the following function works for xlsx files.

def get_sheet_details(file_path):
    sheets = []
    file_name = os.path.splitext(os.path.split(file_path)[-1])[0]
    # Make a temporary directory with the file name
    directory_to_extract_to = os.path.join(settings.MEDIA_ROOT, file_name)
    os.mkdir(directory_to_extract_to)

    # Extract the xlsx file as it is just a zip file
    zip_ref = zipfile.ZipFile(file_path, 'r')
    zip_ref.extractall(directory_to_extract_to)
    zip_ref.close()

    # Open the workbook.xml which is very light and only has meta data, get sheets from it
    path_to_workbook = os.path.join(directory_to_extract_to, 'xl', 'workbook.xml')
    with open(path_to_workbook, 'r') as f:
        xml = f.read()
        dictionary = xmltodict.parse(xml)
        for sheet in dictionary['workbook']['sheets']['sheet']:
            sheet_details = {
                'id': sheet['@sheetId'],
                'name': sheet['@name']
            }
            sheets.append(sheet_details)

    # Delete the extracted files directory
    shutil.rmtree(directory_to_extract_to)
    return sheets

Since all xlsx are basically zipped files, we extract the underlying xml data and read sheet names from the workbook directly which takes a fraction of a second as compared to the library functions.

Benchmarking: (On a 6mb xlsx file with 4 sheets)
Pandas, xlrd: 12 seconds
openpyxl: 24 seconds
Proposed method: 0.4 seconds

Since my requirement was just reading the sheet names, the unnecessary overhead of reading the entire time was bugging me so I took this route instead.

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to excel

Python: Pandas pd.read_excel giving ImportError: Install xlrd >= 0.9.0 for Excel support Converting unix time into date-time via excel How to increment a letter N times per iteration and store in an array? 'Microsoft.ACE.OLEDB.16.0' provider is not registered on the local machine. (System.Data) How to import an Excel file into SQL Server? Copy filtered data to another sheet using VBA Better way to find last used row Could pandas use column as index? Check if a value is in an array or not with Excel VBA How to sort dates from Oldest to Newest in Excel?

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to openpyxl

How to save a new sheet in an existing excel file, using Pandas? How to install Openpyxl with pip No module named 'openpyxl' - Python 3.4 - Ubuntu Pandas: Looking up the list of sheets in an excel file Is it possible to get an Excel document's row count without loading the entire document into memory? openpyxl - adjust column width size

Examples related to xlrd

xlrd.biffh.XLRDError: Excel xlsx file; not supported Edit existing excel workbooks and sheets with xlrd and xlwt Read Excel File in Python Pandas: Looking up the list of sheets in an excel file python xlrd unsupported format, or corrupt file. writing to existing workbook using xlwt Create a .csv file with values from a Python list