Pandas: Looking up the list of sheets in an Excel file

Question

The new version of Pandas uses the following interface to load Excel files:

```python
read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
```

but what if I don't know the sheets that are available?

For example, I am working with Excel files that have the following sheets:

Data 1, Data 2, ..., Data N, foo, bar

but I don't know N a priori.

Is there any way to get the list of sheets from an Excel document in Pandas?

User · Answer

If you:

- care about performance
- don't need the data in the file at execution time
- want to go with conventional libraries vs. rolling your own solution

Below was benchmarked on a 10 MB `xlsx` / `xlsb` file.

`xlsx` (openpyxl does not read legacy `.xls` files)

```python
from openpyxl import load_workbook

def get_sheetnames_xlsx(filepath):
    wb = load_workbook(filepath, read_only=True, keep_links=False)
    try:
        return wb.sheetnames
    finally:
        # Read-only workbooks keep the file open until explicitly closed
        wb.close()
```

Benchmarks: 14x speed improvement

```
# get_sheetnames_xlsx vs pd.read_excel
225 ms ± 6.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.25 s ± 140 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

`xlsb`

```python
from pyxlsb import open_workbook

def get_sheetnames_xlsb(filepath):
    with open_workbook(filepath) as wb:
        return wb.sheets
```

Benchmarks: 56x speed improvement

```
# get_sheetnames_xlsb vs pd.read_excel
96.4 ms ± 1.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
5.36 s ± 162 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

Notes:

- This is a good resource: http://www.python-excel.org/
- `xlrd` is no longer maintained as of 2020
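Because openpyxl only reads the OOXML formats (`.xlsx`, `.xlsm` and the template variants), a legacy `.xls` file needs a different reader. Below is a minimal sketch that routes by file extension. It assumes `xlrd` for `.xls` (its `on_demand=True` option loads sheets lazily); that reader was not part of the benchmark above. The names `READERS`, `reader_for`, `get_sheetnames` and `get_sheetnames_xls` are hypothetical. Imports happen inside each function, so only the library for the format you actually open has to be installed.

```python
from pathlib import Path


def get_sheetnames_xlsx(filepath):
    # openpyxl handles .xlsx/.xlsm, not legacy .xls
    from openpyxl import load_workbook
    wb = load_workbook(filepath, read_only=True, keep_links=False)
    try:
        return wb.sheetnames
    finally:
        wb.close()


def get_sheetnames_xlsb(filepath):
    from pyxlsb import open_workbook
    with open_workbook(filepath) as wb:
        return wb.sheets


def get_sheetnames_xls(filepath):
    # Assumption: xlrd is installed (xlrd >= 2.0 reads only .xls)
    import xlrd
    book = xlrd.open_workbook(filepath, on_demand=True)
    try:
        return book.sheet_names()
    finally:
        book.release_resources()


# Hypothetical mapping from (lower-cased) extension to reader function
READERS = {
    ".xlsx": get_sheetnames_xlsx,
    ".xlsm": get_sheetnames_xlsx,
    ".xlsb": get_sheetnames_xlsb,
    ".xls": get_sheetnames_xls,
}


def reader_for(filepath):
    """Pick the reader for a path based on its extension."""
    suffix = Path(filepath).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported Excel file type: {suffix!r}") from None


def get_sheetnames(filepath):
    return reader_for(filepath)(filepath)
```

Dispatching on the extension keeps each fast path separate, at the cost of trusting the file name. A mislabelled file will fail inside the chosen library rather than here.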

User · Answer

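You should explicitly specify the second parameter (`sheet_name`; it was called `sheetname` before pandas 0.21) as `None`, like this:

```python
df = pandas.read_excel("/yourPath/FileName.xlsx", sheet_name=None)
```

`df` is then all the sheets as a dictionary of DataFrames, keyed by sheet name. You can verify it by running this:

```python
df.keys()
```

The result looks like this:

```
[u'201610', u'201601', u'201701', u'201702', u'201703', u'201704', u'201705', u'201706', u'201612', u'fund', u'201603', u'201602', u'201605', u'201607', u'201606', u'201608', u'201512', u'201611', u'201604']
```

Note that this still reads every sheet into memory, so it is slow on large files.

Please refer to the pandas docs for more details: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html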

User · Answer

Building on dhwanil shah's answer, you do not need to extract the whole file. With `zf.open` it is possible to read from a zipped file directly.

```python
import xml.etree.ElementTree as ET
import zipfile

def xlsxSheets(f):
    with zipfile.ZipFile(f) as zf:
        with zf.open(r'xl/workbook.xml') as wb:
            # Skip the XML declaration; in files saved by Excel the whole
            # <workbook> element sits on the second line
            wb.readline()
            l = wb.readline()
    root = ET.fromstring(l)
    sheets = []
    for c in root.findall('{http://schemas.openxmlformats.org/spreadsheetml/2006/main}sheets/*'):
        sheets.append(c.attrib['name'])
    return sheets
```

The two consecutive `readline`s are ugly, but the content is only in the second line of the text. No need to parse the whole file. This relies on the line layout Excel uses when saving; XML does not guarantee line breaks, so files written by other tools may differ.

This solution seems to be much faster than the `read_excel` version, and most likely also faster than the full extract version.
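If you want to avoid depending on the line layout, `workbook.xml` is small enough to parse in full with ElementTree while still reading it straight out of the zip. A minimal sketch; `xlsx_sheet_names` and `NS` are hypothetical names, and it assumes the workbook part is stored at `xl/workbook.xml`, as in the files discussed here.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by xl/workbook.xml
NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def xlsx_sheet_names(path):
    """Return sheet names in tab order without loading any sheet data."""
    with zipfile.ZipFile(path) as zf:
        # Assumption: the workbook part lives at xl/workbook.xml
        with zf.open("xl/workbook.xml") as f:
            root = ET.parse(f).getroot()
    # Structure: <workbook><sheets><sheet name="..." sheetId="..."/>...</sheets></workbook>
    return [s.get("name") for s in root.findall("main:sheets/main:sheet", NS)]
```

Parsing the whole of `workbook.xml` costs little because it only holds workbook metadata; the sheet contents live in separate parts that are never opened.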

User · Answer

I have tried xlrd, pandas, openpyxl and other such libraries, and all of them take much longer as the file size increases because they read the entire file. The other solutions mentioned above that used `on_demand` did not work for me. If you just want to get the sheet names initially, the following function works for xlsx files:

```python
import os
import shutil
import tempfile
import zipfile

import xmltodict  # third-party: pip install xmltodict


def get_sheet_details(file_path):
    sheets = []
    # Make a temporary directory to extract into
    # (the original version extracted under Django's settings.MEDIA_ROOT)
    directory_to_extract_to = tempfile.mkdtemp()
    try:
        # Extract the xlsx file, as it is just a zip file
        with zipfile.ZipFile(file_path, 'r') as zip_ref:
            zip_ref.extractall(directory_to_extract_to)

        # Open workbook.xml, which is very light and only has metadata, and get the sheets from it
        path_to_workbook = os.path.join(directory_to_extract_to, 'xl', 'workbook.xml')
        with open(path_to_workbook, 'rb') as f:
            # force_list keeps 'sheet' a list even when there is only one sheet
            dictionary = xmltodict.parse(f.read(), force_list=('sheet',))
        for sheet in dictionary['workbook']['sheets']['sheet']:
            sheets.append({
                'id': sheet['@sheetId'],
                'name': sheet['@name'],
            })
    finally:
        # Delete the extracted files directory
        shutil.rmtree(directory_to_extract_to)
    return sheets
```

Since all xlsx files are basically zipped files, we extract the underlying XML data and read the sheet names from the workbook directly, which takes a fraction of a second compared to the library functions.

Benchmarking (on a 6 MB xlsx file with 4 sheets):

- Pandas, xlrd: 12 seconds
- openpyxl: 24 seconds
- Proposed method: 0.4 seconds

Since my requirement was just reading the sheet names, the unnecessary overhead of reading the entire file was bugging me, so I took this route instead.
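The timings quoted across these answers come from different files and machines, so they are not directly comparable. To compare candidate functions on your own file, here is a minimal standard-library sketch. It roughly mirrors IPython's `%timeit` summary with 7 runs of 1 call each. The name `benchmark` is hypothetical, and it reports the population standard deviation.

```python
import statistics
import timeit


def benchmark(func, *args, repeat=7):
    """Call func(*args) once per run, `repeat` times.

    Returns (mean_seconds, stdev_seconds) over the runs.
    """
    times = timeit.repeat(lambda: func(*args), repeat=repeat, number=1)
    return statistics.mean(times), statistics.pstdev(times)
```

For example, `benchmark(get_sheet_details, "big.xlsx")` would time the function above on a hypothetical `big.xlsx`. Because each run makes a single call, the first run may include OS file-cache warm-up, which shows up in the spread.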

User · Answer

```python
from openpyxl import load_workbook

sheets = load_workbook(excel_file, read_only=True).sheetnames
```

For a 5 MB Excel file I'm working with, `load_workbook` without the `read_only` flag took 8.24 s. With the `read_only` flag it only took 39.6 ms. If you still want to use an Excel library and not drop to an XML solution, that's much faster than the methods that parse the whole file.

User · Answer

You can still use the `ExcelFile` class (and the `sheet_names` attribute):

```python
xl = pd.ExcelFile('foo.xls')

xl.sheet_names  # see all sheet names

xl.parse(sheet_name)  # read a specific sheet to DataFrame
```

See the docs for `parse` for more options.
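The question's underlying problem is not knowing N. Once you have the names from `xl.sheet_names` (or any of the other methods here), a regular expression can pick out the `Data N` sheets and order them numerically. A minimal sketch; `DATA_SHEET` and `data_sheets` are hypothetical names, and the pattern assumes the sheets are named exactly `Data <number>` as in the question.

```python
import re

# Assumption: sheets are named "Data 1", "Data 2", ... as in the question
DATA_SHEET = re.compile(r"Data (\d+)")


def data_sheets(sheet_names):
    """Return the names matching 'Data <number>', ordered by that number."""
    found = []
    for name in sheet_names:
        m = DATA_SHEET.fullmatch(name)
        if m:
            found.append((int(m.group(1)), name))
    return [name for _, name in sorted(found)]
```

Sorting on the parsed integer avoids the string-sort trap where `Data 10` would come before `Data 2`. For example, `{name: xl.parse(name) for name in data_sheets(xl.sheet_names)}` would load only those sheets, and `len(data_sheets(xl.sheet_names))` gives N if the numbering has no gaps.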

User · Answer

This is the fastest way I have found, inspired by divingTobi's answer. All the answers based on xlrd, openpyxl or pandas are slow for me, as they all load the whole file first.

```python
from zipfile import ZipFile
from bs4 import BeautifulSoup  # you also need to install "lxml" for the XML parser

with ZipFile(file) as zipped_file:
    summary = zipped_file.open(r'xl/workbook.xml').read()
soup = BeautifulSoup(summary, "xml")
sheets = [sheet.get("name") for sheet in soup.find_all("sheet")]
```
