If you:
Below was benchmarked on a ~10Mb xlsx
, xlsb
file.
xlsx, xls
from openpyxl import load_workbook
def get_sheetnames_xlsx(filepath):
wb = load_workbook(filepath, read_only=True, keep_links=False)
return wb.sheetnames
Benchmarks: ~ 14x speed improvement
# get_sheetnames_xlsx vs pd.read_excel
225 ms ± 6.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.25 s ± 140 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
xlsb
from pyxlsb import open_workbook
def get_sheetnames_xlsb(filepath):
with open_workbook(filepath) as wb:
return wb.sheets
Benchmarks: ~ 56x speed improvement
# get_sheetnames_xlsb vs pd.read_excel
96.4 ms ± 1.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
5.36 s ± 162 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Notes:
xlrd
is no longer maintained as of 2020