Is it possible to get an Excel document's row count without loading the entire document into memory?

Question

I'm working on an application that processes huge Excel 2007 files, and I'm using OpenPyXL to do it. OpenPyXL has two different methods of reading an Excel file: one "normal" method where the entire document is loaded into memory at once, and one method where iterators are used to read row by row.

The problem is that when I'm using the iterator method, I don't get any document metadata like column widths and row/column count, and I really need this data. I assume this data is stored in the Excel document close to the top, so it shouldn't be necessary to load the whole 10MB file into memory to get access to it.

So, is there a way to get hold of the row/column count and column widths without loading the entire document into memory first?

User · Answer

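See row_range() in the pyexcel Sheet documentation: https://pythonhosted.org/pyexcel/iapi/pyexcel.sheets.Sheet.html

row_range() is a utility function that returns the row range. If you use pyexcel, you can call row_range() to get the maximum number of rows. Tested with Python 3.4.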

User · Answer

The solution suggested in this answer has been deprecated, and might no longer work.

Taking a look at the source code of OpenPyXL (IterableWorksheet), I've figured out how to get the column and row count from an iterator worksheet:

    wb = load_workbook(path, use_iterators=True)
    sheet = wb.worksheets[0]

    row_count = sheet.get_highest_row() - 1
    column_count = letter_to_index(sheet.get_highest_column()) + 1

IterableWorksheet.get_highest_column returns a string with the column letter that you can see in Excel, e.g. "A", "B", "C" etc. Therefore I've also written a function to translate the column letter to a zero-based index:

    def letter_to_index(letter):
        """Converts a column letter, e.g. "A", "B", "AA", "BC" etc. to a zero based
        column index.

        A becomes 0, B becomes 1, Z becomes 25, AA becomes 26 etc.

        Args:
            letter (str): The column index letter.
        Returns:
            The column index as an integer.
        """
        letter = letter.upper()
        result = 0

        for index, char in enumerate(reversed(letter)):
            # Get the ASCII number of the letter and subtract 64 so that A
            # corresponds to 1.
            num = ord(char) - 64

            # Multiply the number with 26 to the power of `index` to get the correct
            # value of the letter based on its index in the string.
            final_num = (26 ** index) * num

            result += final_num

        # Subtract 1 from the result to make it zero-based before returning.
        return result - 1

I still haven't figured out how to get the column sizes though, so I've decided to use a fixed-width font and automatically scaled columns in my application.

User · Answer

Adding on to what Hubro said, apparently get_highest_row() has been deprecated. Using the max_row and max_column properties returns the row and column count. For example:

    # Newer OpenPyXL releases call this flag read_only=True
    wb = load_workbook(path, use_iterators=True)
    sheet = wb.worksheets[0]

    row_count = sheet.max_row
    column_count = sheet.max_column
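For anyone curious where such a count can come from without reading any cells: an .xlsx file is a zip archive, and a worksheet part usually (not always) carries a dimension element such as ref="A1:C10" ahead of the cell data. Below is a minimal standard-library sketch that reads only that element. The names read_dimension, column_number and SHEET_PATH are hypothetical helpers of mine, not part of OpenPyXL, and the sheet path is an assumed conventional location. It also assumes the used range starts at A1, so the bottom-right cell gives the counts.

```python
import re
import zipfile
from xml.etree.ElementTree import iterparse

# Assumption: the first worksheet sits at this conventional path inside the
# archive. The authoritative mapping lives in xl/_rels/workbook.xml.rels.
SHEET_PATH = "xl/worksheets/sheet1.xml"


def column_number(letters):
    """Convert a column letter such as "A" or "AA" to a 1-based number."""
    number = 0
    for char in letters.upper():
        number = number * 26 + (ord(char) - ord("A") + 1)
    return number


def read_dimension(xlsx_path, sheet_path=SHEET_PATH):
    """Return (last_row, last_column) from the dimension element, or None."""
    with zipfile.ZipFile(xlsx_path) as archive, archive.open(sheet_path) as xml:
        for _, element in iterparse(xml, events=("start",)):
            name = element.tag.split("}")[-1]  # drop the XML namespace
            if name == "dimension":
                # ref looks like "A1:C10", or just "A1" for a single cell
                last_cell = element.attrib["ref"].split(":")[-1]
                letters, row = re.match(r"([A-Za-z]+)(\d+)$", last_cell).groups()
                return int(row), column_number(letters)
            if name == "sheetData":
                # Cell data starts here, so no dimension record was written
                return None
    return None
```

If the function returns None, the file was written without a dimension record and the only reliable way to count rows is to iterate over them.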

User · Answer

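Python 3:

    import openpyxl as xl

    # read_only=True streams rows instead of loading the whole workbook
    wb = xl.load_workbook("Sample.xlsx", read_only=True)

    # The two lines below do the same thing when the first sheet is named "sheet"
    sheet = wb["sheet"]
    sheet = wb.worksheets[0]

    row_count = sheet.max_row
    column_count = sheet.max_column

This works for me.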

User · Answer

This might be extremely convoluted and I might be missing the obvious, but without OpenPyXL filling in the column_dimensions in iterable worksheets (see my comment above), the only way I can see of finding the column size without loading everything is to parse the XML directly:

    from xml.etree.ElementTree import iterparse
    from openpyxl import load_workbook

    wb = load_workbook("/path/to/workbook.xlsx", use_iterators=True)
    ws = wb.worksheets[0]
    xml = ws._xml_source  # private attribute, may change between versions
    xml.seek(0)

    for _, x in iterparse(xml):
        # Strip the XML namespace from the tag name
        name = x.tag.split("}")[-1]
        if name == "col":
            # The width alone is available as x.attrib["width"]
            print("Column %(max)s: Width: %(width)s" % x.attrib)

        if name == "cols":
            print("break before reading the rest of the file")
            break
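The same idea can be sketched with the standard library only, which avoids the private _xml_source attribute. This is an untested-against-real-files sketch under a few assumptions: the function name read_column_widths and the SHEET_PATH constant are hypothetical, and the first worksheet is assumed to sit at the conventional path inside the zip archive. Each col element covers a 1-based column range from min to max, and the loop stops as soon as the cols block ends or the cell data begins.

```python
import zipfile
from xml.etree.ElementTree import iterparse

# Assumption: conventional location of the first worksheet in the archive.
SHEET_PATH = "xl/worksheets/sheet1.xml"


def read_column_widths(xlsx_path, sheet_path=SHEET_PATH):
    """Return a list of (min_col, max_col, width) tuples from the cols block."""
    widths = []
    with zipfile.ZipFile(xlsx_path) as archive, archive.open(sheet_path) as xml:
        for event, element in iterparse(xml, events=("start", "end")):
            name = element.tag.split("}")[-1]  # drop the XML namespace
            if event == "end" and name == "col" and "width" in element.attrib:
                widths.append((
                    int(element.attrib["min"]),
                    int(element.attrib["max"]),
                    float(element.attrib["width"]),
                ))
            elif event == "end" and name == "cols":
                break  # all column definitions have been read
            elif event == "start" and name == "sheetData":
                break  # cell data begins, so there is no cols block
    return widths
```

An empty list means the sheet has no explicit column definitions, in which case the application's default width applies.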
