Reading/parsing Excel (xls) files with Python

Question

What is the best way to read Excel (XLS) files with Python (not CSV files)? Is there a built-in package which is supported by default in Python to do this task?

User · Answer

You might also consider running the (non-Python) program xls2csv. Feed it an .xls file, and you should get back a CSV.
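If you want to drive xls2csv from Python, a minimal sketch is to run it with subprocess and parse what it prints with the csv module. This assumes the catdoc version of xls2csv, which writes CSV to standard output (a Perl tool with the same name takes different flags); parse_csv_text and xls_to_rows are hypothetical helper names.

```python
import csv
import io
import subprocess


def parse_csv_text(text):
    """Split CSV text (e.g. captured from xls2csv's stdout) into rows."""
    return list(csv.reader(io.StringIO(text)))


def xls_to_rows(path, tool="xls2csv"):
    """Run an external converter on `path` and parse the CSV it prints.

    Assumption: the tool takes the .xls path as its only argument and
    writes CSV to standard output, as catdoc's xls2csv does.
    """
    result = subprocess.run(
        [tool, path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output using the locale's default encoding
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return parse_csv_text(result.stdout)
```

Usage would be `rows = xls_to_rows("report.xls")`; it raises FileNotFoundError if the tool is not installed.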

User · Answer

If you need the old XLS format, below is code for ANSI (cp1251):

import xlrd

file = u'C:/Landau/task/6200.xlsx'  # note: xlrd 2.0 and later only open legacy .xls files

try:
    # encoding_override supplies the codepage when an old file's own is missing or wrong
    book = xlrd.open_workbook(file, encoding_override="cp1251")
except Exception:
    book = xlrd.open_workbook(file)
print("The number of worksheets is {0}".format(book.nsheets))
print("Worksheet name(s): {0}".format(book.sheet_names()))
sh = book.sheet_by_index(0)
print("{0} {1} {2}".format(sh.name, sh.nrows, sh.ncols))
print("Cell D30 is {0}".format(sh.cell_value(rowx=29, colx=3)))
for rx in range(sh.nrows):
    print(sh.row(rx))
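The answer reads cell D30 with rowx=29, colx=3 because xlrd counts rows and columns from 0. A small converter from A1-style references to those zero-based indices saves doing that by hand; a1_to_rowcol is a hypothetical helper, not part of xlrd.

```python
import re


def a1_to_rowcol(ref):
    """Convert an A1-style reference such as 'D30' to zero-based (rowx, colx)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", ref)
    if match is None:
        raise ValueError(f"not an A1 reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters form a bijective base-26 number: A=1 ... Z=26, AA=27
        col = col * 26 + (ord(ch) - ord("A") + 1)
    # Shift both to zero-based, as xlrd's cell_value(rowx, colx) expects
    return int(digits) - 1, col - 1
```

With the sheet above, `sh.cell_value(*a1_to_rowcol("D30"))` is equivalent to `sh.cell_value(rowx=29, colx=3)`.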

User · Answer

with open(csv_filename) as file:
    data = file.read()

with open(xl_file_name, 'w') as file:
    file.write(data)

You can turn CSV into Excel like above with built-in packages (strictly speaking, this only copies the CSV text into a file with an Excel file name; it does not produce a genuine .xls/.xlsx workbook). CSV can be handled with the built-in csv module's DictReader and DictWriter, which work the same way Python dictionaries do, which makes it a ton easier. I am currently unaware of any built-in packages for Excel, but I had come across openpyxl. It was also pretty straightforward and simple. You can see the code snippet below; hope this helps:

import openpyxl
book = openpyxl.load_workbook(filename)
sheet = book.active
result = sheet['AP2']
print(result.value)
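The DictReader/DictWriter approach mentioned above can be sketched as a round trip through an in-memory buffer. write_records and read_records are hypothetical helper names; with real files you would open them with newline="" as the csv module documentation recommends.

```python
import csv
import io


def write_records(records, fieldnames):
    """Serialize a list of dicts to CSV text with csv.DictWriter."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()        # first line: the column names
    writer.writerows(records)   # one line per dict
    return buf.getvalue()


def read_records(text):
    """Parse CSV text back into dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))
```

Note that everything comes back as strings; CSV carries no type information, unlike an Excel cell.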

User · Answer

Using pandas:

import pandas as pd
xls = pd.ExcelFile(r"yourfilename.xls")  # use r (raw string) before an absolute file path
sheetX = xls.parse(2)  # 2 is the zero-based sheet index (the third sheet); if the file has only one sheet, use 0
var1 = sheetX['ColumnName']
print(var1[1])  # 1 is the row label (zero-based, header row excluded)

User · Answer

I highly recommend xlrd for reading .xls files.

voyager mentioned the use of COM automation. Having done this myself a few years ago, be warned that doing this is a real PITA. The number of caveats is huge and the documentation is lacking and annoying. I ran into many weird bugs and gotchas, some of which took many hours to figure out.

UPDATE: For newer .xlsx files, the recommended library for reading and writing appears to be openpyxl (thanks, Ikar Pohorský).

User · Answer

Python Excelerator handles this task as well: http://ghantoos.org/2007/10/25/python-pyexcelerator-small-howto

It's also available in Debian and Ubuntu:
sudo apt-get install python-excelerator

User · Answer

You can choose any one of them: http://www.python-excel.org/
I would recommend the Python xlrd library.

Install it using:
pip install xlrd

Then, in Python:
import xlrd

# open a workbook (xlrd 2.0 and later can only open legacy .xls files)
workbook = xlrd.open_workbook('your_file_name.xlsx')

# open a sheet by name
worksheet = workbook.sheet_by_name('Name of the Sheet')

# or open a sheet by index
worksheet = workbook.sheet_by_index(0)

# read a cell value (row and column indices start at 0)
worksheet.cell(0, 0).value

User · Answer

For xlsx, I like the solution posted earlier at https://web.archive.org/web/20180216070531/https://stackoverflow.com/questions/4371163/reading-xlsx-files-using-python. It uses modules from the standard library only.

def xlsx(fname):
    import zipfile
    from xml.etree.ElementTree import iterparse
    z = zipfile.ZipFile(fname)
    strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml')) if el.tag.endswith('}t')]
    rows = []
    row = {}
    value = ''
    for e, el in iterparse(z.open('xl/worksheets/sheet1.xml')):
        if el.tag.endswith('}v'):  # Example: <v>84</v>                            
            value = el.text
        if el.tag.endswith('}c'):  # Example: <c r="A3" t="s"><v>84</v></c>                                 
            if el.attrib.get('t') == 's':
                value = strings[int(value)]
            letter = el.attrib['r']  # Example: AZ22                         
            while letter[-1].isdigit():
                letter = letter[:-1]
            row[letter] = value
            value = ''
        if el.tag.endswith('}row'):
            rows.append(row)
            row = {}
    return rows
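The function above returns one dict per row keyed by column letter, and empty cells are simply absent because they are not stored in the sheet XML. If you need equal-length lists instead, a sketch like the following pads the gaps; column_index and rows_to_grid are hypothetical helpers, and rows that are missing entirely from the XML are still not recovered.

```python
def column_index(letters):
    """Zero-based index of a column label: 'A' -> 0, 'Z' -> 25, 'AA' -> 26."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def rows_to_grid(rows, fill=""):
    """Turn [{'A': ..., 'C': ...}, ...] into lists of equal length.

    Cells missing from a row dict are replaced by `fill`.
    """
    width = 0
    for row in rows:
        for letters in row:
            width = max(width, column_index(letters) + 1)
    grid = []
    for row in rows:
        line = [fill] * width
        for letters, value in row.items():
            line[column_index(letters)] = value
        grid.append(line)
    return grid
```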

Improvements added: fetching content by sheet name, using re to get the column, and checking whether shared strings are used.

def xlsx(fname, sheet):
    import zipfile
    from xml.etree.ElementTree import iterparse
    import re
    z = zipfile.ZipFile(fname)
    strings = []
    if 'xl/sharedStrings.xml' in z.namelist():
        # Get shared strings
        strings = [element.text for event, element
                   in iterparse(z.open('xl/sharedStrings.xml'))
                   if element.tag.endswith('}t')]
    # Map each sheet name to its sheetId
    sheets = {element.attrib['name']: element.attrib['sheetId']
              for event, element in iterparse(z.open('xl/workbook.xml'))
              if element.tag.endswith('}sheet')}
    rows = []
    row = {}
    value = ''

    if sheet in sheets:
        # Assumes the worksheet part is named after the sheetId; that is common
        # but not guaranteed (the real mapping goes through r:id in
        # xl/_rels/workbook.xml.rels)
        sheetfile = 'xl/worksheets/sheet' + sheets[sheet] + '.xml'
        # print(sheet, sheetfile)
        for event, element in iterparse(z.open(sheetfile)):
            # get value or index to shared strings
            if element.tag.endswith('}v') or element.tag.endswith('}t'):
                value = element.text
            # If value is a shared string, use value as an index
            if element.tag.endswith('}c'):
                if element.attrib.get('t') == 's':
                    value = strings[int(value)]
                # strip the row number so that only the column letter(s) remain
                letter = re.sub(r'\d', '', element.attrib['r'])
                row[letter] = value
                value = ''
            if element.tag.endswith('}row'):
                rows.append(row)
                row = {}

    return rows
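The second version guesses the worksheet file as xl/worksheets/sheet<sheetId>.xml. A more robust sketch follows each sheet's r:id through xl/_rels/workbook.xml.rels, which is how the package links sheets to their parts. It assumes the Transitional OOXML namespaces (Strict files use different URIs), and sheet_parts is a hypothetical helper name.

```python
import posixpath
import zipfile
from xml.etree.ElementTree import fromstring

# Transitional OOXML namespaces (Strict files use different URIs)
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def sheet_parts(source):
    """Map sheet names to worksheet part names via the workbook relationships.

    `source` is a path or a binary file-like object holding the .xlsx.
    """
    with zipfile.ZipFile(source) as z:
        workbook = fromstring(z.read("xl/workbook.xml"))
        rels = fromstring(z.read("xl/_rels/workbook.xml.rels"))
    # Relationship Id (e.g. "rId1") -> Target (e.g. "worksheets/sheet1.xml")
    targets = {rel.get("Id"): rel.get("Target")
               for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship")}
    parts = {}
    for sheet in workbook.iter(f"{{{MAIN_NS}}}sheet"):
        target = targets[sheet.get(f"{{{DOC_REL_NS}}}id")]
        if target.startswith("/"):
            # Absolute target: already relative to the package root
            part = target.lstrip("/")
        else:
            # Relative target: resolved against the xl/ folder
            part = posixpath.normpath(posixpath.join("xl", target))
        parts[sheet.get("name")] = part
    return parts
```

Inside xlsx(), `sheetfile = sheet_parts(fname)[sheet]` could then replace the sheetId-based guess.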

User · Answer

If the file is really an old .xls, this works for me on Python 3 just using the built-in open() and pandas:

df = pandas.read_csv(open(f, encoding='UTF-8'), sep='\t')

Note that the file I'm using is tab-delimited text despite its .xls extension; a genuine binary .xls cannot be read this way. less or a text editor should be able to read such an .xls so that you can sniff out the delimiter. I did not have a lot of luck with xlrd because of (I think) UTF-8 issues.
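The "sniff out the delimiter" step can also be done in code with the standard library's csv.Sniffer. This is a minimal sketch, assuming the file is delimited text rather than a binary workbook and that the delimiter is one of tab, comma or semicolon; read_delimited_xls is a hypothetical helper name.

```python
import csv
import io


def read_delimited_xls(text, candidates="\t,;"):
    """Parse the text of a file that only *claims* to be .xls.

    csv.Sniffer guesses the dialect from a sample; `candidates` restricts
    which characters it may choose as the delimiter.
    """
    dialect = csv.Sniffer().sniff(text[:64 * 1024], delimiters=candidates)
    return list(csv.reader(io.StringIO(text), dialect))
```

Typical use would be `with open(f, encoding="utf-8", newline="") as fh: rows = read_delimited_xls(fh.read())`. Sniffer raises csv.Error when it cannot decide, so a fallback to a fixed delimiter may be worth keeping.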

User · Answer

For older .xls files, you can use xlrd.

Either use xlrd directly by importing it, like below:
import xlrd
wb = xlrd.open_workbook(file_name)

Or use the pandas pd.read_excel() method, but do not forget to specify the engine; though the default is xlrd, it has to be specified:
pd.read_excel(file_name, engine='xlrd')

Both of them work for the older .xls file format. In fact, I came across this when I used OpenPyXL; I got the error below:
InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format.
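That openpyxl error usually means the file's content does not match the reader, so one option is to look at the file's first bytes rather than its extension. A sketch, assuming only two signatures matter: OLE2 (legacy .xls) and ZIP (.xlsx). ZIP also covers .xlsb and .ods, so this is a heuristic; engine_for_header and engine_for_file are hypothetical helpers.

```python
# File signatures ("magic numbers") at offset 0
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx and other ZIP containers


def engine_for_header(head):
    """Suggest a pandas read_excel engine name from a file's first bytes."""
    if head.startswith(OLE2_MAGIC):
        return "xlrd"
    if head.startswith(ZIP_MAGIC):
        return "openpyxl"
    # Neither signature: possibly delimited text or HTML saved with an .xls name
    return None


def engine_for_file(path):
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return engine_for_header(f.read(8))
```

A None result is a hint to try a text reader (such as pandas.read_csv) instead of read_excel.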

User · Answer

For older Excel files, there is the OleFileIO_PL module (since renamed olefile) that can read the OLE structured storage format they use.
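To give an idea of what that container looks like, here is a stdlib-only sketch that decodes a few fields of the OLE2 header. Offsets follow Microsoft's [MS-CFB] Compound File Binary format description; ole_header_info is a hypothetical helper, and actually reading streams such as the workbook data is what a library like olefile is for.

```python
import struct

OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def ole_header_info(header):
    """Decode a few fields from the first bytes of an OLE2 compound file.

    Layout assumed: 8-byte signature, 16-byte CLSID, then little-endian
    uint16 minor version (0x18), major version (0x1A), byte-order mark
    (0x1C) and sector shift (0x1E).
    """
    if len(header) < 0x20 or not header.startswith(OLE2_MAGIC):
        raise ValueError("not an OLE2 compound file")
    minor, major, byte_order, sector_shift = struct.unpack_from("<4H", header, 0x18)
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte-order mark")
    return {
        "major_version": major,            # 3 or 4 in practice
        "minor_version": minor,
        "sector_size": 1 << sector_shift,  # 512 for version 3, 4096 for version 4
    }
```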

User · Answer

I think pandas is the best way to go. There is already one answer here with pandas using the ExcelFile function, but it did not work properly for me. From here I found the read_excel function, which works just fine:

import pandas as pd
dfs = pd.read_excel("your_file_name.xlsx", sheet_name="your_sheet_name")
print(dfs.head(10))

P.S. You need to have xlrd installed for the read_excel function to work.

Update 21-03-2020: As you may see here, there are issues with the xlrd engine and it is going to be deprecated. openpyxl is the best replacement. So, as described here, the canonical syntax should be:

dfs = pd.read_excel("your_file_name.xlsx", sheet_name="your_sheet_name", engine="openpyxl")

User · Answer

You can use any of the libraries listed here (like Pyxlreader, which is based on JExcelApi, or xlwt), plus COM automation to use Excel itself for reading the files. For that, however, you are introducing Office as a dependency of your software, which might not always be an option.
