[python] xls to csv converter

As much as I hate to rely on Windows Excel proprietary software, which is not cross-platform, my testing of csvkit for .xls, which uses xlrd under the hood, failed to correctly parse dates (even when using the commandline parameters to specify strptime format).

For example, this xls file, when parsed with csvkit, will convert cell G1 of 12/31/2002 to 37621, whereas when converted to csv via excel -> save_as (using below) cell G1 will be "December 31, 2002".

import re
import os
from win32com.client import Dispatch
xlCSVMSDOS = 24

class CsvConverter(object):
    def __init__(self, *, input_dir, output_dir):
        self._excel = None
        self.input_dir = input_dir
        self.output_dir = output_dir

        if not os.path.isdir(self.output_dir):
            os.makedirs(self.output_dir)

    def isSheetEmpty(self, sheet):
        # https://archive.is/RuxR7
        # WorksheetFunction.CountA(ActiveSheet.UsedRange) = 0 And ActiveSheet.Shapes.Count = 0

        return \
            (not self._excel.WorksheetFunction.CountA(sheet.UsedRange)) \
            and \
            (not sheet.Shapes.Count)

    def getNonEmptySheets(self, wb, as_name=False):
        return [ \
            (sheet.Name if as_name else sheet) \
            for sheet in wb.Sheets \
            if not self.isSheetEmpty(sheet) \
        ]

    def saveWorkbookAsCsv(self, wb, csv_path):
        non_empty_sheet_names = self.getNonEmptySheets(wb, as_name=True)

        assert (len(non_empty_sheet_names) == 1), \
            "Expected exactly 1 sheet but found %i non-empty sheets: '%s'" \
            %(
                len(non_empty_sheet_names),
                "', '".join(name.replace("'", r"\'") for name in non_empty_sheet_names)
            )

        wb.Worksheets(non_empty_sheet_names[0]).SaveAs(csv_path, xlCSVMSDOS)
        wb.Saved = 1

    def isXlsFilename(self, filename):
        return bool(re.search(r'(?i)\.xls$', filename))

    def batchConvertXlsToCsv(self):
        xls_names = tuple( filename for filename in next(os.walk(self.input_dir))[2] if self.isXlsFilename(filename) )

        self._excel = Dispatch('Excel.Application')
        try:
            for xls_name in xls_names:
                csv_path = os.path.join(self.output_dir, '%s.csv' %os.path.splitext(xls_name)[0])
                if not os.path.isfile(csv_path):
                    workbook = self._excel.Workbooks.Open(os.path.join(self.input_dir, xls_name))
                    try:
                        self.saveWorkbookAsCsv(workbook, csv_path)
                    finally:
                        workbook.Close()
        finally:
            if not len(self._excel.Workbooks):
                self._excel.Quit()

            self._excel = None

if __name__ == '__main__':
    self = CsvConverter(
        input_dir='C:\\data\\xls\\',
        output_dir='C:\\data\\csv\\'
    )

    self.batchConvertXlsToCsv()

The above will take an input_dir containing .xls and output them to output_dir as .csv -- it will assert that there is exactly 1 non-empty sheet in the .xls; if you need to handle multiple sheets into multiple csv then you'll need to edit saveWorkbookAsCsv.

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to excel

Python: Pandas pd.read_excel giving ImportError: Install xlrd >= 0.9.0 for Excel support Converting unix time into date-time via excel How to increment a letter N times per iteration and store in an array? 'Microsoft.ACE.OLEDB.16.0' provider is not registered on the local machine. (System.Data) How to import an Excel file into SQL Server? Copy filtered data to another sheet using VBA Better way to find last used row Could pandas use column as index? Check if a value is in an array or not with Excel VBA How to sort dates from Oldest to Newest in Excel?

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file

Examples related to xls

Export to xls using angularjs How can I quickly and easily convert spreadsheet data to JSON? count files in specific folder and display the number into 1 cel xls to csv converter How to parse Excel (XLS) file in Javascript/HTML5 Converting JSON to XLS/CSV in Java Importing Excel files into R, xlsx or xls .NET Excel Library that can read/write .xls files Reading/parsing Excel (xls) files with Python How to change pivot table data source in Excel?

Examples related to export-to-csv

Excel: macro to export worksheet as CSV file without leaving my current Excel sheet Export to csv/excel from kibana How to export data from Spark SQL to CSV How to export a table dataframe in PySpark to csv? Java - Writing strings to a CSV file How to export dataGridView data Instantly to Excel on button click? Export javascript data to CSV file without server interaction How to create and download a csv file from php script? Export to CSV using jQuery and html export html table to csv