[c#] Optimal way to Read an Excel file (.xls/.xlsx)

I know that there are different ways to read an Excel file:

  • Iterop
  • Oledb
  • Open Xml SDK

Compatibility is not a question because the program will be executed in a controlled environment.

My Requirement :
Read a file to a DataTable / CUstom Entities (I don't know how to make dynamic properties/fields to an object[column names will be variating in an Excel file])

Use DataTable/Custom Entities to perform some operations using its data.

Update DataTable with the results of the operations

Write it back to excel file.

Which would be simpler.

Also if possible advice me on custom Entities (adding properties/fields to an object dynamically)

This question is related to c# excel oledb openxml-sdk excel-interop

The answer is


Read from excel, modify and write back

 /// <summary>
/// /Reads an excel file and converts it into dataset with each sheet as each table of the dataset
/// </summary>
/// <param name="filename"></param>
/// <param name="headers">If set to true the first row will be considered as headers</param>
/// <returns></returns>
public DataSet Import(string filename, bool headers = true)
{
    var _xl = new Excel.Application();
    var wb = _xl.Workbooks.Open(filename);
    var sheets = wb.Sheets;
    DataSet dataSet = null;
    if (sheets != null && sheets.Count != 0)
    {
        dataSet = new DataSet();
        foreach (var item in sheets)
        {
            var sheet = (Excel.Worksheet)item;
            DataTable dt = null;
            if (sheet != null)
            {
                dt = new DataTable();
                var ColumnCount = ((Excel.Range)sheet.UsedRange.Rows[1, Type.Missing]).Columns.Count;
                var rowCount = ((Excel.Range)sheet.UsedRange.Columns[1, Type.Missing]).Rows.Count;

                for (int j = 0; j < ColumnCount; j++)
                {
                    var cell = (Excel.Range)sheet.Cells[1, j + 1];
                    var column = new DataColumn(headers ? cell.Value : string.Empty);
                    dt.Columns.Add(column);
                }

                for (int i = 0; i < rowCount; i++)
                {
                    var r = dt.NewRow();
                    for (int j = 0; j < ColumnCount; j++)
                    {
                        var cell = (Excel.Range)sheet.Cells[i + 1 + (headers ? 1 : 0), j + 1];
                        r[j] = cell.Value;
                    }
                    dt.Rows.Add(r);
                }

            }
            dataSet.Tables.Add(dt);
        }
    }
    _xl.Quit();
    return dataSet;
}



 public string Export(DataTable dt, bool headers = false)
    {
        var wb = _xl.Workbooks.Add();
        var sheet = (Excel.Worksheet)wb.ActiveSheet;
        //process columns
        for (int i = 0; i < dt.Columns.Count; i++)
        {
            var col = dt.Columns[i];
            //added columns to the top of sheet
            var currentCell = (Excel.Range)sheet.Cells[1, i + 1];
            currentCell.Value = col.ToString();
            currentCell.Font.Bold = true;
            //process rows
            for (int j = 0; j < dt.Rows.Count; j++)
            {
                var row = dt.Rows[j];
                //added rows to sheet
                var cell = (Excel.Range)sheet.Cells[j + 1 + 1, i + 1];
                cell.Value = row[i];
            }
            currentCell.EntireColumn.AutoFit();
        }
        var fileName="{somepath/somefile.xlsx}";
        wb.SaveCopyAs(fileName);
        _xl.Quit();
        return fileName;
    }

I realize this question was asked nearly 7 years ago but it's still a top Google search result for certain keywords regarding importing excel data with C#, so I wanted to provide an alternative based on some recent tech developments.

Importing Excel data has become such a common task to my everyday duties, that I've streamlined the process and documented the method on my blog: best way to read excel file in c#.

I use NPOI because it can read/write Excel files without Microsoft Office installed and it doesn't use COM+ or any interops. That means it can work in the cloud!

But the real magic comes from pairing up with NPOI Mapper from Donny Tian because it allows me to map the Excel columns to properties in my C# classes without writing any code. It's beautiful.

Here is the basic idea:

I create a .net class that matches/maps the Excel columns I'm interested in:

        class CustomExcelFormat
        {
            [Column("District")]
            public int District { get; set; }

            [Column("DM")]
            public string FullName { get; set; }

            [Column("Email Address")]
            public string EmailAddress { get; set; }

            [Column("Username")]
            public string Username { get; set; }

            public string FirstName
            {
                get
                {
                    return Username.Split('.')[0];
                }
            }

            public string LastName
            {
                get
                {
                    return Username.Split('.')[1];
                }
            }
        }

Notice, it allows me to map based on column name if I want to!

Then when I process the excel file all I need to do is something like this:

        public void Execute(string localPath, int sheetIndex)
        {
            IWorkbook workbook;
            using (FileStream file = new FileStream(localPath, FileMode.Open, FileAccess.Read))
            {
                workbook = WorkbookFactory.Create(file);
            }

            var importer = new Mapper(workbook);
            var items = importer.Take<CustomExcelFormat>(sheetIndex);
            foreach(var item in items)
            {
                var row = item.Value;
                if (string.IsNullOrEmpty(row.EmailAddress))
                    continue;

                UpdateUser(row);
            }

            DataContext.SaveChanges();
        }

Now, admittedly, my code does not modify the Excel file itself. I am instead saving the data to a database using Entity Framework (that's why you see "UpdateUser" and "SaveChanges" in my example). But there is already a good discussion on SO about how to save/modify a file using NPOI.


If you can restrict it to just (Open Office XML format) *.xlsx files, then probably the most popular library would be EPPLus.

Bonus is, there are no other dependencies. Just install using nuget:

Install-Package EPPlus

Using OLE Query, it's quite simple (e.g. sheetName is Sheet1):

DataTable LoadWorksheetInDataTable(string fileName, string sheetName)
{           
    DataTable sheetData = new DataTable();
    using (OleDbConnection conn = this.returnConnection(fileName))
    {
       conn.Open();
       // retrieve the data using data adapter
       OleDbDataAdapter sheetAdapter = new OleDbDataAdapter("select * from [" + sheetName + "$]", conn);
       sheetAdapter.Fill(sheetData);
       conn.Close();
    }                        
    return sheetData;
}

private OleDbConnection returnConnection(string fileName)
{
    return new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + fileName + "; Jet OLEDB:Engine Type=5;Extended Properties=\"Excel 8.0;\"");
}

For newer Excel versions:

return new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileName + ";Extended Properties=Excel 12.0;");

You can also use Excel Data Reader an open source project on CodePlex. Its works really well to export data from Excel sheets.

The sample code given on the link specified:

FileStream stream = File.Open(filePath, FileMode.Open, FileAccess.Read);

//1. Reading from a binary Excel file ('97-2003 format; *.xls)
IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
//...
//2. Reading from a OpenXml Excel file (2007 format; *.xlsx)
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
//...
//3. DataSet - The result of each spreadsheet will be created in the result.Tables
DataSet result = excelReader.AsDataSet();
//...
//4. DataSet - Create column names from first row
excelReader.IsFirstRowAsColumnNames = true;
DataSet result = excelReader.AsDataSet();

//5. Data Reader methods
while (excelReader.Read())
{
//excelReader.GetInt32(0);
}

//6. Free resources (IExcelDataReader is IDisposable)
excelReader.Close();

Reference: How do I import from Excel to a DataSet using Microsoft.Office.Interop.Excel?


Try to use this free way to this, https://freenetexcel.codeplex.com

 Workbook workbook = new Workbook();

 workbook.LoadFromFile(@"..\..\parts.xls",ExcelVersion.Version97to2003);
 //Initialize worksheet
 Worksheet sheet = workbook.Worksheets[0];

 DataTable dataTable = sheet.ExportDataTable();

Try to use Aspose.cells library (not free, but trial is enough to read), it is quite good

Install-package Aspose.cells

There is sample code:

using Aspose.Cells;
using System;

namespace ExcelReader
{
    class Program
    {
        static void Main(string[] args)
        {
            // Replace path for your file
            readXLS(@"C:\MyExcelFile.xls"); // or "*.xlsx"
            Console.ReadKey();
        }

        public static void readXLS(string PathToMyExcel)
        {
            //Open your template file.
            Workbook wb = new Workbook(PathToMyExcel);

            //Get the first worksheet.
            Worksheet worksheet = wb.Worksheets[0];

            //Get cells
            Cells cells = worksheet.Cells;

            // Get row and column count
            int rowCount = cells.MaxDataRow;
            int columnCount = cells.MaxDataColumn;

            // Current cell value
            string strCell = "";

            Console.WriteLine(String.Format("rowCount={0}, columnCount={1}", rowCount, columnCount));

            for (int row = 0; row <= rowCount; row++) // Numeration starts from 0 to MaxDataRow
            {
                for (int column = 0; column <= columnCount; column++)  // Numeration starts from 0 to MaxDataColumn
                {
                    strCell = "";
                    strCell = Convert.ToString(cells[row, column].Value);
                    if (String.IsNullOrEmpty(strCell))
                    {
                        continue;
                    }
                    else
                    {
                        // Do your staff here
                        Console.WriteLine(strCell);
                    }
                }
            }
        }
    }
}

Take a look at Linq-to-Excel. It's pretty neat.

var book = new LinqToExcel.ExcelQueryFactory(@"File.xlsx");

var query =
    from row in book.Worksheet("Stock Entry")
    let item = new
    {
        Code = row["Code"].Cast<string>(),
        Supplier = row["Supplier"].Cast<string>(),
        Ref = row["Ref"].Cast<string>(),
    }
    where item.Supplier == "Walmart"
    select item;

It also allows for strongly-typed row access too.


Examples related to c#

How can I convert this one line of ActionScript to C#? Microsoft Advertising SDK doesn't deliverer ads How to use a global array in C#? How to correctly write async method? C# - insert values from file into two arrays Uploading into folder in FTP? Are these methods thread safe? dotnet ef not found in .NET Core 3 HTTP Error 500.30 - ANCM In-Process Start Failure Best way to "push" into C# array

Examples related to excel

Python: Pandas pd.read_excel giving ImportError: Install xlrd >= 0.9.0 for Excel support Converting unix time into date-time via excel How to increment a letter N times per iteration and store in an array? 'Microsoft.ACE.OLEDB.16.0' provider is not registered on the local machine. (System.Data) How to import an Excel file into SQL Server? Copy filtered data to another sheet using VBA Better way to find last used row Could pandas use column as index? Check if a value is in an array or not with Excel VBA How to sort dates from Oldest to Newest in Excel?

Examples related to oledb

'Microsoft.ACE.OLEDB.16.0' provider is not registered on the local machine. (System.Data) How to import an Excel file into SQL Server? how to resolve DTS_E_OLEDBERROR. in ssis Check if ADODB connection is open Import Excel to Datagridview "Cannot create an instance of OLE DB provider" error as Windows Authentication user Optimal way to Read an Excel file (.xls/.xlsx) How to connect to a MS Access file (mdb) using C#? "The 'Microsoft.ACE.OLEDB.12.0' provider is not registered on the local machine" Error in importing process of xlsx to a sql server How can I export data to an Excel file

Examples related to openxml-sdk

Optimal way to Read an Excel file (.xls/.xlsx) Export DataTable to Excel with Open Xml SDK in c# How to include the reference of DocumentFormat.OpenXml.dll on Mono2.10?

Examples related to excel-interop

Closing Excel Application Process in C# after Data Access Optimal way to Read an Excel file (.xls/.xlsx) How to fix 'Microsoft Excel cannot open or save any more documents' How do I import from Excel to a DataSet using Microsoft.Office.Interop.Excel? HRESULT: 0x800A03EC on Worksheet.range How to count the number of rows in excel with data? System.Runtime.InteropServices.COMException (0x800A03EC) How do I auto size columns through the Excel interop objects? Exporting the values in List to excel