[java] Read XLSX file in Java

I need to read an Excel 2007 XLSX file in a Java application. Does anyone know of a good API to accomplish this task?

This question is related to java excel parsing

The answer is


Have you looked at the poorly obfuscated API?

Nevermind:

HSSF is the POI Project's pure Java implementation of the Excel '97(-2007) file format. It does not support the new Excel 2007 .xlsx OOXML file format, which is not OLE2 based.

You might consider using a JDBC-ODBC bridge instead.


You can use Apache Tika for that:

String parse(File xlsxFile) {
    return new Tika().parseToString(xlsxFile);
}

Tika uses Apache POI for parsing XLSX files.

Here are some usage examples for Tiki.

Alternatively, if you want to handle each cell of the spreadsheet individually, here's one way to do this with POI:

void parse(File xlsx) {
    try (XSSFWorkbook workbook = new XSSFWorkbook(xlsx)) {
        // Handle each cell in each sheet
        workbook.forEach(sheet -> sheet.forEach(row -> row.forEach(this::handle)));
    }
    catch (InvalidFormatException | IOException e) {
        System.out.println("Can't parse file " + xlsx);
    }
}

void handle(Cell cell) {
    final String cellContent;
    switch (cell.getCellType()) {
        case Cell.CELL_TYPE_STRING:
            cellContent = cell.getStringCellValue();
            break;
        case Cell.CELL_TYPE_NUMERIC:
            cellContent = String.valueOf(cell.getNumericCellValue());
            break;
        case Cell.CELL_TYPE_BOOLEAN:
            cellContent = String.valueOf(cell.getBooleanCellValue());
            break;
        default:
            cellContent = "Don't know how to handle cell " + cell;
    }
    System.out.println(cellContent);
}

Try this:

  1. Unzip XLSX file
  2. Read XML files
  3. Compose and use data

Example code:

    public Workbook getTemplateData(String xlsxFile) {
    Workbook workbook = new Workbook();
    parseSharedStrings(xlsxFile);
    parseWorkesheet(xlsxFile, workbook);
    parseComments(xlsxFile, workbook);
    for (Worksheet worksheet : workbook.sheets) {
        worksheet.dimension = manager.getDimension(worksheet);
    }

    return workbook;
}

private void parseComments(String tmpFile, Workbook workbook) {
    try {
        FileInputStream fin = new FileInputStream(tmpFile);
        final ZipInputStream zin = new ZipInputStream(fin);
        InputStream in = getInputStream(zin);
        while (true) {
            ZipEntry entry = zin.getNextEntry();
            if (entry == null)
                break;

            String name = entry.getName();
            if (name.endsWith(".xml")) { //$NON-NLS-1$
                if (name.contains(COMMENTS)) {
                    parseComments(in, workbook);
                }
            }
            zin.closeEntry();
        }
        in.close();
        zin.close();
        fin.close();
    } catch (FileNotFoundException e) {
        System.out.println(e);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

private void parseComments(InputStream in, Workbook workbook) {
    try {
        DefaultHandler handler = getCommentHandler(workbook);
        SAXParser saxParser = getSAXParser();
        saxParser.parse(in, handler);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

private DefaultHandler getCommentHandler(Workbook workbook) {
    final Worksheet ws = workbook.sheets.get(0);
    return new DefaultHandler() {
        String lastTag = "";
        private Cell ccell;

        @Override
        public void startElement(String uri, String localName,
                String qName, Attributes attributes) throws SAXException {
            lastTag = qName;
            if (lastTag.equals("comment")) {
                String cellName = attributes.getValue("ref");
                int r = manager.getRowIndex(cellName);
                int c = manager.getColumnIndex(cellName);
                Row row = ws.rows.get(r);
                if (row == null) {
                    row = new Row();
                    row.index = r;
                    ws.rows.put(r, row);
                }
                ccell = row.cells.get(c);
                if (ccell == null) {
                    ccell = new Cell();
                    ccell.cellName = cellName;
                    row.cells.put(c, ccell);
                }
            }
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            String val = "";
            if (ccell != null && lastTag.equals("t")) {
                for (int i = start; i < start + length; i++) {
                    val += ch[i];
                }
                if (ccell.comment == null)
                    ccell.comment = val;
                else {
                    ccell.comment += val;
                }
            }
        }
    };
}

private void parseSharedStrings(String tmpFile) {
    try {
        FileInputStream fin = new FileInputStream(tmpFile);
        final ZipInputStream zin = new ZipInputStream(fin);
        InputStream in = getInputStream(zin);
        while (true) {
            ZipEntry entry = zin.getNextEntry();
            if (entry == null)
                break;
            String name = entry.getName();
            if (name.endsWith(".xml")) { //$NON-NLS-1$
                if (name.startsWith(SHARED_STRINGS)) {
                    parseStrings(in);
                }
            }
            zin.closeEntry();
        }
        in.close();
        zin.close();
        fin.close();
    } catch (FileNotFoundException e) {
        System.out.println(e);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public void parseWorkesheet(String tmpFile, Workbook workbook) {
    try {
        FileInputStream fin = new FileInputStream(tmpFile);
        final ZipInputStream zin = new ZipInputStream(fin);
        InputStream in = getInputStream(zin);
        while (true) {
            ZipEntry entry = zin.getNextEntry();
            if (entry == null)
                break;

            String name = entry.getName();
            if (name.endsWith(".xml")) { //$NON-NLS-1$
                if (name.contains("worksheets")) {
                    Worksheet worksheet = new Worksheet();
                    worksheet.name = name;
                    parseWorksheet(in, worksheet);
                    workbook.sheets.add(worksheet);
                }
            }
            zin.closeEntry();
        }
        in.close();
        zin.close();
        fin.close();
    } catch (FileNotFoundException e) {
        System.out.println(e);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public void parseWorksheet(InputStream in, Worksheet worksheet)
        throws IOException {
    // read sheet1 sharedStrings
    // styles, strings, formulas ...
    try {
        DefaultHandler handler = getDefaultHandler(worksheet);
        SAXParser saxParser = getSAXParser();
        saxParser.parse(in, handler);
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    }
}

where Workbook class:

public class Workbook {
Integer id = null;
public List<Worksheet> sheets = new ArrayList<Worksheet>();}

and Worksheet class:

public class Worksheet {
public Integer id = null;
public String name = null;
public String dimension = null;
public Map<Integer, Row> rows = new TreeMap<Integer, Row>();
public Map<Integer, Column> columns = new TreeMap<Integer, Column>();
public List<Span> spans = new ArrayList<Span>();}

and Row class:

public class Row {
public Integer id = null;
public Integer index = null;
public Row tmpRow = null;
public Style style = null;
public Double height = null;
public Map<Integer,Cell> cells = new TreeMap<Integer, Cell>();
public String spans = null;
public Integer customHeight = null;}

and Cell class:

public class Cell {
public Integer id = null;
public Integer rowIndex = null;
public Integer colIndex = null;
public String cellName = null;
public String text = null;
public String formula = null;
public String comment = null;
public Style style = null;
public Object value = null;
public Cell tmpCell = null;}

and Column class:

public class Column {
    public Integer index = null;
    public Style style = null;
    public String width = null;
    public Column tmpColumn = null;
}

and Span class:

public class Span {
    Integer id = null;
    String topLeft = null;
    String bottomRight = null;
}

I had to do this in .NET and I couldn't find any API's out there. My solution was to unzip the .xlsx, and dive right into manipulating the XML. It's not so bad once you create your helper classes and such.

There are some "gotchas" like the nodes all have to be sorted according to the way excel expects them, that I didn't find in the official docs. Excel has its own date timestamping, so you'll need to make a conversion formula.


I don't know if it is up to date for Excel 2007, but for earlier versions I use the JExcelAPI


This one maybe work for you, it can read/write Excel 2007 xlsx file. SmartXLS


I had to do this in .NET and I couldn't find any API's out there. My solution was to unzip the .xlsx, and dive right into manipulating the XML. It's not so bad once you create your helper classes and such.

There are some "gotchas" like the nodes all have to be sorted according to the way excel expects them, that I didn't find in the official docs. Excel has its own date timestamping, so you'll need to make a conversion formula.


This one maybe work for you, it can read/write Excel 2007 xlsx file. SmartXLS


Apache POI 3.5 have added support to all the OOXML (docx, xlsx, etc.)

See the XSSF sub project


If you want to work with xlsx, you'll have to use the org.apache.poi.ss package. This package has class XSSF, which can be used for parsing xlxs file. This Sample code works on Excel 2007 or later (.xlsx)

OPCPackage pkg = OPCPackage.open(new ByteArrayInputStream(data));
            Workbook wb = new XSSFWorkbook(pkg);
            Sheet sheet = wb.getSheetAt(0);
            Iterator<Row> rows = sheet.rowIterator();

        while (rows.hasNext()) {
            int j = 5;
            Person person= new Person ();
            Row row = rows.next();
            if (row.getRowNum() > 0) {
                person.setPersonId((int)(row.getCell(0).getNumericCellValue()));
                person.setFirstName(row.getCell(1).getStringCellValue());
                person.setLastName(row.getCell(2).getStringCellValue());
                person.setGroupId((int)(row.getCell(3).getNumericCellValue()));
                person.setUserName(row.getCell(4).getStringCellValue());
                person.setCreditId((int)(row.getCell(5).getNumericCellValue()));
            }

        }

Excel 1998-2003 file (.xls) - you may use HSSF library.
  just use :  Workbook wb = new HSSFWorkbook(pkg);

You can use Apache Tika for that:

String parse(File xlsxFile) {
    return new Tika().parseToString(xlsxFile);
}

Tika uses Apache POI for parsing XLSX files.

Here are some usage examples for Tiki.

Alternatively, if you want to handle each cell of the spreadsheet individually, here's one way to do this with POI:

void parse(File xlsx) {
    try (XSSFWorkbook workbook = new XSSFWorkbook(xlsx)) {
        // Handle each cell in each sheet
        workbook.forEach(sheet -> sheet.forEach(row -> row.forEach(this::handle)));
    }
    catch (InvalidFormatException | IOException e) {
        System.out.println("Can't parse file " + xlsx);
    }
}

void handle(Cell cell) {
    final String cellContent;
    switch (cell.getCellType()) {
        case Cell.CELL_TYPE_STRING:
            cellContent = cell.getStringCellValue();
            break;
        case Cell.CELL_TYPE_NUMERIC:
            cellContent = String.valueOf(cell.getNumericCellValue());
            break;
        case Cell.CELL_TYPE_BOOLEAN:
            cellContent = String.valueOf(cell.getBooleanCellValue());
            break;
        default:
            cellContent = "Don't know how to handle cell " + cell;
    }
    System.out.println(cellContent);
}

Aspose.Cells for Java supports XLSX format. You may find more details and further help in Aspose.Cells for Java Documentation. Please see if this helps.

Disclosure: I work as developer evangelist at Aspose.


I'm not very happy with any of the options so I ended up requesting the file in Excel 97 formate. The POI works great for that. Thanks everyone for the help.


I had to do this in .NET and I couldn't find any API's out there. My solution was to unzip the .xlsx, and dive right into manipulating the XML. It's not so bad once you create your helper classes and such.

There are some "gotchas" like the nodes all have to be sorted according to the way excel expects them, that I didn't find in the official docs. Excel has its own date timestamping, so you'll need to make a conversion formula.


I don't know if it is up to date for Excel 2007, but for earlier versions I use the JExcelAPI


Have you looked at the poorly obfuscated API?

Nevermind:

HSSF is the POI Project's pure Java implementation of the Excel '97(-2007) file format. It does not support the new Excel 2007 .xlsx OOXML file format, which is not OLE2 based.

You might consider using a JDBC-ODBC bridge instead.


If you want to work with xlsx, you'll have to use the org.apache.poi.ss package. This package has class XSSF, which can be used for parsing xlxs file. This Sample code works on Excel 2007 or later (.xlsx)

OPCPackage pkg = OPCPackage.open(new ByteArrayInputStream(data));
            Workbook wb = new XSSFWorkbook(pkg);
            Sheet sheet = wb.getSheetAt(0);
            Iterator<Row> rows = sheet.rowIterator();

        while (rows.hasNext()) {
            int j = 5;
            Person person= new Person ();
            Row row = rows.next();
            if (row.getRowNum() > 0) {
                person.setPersonId((int)(row.getCell(0).getNumericCellValue()));
                person.setFirstName(row.getCell(1).getStringCellValue());
                person.setLastName(row.getCell(2).getStringCellValue());
                person.setGroupId((int)(row.getCell(3).getNumericCellValue()));
                person.setUserName(row.getCell(4).getStringCellValue());
                person.setCreditId((int)(row.getCell(5).getNumericCellValue()));
            }

        }

Excel 1998-2003 file (.xls) - you may use HSSF library.
  just use :  Workbook wb = new HSSFWorkbook(pkg);

I don't know if it is up to date for Excel 2007, but for earlier versions I use the JExcelAPI


docx4j now covers xlsx as well.

"Why would you use docx4j to do this", I hear you ask, "rather than POI, which focuses on xlsx and binary xls?"

Probably because you like JAXB (as opposed to XML Beans), or you are already using docx4j for docx or pptx, and need to be able to do some stuff with xlsx as well.

Another possible reason is that the jar XML Beans generates from the OpenXML schemas is too big for your purposes. (To get around this, POI offers a 'lite' subset: the 'big' ooxml-schemas-1.0.jar is 14.5 MB! But if you need to support arbitrary spreadsheets, you'll probably need the complete jar). In contrast, the whole of docx4j/pptx4j/xlsx4j weighs in at about the same as POI's lite subset.

If you are processing spreadsheets only (ie not docx or pptx), and preceding paragraph is not a concern for you, then you would probably be best off using POI.


I had to do this in .NET and I couldn't find any API's out there. My solution was to unzip the .xlsx, and dive right into manipulating the XML. It's not so bad once you create your helper classes and such.

There are some "gotchas" like the nodes all have to be sorted according to the way excel expects them, that I didn't find in the official docs. Excel has its own date timestamping, so you'll need to make a conversion formula.


Might be a little late, but the beta POI now supports xlsx.


I'm not very happy with any of the options so I ended up requesting the file in Excel 97 formate. The POI works great for that. Thanks everyone for the help.


Have you looked at the poorly obfuscated API?

Nevermind:

HSSF is the POI Project's pure Java implementation of the Excel '97(-2007) file format. It does not support the new Excel 2007 .xlsx OOXML file format, which is not OLE2 based.

You might consider using a JDBC-ODBC bridge instead.


I don't know if it is up to date for Excel 2007, but for earlier versions I use the JExcelAPI


Aspose.Cells for Java supports XLSX format. You may find more details and further help in Aspose.Cells for Java Documentation. Please see if this helps.

Disclosure: I work as developer evangelist at Aspose.


Might be a little late, but the beta POI now supports xlsx.


This one maybe work for you, it can read/write Excel 2007 xlsx file. SmartXLS


I'm not very happy with any of the options so I ended up requesting the file in Excel 97 formate. The POI works great for that. Thanks everyone for the help.


Try this:

  1. Unzip XLSX file
  2. Read XML files
  3. Compose and use data

Example code:

    public Workbook getTemplateData(String xlsxFile) {
    Workbook workbook = new Workbook();
    parseSharedStrings(xlsxFile);
    parseWorkesheet(xlsxFile, workbook);
    parseComments(xlsxFile, workbook);
    for (Worksheet worksheet : workbook.sheets) {
        worksheet.dimension = manager.getDimension(worksheet);
    }

    return workbook;
}

private void parseComments(String tmpFile, Workbook workbook) {
    try {
        FileInputStream fin = new FileInputStream(tmpFile);
        final ZipInputStream zin = new ZipInputStream(fin);
        InputStream in = getInputStream(zin);
        while (true) {
            ZipEntry entry = zin.getNextEntry();
            if (entry == null)
                break;

            String name = entry.getName();
            if (name.endsWith(".xml")) { //$NON-NLS-1$
                if (name.contains(COMMENTS)) {
                    parseComments(in, workbook);
                }
            }
            zin.closeEntry();
        }
        in.close();
        zin.close();
        fin.close();
    } catch (FileNotFoundException e) {
        System.out.println(e);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

private void parseComments(InputStream in, Workbook workbook) {
    try {
        DefaultHandler handler = getCommentHandler(workbook);
        SAXParser saxParser = getSAXParser();
        saxParser.parse(in, handler);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

private DefaultHandler getCommentHandler(Workbook workbook) {
    final Worksheet ws = workbook.sheets.get(0);
    return new DefaultHandler() {
        String lastTag = "";
        private Cell ccell;

        @Override
        public void startElement(String uri, String localName,
                String qName, Attributes attributes) throws SAXException {
            lastTag = qName;
            if (lastTag.equals("comment")) {
                String cellName = attributes.getValue("ref");
                int r = manager.getRowIndex(cellName);
                int c = manager.getColumnIndex(cellName);
                Row row = ws.rows.get(r);
                if (row == null) {
                    row = new Row();
                    row.index = r;
                    ws.rows.put(r, row);
                }
                ccell = row.cells.get(c);
                if (ccell == null) {
                    ccell = new Cell();
                    ccell.cellName = cellName;
                    row.cells.put(c, ccell);
                }
            }
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            String val = "";
            if (ccell != null && lastTag.equals("t")) {
                for (int i = start; i < start + length; i++) {
                    val += ch[i];
                }
                if (ccell.comment == null)
                    ccell.comment = val;
                else {
                    ccell.comment += val;
                }
            }
        }
    };
}

private void parseSharedStrings(String tmpFile) {
    try {
        FileInputStream fin = new FileInputStream(tmpFile);
        final ZipInputStream zin = new ZipInputStream(fin);
        InputStream in = getInputStream(zin);
        while (true) {
            ZipEntry entry = zin.getNextEntry();
            if (entry == null)
                break;
            String name = entry.getName();
            if (name.endsWith(".xml")) { //$NON-NLS-1$
                if (name.startsWith(SHARED_STRINGS)) {
                    parseStrings(in);
                }
            }
            zin.closeEntry();
        }
        in.close();
        zin.close();
        fin.close();
    } catch (FileNotFoundException e) {
        System.out.println(e);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public void parseWorkesheet(String tmpFile, Workbook workbook) {
    try {
        FileInputStream fin = new FileInputStream(tmpFile);
        final ZipInputStream zin = new ZipInputStream(fin);
        InputStream in = getInputStream(zin);
        while (true) {
            ZipEntry entry = zin.getNextEntry();
            if (entry == null)
                break;

            String name = entry.getName();
            if (name.endsWith(".xml")) { //$NON-NLS-1$
                if (name.contains("worksheets")) {
                    Worksheet worksheet = new Worksheet();
                    worksheet.name = name;
                    parseWorksheet(in, worksheet);
                    workbook.sheets.add(worksheet);
                }
            }
            zin.closeEntry();
        }
        in.close();
        zin.close();
        fin.close();
    } catch (FileNotFoundException e) {
        System.out.println(e);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public void parseWorksheet(InputStream in, Worksheet worksheet)
        throws IOException {
    // read sheet1 sharedStrings
    // styles, strings, formulas ...
    try {
        DefaultHandler handler = getDefaultHandler(worksheet);
        SAXParser saxParser = getSAXParser();
        saxParser.parse(in, handler);
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    }
}

where Workbook class:

public class Workbook {
Integer id = null;
public List<Worksheet> sheets = new ArrayList<Worksheet>();}

and Worksheet class:

public class Worksheet {
public Integer id = null;
public String name = null;
public String dimension = null;
public Map<Integer, Row> rows = new TreeMap<Integer, Row>();
public Map<Integer, Column> columns = new TreeMap<Integer, Column>();
public List<Span> spans = new ArrayList<Span>();}

and Row class:

public class Row {
public Integer id = null;
public Integer index = null;
public Row tmpRow = null;
public Style style = null;
public Double height = null;
public Map<Integer,Cell> cells = new TreeMap<Integer, Cell>();
public String spans = null;
public Integer customHeight = null;}

and Cell class:

public class Cell {
public Integer id = null;
public Integer rowIndex = null;
public Integer colIndex = null;
public String cellName = null;
public String text = null;
public String formula = null;
public String comment = null;
public Style style = null;
public Object value = null;
public Cell tmpCell = null;}

and Column class:

public class Column {
    public Integer index = null;
    public Style style = null;
    public String width = null;
    public Column tmpColumn = null;
}

and Span class:

public class Span {
    Integer id = null;
    String topLeft = null;
    String bottomRight = null;
}

Have you looked at the poorly obfuscated API?

Nevermind:

HSSF is the POI Project's pure Java implementation of the Excel '97(-2007) file format. It does not support the new Excel 2007 .xlsx OOXML file format, which is not OLE2 based.

You might consider using a JDBC-ODBC bridge instead.


Apache POI 3.5 have added support to all the OOXML (docx, xlsx, etc.)

See the XSSF sub project


I'm not very happy with any of the options so I ended up requesting the file in Excel 97 formate. The POI works great for that. Thanks everyone for the help.


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to excel

Python: Pandas pd.read_excel giving ImportError: Install xlrd >= 0.9.0 for Excel support Converting unix time into date-time via excel How to increment a letter N times per iteration and store in an array? 'Microsoft.ACE.OLEDB.16.0' provider is not registered on the local machine. (System.Data) How to import an Excel file into SQL Server? Copy filtered data to another sheet using VBA Better way to find last used row Could pandas use column as index? Check if a value is in an array or not with Excel VBA How to sort dates from Oldest to Newest in Excel?

Examples related to parsing

Got a NumberFormatException while trying to parse a text file for objects Uncaught SyntaxError: Unexpected end of JSON input at JSON.parse (<anonymous>) Python/Json:Expecting property name enclosed in double quotes Correctly Parsing JSON in Swift 3 How to get response as String using retrofit without using GSON or any other library in android UIButton action in table view cell "Expected BEGIN_OBJECT but was STRING at line 1 column 1" How to convert an XML file to nice pandas dataframe? How to extract multiple JSON objects from one file? How to sum digits of an integer in java?