[java] Best XML parser for Java

I need to read smallish (few MB at the most, UTF-8 encoded) XML files, rummage around looking at various elements and attributes, perhaps modify a few and write the XML back out again to disk (preferably with nice, indented formatting).

What would be the best XML parser for my needs? There are lots to choose from. Some I'm aware of are:

And of course the one in the JDK (I'm using Java 6). I'm familiar with Xerces but find it clunky.

Recommendations?

This question is related to java xml parsing

The answer is


I have found dom4j to be the tool for working with XML. Especially compared to Xerces.


If you care less about performance, I'm a big fan of Apache Digester, since it essentially lets you map directly from XML to Java Beans.

Otherwise, you have to first parse, and then construct your objects.


In addition to SAX and DOM there is STaX parsing available using XMLStreamReader which is an xml pull parser.


I think you should not consider any specific parser implementation. Java API for XML Processing lets you use any conforming parser implementation in a standard way. The code should be much more portable, and when you realise that a specific parser has grown too old, you can replace it with another without changing a line of your code (if you do it correctly).

Basically there are three ways of handling XML in a standard way:

  • SAX This is the simplest API. You read the XML by defining a Handler class that receives the data inside elements/attributes when the XML gets processed in a serial way. It is faster and simpler if you only plan to read some attributes/elements and/or write some values back (your case).
  • DOM This method creates an object tree which lets you modify/access it randomly so it is better for complex XML manipulation and handling.
  • StAX This is in the middle of the path between SAX and DOM. You just write code to pull the data from the parser you are interested in when it is processed.

Forget about proprietary APIs such as JDOM or Apache ones (i.e. Apache Xerces XMLSerializer) because will tie you to a specific implementation that can evolve in time or lose backwards compatibility, which will make you change your code in the future when you want to upgrade to a new version of JDOM or whatever parser you use. If you stick to Java standard API (using factories and interfaces) your code will be much more modular and maintainable.

There is no need to say that all (I haven't checked all, but I'm almost sure) of the parsers proposed comply with a JAXP implementation so technically you can use all, no matter which.


Here is a nice comparision on DOM, SAX, StAX & TrAX (Source: http://download.oracle.com/docs/cd/E17802_01/webservices/webservices/docs/1.6/tutorial/doc/SJSXP2.html )

Feature                  StAX                  SAX                      DOM                  TrAX

API Type                Pull,streaming     Push,streaming    In memory tree    XSLT Rule

Ease of Use          High                    Medium                 High                    Medium

XPath Capability   No                       No                        Yes                      Yes

CPU & Memory     Good                  Good                    Varies                  Varies

Forward Only        Yes                    Yes                        No                       No

Read XML              Yes                    Yes                        Yes                     Yes

Write XML              Yes                    No                          Yes                     Yes

CRUD                      No                      No                         Yes                     No


If you care less about performance, I'm a big fan of Apache Digester, since it essentially lets you map directly from XML to Java Beans.

Otherwise, you have to first parse, and then construct your objects.


In addition to SAX and DOM there is STaX parsing available using XMLStreamReader which is an xml pull parser.


I wouldn't recommended this is you've got a lot of "thinking" in your app, but using XSLT could be better (and potentially faster with XSLT-to-bytecode compilation) than Java manipulation.


Simple XML http://simple.sourceforge.net/ is very easy for (de)serializing objects.


If you care less about performance, I'm a big fan of Apache Digester, since it essentially lets you map directly from XML to Java Beans.

Otherwise, you have to first parse, and then construct your objects.


Simple XML http://simple.sourceforge.net/ is very easy for (de)serializing objects.


I wouldn't recommended this is you've got a lot of "thinking" in your app, but using XSLT could be better (and potentially faster with XSLT-to-bytecode compilation) than Java manipulation.


I have found dom4j to be the tool for working with XML. Especially compared to Xerces.


I think you should not consider any specific parser implementation. Java API for XML Processing lets you use any conforming parser implementation in a standard way. The code should be much more portable, and when you realise that a specific parser has grown too old, you can replace it with another without changing a line of your code (if you do it correctly).

Basically there are three ways of handling XML in a standard way:

  • SAX This is the simplest API. You read the XML by defining a Handler class that receives the data inside elements/attributes when the XML gets processed in a serial way. It is faster and simpler if you only plan to read some attributes/elements and/or write some values back (your case).
  • DOM This method creates an object tree which lets you modify/access it randomly so it is better for complex XML manipulation and handling.
  • StAX This is in the middle of the path between SAX and DOM. You just write code to pull the data from the parser you are interested in when it is processed.

Forget about proprietary APIs such as JDOM or Apache ones (i.e. Apache Xerces XMLSerializer) because will tie you to a specific implementation that can evolve in time or lose backwards compatibility, which will make you change your code in the future when you want to upgrade to a new version of JDOM or whatever parser you use. If you stick to Java standard API (using factories and interfaces) your code will be much more modular and maintainable.

There is no need to say that all (I haven't checked all, but I'm almost sure) of the parsers proposed comply with a JAXP implementation so technically you can use all, no matter which.


I have found dom4j to be the tool for working with XML. Especially compared to Xerces.


I think you should not consider any specific parser implementation. Java API for XML Processing lets you use any conforming parser implementation in a standard way. The code should be much more portable, and when you realise that a specific parser has grown too old, you can replace it with another without changing a line of your code (if you do it correctly).

Basically there are three ways of handling XML in a standard way:

  • SAX This is the simplest API. You read the XML by defining a Handler class that receives the data inside elements/attributes when the XML gets processed in a serial way. It is faster and simpler if you only plan to read some attributes/elements and/or write some values back (your case).
  • DOM This method creates an object tree which lets you modify/access it randomly so it is better for complex XML manipulation and handling.
  • StAX This is in the middle of the path between SAX and DOM. You just write code to pull the data from the parser you are interested in when it is processed.

Forget about proprietary APIs such as JDOM or Apache ones (i.e. Apache Xerces XMLSerializer) because will tie you to a specific implementation that can evolve in time or lose backwards compatibility, which will make you change your code in the future when you want to upgrade to a new version of JDOM or whatever parser you use. If you stick to Java standard API (using factories and interfaces) your code will be much more modular and maintainable.

There is no need to say that all (I haven't checked all, but I'm almost sure) of the parsers proposed comply with a JAXP implementation so technically you can use all, no matter which.


If you care less about performance, I'm a big fan of Apache Digester, since it essentially lets you map directly from XML to Java Beans.

Otherwise, you have to first parse, and then construct your objects.


In addition to SAX and DOM there is STaX parsing available using XMLStreamReader which is an xml pull parser.


Here is a nice comparision on DOM, SAX, StAX & TrAX (Source: http://download.oracle.com/docs/cd/E17802_01/webservices/webservices/docs/1.6/tutorial/doc/SJSXP2.html )

Feature                  StAX                  SAX                      DOM                  TrAX

API Type                Pull,streaming     Push,streaming    In memory tree    XSLT Rule

Ease of Use          High                    Medium                 High                    Medium

XPath Capability   No                       No                        Yes                      Yes

CPU & Memory     Good                  Good                    Varies                  Varies

Forward Only        Yes                    Yes                        No                       No

Read XML              Yes                    Yes                        Yes                     Yes

Write XML              Yes                    No                          Yes                     Yes

CRUD                      No                      No                         Yes                     No


I have found dom4j to be the tool for working with XML. Especially compared to Xerces.


I wouldn't recommended this is you've got a lot of "thinking" in your app, but using XSLT could be better (and potentially faster with XSLT-to-bytecode compilation) than Java manipulation.


I think you should not consider any specific parser implementation. Java API for XML Processing lets you use any conforming parser implementation in a standard way. The code should be much more portable, and when you realise that a specific parser has grown too old, you can replace it with another without changing a line of your code (if you do it correctly).

Basically there are three ways of handling XML in a standard way:

  • SAX This is the simplest API. You read the XML by defining a Handler class that receives the data inside elements/attributes when the XML gets processed in a serial way. It is faster and simpler if you only plan to read some attributes/elements and/or write some values back (your case).
  • DOM This method creates an object tree which lets you modify/access it randomly so it is better for complex XML manipulation and handling.
  • StAX This is in the middle of the path between SAX and DOM. You just write code to pull the data from the parser you are interested in when it is processed.

Forget about proprietary APIs such as JDOM or Apache ones (i.e. Apache Xerces XMLSerializer) because will tie you to a specific implementation that can evolve in time or lose backwards compatibility, which will make you change your code in the future when you want to upgrade to a new version of JDOM or whatever parser you use. If you stick to Java standard API (using factories and interfaces) your code will be much more modular and maintainable.

There is no need to say that all (I haven't checked all, but I'm almost sure) of the parsers proposed comply with a JAXP implementation so technically you can use all, no matter which.


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to xml

strange error in my Animation Drawable How do I POST XML data to a webservice with Postman? PHP XML Extension: Not installed How to add a Hint in spinner in XML Generating Request/Response XML from a WSDL Manifest Merger failed with multiple errors in Android Studio How to set menu to Toolbar in Android How to add colored border on cardview? Android: ScrollView vs NestedScrollView WARNING: Exception encountered during context initialization - cancelling refresh attempt

Examples related to parsing

Got a NumberFormatException while trying to parse a text file for objects Uncaught SyntaxError: Unexpected end of JSON input at JSON.parse (<anonymous>) Python/Json:Expecting property name enclosed in double quotes Correctly Parsing JSON in Swift 3 How to get response as String using retrofit without using GSON or any other library in android UIButton action in table view cell "Expected BEGIN_OBJECT but was STRING at line 1 column 1" How to convert an XML file to nice pandas dataframe? How to extract multiple JSON objects from one file? How to sum digits of an integer in java?