I need to read smallish (few MB at the most, UTF-8 encoded) XML files, rummage around looking at various elements and attributes, perhaps modify a few and write the XML back out again to disk (preferably with nice, indented formatting).
What would be the best XML parser for my needs? There are lots to choose from. Some I'm aware of are:
And of course the one in the JDK (I'm using Java 6). I'm familiar with Xerces but find it clunky.
Recommendations?
I have found dom4j to be the tool for working with XML. Especially compared to Xerces.
If you care less about performance, I'm a big fan of Apache Digester, since it essentially lets you map directly from XML to Java Beans.
Otherwise, you have to first parse, and then construct your objects.
In addition to SAX and DOM, there is StAX parsing, available through XMLStreamReader, which is an XML pull parser.
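To make the pull model concrete, here is a minimal sketch using the javax.xml.stream API that ships with Java 6. The element name `item`, the attribute `id`, and the class and method names are made up for illustration; they are not from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxExample {

    // Collects the "id" attribute of every <item> element, in document order.
    // "item" and "id" are hypothetical names chosen for this sketch.
    static List<String> itemIds(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> ids = new ArrayList<String>();
        try {
            while (reader.hasNext()) {
                // The caller pulls the next event; nothing is pushed to a handler.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    ids.add(reader.getAttributeValue(null, "id"));
                }
            }
        } finally {
            reader.close();
        }
        return ids;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<items><item id=\"a\"/><item id=\"b\"/></items>";
        System.out.println(itemIds(xml)); // prints [a, b]
    }
}
```

Because the reader is forward-only, this style suits scanning a file for a few values; for in-place modification a tree API such as DOM is usually more convenient.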
I think you should not consider any specific parser implementation. Java API for XML Processing lets you use any conforming parser implementation in a standard way. The code should be much more portable, and when you realise that a specific parser has grown too old, you can replace it with another without changing a line of your code (if you do it correctly).
Basically there are three ways of handling XML in a standard way:
Forget about proprietary APIs such as JDOM or the Apache ones (e.g. Apache Xerces XMLSerializer), because they will tie you to a specific implementation that can evolve over time or lose backwards compatibility, which will force you to change your code when you want to upgrade to a new version of JDOM or whatever parser you use. If you stick to the Java standard API (using factories and interfaces), your code will be much more modular and maintainable.
Needless to say, all of the parsers proposed (I haven't checked every one, but I'm almost sure) provide a JAXP implementation, so technically you can use any of them.
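As a rough sketch of the factories-and-interfaces approach applied to the question's use case (read, modify, write back indented), the following uses only standard JAXP classes. The `task` element, the `status` attribute, and the class and method names are assumptions made for the example. It works on strings to stay self-contained; for files, `builder.parse(File)` and `new StreamResult(File)` are the analogous calls.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class JaxpDomExample {

    // Parses the XML, sets status="done" on every <task> element,
    // and serializes the tree back to a string with indentation requested.
    // "task" and "status" are hypothetical names chosen for this sketch.
    static String markAllDone(String xml) throws Exception {
        // The factory picks whichever JAXP-conforming parser is configured.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList tasks = doc.getElementsByTagName("task");
        for (int i = 0; i < tasks.getLength(); i++) {
            ((Element) tasks.item(i)).setAttribute("status", "done");
        }

        // A Transformer created without a stylesheet performs an identity
        // transform, which is the standard way to serialize a DOM tree.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(markAllDone(
                "<tasks><task status=\"open\">a</task><task>b</task></tasks>"));
    }
}
```

No implementation class is named anywhere in the code, so swapping the underlying parser or transformer should not require source changes. The exact indentation width is left to the transformer implementation.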
Here is a nice comparison of DOM, SAX, StAX & TrAX (Source: http://download.oracle.com/docs/cd/E17802_01/webservices/webservices/docs/1.6/tutorial/doc/SJSXP2.html )
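| Feature          | StAX            | SAX             | DOM            | TrAX      |
|------------------|-----------------|-----------------|----------------|-----------|
| API Type         | Pull, streaming | Push, streaming | In-memory tree | XSLT rule |
| Ease of Use      | High            | Medium          | High           | Medium    |
| XPath Capability | No              | No              | Yes            | Yes       |
| CPU & Memory     | Good            | Good            | Varies         | Varies    |
| Forward Only     | Yes             | Yes             | No             | No        |
| Read XML         | Yes             | Yes             | Yes            | Yes       |
| Write XML        | Yes             | No              | Yes            | Yes       |
| CRUD             | No              | No              | Yes            | No        |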
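The "XPath Capability" row is what makes DOM handy for rummaging through elements and attributes. A small sketch with the standard javax.xml.xpath API follows; the `library`/`book`/`title` structure and the class and method names are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathExample {

    // Returns the titles of all <book> elements whose lang attribute is "en".
    // The document structure is hypothetical, chosen only for this sketch.
    static List<String> englishTitles(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(
                "/library/book[@lang='en']/title", doc, XPathConstants.NODESET);

        List<String> titles = new ArrayList<String>();
        for (int i = 0; i < hits.getLength(); i++) {
            titles.add(hits.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library>"
                + "<book lang=\"en\"><title>A</title></book>"
                + "<book lang=\"de\"><title>B</title></book>"
                + "<book lang=\"en\"><title>C</title></book>"
                + "</library>";
        System.out.println(englishTitles(xml)); // prints [A, C]
    }
}
```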
I wouldn't recommend this if you've got a lot of "thinking" in your app, but using XSLT could be better (and potentially faster with XSLT-to-bytecode compilation) than Java manipulation.
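For a sense of what the XSLT route looks like from Java, here is a minimal sketch using the standard javax.xml.transform (TrAX) API. The stylesheet is an identity transform with one override that renames a hypothetical `old` element to `new`; the element names and the class and method names are assumptions for illustration. The stylesheet is compiled once into a `Templates` object, which can then be reused for many documents.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltExample {

    // Identity transform plus one template that renames <old> to <new>.
    // "old" and "new" are hypothetical element names for this sketch.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"old\">"
        + "<new><xsl:apply-templates select=\"@*|node()\"/></new>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        // Compile the stylesheet once; Templates objects are reusable.
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
        Transformer transformer = templates.newTransformer();

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<root><old id=\"1\">x</old></root>"));
    }
}
```

The modification logic lives entirely in the stylesheet, which fits simple structural rewrites; anything that needs real application logic is usually easier to express in Java against a DOM tree.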
Simple XML http://simple.sourceforge.net/ is very easy for (de)serializing objects.
Source: Stackoverflow.com