[java] Dealing with "Xerces hell" in Java/Maven?

In my office, the mere mention of the word Xerces is enough to incite murderous rage from developers. A cursory glance at the other Xerces questions on SO seems to indicate that almost all Maven users are "touched" by this problem at some point. Unfortunately, understanding the problem requires a bit of knowledge about the history of Xerces...

History

  • Xerces is the most widely used XML parser in the Java ecosystem. Almost every library or framework written in Java uses Xerces in some capacity (transitively, if not directly).

  • The Xerces jars included in the official binaries are, to this day, not versioned. For example, the Xerces 2.11.0 implementation jar is named xercesImpl.jar and not xercesImpl-2.11.0.jar.

  • The Xerces team does not use Maven, which means they do not upload an official release to Maven Central.

  • Xerces used to be released as a single jar (xerces.jar), but was split into two jars, one containing the API (xml-apis.jar) and one containing the implementations of those APIs (xercesImpl.jar). Many older Maven POMs still declare a dependency on xerces.jar. At some point in the past, Xerces was also released as xmlParserAPIs.jar, which some older POMs also depend on.

  • The versions assigned to the xml-apis and xercesImpl jars by those who deploy their jars to Maven repositories are often different. For example, xml-apis might be given version 1.3.03 and xercesImpl might be given version 2.8.0, even though both are from Xerces 2.8.0. This is because people often tag the xml-apis jar with the version of the specifications that it implements. There is a very nice, but incomplete, breakdown of this here.

  • To complicate matters, Xerces is the XML parser used in the reference implementation of the Java API for XML Processing (JAXP), included in the JRE. The implementation classes are repackaged under the com.sun.* namespace, which makes it dangerous to access them directly, as they may not be available in some JREs. However, not all of the Xerces functionality is exposed via the java.* and javax.* APIs; for example, there is no API that exposes Xerces serialization.

  • Adding to the confusing mess, almost all servlet containers (JBoss, Jetty, Glassfish, Tomcat, etc.) ship with Xerces in one or more of their /lib folders.

Problems

Conflict Resolution

For some -- or perhaps all -- of the reasons above, many organizations publish and consume custom builds of Xerces in their POMs. This is not really a problem if you have a small application and are only using Maven Central, but it quickly becomes an issue for enterprise software where Artifactory or Nexus is proxying multiple repositories (JBoss, Hibernate, etc.):

[Image: xml-apis proxied by Artifactory]

For example, organization A might publish xml-apis as:

<groupId>org.apache.xerces</groupId>
<artifactId>xml-apis</artifactId>
<version>2.9.1</version>

Meanwhile, organization B might publish the same jar as:

<groupId>xml-apis</groupId>
<artifactId>xml-apis</artifactId>
<version>1.3.04</version>

Although B's jar is a lower version than A's jar, Maven does not know that they are the same artifact because they have different groupIds. Thus, it cannot perform conflict resolution and both jars will be included as resolved dependencies:
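To make the "different groupIds" point concrete, here is a deliberately simplified model of mediation. It is not Maven's code: it only assumes that an artifact is identified by groupId:artifactId (Maven also considers type and classifier, ignored here) and that the input is already in nearest-first order, so the first entry per key wins. The class name MediationModel is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of Maven's dependency mediation (not Maven's actual code).
// An artifact is identified by groupId:artifactId (type/classifier ignored here),
// so coordinates that differ in groupId are never compared with each other.
class MediationModel {

    static final class Coordinate {
        final String groupId;
        final String artifactId;
        final String version;

        Coordinate(String groupId, String artifactId, String version) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
        }

        String key() {
            return groupId + ":" + artifactId;
        }

        @Override
        public String toString() {
            return key() + ":" + version;
        }
    }

    // Assumes the list is already in "nearest first" order, so the first coordinate
    // seen for a key wins, mirroring Maven's nearest-wins rule.
    static Map<String, Coordinate> mediate(List<Coordinate> dependencies) {
        Map<String, Coordinate> resolved = new LinkedHashMap<>();
        for (Coordinate c : dependencies) {
            resolved.putIfAbsent(c.key(), c);
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<Coordinate> dependencies = Arrays.asList(
                new Coordinate("org.apache.xerces", "xml-apis", "2.9.1"), // published by A
                new Coordinate("xml-apis", "xml-apis", "1.3.04"));        // published by B
        // Different keys, so both jars survive mediation and end up on the classpath.
        mediate(dependencies).values().forEach(System.out::println);
    }
}
```

Running main prints both coordinates: the same jar under two identities, which is exactly the duplicate shown in the resolved dependency list.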

[Image: resolved dependencies with multiple xml-apis]

Classloader Hell

As mentioned above, the JRE ships with Xerces in the JAXP RI. While it would be nice to mark all Xerces Maven dependencies as <exclusion>s or as <provided>, the third-party code you depend on may or may not work with the JAXP version in the JDK you're using. In addition, you have the Xerces jars shipped in your servlet container to contend with. This leaves you with a number of choices: Do you delete the servlet container's version and hope that your container runs on the JAXP version? Is it better to leave the servlet container's version and hope that your application frameworks run on it? If one or two of the unresolved conflicts outlined above manage to slip into your product (which is easy in a large organization), you quickly find yourself in classloader hell, wondering which version of Xerces the classloader is picking at runtime and whether or not it will pick the same jar on Windows and Linux (probably not).
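One way to stop guessing which Xerces the classloader picked is to ask at runtime. The sketch below (the class name JaxpProbe is hypothetical) prints the JAXP factory classes actually selected and the jar or directory each was loaded from; classes defined by the JDK's bootstrap loader have no CodeSource, so they are reported with a marker instead. Running it in each environment (IDE, Windows, Linux, inside the container) makes differences visible.

```java
import java.net.URL;
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Diagnostic sketch: reports which JAXP implementation classes are selected at runtime
// and where each one was loaded from.
class JaxpProbe {

    // Returns the jar/directory a class came from, or a marker for JDK classes,
    // which have no CodeSource because the bootstrap loader defines them.
    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<JDK runtime (bootstrap)>";
        }
        URL location = source.getLocation();
        return location.toString();
    }

    static String describe(Object factory) {
        Class<?> type = factory.getClass();
        return type.getName() + " from " + locate(type);
    }

    public static void main(String[] args) {
        System.out.println("DocumentBuilderFactory: " + describe(DocumentBuilderFactory.newInstance()));
        System.out.println("SAXParserFactory:       " + describe(SAXParserFactory.newInstance()));
        System.out.println("TransformerFactory:     " + describe(TransformerFactory.newInstance()));
    }
}
```

If a line shows a path into a container /lib folder or a WEB-INF/lib jar rather than the JDK marker, that jar is the one winning on that machine.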

Solutions?

We've tried marking all Xerces Maven dependencies as <provided> or as an <exclusion>, but this is difficult to enforce (especially with a large team) given that the artifacts have so many aliases (xml-apis, xerces, xercesImpl, xmlParserAPIs, etc.). Additionally, our third party libs/frameworks may not run on the JAXP version or the version provided by a servlet container.

How can we best address this problem with Maven? Do we have to exercise such fine-grained control over our dependencies, and then rely on tiered classloading? Is there some way to globally exclude all Xerces dependencies, and force all of our frameworks/libs to use the JAXP version?


UPDATE: Joshua Spiewak has uploaded a patched version of the Xerces build scripts to XERCESJ-1454 that allows for upload to Maven Central. Vote/watch/contribute to this issue and let's fix this problem once and for all.



Apparently xerces:xml-apis:1.4.01 is no longer in Maven Central, which is, however, what xerces:xercesImpl:2.11.0 references.

This works for me:

<dependency>
  <groupId>xerces</groupId>
  <artifactId>xercesImpl</artifactId>
  <version>2.11.0</version>
  <exclusions>
    <exclusion>
      <groupId>xerces</groupId>
      <artifactId>xml-apis</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>xml-apis</groupId>
  <artifactId>xml-apis</artifactId>
  <version>1.4.01</version>
</dependency>

I know this doesn't answer the question exactly, but for people coming in from Google who happen to use Gradle for their dependency management:

I managed to get rid of all xerces/Java8 issues with Gradle like this:

configurations {
    all*.exclude group: 'xml-apis'
    all*.exclude group: 'xerces'
}

You should debug first, to help identify your level of XML hell. In my opinion, the first step is to add

-Djavax.xml.parsers.SAXParserFactory=com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
-Djavax.xml.transform.TransformerFactory=com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl

to the command line. If that works, then start excluding libraries. If not, then add

-Djaxp.debug=1

to the command-line.
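If you can't easily change the command line (e.g. inside a test harness), the same three properties can be set programmatically. This is a sketch under two assumptions: it runs before any library creates its factories (instances a library already created and kept are unaffected), and the runtime is OpenJDK-based, since the com.sun.* class names above are internal to those JDKs. The class name PinJdkJaxp is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Programmatic equivalent of the -D flags above: pins the JDK's built-in JAXP
// implementations. These com.sun.* class names exist only in OpenJDK-based runtimes.
class PinJdkJaxp {

    static final String JDK_DOCUMENT_BUILDER_FACTORY =
            "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl";
    static final String JDK_SAX_PARSER_FACTORY =
            "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl";
    static final String JDK_TRANSFORMER_FACTORY =
            "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl";

    // Call early: factory instances that libraries have already created are not affected.
    static void pin() {
        System.setProperty("javax.xml.parsers.DocumentBuilderFactory", JDK_DOCUMENT_BUILDER_FACTORY);
        System.setProperty("javax.xml.parsers.SAXParserFactory", JDK_SAX_PARSER_FACTORY);
        System.setProperty("javax.xml.transform.TransformerFactory", JDK_TRANSFORMER_FACTORY);
    }

    public static void main(String[] args) {
        pin();
        // If pinning worked, these print the com.sun.* class names set above.
        System.out.println(DocumentBuilderFactory.newInstance().getClass().getName());
        System.out.println(SAXParserFactory.newInstance().getClass().getName());
    }
}
```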


Every Maven project should stop depending on Xerces; they probably don't really need it. XML APIs and an implementation have been part of Java since 1.4. There is no need to depend on Xerces or XML APIs; it's like saying you depend on Java or Swing. This is implicit.

If I were in charge of a Maven repo, I'd write a script to recursively remove Xerces dependencies and write a README that says this repo requires Java 1.4.

Anything that actually breaks because it references Xerces directly via org.apache imports needs either a code fix to bring it up to the Java 1.4 level (which has been needed since 2002) or a solution at the JVM level via endorsed libs, not in Maven.
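To illustrate "this is implicit": the sketch below parses XML using only the javax.xml, org.w3c.dom and org.xml.sax types that ship with the JDK, with no org.apache.xerces import and no Xerces jar required on the classpath. The class name JdkOnlyParsing is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Parses XML through the standard JAXP API only; whichever implementation JAXP
// selects (the JDK's internal copy by default) does the work.
class JdkOnlyParsing {

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<deps><dep>xml-apis</dep><dep>xercesImpl</dep></deps>");
        System.out.println(doc.getElementsByTagName("dep").getLength()); // prints 2
    }
}
```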


There have been 2.11.0 JARs (and source JARs!) of Xerces in Maven Central since 20 February 2013! See Xerces in Maven Central. I wonder why they haven't resolved https://issues.apache.org/jira/browse/XERCESJ-1454...

I've used:

<dependency>
    <groupId>xerces</groupId>
    <artifactId>xercesImpl</artifactId>
    <version>2.11.0</version>
</dependency>

and all dependencies have resolved fine - even proper xml-apis-1.4.01!

And what's most important (and what wasn't obvious in the past) - the JAR in Maven Central is the same JAR as in the official Xerces-J-bin.2.11.0.zip distribution.

However, I couldn't find the xml-schema-1.1-beta version; it can't be published as a Maven classifier of the same artifact because it has additional dependencies.
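To confirm which standalone Xerces (if any) actually ended up on the classpath, you can ask Xerces itself. This sketch assumes xercesImpl's org.apache.xerces.impl.Version class with its static getVersion() method; reflection is used so the code compiles and runs whether or not xercesImpl is present. The class name XercesVersionProbe is hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Optional;

// Reports the version string of a standalone Xerces-J visible to the given loader, if any.
// Assumes xercesImpl provides org.apache.xerces.impl.Version.getVersion().
class XercesVersionProbe {

    static Optional<String> standaloneXercesVersion(ClassLoader loader) {
        try {
            Class<?> version = Class.forName("org.apache.xerces.impl.Version", false, loader);
            Method getVersion = version.getMethod("getVersion");
            return Optional.of((String) getVersion.invoke(null));
        } catch (ClassNotFoundException e) {
            return Optional.empty(); // no standalone Xerces visible to this loader
        } catch (ReflectiveOperationException | LinkageError e) {
            return Optional.of("present but unreadable: " + e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.println(standaloneXercesVersion(loader).orElse("no standalone Xerces on the classpath"));
    }
}
```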


You could use the Maven Enforcer Plugin with the bannedDependencies rule. This would allow you to ban all the aliases that you don't want and allow only the one you do want. These rules will fail the Maven build of your project when violated. Furthermore, if this rule applies to all projects in an enterprise, you could put the plugin configuration in a corporate parent POM.
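The rule itself is configured in the POM, but the idea is easy to model: exclude every Xerces alias by groupId:artifactId pattern, then re-allow the single coordinate you want. The sketch below is a hypothetical, simplified model, not the plugin's code; it assumes patterns of the form groupId[:artifactId[:version]] with * as a wildcard, and the exclude/include lists are only an example built from the aliases in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of a banned-dependency check (not the Enforcer plugin's code).
// Patterns are groupId[:artifactId[:version]]; '*' is a wildcard and missing parts match anything.
class BannedXerces {

    static final List<String> EXCLUDES = Arrays.asList(
            "xerces", "xml-apis", "org.apache.xerces", "*:xmlParserAPIs");
    static final List<String> INCLUDES = Arrays.asList(
            "xerces:xercesImpl:2.11.0", "xml-apis:xml-apis:1.4.01");

    static boolean globMatches(String glob, String text) {
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) regex.append(".*");
            regex.append(Pattern.quote(parts[i]));
        }
        return text.matches(regex.toString());
    }

    static boolean matches(String pattern, String coordinate) {
        String[] p = pattern.split(":");
        String[] c = coordinate.split(":");
        for (int i = 0; i < p.length; i++) {
            if (i >= c.length || !globMatches(p[i], c[i])) return false;
        }
        return true;
    }

    // Banned = matches some exclude and no include.
    static boolean isBanned(String coordinate) {
        boolean excluded = EXCLUDES.stream().anyMatch(p -> matches(p, coordinate));
        boolean included = INCLUDES.stream().anyMatch(p -> matches(p, coordinate));
        return excluded && !included;
    }

    public static void main(String[] args) {
        // Sample coordinates for illustration.
        for (String dep : Arrays.asList("xerces:xercesImpl:2.11.0", "xerces:xmlParserAPIs:2.6.2",
                "org.apache.xerces:xml-apis:2.9.1", "xml-apis:xml-apis:1.4.01")) {
            System.out.println(dep + (isBanned(dep) ? "  BANNED" : "  ok"));
        }
    }
}
```

Putting the equivalent exclude/include lists in the corporate parent POM is what keeps the many aliases in one place instead of in every project.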



My friend, that's very simple; here is an example:

<dependency>
    <groupId>xalan</groupId>
    <artifactId>xalan</artifactId>
    <version>2.7.2</version>
    <scope>${my-scope}</scope>
    <exclusions>
        <exclusion>
            <groupId>xml-apis</groupId>
            <artifactId>xml-apis</artifactId>
        </exclusion>
    </exclusions>
</dependency>

And if you want to check from a terminal that your Maven tree has no problems (this pipeline uses grep and less, so on Windows run it from a Unix-like shell such as Git Bash rather than the plain console):

mvn dependency:tree -Dverbose | grep --color=always '(.* conflict\|^' | less -r
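If a Unix-like shell isn't available, the same check can be done with a few lines of Java that filter the tree output for mediated conflicts. This sketch assumes that verbose dependency:tree output marks the losing versions with the text "omitted for conflict"; the class name TreeConflicts is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Filters `mvn dependency:tree -Dverbose` output for lines describing mediated conflicts.
// Assumption: such lines contain "omitted for conflict".
class TreeConflicts {

    static List<String> conflicts(List<String> treeLines) {
        return treeLines.stream()
                .filter(line -> line.contains("omitted for conflict"))
                .map(String::trim)
                .collect(Collectors.toList());
    }

    // Usage after compiling with javac: mvn dependency:tree -Dverbose | java TreeConflicts
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        conflicts(in.lines().collect(Collectors.toList())).forEach(System.out::println);
    }
}
```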

Frankly, pretty much everything that we've encountered works just fine with the JAXP version, so we always exclude xml-apis and xercesImpl.


There is another option that hasn't been explored here: declaring Xerces dependencies in Maven as optional:

<dependency>
   <groupId>xerces</groupId>
   <artifactId>xercesImpl</artifactId>
   <version>...</version>
   <optional>true</optional>
</dependency>

Basically, what this does is force all dependents to declare their own version of Xerces: optional dependencies are not passed on transitively, so otherwise Xerces is simply missing from their classpath. If they want to override this dependency, they are welcome to do so, but then they will own the potential problem.

This creates a strong incentive for downstream projects to:

  • Make an active decision. Do they go with the same version of Xerces or use something else?
  • Actually test their parsing (e.g. through unit tests) and classloading, and avoid cluttering up their classpath.

Not all developers keep track of newly introduced dependencies (e.g. with mvn dependency:tree). This approach will immediately bring the matter to their attention.

It works quite well at our organization. Before its introduction, we used to live in the same hell the OP is describing.


I guess there is one question you need to answer:

Does there exist a xerces*.jar that everything in your application can live with?

If not, you are basically screwed and would have to use something like OSGi, which allows you to have different versions of a library loaded at the same time. Be warned that it basically replaces jar version issues with classloader issues ...

If there exists such a version, you could make your repository return that version for all kinds of dependencies. It's an ugly hack and would end up with the same Xerces implementation on your classpath multiple times, but that's better than having multiple different versions of Xerces.

You could exclude every dependency on Xerces and add one on the version you want to use.

I wonder if you could write some kind of version resolution strategy as a Maven plugin. This would probably be the nicest solution, but if it's feasible at all, it needs some research and coding.

For the version contained in your runtime environment, you'll have to make sure it either gets removed from the application classpath or the application jars get considered first for classloading before the lib folder of the server get considered.

So to wrap it up: It's a mess and that won't change.


Besides excluding, what would help is modular dependencies.

With flat classloading (a standalone app) or semi-hierarchical classloading (JBoss AS/EAP 5.x), this was a problem.

But with modular frameworks like OSGi and JBoss Modules, this is not so much pain anymore. The libraries may use whichever library they want, independently.

Of course, it's still most recommendable to stick with just a single implementation and version, but if there's no other way (using extra features from more libs), then modularizing might save you.

A good example of JBoss Modules in action is, naturally, JBoss AS 7 / EAP 6 / WildFly 8, for which it was primarily developed.

Example module definition:

<?xml version="1.0" encoding="UTF-8"?>
<module xmlns="urn:jboss:module:1.1" name="org.jboss.msc">
    <main-class name="org.jboss.msc.Version"/>
    <properties>
        <property name="my.property" value="foo"/>
    </properties>
    <resources>
        <resource-root path="jboss-msc-1.0.1.GA.jar"/>
    </resources>
    <dependencies>
        <module name="javax.api"/>
        <module name="org.jboss.logging"/>
        <module name="org.jboss.modules"/>
        <!-- Optional deps -->
        <module name="javax.inject.api" optional="true"/>
        <module name="org.jboss.threads" optional="true"/>
    </dependencies>
</module>

In comparison with OSGi, JBoss Modules is simpler and faster. While missing certain features, it's sufficient for most projects that are (mostly) under the control of one vendor, and it allows a stunningly fast boot (due to parallelized dependency resolution).

Note that there was a modularization effort (Project Jigsaw) planned for Java 8; it eventually shipped in Java 9 as the Java Platform Module System. AFAIK it was primarily aimed at modularizing the JRE itself; I'm not sure how applicable it is to apps.
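The core idea behind OSGi and JBoss Modules, giving each library its own class loader, can be sketched in plain Java. This is a minimal, hypothetical illustration (IsolatedLoaders and the jar paths are made up), far simpler than either framework: each loader delegates only to the bootstrap loader, so it sees the core JDK classes plus its own jars, and two loaders can each hold a different Xerces version without seeing each other's classes.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of per-library class loader isolation.
class IsolatedLoaders {

    static URLClassLoader isolated(Path... jars) {
        URL[] urls = new URL[jars.length];
        for (int i = 0; i < jars.length; i++) {
            try {
                urls[i] = jars[i].toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad jar path: " + jars[i], e);
            }
        }
        // A null parent delegates only to the bootstrap loader (core JDK classes such as
        // java.lang and javax.xml), never to the application class path.
        return new URLClassLoader(urls, null);
    }

    static boolean canSee(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical jar locations; substitute the two Xerces versions you need to keep apart.
        try (URLClassLoader a = isolated(Paths.get("lib/a/xercesImpl-2.8.0.jar"));
             URLClassLoader b = isolated(Paths.get("lib/b/xercesImpl-2.11.0.jar"))) {
            System.out.println(canSee(a, "org.apache.xerces.impl.Version"));
            System.out.println(canSee(b, "org.apache.xerces.impl.Version"));
        }
    }
}
```

Code in each loader can still exchange DOM objects through interfaces such as org.w3c.dom.Document, because both loaders share the JDK's bootstrap copy of those types. Module frameworks add what this sketch lacks: declared dependencies between modules, exports, and lifecycle management.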

