Best way to compare 2 XML documents in Java

Question

I'm trying to write an automated test of an application that basically translates a custom message format into an XML message and sends it out the other end. I've got a good set of input/output message pairs, so all I need to do is send the input messages in and listen for the XML message to come out the other end.

When it comes time to compare the actual output to the expected output, I'm running into some problems. My first thought was just to do string comparisons on the expected and actual messages. This doesn't work very well because the example data we have isn't always formatted consistently, and different aliases are often used for the XML namespace (and sometimes namespaces aren't used at all).

I know I can parse both strings and then walk through each element and compare them myself, and this wouldn't be too difficult to do, but I get the feeling there's a better way or a library I could leverage.

So, boiled down, the question is: given two Java Strings which both contain valid XML, how would you go about determining if they are semantically equivalent? Bonus points if you have a way to determine what the differences are.

User · Answer

The code below works for me:

    String xml1 = ...
    String xml2 = ...
    XMLUnit.setIgnoreWhitespace(true);
    XMLUnit.setIgnoreAttributeOrder(true);
    XMLAssert.assertXMLEqual(xml1, xml2);

User · Answer

Using JExamXML with a Java application:

    import com.a7soft.examxml.ExamXML;
    import com.a7soft.examxml.Options;

    .................

    // Reads two XML files into two strings
    String s1 = readFile("orders1.xml");
    String s2 = readFile("orders.xml");

    // Loads options saved in a property file
    Options.loadOptions("options");

    // Compares two Strings representing XML entities
    System.out.println(ExamXML.compareXMLString(s1, s2));

User · Answer

This will compare full string XMLs (reformatting them on the way). It makes it easy to work with your IDE (IntelliJ, Eclipse), because you just click and visually see the difference in the XML files.

    import org.apache.xml.security.c14n.CanonicalizationException;
    import org.apache.xml.security.c14n.Canonicalizer;
    import org.apache.xml.security.c14n.InvalidCanonicalizerException;
    import org.w3c.dom.Element;
    import org.w3c.dom.bootstrap.DOMImplementationRegistry;
    import org.w3c.dom.ls.DOMImplementationLS;
    import org.w3c.dom.ls.LSSerializer;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;

    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.TransformerException;
    import java.io.IOException;
    import java.io.StringReader;

    import static org.apache.xml.security.Init.init;
    import static org.junit.Assert.assertEquals;

    public class XmlUtils {
        static {
            init();
        }

        public static String toCanonicalXml(String xml) throws InvalidCanonicalizerException, ParserConfigurationException, SAXException, CanonicalizationException, IOException {
            Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
            byte canonXmlBytes[] = canon.canonicalize(xml.getBytes());
            return new String(canonXmlBytes);
        }

        public static String prettyFormat(String input) throws TransformerException, ParserConfigurationException, IOException, SAXException, InstantiationException, IllegalAccessException, ClassNotFoundException {
            InputSource src = new InputSource(new StringReader(input));
            Element document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
            Boolean keepDeclaration = input.startsWith("<?xml");
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSSerializer writer = impl.createLSSerializer();
            writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
            writer.getDomConfig().setParameter("xml-declaration", keepDeclaration);
            return writer.writeToString(document);
        }

        public static void assertXMLEqual(String expected, String actual) throws ParserConfigurationException, IOException, SAXException, CanonicalizationException, InvalidCanonicalizerException, TransformerException, IllegalAccessException, ClassNotFoundException, InstantiationException {
            String canonicalExpected = prettyFormat(toCanonicalXml(expected));
            String canonicalActual = prettyFormat(toCanonicalXml(actual));
            assertEquals(canonicalExpected, canonicalActual);
        }
    }

I prefer this to XMLUnit because the client code (test code) is cleaner.

User · Answer

skaffman seems to be giving a good answer. Another way is to pretty-print both strings with a command-line utility such as xmlstarlet (http://xmlstar.sourceforge.net/) and then use any diff utility (or library) to diff the resulting output files. I don't know whether this is a good solution when the differences involve namespaces.

User · Answer

The latest version of XMLUnit can help with asserting that two XML documents are equal. XMLUnit.setIgnoreWhitespace() and XMLUnit.setIgnoreAttributeOrder() may also be necessary for the case in question. See the working code of a simple XMLUnit example below.

    import java.util.List;

    import org.custommonkey.xmlunit.DetailedDiff;
    import org.custommonkey.xmlunit.XMLUnit;
    import org.junit.Assert;

    public class TestXml {

        public static void main(String[] args) throws Exception {
            String result = "<abc             attr=\"value1\"                title=\"something\">            </abc>";
            // will be ok
            assertXMLEquals("<abc attr=\"value1\" title=\"something\"></abc>", result);
        }

        public static void assertXMLEquals(String expectedXML, String actualXML) throws Exception {
            XMLUnit.setIgnoreWhitespace(true);
            XMLUnit.setIgnoreAttributeOrder(true);

            DetailedDiff diff = new DetailedDiff(XMLUnit.compareXML(expectedXML, actualXML));

            List<?> allDifferences = diff.getAllDifferences();
            Assert.assertEquals("Differences found: " + diff.toString(), 0, allDifferences.size());
        }
    }

If using Maven, add this to your pom.xml:

    <dependency>
        <groupId>xmlunit</groupId>
        <artifactId>xmlunit</artifactId>
        <version>1.4</version>
    </dependency>

User · Answer

Since you say "semantically equivalent", I assume you want to do more than literally verify that the XML outputs are (string) equal, and that you'd want something like

    <foo>  some stuff here</foo>

and

    <foo>some stuff here</foo>

to read as equivalent. Ultimately it's going to matter how you're defining "semantically equivalent" on whatever object you're reconstituting the message from. Simply build that object from the messages and use a custom equals() to define what you're looking for.
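As a minimal sketch of that idea, assuming a made-up message shape: the `OrderMessage` class, its `id` attribute and its `quantity` element are all hypothetical and only stand in for whatever object the real messages map to. The point is that `equals()` encodes what counts as "the same message", so formatting differences in the XML never enter the comparison.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical message type; the id attribute and quantity element are invented for illustration. */
final class OrderMessage {
    private final String id;
    private final int quantity;

    private OrderMessage(String id, int quantity) {
        this.id = id;
        this.quantity = quantity;
    }

    /** Rebuilds the object from XML such as <order id="7"><quantity>3</quantity></order>. */
    static OrderMessage fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            String quantityText = root.getElementsByTagName("quantity").item(0).getTextContent();
            // Surrounding whitespace is treated as insignificant for this field.
            return new OrderMessage(root.getAttribute("id"), Integer.parseInt(quantityText.trim()));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid order message", e);
        }
    }

    /** Two messages are equivalent when their business fields match. */
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof OrderMessage)) {
            return false;
        }
        OrderMessage that = (OrderMessage) other;
        return quantity == that.quantity && id.equals(that.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, quantity);
    }
}
```

In a test this would be used as `assertEquals(OrderMessage.fromXml(expected), OrderMessage.fromXml(actual))`. The trade-off is that any field the object does not model is silently ignored.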

User · Answer

I'm using Altova DiffDog, which has options to compare XML files structurally (ignoring string data). This means that, if the "ignore text" option is checked,

    <foo a="xxx" b="xxx">xxx</foo>

and

    <foo b="yyy" a="yyy">yyy</foo>

are equal in the sense that they have structural equality. This is handy if you have example files that differ in data but not structure.
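For readers without DiffDog, a rough JDK-only sketch of the same idea follows. It is not Altova's algorithm; the class and method names (`StructureOnlyXml`, `sameStructure`) are hypothetical. It reduces each document to a signature built from element names and sorted attribute names only, so text and attribute values are ignored by construction.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: compares element and attribute names only, ignoring all values. */
final class StructureOnlyXml {

    private StructureOnlyXml() {
    }

    /** Returns true when both documents have the same element/attribute structure. */
    static boolean sameStructure(String left, String right) {
        return signature(parseRoot(left)).equals(signature(parseRoot(right)));
    }

    private static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Builds a string from the element name, its sorted attribute names and its child elements. */
    private static String signature(Element element) {
        Set<String> attributeNames = new TreeSet<>();
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            attributeNames.add(attributes.item(i).getNodeName());
        }
        StringBuilder sb = new StringBuilder(element.getNodeName());
        sb.append(attributeNames).append('(');
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            // Text, comments and other non-element nodes are skipped on purpose.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                sb.append(signature((Element) child)).append(',');
            }
        }
        return sb.append(')').toString();
    }
}
```

Child element order still matters in this sketch, and the parser is not namespace-aware, so prefixes are compared literally.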

User · Answer

AssertJ 1.4+ has specific assertions to compare XML content:

    String expectedXml = "<foo />";
    String actualXml = "<bar />";
    assertThat(actualXml).isXmlEqualTo(expectedXml);

See the AssertJ documentation for details.

User · Answer

The following will check whether the documents are equal using standard JDK libraries:

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true);
    dbf.setCoalescing(true);
    dbf.setIgnoringElementContentWhitespace(true);
    dbf.setIgnoringComments(true);
    DocumentBuilder db = dbf.newDocumentBuilder();

    Document doc1 = db.parse(new File("file1.xml"));
    doc1.normalizeDocument();

    Document doc2 = db.parse(new File("file2.xml"));
    doc2.normalizeDocument();

    Assert.assertTrue(doc1.isEqualNode(doc2));

The normalizeDocument() call is there to make sure there are no cycles (there technically wouldn't be any).

The above code requires the whitespace within the elements to be the same, though, because it preserves and evaluates it. The standard XML parser that comes with Java does not let you set a feature to produce a canonical version or to understand xml:space; if that is going to be a problem, you may need a replacement XML parser such as Xerces, or use JDOM.
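Since the question starts from two Strings rather than files, here is a small sketch of the same JDK-only approach adapted to strings. The names `DomEquality`, `sameXml` and `stripBlankText` are hypothetical. It assumes indentation between elements should be ignored, and removes whitespace-only text nodes by hand, because `setIgnoringElementContentWhitespace` only takes effect when the parser validates against a DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: compares two XML strings with JDK classes only. */
final class DomEquality {

    private DomEquality() {
    }

    /** Returns true when the root elements of both strings are equal DOM subtrees. */
    static boolean sameXml(String left, String right) {
        return parseRoot(left).isEqualNode(parseRoot(right));
    }

    private static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setCoalescing(true);       // CDATA sections become plain text nodes
            dbf.setIgnoringComments(true);
            Element root = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            stripBlankText(root);
            return root;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Removes text nodes that contain only whitespace (indentation, line breaks). */
    private static void stripBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child);
            }
            child = next;
        }
    }
}
```

Note that `isEqualNode` also compares namespace prefixes, so two documents that bind different aliases to the same namespace URI are still reported as different by this sketch.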

User · Answer

Building on Tom's answer, here's an example using XMLUnit v2.

It uses these Maven dependencies:

    <dependency>
        <groupId>org.xmlunit</groupId>
        <artifactId>xmlunit-core</artifactId>
        <version>2.0.0</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.xmlunit</groupId>
        <artifactId>xmlunit-matchers</artifactId>
        <version>2.0.0</version>
        <scope>test</scope>
    </dependency>

and here's the test code:

    import static org.junit.Assert.assertThat;
    import static org.xmlunit.matchers.CompareMatcher.isIdenticalTo;

    import org.junit.Test;
    import org.xmlunit.builder.Input;
    import org.xmlunit.input.WhitespaceStrippedSource;

    public class SomeTest {
        @Test
        public void test() {
            String result = "<root></root>";
            String expected = "<root>  </root>";

            // ignore whitespace differences
            // https://github.com/xmlunit/user-guide/wiki/Providing-Input-to-XMLUnit#whitespacestrippedsource
            assertThat(result, isIdenticalTo(new WhitespaceStrippedSource(Input.from(expected).build())));

            assertThat(result, isIdenticalTo(Input.from(expected).build())); // will fail due to whitespace differences
        }
    }

The documentation that outlines this is https://github.com/xmlunit/xmlunit#comparing-two-documents

User · Answer

I required the same functionality as requested in the main question. As I was not allowed to use any third-party libraries, I created my own solution based on Archimedes Trajano's solution.

Following is my solution:

    import java.io.ByteArrayInputStream;
    import java.nio.charset.Charset;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Map.Entry;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;

    import org.junit.Assert;
    import org.w3c.dom.Document;

    /**
     * Asserts for asserting XML strings.
     */
    public final class AssertXml {

        private AssertXml() {
        }

        private static Pattern NAMESPACE_PATTERN = Pattern.compile("xmlns:(ns\\d+)=\"(.*?)\"");

        /**
         * Asserts that two XML are of identical content (namespace aliases are ignored).
         *
         * @param expectedXml expected XML
         * @param actualXml actual XML
         * @throws Exception thrown if XML parsing fails
         */
        public static void assertEqualXmls(String expectedXml, String actualXml) throws Exception {
            // Find all namespace mappings
            Map<String, String> fullnamespace2newAlias = new HashMap<String, String>();
            generateNewAliasesForNamespacesFromXml(expectedXml, fullnamespace2newAlias);
            generateNewAliasesForNamespacesFromXml(actualXml, fullnamespace2newAlias);

            // Normalize namespace aliases according to the mapping
            for (Entry<String, String> entry : fullnamespace2newAlias.entrySet()) {
                String newAlias = entry.getValue();
                String namespace = entry.getKey();
                // Pattern.quote keeps dots and slashes in the URI from being read as regex syntax
                Pattern nsReplacePattern = Pattern.compile("xmlns:(ns\\d+)=\"" + Pattern.quote(namespace) + "\"");
                expectedXml = translateNamespaceAliasesToNewAlias(expectedXml, newAlias, nsReplacePattern);
                actualXml = translateNamespaceAliasesToNewAlias(actualXml, newAlias, nsReplacePattern);
            }

            DocumentBuilder db = initDocumentParserFactory();

            Document expectedDocument = db.parse(new ByteArrayInputStream(expectedXml.getBytes(Charset.forName("UTF-8"))));
            expectedDocument.normalizeDocument();

            Document actualDocument = db.parse(new ByteArrayInputStream(actualXml.getBytes(Charset.forName("UTF-8"))));
            actualDocument.normalizeDocument();

            if (!expectedDocument.isEqualNode(actualDocument)) {
                Assert.assertEquals(expectedXml, actualXml); // just to better visualize the differences, e.g. in Eclipse
            }
        }

        private static DocumentBuilder initDocumentParserFactory() throws ParserConfigurationException {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(false);
            dbf.setCoalescing(true);
            dbf.setIgnoringElementContentWhitespace(true);
            dbf.setIgnoringComments(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            return db;
        }

        private static String translateNamespaceAliasesToNewAlias(String xml, String newAlias, Pattern namespacePattern) {
            Matcher nsMatcherExp = namespacePattern.matcher(xml);
            if (nsMatcherExp.find()) {
                xml = xml.replaceAll(nsMatcherExp.group(1) + "[:]", newAlias + ":");
                xml = xml.replaceAll(nsMatcherExp.group(1) + "=", newAlias + "=");
            }
            return xml;
        }

        private static void generateNewAliasesForNamespacesFromXml(String xml, Map<String, String> fullnamespace2newAlias) {
            Matcher nsMatcher = NAMESPACE_PATTERN.matcher(xml);
            while (nsMatcher.find()) {
                if (!fullnamespace2newAlias.containsKey(nsMatcher.group(2))) {
                    fullnamespace2newAlias.put(nsMatcher.group(2), "nsTr" + (fullnamespace2newAlias.size() + 1));
                }
            }
        }
    }

It compares two XML strings and takes care of any mismatching namespace mappings by translating them to unique values in both input strings.

It can be fine-tuned, e.g. in how namespaces are translated, but for my requirements it just does the job.
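An alternative to rewriting aliases with regular expressions is to let a namespace-aware parser resolve them and then compare nodes by namespace URI and local name instead of by prefix. The following is only a sketch under that assumption, using JDK classes; `PrefixInsensitiveXml` and its methods are hypothetical names. It ignores `xmlns` declarations and whitespace-only text, and it does not handle prefixes that appear inside attribute values or text (QNames in content).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: compares XML while ignoring namespace prefixes (aliases). */
final class PrefixInsensitiveXml {

    private PrefixInsensitiveXml() {
    }

    static boolean sameXml(String left, String right) {
        return sameElement(parseRoot(left), parseRoot(right));
    }

    private static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed so getNamespaceURI/getLocalName are populated
            dbf.setCoalescing(true);
            dbf.setIgnoringComments(true);
            return dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static boolean sameElement(Element a, Element b) {
        if (!sameName(a, b) || !sameAttributes(a, b)) {
            return false;
        }
        List<Node> childrenA = significantChildren(a);
        List<Node> childrenB = significantChildren(b);
        if (childrenA.size() != childrenB.size()) {
            return false;
        }
        for (int i = 0; i < childrenA.size(); i++) {
            Node x = childrenA.get(i);
            Node y = childrenB.get(i);
            if (x.getNodeType() != y.getNodeType()) {
                return false;
            }
            if (x.getNodeType() == Node.ELEMENT_NODE) {
                if (!sameElement((Element) x, (Element) y)) {
                    return false;
                }
            } else if (!Objects.equals(x.getNodeValue(), y.getNodeValue())) {
                return false;
            }
        }
        return true;
    }

    /** Names match when namespace URI and local name match; the prefix is never consulted. */
    private static boolean sameName(Node a, Node b) {
        return Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
                && Objects.equals(a.getLocalName(), b.getLocalName());
    }

    private static boolean sameAttributes(Element a, Element b) {
        List<Node> attrsA = realAttributes(a);
        if (attrsA.size() != realAttributes(b).size()) {
            return false;
        }
        for (Node attr : attrsA) {
            Node other = b.getAttributeNodeNS(attr.getNamespaceURI(), attr.getLocalName());
            if (other == null || !attr.getNodeValue().equals(other.getNodeValue())) {
                return false;
            }
        }
        return true;
    }

    /** Returns the attributes of an element, leaving out xmlns declarations. */
    private static List<Node> realAttributes(Element element) {
        List<Node> kept = new ArrayList<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            boolean declaration = XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())
                    || attr.getNodeName().equals("xmlns")
                    || attr.getNodeName().startsWith("xmlns:");
            if (!declaration) {
                kept.add(attr);
            }
        }
        return kept;
    }

    /** Returns child nodes, leaving out whitespace-only text. */
    private static List<Node> significantChildren(Node parent) {
        List<Node> kept = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            boolean blankText = c.getNodeType() == Node.TEXT_NODE && c.getNodeValue().trim().isEmpty();
            if (!blankText) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```

Unlike the regex approach, this works for any alias (not only ones of the form `ns1`, `ns2`) and for default namespaces, at the cost of writing the tree walk by hand.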

User · Answer

Sounds like a job for XMLUnit:

http://www.xmlunit.org/
https://github.com/xmlunit

Example:

    public class SomeTest extends XMLTestCase {
        @Test
        public void test() throws Exception {
            String xml1 = ...
            String xml2 = ...

            XMLUnit.setIgnoreWhitespace(true); // ignore whitespace differences

            // can also compare xml Documents, InputSources, Readers, Diffs
            assertXMLEqual(xml1, xml2); // assertXMLEqual comes from XMLTestCase
        }
    }

User · Answer

Thanks, I extended this; try this:

    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.NamedNodeMap;
    import org.w3c.dom.Node;

    public class XmlDiff {
        private boolean nodeTypeDiff = true;
        private boolean nodeValueDiff = true;

        public boolean diff(String xml1, String xml2, List<String> diffs) throws Exception {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setCoalescing(true);
            dbf.setIgnoringElementContentWhitespace(true);
            dbf.setIgnoringComments(true);
            DocumentBuilder db = dbf.newDocumentBuilder();

            Document doc1 = db.parse(new ByteArrayInputStream(xml1.getBytes(StandardCharsets.UTF_8)));
            Document doc2 = db.parse(new ByteArrayInputStream(xml2.getBytes(StandardCharsets.UTF_8)));

            doc1.normalizeDocument();
            doc2.normalizeDocument();

            return diff(doc1, doc2, diffs);
        }

        /**
         * Diff 2 nodes and put the diffs in the list
         */
        public boolean diff(Node node1, Node node2, List<String> diffs) throws Exception {
            if (diffNodeExists(node1, node2, diffs)) {
                return true;
            }

            if (nodeTypeDiff) {
                diffNodeType(node1, node2, diffs);
            }

            if (nodeValueDiff) {
                diffNodeValue(node1, node2, diffs);
            }

            System.out.println(node1.getNodeName() + "/" + node2.getNodeName());

            diffAttributes(node1, node2, diffs);
            diffNodes(node1, node2, diffs);

            return diffs.size() > 0;
        }

        /**
         * Diff the child nodes
         */
        public boolean diffNodes(Node node1, Node node2, List<String> diffs) throws Exception {
            // Index by name (children sharing a name overwrite each other)
            Map<String, Node> children1 = new LinkedHashMap<String, Node>();
            for (Node child1 = node1.getFirstChild(); child1 != null; child1 = child1.getNextSibling()) {
                children1.put(child1.getNodeName(), child1);
            }

            // Index by name
            Map<String, Node> children2 = new LinkedHashMap<String, Node>();
            for (Node child2 = node2.getFirstChild(); child2 != null; child2 = child2.getNextSibling()) {
                children2.put(child2.getNodeName(), child2);
            }

            // Diff all the children1
            for (Node child1 : children1.values()) {
                Node child2 = children2.remove(child1.getNodeName());
                diff(child1, child2, diffs);
            }

            // Diff all the children2 left over
            for (Node child2 : children2.values()) {
                Node child1 = children1.get(child2.getNodeName());
                diff(child1, child2, diffs);
            }

            return diffs.size() > 0;
        }

        /**
         * Diff the attributes
         */
        public boolean diffAttributes(Node node1, Node node2, List<String> diffs) throws Exception {
            // Index by name
            NamedNodeMap nodeMap1 = node1.getAttributes();
            Map<String, Node> attributes1 = new LinkedHashMap<String, Node>();
            for (int index = 0; nodeMap1 != null && index < nodeMap1.getLength(); index++) {
                attributes1.put(nodeMap1.item(index).getNodeName(), nodeMap1.item(index));
            }

            // Index by name
            NamedNodeMap nodeMap2 = node2.getAttributes();
            Map<String, Node> attributes2 = new LinkedHashMap<String, Node>();
            for (int index = 0; nodeMap2 != null && index < nodeMap2.getLength(); index++) {
                attributes2.put(nodeMap2.item(index).getNodeName(), nodeMap2.item(index));
            }

            // Diff all the attributes1
            for (Node attribute1 : attributes1.values()) {
                Node attribute2 = attributes2.remove(attribute1.getNodeName());
                diff(attribute1, attribute2, diffs);
            }

            // Diff all the attributes2 left over
            for (Node attribute2 : attributes2.values()) {
                Node attribute1 = attributes1.get(attribute2.getNodeName());
                diff(attribute1, attribute2, diffs);
            }

            return diffs.size() > 0;
        }

        /**
         * Check that the nodes exist
         */
        public boolean diffNodeExists(Node node1, Node node2, List<String> diffs) throws Exception {
            if (node1 == null && node2 == null) {
                diffs.add(getPath(node2) + ":node " + node1 + "!=" + node2 + "\n");
                return true;
            }

            if (node1 == null && node2 != null) {
                diffs.add(getPath(node2) + ":node " + node1 + "!=" + node2.getNodeName());
                return true;
            }

            if (node1 != null && node2 == null) {
                diffs.add(getPath(node1) + ":node " + node1.getNodeName() + "!=" + node2);
                return true;
            }

            return false;
        }

        /**
         * Diff the Node Type
         */
        public boolean diffNodeType(Node node1, Node node2, List<String> diffs) throws Exception {
            if (node1.getNodeType() != node2.getNodeType()) {
                diffs.add(getPath(node1) + ":type " + node1.getNodeType() + "!=" + node2.getNodeType());
                return true;
            }

            return false;
        }

        /**
         * Diff the Node Value
         */
        public boolean diffNodeValue(Node node1, Node node2, List<String> diffs) throws Exception {
            if (node1.getNodeValue() == null && node2.getNodeValue() == null) {
                return false;
            }

            if (node1.getNodeValue() == null && node2.getNodeValue() != null) {
                diffs.add(getPath(node1) + ":value " + node1 + "!=" + node2.getNodeValue());
                return true;
            }

            if (node1.getNodeValue() != null && node2.getNodeValue() == null) {
                diffs.add(getPath(node1) + ":value " + node1.getNodeValue() + "!=" + node2);
                return true;
            }

            if (!node1.getNodeValue().equals(node2.getNodeValue())) {
                diffs.add(getPath(node1) + ":value " + node1.getNodeValue() + "!=" + node2.getNodeValue());
                return true;
            }

            return false;
        }

        /**
         * Get the node path (null-safe: returns an empty path for a null node)
         */
        public String getPath(Node node) {
            StringBuilder path = new StringBuilder();

            for (Node current = node; current != null; current = current.getParentNode()) {
                path.insert(0, current.getNodeName());
                path.insert(0, "/");
            }

            return path.toString();
        }
    }

User · Answer

XOM has a Canonicalizer utility which turns your documents into a regular form, which you can then stringify and compare. So regardless of whitespace irregularities or attribute ordering, you can get regular, predictable comparisons of your documents.

This works especially well in IDEs that have dedicated visual String comparators, like Eclipse. You get a visual representation of the semantic differences between the documents.
