[java] Normalization in DOM parsing with java - how does it work?

I saw the line below in code for a DOM parser at this tutorial.

doc.getDocumentElement().normalize();

Why do we do this normalization ?
I read the docs but I could not understand a word.

Puts all Text nodes in the full depth of the sub-tree underneath this Node

Okay, then can someone show me (preferably with a picture) what this tree looks like ?

Can anyone explain me why normalization is needed?
What happens if we don't normalize ?

This question is related to java xml dom

The answer is


The rest of the sentence is:

where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes.

This basically means that the following XML element

<foo>hello 
wor
ld</foo>

could be represented like this in a denormalized node:

Element foo
    Text node: ""
    Text node: "Hello "
    Text node: "wor"
    Text node: "ld"

When normalized, the node will look like this

Element foo
    Text node: "Hello world"

And the same goes for attributes: <foo bar="Hello world"/>, comments, etc.


As an extension to @JBNizet's answer for more technical users here's what implementation of org.w3c.dom.Node interface in com.sun.org.apache.xerces.internal.dom.ParentNode looks like, gives you the idea how it actually works.

public void normalize() {
    // No need to normalize if already normalized.
    if (isNormalized()) {
        return;
    }
    if (needsSyncChildren()) {
        synchronizeChildren();
    }
    ChildNode kid;
    for (kid = firstChild; kid != null; kid = kid.nextSibling) {
         kid.normalize();
    }
    isNormalized(true);
}

It traverses all the nodes recursively and calls kid.normalize()
This mechanism is overridden in org.apache.xerces.dom.ElementImpl

public void normalize() {
     // No need to normalize if already normalized.
     if (isNormalized()) {
         return;
     }
     if (needsSyncChildren()) {
         synchronizeChildren();
     }
     ChildNode kid, next;
     for (kid = firstChild; kid != null; kid = next) {
         next = kid.nextSibling;

         // If kid is a text node, we need to check for one of two
         // conditions:
         //   1) There is an adjacent text node
         //   2) There is no adjacent text node, but kid is
         //      an empty text node.
         if ( kid.getNodeType() == Node.TEXT_NODE )
         {
             // If an adjacent text node, merge it with kid
             if ( next!=null && next.getNodeType() == Node.TEXT_NODE )
             {
                 ((Text)kid).appendData(next.getNodeValue());
                 removeChild( next );
                 next = kid; // Don't advance; there might be another.
             }
             else
             {
                 // If kid is empty, remove it
                 if ( kid.getNodeValue() == null || kid.getNodeValue().length() == 0 ) {
                     removeChild( kid );
                 }
             }
         }

         // Otherwise it might be an Element, which is handled recursively
         else if (kid.getNodeType() == Node.ELEMENT_NODE) {
             kid.normalize();
         }
     }

     // We must also normalize all of the attributes
     if ( attributes!=null )
     {
         for( int i=0; i<attributes.getLength(); ++i )
         {
             Node attr = attributes.item(i);
             attr.normalize();
         }
     }

    // changed() will have occurred when the removeChild() was done,
    // so does not have to be reissued.

     isNormalized(true);
 } 

Hope this saves you some time.


In simple, Normalisation is Reduction of Redundancies.
Examples of Redundancies:
a) white spaces outside of the root/document tags(...<document></document>...)
b) white spaces within start tag (<...>) and end tag (</...>)
c) white spaces between attributes and their values (ie. spaces between key name and =")
d) superfluous namespace declarations
e) line breaks/white spaces in texts of attributes and tags
f) comments etc...


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to xml

strange error in my Animation Drawable How do I POST XML data to a webservice with Postman? PHP XML Extension: Not installed How to add a Hint in spinner in XML Generating Request/Response XML from a WSDL Manifest Merger failed with multiple errors in Android Studio How to set menu to Toolbar in Android How to add colored border on cardview? Android: ScrollView vs NestedScrollView WARNING: Exception encountered during context initialization - cancelling refresh attempt

Examples related to dom

How do you set the document title in React? How to find if element with specific id exists or not Cannot read property 'style' of undefined -- Uncaught Type Error adding text to an existing text element in javascript via DOM Violation Long running JavaScript task took xx ms How to get `DOM Element` in Angular 2? Angular2, what is the correct way to disable an anchor element? React.js: How to append a component on click? Detect click outside React component DOM element to corresponding vue.js component