Normalization in DOM parsing with Java - how does it work?

Question

I saw the line below in the code for a DOM parser in this tutorial:

    doc.getDocumentElement().normalize();

Why do we do this normalization? I read the docs, but I could not understand a word:

"Puts all Text nodes in the full depth of the sub-tree underneath this Node"

Okay, then can someone show me (preferably with a picture) what this tree looks like?

Can anyone explain to me why normalization is needed? What happens if we don't normalize?

User · Accepted Answer

The rest of the sentence is:

"where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes."

This basically means that the following XML element

    <foo>Hello 
    wor
    ld</foo>

could be represented like this in a denormalized node:

    Element foo
        Text node: ""
        Text node: "Hello "
        Text node: "wor"
        Text node: "ld"

When normalized, the node will look like this:

    Element foo
        Text node: "Hello world"

And the same goes for attributes: <foo bar="Hello world"/>, comments, etc.
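To make the before/after picture concrete, here is a minimal sketch that builds the denormalized foo element by hand and then normalizes it. The class and method names (NormalizeDemo, buildDenormalizedFoo) are made up for illustration; only the standard javax.xml.parsers and org.w3c.dom APIs are assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {
    /** Builds <foo> with four adjacent Text children: "", "Hello ", "wor", "ld". */
    static Element buildDenormalizedFoo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element foo = doc.createElement("foo");
            doc.appendChild(foo);
            foo.appendChild(doc.createTextNode(""));
            foo.appendChild(doc.createTextNode("Hello "));
            foo.appendChild(doc.createTextNode("wor"));
            foo.appendChild(doc.createTextNode("ld"));
            return foo;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Element foo = buildDenormalizedFoo();

        // Before: four Text children, and the first one is the empty string.
        System.out.println("children before: " + foo.getChildNodes().getLength());
        System.out.println("first child before: \"" + foo.getFirstChild().getNodeValue() + "\"");

        foo.normalize();

        // After: adjacent Text nodes are merged and the empty one is gone.
        System.out.println("children after: " + foo.getChildNodes().getLength());
        System.out.println("first child after: \"" + foo.getFirstChild().getNodeValue() + "\"");
    }
}
```

This also hints at what can go wrong without normalization: code that reads only getFirstChild().getNodeValue() sees the empty string here before the call and the full "Hello world" after it.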

User · Answer

Put simply, normalisation is the reduction of redundancies. Examples of redundancies:

a) whitespace outside the root/document tags (... <document></document> ...)
b) whitespace within a start tag (<...>) and an end tag (</...>)
c) whitespace between attributes and their values (i.e. spaces between the key name and =")
d) superfluous namespace declarations
e) line breaks/whitespace in the text of attributes and tags
f) comments, etc.
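Several items in this list (comments, whitespace) go beyond what the DOM Node.normalize() call from the question touches by itself; the DOM documentation describes it only in terms of adjacent and empty Text nodes. A minimal sketch to check this, assuming the default JDK parser settings (the class and method names are hypothetical):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NormalizeScopeDemo {
    /** Parses an XML string with a default, non-validating parser and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    /** Counts the direct children of the element that have the given node type. */
    static int countChildren(Element element, short nodeType) {
        int count = 0;
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == nodeType) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Whitespace-only text around a comment and an empty element.
        Element root = parseRoot("<a> <!-- note --> <b/> </a>");
        System.out.println("text nodes before: " + countChildren(root, Node.TEXT_NODE));
        System.out.println("comments before:   " + countChildren(root, Node.COMMENT_NODE));

        root.normalize();

        // The whitespace-only Text nodes are not empty and not adjacent to
        // each other, and comments are structure, so the counts stay the same.
        System.out.println("text nodes after:  " + countChildren(root, Node.TEXT_NODE));
        System.out.println("comments after:    " + countChildren(root, Node.COMMENT_NODE));
    }
}
```

Stripping comments or insignificant whitespace is a matter of parser configuration or a separate processing step, not of this call.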

User · Answer

As an extension to JBNizet's answer, for more technical users, here is what the implementation of the org.w3c.dom.Node interface in com.sun.org.apache.xerces.internal.dom.ParentNode looks like. It gives you an idea of how it actually works.

    public void normalize() {
        // No need to normalize if already normalized.
        if (isNormalized()) {
            return;
        }
        if (needsSyncChildren()) {
            synchronizeChildren();
        }
        ChildNode kid;
        for (kid = firstChild; kid != null; kid = kid.nextSibling) {
            kid.normalize();
        }
        isNormalized(true);
    }

It traverses all the nodes recursively and calls kid.normalize() on each of them. This mechanism is overridden in org.apache.xerces.dom.ElementImpl:

    public void normalize() {
        // No need to normalize if already normalized.
        if (isNormalized()) {
            return;
        }
        if (needsSyncChildren()) {
            synchronizeChildren();
        }
        ChildNode kid, next;
        for (kid = firstChild; kid != null; kid = next) {
            next = kid.nextSibling;

            // If kid is a text node, we need to check for one of two
            // conditions:
            //   1) There is an adjacent text node
            //   2) There is no adjacent text node, but kid is
            //      an empty text node.
            if (kid.getNodeType() == Node.TEXT_NODE) {
                // If an adjacent text node, merge it with kid
                if (next != null && next.getNodeType() == Node.TEXT_NODE) {
                    ((Text) kid).appendData(next.getNodeValue());
                    removeChild(next);
                    next = kid; // Don't advance; there might be another.
                } else {
                    // If kid is empty, remove it
                    if (kid.getNodeValue() == null || kid.getNodeValue().length() == 0) {
                        removeChild(kid);
                    }
                }
            }
            // Otherwise it might be an Element, which is handled recursively
            else if (kid.getNodeType() == Node.ELEMENT_NODE) {
                kid.normalize();
            }
        }

        // We must also normalize all of the attributes
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); ++i) {
                Node attr = attributes.item(i);
                attr.normalize();
            }
        }

        // changed() will have occurred when the removeChild() was done,
        // so does not have to be reissued.

        isNormalized(true);
    }

Hope this saves you some time.
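The merge loop quoted above relies on Xerces internals (ChildNode, firstChild, isNormalized). As a rough sketch only, the same idea can be written against the public org.w3c.dom API. ManualNormalize and its methods are hypothetical names, and this version deliberately skips the attribute pass and the isNormalized caching.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

class ManualNormalize {
    /** Merges adjacent Text children and removes empty ones, recursing into elements. */
    static void normalizeTextChildren(Node parent) {
        Node kid = parent.getFirstChild();
        while (kid != null) {
            Node next = kid.getNextSibling();
            if (kid.getNodeType() == Node.TEXT_NODE) {
                if (next != null && next.getNodeType() == Node.TEXT_NODE) {
                    // Adjacent text node: fold it into kid and look again
                    // from the same kid, since another one may follow.
                    ((Text) kid).appendData(next.getNodeValue());
                    parent.removeChild(next);
                    continue;
                }
                String value = kid.getNodeValue();
                if (value == null || value.isEmpty()) {
                    // No adjacent text node, but kid itself is empty.
                    parent.removeChild(kid);
                }
            } else if (kid.getNodeType() == Node.ELEMENT_NODE) {
                normalizeTextChildren(kid);
            }
            kid = next;
        }
    }

    /** Creates an empty document without exposing the checked exception. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        root.appendChild(doc.createTextNode("a"));
        root.appendChild(doc.createTextNode(""));
        root.appendChild(doc.createTextNode("b"));
        Element child = doc.createElement("child");
        root.appendChild(child);
        child.appendChild(doc.createTextNode("x"));
        child.appendChild(doc.createTextNode("y"));

        normalizeTextChildren(root);

        System.out.println("root children: " + root.getChildNodes().getLength());
        System.out.println("root text:     " + root.getFirstChild().getNodeValue());
        System.out.println("child text:    " + child.getFirstChild().getNodeValue());
    }
}
```

In real code, calling normalize() on the node is the simpler choice; the sketch is only meant to make the two conditions in the quoted comment (adjacent text node, empty text node) easier to follow.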
