What is the difference between SAX and DOM?

Question

I read some articles about XML parsers and came across SAX and DOM.

SAX is event-based and DOM is a tree model. I don't understand the difference between these concepts.

From what I have understood, event-based means some kind of event happens to a node, like when one clicks a particular node it will give all the sub-nodes, rather than loading all the nodes at the same time. But in the case of DOM parsing, it will load all the nodes and build the tree model.

Is my understanding correct? Please correct me if I am wrong, or explain event-based and tree model to me in a simpler manner.

User · Answer

In just a few words:

SAX (Simple API for XML): a stream-based processor. You only have a tiny part in memory at any time, and you "sniff" the XML stream by implementing callback code for events like tagStarted() etc. It uses almost no memory, but you can't do "DOM" stuff, like using XPath or traversing trees.

DOM (Document Object Model): you load the whole thing into memory, so it's a massive memory hog. You can blow memory with even medium-sized documents. But you can use XPath and traverse the tree, etc.
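As a minimal sketch of the "DOM stuff" mentioned here, the following uses the JDK's built-in JAXP classes (javax.xml.parsers and javax.xml.xpath) rather than any particular parser library. It builds the whole tree first and then runs an XPath query that can look at any part of it. The sample document and the DomXPathDemo class name are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomXPathDemo {
    // Parses the whole document into an in-memory DOM tree, then runs an XPath
    // query against that tree and returns the text of every matching node.
    static List<String> query(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            result.add(hits.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library>"
                + "<book year='2005'><title>Old</title></book>"
                + "<book year='2012'><title>New</title></book>"
                + "</library>";
        // The tree is fully built, so the query may inspect any part of it.
        System.out.println(query(xml, "/library/book[@year > 2010]/title")); // prints [New]
    }
}
```

The price of this convenience is the one the answer names: the entire document is held in memory before the first query runs.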

User · Answer

Both SAX and DOM are used to parse XML documents. Both have advantages and disadvantages, and either can be used in our programs depending on the situation.

SAX:
- Parses node by node
- Does not store the XML in memory
- We can't insert or delete a node
- Top-to-bottom traversal

DOM:
- Stores the entire XML document in memory before processing
- Occupies more memory
- We can insert or delete nodes
- Traverse in any direction

If we only need to find a node and don't need to insert or delete anything, we can go with SAX; otherwise use DOM, provided we have enough memory.
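The "we can insert or delete nodes" point is what a DOM tree gives you. Below is a minimal sketch using the JDK's JAXP DOM and Transformer classes: it deletes and inserts nodes in memory and then writes the modified tree back out as text. The list/item document and the DomEditDemo name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomEditDemo {
    // Removes every <item> whose text equals removeText, appends a new <item>
    // containing addText, and serializes the modified tree back to a string.
    static String edit(String xml, String removeText, String addText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // getElementsByTagName returns a live list, so walk it backwards:
        // removing item i then cannot shift the items not yet visited.
        NodeList items = doc.getElementsByTagName("item");
        for (int i = items.getLength() - 1; i >= 0; i--) {
            Node item = items.item(i);
            if (item.getTextContent().equals(removeText)) {
                item.getParentNode().removeChild(item); // delete a node
            }
        }

        Element added = doc.createElement("item"); // insert a node
        added.setTextContent(addText);
        root.appendChild(added);

        // Identity transform: write the in-memory tree back out as XML text.
        Transformer out = TransformerFactory.newInstance().newTransformer();
        out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter text = new StringWriter();
        out.transform(new DOMSource(doc), new StreamResult(text));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(edit("<list><item>a</item><item>b</item></list>", "a", "c"));
    }
}
```

A SAX handler cannot do this directly, because it never holds the tree; it would have to rebuild the output itself as the events stream past.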

User · Answer

You're comparing apples and pears. SAX is a parser that parses serialized DOM structures. There are many different parsers, and "event-based" refers to the parsing method.

Maybe a small recap is in order:

The document object model (DOM) is an abstract data model that describes a hierarchical, tree-based document structure; a document tree consists of nodes, namely element, attribute and text nodes (and some others). Nodes have parents, siblings and children and can be traversed, etc., all the stuff you're used to from doing JavaScript (which incidentally has nothing to do with the DOM).

A DOM structure may be serialized, i.e. written to a file, using a markup language like HTML or XML. An HTML or XML file thus contains a "written out" or "flattened out" version of an abstract document tree.

For a computer to manipulate, or even display, a DOM tree from a file, it has to deserialize, or parse, the file and reconstruct the abstract tree in memory. This is where parsing comes in.

Now we come to the nature of parsers. One way to parse would be to read in the entire document, recursively build up a tree structure in memory, and finally expose the entire result to the user. (I suppose you could call these parsers "DOM parsers".) That would be very handy for the user (I think that's what PHP's XML parser does), but it suffers from scalability problems and becomes very expensive for large documents.

On the other hand, event-based parsing, as done by SAX, looks at the file linearly and simply makes callbacks to the user whenever it encounters a structural piece of data, like "this element started", "that element ended", "some text here", etc. This has the benefit that it can go on forever without concern for the input file size, but it's a lot more low-level because it requires the user to do all the actual processing work (by providing callbacks).

To return to your original question, the term "event-based" refers to those parsing events that the parser raises as it traverses the XML file.

The Wikipedia article has many details on the stages of SAX parsing.
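To make "the user does all the actual processing work" concrete, here is a minimal sketch using the JDK's JAXP SAX API. In the real API the "this element started" callback is startElement in org.xml.sax.ContentHandler (DefaultHandler supplies empty defaults for all of them). The parser only reports events; the counting map is entirely our own code, and no tree is ever built. The SaxCountDemo name is hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxCountDemo {
    // Counts element names in one linear pass. The parser only reports
    // "this element started" events; the counting is the caller's own code.
    static Map<String, Integer> countElements(String xml) throws Exception {
        Map<String, Integer> counts = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                counts.merge(qName, 1, Integer::sum);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return counts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><b/><c>text</c></a>")); // prints {a=1, b=2, c=1}
    }
}
```

Memory use here grows with the number of distinct element names, not with the size of the input, which is why this style scales to very large files.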

User · Answer

Here it is in simpler words:

DOM
- Tree model parser (object based: a tree of nodes).
- DOM reads the whole file into memory and builds the tree before you can use it.
- Has memory constraints, since it loads the whole XML file before parsing.
- DOM is read and write (can insert or delete nodes).
- If the XML content is small, prefer a DOM parser.
- Backward and forward search is possible for finding tags and evaluating the information inside them, which makes navigation easy.
- Slower at run time.

SAX
- Event-based parser (a sequence of events).
- SAX parses the file as it reads it, i.e. node by node.
- No memory constraints, as it does not store the XML content in memory.
- SAX is read only, i.e. it can't insert or delete nodes.
- Use a SAX parser when the XML content is large.
- SAX reads the XML file from top to bottom, and backward navigation is not possible.
- Faster at run time.
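"Backward and forward search" in DOM amounts to following sibling links in the tree, which is only possible because the whole tree is in memory. A minimal sketch with the JDK's org.w3c.dom API follows; it skips the whitespace text nodes that sit between elements. The DomNavigateDemo name and the sample book document are hypothetical.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomNavigateDemo {
    // Returns the names of the element siblings before and after the first
    // element called 'name' ("none" when there is no such sibling).
    static List<String> neighbours(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node target = doc.getElementsByTagName(name).item(0);
        return List.of(nameOf(previousElement(target)), nameOf(nextElement(target)));
    }

    // Step backwards, skipping whitespace text nodes between elements.
    static Element previousElement(Node node) {
        for (Node n = node.getPreviousSibling(); n != null; n = n.getPreviousSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) return (Element) n;
        }
        return null;
    }

    // Step forwards, skipping whitespace text nodes between elements.
    static Element nextElement(Node node) {
        for (Node n = node.getNextSibling(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) return (Element) n;
        }
        return null;
    }

    static String nameOf(Element element) {
        return element == null ? "none" : element.getTagName();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book>\n  <title/>\n  <author/>\n  <year/>\n</book>";
        System.out.println(neighbours(xml, "author")); // prints [title, year]
        System.out.println(neighbours(xml, "title"));  // prints [none, author]
    }
}
```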

User · Answer

I will provide a general Q&A-oriented answer to this question:

Answers to questions

Why do we need an XML parser?

We need an XML parser because we do not want to do everything in our application from scratch, and we need some "helper" programs or libraries to do something very low-level but very necessary to us. These low-level but necessary things include checking well-formedness, validating the document against its DTD or schema (just for validating parsers), resolving character references, understanding CDATA sections, and so on. XML parsers are just such "helper" programs, and they do all these jobs. With an XML parser, we are shielded from a lot of this complexity and can concentrate on programming at a high level through the APIs implemented by the parsers, and thus gain programming efficiency.

Which one is better, SAX or DOM?

Both SAX and DOM parsers have their advantages and disadvantages. Which one is better depends on the characteristics of your application (please refer to the questions below).

Which parser is faster, DOM or SAX?

A SAX parser is usually faster.

What's the difference between a tree-based API and an event-based API?

A tree-based API is centered around a tree structure and therefore provides interfaces on the components of a tree (which is a DOM document), such as the Document, Node, NodeList, Element and Attr interfaces. By contrast, an event-based API provides interfaces on handlers. There are four handler interfaces: ContentHandler, DTDHandler, EntityResolver and ErrorHandler.

What is the difference between a DOM parser and a SAX parser?

DOM parsers and SAX parsers work in different ways:

A DOM parser creates a tree structure in memory from the input document and then waits for requests from the client. But a SAX parser does not create any internal structure. Instead, it takes the occurrences of components of an input document as events, and tells the client what it reads as it reads through the input document.

A DOM parser always serves the client application with the entire document, no matter how much is actually needed by the client. But a SAX parser serves the client application only with pieces of the document at any given time.

With a DOM parser, method calls in the client application have to be explicit and form a kind of chain. But with SAX, certain methods (usually overridden by the client) are invoked automatically (implicitly) when certain events occur, which is called a "callback". These methods do not have to be called explicitly by the client, though we could call them explicitly.

How do we decide which parser is good?

Ideally a good parser should be fast (time efficient), space efficient, rich in functionality and easy to use. But in reality, none of the main parsers have all these features at the same time. For example, a DOM parser is rich in functionality (because it creates a DOM tree in memory and allows you to access any part of the document repeatedly and to modify the DOM tree), but it is space inefficient when the document is huge, and it takes a while to learn how to work with it. A SAX parser, however, is much more space efficient for a big input document (because it creates no internal structure). What's more, it runs faster and is easier to learn than a DOM parser because its API is really simple. But from the functionality point of view, it provides fewer functions, which means that users have to take care of more themselves, such as creating their own data structures. By the way, what is a good parser? I think the answer really depends on the characteristics of your application.

What are some real-world applications where using a SAX parser is more advantageous than using a DOM parser, and vice versa? What are the usual applications for a DOM parser and for
a SAX parser?

In the following cases, using a SAX parser is more advantageous than using a DOM parser:
- The input document is too big for available memory (actually, in this case SAX is your only choice).
- You can process the document in small contiguous chunks of input. You do not need the entire document before you can do useful work.
- You just want to use the parser to extract the information of interest, and all your computation will be based completely on data structures you create yourself. Actually, in most of our applications we create data structures of our own, which are usually not as complicated as the DOM tree. In this sense, I think, the chance of using a DOM parser is less than that of using a SAX parser.

In the following cases, using a DOM parser is more advantageous than using a SAX parser:
- Your application needs to access widely separated parts of the document at the same time.
- Your application probably uses an internal data structure which is almost as complicated as the document itself.
- Your application has to modify the document repeatedly.
- Your application has to store the document for a significant amount of time through many method calls.

Example (use a DOM parser or a SAX parser?):

Assume that an instructor has an XML document containing all the personal information of his students as well as the points they made in his class, and he is now assigning final grades using an application. What he wants to produce is a list with the SSNs and the grades. Also, we assume that in his application the instructor uses no data structure, such as arrays, to store the students' personal information and points.

If the instructor decides to give A's to those who earned the class average or above and B's to the others, then he'd better use a DOM parser in his application. The reason is that he has no way to know the class average before the entire document has been processed. What he probably needs to do in his
application is first to look through all the students' points and compute the average, and then look through the document again and assign a final grade to each student by comparing the points he earned to the class average.

If, however, the instructor adopts a grading policy where students who got 90 points or more are assigned A's and the others B's, then he'd probably better use a SAX parser. The reason is that to assign each student a final grade, he does not need to wait for the entire document to be processed. He can immediately assign a grade to a student once the SAX parser reads that student's points.

In the above analysis, we assumed that the instructor created no data structure of his own. What if he creates his own data structure, such as an array of strings to store the SSNs and an array of integers to store the points? In this case, I think SAX is a better choice, because it saves both memory and time and still gets the job done.

Well, one more consideration on this example. What if the instructor wants not to print a list, but to save the original document back with each student's grade updated? In this case, a DOM parser should be a better choice no matter which grading policy he adopts. He does not need to create any data structure of his own. What he needs to do is first modify the DOM tree (i.e., set the value of the "grade" node) and then save the whole modified tree. If he chose to use a SAX parser instead of a DOM parser, he would have to create a data structure almost as complicated as a DOM tree before he could get the job done.

An example

Problem statement: Write a Java program to extract all the information about circles which are elements in a given XML document. We assume that each circle element has three child elements (i.e., x, y and radius) as well as a color attribute. A sample document is given below:

    <?xml version="1.0"?>
    <!DOCTYPE shapes [
    <!ELEMENT shapes (circle)*>
    <!ELEMENT circle (x,y,radius)>
    <!ELEMENT x (#PCDATA)>
    <!ELEMENT y (#PCDATA)>
    <!ELEMENT radius (#PCDATA)>
    <!ATTLIST circle color CDATA #IMPLIED>
    ]>

    <shapes>
        <circle color="BLUE">
            <x>20</x>
            <y>20</y>
            <radius>20</radius>
        </circle>
        <circle color="RED">
            <x>40</x>
            <y>40</y>
            <radius>20</radius>
        </circle>
    </shapes>

Program with DOMParser:

    import java.io.*;
    import org.w3c.dom.*;
    import org.apache.xerces.parsers.DOMParser;

    public class shapes_DOM {
        static int numberOfCircles = 0;            // total number of circles seen
        static int[] x = new int[1000];            // X-coordinates of the centers
        static int[] y = new int[1000];            // Y-coordinates of the centers
        static int[] r = new int[1000];            // radius of the circle
        static String[] color = new String[1000];  // colors of the circles

        public static void main(String[] args) {
            try {
                // create a DOMParser and parse the file named on the command line
                DOMParser parser = new DOMParser();
                parser.parse(args[0]);

                // get the DOM Document object
                Document doc = parser.getDocument();

                // get all the circle nodes
                NodeList nodelist = doc.getElementsByTagName("circle");
                numberOfCircles = nodelist.getLength();

                // retrieve all info about the circles
                for (int i = 0; i < nodelist.getLength(); i++) {
                    // get one circle node
                    Node node = nodelist.item(i);

                    // get the color attribute (it is #IMPLIED, so it may be missing)
                    Node colorAttr = node.getAttributes().getNamedItem("color");
                    if (colorAttr != null)
                        color[i] = colorAttr.getNodeValue();

                    // get the child nodes of a circle node
                    NodeList childnodelist = node.getChildNodes();

                    // get the x, y and radius values; whitespace text nodes
                    // between the elements are named "#text" and are skipped
                    for (int j = 0; j < childnodelist.getLength(); j++) {
                        Node childnode = childnodelist.item(j);
                        Node textnode = childnode.getFirstChild(); // the only text node
                        String childnodename = childnode.getNodeName();
                        if (childnodename.equals("x"))
                            x[i] = Integer.parseInt(textnode.getNodeValue().trim());
                        else if (childnodename.equals("y"))
                            y[i] = Integer.parseInt(textnode.getNodeValue().trim());
                        else if (childnodename.equals("radius"))
                            r[i] = Integer.parseInt(textnode.getNodeValue().trim());
                    }
                }

                // print the result
                System.out.println("circles=" + numberOfCircles);
                for (int i = 0; i < numberOfCircles; i++) {
                    String line = "(x=" + x[i] + ",y=" + y[i] + ",r=" + r[i] + ",color=" + color[i] + ")";
                    System.out.println(line);
                }
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
    }

Program with SAXParser:

    import java.io.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import org.apache.xerces.parsers.SAXParser;

    public class shapes_SAX extends DefaultHandler {
        static int numberOfCircles = 0;            // total number of circles seen
        static int[] x = new int[1000];            // X-coordinates of the centers
        static int[] y = new int[1000];            // Y-coordinates of the centers
        static int[] r = new int[1000];            // radius of the circle
        static String[] color = new String[1000];  // colors of the circles

        // Text of the current element. characters() may deliver one text node
        // in several chunks, so it is collected here and parsed in endElement().
        private final StringBuilder text = new StringBuilder();

        // main method
        public static void main(String[] args) {
            try {
                shapes_SAX SAXHandler = new shapes_SAX();  // an instance of this class
                SAXParser parser = new SAXParser();        // create a SAXParser object
                parser.setContentHandler(SAXHandler);      // register with the ContentHandler
                parser.parse(args[0]);
            } catch (Exception e) {
                e.printStackTrace(System.err);             // catch exceptions
            }
        }

        // override the startElement() method
        @Override
        public void startElement(String uri, String localName,
                                 String rawName, Attributes attributes) {
            text.setLength(0);                             // start a fresh text buffer
            if (rawName.equals("circle"))                  // if a circle element is seen
                color[numberOfCircles] = attributes.getValue("color"); // get the color attribute
        }

        // override the characters() method
        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);                // collect the text
        }

        // override the endElement() method
        @Override
        public void endElement(String uri, String localName, String rawName) {
            String characterData = text.toString().trim();
            if (rawName.equals("x"))                       // the text belonged to <x>
                x[numberOfCircles] = Integer.parseInt(characterData);
            else if (rawName.equals("y"))                  // the text belonged to <y>
                y[numberOfCircles] = Integer.parseInt(characterData);
            else if (rawName.equals("radius"))             // the text belonged to <radius>
                r[numberOfCircles] = Integer.parseInt(characterData);
            else if (rawName.equals("circle"))             // a circle element has ended
                numberOfCircles += 1;                      // increment the counter
        }

        // override the endDocument() method
        @Override
        public void endDocument() {
            // when the end of the document is seen, just print the circle info
            System.out.println("circles=" + numberOfCircles);
            for (int i = 0; i < numberOfCircles; i++) {
                String line = "(x=" + x[i] + ",y=" + y[i] + ",r=" + r[i] + ",color=" + color[i] + ")";
                System.out.println(line);
            }
        }
    }
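The instructor example from this answer can be sketched with the JDK's JAXP APIs (no Xerces-specific classes). The answer does not give the grade document's layout, so the student, ssn and points element names and the "ssn:grade" output format below are hypothetical. The fixed-threshold policy needs only one SAX pass, while the class-average policy walks a DOM tree twice.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class GradingDemo {
    // Fixed threshold: each grade depends only on the current student's points,
    // so a single streaming SAX pass can emit grades as it goes.
    static List<String> gradeByThreshold(String xml, int threshold) throws Exception {
        List<String> grades = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String ssn;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                text.setLength(0); // start collecting this element's text afresh
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may arrive in several chunks
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("ssn")) {
                    ssn = text.toString().trim();
                } else if (qName.equals("points")) {
                    int points = Integer.parseInt(text.toString().trim());
                    grades.add(ssn + ":" + (points >= threshold ? "A" : "B"));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return grades;
    }

    // Class average: unknown until every student has been read, so build a DOM
    // tree once and walk it twice (first to average, then to grade).
    static List<String> gradeByAverage(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList students = doc.getElementsByTagName("student");
        double total = 0;
        for (int i = 0; i < students.getLength(); i++) {
            total += points((Element) students.item(i));
        }
        double average = total / students.getLength();

        List<String> grades = new ArrayList<>();
        for (int i = 0; i < students.getLength(); i++) {
            Element student = (Element) students.item(i);
            grades.add(childText(student, "ssn") + ":" + (points(student) >= average ? "A" : "B"));
        }
        return grades;
    }

    private static String childText(Element parent, String name) {
        return parent.getElementsByTagName(name).item(0).getTextContent().trim();
    }

    private static int points(Element student) {
        return Integer.parseInt(childText(student, "points"));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<class>"
                + "<student><ssn>111</ssn><points>95</points></student>"
                + "<student><ssn>222</ssn><points>85</points></student>"
                + "<student><ssn>333</ssn><points>60</points></student>"
                + "</class>";
        System.out.println(gradeByThreshold(xml, 90)); // prints [111:A, 222:B, 333:B]
        System.out.println(gradeByAverage(xml));       // prints [111:A, 222:A, 333:B] (average is 80)
    }
}
```

The SAX version would work the same way on a file far larger than memory; the DOM version trades that for the ability to revisit every student after the average is known.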

User · Answer

Well, you are close.

In SAX, events are triggered while the XML is being parsed. When the parser encounters the start of a tag (e.g. <something>), it triggers the tagStarted event (the actual name of the event might differ). Similarly, when it meets the end of the tag while parsing (</something>), it triggers tagEnded. Using a SAX parser means you need to handle these events and make sense of the data returned with each event.

In DOM, no events are triggered while parsing. The entire XML is parsed, and a DOM tree (of the nodes in the XML) is generated and returned. Once it has been parsed, the user can navigate the tree to access the data that was embedded in the various nodes of the XML.

In general, DOM is easier to use, but it has the overhead of parsing the entire XML before you can start using it.
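In the real SAX API the tagStarted/tagEnded events described here are called startElement and endElement. Because a SAX handler sees data as soon as the parser reaches it, it can even stop early instead of paying for the rest of the document. A minimal sketch with JAXP follows. Throwing a custom SAXException subclass from a callback is a common idiom for this, not a dedicated API; the StopParsing, Result and SaxEarlyStopDemo names are made up here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxEarlyStopDemo {
    // Thrown from a callback only to end the parse once we have what we need.
    static final class StopParsing extends SAXException {
        StopParsing() {
            super("found what we were looking for");
        }
    }

    static final class Result {
        String text;          // text of the first matching element, or null
        int elementsStarted;  // how many start tags the parser reported
    }

    // Returns the text of the first element called 'name', stopping right at its end tag.
    static Result firstText(String xml, String name) throws Exception {
        Result result = new Result();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside the wanted element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                result.elementsStarted++;
                if (qName.equals(name)) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (qName.equals(name) && text != null) {
                    result.text = text.toString();
                    throw new StopParsing(); // no further callbacks happen after this
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing expected) {
            // Normal way out: the parse was stopped on purpose.
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Result r = firstText("<feed><title>News</title><entry/><entry/><entry/></feed>", "title");
        System.out.println(r.text + " after " + r.elementsStarted + " start tags"); // prints News after 2 start tags
    }
}
```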

User · Answer

In practice, book.xml:

    <bookstore>
      <book category="cooking">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
      </book>
    </bookstore>

DOM presents the XML document as a tree structure in memory.
- DOM is a W3C standard.
- A DOM parser works on the Document Object Model.
- DOM occupies more memory; it is preferred for small XML documents.
- DOM is easy to navigate, either forward or backward.

SAX presents the XML document as a sequence of events, like "start element: abc", "end element: abc".
- SAX is not a W3C standard; it was developed by a group of developers.
- SAX uses very little memory and is preferred for large XML documents.
- Backward navigation is not possible, as it processes the document sequentially.
- An event happens to a node (element) and it gives all sub-nodes (Latin nodus, "knot").

This XML document, when passed through a SAX parser, will generate a sequence of events like the following:

    start element: bookstore
    start element: book, with an attribute category equal to cooking
    start element: title, with an attribute lang equal to en
    text node, with data equal to: Everyday Italian
    ...
    end element: title
    ...
    end element: book
    end element: bookstore
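The event sequence listed for book.xml can be reproduced with a small JAXP handler that just records each callback as a line of text. This is a minimal sketch: the SaxEventTrace name and the exact wording of each line are chosen here for illustration. Text is buffered because characters() may split one text node into several calls, and whitespace-only text from the indentation is dropped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventTrace {
    // Records each SAX event as a line of text, in the order the parser reports them.
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            // Emit buffered text (if any) when the next start or end tag arrives.
            private void flushText() {
                String t = text.toString().trim(); // drop indentation-only whitespace
                if (!t.isEmpty()) events.add("text: " + t);
                text.setLength(0);
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                flushText();
                StringBuilder line = new StringBuilder("start element: " + qName);
                for (int i = 0; i < attributes.getLength(); i++) {
                    line.append(", ").append(attributes.getQName(i))
                        .append('=').append(attributes.getValue(i));
                }
                events.add(line.toString());
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flushText();
                events.add("end element: " + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        String bookXml = "<bookstore>\n"
                + "  <book category=\"cooking\">\n"
                + "    <title lang=\"en\">Everyday Italian</title>\n"
                + "    <author>Giada De Laurentiis</author>\n"
                + "    <year>2005</year>\n"
                + "    <price>30.00</price>\n"
                + "  </book>\n"
                + "</bookstore>";
        trace(bookXml).forEach(System.out::println);
    }
}
```

Running it prints the same sequence as above, starting with "start element: bookstore" and "start element: book, category=cooking" and ending with "end element: bookstore".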

User · Answer

You are correct in your understanding of the DOM-based model. The XML file will be loaded as a whole, and all its contents will be built as an in-memory representation of the tree the document represents. This can be time- and memory-consuming, depending on how large the input file is. The benefit of this approach is that you can easily query any part of the document and freely manipulate all the nodes in the tree.

The DOM approach is typically used for small XML structures (where "small" depends on how much horsepower and memory your platform has) that may need to be modified and queried in different ways once they have been loaded.

SAX, on the other hand, is designed to handle XML input of virtually any size. Instead of the XML framework doing the hard work for you in figuring out the structure of the document and preparing potentially lots of objects for all the nodes, attributes, etc., SAX leaves that completely to you.

What it basically does is read the input from the top and invoke callback methods you provide when certain "events" occur. An event might be hitting an opening tag, an attribute in the tag, finding text inside an element, or coming across an end tag. SAX stubbornly reads the input and tells you what it sees in this fashion. It is up to you to maintain all the state information you require. Usually this means you will build up some sort of state machine.

While this approach to XML processing is a lot more tedious, it can be very powerful, too. Imagine you want to extract just the titles of news articles from a blog feed. If you read this XML using DOM, it would load all the article contents, all the images, etc. that are contained in the XML into memory, even though you are not interested in them.

With SAX you can just check whether the element name is (e.g.) "title" whenever your "startTag" event method is called. If so, you know that you need to add whatever the next "elementText" event offers you. When you receive the "endTag" event call, you check again whether this is the closing element of the "title". After that, you just ignore all further elements until either the input ends or another "startTag" with a name of "title" comes along. And so on...

You could read through megabytes and megabytes of XML this way, just extracting the tiny amount of data you need.

The negative side of this approach is, of course, that you need to do a lot more bookkeeping yourself, depending on what data you need to extract and how complicated the XML structure is. Furthermore, you naturally cannot modify the structure of the XML tree, because you never have it in hand as a whole.

So in general, SAX is suitable for combing through potentially large amounts of data you receive with a specific "query" in mind, but need not modify, while DOM is aimed more at giving you full flexibility in changing structure and contents, at the expense of higher resource demand.
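The blog-feed scenario can be sketched as a tiny state machine over JAXP SAX callbacks. The answer's "startTag", "elementText" and "endTag" correspond to startElement, characters and endElement in org.xml.sax.ContentHandler. The feed layout, the FeedTitles name and the choice to collect every title element are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class FeedTitles {
    // Collects the text of every <title> element and ignores everything else,
    // so memory use grows with the titles kept, not with the size of the feed.
    static List<String> titles(String xml) throws Exception {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // state: non-null only while inside a <title>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equals("title")) current = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) current.append(ch, start, length); // other text is ignored
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("title") && current != null) {
                    titles.add(current.toString().trim());
                    current = null; // back to ignoring everything
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String feed = "<feed>"
                + "<entry><title>First post</title><content>lots of text...</content></entry>"
                + "<entry><title>Second post</title><content>more text...</content></entry>"
                + "</feed>";
        System.out.println(titles(feed)); // prints [First post, Second post]
    }
}
```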
