How to use XMLReader in PHP

Question

I have the following XML file. The file is rather large, and I haven't been able to get SimpleXML to open and read it, so I'm trying XMLReader in PHP, with no success.

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <products>
        <last_updated>2009-11-30 13:52:40</last_updated>
        <product>
            <element_1>foo</element_1>
            <element_2>foo</element_2>
            <element_3>foo</element_3>
            <element_4>foo</element_4>
        </product>
        <product>
            <element_1>bar</element_1>
            <element_2>bar</element_2>
            <element_3>bar</element_3>
            <element_4>bar</element_4>
        </product>
    </products>

I've unfortunately not found a good tutorial on this for PHP and would love to see how I can get each element's content to store in a database.

User · Answer

Simple example:

    public function productsAction()
    {
        $saveFileName = 'ceneo.xml';
        $filename = $this->path . $saveFileName;

        if (file_exists($filename)) {
            $reader = new XMLReader();
            $reader->open($filename);

            $countElements = 0;
            $nodeName = null;

            while ($reader->read()) {
                // Remember the name of the most recent opening tag.
                if ($reader->nodeType == XMLReader::ELEMENT) {
                    $nodeName = $reader->name;
                }

                // A text node belongs to the element that was opened just before it.
                if ($reader->nodeType == XMLReader::TEXT && !empty($nodeName)) {
                    switch ($nodeName) {
                        case 'id':
                            var_dump($reader->value);
                            break;
                    }
                }

                // Count each closing </offer> tag.
                if ($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'offer') {
                    $countElements++;
                }
            }

            $reader->close();

            print('<pre>');
            var_dump($countElements);
            exit;
        }
    }

User · Answer

For XML formatted with attributes:

data.xml:

    <building_data>
    <building address="some address" lat="28.902914" lng="-71.007235" />
    <building address="some address" lat="48.892342" lng="-75.0423423" />
    <building address="some address" lat="58.929753" lng="-79.1236987" />
    </building_data>

PHP code:

    $reader = new XMLReader();

    if (!$reader->open("data.xml")) {
        die("Failed to open 'data.xml'");
    }

    while ($reader->read()) {
        if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'building') {
            $address = $reader->getAttribute('address');
            $latitude = $reader->getAttribute('lat');
            $longitude = $reader->getAttribute('lng');
        }
    }

    $reader->close();
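To make the attribute loop easier to try out, here is a minimal sketch that collects the values into an array instead of overwriting three variables on every pass. The helper name `readBuildings` and the array keys are made up for illustration, and the XML is loaded from a string with `XMLReader::XML()` so the snippet runs without a file on disk.

```php
<?php
// Hypothetical helper (not from the answer above): collect the attributes
// of every <building> element into an array of rows.
function readBuildings(string $xml): array
{
    $reader = new XMLReader();
    // XML() loads from a string; open() would be used for a file or URL.
    if (!$reader->XML($xml)) {
        throw new RuntimeException('Failed to load XML');
    }

    $buildings = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'building') {
            $buildings[] = [
                'address'   => $reader->getAttribute('address'),
                'latitude'  => (float) $reader->getAttribute('lat'),
                'longitude' => (float) $reader->getAttribute('lng'),
            ];
        }
    }
    $reader->close();

    return $buildings;
}

$xml = <<<XML
<building_data>
  <building address="some address" lat="28.902914" lng="-71.007235" />
  <building address="some address" lat="48.892342" lng="-75.0423423" />
</building_data>
XML;

$buildings = readBuildings($xml);
echo count($buildings), " buildings read\n";
```

Casting the coordinates to float is a choice made for this sketch; `getAttribute()` itself returns strings (or null when the attribute is missing).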

User · Answer

Most of my XML parsing life is spent extracting nuggets of useful information out of truckloads of XML (Amazon MWS). As such, my answer assumes you want only specific information and you know where it is located.

I find the easiest way to use XMLReader is to know which tags I want the information out of and use them. If you know the structure of the XML and it has lots of unique tags, I find that using the first case is the easiest. Cases 2 and 3 are just to show you how it can be done for more complex tags. This is extremely fast. I have a discussion of speed over on "What is the fastest XML parser in PHP?"

The most important thing to remember when doing tag-based parsing like this is to use if ($myXML->nodeType == XMLReader::ELEMENT) {... which checks to be sure we're only dealing with opening nodes and not whitespace or closing nodes or whatever.

    function parseMyXML($xml) { // pass in an XML string
        $myXML = new XMLReader();
        $myXML->xml($xml);

        while ($myXML->read()) { // start reading
            if ($myXML->nodeType == XMLReader::ELEMENT) { // only opening tags
                $tag = $myXML->name; // make $tag contain the name of the tag
                switch ($tag) {
                    case 'Tag1': // this tag contains no child elements, only the content we need. And it's unique.
                        $variable = $myXML->readInnerXml(); // now $variable contains the contents of Tag1
                        break;

                    case 'Tag2': // this tag contains child elements, of which we only want one
                        while ($myXML->read()) { // so we tell it to keep reading
                            if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Amount') { // and when it finds the Amount tag...
                                $variable2 = $myXML->readInnerXml(); // ...put it in $variable2
                                break;
                            }
                        }
                        break;

                    case 'Tag3': // Tag3 also has children, which are not unique, but we need two of the children this time
                        $variable3 = $variable4 = null;
                        while ($myXML->read()) {
                            if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Amount') {
                                $variable3 = $myXML->readInnerXml();
                            } else if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Currency') {
                                $variable4 = $myXML->readInnerXml();
                            }
                            // stop only once both children have been found
                            if ($variable3 !== null && $variable4 !== null) {
                                break;
                            }
                        }
                        break;
                }
            }
        }
        $myXML->close();
    }
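For the simple case, where every wanted tag is unique and holds only text, the same idea can be sketched as a small function that returns the values it finds. The function name `extractTags` and the element names (`Order`, `OrderId`, `Total`, `Amount`, `Currency`) are invented for this example and are not taken from any real Amazon MWS response.

```php
<?php
// Hypothetical helper: pull the text of a few known, unique tags
// out of an XML string using tag-based parsing.
function extractTags(string $xml, array $wanted): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $found = [];
    while ($reader->read()) {
        // Only opening tags whose name is in the wanted list.
        if ($reader->nodeType === XMLReader::ELEMENT
            && in_array($reader->name, $wanted, true)) {
            // readString() returns the text content of the current element.
            $found[$reader->name] = $reader->readString();
        }
    }
    $reader->close();

    return $found;
}

// Made-up sample document for illustration only.
$xml = '<Order><OrderId>123</OrderId>'
     . '<Total><Amount>9.99</Amount><Currency>USD</Currency></Total></Order>';

$result = extractTags($xml, ['OrderId', 'Amount', 'Currency']);
print_r($result);
```

If a tag name can appear more than once, later matches overwrite earlier ones in this sketch, which is why the nested-loop cases in the answer are still needed for non-unique tags.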

User · Answer

The accepted answer gave me a good start, but brought in more classes and more processing than I would have liked, so this is my interpretation:

    $xml_reader = new XMLReader;
    $xml_reader->open($feed_url);

    // move the pointer to the first product
    while ($xml_reader->read() && $xml_reader->name != 'product');

    // loop through the products
    while ($xml_reader->name == 'product')
    {
        // load the current xml element into simplexml and we're off and running!
        $xml = simplexml_load_string($xml_reader->readOuterXML());

        // now you can use your simpleXML object ($xml)
        echo $xml->element_1;

        // move the pointer to the next product
        $xml_reader->next('product');
    }

    // don't forget to close the file
    $xml_reader->close();

User · Answer

This works better and faster for me:

    <html>
    <head>
    <script>
    function showRSS(str) {
      if (str.length==0) {
        document.getElementById("rssOutput").innerHTML="";
        return;
      }
      if (window.XMLHttpRequest) {
        // code for IE7+, Firefox, Chrome, Opera, Safari
        xmlhttp=new XMLHttpRequest();
      } else {  // code for IE6, IE5
        xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
      }
      xmlhttp.onreadystatechange=function() {
        if (this.readyState==4 && this.status==200) {
          document.getElementById("rssOutput").innerHTML=this.responseText;
        }
      }
      xmlhttp.open("GET","getrss.php?q="+str,true);
      xmlhttp.send();
    }
    </script>
    </head>
    <body>

    <form>
    <select onchange="showRSS(this.value)">
    <option value="">Select an RSS-feed:</option>
    <option value="Google">Google News</option>
    <option value="ZDN">ZDNet News</option>
    <option value="job">Job</option>
    </select>
    </form>
    <br>
    <div id="rssOutput">RSS-feed will be listed here...</div>
    </body>
    </html>

The backend file:

    <?php
    // get the q parameter from URL
    $q = $_GET["q"];

    // find out which feed was selected
    if ($q == "Google") {
      $xml = "http://news.google.com/news?ned=us&topic=h&output=rss";
    } elseif ($q == "ZDN") {
      $xml = "https://www.zdnet.com/news/rss.xml";
    } elseif ($q == "job") {
      $xml = "https://ngcareers.com/feed";
    }

    $xmlDoc = new DOMDocument();
    $xmlDoc->load($xml);

    // get elements from "<channel>"
    $channel = $xmlDoc->getElementsByTagName('channel')->item(0);
    $channel_title = $channel->getElementsByTagName('title')
      ->item(0)->childNodes->item(0)->nodeValue;
    $channel_link = $channel->getElementsByTagName('link')
      ->item(0)->childNodes->item(0)->nodeValue;
    $channel_desc = $channel->getElementsByTagName('description')
      ->item(0)->childNodes->item(0)->nodeValue;

    // output elements from "<channel>"
    echo("<p><a href='" . $channel_link . "'>" . $channel_title . "</a>");
    echo("<br>");
    echo($channel_desc . "</p>");

    // get and output "<item>" elements
    $x = $xmlDoc->getElementsByTagName('item');

    $count = $x->length;

    // print_r($x->item(0)->getElementsByTagName('title')->item(0)->nodeValue);
    // print_r($x->item(0)->getElementsByTagName('link')->item(0)->nodeValue);
    // print_r($x->item(0)->getElementsByTagName('description')->item(0)->nodeValue);
    // return;

    // use $i (not 0) as the index, and stop before $count, so every item is printed once
    for ($i = 0; $i < $count; $i++) {
      // Title
      $item_title = $x->item($i)->getElementsByTagName('title')->item(0)->nodeValue;
      // Link
      $item_link = $x->item($i)->getElementsByTagName('link')->item(0)->nodeValue;
      // Description
      $item_desc = $x->item($i)->getElementsByTagName('description')->item(0)->nodeValue;
      // Category
      $item_cat = $x->item($i)->getElementsByTagName('category')->item(0)->nodeValue;

      echo("<p>Title: <a href='" . $item_link . "'>" . $item_title . "</a>");
      echo("<br>");
      echo("Desc: " . $item_desc);
      echo("<br>");
      echo("Category: " . $item_cat . "</p>");
    }
    ?>

User · Answer

It all depends on how big the unit of work is, but I guess you're trying to treat each <product/> node in succession.

For that, the simplest way would be to use XMLReader to get to each node, then use SimpleXML to access them. This way, you keep the memory usage low because you're treating one node at a time, and you still leverage SimpleXML's ease of use. For instance:

    $z = new XMLReader;
    $z->open('data.xml');

    $doc = new DOMDocument;

    // move to the first <product /> node
    while ($z->read() && $z->name !== 'product');

    // now that we're at the right depth, hop to the next <product/> until the end of the tree
    while ($z->name === 'product')
    {
        // either one should work
        //$node = new SimpleXMLElement($z->readOuterXML());
        $node = simplexml_import_dom($doc->importNode($z->expand(), true));

        // now you can use $node without going insane about parsing
        var_dump($node->element_1);

        // go to next <product />
        $z->next('product');
    }

Quick overview of pros and cons of different approaches:

XMLReader only

Pros: fast, uses little memory.
Cons: excessively hard to write and debug, requires lots of userland code to do anything useful. Userland code is slow and prone to error. Plus, it leaves you with more lines of code to maintain.

XMLReader + SimpleXML

Pros: doesn't use much memory (only the memory needed to process one node), and SimpleXML is, as the name implies, really easy to use.
Cons: creating a SimpleXMLElement object for each node is not very fast. You really have to benchmark it to understand whether it's a problem for you. Even a modest machine would be able to process a thousand nodes per second, though.

XMLReader + DOM

Pros: uses about as much memory as SimpleXML, and XMLReader::expand() is faster than creating a new SimpleXMLElement. I wish it was possible to use simplexml_import_dom(), but it doesn't seem to work in that case.
Cons: DOM is annoying to work with. It's halfway between XMLReader and SimpleXML. Not as complicated and awkward as XMLReader, but light years away from working with SimpleXML.

My advice: write a prototype with SimpleXML and see if it works for you. If performance is paramount, try DOM. Stay as far away from XMLReader as possible. Remember that the more code you write, the higher the possibility of you introducing bugs or performance regressions.
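As a rough illustration of the XMLReader + SimpleXML combination, the loop can be wrapped in a generator so the calling code only ever sees one product at a time. This is a sketch under assumptions: the function name `iterateProducts` is made up, the sample document is a shortened version of the one in the question, and it is loaded from a string rather than a large file.

```php
<?php
// Hypothetical helper: yield each <product> as a SimpleXMLElement,
// so only one product's subtree is held in memory at a time.
function iterateProducts(XMLReader $reader): Generator
{
    // Move to the first <product> element.
    while ($reader->read() && $reader->name !== 'product');

    while ($reader->name === 'product') {
        yield new SimpleXMLElement($reader->readOuterXml());
        // Skip the current subtree and jump to the next <product> sibling.
        $reader->next('product');
    }
}

$xml = <<<XML
<products>
    <last_updated>2009-11-30 13:52:40</last_updated>
    <product>
        <element_1>foo</element_1>
        <element_2>foo</element_2>
    </product>
    <product>
        <element_1>bar</element_1>
        <element_2>bar</element_2>
    </product>
</products>
XML;

$reader = new XMLReader();
$reader->XML($xml);   // for a real file, $reader->open('data.xml') instead

$rows = [];
foreach (iterateProducts($reader) as $product) {
    // Cast to string to copy the text out of the SimpleXMLElement.
    $rows[] = [
        'element_1' => (string) $product->element_1,
        'element_2' => (string) $product->element_2,
    ];
}
$reader->close();

print_r($rows);
```

To store the data in a database, as the question asks, each row could be passed to a prepared PDO statement inside the loop instead of being appended to `$rows`; collecting everything in an array is only done here to keep the sketch self-contained.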

User · Answer

XMLReader is well documented on the PHP site. It is an XML pull parser, which means it's used to iterate through the nodes (or DOM nodes) of a given XML document. For example, you could go through the entire document you gave like this:

    <?php
    $reader = new XMLReader();

    if (!$reader->open("data.xml")) {
        die("Failed to open 'data.xml'");
    }

    while ($reader->read()) {
        $node = $reader->expand();
        // process $node...
    }

    $reader->close();
    ?>

It is then up to you to decide how to deal with the node returned by XMLReader::expand().
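One possible way to "process $node" is sketched below. It assumes you only care about <product> elements, so it skips every other node type before calling `expand()`, then walks the children of the returned DOM node. The sample XML string and the `$values` array are illustrative only.

```php
<?php
// Made-up sample input, modelled on the document in the question.
$xml = '<products><product><element_1>foo</element_1></product>'
     . '<product><element_1>bar</element_1></product></products>';

$reader = new XMLReader();
$reader->XML($xml);

$values = [];
while ($reader->read()) {
    // Only expand <product> start tags; text, whitespace and end tags are skipped.
    if ($reader->nodeType !== XMLReader::ELEMENT || $reader->name !== 'product') {
        continue;
    }

    // expand() returns a DOM copy of the current element and its subtree.
    $node = $reader->expand();

    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $values[] = $child->nodeName . '=' . $child->textContent;
        }
    }
}
$reader->close();

print_r($values);
```

The expanded node should be used before the next call to `read()`, since it is tied to the reader's current position.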
