How to get the entire document HTML as a string

Question

Is there a way in JS to get the entire HTML within the <html> tags, as a string? Something on document.documentElement, perhaps?

User · Answer

I always use document.getElementsByTagName('html')[0].innerHTML. Probably not the right way, but I can understand it when I see it.

User · Answer

I tried the various answers to see what is returned. I'm using the latest version of Chrome.

The suggestion document.documentElement.innerHTML returned <head> ... </body>.

Gaby's suggestion document.getElementsByTagName('html')[0].innerHTML returned the same.

The suggestion document.documentElement.outerHTML returned <html><head> ... </body></html>, which is everything apart from the doctype.

You can retrieve the doctype object with document.doctype. This returns an object, not a string, so if you need to extract the details as strings for all doctypes up to and including HTML5, it is described here: Get DocType of an HTML as string with Javascript.

I only wanted HTML5, so the following was enough for me to create the whole document:

    alert('<!DOCTYPE HTML>' + '\n' + document.documentElement.outerHTML);
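The HTML5-only approach above can be wrapped in a small helper. This is a minimal sketch, assuming the page uses the plain HTML5 doctype (it ignores any public or system identifier). The name serializeHtml5Document and its doc parameter are made up for illustration and are not part of any browser API. Passing the document in as an argument also makes the helper easy to try outside a browser.

```javascript
// Sketch: rebuild the full source of an HTML5 document as one string.
// `serializeHtml5Document` is a hypothetical helper name; `doc` is any object
// shaped like a DOM Document (documentElement.outerHTML, optional doctype).
function serializeHtml5Document(doc) {
  // doc.doctype is null when the page has no doctype declaration.
  // Assumption: only the doctype name matters (true for <!DOCTYPE html>).
  const doctype = doc.doctype ? '<!DOCTYPE ' + doc.doctype.name + '>' : '';
  const markup = doc.documentElement.outerHTML;
  return doctype ? doctype + '\n' + markup : markup;
}

// In a browser: console.log(serializeHtml5Document(document));
```

For older doctypes that carry a public or system identifier, the fuller doctype-to-string approach mentioned above is the safer choice.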

User · Answer

I am using outerHTML for elements (the main <html> container), and XMLSerializer for anything else, including <!DOCTYPE>, random comments outside the <html> container, or whatever else might be there. It seems that whitespace isn't preserved outside the <html> element, so I'm adding newlines by default with sep="\n".

    function get_document_html(sep = "\n") {
        let html = "";
        let xml = new XMLSerializer();
        for (let n of document.childNodes) {
            if (n.nodeType == Node.ELEMENT_NODE)
                html += n.outerHTML + sep;
            else
                html += xml.serializeToString(n) + sep;
        }
        return html;
    }

    console.log(get_document_html().slice(0, 200));

User · Answer

I just needed the doctype plus the HTML, and it had to work in IE11, Edge and Chrome. I used the code below and it works fine.

    function downloadPage(element, event) {
        var isChrome = /Chrome/.test(navigator.userAgent) && /Google Inc/.test(navigator.vendor);

        if (navigator.userAgent.indexOf('MSIE') > -1 || !!document.documentMode == true) {
            // Internet Explorer: open the native "Save As" dialog.
            document.execCommand('SaveAs', '1', 'page.html');
            event.preventDefault();
        } else {
            if (isChrome) {
                // Chrome: point the link at a data URL holding the whole page.
                element.setAttribute('href', 'data:text/html;charset=UTF-8,' + encodeURIComponent('<!doctype html>' + document.documentElement.outerHTML));
            }
            element.setAttribute('download', 'page.html');
        }
    }

And in your anchor tag use it like this:

    <a href="#" onclick="downloadPage(this,event);" download>Download entire page.</a>

Example page markup:

    <p>
    <a href="#" onclick="downloadPage(this,event);" download><h2>Download entire page.</h2></a></p>

    <p>Some image here</p>

    <p><img src="https://placeimg.com/250/150/animals"/></p>
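The data-URL step in the Chrome branch can be isolated into a tiny function, which makes it easier to see why encodeURIComponent is needed. This is only a sketch; buildHtmlDataUrl is a hypothetical helper name, not a browser API.

```javascript
// Sketch: build a data: URL that carries a complete HTML document.
// `buildHtmlDataUrl` is a hypothetical helper, not a browser API.
function buildHtmlDataUrl(markup) {
  // encodeURIComponent escapes characters such as '#', '%' and '<'
  // that would otherwise truncate or corrupt the URL.
  return 'data:text/html;charset=UTF-8,' + encodeURIComponent(markup);
}

// In a browser, with `link` being an <a> element:
//   link.setAttribute('href', buildHtmlDataUrl('<!doctype html>' + document.documentElement.outerHTML));
//   link.setAttribute('download', 'page.html');
```

Without the encoding, a '#' anywhere in the page would be read as the start of a URL fragment and the saved file would be cut off at that point.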

User · Answer

To also get things outside the <html>...</html>, most importantly the <!DOCTYPE ...> declaration, you could walk through document.childNodes, turning each into a string:

    const html = [...document.childNodes]
        .map(node => nodeToString(node))
        .join('\n') // could use '' instead, but whitespace should not matter.

    function nodeToString(node) {
        switch (node.nodeType) {
            case node.ELEMENT_NODE:
                return node.outerHTML
            case node.TEXT_NODE:
                // Text nodes should probably never be encountered, but handling them anyway.
                return node.textContent
            case node.COMMENT_NODE:
                return '<!--' + node.textContent + '-->'
            case node.DOCUMENT_TYPE_NODE:
                return doctypeToString(node)
            default:
                throw new TypeError('Unexpected node type: ' + node.nodeType)
        }
    }

I published this code as document-outerhtml on npm.

Edit: note that the code above depends on a function doctypeToString; its implementation could be as follows (the code below is published on npm as doctype-to-string):

    function doctypeToString(doctype) {
        if (doctype === null) {
            return ''
        }
        // Checking with instanceof DocumentType might be neater, but how to get a
        // reference to DocumentType without assuming it to be available globally?
        // To play nice with custom DOM implementations, we resort to duck-typing.
        if (!doctype
            || doctype.nodeType !== doctype.DOCUMENT_TYPE_NODE
            || typeof doctype.name !== 'string'
            || typeof doctype.publicId !== 'string'
            || typeof doctype.systemId !== 'string'
        ) {
            throw new TypeError('Expected a DocumentType')
        }
        const doctypeString = '<!DOCTYPE ' + doctype.name
            + (doctype.publicId
                ? ' PUBLIC "' + doctype.publicId + '"'
                : '')
            + (doctype.systemId
                ? (doctype.publicId ? '' : ' SYSTEM') + ' "' + doctype.systemId + '"'
                : '')
            + '>'
        return doctypeString
    }

User · Answer

document.documentElement.outerHTML

User · Answer

You can do new XMLSerializer().serializeToString(document) in browsers newer than IE 9. See https://caniuse.com/#feat=xml-serializer

User · Answer

PROBABLY ONLY IE: webBrowser1.DocumentText

For FF from 1.0 upward:

    // serialize the current DOM tree, incl. changes/edits, into the ss variable
    var ns = new XMLSerializer();
    var ss = ns.serializeToString(document);
    alert(ss.substr(0, 300));

This may work in FF. (It shows the VERY FIRST 300 characters from the VERY beginning of the source text, mostly doctype definitions.)

BUT be aware that the normal "Save As" dialog of FF MIGHT NOT save the current state of the page, but rather the originally loaded X(h)tml source text. (POSTing ss to some temp file and redirecting to that might deliver a saveable source text WITH the changes/edits previously made to it.) That said, FF surprises with good recovery on "back" and a NICE inclusion of states/values on "Save as ..." for input-like FIELDS, textarea etc., but not for elements in contenteditable/designMode.

If the page is NOT an XHTML or XML file (judged by mime type, NOT just the filename extension), one may use document.open/write/close to SET the appropriate content on the source layer, which is what will be saved from the user's save dialog in the File/Save menu of FF. See http://www.w3.org/MarkUp/2004/xhtml-faq#docwrite or https://developer.mozilla.org/en-US/docs/Web/API/document.write

Regardless of the X(ht)ML question, try "view-source:http://..." as the value of the src attribute of a (script-made?) iframe. To access an iframe's document in FF, use <iframe-elementnode>.contentDocument; search for "mdn contentDocument" for the appropriate members, such as textContent.

I worked this out years ago and would rather not dig for it again. If there is still an urgent need, say so and I will dive in.

User · Answer

You have to iterate through document.childNodes and get the outerHTML content of each node.

In VBA it looks like this:

    For Each e In document.ChildNodes
        Put ff, , e.outerHTML & vbCrLf
    Next e

Using this allows you to get all elements of the web page, including the <!DOCTYPE> node if it exists.

User · Answer

The correct way is actually: webBrowser1.DocumentText

User · Answer

I believe document.documentElement.outerHTML should return that for you.

According to MDN, outerHTML is supported in Firefox 11, Chrome 0.2, Internet Explorer 4.0, Opera 7, Safari 1.3, Android, Firefox Mobile 11, IE Mobile, Opera Mobile, and Safari Mobile. outerHTML is in the DOM Parsing and Serialization specification.

The MSDN page on the outerHTML property notes that it is supported in IE 5+. Colin's answer links to the W3C quirksmode page, which offers a good comparison of cross-browser compatibility (for other DOM features too).
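Given those support figures, a fallback is rarely needed, but feature detection is cheap. The sketch below prefers outerHTML and falls back to a serializer object if one is passed in. The name getRootMarkup and both parameters are hypothetical, and XMLSerializer output may differ in detail from outerHTML because it uses XML serialization rules.

```javascript
// Sketch: get the markup of the root element, preferring outerHTML.
// `getRootMarkup` is a hypothetical helper; `doc` is a Document-like object
// and `serializer` is anything with a serializeToString(node) method,
// such as a browser's XMLSerializer instance.
function getRootMarkup(doc, serializer) {
  const root = doc.documentElement;
  if (typeof root.outerHTML === 'string') {
    return root.outerHTML;
  }
  if (serializer && typeof serializer.serializeToString === 'function') {
    return serializer.serializeToString(root);
  }
  throw new Error('No way to serialize the root element');
}

// In a browser: getRootMarkup(document, new XMLSerializer());
```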

User · Answer

Use document.documentElement.

Same question answered here: https://stackoverflow.com/a/7289396/2164160

User · Answer

document.documentElement.innerHTML

User · Answer

MS added the outerHTML and innerHTML properties some time ago.

According to MDN, outerHTML is supported in Firefox 11, Chrome 0.2, Internet Explorer 4.0, Opera 7, Safari 1.3, Android, Firefox Mobile 11, IE Mobile, Opera Mobile, and Safari Mobile. outerHTML is in the DOM Parsing and Serialization specification.

See quirksmode for browser compatibility to find what will work for you. All support innerHTML.

    var markup = document.documentElement.innerHTML;
    alert(markup);

User · Answer

You can also do:

    document.getElementsByTagName('html')[0].innerHTML

You will not get the doctype or the <html> tag, but everything else.
