[javascript] How to get the entire document HTML as a string?

Is there a way in JS to get the entire HTML within the html tags, as a string?

document.documentElement.??

This question is related to javascript html document tostring

The answer is


You have to iterate through the document childNodes and getting the outerHTML content.

in VBA it looks like this

For Each e In document.ChildNodes
    Put ff, , e.outerHTML & vbCrLf
Next e

using this, allows you to get all elements of the web page including < !DOCTYPE > node if it exists


I tried the various answers to see what is returned. I'm using the latest version of Chrome.

The suggestion document.documentElement.innerHTML; returned <head> ... </body>

Gaby's suggestion document.getElementsByTagName('html')[0].innerHTML; returned the same.

The suggestion document.documentElement.outerHTML; returned <html><head> ... </body></html> which is everything apart from the 'doctype'.

You can retrieve the doctype object with document.doctype; This returns an object, not a string, so if you need to extract the details as strings for all doctypes up to and including HTML5 it is described here: Get DocType of an HTML as string with Javascript

I only wanted HTML5, so the following was enough for me to create the whole document:

alert('<!DOCTYPE HTML>' + '\n' + document.documentElement.outerHTML);


To also get things outside the <html>...</html>, most importantly the <!DOCTYPE ...> declaration, you could walk through document.childNodes, turning each into a string:

const html = [...document.childNodes]
    .map(node => nodeToString(node))
    .join('\n') // could use '' instead, but whitespace should not matter.

function nodeToString(node) {
    switch (node.nodeType) {
        case node.ELEMENT_NODE:
            return node.outerHTML
        case node.TEXT_NODE:
            // Text nodes should probably never be encountered, but handling them anyway.
            return node.textContent
        case node.COMMENT_NODE:
            return `<!--${node.textContent}-->`
        case node.DOCUMENT_TYPE_NODE:
            return doctypeToString(node)
        default:
            throw new TypeError(`Unexpected node type: ${node.nodeType}`)
    }
}

I published this code as document-outerhtml on npm.


edit Note the code above depends on a function doctypeToString; its implementation could be as follows (code below is published on npm as doctype-to-string):

function doctypeToString(doctype) {
    if (doctype === null) {
        return ''
    }
    // Checking with instanceof DocumentType might be neater, but how to get a
    // reference to DocumentType without assuming it to be available globally?
    // To play nice with custom DOM implementations, we resort to duck-typing.
    if (!doctype
        || doctype.nodeType !== doctype.DOCUMENT_TYPE_NODE
        || typeof doctype.name !== 'string'
        || typeof doctype.publicId !== 'string'
        || typeof doctype.systemId !== 'string'
    ) {
        throw new TypeError('Expected a DocumentType')
    }
    const doctypeString = `<!DOCTYPE ${doctype.name}`
        + (doctype.publicId ? ` PUBLIC "${doctype.publicId}"` : '')
        + (doctype.systemId
            ? (doctype.publicId ? `` : ` SYSTEM`) + ` "${doctype.systemId}"`
            : ``)
        + `>`
    return doctypeString
}


I believe document.documentElement.outerHTML should return that for you.

According to MDN, outerHTML is supported in Firefox 11, Chrome 0.2, Internet Explorer 4.0, Opera 7, Safari 1.3, Android, Firefox Mobile 11, IE Mobile, Opera Mobile, and Safari Mobile. outerHTML is in the DOM Parsing and Serialization specification.

The MSDN page on the outerHTML property notes that it is supported in IE 5+. Colin's answer links to the W3C quirksmode page, which offers a good comparison of cross-browser compatibility (for other DOM features too).


The correct way is actually:

webBrowser1.DocumentText


I just need doctype html and should work fine in IE11, Edge and Chrome. I used below code it works fine.

function downloadPage(element, event) {
    var isChrome = /Chrome/.test(navigator.userAgent) && /Google Inc/.test(navigator.vendor);

    if ((navigator.userAgent.indexOf("MSIE") != -1) || (!!document.documentMode == true)) {
        document.execCommand('SaveAs', '1', 'page.html');
        event.preventDefault();
    } else {
        if(isChrome) {
            element.setAttribute('href','data:text/html;charset=UTF-8,'+encodeURIComponent('<!doctype html>' + document.documentElement.outerHTML));
        }
        element.setAttribute('download', 'page.html');
    }
}

and in your anchor tag use like this.

<a href="#" onclick="downloadPage(this,event);" download>Download entire page.</a>

Example

_x000D_
_x000D_
    function downloadPage(element, event) {_x000D_
     var isChrome = /Chrome/.test(navigator.userAgent) && /Google Inc/.test(navigator.vendor);_x000D_
    _x000D_
     if ((navigator.userAgent.indexOf("MSIE") != -1) || (!!document.documentMode == true)) {_x000D_
      document.execCommand('SaveAs', '1', 'page.html');_x000D_
      event.preventDefault();_x000D_
     } else {_x000D_
      if(isChrome) {_x000D_
                element.setAttribute('href','data:text/html;charset=UTF-8,'+encodeURIComponent('<!doctype html>' + document.documentElement.outerHTML));_x000D_
      }_x000D_
      element.setAttribute('download', 'page.html');_x000D_
     }_x000D_
    }
_x000D_
I just need doctype html and should work fine in IE11, Edge and Chrome. _x000D_
_x000D_
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum._x000D_
_x000D_
<p>_x000D_
<a href="#" onclick="downloadPage(this,event);"  download><h2>Download entire page.</h2></a></p>_x000D_
_x000D_
<p>Some image here</p>_x000D_
_x000D_
<p><img src="https://placeimg.com/250/150/animals"/></p>
_x000D_
_x000D_
_x000D_


Use document.documentElement.

Same Question answered here: https://stackoverflow.com/a/7289396/2164160


I always use

document.getElementsByTagName('html')[0].innerHTML

Probably not the right way but I can understand it when I see it.


document.documentElement.outerHTML

document.documentElement.innerHTML

You can also do:

document.getElementsByTagName('html')[0].innerHTML

You will not get the Doctype or html tag, but everything else...


You can do

new XMLSerializer().serializeToString(document)

in browsers newer than IE 9

See https://caniuse.com/#feat=xml-serializer


PROBABLY ONLY IE:

>     webBrowser1.DocumentText

for FF up from 1.0:

//serialize current DOM-Tree incl. changes/edits to ss-variable
var ns = new XMLSerializer();
var ss= ns.serializeToString(document);
alert(ss.substr(0,300));

may work in FF. (Shows up the VERY FIRST 300 characters from the VERY beginning of source-text, mostly doctype-defs.)

BUT be aware, that the normal "Save As"-Dialog of FF MIGHT NOT save the current state of the page, rather the originallly loaded X/h/tml-source-text !! (a POST-up of ss to some temp-file and redirect to that might deliver a saveable source-text WITH the changes/edits prior made to it.)

Although FF surprises by good recovery on "back" and a NICE inclusion of states/values on "Save (as) ..." for input-like FIELDS, textarea etc. , not on elements in contenteditable/ designMode...

If NOT a xhtml- resp. xml-file (mime-type, NOT just filename-extension!), one may use document.open/write/close to SET the appr. content to the source-layer, that will be saved on user's save-dialog from the File/Save menue of FF. see: http://www.w3.org/MarkUp/2004/xhtml-faq#docwrite resp.

https://developer.mozilla.org/en-US/docs/Web/API/document.write

Neutral to questions of X(ht)ML, try a "view-source:http://..." as the value of the src-attrib of an (script-made!?) iframe, - to access an iframes-document in FF:

<iframe-elementnode>.contentDocument, see google "mdn contentDocument" for appr. members, like 'textContent' for instance. 'Got that years ago and no like to crawl for it. If still of urgent need, mention this, that I got to dive in ...


I am using outerHTML for elements (the main <html> container), and XMLSerializer for anything else including <!DOCTYPE>, random comments outside the <html> container, or whatever else might be there. It seems that whitespace isn't preserved outside the <html> element, so I'm adding newlines by default with sep="\n".

_x000D_
_x000D_
function get_document_html(sep="\n") {_x000D_
    let html = "";_x000D_
    let xml = new XMLSerializer();_x000D_
    for (let n of document.childNodes) {_x000D_
        if (n.nodeType == Node.ELEMENT_NODE)_x000D_
            html += n.outerHTML + sep;_x000D_
        else_x000D_
            html += xml.serializeToString(n) + sep;_x000D_
    }_x000D_
    return html;_x000D_
}_x000D_
_x000D_
console.log(get_document_html().slice(0, 200));
_x000D_
_x000D_
_x000D_


Examples related to javascript

need to add a class to an element How to make a variable accessible outside a function? Hide Signs that Meteor.js was Used How to create a showdown.js markdown extension Please help me convert this script to a simple image slider Highlight Anchor Links when user manually scrolls? Summing radio input values How to execute an action before close metro app WinJS javascript, for loop defines a dynamic variable name Getting all files in directory with ajax

Examples related to html

Embed ruby within URL : Middleman Blog Please help me convert this script to a simple image slider Generating a list of pages (not posts) without the index file Why there is this "clear" class before footer? Is it possible to change the content HTML5 alert messages? Getting all files in directory with ajax DevTools failed to load SourceMap: Could not load content for chrome-extension How to set width of mat-table column in angular? How to open a link in new tab using angular? ERROR Error: Uncaught (in promise), Cannot match any routes. URL Segment

Examples related to document

Selenium wait until document is ready GetElementByID - Multiple IDs Difference between $(document.body) and $('body') How do I convert a org.w3c.dom.Document object to a String? Convert Mongoose docs to json Create auto-numbering on images/figures in MS Word Accessing elements by type in javascript $(window).scrollTop() vs. $(document).scrollTop() Get IFrame's document, from JavaScript in main document "Access is denied" JavaScript error when trying to access the document object of a programmatically-created <iframe> (IE-only)

Examples related to tostring

How do I print my Java object without getting "SomeType@2f92e0f4"? What is the meaning of ToString("X2")? Printing out a linked list using toString ToString() function in Go to_string is not a member of std, says g++ (mingw) Confused about __str__ on list in Python How to convert an int array to String with toString method in Java Override valueof() and toString() in Java enum Checking session if empty or not Converting an object to a string