Parse an HTML string with JS

Question

I searched for a solution but nothing was relevant, so here is my problem:

I want to parse a string which contains HTML text. I want to do it in JavaScript.

I tried this library, but it seems that it parses the HTML of my current page, not from a string. Because when I try the code below, it changes the title of my page:

    var parser = new HTMLtoDOM("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>", document);

My goal is to extract links from an external HTML page that I read just like a string.

Do you know an API to do it?

User · Answer

    const parse = Range.prototype.createContextualFragment.bind(document.createRange());

    document.body.appendChild( parse('<p><strong>Today is:</strong></p>') );
    document.body.appendChild( parse(`<p style="background: #eee">${new Date()}</p>`) );

Only valid child Nodes within the parent Node (start of the Range) will be parsed. Otherwise, unexpected results may occur:

    // <body> is "parent" Node, start of Range
    const parseRange = document.createRange();
    const parse = Range.prototype.createContextualFragment.bind(parseRange);

    // Returns Text "1 2" because td, tr, tbody are not valid children of <body>
    parse('<td>1</td> <td>2</td>');
    parse('<tr><td>1</td> <td>2</td></tr>');
    parse('<tbody><tr><td>1</td> <td>2</td></tr></tbody>');

    // Returns <table>, which is a valid child of <body>
    parse('<table> <td>1</td> <td>2</td> </table>');
    parse('<table> <tr> <td>1</td> <td>2</td> </tr> </table>');
    parse('<table> <tbody> <td>1</td> <td>2</td> </tbody> </table>');

    // <tr> is parent Node, start of Range
    parseRange.setStart(document.createElement('tr'), 0);

    // Returns [<td>, <td>] element array
    parse('<td>1</td> <td>2</td>');
    parse('<tr> <td>1</td> <td>2</td> </tr>');
    parse('<tbody> <td>1</td> <td>2</td> </tbody>');
    parse('<table> <td>1</td> <td>2</td> </table>');

User · Answer

The fastest way to parse HTML in Chrome and Firefox is Range#createContextualFragment:

    var range = document.createRange();
    range.selectNode(document.body); // required in Safari
    var fragment = range.createContextualFragment('<h1>html...</h1>');
    var firstNode = fragment.firstChild;

I would recommend creating a helper function which uses createContextualFragment if available and falls back to innerHTML otherwise.

Benchmark: http://jsperf.com/domparser-vs-createelement-innerhtml/3
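The helper recommended in this answer could look roughly like the sketch below. The name `parseFragment` and the optional `doc` parameter are assumptions made for illustration (the parameter only exists so the function can be pointed at another document); the answer itself does not show such a helper.

```javascript
// Hypothetical helper: the name `parseFragment` and the `doc` parameter are
// assumptions made for this sketch, not part of the answer above.
function parseFragment(markup, doc = document) {
  const range = doc.createRange();

  if (typeof range.createContextualFragment === "function") {
    // Give the range a context node first; the answer notes Safari needs this.
    range.selectNode(doc.body);
    return range.createContextualFragment(markup);
  }

  // Fallback: let innerHTML do the parsing, then move the resulting nodes
  // into a DocumentFragment so both paths return the same kind of object.
  const container = doc.createElement("div");
  container.innerHTML = markup;
  const fragment = doc.createDocumentFragment();
  while (container.firstChild) {
    fragment.appendChild(container.firstChild);
  }
  return fragment;
}
```

In a browser, `parseFragment('<h1>html...</h1>').firstChild` would then play the role of `firstNode` in the snippet above.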

User · Answer

It's quite simple:

    var parser = new DOMParser();
    var htmlDoc = parser.parseFromString(txt, 'text/html');
    // do whatever you want with htmlDoc.getElementsByTagName('a');

According to MDN, to do this in Chrome you need to parse as XML like so:

    var parser = new DOMParser();
    var htmlDoc = parser.parseFromString(txt, 'text/xml');
    // do whatever you want with htmlDoc.getElementsByTagName('a');

It is currently unsupported by WebKit and you'd have to follow Florian's answer, and it is unknown to work in most cases on mobile browsers.

Edit: Now widely supported

User · Answer

The following function parseHTML will return either:

- a Document when your file starts with a doctype
- a DocumentFragment when your file doesn't start with a doctype

The code:

    function parseHTML(markup) {
        if (markup.toLowerCase().trim().indexOf('<!doctype') === 0) {
            var doc = document.implementation.createHTMLDocument("");
            doc.documentElement.innerHTML = markup;
            return doc;
        } else if ('content' in document.createElement('template')) {
            // Template tag exists!
            var el = document.createElement('template');
            el.innerHTML = markup;
            return el.content;
        } else {
            // Template tag doesn't exist!
            var docfrag = document.createDocumentFragment();
            var el = document.createElement('body');
            el.innerHTML = markup;
            // appendChild moves each node out of el, so this drains el completely
            while (el.firstChild) {
                docfrag.appendChild(el.firstChild);
            }
            return docfrag;
        }
    }

How to use:

    var links = parseHTML('<!doctype html><html><head></head><body><a>Link 1</a><a>Link 2</a></body></html>').getElementsByTagName('a');
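One caveat with the usage line in this answer: `getElementsByTagName` is available on a Document but not on a DocumentFragment, so it only works for markup that starts with a doctype. A small sketch of a query that works for both return types, using a hypothetical helper name `anchorsIn`:

```javascript
// Hypothetical helper: `anchorsIn` is a name invented for this sketch.
// `root` is whatever parseHTML returned (a Document or a DocumentFragment).
function anchorsIn(root) {
  // querySelectorAll exists on Document, DocumentFragment and Element,
  // whereas getElementsByTagName does not exist on DocumentFragment.
  return Array.from(root.querySelectorAll("a"));
}
```

With this, `anchorsIn(parseHTML('<a>Link 1</a><a>Link 2</a>'))` should work even though that markup has no doctype.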

User · Answer

With this simple code you can do that:

    let el = $('<div></div>');
    $(document.body).append(el);
    el.html("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>");
    console.log(el.find('a[href="test0"]'));

User · Answer

    var doc = new DOMParser().parseFromString(html, "text/html");
    var links = doc.querySelectorAll("a");
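Applied to the question's goal of pulling links out of a string, the DOMParser approach might be wrapped up as in the sketch below. The function name `extractLinks` and its optional `parser` parameter are assumptions for illustration, not part of any answer here.

```javascript
// Hypothetical helper: `extractLinks` and its `parser` parameter are
// assumptions made for this sketch.
function extractLinks(html, parser = new DOMParser()) {
  // Parsing happens in a separate document, so the current page
  // (including its title) is left untouched.
  const doc = parser.parseFromString(html, "text/html");

  // getAttribute returns the href exactly as written in the markup,
  // without resolving it against the current page's URL.
  return Array.from(doc.querySelectorAll("a[href]"), (a) => a.getAttribute("href"));
}
```

Called in a browser with the HTML string from the question, this should yield the three values `test0`, `test1` and `test2`.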

User · Answer

Create a dummy DOM element and add the string to it. Then, you can manipulate it like any DOM element.

    var el = document.createElement( 'html' );
    el.innerHTML = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";

    el.getElementsByTagName( 'a' ); // Live HTMLCollection of your anchor elements

Edit: adding a jQuery answer to please the fans!

    var el = $( '<div></div>' );
    el.html("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>");

    $('a', el); // All the anchor elements

User · Answer

Way 1: Use document.cloneNode()

Performance is: Call to document.cloneNode() took 0.22499999977299012 milliseconds (and it may be more).

    var t0, t1, html;

    t0 = performance.now();
    html = document.cloneNode(true);
    t1 = performance.now();

    console.log("Call to doSomething took " + (t1 - t0) + " milliseconds.");

    html.documentElement.innerHTML = '<!DOCTYPE html><html><head><title>Test</title></head><body><div id="test1">test1</div></body></html>';

    console.log(html.getElementById("test1"));

Way 2: Use document.implementation.createHTMLDocument()

Performance is: Call to document.implementation.createHTMLDocument() took 0.14000000010128133 milliseconds.

    var t0, t1, html;

    t0 = performance.now();
    html = document.implementation.createHTMLDocument("test");
    t1 = performance.now();

    console.log("Call to doSomething took " + (t1 - t0) + " milliseconds.");

    html.documentElement.innerHTML = '<!DOCTYPE html><html><head><title>Test</title></head><body><div id="test1">test1</div></body></html>';

    console.log(html.getElementById("test1"));

Way 3: Use document.implementation.createDocument()

Performance is: Call to document.implementation.createHTMLDocument() took 0.14000000010128133 milliseconds.

    var t0 = performance.now();
    var html = document.implementation.createDocument('', 'html',
        document.implementation.createDocumentType('html', '', '')
    );
    var t1 = performance.now();

    console.log("Call to doSomething took " + (t1 - t0) + " milliseconds.");

    html.documentElement.innerHTML = '<html><head><title>Test</title></head><body><div id="test1">test</div></body></html>';

    console.log(html.getElementById("test1"));

Way 4: Use new Document()

Performance is: Call to document.implementation.createHTMLDocument() took 0.13499999840860255 milliseconds.

Note: ParentNode.append was experimental technology as of 2020.

    var t0, t1, html;

    t0 = performance.now();
    //---------------
    html = new Document();

    html.append(
        html.implementation.createDocumentType('html', '', '')
    );

    html.append(
        html.createElement('html')
    );
    //---------------
    t1 = performance.now();

    console.log("Call to doSomething took " + (t1 - t0) + " milliseconds.");

    html.documentElement.innerHTML = '<html><head><title>Test</title></head><body><div id="test1">test1</div></body></html>';

    console.log(html.getElementById("test1"));

User · Answer

If you're open to using jQuery, it has some nice facilities for creating detached DOM elements from strings of HTML. These can then be queried through the usual means, e.g.:

    var html = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";
    var anchors = $('<div/>').append(html).find('a').get();

Edit - just saw @Florian's answer, which is correct. This is basically exactly what he said, but with jQuery.

User · Answer

    let content = "<center><h1>404 Not Found</h1></center>"
    let result = $("<div/>").html(content).text()

content: <center><h1>404 Not Found</h1></center>
result: "404 Not Found"

User · Answer

EDIT: The solution below is only for HTML "fragments", since html, head and body are removed. I guess the solution for this question is DOMParser's parseFromString() method.

For HTML fragments, the solutions listed here work for most HTML; however, in certain cases they won't work.

For example, try parsing <td>Test</td>. This one won't work with the div.innerHTML solution, nor with DOMParser.prototype.parseFromString, nor with the range.createContextualFragment solution. The td tag goes missing and only the text remains.

Only jQuery handles that case well.

So the future solution (MS Edge 13+) is to use the template tag:

    function parseHTML(html) {
        var t = document.createElement('template');
        t.innerHTML = html;
        return t.content.cloneNode(true);
    }

    var documentFragment = parseHTML('<td>Test</td>');

For older browsers I have extracted jQuery's parseHTML() method into an independent gist - https://gist.github.com/Munawwar/6e6362dbdf77c7865a99
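The gist is not reproduced here. As a rough illustration of how code without the template tag can keep a `<td>`, the sketch below wraps table-related markup in the parent elements it needs before assigning innerHTML, then steps back down to the real parent. This is an assumption-laden sketch, not jQuery's actual code: the wrapper table, `firstTagName` and `parseWithWrapper` are all hypothetical names, and only a few table tags are covered.

```javascript
// Illustrative only: this wrapper table is an assumption for the sketch and
// covers just a few table-related tags, not every context-sensitive element.
const SECTION = { depth: 1, open: "<table>", close: "</table>" };
const ROW = { depth: 2, open: "<table><tbody>", close: "</tbody></table>" };
const CELL = { depth: 3, open: "<table><tbody><tr>", close: "</tr></tbody></table>" };
const WRAPPERS = new Map([
  ["td", CELL], ["th", CELL], ["tr", ROW],
  ["tbody", SECTION], ["thead", SECTION], ["tfoot", SECTION],
]);

// Returns the lowercased name of the first tag in the markup, or "".
function firstTagName(markup) {
  const match = /^\s*<([a-zA-Z][\w-]*)/.exec(markup);
  return match ? match[1].toLowerCase() : "";
}

function parseWithWrapper(markup, doc = document) {
  const wrap = WRAPPERS.get(firstTagName(markup));
  let node = doc.createElement("div");
  node.innerHTML = wrap ? wrap.open + markup + wrap.close : markup;

  // Step down through the wrapper elements to reach the real parent.
  for (let i = 0; wrap && i < wrap.depth; i++) {
    node = node.firstChild;
  }

  // Move the parsed nodes into a fragment, leaving the wrappers behind.
  const fragment = doc.createDocumentFragment();
  while (node.firstChild) {
    fragment.appendChild(node.firstChild);
  }
  return fragment;
}
```

In a browser, `parseWithWrapper('<td>Test</td>').firstChild` should then be the td element rather than a bare text node.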
