[javascript] Map HTML to JSON

Representing complex HTML documents will be difficult and full of corner cases, but I just wanted to share a couple techniques to show how to get this kind of program started. This answer differs in that it uses data abstraction and the toJSON method to recursively build the result

Below, html2json is a tiny function which takes an HTML node as input and it returns a JSON string as the result. Pay particular attention to how the code is quite flat but it's still plenty capable of building a deeply nested tree structure – all possible with virtually zero complexity

_x000D_
_x000D_
// data Elem = Elem Node_x000D_
_x000D_
const Elem = e => ({_x000D_
  toJSON : () => ({_x000D_
    tagName: _x000D_
      e.tagName,_x000D_
    textContent:_x000D_
      e.textContent,_x000D_
    attributes:_x000D_
      Array.from(e.attributes, ({name, value}) => [name, value]),_x000D_
    children:_x000D_
      Array.from(e.children, Elem)_x000D_
  })_x000D_
})_x000D_
_x000D_
// html2json :: Node -> JSONString_x000D_
const html2json = e =>_x000D_
  JSON.stringify(Elem(e), null, '  ')_x000D_
  _x000D_
console.log(html2json(document.querySelector('main')))
_x000D_
<main>_x000D_
  <h1 class="mainHeading">Some heading</h1>_x000D_
  <ul id="menu">_x000D_
    <li><a href="/a">a</a></li>_x000D_
    <li><a href="/b">b</a></li>_x000D_
    <li><a href="/c">c</a></li>_x000D_
  </ul>_x000D_
  <p>some text</p>_x000D_
</main>
_x000D_
_x000D_
_x000D_

In the previous example, the textContent gets a little butchered. To remedy this, we introduce another data constructor, TextElem. We'll have to map over the childNodes (instead of children) and choose to return the correct data type based on e.nodeType – this gets us a littler closer to what we might need

_x000D_
_x000D_
// data Elem = Elem Node | TextElem Node_x000D_
_x000D_
const TextElem = e => ({_x000D_
  toJSON: () => ({_x000D_
    type:_x000D_
      'TextElem',_x000D_
    textContent:_x000D_
      e.textContent_x000D_
  })_x000D_
})_x000D_
_x000D_
const Elem = e => ({_x000D_
  toJSON : () => ({_x000D_
    type:_x000D_
      'Elem',_x000D_
    tagName: _x000D_
      e.tagName,_x000D_
    attributes:_x000D_
      Array.from(e.attributes, ({name, value}) => [name, value]),_x000D_
    children:_x000D_
      Array.from(e.childNodes, fromNode)_x000D_
  })_x000D_
})_x000D_
_x000D_
// fromNode :: Node -> Elem_x000D_
const fromNode = e => {_x000D_
  switch (e.nodeType) {_x000D_
    case 3:  return TextElem(e)_x000D_
    default: return Elem(e)_x000D_
  }_x000D_
}_x000D_
_x000D_
// html2json :: Node -> JSONString_x000D_
const html2json = e =>_x000D_
  JSON.stringify(Elem(e), null, '  ')_x000D_
  _x000D_
console.log(html2json(document.querySelector('main')))
_x000D_
<main>_x000D_
  <h1 class="mainHeading">Some heading</h1>_x000D_
  <ul id="menu">_x000D_
    <li><a href="/a">a</a></li>_x000D_
    <li><a href="/b">b</a></li>_x000D_
    <li><a href="/c">c</a></li>_x000D_
  </ul>_x000D_
  <p>some text</p>_x000D_
</main>
_x000D_
_x000D_
_x000D_

Anyway, that's just two iterations on the problem. Of course you'll have to address corner cases where they come up, but what's nice about this approach is that it gives you a lot of flexibility to encode the HTML however you wish in JSON – and without introducing too much complexity

In my experience, you could keep iterating with this technique and achieve really good results. If this answer is interesting to anyone and would like me to expand upon anything, let me know ^_^

Related: Recursive methods using JavaScript: building your own version of JSON.stringify

Examples related to javascript

need to add a class to an element How to make a variable accessible outside a function? Hide Signs that Meteor.js was Used How to create a showdown.js markdown extension Please help me convert this script to a simple image slider Highlight Anchor Links when user manually scrolls? Summing radio input values How to execute an action before close metro app WinJS javascript, for loop defines a dynamic variable name Getting all files in directory with ajax

Examples related to html

Embed ruby within URL : Middleman Blog Please help me convert this script to a simple image slider Generating a list of pages (not posts) without the index file Why there is this "clear" class before footer? Is it possible to change the content HTML5 alert messages? Getting all files in directory with ajax DevTools failed to load SourceMap: Could not load content for chrome-extension How to set width of mat-table column in angular? How to open a link in new tab using angular? ERROR Error: Uncaught (in promise), Cannot match any routes. URL Segment

Examples related to json

Use NSInteger as array index Uncaught SyntaxError: Unexpected end of JSON input at JSON.parse (<anonymous>) HTTP POST with Json on Body - Flutter/Dart Importing json file in TypeScript json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 190) Angular 5 Service to read local .json file How to import JSON File into a TypeScript file? Use Async/Await with Axios in React.js Uncaught SyntaxError: Unexpected token u in JSON at position 0 how to remove json object key and value.?