Strip HTML from Text JavaScript

Question

Is there an easy way to take a string of HTML in JavaScript and strip out the HTML?

User · Answer

The accepted answer works fine mostly; however, in IE, if the html string is null you get "null" (instead of ''). Fixed:

    function strip(html)
    {
       if (html == null) return "";
       var tmp = document.createElement("DIV");
       tmp.innerHTML = html;
       return tmp.textContent || tmp.innerText;
    }

User · Answer

Here's a version which sorta addresses @MikeSamuel's security concern:

    function strip(html)
    {
       try {
           var doc = document.implementation.createDocument('http://www.w3.org/1999/xhtml', 'html', null);
           doc.documentElement.innerHTML = html;
           return doc.documentElement.textContent || doc.documentElement.innerText;
       } catch(e) {
           return "";
       }
    }

Note: it will return an empty string if the HTML markup isn't valid XML (that is, tags must be closed and attributes must be quoted). This isn't ideal, but it does avoid the potential for a security exploit.

If you need to handle markup that is not valid XML, you could try using:

    var doc = document.implementation.createHTMLDocument("");

but that isn't a perfect solution either, for other reasons.

User · Answer

If you want to keep the links and the structure of the content (h1, h2, etc.), then you should check out TextVersionJS. You can use it with any HTML, although it was created to convert an HTML email to plain text.

The usage is very simple. For example, in node.js:

    var createTextVersion = require("textversionjs");
    var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";

    var textVersion = createTextVersion(yourHtml);

Or in the browser with pure js:

    <script src="textversion.js"></script>
    <script>
      var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
      var textVersion = createTextVersion(yourHtml);
    </script>

It also works with require.js:

    define(["textversionjs"], function(createTextVersion) {
      var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
      var textVersion = createTextVersion(yourHtml);
    });

User · Answer

From CSS-Tricks: https://css-tricks.com/snippets/javascript/strip-html-tags-in-javascript/

    const originalString = `
      <div>
        <p>Hey that's <span>somthing</span></p>
      </div>
    `;

    const strippedString = originalString.replace(/(<([^>]+)>)/gi, "");

    console.log(strippedString);

User · Answer

This should do the work in any JavaScript environment (NodeJS included).

    const text = `
    <html lang="en">
      <head>
        <style type="text/css">*{color:red}</style>
        <script>alert('hello')</script>
      </head>
      <body><b>This is some text</b><br/><body>
    </html>`;

    // Remove style tags and content
    text.replace(/<style[^>]*>.*<\/style>/gm, '')
        // Remove script tags and content
        .replace(/<script[^>]*>.*<\/script>/gm, '')
        // Remove all opening, closing and orphan HTML tags
        .replace(/<[^>]+>/gm, '')
        // Remove leading spaces and repeated CR/LF
        .replace(/([\r\n]+ +)+/gm, '');
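One thing to watch with the chain above: in JavaScript, `.` does not match line breaks unless the `s` flag is set, so a `<style>` or `<script>` block that spans several lines would likely slip past the first two replacements and leave its contents in the output. The following is a minimal sketch of the same idea using `[\s\S]*?` instead; the name `stripHtmlRegex` and the sample input are assumptions made for illustration, and this is still a regex heuristic rather than a real HTML parser.

```javascript
// Hypothetical helper name; a regex-based sketch, not a full HTML parser.
function stripHtmlRegex(html) {
  return String(html)
    // [\s\S]*? matches across line breaks, lazily, so each block is removed on its own
    .replace(/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, "")
    .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, "")
    // Remove every remaining tag
    .replace(/<[^>]+>/g, "")
    // Collapse whitespace runs that contain a line break into a single newline
    .replace(/\s*[\r\n]+\s*/g, "\n")
    .trim();
}

// Illustrative input with multi-line style and script blocks
const sample = [
  "<html><head>",
  "<style>",
  "  * { color: red }",
  "</style>",
  "<script>",
  "  alert('hello');",
  "</script>",
  "</head>",
  "<body><b>This is some text</b><br/></body></html>",
].join("\n");

console.log(stripHtmlRegex(sample)); // This is some text
```

The lazy quantifier also keeps two separate `<script>` blocks from being merged into one match, which a greedy `.*` would do when both sit on the same line.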

User · Answer

I just needed to strip out the <a> tags and replace them with the text of the link.

This seems to work great.

    htmlContent = htmlContent.replace(/<a.*href="(.*?)">/g, '');
    htmlContent = htmlContent.replace(/<\/a>/g, '');

User · Answer

I altered Jibberboy2000's answer to include several <BR /> tag formats, remove everything inside <SCRIPT> and <STYLE> tags, format the resulting HTML by removing multiple line breaks and spaces, and convert some HTML-encoded characters back to normal. After some testing it appears that you can convert most full web pages into simple text where the page title and content are retained.

In the simple example,

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <!--comment-->

    <head>

    <title>This is my title</title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    <style>

        body {margin-top: 15px;}
        a { color: #D80C1F; font-weight:bold; text-decoration:none; }

    </style>
    </head>

    <body>
        <center>
            This string has <i>html</i> code i want to <b>remove</b><br>
            In this line <a href="http://www.bbc.co.uk">BBC</a> with link is mentioned.<br/>Now back to &quot;normal text&quot; and stuff using &lt;html encoding&gt;
        </center>
    </body>
    </html>

becomes

    This is my title

    This string has html code i want to remove

    In this line BBC (http://www.bbc.co.uk) with link is mentioned.

    Now back to "normal text" and stuff using <html encoding>

The JavaScript function and test page look like this:

    function convertHtmlToText() {
        var inputText = document.getElementById("input").value;
        var returnText = "" + inputText;

        //-- remove BR tags and replace them with line break
        returnText=returnText.replace(/<br>/gi, "\n");
        returnText=returnText.replace(/<br\s\/>/gi, "\n");
        returnText=returnText.replace(/<br\/>/gi, "\n");

        //-- remove P and A tags but preserve what's inside of them
        returnText=returnText.replace(/<p.*>/gi, "\n");
        returnText=returnText.replace(/<a.*href="(.*?)".*>(.*?)<\/a>/gi, " $2 ($1)");

        //-- remove all inside SCRIPT and STYLE tags
        returnText=returnText.replace(/<script.*>[\w\W]{1,}(.*?)[\w\W]{1,}<\/script>/gi, "");
        returnText=returnText.replace(/<style.*>[\w\W]{1,}(.*?)[\w\W]{1,}<\/style>/gi, "");
        //-- remove all else
        returnText=returnText.replace(/<(?:.|\s)*?>/g, "");

        //-- get rid of more than 2 multiple line breaks:
        returnText=returnText.replace(/(?:(?:\r\n|\r|\n)\s*){2,}/gim, "\n\n");

        //-- get rid of more than 2 spaces:
        returnText = returnText.replace(/ +(?= )/g,'');

        //-- get rid of html-encoded characters:
        returnText=returnText.replace(/&nbsp;/gi," ");
        returnText=returnText.replace(/&amp;/gi,"&");
        returnText=returnText.replace(/&quot;/gi,'"');
        returnText=returnText.replace(/&lt;/gi,'<');
        returnText=returnText.replace(/&gt;/gi,'>');

        //-- return
        document.getElementById("output").value = returnText;
    }

It was used with this HTML:

    <textarea id="input" style="width: 400px; height: 300px;"></textarea><br />
    <button onclick="convertHtmlToText()">CONVERT</button><br />
    <textarea id="output" style="width: 400px; height: 300px;"></textarea><br />
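A side note on the entity step in the answer above: because it decodes the ampersand entity before the less-than and greater-than entities, text that was escaped twice gets decoded twice as well. A single-pass replacement avoids that, since the output is never rescanned. This is only a sketch; `ENTITY_MAP` and `decodeBasicEntities` are names made up for illustration, and it covers just the five entities the answer handles.

```javascript
// Hypothetical helper; covers only the five entities handled in the answer above.
const ENTITY_MAP = {
  "&nbsp;": " ",
  "&amp;": "&",
  "&quot;": '"',
  "&lt;": "<",
  "&gt;": ">",
};

function decodeBasicEntities(text) {
  // One pass: each entity is replaced once and the result is not scanned again
  return String(text).replace(
    /&(?:nbsp|amp|quot|lt|gt);/gi,
    (match) => ENTITY_MAP[match.toLowerCase()]
  );
}

console.log(decodeBasicEntities("&quot;normal text&quot; &amp;lt; &lt;html encoding&gt;"));
// "normal text" &lt; <html encoding>
```

In a browser, letting the DOM decode entities (as the DOM-based answers in this thread do) handles the full entity set; the map above is only meant for the regex-only route.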

User · Answer

Simplest way:

    jQuery(html).text();

That retrieves all the text from a string of html.

User · Answer

https://developer.mozilla.org/en-US/docs/Web/API/Element/insertAdjacentHTML

    var div = document.getElementsByTagName('div');
    for (var i=0; i<div.length; i++) {
        div[i].insertAdjacentHTML('afterend', div[i].innerHTML);
        document.body.removeChild(div[i]);
    }

User · Answer

I have created a working regular expression myself:

    str=str.replace(/(<\?[a-z]*(\s[^>]*)?\?(>|$)|<!\[[a-z]*\[|\]\]>|<!DOCTYPE[^>]*?(>|$)|<!--[\s\S]*?(-->|$)|<[a-z?!\/]([a-z0-9_:.])*(\s[^>]*)?(>|$))/gi, '');

User · Answer

If you're running in a browser, then the easiest way is just to let the browser do it for you...

    function stripHtml(html)
    {
       let tmp = document.createElement("DIV");
       tmp.innerHTML = html;
       return tmp.textContent || tmp.innerText || "";
    }

Note: as folks have noted in the comments, this is best avoided if you don't control the source of the HTML (for example, don't run this on anything that could've come from user input). For those scenarios, you can still let the browser do the work for you: see Saba's answer on using the now widely-available DOMParser.

User · Answer

    myString.replace(/<[^>]*>?/gm, '');
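To make the trade-off of the one-line pattern `/<[^>]*>?/gm` concrete, here is a small sketch that runs it on three inputs. The wrapper name `stripTagsNaive` and the sample strings are assumptions for illustration. It behaves well on simple markup, but a `>` inside an attribute value or a bare `<` in ordinary text confuses it.

```javascript
// Hypothetical wrapper around the one-line regex from the answer above
const stripTagsNaive = (s) => s.replace(/<[^>]*>?/gm, "");

// Simple, well-formed markup: works as intended
console.log(stripTagsNaive("<p>Hello <b>world</b></p>"));
// Hello world

// A ">" inside an attribute value ends the match too early
console.log(stripTagsNaive('Some text <img alt="x > y"> end'));
// Some text  y"> end

// A bare "<" in plain text is treated as the start of a tag
console.log(stripTagsNaive("1 < 2 and 3 > 2"));
// 1  2
```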

User · Answer

I think the easiest way is to just use regular expressions, as someone mentioned above, although there's no reason to use a bunch of them. Try:

    stringWithHTML = stringWithHTML.replace(/<\/?[a-z][a-z0-9]*[^<>]*>/ig, "");

User · Answer

After trying all of the answers mentioned, most if not all of them had edge cases and couldn't completely support my needs.

I started exploring how PHP does it and came across the php.js lib, which replicates the strip_tags method here: http://phpjs.org/functions/strip_tags

User · Answer

A lot of people have answered this already, but I thought it might be useful to share the function I wrote that strips HTML tags from a string but allows you to include an array of tags that you do not want stripped. It's pretty short and has been working nicely for me.

    function removeTags(string, array){
      return array ? string.split("<").filter(function(val){ return f(array, val); }).map(function(val){ return f(array, val); }).join("") : string.split("<").map(function(d){ return d.split(">").pop(); }).join("");
      function f(array, value){
        return array.map(function(d){ return value.includes(d + ">"); }).indexOf(true) != -1 ? "<" + value : value.split(">")[1];
      }
    }

    var x = "<span><i>Hello</i> <b>world</b>!</span>";
    console.log(removeTags(x)); // Hello world!
    console.log(removeTags(x, ["span", "i"])); // <span><i>Hello</i> world!</span>

User · Answer

A simple two-line jQuery snippet to strip the html:

    var content = "<p>checking the html source&nbsp;</p><p>&nbsp;</p><p>with&nbsp;</p><p>all</p><p>the html&nbsp;</p><p>content</p>";

    var text = $(content).text(); // It gets you the plain text
    console.log(text); // check the data in your console

    cj("#text_area_id").val(text); // set your content to text area using text_area_id

User · Answer

Another, admittedly less elegant solution than nickf's or Shog9's, would be to recursively walk the DOM starting at the <body> tag and append each text node.

    var bodyContent = document.getElementsByTagName('body')[0];
    var result = appendTextNodes(bodyContent);

    function appendTextNodes(element) {
        var text = '';

        // Loop through the childNodes of the passed in element
        for (var i = 0, len = element.childNodes.length; i < len; i++) {
            // Get a reference to the current child
            var node = element.childNodes[i];
            // Append the node's value if it's a text node
            if (node.nodeType == 3) {
                text += node.nodeValue;
            }
            // Recurse through the node's children, if there are any,
            // and keep the text they return
            if (node.childNodes.length > 0) {
                text += appendTextNodes(node);
            }
        }
        // Return the final result
        return text;
    }

User · Answer

With jQuery you can simply retrieve it by using

    $('#elementID').text()

User · Answer

For an easier solution, try this: https://css-tricks.com/snippets/javascript/strip-html-tags-in-javascript/

    var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig, "");

User · Answer

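    (function($){
        $.html2text = function(html) {
            // Create the scratch element once; its id must match the selector used below
            if($('#scratch_pad').length === 0) {
                $('<div id="scratch_pad"></div>').appendTo('body');
            }
            return $('#scratch_pad').html(html).text();
        };

    })(jQuery);

Define this as a jQuery plugin and use it as follows:

    $.html2text(htmlContent);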

User · Answer

    var text = html.replace(/<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g, "");

This is a regex version, which is more resilient to malformed HTML, like:

Unclosed tags

    Some text <img

"<", ">" inside tag attributes

    Some text <img alt="x > y">

Newlines

    Some <a href="http://google.com">

The code

    var html = '<br>This <img alt="a>b" \r\n src="a_b.gif" />is > \nmy<>< > <a>"text"</a';
    var text = html.replace(/<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g, "");

User · Answer

method 1:

    function cleanHTML(str){
      return str.replace(/<(?<=<)(.*?)(?=>)>/g, '&lt;$1&gt;');
    }

    function uncleanHTML(str){
      return str.replace(/&lt;(?<=&lt;)(.*?)(?=&gt;)&gt;/g, '<$1>');
    }

method 2:

    function cleanHTML(str){
      return str.replace(/</g, '&lt;').replace(/>/g, '&gt;');
    }

    function uncleanHTML(str){
      return str.replace(/&lt;/g, '<').replace(/&gt;/g, '>');
    }

Also, don't forget that if the user happens to post a math comment (e.g. 1 < 2), you don't want to strip the whole comment. The browser (only tested in Chrome) doesn't treat the entity as an HTML tag: if you replace every < with &lt; everywhere in the string, it will display < as text without running any HTML. I recommend method 2. jQuery also works well: $('#element').text();
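Method 2 above escapes only the angle brackets. If the escaped string may end up inside an attribute value, or already contains ampersands, a slightly fuller escape is commonly used. The sketch below is an illustration under that assumption; `escapeHtml` is a hypothetical name, and the ampersand is handled first so that the entities produced by the later steps are not escaped again.

```javascript
// Hypothetical helper: escape text so a browser displays it literally
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, "&amp;")  // first, so entities produced below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml('1 < 2 & <b>"bold"</b>'));
// 1 &lt; 2 &amp; &lt;b&gt;&quot;bold&quot;&lt;/b&gt;
```

Unlike stripping, escaping keeps a comment such as `1 < 2` intact, which is the concern raised in the answer.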

User · Answer

As an extension to the jQuery method, if your string might not contain HTML (e.g. if you are trying to remove HTML from a form field),

    jQuery(html).text();

will return an empty string if there is no HTML.

Use:

    jQuery('<p>' + html + '</p>').text();

instead.

Update: As has been pointed out in the comments, in some circumstances this solution will execute javascript contained within html. If the value of html could be influenced by an attacker, use a different solution.

User · Answer

The code below allows you to retain some html tags while stripping all others.

    function strip_tags(input, allowed) {

      allowed = (((allowed || '') + '')
        .toLowerCase()
        .match(/<[a-z][a-z0-9]*>/g) || [])
        .join(''); // making sure the allowed arg is a string containing only tags in lowercase (<a><b><c>)

      var tags = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi,
          commentsAndPhpTags = /<!--[\s\S]*?-->|<\?(?:php)?[\s\S]*?\?>/gi;

      return input.replace(commentsAndPhpTags, '')
          .replace(tags, function($0, $1) {
              return allowed.indexOf('<' + $1.toLowerCase() + '>') > -1 ? $0 : '';
          });
    }

User · Answer

Converting HTML for plain-text emailing, keeping hyperlinks (a href) intact

The above function posted by hypoxide works fine, but I was after something that would basically convert HTML created in a web rich-text editor (for example FCKEditor) and clear out all HTML but leave all the links, because I wanted both the HTML and the plain-text version to aid creating the correct parts of an SMTP email (both HTML and plain text).

After a long time of searching Google, my colleagues and I came up with this, using the regex engine in JavaScript:

    str='this string has <i>html</i> code i want to <b>remove</b><br>Link Number 1 -><a href="http://www.bbc.co.uk">BBC</a> Link Number 1<br><p>Now back to normal text and stuff</p>';
    str=str.replace(/<br>/gi, "\n");
    str=str.replace(/<p.*>/gi, "\n");
    str=str.replace(/<a.*href="(.*?)".*>(.*?)<\/a>/gi, " $2 (Link->$1) ");
    str=str.replace(/<(?:.|\s)*?>/g, "");

The str variable starts out like this:

    this string has <i>html</i> code i want to <b>remove</b><br>Link Number 1 -><a href="http://www.bbc.co.uk">BBC</a> Link Number 1<br><p>Now back to normal text and stuff</p>

and then after the code has run it looks like this:

    this string has html code i want to remove
    Link Number 1 -> BBC (Link->http://www.bbc.co.uk)  Link Number 1


    Now back to normal text and stuff

As you can see, all the HTML has been removed and the link has been preserved, with the hyperlinked text still intact. Also, I have replaced the <p> and <br> tags with \n (newline char) so that some sort of visual formatting has been retained.

To change the link format (e.g. BBC (Link->http://www.bbc.co.uk)), just edit the $2 (Link->$1), where $1 is the href URL/URI and $2 is the hyperlinked text. With the links directly in the body of the plain text, most SMTP mail clients convert these so the user has the ability to click on them.

Hope you find this useful.
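One caveat with the link pattern in the answer above: its `.*` parts are greedy, so when two links sit on the same line the first one tends to be swallowed by the match for the second. The sketch below shows one way to keep each link separate by using `[^>]*` and a lazy `[\s\S]*?`. The function name `htmlToTextKeepLinks`, the `text (url)` output format, and the sample URLs are assumptions chosen for illustration, and it only handles double-quoted href values.

```javascript
// Hypothetical helper: HTML to plain text, keeping link targets as "text (url)"
function htmlToTextKeepLinks(html) {
  return String(html)
    // <br>, <br/> and <br /> become line breaks
    .replace(/<br\s*\/?>/gi, "\n")
    // Opening and closing <p> tags become line breaks (\b avoids matching <pre>)
    .replace(/<\/?p\b[^>]*>/gi, "\n")
    // [^>]* cannot run past the end of a tag, so each link is matched on its own
    .replace(/<a\b[^>]*\bhref="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "$2 ($1)")
    // Remove whatever tags are left
    .replace(/<[^>]+>/g, "");
}

const twoLinks =
  '<a href="http://a.example">A</a> and <a href="http://b.example">B</a><br>done';

console.log(htmlToTextKeepLinks(twoLinks));
// A (http://a.example) and B (http://b.example)
// done
```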

User · Answer

Using jQuery:

    function stripTags() {
        return $('<p></p>').html(textToEscape).text()
    }

User · Answer

    function stripHTML(my_string){
        var charArr   = my_string.split(''),
            resultArr = [],
            htmlZone  = 0,
            quoteZone = 0;
        for (var x = 0; x < charArr.length; x++) {
            switch (charArr[x] + htmlZone + quoteZone) {
                case "<00" : htmlZone  = 1; break;
                case ">10" : htmlZone  = 0; resultArr.push(' '); break;
                case '"10' : quoteZone = 1; break;
                case "'10" : quoteZone = 2; break;
                case '"11' :
                case "'12" : quoteZone = 0; break;
                default    : if (!htmlZone) { resultArr.push(charArr[x]); }
            }
        }
        return resultArr.join('');
    }

Accounts for > inside attributes and <img onerror="javascript"> in newly created DOM elements.

usage:

    clean_string = stripHTML("string with <html> in it")

demo:

https://jsfiddle.net/gaby_de_wilde/pqayphzd/

demo of top answer doing the terrible things:

https://jsfiddle.net/gaby_de_wilde/6f0jymL6/1/

User · Answer

I made some modifications to the original Jibberboy2000 script. Hope it'll be useful for someone.

    str = '**ANY HTML CONTENT HERE**';

    str=str.replace(/<\s*br\/*>/gi, "\n");
    str=str.replace(/<\s*a.*href="(.*?)".*>(.*?)<\/a>/gi, " $2 (Link->$1) ");
    str=str.replace(/<\s*\/*.+?>/ig, "\n");
    str=str.replace(/ {2,}/gi, " ");
    str=str.replace(/\n+\s*/gi, "\n\n");

User · Answer

    function strip_html_tags(str)
    {
       if ((str === null) || (str === ''))
           return false;
       else
           str = str.toString();
       return str.replace(/<[^>]*>/g, '');
    }

User · Answer

The input element supports only one-line text:

    The text state represents a one line plain text edit control for the element's value.

    function stripHtml(str) {
      var tmp = document.createElement('input');
      tmp.value = str;
      return tmp.value;
    }

Update: this works as expected

    function stripHtml(str) {
      // Remove some tags
      str = str.replace(/<[^>]+>/gim, '');

      // Remove BB code
      str = str.replace(/\[(\w+)[^\]]*](.*?)\[\/\1]/g, '$2 ');

      // Remove html and line breaks
      const div = document.createElement('div');
      div.innerHTML = str;

      const input = document.createElement('input');
      input.value = div.textContent || div.innerText || '';

      return input.value;
    }

User · Answer

It is also possible to use the fantastic htmlparser2 pure JS HTML parser. Here is a working demo:

    var htmlparser = require('htmlparser2');

    var body = '<p><div>This is </div>a <span>simple </span> <img src="test"></img>example.</p>';

    var result = [];

    var parser = new htmlparser.Parser({
        ontext: function(text){
            result.push(text);
        }
    }, {decodeEntities: true});

    parser.write(body);
    parser.end();

    result.join('');

The output will be This is a simple example.

See it in action here: https://tonicdev.com/jfahrenkrug/extract-text-from-html

This works in both node and the browser if you pack your web application using a tool like webpack.

User · Answer

A safer way to strip the html with jQuery is to first use jQuery.parseHTML to create a DOM, ignoring any scripts, before letting jQuery build an element and then retrieving only the text.

    function stripHtml(unsafe) {
        return $($.parseHTML(unsafe)).text();
    }

Can safely strip html from:

    <img src="unknown.gif" onerror="console.log('running injections');">

And other exploits.

nJoy!

User · Answer

For escaped characters, this will also work, using pattern matching:

    myString.replace(/((&lt)|(<)(?:.|\n)*?(&gt)|(>))/gm, '');

User · Answer

An improvement to the accepted answer.

    function strip(html)
    {
       var tmp = document.implementation.createHTMLDocument("New").body;
       tmp.innerHTML = html;
       return tmp.textContent || tmp.innerText || "";
    }

This way something running like this will do no harm:

    strip("<img onerror='alert(\"could run arbitrary JS here\")' src=bogus>")

Firefox, Chromium and Explorer 9+ are safe. Opera Presto is still vulnerable. Also, images mentioned in the strings are not downloaded in Chromium and Firefox, saving HTTP requests.

User · Answer

I would like to share an edited version of Shog9's approved answer.

As Mike Samuel pointed out in a comment, that function can execute inline JavaScript code. But Shog9 is right when saying "let the browser do it for you..."

So here is my edited version, using DOMParser:

    function strip(html){
       let doc = new DOMParser().parseFromString(html, 'text/html');
       return doc.body.textContent || "";
    }

Here is the code to test the inline JavaScript:

    strip("<img onerror='alert(\"could run arbitrary JS here\")' src=bogus>")

Also, it does not request resources on parse (like images):

    strip("Just text <img src='https://assets.rbl.ms/4155638/980x.jpg'>")
