[javascript] Remove HTML Tags in Javascript with Regex

I am trying to remove all the html tags out of a string in Javascript. Heres what I have... I can't figure out why its not working....any know what I am doing wrong?

<script type="text/javascript">

var regex = "/<(.|\n)*?>/";
var body = "<p>test</p>";
var result = body.replace(regex, "");
alert(result);

</script>

Thanks a lot!

This question is related to javascript regex

The answer is


my simple JavaScript library called FuncJS has a function called "strip_tags()" which does the task for you — without requiring you to enter any regular expressions.

For example, say that you want to remove tags from a sentence - with this function, you can do it simply like this:

strip_tags("This string <em>contains</em> <strong>a lot</strong> of tags!");

This will produce "This string contains a lot of tags!".

For a better understanding, please do read the documentation at GitHub FuncJS.

Additionally, if you'd like, please provide some feedback through the form. It would be very helpful to me!


This worked for me.

   var regex = /(&nbsp;|<([^>]+)>)/ig
      ,   body = tt
     ,   result = body.replace(regex, "");
       alert(result);

Here is how TextAngular (WYSISYG Editor) is doing it. I also found this to be the most consistent answer, which is NO REGEX.

@license textAngular
Author : Austin Anderson
License : 2013 MIT
Version 1.5.16
// turn html into pure text that shows visiblity
function stripHtmlToText(html)
{
    var tmp = document.createElement("DIV");
    tmp.innerHTML = html;
    var res = tmp.textContent || tmp.innerText || '';
    res.replace('\u200B', ''); // zero width space
    res = res.trim();
    return res;
}

The selected answer doesn't always ensure that HTML is stripped, as it's still possible to construct an invalid HTML string through it by crafting a string like the following.

  "<<h1>h1>foo<<//</h1>h1/>"

This input will ensure that the stripping assembles a set of tags for you and will result in:

  "<h1>foo</h1>"

additionally jquery's text function will strip text not surrounded by tags.

Here's a function that uses jQuery but should be more robust against both of these cases:

var stripHTML = function(s) {
    var lastString;

    do {            
        s = $('<div>').html(lastString = s).text();
    } while(lastString !== s) 

    return s;
};

The way I do it is practically a one-liner.

The function creates a Range object and then creates a DocumentFragment in the Range with the string as the child content.

Then it grabs the text of the fragment, removes any "invisible"/zero-width characters, and trims it of any leading/trailing white space.

I realize this question is old, I just thought my solution was unique and wanted to share. :)

function getTextFromString(htmlString) {
    return document
        .createRange()
        // Creates a fragment and turns the supplied string into HTML nodes
        .createContextualFragment(htmlString)
        // Gets the text from the fragment
        .textContent
        // Removes the Zero-Width Space, Zero-Width Joiner, Zero-Width No-Break Space, Left-To-Right Mark, and Right-To-Left Mark characters
        .replace(/[\u200B-\u200D\uFEFF\u200E\u200F]/g, '')
        // Trims off any extra space on either end of the string
        .trim();
}

var cleanString = getTextFromString('<p>Hello world! I <em>love</em> <strong>JavaScript</strong>!!!</p>');

alert(cleanString);

This is an old question, but I stumbled across it and thought I'd share the method I used:

var body = '<div id="anid">some <a href="link">text</a></div> and some more text';
var temp = document.createElement("div");
temp.innerHTML = body;
var sanitized = temp.textContent || temp.innerText;

sanitized will now contain: "some text and some more text"

Simple, no jQuery needed, and it shouldn't let you down even in more complex cases.


Like others have stated, regex will not work. Take a moment to read my article about why you cannot and should not try to parse html with regex, which is what you're doing when you're attempting to strip html from your source string.


you can use a powerful library for management String which is undrescore.string.js

_('a <a href="#">link</a>').stripTags()

=> 'a link'

_('a <a href="#">link</a><script>alert("hello world!")</script>').stripTags()

=> 'a linkalert("hello world!")'

Don't forget to import this lib as following :

        <script src="underscore.js" type="text/javascript"></script>
        <script src="underscore.string.js" type="text/javascript"></script>
        <script type="text/javascript"> _.mixin(_.str.exports())</script>

For a proper HTML sanitizer in JS, see http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer


<html>
<head>
<script type="text/javascript">
function striptag(){
var html = /(<([^>]+)>)/gi;
for (i=0; i < arguments.length; i++)
arguments[i].value=arguments[i].value.replace(html, "")
}
</script>
</head> 
<body>
       <form name="myform">
<textarea class="comment" title="comment" name=comment rows=4 cols=40></textarea><br>
<input type="button" value="Remove HTML Tags" onClick="striptag(this.form.comment)">
</form>
</body>
</html>

This is a solution for HTML tag and &nbsp etc and you can remove and add conditions to get the text without HTML and you can replace it by any.

convertHtmlToText(passHtmlBlock)
{
   str = str.toString();
  return str.replace(/<[^>]*(>|$)|&nbsp;|&zwnj;|&raquo;|&laquo;|&gt;/g, 'ReplaceIfYouWantOtherWiseKeepItEmpty');
}