[javascript] Replacing &nbsp; from javascript dom text node

I am processing xhtml using javascript. I am getting the text content for a div node by concatenating the nodeValue of all child nodes where nodeType == Node.TEXT_NODE.

The resulting string sometimes contains a non-breaking space entity. How do I replace this with a regular space character?

My div looks like this...

<div><b>Expires On</b> Sep 30, 2009 06:30&nbsp;AM</div>

The following suggestions found on the web did not work:

var cleanText = text.replace(/^\xa0*([^\xa0]*)\xa0*$/g,"");


var cleanText = replaceHtmlEntities(text);

var replaceHtmlEntites = (function() {
  var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
  var translate = {
    "nbsp": " ",
    "amp" : "&",
    "quot": "\"",
    "lt"  : "<",
    "gt"  : ">"
  };
  return function(s) {
    return ( s.replace(translate_re, function(match, entity) {
      return translate[entity];
    }) );
  }
})();

Any suggestions?

This question is related to javascript regex html-entities

The answer is


That first line is pretty messed up. It only needs to be:

var cleanText = text.replace(/\xA0/g,' ');

That should be all you need.
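
For context on why this works: a DOM text node holds the decoded character U+00A0, not the six-character source text &nbsp;, so a pattern that looks for the entity name never matches the nodeValue. A minimal sketch, where normalizeNbsp is a hypothetical helper name and the sample string only stands in for the concatenated nodeValue data from the question:

```javascript
// Hypothetical helper: swaps every non-breaking space (U+00A0) for a regular space.
function normalizeNbsp(text) {
  return text.replace(/\u00A0/g, ' ');
}

// Stand-in for the concatenated nodeValue strings of the question's div.
var sample = ' Sep 30, 2009 06:30\u00A0AM';
var cleaned = normalizeNbsp(sample);

console.log(cleaned === ' Sep 30, 2009 06:30 AM'); // true
```

\xA0 and \u00A0 are two spellings of the same character, so either escape can be used in the regex.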


If you only need to replace &nbsp; then you can use a far simpler regex:

var textWithNBSpaceReplaced = originalText.replace(/&nbsp;/g, ' ');

Also, there is a typo in your div example, it says &nnbsp; instead of &nbsp;.


I think when you define a function with "var foo = function() {...};", the function is only defined after that line. In other words, try this:

var replaceHtmlEntities = (function() {
  var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
  var translate = {
    "nbsp": " ",
    "amp" : "&",
    "quot": "\"",
    "lt"  : "<",
    "gt"  : ">"
  };
  return function(s) {
    return ( s.replace(translate_re, function(match, entity) {
      return translate[entity];
    }) );
  }
})();

var cleanText = text.replace(/^\xa0*([^\xa0]*)\xa0*$/g,"");
cleanText = replaceHtmlEntities(text);

Edit: Also, only use "var" the first time you declare a variable (you're using it twice on the cleanText variable).

Edit 2: The problem is the spelling of the function name. You have "var replaceHtmlEntites =". It should be "var replaceHtmlEntities ="
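
The ordering point made at the start of this answer can be illustrated with a small sketch. The names declared and expressed are hypothetical and only show the difference between a function declaration and a function expression assigned with var:

```javascript
// A function declaration is hoisted, so it can be called before it appears.
var early = declared();

function declared() {
  return 'declaration';
}

// With a function expression, only the variable name is hoisted.
// Its value is still undefined until the assignment below has run.
var typeBefore = typeof expressed;

var expressed = function () {
  return 'expression';
};

var typeAfter = typeof expressed;

console.log(early, typeBefore, typeAfter);
```

Calling a name that was never declared at all, as happens with a misspelled function name, fails differently: it throws a ReferenceError regardless of ordering.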


I used this, and it worked:

var cleanText = text.replace(/&amp;nbsp;/g,"");

String.prototype.replaceHtmlEntites = function() {
  var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
  var translate = {"nbsp": " ", "amp": "&", "quot": "\"", "lt": "<", "gt": ">"};
  // Swap each matched entity for its literal character.
  return this.replace(translate_re, function(match, entity) {
    return translate[entity];
  });
};

// The method must be assigned before it is called.
var text = "&quot;&nbsp;&amp;&lt;&gt;";
text = text.replaceHtmlEntites();

Try this; it worked for me.


If you just want to get rid of the entities altogether, this removes each run from an & through the next ; (the form all such symbols share):

text = text.replace(/&#?\w+;/g, '');
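
One caveat with patterns of this kind: a greedy .* between & and ; runs from the first & in the string to the last ;, so it can swallow ordinary text along with the entities. A small sketch comparing it with a bounded pattern (the sample string is made up for illustration):

```javascript
var sampleText = 'a &lt; b &amp;&amp; c &gt; d;';

// Greedy: .* spans from the first & to the last ; in the string.
var greedy = sampleText.replace(/&.*;/g, '');

// Bounded: matches one named or numeric entity at a time.
var bounded = sampleText.replace(/&#?\w+;/g, '');

console.log(JSON.stringify(greedy));
console.log(JSON.stringify(bounded));
```

Note also that replace returns a new string rather than changing the original, so the result has to be assigned somewhere.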

For me, replace didn't work. Try this code:

str = str.split("&quot;").join('"');
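
One possible reason replace appeared not to work (an assumption, since this answer does not show the failing call) is that replace with a string pattern only swaps the first occurrence. The sketch below compares the three forms on a made-up input:

```javascript
var quoted = '&quot;hi&quot;';

// A string pattern replaces only the first match.
var firstOnly = quoted.replace('&quot;', '"');

// split/join replaces every occurrence.
var viaSplit = quoted.split('&quot;').join('"');

// A regex with the g flag also replaces every occurrence.
var viaRegex = quoted.replace(/&quot;/g, '"');

console.log(firstOnly, viaSplit, viaRegex);
```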
