[javascript] Find and replace specific text characters across a document with JS

I'm wondering if there is a lightweight way I could use JavaScript or jQuery to sniff out a specific text character across a document, say €, and find all instances of this character. And then write the ability to replace all instances of it with, say, a $.

I found this snippet for starters:

var str = 'test: &#39;';

str = str.replace(/&#39;/g, "'");

Essentially, I want a solution for a one-page document: grab all instances of X and make them XY. Only text characters.


The answer is


How about this, replacing @ with $:

$("body").children().each(function () {
    $(this).html( $(this).html().replace(/@/g,"$") );
});

http://jsfiddle.net/maximua/jp96C/1/


str.replace(/replacetext/g,'actualtext')

This replaces all instances of replacetext with actualtext
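The g (global) flag is what makes this replace every occurrence. A short sketch comparing the variants on a made-up sample string; String.prototype.replaceAll assumes an ES2021-capable engine:

```javascript
const text = "X marks the spot; X again.";

// A plain string pattern replaces only the first occurrence.
const firstOnly = text.replace("X", "XY");

// A regular expression with the g flag replaces every occurrence.
const allWithRegex = text.replace(/X/g, "XY");

// replaceAll does the same with a plain string pattern (ES2021).
const allWithReplaceAll = text.replaceAll("X", "XY");
```

This is why a call such as replace('actualword', 'replacementword') with a string pattern only changes the first match.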


In JavaScript, without using jQuery:

// Caution: assigning innerText replaces all of the body's child elements with plain text.
document.body.innerText = document.body.innerText.replace(/actualword/g, 'replacementword');

Here is something that might help someone looking for this answer. The following uses jQuery; it searches the whole document and only replaces the text. For example, if we had

<a href="/i-am/123/a/overpopulation">overpopulation</a>

and we wanted to wrap the word overpopulation in a span with the class overpop

<a href="/i-am/123/a/overpopulation"><span class="overpop">overpopulation</span></a>

we would run the following:

var str = 'overpopulation';

$("*:containsIN('" + str + "')").filter(
    function() {
        // keep only the innermost matches: elements with no descendant that also contains the word
        return $(this).find("*:containsIN('" + str + "')").length === 0;
    }
).html(function(_, html) {
    if (html !== undefined) {
        return html.replace(/(overpopulation)/gi, '<span class="overpop">$1</span>');
    }
});

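The search is case-insensitive and covers the whole document, but only the innermost elements containing the word are modified. In this case we are searching for the string 'overpopulation'. The code relies on a custom :containsIN selector, which has to be defined before the code above runs: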

    $.extend($.expr[":"], {
        "containsIN": function(elem, i, match, array) {
            return (elem.textContent || elem.innerText || "").toLowerCase().indexOf((match[3] || "").toLowerCase()) >= 0;
        }
    });

I think you may be overthinking this.

My approach is simple.

Enclose your page with a div tag:

<div id="mydiv">
<!-- your page here -->
</div>

In your javascript:

var html=document.getElementById('mydiv').innerHTML;
html = html.replace(/this/g,"that");
document.getElementById('mydiv').innerHTML=html;

Similar to @max-malik's answer, but without using jQuery, you can also do this using document.createTreeWalker:

const button = document.getElementById('button');

button.addEventListener('click', () => {
  // NodeFilter.SHOW_TEXT makes the walker visit only text nodes. With the default
  // (NodeFilter.SHOW_ALL) it would also visit elements, and assigning an element's
  // textContent would wipe out its child elements (such as the <span> below).
  const treeWalker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  while (treeWalker.nextNode()) {
    const node = treeWalker.currentNode;
    node.textContent = node.textContent.replace(/@/g, '$');
  }
});

<div>This is an @ that we are @ replacing.</div>
<div>This is another @ that we are replacing.</div>
<div>
  <span>This is an @ in a span in @ div.</span>
</div>
<br>
<input id="button" type="button" value="Replace @ with $" />


Use the split and join methods:

$("#idBut").click(function() {
    $("body").children().each(function() {
        $(this).html($(this).html().split('@').join("$"));
    });
});



As you'll be using jQuery anyway, try:

https://github.com/cowboy/jquery-replacetext

Then just do

$("p").replaceText("£", "$")

It seems to do a good job of only replacing text and not messing with other elements.


My own suggestion is as follows:

function nativeSelector() {
    var elements = document.querySelectorAll("body, body *");
    var results = [];
    var child;
    for(var i = 0; i < elements.length; i++) {
        child = elements[i].childNodes[0];
        if(elements[i].hasChildNodes() && child.nodeType == 3) {
            results.push(child);
        }
    }
    return results;
}

var textnodes = nativeSelector(),
    _nv;
for (var i = 0, len = textnodes.length; i<len; i++){
    _nv = textnodes[i].nodeValue;
    textnodes[i].nodeValue = _nv.replace(/£/g,'€');
}


The nativeSelector() function comes from an answer (posted by Anurag) to this question: getElementsByTagName() equivalent for textNodes.
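Note that nativeSelector() only inspects elements[i].childNodes[0], so text that follows a child element (for example the £3 in "£1 <b>£2</b> and £3") is never visited, while text inside <script> or <style> is included. A minimal sketch of a recursive alternative; collectTextNodes and SKIPPED_TAGS are hypothetical names, and it assumes an HTML document, where tagName is upper-case:

```javascript
// DOM node type constants (same values as Node.ELEMENT_NODE and Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Elements whose text should be left alone (an illustrative choice).
const SKIPPED_TAGS = new Set(["SCRIPT", "STYLE", "NOSCRIPT"]);

// Recursively collect every text node below `root`, in document order,
// not just the first child node of each element.
function collectTextNodes(root, results = []) {
  for (const child of root.childNodes) {
    if (child.nodeType === TEXT_NODE) {
      results.push(child);
    } else if (child.nodeType === ELEMENT_NODE && !SKIPPED_TAGS.has(child.tagName)) {
      collectTextNodes(child, results);
    }
  }
  return results;
}

// In a browser, the replacement loop then becomes:
// collectTextNodes(document.body).forEach((node) => {
//   node.nodeValue = node.nodeValue.replace(/£/g, "€");
// });
```

Recursion depth equals the nesting depth of the page, which is fine for typical documents; document.createTreeWalker with NodeFilter.SHOW_TEXT is a non-recursive alternative.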


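For each element inside the document body, modify its text using the .text(fn) function. Be aware that setting .text() replaces all of an element's children with a single text node, so any nested markup is lost: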

$("body *").text(function(index, oldText) {
    // the g flag replaces every "x", not just the first one
    return oldText.replace(/x/g, "xy");
});

The best would be to do this server-side, or to wrap the currency symbols in an element you can select before returning the page to the browser. However, if neither is an option, you can select all text nodes within the body and do the replacement on them. Below I'm doing this using a plugin I wrote 2 years ago that was meant for highlighting text: I find all occurrences of € and wrap each one in a span with the class currency-symbol, then I replace the text of those spans.


(function($){

    $.fn.highlightText = function () {
        // handler first parameter
        // is the first parameter a regexp?
        var re,
            hClass,
            reStr,
            argType = $.type(arguments[0]),
            defaultTagName = $.fn.highlightText.defaultTagName;

        if ( argType === "regexp" ) {
            // first argument is a regular expression
            re = arguments[0];
        }       
        // is the first parameter an array?
        else if ( argType === "array" ) {
            // first argument is an array, generate
            // regular expression string for later use
            reStr = arguments[0].join("|");
        }       
        // is the first parameter a string?
        else if ( argType === "string" ) {
            // store string in regular expression string
            // for later use
            reStr = arguments[0];
        }       
        // else, return out and do nothing because this
        // argument is required.
        else {
            return;
        }

        // the second parameter is optional, however,
        // it must be a string or boolean value. If it is 
        // a string, it will be used as the highlight class.
        // If it is a boolean value and equal to true, it 
        // will be used as the third parameter and the highlight
        // class will default to "highlight". If it is undefined,
        // the highlight class will default to "highlight" and 
        // the third parameter will default to false, allowing
        // the plugin to match partial matches.
        // ** The exception is if the first parameter is a regular
        // expression, the third parameter will be ignored.
        argType = $.type(arguments[1]);
        if ( argType === "string" ) {
            hClass = arguments[1];
        }
        else if ( argType === "boolean" ) {
            hClass = "highlight";
            // only a value of true turns on whole-word matching
            if ( arguments[1] && reStr ) {
                reStr = "\\b" + reStr + "\\b";
            }
        }
        else {
            hClass = "highlight";
        }

        if ( arguments[2] && reStr ) {
            reStr = "\\b" + reStr + "\\b";
        }

        // if re is not defined ( which means either an array or
        // string was passed as the first parameter ) create the
        // regular expression.
        if (!re) {
            re = new RegExp( "(" + reStr + ")", "ig" );
        }

        // iterate through each matched element
        return this.each( function() {
            // select all contents of this element; .addBack() replaces .andSelf(),
            // which was deprecated in jQuery 1.8 and removed in jQuery 3.0
            $( this ).find( "*" ).addBack().contents()

            // filter to only text nodes that aren't already highlighted
            .filter( function () {
                return this.nodeType === 3 && $( this ).closest( "." + hClass ).length === 0;
            })

            // loop through each text node
            .each( function () {
                var output;
                output = this.nodeValue
                    .replace( re, "<" + defaultTagName + " class='" + hClass + "'>$1</" + defaultTagName +">" );
                if ( output !== this.nodeValue ) {
                    $( this ).wrap( "<p></p>" ).parent()
                        .html( output ).contents().unwrap();
                }
            });
        });
    };

    $.fn.highlightText.defaultTagName = "span";

})( jQuery );

$("body").highlightText("€","currency-symbol");
$("span.currency-symbol").text("$");
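The plugin passes a string argument straight to new RegExp, so a search string such as $ or . would be read as regex syntax rather than as a literal character. A minimal sketch of escaping it first; escapeRegExp and replaceLiteral are hypothetical helper names, and the escaping pattern is the one given in MDN's regular expressions guide:

```javascript
// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every literal occurrence of `search` with the literal `replacement`.
function replaceLiteral(input, search, replacement) {
  const pattern = new RegExp(escapeRegExp(search), "g");
  // A replacer function's return value is inserted as-is, so "$&" or "$1"
  // inside `replacement` are not treated as substitution patterns.
  return input.replace(pattern, () => replacement);
}

replaceLiteral("1.5 and 2.5", ".", ","); // "1,5 and 2,5"
replaceLiteral("$5", "$", "€");          // "€5"
```

The same escaping could be applied to reStr inside the plugin before it builds its RegExp.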

You can use:

str.replace(/text/g, "replaced text");

ECMAScript 2015+ approach

Pitfalls when solving this task

This seems like an easy task, but you have to take care of several things:

  • Simply replacing the entire HTML kills all DOM functionality, like event listeners
  • Replacing the HTML may also replace <script> or <style> contents, or HTML tags or attributes, which is not always desired
  • Changing the HTML may open the page to an XSS attack
  • You may want to replace attributes like title and alt (in a controlled manner) as well
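To illustrate the second point, a minimal string-only sketch (the markup is a made-up example): a regex applied to serialized HTML, as returned by innerHTML, cannot tell visible text from attribute values.

```javascript
// Serialized markup, as element.innerHTML would return it.
const html = '<a href="mailto:me@example.com">me@example.com</a>';

// The regex sees one flat string, so the href is rewritten along with the text.
const replaced = html.replace(/@/g, "$");
```

Here the mailto link is broken as well as the visible text changed; the text-node approaches below leave the href intact.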

The approaches below generally can’t guard against attacks. E.g. if a fetch call reads a URL from somewhere on the page, then sends a request to that URL, the functions below won’t stop that, since this scenario is inherently unsafe.

Replacing the text contents of all elements

This selects the target element and all of its descendant elements except script, noscript and style, goes through their child nodes, picks out the text nodes among them and replaces their contents.

You can optionally specify a different root target, e.g. replaceOnDocument(/€/g, "$", { target: someElement }). By default, the <body> is used.

const replaceOnDocument = (pattern, string, {target = document.body} = {}) => {
  // Handle `string` — see the last section
  [
    target,
    ...target.querySelectorAll("*:not(script):not(noscript):not(style)")
  ].forEach(({childNodes: [...nodes]}) => nodes
    .filter(({nodeType}) => nodeType === document.TEXT_NODE)
    .forEach((textNode) => textNode.textContent = textNode.textContent.replace(pattern, string)));
};

replaceOnDocument(/€/g, "$");

Replacing text nodes, element attributes and properties

Now, this is a little more complex: you need to check three cases: whether a node is a text node, whether it’s an element and its attribute should be replaced, or whether it’s an element and its property should be replaced. A replacer object provides methods for text nodes and for elements.

Before replacing attributes and properties, the replacer needs to check whether the element has a matching attribute; otherwise new attributes would be created, which is undesirable. It also needs to check that a targeted property is a string, since only strings can be replaced, and that the property matching a targeted attribute is not a function, since replacing an event handler attribute such as onerror may lead to an attack.

In the example below, you can see how to use the extended features: in the optional third argument, you may add an attrs property and a props property, each an iterable (e.g. an array) listing the attributes and the properties to be replaced, respectively.

You’ll also notice that this snippet uses flatMap. If that’s not supported, use a polyfill or replace it with a reduce/concat or map/reduce/concat construct, as shown in the flatMap documentation.

const replaceOnDocument = (() => {
    const replacer = {
      [document.TEXT_NODE](node, pattern, string){
        node.textContent = node.textContent.replace(pattern, string);
      },
      [document.ELEMENT_NODE](node, pattern, string, {attrs, props} = {}){
        attrs.forEach((attr) => {
          if(typeof node[attr] !== "function" && node.hasAttribute(attr)){
            node.setAttribute(attr, node.getAttribute(attr).replace(pattern, string));
          }
        });
        props.forEach((prop) => {
          if(typeof node[prop] === "string" && node.hasAttribute(prop)){
            node[prop] = node[prop].replace(pattern, string);
          }
        });
      }
    };

    return (pattern, string, {target = document.body, attrs: [...attrs] = [], props: [...props] = []} = {}) => {
      // Handle `string` — see the last section
      [
        target,
        ...[
          target,
          ...target.querySelectorAll("*:not(script):not(noscript):not(style)")
        ].flatMap(({childNodes: [...nodes]}) => nodes)
      ].filter(({nodeType}) => replacer.hasOwnProperty(nodeType))
        .forEach((node) => replacer[node.nodeType](node, pattern, string, {
          attrs,
          props
        }));
    };
})();

replaceOnDocument(/€/g, "$", {
  attrs: [
    "title",
    "alt",
    "onerror" // This will be ignored
  ],
  props: [
    "value" // Changing an `<input>`’s `value` attribute won’t change its current value, so the property needs to be accessed here
  ]
});

Replacing with HTML entities

If you need to make it work with HTML entities like &shy;, the above approaches will just literally produce the string &shy;, since that’s an HTML entity and will only work when assigning .innerHTML or using related methods.

So let’s solve it by passing the input string to something that accepts an HTML string: a new, temporary HTMLDocument. This is created by the DOMParser’s parseFromString method; in the end we read its documentElement’s textContent:

string = new DOMParser().parseFromString(string, "text/html").documentElement.textContent;

If you want to use this, choose one of the approaches above, depending on whether or not you want to replace HTML attributes and DOM properties in addition to text; then simply replace the comment // Handle `string` — see the last section with the above line.

Now you can use replaceOnDocument(/Güterzug/g, "G&uuml;ter&shy;zug");.

NB: If you don’t use the string handling code, you may also remove the { } around the arrow function body.

Note that this parses HTML entities but still disallows inserting actual HTML tags, since we’re reading only the textContent. This is also safe against most cases of XSS: since we’re using parseFromString and the page’s document isn’t affected, no <script> gets downloaded and no onerror handler gets executed.

You should also consider using \xAD instead of &shy; directly in your JavaScript string, if it turns out to be simpler.
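A small sketch of that alternative, using a made-up sample sentence: the escape "\u00AD" (equivalently "\xAD") is the soft hyphen character itself, so it can be passed to the textContent-based functions directly, with no DOMParser step.

```javascript
// U+00AD SOFT HYPHEN, the character that the &shy; entity stands for.
const SOFT_HYPHEN = "\u00AD";

const sample = "Der Güterzug fährt ab.";
// The replacement is a plain string; no HTML parsing is involved.
const result = sample.replace(/Güterzug/g, "Güter" + SOFT_HYPHEN + "zug");

// In a browser: replaceOnDocument(/Güterzug/g, "Güter\u00ADzug");
```

The soft hyphen is invisible unless the browser needs to break the word at that point, so the result looks unchanged on screen.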


Vanilla JavaScript solution:

document.body.innerHTML = document.body.innerHTML.replace(/Original/g, "New")