[javascript] Replace all non Alpha Numeric characters, New Lines, and multiple White Space with one Space

I'm looking for a neat RegEx solution to replace

  • All non Alpha-Numeric Characters
  • All NewLines
  • All multiple instances of white space

With a single space


For those playing at home (the following does work)

text.replace(/[^a-z0-9]/gmi, " ").replace(/\s+/g, " ");

My thinking is RegEx is probably powerful enough to achieve this in one statement. The components i think id need are

  • [^a-z0-9] - to Remove non Alpha-Numeric characters
  • \s+ - match any collections of spaces
  • \r?\n|\r - match all new line
  • /gmi - global, multi-line, case insensitive

However, i cant seem to style the regex in the right way (the following doesn't work)

text.replace(/[^a-z0-9]|\s+|\r?\n|\r/gmi, " ");


Input

234&^%,Me,2 2013 1080p x264 5 1 BluRay
S01(*&asd 05
S1E5
1x05
1x5


Desired Output

234 Me 2 2013 1080p x264 5 1 BluRay S01 asd 05 S1E5 1x05 1x5

This question is related to javascript regex replace alphanumeric

The answer is


Jonny 5 beat me to it. I was going to suggest using the \W+ without the \s as in text.replace(/\W+/g, " "). This covers white space as well.


Well I think you just need to add a quantifier to each pattern. Also the carriage-return thing is a little funny:

text.replace(/[^a-z0-9]+|\s+/gmi, " ");

edit The \s thing matches \r and \n too.


This is an old post of mine, the accepted answers are good for the most part. However i decided to benchmark each solution and another obvious one (just for fun). I wondered if there was a difference between the regex patterns on different browsers with different sized strings.

So basically i used jsPerf on

  • Testing in Chrome 65.0.3325 / Windows 10 0.0.0
  • Testing in Edge 16.16299.0 / Windows 10 0.0.0

The regex patterns i tested were

  • /[\W_]+/g
  • /[^a-z0-9]+/gi
  • /[^a-zA-Z0-9]+/g

I loaded them up with a string length of random characters

  • length 5000
  • length 1000
  • length 200

Example javascript i used var newstr = str.replace(/[\W_]+/g," ");

Each run consisted of 50 or more sample on each regex, and i run them 5 times on each browser.

Lets race our horses!

Results

                                Chrome                  Edge
Chars   Pattern                 Ops/Sec     Deviation   Op/Sec      Deviation
------------------------------------------------------------------------
5,000   /[\W_]+/g                19,977.80  1.09         10,820.40  1.32
5,000   /[^a-z0-9]+/gi           19,901.60  1.49         10,902.00  1.20
5,000   /[^a-zA-Z0-9]+/g         19,559.40  1.96         10,916.80  1.13
------------------------------------------------------------------------
1,000   /[\W_]+/g                96,239.00  1.65         52,358.80  1.41
1,000   /[^a-z0-9]+/gi           97,584.40  1.18         52,105.00  1.60
1,000   /[^a-zA-Z0-9]+/g         96,965.80  1.10         51,864.60  1.76
------------------------------------------------------------------------
  200   /[\W_]+/g               480,318.60  1.70        261,030.40  1.80
  200   /[^a-z0-9]+/gi          476,177.80  2.01        261,751.60  1.96
  200   /[^a-zA-Z0-9]+/g        486,423.00  0.80        258,774.20  2.15

Truth be known, Regex in both browsers (taking into consideration deviation) were nearly indistinguishable, however i think if it run this even more times the results would become a little more clearer (but not by much).

Theoretical scaling for 1 character

                            Chrome                        Edge
Chars   Pattern             Ops/Sec     Scaled            Op/Sec    Scaled
------------------------------------------------------------------------
5,000   /[\W_]+/g            19,977.80  99,889,000       10,820.40  54,102,000
5,000   /[^a-z0-9]+/gi       19,901.60  99,508,000       10,902.00  54,510,000
5,000   /[^a-zA-Z0-9]+/g     19,559.40  97,797,000       10,916.80  54,584,000
------------------------------------------------------------------------

1,000   /[\W_]+/g            96,239.00  96,239,000       52,358.80  52,358,800
1,000   /[^a-z0-9]+/gi       97,584.40  97,584,400       52,105.00  52,105,000
1,000   /[^a-zA-Z0-9]+/g     96,965.80  96,965,800       51,864.60  51,864,600
------------------------------------------------------------------------

  200   /[\W_]+/g           480,318.60  96,063,720      261,030.40  52,206,080
  200   /[^a-z0-9]+/gi      476,177.80  95,235,560      261,751.60  52,350,320
  200   /[^a-zA-Z0-9]+/g    486,423.00  97,284,600      258,774.20  51,754,840

I wouldn't take to much into these results as this is not really a significant differences, all we can really tell is edge is slower :o . Additionally that i was super bored.

Anyway you can run the benchmark for your self.

Jsperf Benchmark here


Since [^a-z0-9] character class contains all that is not alnum, it contains white characters too!

 text.replace(/[^a-z0-9]+/gi, " ");

For anyone still strugging (like me...) after the above more expert replies, this works in Visual Studio 2019:

outputString = Regex.Replace(inputString, @"\W", "_");

Remember to add

using System.Text.RegularExpressions;

To replace with dashes, do the following:

text.replace(/[\W_-]/g,' ');

A saw a different post that also had diacritical marks, which is great

s.replace(/[^a-zA-Z0-9À-ž\s]/g, "")


Examples related to javascript

need to add a class to an element How to make a variable accessible outside a function? Hide Signs that Meteor.js was Used How to create a showdown.js markdown extension Please help me convert this script to a simple image slider Highlight Anchor Links when user manually scrolls? Summing radio input values How to execute an action before close metro app WinJS javascript, for loop defines a dynamic variable name Getting all files in directory with ajax

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to replace

How do I find and replace all occurrences (in all files) in Visual Studio Code? How to find and replace with regex in excel How to replace text in a column of a Pandas dataframe? How to replace negative numbers in Pandas Data Frame by zero Replacing few values in a pandas dataframe column with another value How to replace multiple patterns at once with sed? Using tr to replace newline with space replace special characters in a string python Replace None with NaN in pandas dataframe Batch script to find and replace a string in text file within a minute for files up to 12 MB

Examples related to alphanumeric

Replace all non Alpha Numeric characters, New Lines, and multiple White Space with one Space HTML5 form validation pattern alphanumeric with spaces? Regular expression to allow spaces between words Regex for checking if a string is strictly alphanumeric Determine if char is a num or letter How to determine if a String has non-alphanumeric characters? How to remove all non-alpha numeric characters from a string in MySQL? How do I sort strings alphabetically while accounting for value when a string is numeric? How to remove leading zeros from alphanumeric text? How to strip all non-alphabetic characters from string in SQL Server?