[regex] How do I remove all non-ASCII characters with regex and Notepad++?

I searched a lot, but I could not find anywhere how to remove non-ASCII characters in Notepad++.

I need to know what to enter in Find and Replace (a screenshot would be great).

  • What if I want a whitelist approach: bookmark all the ASCII words/lines, so that the non-ASCII lines are left unmarked?

  • What if the file is too large to select all the ASCII lines, and I just want to select the lines containing non-ASCII characters?

Tags: regex, expression, notepad++, non-ascii-characters

The answer is


To remove all non-ASCII characters, search for the following regular expression (with Search Mode set to Regular expression) and replace it with nothing: [^\x00-\x7F]+
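For anyone who wants to do the same thing outside Notepad++, the character class carries over to most regex engines. Here is a minimal Python sketch, assuming the text has already been decoded into a str; the name strip_non_ascii is only an illustrative choice, not part of any library.

```python
import re

# Same character class as the Notepad++ search: any run of characters
# outside the ASCII range 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def strip_non_ascii(text):
    """Return text with every non-ASCII character removed."""
    return NON_ASCII.sub("", text)


# "\u00e9" is e-acute and "\u00ef" is i-diaeresis; both get dropped.
print(strip_non_ascii("caf\u00e9 na\u00efve"))
```

Note that the characters are deleted, not transliterated, so the accented letters simply disappear from the words.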


To highlight the characters, I recommend the Mark function in the search window: with "Bookmark line" ticked, it highlights the non-ASCII characters and puts a bookmark on every line that contains one.

If you want to highlight and put a bookmark on the ASCII characters instead, you can use the regex [\x00-\x7F] to do so.
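The bookmarking idea (flag each line that contains a non-ASCII character) can also be sketched in a script. This is a rough Python equivalent, not what Notepad++ does internally; the function name lines_with_non_ascii and the sample text are made up for illustration.

```python
import re

# A single non-ASCII character is enough to flag a line.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def lines_with_non_ascii(text):
    """Return the 1-based numbers of lines containing a non-ASCII character."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if NON_ASCII.search(line)
    ]


sample = "plain line\ncaf\u00e9\nanother plain line\n\u00fcber"
print(lines_with_non_ascii(sample))
```

The lines that are not listed are the pure-ASCII ones, which covers the whitelist case from the question as well.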


Cheers


In addition to the answer by ProGM: if you see characters in boxes, such as NUL or ACK, and want to get rid of them, those are ASCII control characters (0 to 31). You can find them with the following expression and remove them:

[\x00-\x1F]+

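To remove all non-ASCII characters and all ASCII control characters in one go, remove everything that matches this regex, which keeps only the printable range 0x20 to 0x7E (note that it also matches tabs and line breaks):

[^\x20-\x7E]+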
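To see what each class actually matches, here is a small Python sketch. It assumes that "printable ASCII" means 0x20 through 0x7E, so 0x1F (a control character) and 0x7F (DEL) are both treated as unwanted; the constant names and the sample string are illustrative only.

```python
import re

# Control characters 0x00-0x1F (the NUL, ACK, ... boxes).
CONTROL = re.compile(r"[\x00-\x1F]+")
# Everything outside printable ASCII 0x20-0x7E: control characters,
# DEL (0x7F) and all non-ASCII characters.
NON_PRINTABLE = re.compile(r"[^\x20-\x7E]+")

# NUL, ACK, e-acute and DEL mixed into "ABCD".
dirty = "A\x00B\x06C\u00e9\x7fD"

print(repr(CONTROL.sub("", dirty)))        # only NUL and ACK removed
print(repr(NON_PRINTABLE.sub("", dirty)))  # only A, B, C, D remain
```

Because tabs (0x09) and line breaks (0x0A, 0x0D) are control characters too, both patterns remove them as well.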

This expression will search for non-ASCII values:

[^\x00-\x7F]+

Under Search Mode, select 'Regular expression', then click Find Next.

Source: Regex any ASCII character


Another good trick is to switch your editor to UTF-8 encoding so that you can actually see these funny characters and delete them yourself.


  1. Click on View/Show Symbol/Show All Characters to show the [SOH] characters in the file.
  2. Select an [SOH] symbol in the file.
  3. Press Ctrl+H to bring up the Replace dialog.
  4. Leave 'Find what:' as is.
  5. Change 'Replace with:' to the character of your choosing (comma, semicolon, other...).
  6. Click 'Replace All'.

Done and done!
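The same replacement can be scripted. A minimal Python sketch, assuming [SOH] is the ASCII control character 0x01 used as a field separator; the function name replace_soh and the sample record are hypothetical.

```python
# SOH ("start of heading") is the ASCII control character 0x01.
SOH = "\x01"


def replace_soh(text, replacement=","):
    """Replace every SOH character with the chosen delimiter."""
    return text.replace(SOH, replacement)


record = "id\x01name\x01city"
print(replace_soh(record))
print(replace_soh(record, ";"))
```

A plain string replacement is enough here, since a single fixed character is being swapped and no regex is needed.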


In Notepad++, if you go to menu Search -> Find characters in range -> Non-ASCII Characters (128-255) you can then step through the document to each non-ASCII character.

Be sure to tick "Wrap around" if you want to loop through the whole document for all non-ASCII characters.
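Stepping through the matches one by one can be approximated in a script as well. The Python sketch below assumes that the 128-255 range refers to raw byte values, so it searches the encoded bytes instead of decoded text; high_byte_offsets is an illustrative name, and this is not a description of how Notepad++ implements the feature.

```python
import re

# Any single byte with a value from 128 to 255.
HIGH_BYTE = re.compile(rb"[\x80-\xFF]")


def high_byte_offsets(data):
    """Return (offset, byte value) for every byte in the range 128-255."""
    return [(m.start(), m.group()[0]) for m in HIGH_BYTE.finditer(data)]


# In UTF-8, i-diaeresis (U+00EF) is encoded as the two bytes 0xC3 0xAF.
data = "na\u00efve".encode("utf-8")
for offset, value in high_byte_offsets(data):
    print(f"offset {offset}: byte {value}")
```

In UTF-8 every non-ASCII character is made of several such bytes, so one visible character can produce more than one hit.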



To keep new lines:

  1. First pick a placeholder character for the line breaks, one that does not already occur in the file. I used #.
  2. Select the Replace option Extended.
  3. Find \n, replace with #.
  4. Hit Replace All.

Next:

  1. Select the Replace option Regular expression.
  2. Find: [^\x20-\x7E]+
  3. Leave "Replace with" empty.
  4. Hit Replace All.

Now select the Replace option Extended again and replace # with \n.

:) Now you have a clean ASCII file ;)
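The placeholder round trip is needed because [^\x20-\x7E] also matches line breaks. In a script, an alternative is to add the line-break characters to the negated class so they are never matched in the first place. A Python sketch under that assumption (it is a different technique from the three-pass procedure above, and clean_keep_newlines is a made-up name):

```python
import re

# Negated class: everything except printable ASCII (0x20-0x7E),
# carriage return and line feed.
NOT_PRINTABLE_OR_NEWLINE = re.compile(r"[^\x20-\x7E\r\n]+")


def clean_keep_newlines(text):
    """Remove non-printable and non-ASCII characters, keeping line breaks."""
    return NOT_PRINTABLE_OR_NEWLINE.sub("", text)


# An e-acute on the first line, a NUL and a tab on the second.
dirty = "first\u00e9 line\r\nsecond\x00\tline\n"
print(repr(clean_keep_newlines(dirty)))
```

This avoids having to choose a placeholder that is guaranteed not to occur in the file. Tabs are still removed; add \t to the class if they should stay.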


Another way...

  1. Install the TextFX plugin if you don't have it already.
  2. Go to the TextFX menu option -> zap all non printable characters to #. It will replace each invalid character with three # symbols.
  3. Go to Find/Replace and look for ###. Replace it with a space.

This is nice if you can't remember the regex or don't care to look it up. But the regex mentioned by others is a nice solution as well.

