[sed] Remove non-ASCII characters from CSV

I want to remove all the non-ASCII characters from a file in place.

I found one solution with tr, but I think I then have to write the result back to the file afterwards.

I need to do it in place with relatively good performance.

Any suggestions?

This question is related to sed and awk.

The answer is


sed -i 's/[^[:print:]]//g' FILENAME   # the g flag removes every match on each line, not just the first

Also, because this strips carriage returns along with the other non-printable characters, it acts like dos2unix.
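A quick way to see that effect is to pipe a made-up sample through the same substitution (note that in a UTF-8 locale [:print:] treats accented letters as printable, so they survive; combine with LANG=C, or use a byte range as in the answers below, to strip those as well):

printf 'foo,bar\r\n' | sed 's/[^[:print:]]//g' | od -c   # the trailing \r (octal 015) is gone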


A Perl one-liner will do it: perl -i.bak -pe 's/[^[:ascii:]]//g' <your file>

-i says that the file is edited in place, and a backup of the original is saved with the extension .bak.
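A minimal usage sketch (data.csv is just a placeholder name):

perl -i.bak -pe 's/[^[:ascii:]]//g' data.csv   # data.csv is rewritten in place
ls data.csv data.csv.bak                       # the untouched original is kept as data.csv.bak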


I appreciate the tips I found on this site.

But on my Windows 10 machine, I had to use double quotes for this to work:

sed -i "s/[\d128-\d255]//g" FILENAME

I noticed these things ...

  1. For FILENAME, the entire path\name needs to be quoted. This didn't work -- %TEMP%\"FILENAME". This did -- "%TEMP%\FILENAME" (see the sketch after the list).

  2. sed leaves behind temp files in the current directory, named sed*
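Putting those notes together, a hedged sketch of the working form in cmd.exe (data.csv is a made-up name; the sed expression itself is unchanged from above):

rem the whole expanded path goes inside one pair of double quotes
sed -i "s/[\d128-\d255]//g" "%TEMP%\data.csv"
rem optional cleanup of the sed* temp files from point 2 (assumes nothing else here starts with sed)
del sed*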


awk '{ gsub(/[^]a-zA-Z0-9"!@#$%^&*|_[(){}]/, ""); print }' MYinputfile.txt > pipe_out_to_CONVERTED_FILE.txt

(gsub rather than sub is needed so every occurrence on a line is removed, and the ] is placed right after the ^ so it stays literal inside the bracket expression. Add a space and a comma to the allow-list if those should survive in a CSV.)

As an alternative to sed or perl, you may consider using ed(1) and POSIX character classes.

Note: ed(1) reads the entire file into memory to edit it in place, so for really large files you should use sed -i ... or perl -i ... instead.

# see:
# - http://wiki.bash-hackers.org/doku.php?id=howto:edit-ed
# - http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes

# test
echo $'aaa \177 bbb \200 \214 ccc \254 ddd\r\n' > testfile
ed -s testfile <<< $',l'                  # list the file (with escapes) before
# H switches on verbose error messages; the g command deletes every byte that is
# neither graphic, whitespace nor a control character, and wq writes the file back
ed -s testfile <<< $'H\ng/[^[:graph:][:space:][:cntrl:]]/s///g\nwq'
ed -s testfile <<< $',l'                  # list the file again after the cleanup

I tried all the solutions and nothing worked. The following, however, does:

tr -cd '\11\12\15\40-\176'   # keep only tab (\11), newline (\12), carriage return (\15) and printable ASCII (\40-\176); delete everything else

Which I found here:

https://alvinalexander.com/blog/post/linux-unix/how-remove-non-printable-ascii-characters-file-unix

My problem needed it in a series of piped programs rather than reading directly from a file, so modify as needed; a sketch of the file-based variant follows.
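For the in-place case from the original question, a minimal sketch using a temporary file (the file names are placeholders; tr itself cannot rewrite a file in place):

tr -cd '\11\12\15\40-\176' < input.csv > input.csv.tmp && mv input.csv.tmp input.csv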


I'm using a very minimal busybox system, in which there is no support for ranges in tr or for POSIX character classes, so I have to do it the crappy old-fashioned way. Here's the solution with sed, stripping all non-printable and non-ASCII characters from the file:

sed -i 's/[^]a-zA-Z 0-9`~!@#$%^&*()_+[\\{}|;'\'':",.\/<>?]//g' FILE

(The ] is listed first, right after the ^, so it stays literal inside the bracket expression. Note that the allow-list contains no - or =, so those characters are stripped too; add them, with the - last before the closing ], if you need to keep them.)

Try tr instead of sed

tr -cd '[:print:]' < file.txt
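One caveat: [:print:] does not include newlines or tabs, so this also deletes line breaks and joins the whole file onto one line. A hedged variant that keeps them, using the same file.txt:

tr -cd '[:print:]\n\t' < file.txt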

This worked for me:

sed -i 's/[^[:print:]]//g'

# -i edits the file in place

LANG=C sed -i -E "s|[\d128-\d255]||g" /path/to/file(s)

The LANG=C part is there to avoid an "Invalid collation character" error.

Based on Ivan's answer and Patrick's comment.
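If you want to double-check the result afterwards, a quick hedged sketch (it assumes GNU grep built with -P/PCRE support, and the file name is a placeholder):

LC_ALL=C grep -nP '[\x80-\xff]' /path/to/file || echo "no non-ASCII bytes left"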