[mysql] How can I find non-ASCII characters in MySQL?

I'm working with a MySQL database that has some data imported from Excel. The data contains non-ASCII characters (em dashes, etc.) as well as hidden carriage returns or line feeds. Is there a way to find these records using MySQL?

This question is related to mysql character-encoding

The answer is


for this question we can also use this method :

Question from sql zoo:
Find all details of the prize won by PETER GRÜNBERG

Non-ASCII characters

ans: select*from nobel where winner like'P% GR%_%berg';


You can define ASCII as all characters that have a decimal value of 0 - 127 (0x00 - 0x7F) and find columns with non-ASCII characters using the following query

SELECT * FROM TABLE WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';

This was the most comprehensive query I could come up with.


MySQL provides comprehensive character set management that can help with this kind of problem.

SELECT whatever
  FROM tableName 
 WHERE columnToCheck <> CONVERT(columnToCheck USING ASCII)

The CONVERT(col USING charset) function turns the unconvertable characters into replacement characters. Then, the converted and unconverted text will be unequal.

See this for more discussion. https://dev.mysql.com/doc/refman/8.0/en/charset-repertoire.html

You can use any character set name you wish in place of ASCII. For example, if you want to find out which characters won't render correctly in code page 1257 (Lithuanian, Latvian, Estonian) use CONVERT(columnToCheck USING cp1257)


In Oracle we can use below.

SELECT * FROM TABLE_A WHERE ASCIISTR(COLUMN_A) <> COLUMN_A;

Based on the correct answer, but taking into account ASCII control characters as well, the solution that worked for me is this:

SELECT * FROM `table` WHERE NOT `field` REGEXP  "[\\x00-\\xFF]|^$";

It does the same thing: searches for violations of the ASCII range in a column, but lets you search for control characters too, since it uses hexadecimal notation for code points. Since there is no comparison or conversion (unlike @Ollie's answer), this should be significantly faster, too. (Especially if MySQL does early-termination on the regex query, which it definitely should.)

It also avoids returning fields that are zero-length. If you want a slightly-longer version that might perform better, you can use this instead:

SELECT * FROM `table` WHERE `field` <> "" AND NOT `field` REGEXP  "[\\x00-\\xFF]";

It does a separate check for length to avoid zero-length results, without considering them for a regex pass. Depending on the number of zero-length entries you have, this could be significantly faster.

Note that if your default character set is something bizarre where 0x00-0xFF don't map to the same values as ASCII (is there such a character set in existence anywhere?), this would return a false positive. Otherwise, enjoy!


This is probably what you're looking for:

select * from TABLE where COLUMN regexp '[^ -~]';

It should return all rows where COLUMN contains non-ASCII characters (or non-printable ASCII characters such as newline).


for this question we can also use this method :

Question from sql zoo:
Find all details of the prize won by PETER GRÜNBERG

Non-ASCII characters

ans: select*from nobel where winner like'P% GR%_%berg';


You can define ASCII as all characters that have a decimal value of 0 - 127 (0x00 - 0x7F) and find columns with non-ASCII characters using the following query

SELECT * FROM TABLE WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';

This was the most comprehensive query I could come up with.


@zende's answer was the only one that covered columns with a mix of ascii and non ascii characters, but it also had that problematic hex thing. I used this:

SELECT * FROM `table` WHERE NOT `column` REGEXP '^[ -~]+$' AND `column` !=''

One missing character from everyone's examples above is the termination character (\0). This is invisible to the MySQL console output and is not discoverable by any of the queries heretofore mentioned. The query to find it is simply:

select * from TABLE where COLUMN like '%\0%';

MySQL provides comprehensive character set management that can help with this kind of problem.

SELECT whatever
  FROM tableName 
 WHERE columnToCheck <> CONVERT(columnToCheck USING ASCII)

The CONVERT(col USING charset) function turns the unconvertable characters into replacement characters. Then, the converted and unconverted text will be unequal.

See this for more discussion. https://dev.mysql.com/doc/refman/8.0/en/charset-repertoire.html

You can use any character set name you wish in place of ASCII. For example, if you want to find out which characters won't render correctly in code page 1257 (Lithuanian, Latvian, Estonian) use CONVERT(columnToCheck USING cp1257)


Based on the correct answer, but taking into account ASCII control characters as well, the solution that worked for me is this:

SELECT * FROM `table` WHERE NOT `field` REGEXP  "[\\x00-\\xFF]|^$";

It does the same thing: searches for violations of the ASCII range in a column, but lets you search for control characters too, since it uses hexadecimal notation for code points. Since there is no comparison or conversion (unlike @Ollie's answer), this should be significantly faster, too. (Especially if MySQL does early-termination on the regex query, which it definitely should.)

It also avoids returning fields that are zero-length. If you want a slightly-longer version that might perform better, you can use this instead:

SELECT * FROM `table` WHERE `field` <> "" AND NOT `field` REGEXP  "[\\x00-\\xFF]";

It does a separate check for length to avoid zero-length results, without considering them for a regex pass. Depending on the number of zero-length entries you have, this could be significantly faster.

Note that if your default character set is something bizarre where 0x00-0xFF don't map to the same values as ASCII (is there such a character set in existence anywhere?), this would return a false positive. Otherwise, enjoy!


In Oracle we can use below.

SELECT * FROM TABLE_A WHERE ASCIISTR(COLUMN_A) <> COLUMN_A;

@zende's answer was the only one that covered columns with a mix of ascii and non ascii characters, but it also had that problematic hex thing. I used this:

SELECT * FROM `table` WHERE NOT `column` REGEXP '^[ -~]+$' AND `column` !=''

Try Using this query for searching special character records

SELECT *
FROM tableName
WHERE fieldName REGEXP '[^a-zA-Z0-9@:. \'\-`,\&]'

One missing character from everyone's examples above is the termination character (\0). This is invisible to the MySQL console output and is not discoverable by any of the queries heretofore mentioned. The query to find it is simply:

select * from TABLE where COLUMN like '%\0%';

Try Using this query for searching special character records

SELECT *
FROM tableName
WHERE fieldName REGEXP '[^a-zA-Z0-9@:. \'\-`,\&]'

This is probably what you're looking for:

select * from TABLE where COLUMN regexp '[^ -~]';

It should return all rows where COLUMN contains non-ASCII characters (or non-printable ASCII characters such as newline).