[mysql] UTF-8: General? Bin? Unicode?

  • utf8_bin compares the bits blindly. No case folding, no accent stripping.
  • utf8_general_ci compares one byte with one byte. It does case folding and accent stripping, but no 2-character comparisions: ij is not equal ? in this collation.
  • utf8_*_ci is a set of language-specific rules, but otherwise like unicode_ci. Some special cases: Ç, C, ch, ll
  • utf8_unicode_ci follows an old Unicode standard for comparisons. ij=?, but ae != æ
  • utf8_unicode_520_ci follows an newer Unicode standard. ae = æ

See collation chart for details on what is equal to what in various utf8 collations.

utf8, as defined by MySQL is limited to the 1- to 3-byte utf8 codes. This leaves out Emoji and some of Chinese. So you should really switch to utf8mb4 if you want to go much beyond Europe.

The above points apply to utf8mb4, after suitable spelling change. Going forward, utf8mb4 and utf8mb4_unicode_520_ci are preferred.

  • utf16 and utf32 are variants on utf8; there is virtually no use for them.
  • ucs2 is closer to "Unicode" than "utf8"; there is virtually no use for it.

Examples related to mysql

Implement specialization in ER diagram How to post query parameters with Axios? PHP with MySQL 8.0+ error: The server requested authentication method unknown to the client Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver' phpMyAdmin - Error > Incorrect format parameter? Authentication plugin 'caching_sha2_password' is not supported How to resolve Unable to load authentication plugin 'caching_sha2_password' issue Connection Java-MySql : Public Key Retrieval is not allowed How to grant all privileges to root user in MySQL 8.0 MySQL 8.0 - Client does not support authentication protocol requested by server; consider upgrading MySQL client

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8

Examples related to collation

#1273 – Unknown collation: ‘utf8mb4_unicode_520_ci’ phpmysql error - #1273 - #1273 - Unknown collation: 'utf8mb4_general_ci' How to fix a collation conflict in a SQL Server query? Cannot Resolve Collation Conflict SQL Server - Convert varchar to another collation (code page) to fix character encoding How to change the CHARACTER SET (and COLLATION) throughout a database? SQL Server default character encoding What does 'COLLATE SQL_Latin1_General_CP1_CI_AS' do? Changing SQL Server collation to case insensitive from case sensitive? Troubleshooting "Illegal mix of collations" error in mysql