utf8_bin
compares the bits blindly. No case folding, no accent stripping.utf8_general_ci
compares one byte with one byte. It does case folding and accent stripping, but no 2-character comparisions: ij
is not equal ?
in this collation.utf8_*_ci
is a set of language-specific rules, but otherwise like unicode_ci
. Some special cases: Ç
, C
, ch
, ll
utf8_unicode_ci
follows an old Unicode standard for comparisons. ij
=?
, but ae
!= æ
utf8_unicode_520_ci
follows an newer Unicode standard. ae
= æ
See collation chart for details on what is equal to what in various utf8 collations.
utf8
, as defined by MySQL is limited to the 1- to 3-byte utf8 codes. This leaves out Emoji and some of Chinese. So you should really switch to utf8mb4
if you want to go much beyond Europe.
The above points apply to utf8mb4
, after suitable spelling change. Going forward, utf8mb4
and utf8mb4_unicode_520_ci
are preferred.