utf8
is MySQL's older, flawed implementation of UTF-8 which is in the process of being deprecated.utf8mb4
is what they named their fixed UTF-8 implementation, and is what you should use right now.In their flawed version, only characters in the first 64k character plane - the basic multilingual plane - work, with other characters considered invalid. The code point values within that plane - 0 to 65535 (some of which are reserved for special reasons) can be represented by multi-byte encodings in UTF-8 of up to 3 bytes, and MySQL's early version of UTF-8 arbitrarily decided to set that as a limit. At no point was this limitation a correct interpretation of the UTF-8 rules, because at no point was UTF-8 defined as only allowing up to 3 bytes per character. In fact, the earliest definitions of UTF-8 defined it as having up to 6 bytes (since revised to 4). MySQL's original version was always arbitrarily crippled.
Back when MySQL released this, the consequences of this limitation weren't too bad as most Unicode characters were in that first plane. Since then, more and more newly defined character ranges have been added to Unicode with values outside that first plane. Unicode itself defines 17 planes, though so far only 7 of these are used.
In an effort not to break old code making any particular assumptions, MySQL retained the broken implementation and called the newer, fixed version utf8mb4
. This has led to some confusion with the name being misinterpreted as if it's some kind of extension to UTF-8 or alternative form of UTF-8, rather than MySQL's implementation of the true UTF-8.
Future versions of MySQL will eventually phase out the older version, and for now it can be considered deprecated. For the foreseeable future you need to use utf8mb4
to ensure correct UTF-8 encoding. After sufficient time has passed, the current utf8
will be removed, and at some future date utf8
will rise again, this time referring to the fixed version, though utf8mb4
will continue to unambiguously refer to the fixed version.