What is the difference between utf8mb4 and utf8 charsets in MySQL

Question

What is the difference between utf8mb4 and utf8 charsets in MySQL   I already know about ASCII  UTF-8  UTF-16 and UTF-32 encodings  but I m curious to know whats the difference of utf8mb4 group of encodings with other encoding types defined in MySQL Server   Are there any special benefits proposes of using utf8mb4 rather than utf8

User · Accepted Answer

UTF-8 is a variable-length encoding  In the case of UTF-8  this means that storing one code point requires one to four bytes  However  MySQL s encoding called  quot utf8 quot   alias of  quot utf8mb3 quot   only stores a maximum of three bytes per code point  So the character set  quot utf8 quot   quot utf8mb3 quot  cannot store all Unicode code points  it only supports the range 0x000 to 0xFFFF  which is called the  quot Basic Multilingual Plane quot   See also Comparison of Unicode encodings  This is what  a previous version of the same page at  the MySQL documentation has to say about it   The character set named utf8  utf8mb3  uses a maximum of three bytes per character and contains only BMP characters  As of MySQL 5 5 3  the utf8mb4 character set uses a maximum of four bytes per character supports supplemental characters   For a BMP character  utf8  utf8mb3  and utf8mb4 have identical storage characteristics  same code values  same encoding  same length   For a supplementary character  utf8  utf8mb3  cannot store the character at all  while utf8mb4 requires four bytes to store it  Since utf8  utf8mb3  cannot store the character at all  you do not have any supplementary characters in utf8  utf8mb3  columns and you need not worry about converting characters or losing data when upgrading utf8  utf8mb3  data from older versions of MySQL     So if you want your column to support storing characters lying outside the BMP  and you usually want to   such as emoji  use  quot utf8mb4 quot   See also What are the most common non-BMP Unicode characters in actual use

User · Answer

The utf8mb4 character set is useful because nowadays we need support for storing not only language characters but also symbols  newly introduced emojis  and so on   A nice read on How to support full Unicode in MySQL databases by Mathias Bynens can also shed some light on this

User · Answer

MySQL added this utf8mb4 code after 5 5 3  Mb4 is the most bytes 4 meaning  specifically designed to be compatible with four-byte Unicode  Fortunately  UTF8MB4 is a superset of UTF8  except that there is no need to convert the encoding to UTF8MB4  Of course  in order to save space  the general use of UTF8 is enough  The original UTF-8 format uses one to six bytes and can encode 31 characters maximum  The latest UTF-8 specification uses only one to four bytes and can encode up to 21 bits  just to represent all 17 Unicode planes  UTF8 is a character set in Mysql that supports only a maximum of three bytes of UTF-8 characters  which is the basic multi-text plane in Unicode  To save 4-byte-long UTF-8 characters in Mysql  you need to use the UTF8MB4 character set  but only 5 5  After 3 versions are supported  View version  Select version       I think that in order to get better compatibility  you should always use UTF8MB4 instead of UTF8  For char type data  UTF8MB4 consumes more space and  according to Mysql s official recommendation  uses VARCHAR instead of char  In MariaDB utf8mb4 as the default CHARSET when it not set explicitly in the server config  hence COLLATE utf8mb4 unicode ci is used  Refer MariaDB CHARSET  amp  COLLATE Click CHARSET utf8mb4 COLLATE utf8mb4 unicode ci

User · Answer

Taken from the MySQL 8 0 Reference Manual         utf8mb4  A UTF-8 encoding of the Unicode character set using one to   four bytes per character    utf8mb3  A UTF-8 encoding of the Unicode character set using one to   three bytes per character       In MySQL utf8 is currently an alias for utf8mb3 which is deprecated and will be removed in a future MySQL release  At that point utf8 will become a reference to utf8mb4   So regardless of this alias  you can consciously set yourself an utf8mb4 encoding   To complete the answer  I d like to add the  WilliamEntriken s comment below  also taken from the manual       To avoid ambiguity about the meaning of utf8  consider specifying utf8mb4 explicitly for character set references instead of utf8

User · Answer

utf8 is MySQL s older  flawed implementation of UTF-8 which is in the process of being deprecated  utf8mb4 is what they named their fixed UTF-8 implementation  and is what you should use right now   In their flawed version  only characters in the first 64k character plane - the basic multilingual plane - work  with other characters considered invalid  The code point values within that plane - 0 to 65535  some of which are reserved for special reasons  can be represented by multi-byte encodings in UTF-8 of up to 3 bytes  and MySQL s early version of UTF-8 arbitrarily decided to set that as a limit   At no point was this limitation a correct interpretation of the UTF-8 rules  because at no point was UTF-8 defined as only allowing up to 3 bytes per character   In fact  the earliest definitions of UTF-8 defined it as having up to 6 bytes  since revised to 4    MySQL s original version was always arbitrarily crippled  Back when MySQL released this  the consequences of this limitation weren t too bad as most Unicode characters were in that first plane  Since then  more and more newly defined character ranges have been added to Unicode with values outside that first plane  Unicode itself defines 17 planes  though so far only 7 of these are used  In an effort not to break old code making any particular assumptions  MySQL retained the broken implementation and called the newer  fixed version utf8mb4  This has led to some confusion with the name being misinterpreted as if it s some kind of extension to UTF-8 or alternative form of UTF-8  rather than MySQL s implementation of the true UTF-8  Future versions of MySQL will eventually phase out the older version  and for now it can be considered deprecated   For the foreseeable future you need to use utf8mb4 to ensure correct UTF-8 encoding   After sufficient time has passed  the current utf8 will be removed  and at some future date utf8 will rise again  this time referring to the fixed version  though utf8mb4 will continue to unambiguously refer to the fixed version

[mysql] What is the difference between utf8mb4 and utf8 charsets in MySQL?

Examples related to mysql

Examples related to encoding

Examples related to utf-8

Examples related to character-encoding

Examples related to utf8mb4