What is the best collation to use for MySQL with PHP?

Question

I'm wondering if there is a "best" choice for collation in MySQL for a general website where you aren't 100% sure of what will be entered. I understand that all the encodings should be the same, such as MySQL, Apache, the HTML and anything inside PHP.

In the past I have set PHP to output in "UTF-8", but which collation does this match in MySQL? I'm thinking it's one of the UTF-8 ones, but I have used utf8_unicode_ci, utf8_general_ci, and utf8_bin before.

User · Answer

In your database upload file, add the following line before any other line:

    SET NAMES utf8;

And your problem should be solved.

User · Answer

The accepted answer fairly definitively suggests using utf8_unicode_ci, and whilst for new projects that's great, I wanted to relate my recent contrary experience just in case it saves anyone some time.

Because utf8_general_ci is the default collation for Unicode in MySQL, if you want to use utf8_unicode_ci then you end up having to specify it in a lot of places.

For example, all client connections not only have a default charset (makes sense to me) but also a default collation (i.e. the collation will always default to utf8_general_ci for Unicode).

Likely, if you use utf8_unicode_ci for your fields, your scripts that connect to the database will need to be updated to mention the desired collation explicitly; otherwise queries using text strings can fail when your connection is using the default collation.

The upshot is that when converting an existing system of any size to Unicode/utf8, you may end up being forced to use utf8_general_ci because of the way MySQL handles defaults.
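
One way to deal with the connection default described above is to set the collation right after connecting, or to force it for a single comparison. This is only a minimal sketch: it assumes a server that still accepts the utf8 character set, and the `articles` table and its `title` column are hypothetical names made up for illustration.

```sql
-- Set the connection character set and collation in one statement.
SET NAMES 'utf8' COLLATE 'utf8_unicode_ci';

-- Confirm what the session is using now.
SELECT @@character_set_connection, @@collation_connection;

-- Hypothetical table whose column uses utf8_unicode_ci.
CREATE TEMPORARY TABLE articles (
  title VARCHAR(64)
) CHARACTER SET utf8 COLLATE utf8_unicode_ci;

INSERT INTO articles VALUES ('Resume'), ('Résumé');

-- A collation can also be forced for one comparison,
-- whatever the connection default happens to be.
SELECT title FROM articles WHERE title = 'resume' COLLATE utf8_unicode_ci;
```

Running SET NAMES once per connection is usually less work than adding COLLATE to individual queries, but the per-query form is handy when only a few statements are affected.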

User · Answer

For UTF-8 textual information, you should use utf8_general_ci because:

- utf8_bin: compares strings by the binary value of each character in the string
- utf8_general_ci: compares strings using general language rules and case-insensitive comparisons

In other words, it should make searching and indexing the data faster, more efficient and more useful.

User · Answer

The main difference is sorting accuracy (when comparing characters in the language) and performance. The only special one is utf8_bin, which is for comparing characters in binary format.

utf8_general_ci is somewhat faster than utf8_unicode_ci, but less accurate (for sorting). The language-specific utf8 collations (such as utf8_swedish_ci) contain additional language rules that make them the most accurate to sort for those languages. Most of the time I use utf8_unicode_ci (I prefer accuracy to small performance improvements), unless I have a good reason to prefer a specific language.

You can read more on specific Unicode character sets in the MySQL manual: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html
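
To see the sorting differences for yourself, the same rows can be ordered under several collations without altering the table. A minimal sketch, assuming a utf8 connection and a client that sends UTF-8; the `words` table is a hypothetical scratch table.

```sql
SET NAMES 'utf8';

CREATE TEMPORARY TABLE words (
  w VARCHAR(16)
) CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO words VALUES ('zebra'), ('Apple'), ('apple'), ('äpple');

-- Column default (utf8_bin): compares encoded values, so uppercase
-- ASCII letters sort before lowercase ones.
SELECT w FROM words ORDER BY w;

-- Same rows, sorted with other collations chosen per query.
SELECT w FROM words ORDER BY w COLLATE utf8_general_ci;
SELECT w FROM words ORDER BY w COLLATE utf8_unicode_ci;
SELECT w FROM words ORDER BY w COLLATE utf8_swedish_ci;
```

The Swedish collation is the interesting one here: Swedish treats "ä" as a separate letter that comes after "z", so 'äpple' should move to the end of the list instead of sitting next to 'apple'.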

User · Answer

Be very, very aware of this problem that can occur when using utf8_general_ci.

MySQL will not distinguish between some characters in SELECT statements if the utf8_general_ci collation is used. This can lead to very nasty bugs, especially where usernames are involved. Depending on the implementation that uses the database tables, this problem could allow malicious users to create a username matching an administrator account.

This problem exposes itself at the very least in early 5.x versions. I'm not sure if this behaviour has changed later.

I'm no DBA, but to avoid this problem, I always go with utf8_bin instead of a case-insensitive collation.

The script below describes the problem by example.

    -- first, create a sandbox to play in
    CREATE DATABASE `sandbox`;
    USE `sandbox`;

    -- next, make sure that your client connection is of the same
    -- character/collate type as the one we're going to test next:
    SET NAMES 'utf8' COLLATE 'utf8_general_ci';

    -- now, create the table and fill it with values
    CREATE TABLE `test` (`key` VARCHAR(16), `value` VARCHAR(16))
        CHARACTER SET utf8 COLLATE utf8_general_ci;

    INSERT INTO `test` VALUES ('Key ONE', 'value'), ('Key TWO', 'valúe');

    -- (verify)
    SELECT * FROM `test`;

    -- now, expose the problem/bug:
    SELECT * FROM `test` WHERE `value` = 'value';

    --
    -- Note that we get BOTH keys here! MySQL's utf8 collations that are
    -- case insensitive (ending with _ci) do not distinguish between
    -- both values!
    --
    -- collate 'utf8_bin' doesn't have this problem, as I'll show next:
    --

    -- first, reset the client connection charset/collate type
    SET NAMES 'utf8' COLLATE 'utf8_bin';

    -- next, convert the values that we've previously inserted in the table
    ALTER TABLE `test` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;

    -- now, re-check for the bug
    SELECT * FROM `test` WHERE `value` = 'value';

    --
    -- Note that we get just one key now, as you'd expect.
    --
    -- This problem appears to be specific to utf8. Next, I'll try to
    -- do the same with the 'latin1' charset:
    --

    -- first, reset the client connection charset/collate type
    SET NAMES 'latin1' COLLATE 'latin1_general_ci';

    -- next, convert the values that we've previously inserted
    -- in the table
    ALTER TABLE `test` CONVERT TO CHARACTER SET latin1 COLLATE latin1_general_ci;

    -- now, re-check for the bug
    SELECT * FROM `test` WHERE `value` = 'value';

    --
    -- Again, only one key is returned (expected). This shows
    -- that the problem with utf8/utf8_general_ci isn't present
    -- in latin1/latin1_general_ci.
    --
    -- To complete the example, I'll check with the binary collation
    -- of latin1 as well:

    -- first, reset the client connection charset/collate type
    SET NAMES 'latin1' COLLATE 'latin1_bin';

    -- next, convert the values that we've previously inserted in the table
    ALTER TABLE `test` CONVERT TO CHARACTER SET latin1 COLLATE latin1_bin;

    -- now, re-check for the bug
    SELECT * FROM `test` WHERE `value` = 'value';

    --
    -- Again, only one key is returned (expected).
    --
    -- Finally, I'll re-introduce the problem in the exact same
    -- way (for any sceptics out there):

    -- first, reset the client connection charset/collate type
    SET NAMES 'utf8' COLLATE 'utf8_general_ci';

    -- next, convert the values that we've previously inserted in the table
    ALTER TABLE `test` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

    -- now, re-check for the problem/bug
    SELECT * FROM `test` WHERE `value` = 'value';

    --
    -- Two keys.
    --

    DROP DATABASE `sandbox`;
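
For the username scenario specifically, the collation does not have to be the same for the whole table. A sketch of one possible layout, where only the login column is binary: the `accounts` table and its column names are hypothetical, and this is not a complete answer to lookalike usernames.

```sql
SET NAMES 'utf8';

-- Hypothetical accounts table: exact matching for logins,
-- language-aware matching for the display name.
CREATE TEMPORARY TABLE accounts (
  id INT AUTO_INCREMENT NOT NULL,
  username VARCHAR(64) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  display_name VARCHAR(64) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uniq_username (username)
) ENGINE = InnoDB DEFAULT CHARACTER SET utf8;

INSERT INTO accounts (username, display_name)
VALUES ('admin', 'Administrator'), ('admín', 'Someone else');

-- The column's utf8_bin collation applies to this comparison,
-- so only the row stored as exactly 'admin' should match.
SELECT id, username FROM accounts WHERE username = 'admin';
```

Note the trade-off: with utf8_bin the unique key treats 'admin', 'Admin' and 'admín' as three different usernames, so any rule against visually similar names has to be enforced by the application.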

User · Answer

Essentially, it depends on how you think of a string.

I always use utf8_bin because of the problem highlighted by Guus. In my opinion, as far as the database should be concerned, a string is still just a string. A string is a number of UTF-8 characters. A character has a binary representation, so why does it need to know the language you're using? Usually, people will be constructing databases for systems with the scope for multilingual sites. This is the whole point of using UTF-8 as a character set. I'm a bit of a purist, but I think the bug risks heavily outweigh the slight advantage you may get on indexing. Any language-related rules should be done at a much higher level than the DBMS.

In my books "value" should never in a million years be equal to "valúe".

If I want to store a text field and do a case-insensitive search, I will use MySQL string functions such as LOWER() together with the PHP function strtolower().
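
The LOWER() approach on a utf8_bin column might look like the sketch below. The `notes` table is a hypothetical example, and the last query is an alternative not mentioned above: overriding the collation for one comparison.

```sql
SET NAMES 'utf8';

CREATE TEMPORARY TABLE notes (
  body VARCHAR(64)
) CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO notes VALUES ('Hello World'), ('hello world'), ('HELLO');

-- Default behaviour on a utf8_bin column: exact match only.
SELECT body FROM notes WHERE body = 'hello world';

-- Case-insensitive search by lowercasing both sides.
SELECT body FROM notes WHERE LOWER(body) = LOWER('Hello World');

-- Alternative: apply a case-insensitive collation to this comparison only.
SELECT body FROM notes WHERE body COLLATE utf8_general_ci = 'hello world';
```

Both case-insensitive forms wrap or re-collate the column, so they generally cannot use an ordinary index on `body`. That is the indexing advantage being given up.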

User · Answer

Actually, you probably want to use utf8_unicode_ci or utf8_general_ci.

- utf8_general_ci sorts by stripping away all accents and sorting as if it were ASCII
- utf8_unicode_ci uses the Unicode sort order, so it sorts correctly in more languages

However, if you are only using this to store English text, these shouldn't differ.

User · Answer

Collations affect how data is sorted and how strings are compared to each other. That means you should use the collation that most of your users expect.

Example from the documentation for charset unicode:

"utf8_general_ci also is satisfactory for both German and French, except that 'ß' is equal to 's', and not to 'ss'. If this is acceptable for your application, then you should use utf8_general_ci because it is faster. Otherwise, use utf8_unicode_ci because it is more accurate."

So it depends on your expected user base and on how much you need correct sorting. For an English user base, utf8_general_ci should suffice; for other languages, like Swedish, special collations have been created.
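
The 'ß' case from the manual can be checked directly with literal comparisons. A minimal sketch, assuming a utf8 connection and a client that really sends UTF-8; the column aliases are just labels chosen for this example.

```sql
SET NAMES 'utf8';

-- COLLATE binds tighter than '=', so each comparison is evaluated
-- under the collation named on its right-hand operand.
SELECT
  'ß' = 's'  COLLATE utf8_general_ci AS general_eq_s,
  'ß' = 'ss' COLLATE utf8_general_ci AS general_eq_ss,
  'ß' = 's'  COLLATE utf8_unicode_ci AS unicode_eq_s,
  'ß' = 'ss' COLLATE utf8_unicode_ci AS unicode_eq_ss;
```

If the server behaves as the quoted passage describes, general_eq_s and unicode_eq_ss should come back as 1 and the other two as 0.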

User · Answer

I found these collation charts helpful: http://collation-charts.org/mysql60/. I'm not sure which chart is the one for utf8_general_ci, though.

For example, here is the chart for utf8_swedish_ci. It shows which characters it interprets as the same: http://collation-charts.org/mysql60/mysql604.utf8_swedish_ci.html

User · Answer

For the case highlighted by Guus, I would strongly suggest using utf8_unicode_cs (case sensitive, strict matching, ordering correctly for the most part) instead of utf8_bin (strict matching, incorrect ordering), provided your server offers a case-sensitive utf8 collation. Not every MySQL build ships one, so check SHOW COLLATION first.

If the field is intended to be searched, as opposed to matched for a user, then use utf8_general_ci or utf8_unicode_ci. Both are case-insensitive; one will loosely match ('ß' is equal to 's', and not to 'ss'). There are also language-specific versions, like utf8_german2_ci, where the loose matching is more suitable for the language specified.

[Edit - nearly 6 years later]

I no longer recommend the "utf8" character set on MySQL, and instead recommend the "utf8mb4" character set. They match almost entirely, but utf8mb4 allows for a little (lot) more Unicode characters.

Realistically, MySQL should have updated the "utf8" character set and respective collations to match the UTF-8 specification, but instead it added a separate character set and respective collations, so as not to impact storage designation for those already using the incomplete "utf8" character set.
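
Which collations exist depends on the server version, so it is worth listing them before settling on one. The sketch below also converts a table from utf8 to utf8mb4; `legacy_comments` is a hypothetical table created only for the demonstration.

```sql
-- List the collations this server offers for the utf8 family.
SHOW COLLATION WHERE Charset LIKE 'utf8%';

-- Narrow the list to case-sensitive or binary variants.
SHOW COLLATION
WHERE Charset LIKE 'utf8%'
  AND (Collation LIKE '%\_cs' OR Collation LIKE '%\_bin');

-- Hypothetical table created with the old character set...
CREATE TEMPORARY TABLE legacy_comments (
  body VARCHAR(100)
) CHARACTER SET utf8 COLLATE utf8_general_ci;

-- ...then converted, data included, to utf8mb4.
ALTER TABLE legacy_comments
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Check the result.
SHOW CREATE TABLE legacy_comments;
```

On a real table, CONVERT TO rewrites every text column, so it is safer to try it on a copy first. Indexed columns longer than 191 characters may also run into key-length limits.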

User · Answer

It is best to use character set utf8mb4 with the collation utf8mb4_unicode_ci.

The character set utf8 only supports a small share of UTF-8 code points, about 6% of possible characters. utf8 only supports the Basic Multilingual Plane (BMP). There are 16 other planes. Each plane contains 65,536 characters. utf8mb4 supports all 17 planes.

With the utf8 character set, MySQL will truncate 4-byte UTF-8 characters, resulting in corrupted data.

The utf8mb4 character set was introduced in MySQL 5.5.3 on 2010-03-24.

Some of the required changes to use the new character set are not trivial:

- Changes may need to be made in your application database adapter.
- Changes will need to be made to my.cnf, including setting the character set, the collation and switching innodb_file_format to Barracuda.
- SQL CREATE statements may need to include ROW_FORMAT=DYNAMIC. DYNAMIC is required for indexes on VARCHAR(192) and larger.

NOTE: Switching to Barracuda from Antelope may require restarting the MySQL service more than once. innodb_file_format_max does not change until after the MySQL service has been restarted with innodb_file_format = barracuda.

By default, MySQL uses the old Antelope InnoDB file format. Barracuda supports dynamic row formats, which you will need if you do not want to hit these SQL errors when creating indexes and keys after you switch to the utf8mb4 charset:

- #1709 - Index column size too large. The maximum column size is 767 bytes.
- #1071 - Specified key was too long; max key length is 767 bytes

The following scenario has been tested on MySQL 5.6.17. By default, MySQL is configured like this:

    SHOW VARIABLES;

    innodb_large_prefix = OFF
    innodb_file_format = Antelope

Stop your MySQL service and add the options to your existing my.cnf:

    [client]
    default-character-set = utf8mb4

    [mysqld]
    explicit_defaults_for_timestamp = true
    innodb_large_prefix = true
    innodb_file_format = barracuda
    innodb_file_format_max = barracuda
    innodb_file_per_table = true

    # Character collation
    character_set_server = utf8mb4
    collation_server = utf8mb4_unicode_ci

Example SQL CREATE statement:

    CREATE TABLE Contacts (
      id INT AUTO_INCREMENT NOT NULL,
      ownerId INT DEFAULT NULL,
      created timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
      modified timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      contact VARCHAR(640) NOT NULL,
      prefix VARCHAR(128) NOT NULL,
      first VARCHAR(128) NOT NULL,
      middle VARCHAR(128) NOT NULL,
      last VARCHAR(128) NOT NULL,
      suffix VARCHAR(128) NOT NULL,
      notes MEDIUMTEXT NOT NULL,
      INDEX IDX_CA367725E05EFD25 (ownerId),
      INDEX created (created),
      INDEX modified_idx (modified),
      INDEX contact_idx (contact),
      PRIMARY KEY (id)
    ) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci ENGINE = InnoDB ROW_FORMAT=DYNAMIC;

You can see error #1709 generated for INDEX contact_idx (contact) if ROW_FORMAT=DYNAMIC is removed from the CREATE statement.

NOTE: Changing the index to cover only the first 128 characters of contact eliminates the requirement for using Barracuda with ROW_FORMAT=DYNAMIC:

    INDEX contact_idx (contact(128)),

Also note: when it says the size of the field is VARCHAR(128), that is not 128 bytes. You can have 128 4-byte characters or 128 1-byte characters.

This INSERT statement stores the 4-byte "poo" character (U+1F4A9) in the last column of the second and third rows. The emoji runs are written with REPEAT() here; their lengths follow from the BIT_LENGTH output further down.

    INSERT INTO `Contacts` (`id`, `ownerId`, `created`, `modified`, `contact`, `prefix`, `first`, `middle`, `last`, `suffix`, `notes`) VALUES
    (1, NULL, '0000-00-00 00:00:00', '2014-08-25 03:00:36', '1234567890', '12345678901234567890', '1234567890123456789012345678901234567890', '1234567890123456789012345678901234567890', '12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678', '', ''),
    (2, NULL, '0000-00-00 00:00:00', '2014-08-25 03:05:57', 'poo', '12345678901234567890', '', '', REPEAT('💩', 128), '', ''),
    (3, NULL, '0000-00-00 00:00:00', '2014-08-25 03:05:57', 'poo', '12345678901234567890', '', '', CONCAT('123', REPEAT('💩', 125)), '', '');

You can see the amount of space used by the last column:

    mysql> SELECT BIT_LENGTH(`last`), CHAR_LENGTH(`last`) FROM `Contacts`;
    +--------------------+---------------------+
    | BIT_LENGTH(`last`) | CHAR_LENGTH(`last`) |
    +--------------------+---------------------+
    |               1024 |                 128 | -- All characters are ASCII
    |               4096 |                 128 | -- All characters are 4 bytes
    |               4024 |                 128 | -- 3 characters are ASCII, 125 are 4 bytes
    +--------------------+---------------------+

In your database adapter, you may want to set the charset and collation for your connection:

    SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci'

In PHP, this would be set for: \PDO::MYSQL_ATTR_INIT_COMMAND

References:

- MySQL 5.6 Reference Manual: Limits on InnoDB Tables
- How to support full Unicode in MySQL databases
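
After restarting, it may help to confirm that the settings took effect and to see where the VARCHAR(192) threshold comes from. This is a rough sketch rather than part of the tested scenario above: it only reads server variables and does arithmetic, and the column aliases are arbitrary labels.

```sql
-- Confirm the options from my.cnf are active. On newer servers the
-- innodb_file_format variables may no longer exist; they are then
-- simply missing from the result.
SHOW VARIABLES WHERE Variable_name IN
  ('character_set_server', 'collation_server',
   'innodb_file_format', 'innodb_large_prefix');

-- How many characters fit into a 767-byte index prefix at
-- 3 bytes per character (utf8) and 4 bytes per character (utf8mb4).
SELECT FLOOR(767 / 3) AS utf8_chars, FLOOR(767 / 4) AS utf8mb4_chars;

-- U+1F4A9 built from its UTF-8 bytes, so the result does not depend
-- on the client encoding: 4 bytes, 1 character.
SELECT LENGTH(c) AS bytes, CHAR_LENGTH(c) AS chars
FROM (SELECT CONVERT(UNHEX('F09F92A9') USING utf8mb4) AS c) AS t;
```

The second query gives 255 and 191. A utf8mb4 index on the old row format can therefore cover at most 191 characters, which is why VARCHAR(192) and larger need ROW_FORMAT=DYNAMIC or a prefix index.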
