How to change the CHARACTER SET and COLLATION throughout a database

Question

Our previous programmer set the wrong collation on a table in MySQL. He set it up with a Latin (latin1) collation when it should have been UTF-8, and now I have issues: every record with Chinese or Japanese characters has turned into ??? characters. Is it possible to change the collation and get the original characters back?

User · Accepted Answer

    -- change database collation
    ALTER DATABASE <database_name> CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

    -- change table collation
    ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

    -- change column collation
    ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

What do the parts of utf8mb4_0900_ai_ci mean?

    3 bytes      -- utf8
    4 bytes      -- utf8mb4 (new)
    v4.0         -- _unicode_
    v5.20        -- _unicode_520_
    v9.0         -- _0900_ (new)
    _bin         -- just compare the bits; don't consider case folding, accents, etc.
    _ci          -- explicitly case insensitive (A=a) and implicitly accent insensitive (a=á)
    _ai_ci       -- explicitly case insensitive and accent insensitive
    _as (etc)    -- accent-sensitive (etc.)
    _bin         -- simple, fast
    _general_ci  -- fails to compare multiple letters, e.g. ss=ß; somewhat fast
    ...          -- slower
    _0900_       -- (MySQL 8.0) much faster because of a rewrite

More info: "What's the difference between utf8_general_ci and utf8_unicode_ci?" and "How to change collation of database, table, column?"
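To make the suffixes above more concrete, here is a rough Python model of three comparison styles. It is only an approximation built on Python's `unicodedata` module and `str.casefold()`, not MySQL's actual UCA weight tables, and the function names (`eq_bin`, `eq_ai_ci`, `eq_general_ci_like`) are hypothetical.

```python
import unicodedata


def _strip_accents(s: str) -> str:
    # Decompose (NFD) so accents become separate combining marks, then drop them.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def eq_bin(a: str, b: str) -> bool:
    # _bin: compare the encoded bytes; no case folding, no accent folding.
    return a.encode("utf-8") == b.encode("utf-8")


def eq_ai_ci(a: str, b: str) -> bool:
    # _ai_ci (approximation): drop accents, then fold case.
    # casefold() also expands "ß" to "ss", so one letter can match two.
    return _strip_accents(a).casefold() == _strip_accents(b).casefold()


def eq_general_ci_like(a: str, b: str) -> bool:
    # Rough stand-in for _general_ci: compares one character at a time,
    # so it cannot match a single letter against two (ß vs ss).
    if len(a) != len(b):
        return False
    return all(
        _strip_accents(x).lower() == _strip_accents(y).lower()
        for x, y in zip(a, b)
    )


if __name__ == "__main__":
    for left, right in [("a", "A"), ("a", "á"), ("Straße", "STRASSE")]:
        print(left, right, eq_bin(left, right), eq_ai_ci(left, right),
              eq_general_ci_like(left, right))
```

The "Straße" / "STRASSE" pair shows the "fails to compare multiple letters" point: the per-character comparison gives up on the length mismatch, while the folding comparison treats ß and ss as equal.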

User · Answer

Here's how to change all databases/tables/columns. Run these queries and they will output all of the subsequent queries necessary to convert your entire schema to utf8. Hope this helps!

    -- Change DATABASE Default Collation
    SELECT DISTINCT concat('ALTER DATABASE `', TABLE_SCHEMA, '` CHARACTER SET utf8 COLLATE utf8_unicode_ci;')
    from information_schema.tables
    where TABLE_SCHEMA like 'database_name';

    -- Change TABLE Collation / Char Set
    SELECT concat('ALTER TABLE `', TABLE_SCHEMA, '`.`', table_name, '` CHARACTER SET utf8 COLLATE utf8_unicode_ci;')
    from information_schema.tables
    where TABLE_SCHEMA like 'database_name';

    -- Change COLUMN Collation / Char Set
    SELECT concat('ALTER TABLE `', t1.TABLE_SCHEMA, '`.`', t1.table_name, '` MODIFY `', t1.column_name, '` ', t1.data_type, '(', t1.CHARACTER_MAXIMUM_LENGTH, ') CHARACTER SET utf8 COLLATE utf8_unicode_ci;')
    from information_schema.columns t1
    where t1.TABLE_SCHEMA like 'database_name' and t1.COLLATION_NAME = 'old_charset_name';
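The third query builds one ALTER TABLE ... MODIFY statement per matching column. A minimal Python sketch of the same generation step, assuming the rows have already been fetched from information_schema.columns into a hypothetical `ColumnRow` tuple, could look like this. It swaps in COLUMN_TYPE (for example `varchar(255)` or `enum('a','b')`) instead of DATA_TYPE plus CHARACTER_MAXIMUM_LENGTH, because DATA_TYPE alone drops an ENUM's value list.

```python
from typing import List, NamedTuple, Optional


class ColumnRow(NamedTuple):
    # Hypothetical stand-in for one row of information_schema.columns.
    table_schema: str
    table_name: str
    column_name: str
    column_type: str               # full type, e.g. "varchar(255)"
    collation_name: Optional[str]  # None for non-character columns


def modify_statements(rows, old_collation: str, charset: str,
                      collation: str) -> List[str]:
    """Emit one ALTER TABLE ... MODIFY per column still using old_collation."""
    statements = []
    for r in rows:
        if r.collation_name != old_collation:
            continue  # same filter as "t1.COLLATION_NAME = 'old_charset_name'"
        statements.append(
            f"ALTER TABLE `{r.table_schema}`.`{r.table_name}` "
            f"MODIFY `{r.column_name}` {r.column_type} "
            f"CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


if __name__ == "__main__":
    rows = [
        ColumnRow("shop", "users", "name", "varchar(255)", "latin1_swedish_ci"),
        ColumnRow("shop", "users", "id", "int", None),
    ]
    for stmt in modify_statements(rows, "latin1_swedish_ci",
                                  "utf8mb4", "utf8mb4_unicode_ci"):
        print(stmt)
```

Like the original query, each generated MODIFY restates only the type, so attributes such as NOT NULL or DEFAULT would be lost when it runs; the longer query in a later answer re-adds them.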

User · Answer

Beware that in MySQL, the utf8 character set is only a subset of real UTF-8. To save one byte of storage, the MySQL team decided to store at most three bytes per UTF-8 character instead of the full four bytes. That means some East Asian characters and emoji aren't supported. To make sure you can store all UTF-8 characters, use the utf8mb4 character set, with utf8mb4_bin or utf8mb4_general_ci as the collation.
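A quick way to see the three-byte limit is to count how many bytes each character needs in UTF-8. The sketch below is plain Python; `fits_in_mysql_utf8` is a hypothetical helper that checks whether a string would fit in MySQL's three-byte utf8 (also called utf8mb3), assuming that limit is the only constraint.

```python
def fits_in_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes in UTF-8.

    Characters that need 4 bytes (emoji, rarer CJK ideographs outside the
    Basic Multilingual Plane) require utf8mb4 instead.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


if __name__ == "__main__":
    samples = [
        "a",            # ASCII
        "é",            # Latin-1 range
        "中",           # common CJK ideograph (BMP)
        "\U0001F600",   # emoji (outside the BMP)
        "\U00020000",   # CJK Extension B ideograph (outside the BMP)
    ]
    for s in samples:
        print(repr(s), len(s.encode("utf-8")), fits_in_mysql_utf8(s))
```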

User · Answer

Adding to what David Whittaker posted, I have created a query that generates the complete ALTER TABLE statement (table default plus all columns) that will convert each table. It may be a good idea to run

    SET SESSION group_concat_max_len = 100000;

first, to make sure your GROUP_CONCAT doesn't go over the very small default limit (1024 bytes).

    SELECT a.table_name, concat('ALTER TABLE ', a.table_schema, '.', a.table_name, ' DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci, ',
        group_concat(distinct(concat(' MODIFY ', column_name, ' ', column_type, ' CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci ', if(is_nullable = 'NO', ' NOT', ''), ' NULL ',
        if(COLUMN_DEFAULT is not null, CONCAT(' DEFAULT \'', COLUMN_DEFAULT, '\''), ''), if(EXTRA != '', CONCAT(' ', EXTRA), '')))), ';') as alter_statement
    FROM information_schema.columns a
    INNER JOIN INFORMATION_SCHEMA.TABLES b ON a.TABLE_CATALOG = b.TABLE_CATALOG
        AND a.TABLE_SCHEMA = b.TABLE_SCHEMA
        AND a.TABLE_NAME = b.TABLE_NAME
        AND b.table_type != 'view'
    -- ? is a bound parameter: the schema name
    WHERE a.table_schema = ? and (collation_name = 'latin1_swedish_ci' or collation_name = 'utf8mb4_general_ci')
    GROUP BY a.table_name;

One difference from the previous answer is that it used utf8 instead of utf8mb4, and building the type from t1.data_type with t1.CHARACTER_MAXIMUM_LENGTH didn't work for enums. Also, my query excludes views, since those have to be altered separately.

I simply used a Perl script to return all these ALTERs as an array, iterated over them, and fixed the columns that were too long (generally they were varchar(256) when the data only had about 20 characters in them, so that was an easy fix).

I found that some data was corrupted when altering from latin1 to utf8mb4: it appeared that UTF-8-encoded characters stored in latin1 columns got mangled in the conversion. I simply held the data from the columns I knew were going to be an issue in memory before and after the ALTER, compared them, and generated UPDATE statements to fix the data.
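One common way such data gets mangled is double encoding: UTF-8 bytes sitting in a latin1 column get converted as if each byte were its own latin1 character. Assuming that is what happened, a minimal Python sketch of undoing one layer of it looks like this. `repair_double_encoding` is a hypothetical helper, and it approximates MySQL's latin1 with Python's ISO-8859-1 codec, although MySQL's latin1 is actually closer to cp1252, so real data may need the `cp1252` codec instead.

```python
def repair_double_encoding(s: str) -> str:
    """Undo one round of 'UTF-8 bytes read as latin-1'.

    Re-encodes the string to its latin-1 bytes and decodes those bytes as
    UTF-8. If either step fails, the string was probably not double-encoded,
    so it is returned unchanged.
    """
    try:
        return s.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return s


if __name__ == "__main__":
    original = "café 中文"
    # Simulate the corruption: UTF-8 bytes misread as latin-1 characters.
    garbled = original.encode("utf-8").decode("latin-1")
    print(repr(garbled))
    print(repr(repair_double_encoding(garbled)))
```

Comparing the before and after values of each affected column, as described above, is still the safer check; this helper only guesses, and a genuinely latin1 value that happens to form valid UTF-8 bytes would be "repaired" wrongly.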

User · Answer

The process is described well here. However, some of the characters that didn't fit in latin1 are gone forever: UTF-8 is a SUPERSET of latin1, not the reverse. Most characters will fit in single-byte space, but any undefined ones will not (check a list of latin1 characters; not all 256 are defined, depending on MySQL's latin1 definition).
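The reason the lost characters cannot come back is that the substitution happens when the text is written. The Python sketch below roughly mimics what happens when text is converted into a latin1 column and characters with no latin1 equivalent are replaced by '?'. It uses Python's ISO-8859-1 codec with `errors="replace"` as an approximation of MySQL's own conversion, which may instead reject the value, depending on the SQL mode.

```python
original = "中文 café"

# Writing into a latin1 column: characters outside latin-1 become "?".
stored = original.encode("latin-1", errors="replace")

# Reading it back later (or converting the column to utf8mb4) only sees "?".
recovered = stored.decode("latin-1")

print(stored)     # the bytes actually kept
print(recovered)  # the Chinese characters are not recoverable from "?"
```

Converting the column to utf8mb4 afterwards only re-encodes the question marks; the original ideographs are not in the stored bytes at all, so they have to come from a backup or the original source.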
