[mysql] How to make MySQL handle UTF-8 properly

One of the responses to a question I asked yesterday suggested that I should make sure my database can handle UTF-8 characters correctly. How I can do this with MySQL?

This question is related to mysql utf-8

The answer is


To change the character set encoding to UTF-8 for the database itself, type the following command at the mysql> prompt. USE ALTER DATABASE.. Replace DBNAME with the database name:

ALTER DATABASE DBNAME CHARACTER SET utf8 COLLATE utf8_general_ci;

This is a duplicate of this question How to convert an entire MySQL database characterset and collation to UTF-8?


To make this 'permanent', in my.cnf:

[client]
default-character-set=utf8
[mysqld]
character-set-server = utf8

To check, go to the client and show some variables:

SHOW VARIABLES LIKE 'character_set%';

Verify that they're all utf8, except ..._filesystem, which should be binary and ..._dir, that points somewhere in the MySQL installation.


DATABASE CONNECTION TO UTF-8

$connect = mysql_connect('$localhost','$username','$password') or die(mysql_error());
mysql_set_charset('utf8',$connect);
mysql_select_db('$database_name','$connect') or die(mysql_error());

SET NAMES UTF8

This is does the trick


SET NAMES UTF8

This is does the trick


MySQL 4.1 and above has a default character set that it calls utf8 but which is actually only a subset of UTF-8 (allows only three-byte characters and smaller).

Use utf8mb4 as your charset if you want "full" UTF-8.


Was able to find a solution. Ran the following as specified at http://technoguider.com/2015/05/utf8-set-up-in-mysql/

SET NAMES UTF8;
set collation_server = utf8_general_ci;
set default-character-set = utf8;
set init_connect = ’SET NAMES utf8';
set character_set_server = utf8;
set character_set_client = utf8;

SET NAMES UTF8

This is does the trick


The short answer: Use utf8mb4 in 4 places:

  • The bytes in your client are utf8, not latin1/cp1251/etc.
  • SET NAMES utf8mb4 or something equivalent when establishing the client's connection to MySQL
  • CHARACTER SET utf8mb4 on all tables/columns -- except columns that are strictly ascii/hex/country_code/zip_code/etc.
  • <meta charset charset=UTF-8> if you are outputting to HTML. (Yes the spelling is different here.)

More info ;
UTF8 all the way

The above links provide the "detailed canonical answer is required to address all the concerns". -- There is a space limit on this forum.

Edit

In addition to CHARACTER SET utf8mb4 containing "all" the world's characters, COLLATION utf8mb4_unicode_520_ci is arguable the 'best all-around' collation to use. (There are also Turkish, Spanish, etc, collations for those who want the nuances in those languages.)


MySQL 4.1 and above has a default character set that it calls utf8 but which is actually only a subset of UTF-8 (allows only three-byte characters and smaller).

Use utf8mb4 as your charset if you want "full" UTF-8.


Set your database connection to UTF8:

  if($handle = @mysql_connect(DB_HOST, DB_USER, DB_PASS)){          
         //set to utf8 encoding
         mysql_set_charset('utf8',$handle);
  }

The charset is a property of the database (default) and the table. You can have a look (MySQL commands):

show create database foo; 
> CREATE DATABASE  `foo`.`foo` /*!40100 DEFAULT CHARACTER SET latin1 */

show create table foo.bar;
> lots of stuff ending with
> ) ENGINE=InnoDB AUTO_INCREMENT=252 DEFAULT CHARSET=latin1

In other words; it's quite easy to check your database charset or change it:

ALTER TABLE `foo`.`bar` CHARACTER SET utf8;

I followed Javier's solution, but I added some different lines in my.cnf:

[myslqd]
skip-character-set-client-handshake
collation_server=utf8_unicode_ci
character_set_server=utf8 

I found this idea here: http://dev.mysql.com/doc/refman/5.0/en/charset-server.html in the first/only user comment on the bottom of the page. He mentions that skip-character-set-client-handshake has some importance.


These tips on MySQL and UTF-8 may be helpful. Unfortunately, they don't constitute a full solution, just common gotchas.


The charset is a property of the database (default) and the table. You can have a look (MySQL commands):

show create database foo; 
> CREATE DATABASE  `foo`.`foo` /*!40100 DEFAULT CHARACTER SET latin1 */

show create table foo.bar;
> lots of stuff ending with
> ) ENGINE=InnoDB AUTO_INCREMENT=252 DEFAULT CHARSET=latin1

In other words; it's quite easy to check your database charset or change it:

ALTER TABLE `foo`.`bar` CHARACTER SET utf8;

The short answer: Use utf8mb4 in 4 places:

  • The bytes in your client are utf8, not latin1/cp1251/etc.
  • SET NAMES utf8mb4 or something equivalent when establishing the client's connection to MySQL
  • CHARACTER SET utf8mb4 on all tables/columns -- except columns that are strictly ascii/hex/country_code/zip_code/etc.
  • <meta charset charset=UTF-8> if you are outputting to HTML. (Yes the spelling is different here.)

More info ;
UTF8 all the way

The above links provide the "detailed canonical answer is required to address all the concerns". -- There is a space limit on this forum.

Edit

In addition to CHARACTER SET utf8mb4 containing "all" the world's characters, COLLATION utf8mb4_unicode_520_ci is arguable the 'best all-around' collation to use. (There are also Turkish, Spanish, etc, collations for those who want the nuances in those languages.)


SET NAMES UTF8

This is does the trick


To make this 'permanent', in my.cnf:

[client]
default-character-set=utf8
[mysqld]
character-set-server = utf8

To check, go to the client and show some variables:

SHOW VARIABLES LIKE 'character_set%';

Verify that they're all utf8, except ..._filesystem, which should be binary and ..._dir, that points somewhere in the MySQL installation.


Was able to find a solution. Ran the following as specified at http://technoguider.com/2015/05/utf8-set-up-in-mysql/

SET NAMES UTF8;
set collation_server = utf8_general_ci;
set default-character-set = utf8;
set init_connect = ’SET NAMES utf8';
set character_set_server = utf8;
set character_set_client = utf8;

Set your database collation to UTF-8 then apply table collation to database default.


To change the character set encoding to UTF-8 for the database itself, type the following command at the mysql> prompt. USE ALTER DATABASE.. Replace DBNAME with the database name:

ALTER DATABASE DBNAME CHARACTER SET utf8 COLLATE utf8_general_ci;

This is a duplicate of this question How to convert an entire MySQL database characterset and collation to UTF-8?


Set your database collation to UTF-8 then apply table collation to database default.


To make this 'permanent', in my.cnf:

[client]
default-character-set=utf8
[mysqld]
character-set-server = utf8

To check, go to the client and show some variables:

SHOW VARIABLES LIKE 'character_set%';

Verify that they're all utf8, except ..._filesystem, which should be binary and ..._dir, that points somewhere in the MySQL installation.


Your answer is you can configure by MySql Settings. In My Answer may be something gone out of context but this is also know is help for you.
how to configure Character Set and Collation.

For applications that store data using the default MySQL character set and collation (latin1, latin1_swedish_ci), no special configuration should be needed. If applications require data storage using a different character set or collation, you can configure character set information several ways:

  • Specify character settings per database. For example, applications that use one database might require utf8, whereas applications that use another database might require sjis.
  • Specify character settings at server startup. This causes the server to use the given settings for all applications that do not make other arrangements.
  • Specify character settings at configuration time, if you build MySQL from source. This causes the server to use the given settings for all applications, without having to specify them at server startup.

The examples shown here for your question to set utf8 character set , here also set collation for more helpful(utf8_general_ci collation`).

Specify character settings per database

  CREATE DATABASE new_db
  DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;

Specify character settings at server startup

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci

Specify character settings at MySQL configuration time

shell> cmake . -DDEFAULT_CHARSET=utf8 \
           -DDEFAULT_COLLATION=utf8_general_ci

To see the values of the character set and collation system variables that apply to your connection, use these statements:

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

This May be lengthy answer but there is all way, you can use. Hopeful my answer is helpful for you. for more information http://dev.mysql.com/doc/refman/5.7/en/charset-applications.html


Set your database connection to UTF8:

  if($handle = @mysql_connect(DB_HOST, DB_USER, DB_PASS)){          
         //set to utf8 encoding
         mysql_set_charset('utf8',$handle);
  }

To make this 'permanent', in my.cnf:

[client]
default-character-set=utf8
[mysqld]
character-set-server = utf8

To check, go to the client and show some variables:

SHOW VARIABLES LIKE 'character_set%';

Verify that they're all utf8, except ..._filesystem, which should be binary and ..._dir, that points somewhere in the MySQL installation.


These tips on MySQL and UTF-8 may be helpful. Unfortunately, they don't constitute a full solution, just common gotchas.


The charset is a property of the database (default) and the table. You can have a look (MySQL commands):

show create database foo; 
> CREATE DATABASE  `foo`.`foo` /*!40100 DEFAULT CHARACTER SET latin1 */

show create table foo.bar;
> lots of stuff ending with
> ) ENGINE=InnoDB AUTO_INCREMENT=252 DEFAULT CHARSET=latin1

In other words; it's quite easy to check your database charset or change it:

ALTER TABLE `foo`.`bar` CHARACTER SET utf8;

DATABASE CONNECTION TO UTF-8

$connect = mysql_connect('$localhost','$username','$password') or die(mysql_error());
mysql_set_charset('utf8',$connect);
mysql_select_db('$database_name','$connect') or die(mysql_error());

The charset is a property of the database (default) and the table. You can have a look (MySQL commands):

show create database foo; 
> CREATE DATABASE  `foo`.`foo` /*!40100 DEFAULT CHARACTER SET latin1 */

show create table foo.bar;
> lots of stuff ending with
> ) ENGINE=InnoDB AUTO_INCREMENT=252 DEFAULT CHARSET=latin1

In other words; it's quite easy to check your database charset or change it:

ALTER TABLE `foo`.`bar` CHARACTER SET utf8;

Your answer is you can configure by MySql Settings. In My Answer may be something gone out of context but this is also know is help for you.
how to configure Character Set and Collation.

For applications that store data using the default MySQL character set and collation (latin1, latin1_swedish_ci), no special configuration should be needed. If applications require data storage using a different character set or collation, you can configure character set information several ways:

  • Specify character settings per database. For example, applications that use one database might require utf8, whereas applications that use another database might require sjis.
  • Specify character settings at server startup. This causes the server to use the given settings for all applications that do not make other arrangements.
  • Specify character settings at configuration time, if you build MySQL from source. This causes the server to use the given settings for all applications, without having to specify them at server startup.

The examples shown here for your question to set utf8 character set , here also set collation for more helpful(utf8_general_ci collation`).

Specify character settings per database

  CREATE DATABASE new_db
  DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;

Specify character settings at server startup

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci

Specify character settings at MySQL configuration time

shell> cmake . -DDEFAULT_CHARSET=utf8 \
           -DDEFAULT_COLLATION=utf8_general_ci

To see the values of the character set and collation system variables that apply to your connection, use these statements:

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

This May be lengthy answer but there is all way, you can use. Hopeful my answer is helpful for you. for more information http://dev.mysql.com/doc/refman/5.7/en/charset-applications.html


Examples related to mysql

Implement specialization in ER diagram How to post query parameters with Axios? PHP with MySQL 8.0+ error: The server requested authentication method unknown to the client Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver' phpMyAdmin - Error > Incorrect format parameter? Authentication plugin 'caching_sha2_password' is not supported How to resolve Unable to load authentication plugin 'caching_sha2_password' issue Connection Java-MySql : Public Key Retrieval is not allowed How to grant all privileges to root user in MySQL 8.0 MySQL 8.0 - Client does not support authentication protocol requested by server; consider upgrading MySQL client

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8