How to Use UTF-8 Collation in SQL Server database

Question

I've migrated a database from MySQL to SQL Server (politics); the original MySQL database used UTF8. Now I read at https://dba.stackexchange.com/questions/7346/sql-server-2005-2008-utf-8-collation-charset that SQL Server 2008 doesn't support utf8. Is this a joke?

The SQL Server hosts multiple databases, mostly Latin-encoded. Since the migrated db is intended for web publishing, I want to keep the utf8 encoding. Have I missed something, or do I need to enc/dec at application level?

User · Answer

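Two UDFs to deal with UTF-8 in T-SQL:

    -- Encode an nvarchar (UCS-2) string as UTF-8 bytes held in a varchar.
    CREATE Function UcsToUtf8(@src nvarchar(MAX)) returns varchar(MAX) as
    begin
        declare @res varchar(MAX)='',
                @pi char(8)='%[^'+char(0)+'-'+char(127)+']%',  -- first non-ASCII character
                @i int, @j int
        select @i=patindex(@pi,@src collate Latin1_General_BIN)
        while @i>0
        begin
            select @j=unicode(substring(@src,@i,1))
            if @j<0x800   -- code points below U+0800: two bytes
                select @res=@res+left(@src,@i-1)+char((@j&1984)/64+192)+char((@j&63)+128)
            else          -- U+0800 to U+FFFF: three bytes
                select @res=@res+left(@src,@i-1)+char((@j&61440)/4096+224)+char((@j&4032)/64+128)+char((@j&63)+128)
            select @src=substring(@src,@i+1,datalength(@src)-1),
                   @i=patindex(@pi,@src collate Latin1_General_BIN)
        end
        select @res=@res+@src
        return @res
    end
    GO

    -- Decode UTF-8 bytes held in a varchar back to nvarchar (UCS-2).
    CREATE Function Utf8ToUcs(@src varchar(MAX)) returns nvarchar(MAX) as
    begin
        declare @i int, @res nvarchar(MAX)=@src, @pi varchar(18)
        -- Three-byte sequences: lead byte 0xE0-0xEF, then two continuation bytes 0x80-0xBF
        select @pi='%[à-ï][€-¿][€-¿]%', @i=patindex(@pi,@src collate Latin1_General_BIN)
        while @i>0
            select @res=stuff(@res,@i,3,nchar(((ascii(substring(@src,@i,1))&31)*4096)+((ascii(substring(@src,@i+1,1))&63)*64)+(ascii(substring(@src,@i+2,1))&63))),
                   @src=stuff(@src,@i,3,'.'),   -- one-character placeholder keeps positions aligned with @res
                   @i=patindex(@pi,@src collate Latin1_General_BIN)
        -- Two-byte sequences: lead byte 0xC2-0xDF, then one continuation byte 0x80-0xBF
        select @pi='%[Â-ß][€-¿]%', @i=patindex(@pi,@src collate Latin1_General_BIN)
        while @i>0
            select @res=stuff(@res,@i,2,nchar(((ascii(substring(@src,@i,1))&31)*64)+(ascii(substring(@src,@i+1,1))&63))),
                   @src=stuff(@src,@i,2,'.'),
                   @i=patindex(@pi,@src collate Latin1_General_BIN)
        return @res
    end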
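A minimal usage sketch for the two functions, assuming they were created in the dbo schema of a database whose default collation uses a single-byte code page such as 1252. The sample string and variable names are illustrative only and not part of the original answer:

```sql
-- Hypothetical sample text; the functions only cover code points up to U+FFFF.
DECLARE @text nvarchar(MAX) = N'Grüße';

-- Encode to a varchar whose bytes are the UTF-8 representation.
DECLARE @utf8 varchar(MAX) = dbo.UcsToUtf8(@text);

SELECT
    DATALENGTH(@text)             AS utf16_bytes,  -- size of the nvarchar value
    DATALENGTH(@utf8)             AS utf8_bytes,   -- size after encoding
    CAST(@utf8 AS varbinary(MAX)) AS utf8_raw,     -- inspect the encoded bytes
    dbo.Utf8ToUcs(@utf8)          AS round_trip;   -- decode again for comparison
```

Comparing round_trip with the original text is a quick way to check that the database code page suits these functions before relying on them.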

User · Answer

Looks like this will finally be supported in SQL Server 2019! See "SQL Server 2019 - what's new?"

From BOL:

"UTF-8 support

Full support for the widely used UTF-8 character encoding as an import or export encoding, or as database-level or column-level collation for text data. UTF-8 is allowed in the CHAR and VARCHAR datatypes, and is enabled when creating or changing an object's collation to a collation with the UTF8 suffix.

For example, LATIN1_GENERAL_100_CI_AS_SC to LATIN1_GENERAL_100_CI_AS_SC_UTF8. UTF-8 is only available to Windows collations that support supplementary characters, as introduced in SQL Server 2012. NCHAR and NVARCHAR allow UTF-16 encoding only, and remain unchanged.

This feature may provide significant storage savings, depending on the character set in use. For example, changing an existing column data type with ASCII strings from NCHAR(10) to CHAR(10) using an UTF-8 enabled collation, translates into nearly 50% reduction in storage requirements. This reduction is because NCHAR(10) requires 22 bytes for storage, whereas CHAR(10) requires 12 bytes for the same Unicode string."

2019-05-14 update: The documentation seems to be updated now and explains our options starting in MSSQL 2019 in the section "Collation and Unicode Support".

2019-07-24 update: Article by Pedro Lopes, Senior Program Manager @ Microsoft, about introducing UTF-8 support for Azure SQL Database.
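As a rough sketch of what the quoted feature looks like in practice, assuming SQL Server 2019 or later, a column can be given a collation with the UTF8 suffix. The table and column names here are hypothetical:

```sql
-- Hypothetical table; only the COLLATE clause is the point of the example.
CREATE TABLE dbo.WebArticle (
    Id    int IDENTITY(1,1) PRIMARY KEY,
    -- For a UTF-8 collation, varchar(200) means 200 bytes, not 200 characters.
    Title varchar(200) COLLATE Latin1_General_100_CI_AS_SC_UTF8 NOT NULL
);

INSERT INTO dbo.WebArticle (Title) VALUES (N'Grüße');

-- Compare the stored UTF-8 size with the size of the same text as UTF-16.
SELECT
    Title,
    DATALENGTH(Title)                        AS utf8_bytes,
    DATALENGTH(CAST(Title AS nvarchar(200))) AS utf16_bytes
FROM dbo.WebArticle;
```

For this sample the UTF-8 value takes 7 bytes (two of the five characters need two bytes each) against 10 bytes as UTF-16. The saving shrinks or reverses for scripts whose characters need three bytes in UTF-8.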

User · Answer

UTF-8 is not a character set, it's an encoding. The character set for UTF-8 is Unicode. If you want to store Unicode text, you use the nvarchar data type.

If the database used UTF-8 to store text, you would still not get the text out as encoded UTF-8 data; you would get it out as decoded text.

You can easily store UTF-8 encoded text in the database, but then you don't store it as text, you store it as binary data (varbinary).
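To illustrate the difference between the two storage options, here is a small sketch with hypothetical table and column names. The hex literal is the UTF-8 encoding of the sample text, which in a real system the application would produce:

```sql
-- Hypothetical table holding the same content both ways.
CREATE TABLE dbo.PageText (
    Id       int IDENTITY(1,1) PRIMARY KEY,
    BodyText nvarchar(MAX)  NOT NULL,  -- Unicode text the server can compare and sort
    BodyUtf8 varbinary(MAX) NULL       -- opaque bytes; the server does not interpret them
);

-- 0x47 72 C3BC C39F 65 is 'Grüße' encoded as UTF-8.
INSERT INTO dbo.PageText (BodyText, BodyUtf8)
VALUES (N'Grüße', 0x4772C3BCC39F65);

SELECT
    BodyText,
    DATALENGTH(BodyText) AS text_bytes,
    BodyUtf8,
    DATALENGTH(BodyUtf8) AS utf8_bytes
FROM dbo.PageText;
```

The varbinary column comes back byte for byte as it was written, so the application has to decode it, and text operations such as LIKE or ORDER BY on it do not behave as they would on text.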

User · Answer

No! It's not a joke.

Take a look here: http://msdn.microsoft.com/en-us/library/ms186939.aspx

"Character data types that are either fixed-length, nchar, or variable-length, nvarchar, Unicode data and use the UNICODE UCS-2 character set."

And also here: http://en.wikipedia.org/wiki/UTF-16

"The older UCS-2 (2-byte Universal Character Set) is a similar character encoding that was superseded by UTF-16 in version 2.0 of the Unicode standard in July 1996."
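One way to see the UCS-2 versus UTF-16 distinction from T-SQL is to measure a character outside the Basic Multilingual Plane under two collations. This is only a sketch and assumes SQL Server 2012 or later, where the supplementary-character (_SC) collations exist:

```sql
-- U+1F600 is stored as a surrogate pair (two 16-bit code units) in nvarchar.
SELECT
    LEN(N'😀' COLLATE Latin1_General_100_CI_AS)    AS ucs2_style_length,   -- counts code units
    LEN(N'😀' COLLATE Latin1_General_100_CI_AS_SC) AS utf16_aware_length,  -- counts characters
    DATALENGTH(N'😀')                              AS bytes_stored;
```

The first column should come back as 2 and the second as 1, while the stored size is 4 bytes in both cases. The storage is the same; only the interpretation of the surrogate pair differs.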

User · Answer

Note that as of Microsoft SQL Server 2016, UTF-8 is supported by bcp, BULK INSERT, and OPENROWSET.

Addendum 2016-12-21: SQL Server 2016 SP1 now enables Unicode Compression (and most other previously Enterprise-only features) for all versions of MS SQL, including Standard and Express. This is not the same as UTF-8 support, but it yields a similar benefit if the goal is disk space reduction for Western alphabets.
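A hedged sketch of both points, assuming SQL Server 2016 SP1 or later. The table name, file path, and file layout are hypothetical, and the target table is assumed to use nvarchar columns:

```sql
-- Load a UTF-8 encoded CSV file; code page 65001 is UTF-8.
BULK INSERT dbo.ImportedArticles
FROM 'C:\data\articles_utf8.csv'        -- hypothetical path
WITH (
    CODEPAGE        = '65001',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2                 -- assumes the file has a header row
);

-- Unicode compression is applied as part of row (or page) compression
-- to nchar and nvarchar(n) columns.
ALTER TABLE dbo.ImportedArticles
REBUILD WITH (DATA_COMPRESSION = ROW);
```

With this combination the data is still stored as nvarchar, so nothing changes for queries, while the import accepts UTF-8 files and the on-disk size of mostly Western text drops.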
