What does 'COLLATE SQL_Latin1_General_CP1_CI_AS' do?

Question

I have an SQL query to create the database in SQL Server, as given below:

    create database yourdb
    on
    ( name = 'yourdb_dat',
      filename = 'c:\program files\microsoft sql server\mssql.1\mssql\data\yourdbdat.mdf',
      size = 25mb,
      maxsize = 1500mb,
      filegrowth = 10mb )
    log on
    ( name = 'yourdb_log',
      filename = 'c:\program files\microsoft sql server\mssql.1\mssql\data\yourdblog.ldf',
      size = 7mb,
      maxsize = 375mb,
      filegrowth = 10mb )
    COLLATE SQL_Latin1_General_CP1_CI_AS;
    go

It runs fine. While the rest of the SQL is clear to me, I am quite confused about the functionality of COLLATE SQL_Latin1_General_CP1_CI_AS.

Can anyone explain this to me? Also, I would like to know if creating the database in this way is a best practice.

User · Answer

The COLLATE keyword specifies what kind of character set and rules (ordering and comparison rules) you are using for string values.

For example, in your case you are using Latin rules with case-insensitive (CI) and accent-sensitive (AS) comparisons.

You can refer to the documentation.

User · Answer

Please be aware that the accepted answer is a bit incomplete. Yes, at the most basic level Collation handles sorting. BUT, the comparison rules defined by the chosen Collation are used in many places outside of user queries against user data.

If "What does COLLATE SQL_Latin1_General_CP1_CI_AS do?" means "What does the COLLATE clause of CREATE DATABASE do?", then:

The COLLATE {collation_name} clause of the CREATE DATABASE statement specifies the default Collation of the Database, and not the Server; Database-level and Server-level default Collations control different things.

Server (i.e. Instance)-level controls:

- Database-level Collation for system Databases: master, model, msdb, and tempdb.
- Due to controlling the DB-level Collation of tempdb, it is then the default Collation for string columns in temporary tables (global and local), but not table variables.
- Due to controlling the DB-level Collation of master, it is then the Collation used for Server-level data, such as Database names (i.e. name column in sys.databases), Login names, etc.
- Handling of parameter / variable names
- Handling of cursor names
- Handling of GOTO labels
- Default Collation used for newly created Databases when the COLLATE clause is missing

Database-level controls:

- Default Collation used for newly created string columns (CHAR, VARCHAR, NCHAR, NVARCHAR, TEXT, and NTEXT; but don't use TEXT or NTEXT) when the COLLATE clause is missing from the column definition. This goes for both CREATE TABLE and ALTER TABLE ... ADD statements.
- Default Collation used for string literals (i.e. 'some text') and string variables (i.e. @StringVariable). This Collation is only ever used when comparing strings and variables to other strings and variables. When comparing strings / variables to columns, then the Collation of the column will be used.
- The Collation used for Database-level meta-data, such as object names (i.e. sys.objects), column names (i.e. sys.columns), index names (i.e. sys.indexes), etc.
- The Collation used for Database-level objects: tables, columns, indexes, etc.

Also:

- ASCII is an encoding which is 8-bit (for common usage; technically "ASCII" is 7-bit with character values 0 - 127, and "ASCII Extended" is 8-bit with character values 0 - 255). This group is the same across cultures.
- The Code Page is the "extended" part of Extended ASCII, and controls which characters are used for values 128 - 255. This group varies between each culture.
- Latin1 does not mean "ASCII" since standard ASCII only covers values 0 - 127, and all code pages (that can be represented in SQL Server, and even NVARCHAR) map those same 128 values to the same characters.

If "What does COLLATE SQL_Latin1_General_CP1_CI_AS do?" means "What does this particular collation do?", then:

- Because the name starts with SQL_, this is a SQL Server collation, not a Windows collation. These are definitely obsolete, even if not officially deprecated, and are mainly for pre-SQL Server 2000 compatibility. Although, quite unfortunately, SQL_Latin1_General_CP1_CI_AS is very common due to it being the default when installing on an OS using US English as its language. These collations should be avoided if at all possible.

  Windows collations (those with names not starting with SQL_) are newer, more functional, have consistent sorting between VARCHAR and NVARCHAR for the same values, and are being updated with additional / corrected sort weights and uppercase/lowercase mappings. These collations also don't have the potential performance problem that the SQL Server collations have: "Impact on Indexes When Mixing VARCHAR and NVARCHAR Types".

- Latin1_General is the culture / locale.
  - For NCHAR, NVARCHAR, and NTEXT data this determines the linguistic rules used for sorting and comparison.
  - For CHAR, VARCHAR, and TEXT data (columns, literals, and variables) this determines the:
    - linguistic rules used for sorting and comparison.
    - code page used to encode the characters. For example, Latin1_General collations use code page 1252, Hebrew collations use code page 1255, and so on.

- CP{code_page} or {version}
  - For SQL Server collations: CP{code_page} is the 8-bit code page that determines what characters map to values 128 - 255. While there are four code pages for Double-Byte Character Sets (DBCS) that can use 2-byte combinations to create more than 256 characters, these are not available for the SQL Server collations.
  - For Windows collations: {version}, while not present in all collation names, refers to the SQL Server version in which the collation was introduced (for the most part). Windows collations with no version number in the name are version 80 (meaning SQL Server 2000, as that is version 8.0). Not all versions of SQL Server come with new collations, so there are gaps in the version numbers. There are some that are 90 (for SQL Server 2005, which is version 9.0), most are 100 (for SQL Server 2008, version 10.0), and a small set has 140 (for SQL Server 2017, version 14.0).

    I said "for the most part" because the collations ending in _SC were introduced in SQL Server 2012 (version 11.0), but the underlying data wasn't new; they merely added support for supplementary characters for the built-in functions. So, those endings exist for version 90 and 100 collations, but only starting in SQL Server 2012.

- Next you have the sensitivities, which can be in any combination of the following, but always specified in this order:
  - CS = case-sensitive or CI = case-insensitive
  - AS = accent-sensitive or AI = accent-insensitive
  - KS = Kana type-sensitive or missing = Kana type-insensitive
  - WS = width-sensitive or missing = width-insensitive
  - VSS = variation selector sensitive (only available in the version 140 collations) or missing = variation selector insensitive

- Optional last piece:
  - _SC at the end means "Supplementary Character support". The "support" only affects how the built-in functions interpret surrogate pairs (which are how supplementary characters are encoded in UTF-16). Without _SC at the end (or _140_ in the middle), built-in functions don't see a single supplementary character, but instead see two meaningless code points that make up the surrogate pair. This ending can be added to any non-binary, version 90 or 100 collation.
  - _BIN or _BIN2 at the end means "binary" sorting and comparison. Data is still stored the same, but there are no linguistic rules. This ending is never combined with any of the 5 sensitivities or _SC. _BIN is the older style, and _BIN2 is the newer, more accurate style. If using SQL Server 2005 or newer, use _BIN2. For details on the differences between _BIN and _BIN2, please see: "Differences Between the Various Binary Collations (Cultures, Versions, and BIN vs BIN2)".
  - _UTF8 is a new option as of SQL Server 2019. It's an 8-bit encoding that allows for Unicode data to be stored in VARCHAR and CHAR datatypes (but not the deprecated TEXT datatype). This option can only be used on collations that support supplementary characters (i.e. version 90 or 100 collations with _SC in their name, and version 140 collations). There is also a single binary _UTF8 collation (_BIN2, not _BIN).

    PLEASE NOTE: UTF-8 was designed / created for compatibility with environments / code that are set up for 8-bit encodings yet want to support Unicode. Even though there are a few scenarios where UTF-8 can provide up to 50% space savings as compared to NVARCHAR, that is a side-effect and has a cost of a slight hit to performance in many / most operations. If you need this for compatibility, then the cost is acceptable. If you want this for space-savings, you had better test, and TEST AGAIN. Testing includes all functionality, and more than just a few rows of data. Be warned that UTF-8 collations work best when ALL columns, and the database itself, are using VARCHAR data (columns, variables, string literals) with a _UTF8 collation. This is the natural state for anyone using this for compatibility, but not for those hoping to use it for space-savings. Be careful when mixing VARCHAR data using a _UTF8 collation with either VARCHAR data using non-_UTF8 collations or NVARCHAR data, as you might experience odd behavior / data loss. For more details on the new UTF-8 collations, please see: "Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?"
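
As a rough illustration of the server-level versus database-level distinction in this answer, the following sketch reads the default collation at each level. It only uses built-in functions and catalog views (SERVERPROPERTY, DATABASEPROPERTYEX, sys.databases); the column aliases are made up for readability, and the values returned depend entirely on how the instance and databases were created.

```sql
-- Instance-level (server) default collation.
SELECT SERVERPROPERTY('Collation') AS ServerCollation;

-- Database-level default collation of the current database and of tempdb.
-- tempdb normally follows the instance-level collation, which is why
-- temporary tables can behave differently from tables in a user database.
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation')  AS CurrentDbCollation,
       DATABASEPROPERTYEX(N'tempdb', 'Collation')  AS TempdbCollation;

-- Default collation of every database on the instance.
SELECT name, collation_name
FROM sys.databases
ORDER BY name;
```

If the current database was created with an explicit COLLATE clause, its value can differ from both the server and tempdb values shown by the first two queries.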

User · Answer

The CP1 means "Code Page 1"; technically this translates to code page 1252.
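
One way to check the code page behind a collation name, rather than inferring it from the name, is the built-in COLLATIONPROPERTY function. This is a small sketch; the Hebrew_CI_AS collation is included only as a contrasting example and the column aliases are illustrative.

```sql
-- Code page used for CHAR / VARCHAR data under each collation.
SELECT COLLATIONPROPERTY('SQL_Latin1_General_CP1_CI_AS', 'CodePage') AS SqlLatin1CodePage,
       COLLATIONPROPERTY('Hebrew_CI_AS', 'CodePage')                 AS HebrewCodePage;

-- List the SQL Server collations (names starting with SQL_) together with
-- the descriptions that SQL Server itself reports for them.
SELECT name, description
FROM sys.fn_helpcollations()
WHERE name LIKE N'SQL[_]%';
```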

User · Answer

It sets how the database server sorts and compares pieces of text. In this case, SQL_Latin1_General_CP1_CI_AS breaks up into interesting parts:

- Latin1 makes the server treat strings using charset Latin 1, roughly ASCII plus Western European characters
- CP1 stands for Code Page 1252
- CI: case-insensitive comparisons, so 'ABC' would equal 'abc'
- AS: accent-sensitive, so 'ü' does not equal 'u'

P.S. For more detailed information be sure to read @solomon-rutzky's answer.
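
The case and accent behavior described above can be tried directly by applying a COLLATE clause to a comparison. This is a minimal sketch: the column aliases are made up, N'' literals are used so the 'ü' character does not depend on the current database's code page, and SQL_Latin1_General_CP1_CS_AS is added only to contrast with the case-insensitive variant.

```sql
SELECT
    -- CI: case is ignored, so this comparison should report 'equal'.
    CASE WHEN N'ABC' = N'abc' COLLATE SQL_Latin1_General_CP1_CI_AS
         THEN 'equal' ELSE 'different' END AS CaseInsensitiveTest,

    -- AS: accents matter, so this comparison should report 'different'.
    CASE WHEN N'ü' = N'u' COLLATE SQL_Latin1_General_CP1_CI_AS
         THEN 'equal' ELSE 'different' END AS AccentSensitiveTest,

    -- CS variant for contrast: case matters, so this should report 'different'.
    CASE WHEN N'ABC' = N'abc' COLLATE SQL_Latin1_General_CP1_CS_AS
         THEN 'equal' ELSE 'different' END AS CaseSensitiveTest;
```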

User · Answer

This specifies the default collation for the database. Every text field that you create in tables in the database will use that collation, unless you specify a different one.

A database always has a default collation. If you don't specify any, the default collation of the SQL Server instance is used.

The name of the collation that you use shows that it uses the Latin1 code page 1, is case insensitive (CI) and accent sensitive (AS). This collation is used in the USA, so it will contain sorting rules that are used in the USA.

The collation decides how text values are compared for equality and likeness, and how they are compared when sorting. The code page is used when storing non-unicode data, e.g. varchar fields.
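
To see the "unless you specify a different one" part in practice, a column can be given its own collation and then inspected in sys.columns. This is a sketch only: the table name dbo.CollationDemo and its column names are hypothetical, and Latin1_General_100_CI_AS is just one example of a Windows collation that could be chosen.

```sql
-- Hypothetical demo table: one column inherits the database default,
-- the other overrides it with an explicit COLLATE clause.
CREATE TABLE dbo.CollationDemo
(
    DefaultCol  VARCHAR(50),
    OverrideCol VARCHAR(50) COLLATE Latin1_General_100_CI_AS
);

-- Show which collation each column actually received.
SELECT name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID(N'dbo.CollationDemo');

-- Clean up the demo table.
DROP TABLE dbo.CollationDemo;
```

DefaultCol should report the database's default collation, while OverrideCol should report the collation named in its definition.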
