[sql-server] SQL Server - Convert varchar to another collation (code page) to fix character encoding

I'm querying a SQL Server database that uses the SQL_Latin1_General_CP850_BIN2 collation. One of the table rows has a varchar with a value that includes the +/- character (decimal code 177 in the Windows-1252 codepage).

When I query the table directly in SQL Server Management Studio, I get a gibberish character instead of the +/- character in this row. When I use this table as the source in an SSIS package, the destination table (which uses the typical SQL_Latin1_General_CP1_CI_AS collation) ends up with the correct +/- character.

I now have to build a mechanism that queries the source table directly, without SSIS. How do I do this so that I get the correct character instead of gibberish? My guess is that I need to convert/cast the column to the SQL_Latin1_General_CP1_CI_AS collation, but that isn't working: I keep getting a gibberish character.

I've tried the following with no luck:

select 
columnName collate SQL_Latin1_General_CP1_CI_AS
from tableName

select 
cast (columnName as varchar(100)) collate SQL_Latin1_General_CP1_CI_AS
from tableName

select 
convert (varchar, columnName) collate SQL_Latin1_General_CP1_CI_AS
from tableName

What am I doing wrong?

The answer is


I think SELECT CAST(CAST([field] AS VARBINARY(120)) AS varchar(120)) should work for your update.


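Try:

-- Give explicit lengths: without them, CAST defaults both types to a length of 30 and truncates longer values.
SELECT CAST(CAST([field] AS VARBINARY(100)) AS varchar(100))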
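
The idea behind the VARBINARY round trip is to keep the stored bytes and only change the code page used to read them. Here is a minimal Python sketch of that idea, outside SQL Server. It assumes the stored byte is 0xB1 (the Windows-1252 code for ±), which the question suggests but does not confirm. The helper name `reinterpret` is made up for illustration.

```python
def reinterpret(text: str, stored_as: str, meant_as: str) -> str:
    """Recover the bytes behind `text` and decode them with another code page."""
    raw = text.encode(stored_as)   # analogous to the inner CAST(... AS VARBINARY)
    return raw.decode(meant_as)    # analogous to reading the same bytes under another code page

# Byte 0xB1 is the medium-shade block (U+2592) in CP850,
# but the plus-minus sign (U+00B1) in Windows-1252.
shown_by_cp850 = b"\xb1".decode("cp850")
fixed = reinterpret(shown_by_cp850, "cp850", "cp1252")

# ascii() keeps the output printable on any console.
print(ascii(shown_by_cp850), "->", ascii(fixed))
```

In T-SQL, the outer CAST produces a varchar in the default collation of the database the query runs in, so whether the round trip helps depends on which database you run it from.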

We may need more information. Here is what I did to reproduce on SQL Server 2008:

CREATE DATABASE [Test] ON  PRIMARY 
    ( 
    NAME = N'Test'
    , FILENAME = N'...Test.mdf' 
    , SIZE = 3072KB 
    , FILEGROWTH = 1024KB 
    )
    LOG ON 
    ( 
    NAME = N'Test_log'
    , FILENAME = N'...Test_log.ldf' 
    , SIZE = 1024KB 
    , FILEGROWTH = 10%
    )
    COLLATE SQL_Latin1_General_CP850_BIN2
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[MyTable]
    (
    [SomeCol] [varchar](50) NULL
    ) ON [PRIMARY]
GO
Insert MyTable( SomeCol )
Select '±' Collate SQL_Latin1_General_CP1_CI_AS
GO
Select SomeCol, SomeCol Collate SQL_Latin1_General_CP1_CI_AS
From MyTable

The results show the original character. Declaring the collation in the query should return the proper character from SQL Server's perspective. However, the presentation layer may then be converting it to something different again, such as UTF-8.
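
To illustrate what a presentation-layer conversion can do to this character, here is a small Python sketch. It is only an illustration of the general effect, not a claim about what your client actually does: it encodes ± as UTF-8 and then misreads the bytes as Windows-1252.

```python
plus_minus = "\u00b1"                     # the plus-minus sign
utf8_bytes = plus_minus.encode("utf-8")   # UTF-8 needs two bytes for it: C2 B1
misread = utf8_bytes.decode("cp1252")     # each byte is read as a separate character

print(utf8_bytes.hex(), ascii(misread))
```

If you see two characters where you expected one, a UTF-8 step somewhere in the chain is a likely suspect.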


You must use CONVERT, not CAST:

SELECT
 CONVERT(varchar(50), N'æøåáälcçcédnoöruýtžš')
 COLLATE Cyrillic_General_CI_AI

(http://blog.sqlpositive.com/2010/03/using-convert-with-collate-to-strip-accents-from-unicode-strings/)


Character set conversion is done implicitly on the database connection level. You can force automatic conversion off in the ODBC or ADODB connection string with the parameter "Auto Translate=False". This is NOT recommended. See: https://msdn.microsoft.com/en-us/library/ms130822.aspx

SQL Server 2005 had a code page incompatibility when the database and client code pages did not match: https://support.microsoft.com/kb/KbView/904803

SQL Server Management Studio 2008 and later is a Unicode application. All values entered or requested are interpreted as Unicode at the application level, and conversion to and from the column collation is done implicitly. You can verify this with:

SELECT CAST(N'±' as varbinary(10)) AS Result

This returns 0xB100, which is the UTF-16 little-endian encoding of the Unicode character U+00B1 (as entered in the Management Studio window). You cannot turn off "Auto Translate" for Management Studio.
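
The byte order in that result can be checked outside SQL Server. This short Python sketch encodes the same character as UTF-16 little-endian, which is how nvarchar data is laid out:

```python
# U+00B1 (plus-minus sign) in UTF-16 little-endian: low byte first, then high byte.
encoded = "\u00b1".encode("utf-16-le")

print(encoded.hex())
```

The low byte B1 comes first and the high byte 00 second, which is why the varbinary value reads 0xB100 rather than 0x00B1.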

If you specify a different collation in the SELECT, you end up with a double conversion (with possible data loss) as long as "Auto Translate" is still active. The original character is first converted to the new collation during the SELECT, and the result is then "Auto Translated" to the "proper" application code page. That is why your various COLLATE tests all show the same result.
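
The double conversion can be modelled roughly in Python. This is a sketch under assumptions, not SQL Server's actual algorithm: the function name `client_sees` is hypothetical, and Python's "?" replacement stands in for whatever SQL Server does with a character the target code page cannot represent.

```python
from typing import Optional

def client_sees(raw: bytes, column_cp: str, select_cp: Optional[str] = None) -> str:
    """Model what a Unicode client receives for a stored byte string."""
    # The column's code page decides which character the stored bytes mean.
    text = raw.decode(column_cp)
    if select_cp is not None:
        # COLLATE step: convert the character into the other code page.
        # Characters missing from that code page become "?" here (an assumption).
        text = text.encode(select_cp, errors="replace").decode(select_cp)
    return text

# 0xF1 is the plus-minus sign in CP850: with or without COLLATE the client sees the same character.
plain = client_sees(b"\xf1", "cp850")
collated = client_sees(b"\xf1", "cp850", "cp1252")

# 0xB1 is a shade block in CP850, which Windows-1252 cannot represent: the extra step loses it.
lossy = client_sees(b"\xb1", "cp850", "cp1252")

print(ascii(plain), ascii(collated), ascii(lossy))
```

The first two values are identical, which matches the observation that adding COLLATE to the query does not change what Management Studio displays.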

You can verify that specifying the collation DOES have an effect in the select, if you cast the result as VARBINARY instead of VARCHAR so the SQL Server transformation is not invalidated by the client before it is presented:

SELECT cast(columnName COLLATE SQL_Latin1_General_CP850_BIN2 as varbinary(10)) from tableName
SELECT cast(columnName COLLATE SQL_Latin1_General_CP1_CI_AS as varbinary(10)) from tableName

This returns 0xF1 and 0xB1 respectively if columnName contains just the character '±'.
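
Those two byte values come straight from the code page tables, which you can confirm with Python's built-in codecs (used here only as a reference for the tables, independent of SQL Server):

```python
plus_minus = "\u00b1"  # the plus-minus sign

as_cp850 = plus_minus.encode("cp850")    # code page of SQL_Latin1_General_CP850_BIN2
as_cp1252 = plus_minus.encode("cp1252")  # code page of SQL_Latin1_General_CP1_CI_AS

print(as_cp850.hex(), as_cp1252.hex())
```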

You still might get the correct result and yet a wrong character, if the font you are using does not provide the proper glyph.

Please double-check the actual internal representation of your character by casting the column to VARBINARY on a proper sample, and verify whether the resulting code really corresponds to the defined database collation SQL_Latin1_General_CP850_BIN2:

SELECT CAST(columnName as varbinary(10)) from tableName
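
Once you have the hex value from that query, a small helper like the following can show which character each candidate code page assigns to it. This is a sketch: the function name `interpretations` and the default list of code pages are my own choices for this question.

```python
def interpretations(raw: bytes, codepages=("cp850", "cp1252")) -> dict:
    """Map each candidate code page to the text it reads from the same bytes."""
    # errors="replace" keeps the helper from raising on bytes a code page leaves undefined.
    return {cp: raw.decode(cp, errors="replace") for cp in codepages}

# Paste the hex digits from the VARBINARY result, without the leading 0x.
stored_b1 = interpretations(bytes.fromhex("B1"))
stored_f1 = interpretations(bytes.fromhex("F1"))

for label, result in (("B1", stored_b1), ("F1", stored_f1)):
    print(label, {cp: ascii(ch) for cp, ch in result.items()})
```

If the query returns 0xB1, the data was most likely written as Windows-1252 into the CP850 column without translation. If it returns 0xF1, the stored value is a correct CP850 ± and the problem lies further down the chain.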

Differences in application collation and database collation might go unnoticed as long as the conversion is always done the same way in and out. Troubles emerge as soon as you add a client with a different collation. Then you might find that the internal conversion is unable to match the characters correctly.

All that said, keep in mind that Management Studio is usually not the final reference when interpreting result sets. Even if the output looks like gibberish in Management Studio, it might still be correct. The question is whether the records show up correctly in your applications.

