SQL Server - Convert varchar to another collation (code page) to fix character encoding

Question

I'm querying a SQL Server database that uses the SQL_Latin1_General_CP850_BIN2 collation. One of the table rows has a varchar with a value that includes the +/- character (±, decimal code 177 in the Windows-1252 code page). When I query the table directly in SQL Server Management Studio, I get a gibberish character instead of the +/- character in this row. When I use this table as the source in an SSIS package, the destination table (which uses the typical SQL_Latin1_General_CP1_CI_AS collation) ends up with the correct +/- character.

I now have to build a mechanism that queries the source table directly, without SSIS. How do I do this so that I get the correct character instead of gibberish? My guess is that I need to convert/cast the column to the SQL_Latin1_General_CP1_CI_AS collation, but that isn't working: I keep getting a gibberish character.

I've tried the following with no luck:

    select columnName collate SQL_Latin1_General_CP1_CI_AS from tableName
    select cast(columnName as varchar(100)) collate SQL_Latin1_General_CP1_CI_AS from tableName
    select convert(varchar, columnName) collate SQL_Latin1_General_CP1_CI_AS from tableName

What am I doing wrong?

User · Answer

We may need more information. Here is what I did to reproduce on SQL Server 2008:

    CREATE DATABASE [Test] ON PRIMARY
        (
        NAME = N'Test'
        , FILENAME = N'...\Test.mdf'
        , SIZE = 3072KB
        , FILEGROWTH = 1024KB
        )
    LOG ON
        (
        NAME = N'Test_log'
        , FILENAME = N'...\Test_log.ldf'
        , SIZE = 1024KB
        , FILEGROWTH = 10%
        )
    COLLATE SQL_Latin1_General_CP850_BIN2
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    SET ANSI_PADDING ON
    GO
    CREATE TABLE [dbo].[MyTable]
        (
        [SomeCol] [varchar](50) NULL
        ) ON [PRIMARY]
    GO
    Insert MyTable( SomeCol )
    Select '±' Collate SQL_Latin1_General_CP1_CI_AS
    GO
    Select SomeCol, SomeCol Collate SQL_Latin1_General_CP1_CI_AS
    From MyTable

The results show the original character. Declaring the collation in the query should return the proper character from SQL Server's perspective. However, the presentation layer may then be converting it to something different again, such as UTF-8.

User · Answer

You must use CONVERT, not CAST:

    SELECT CONVERT(varchar(50), N'æøåáäĺćçčéđńőöřůýţžš') COLLATE Cyrillic_General_CI_AI

http://blog.sqlpositive.com/2010/03/using-convert-with-collate-to-strip-accents-from-unicode-strings

User · Answer

Character set conversion is done implicitly at the database connection level. You can force automatic conversion off in the ODBC or ADODB connection string with the parameter "Auto Translate=False". This is NOT recommended. See: https://msdn.microsoft.com/en-us/library/ms130822.aspx

There was a code page incompatibility in SQL Server 2005 when the database and client code pages did not match: https://support.microsoft.com/kb/KbView/904803

SQL Server Management Studio (2008 and later) is a Unicode application. All values entered or requested are interpreted as Unicode at the application level. Conversion to and from the column collation is done implicitly. You can verify this with:

    SELECT CAST(N'±' as varbinary(10)) AS Result

This returns 0xB100, which is the Unicode character U+00B1 (as entered in the Management Studio window), stored as UTF-16 with the low byte first. You cannot turn off "Auto Translate" for Management Studio.

If you specify a different collation in the select, you end up with a double conversion (with possible data loss) as long as "Auto Translate" is still active. The original character is first transformed to the new collation during the select, and the result is then "Auto Translated" to the "proper" application code page. That's why your various COLLATE tests all show the same result.

You can verify that specifying the collation DOES have an effect in the select if you cast the result as VARBINARY instead of VARCHAR, so that the SQL Server transformation is not undone by the client before it is presented:

    SELECT cast(columnName COLLATE SQL_Latin1_General_CP850_BIN2 as varbinary(10)) from tableName
    SELECT cast(columnName COLLATE SQL_Latin1_General_CP1_CI_AS as varbinary(10)) from tableName

This gets you 0xF1 or 0xB1 respectively if columnName contains just the character '±'.

You might still get the correct result and yet see a wrong character if the font you are using does not provide the proper glyph.

Please double-check the actual internal representation of your character by casting the column to VARBINARY on a proper sample, and verify whether this code really corresponds to the defined database collation SQL_Latin1_General_CP850_BIN2:

    SELECT CAST(columnName as varbinary(10)) from tableName

Differences between the application collation and the database collation can go unnoticed as long as the conversion is always done the same way in and out. Trouble emerges as soon as you add a client with a different collation. Then you may find that the internal conversion is unable to match the characters correctly.

All that said, keep in mind that Management Studio is usually not the final reference when interpreting result sets. Even if a value looks like gibberish in Management Studio, it may still be the correct output. The question is whether the records show up correctly in your applications.
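The byte values quoted in this answer can be checked outside SQL Server. The following is a minimal Python sketch, not T-SQL: it assumes that Python's "cp850" and "cp1252" codecs match the code pages behind the two collations, and it uses "utf-16-le" to stand in for how an N'...' literal is stored. The variable names are hypothetical.

```python
# Illustration only: Python codecs stand in for SQL Server code pages (assumption).
PLUS_MINUS = "\u00b1"  # the +/- character

cp1252_bytes = PLUS_MINUS.encode("cp1252")    # byte used by a CP1 (Windows-1252) collation
cp850_bytes = PLUS_MINUS.encode("cp850")      # byte used by a CP850 collation
utf16_bytes = PLUS_MINUS.encode("utf-16-le")  # bytes of the Unicode (nvarchar-style) form

# What a reader sees if the Windows-1252 byte is interpreted as CP850.
misread = cp1252_bytes.decode("cp850")

print(cp1252_bytes.hex(), cp850_bytes.hex(), utf16_bytes.hex())
print(repr(misread))
```

If the VARBINARY check on the real column shows 0xB1 rather than 0xF1, the stored byte is the Windows-1252 code for the character, and reading it as CP850 yields a shaded block character, which would be one plausible source of the gibberish described in the question.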

User · Answer

Try:

    SELECT CAST(CAST([field] AS VARBINARY) AS varchar)

User · Answer

I think you want explicit lengths for your update:

    SELECT CAST(CAST([field] AS VARBINARY(120)) AS varchar(120))
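The round trip through VARBINARY differs from COLLATE in that it keeps the stored bytes and only changes how they are labelled, whereas COLLATE translates the characters. A rough Python analogy, assuming the column physically holds the byte 0xB1 and that the "cp850" and "cp1252" codecs match the two collations; the helper names `transcode` and `reinterpret` are hypothetical:

```python
# Analogy only: Python codecs stand in for SQL Server code pages (assumption).
stored = b"\xb1"  # hypothetical raw column bytes: the Windows-1252 code for +/-


def transcode(raw: bytes, source_cp: str, target_cp: str) -> bytes:
    """Translate characters between code pages, as COLLATE on a varchar does."""
    return raw.decode(source_cp).encode(target_cp, errors="replace")


def reinterpret(raw: bytes, target_cp: str) -> str:
    """Keep the bytes unchanged and read them under another code page."""
    return raw.decode(target_cp)


converted = transcode(stored, "cp850", "cp1252")  # the misread character has no cp1252 byte
relabelled = reinterpret(stored, "cp1252")        # the original character comes back

print(converted, repr(relabelled))
```

In T-SQL, which code page the resulting varchar is read under depends on the collation in effect for that expression, so whether the double CAST recovers the character on a given server is something to verify rather than assume.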
