What's the best way to identify hidden characters in the result of a query in SQL Server (Query Analyzer)?

Question

When trying to identify erroneous data (often needing manual review and removal), I'd like an easy way of seeing hidden characters, such as TAB, Space, Carriage return and Line feed. Is there a built-in way to do this?

In a similar question here on Stack Overflow, regarding Oracle, a DUMP(fieldname) function was suggested. But I don't know if that would make things easier even if a corresponding function existed in SQL Server, since I need to see the characters in their context.

The best idea I could come up with was replacing the expected hidden characters with visible ones, like this:

    SELECT REPLACE(REPLACE(REPLACE(REPLACE(myfield, ' ', '[?]'), CHAR(13), '[CR]'), CHAR(10), '[LF]'), CHAR(9), '[TAB]') FROM mytable

Is there a better way? I don't like this approach, since there might be other, less common hidden characters that I haven't taken into account, such as vertical TAB.

Turning on "show hidden characters", as you can do in almost any text editor, would be such a nice feature in SQL Server Query Analyzer that I almost expect it can be done somehow in SQL Server as well, or at least that someone has a better idea than mine for showing this kind of white space information.

I just noticed that there is a built-in way to see white space: not in SQL Query Analyzer, but in the part of the interface that once was SQL Enterprise Manager. Right-click a table in the SQL Server Management Studio Object Explorer tree and select "Edit Top 200 Rows". In the result, white space (at least CR LF) is visible as empty squares.

User · Answer

    SELECT myfield, CAST(myfield AS varbinary(max)) FROM mytable
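Casting to varbinary shows the raw bytes in hexadecimal, so every hidden character becomes visible by its code. A minimal illustration on a literal (a stand-in for a real column) containing 'a', TAB, 'b', CR and LF. For nvarchar data each character takes two bytes in little-endian order, so N'a' would appear as 6100 rather than 61.

```sql
-- Illustrative literal in place of a real column: 'a', TAB, 'b', CR, LF
SELECT CAST('a' + CHAR(9) + 'b' + CHAR(13) + CHAR(10) AS varbinary(max)) AS RawBytes;
-- RawBytes = 0x6109620D0A  (61 = 'a', 09 = TAB, 62 = 'b', 0D = CR, 0A = LF)
```

The downside is that readable text turns into hex too, which is why selecting myfield alongside the cast helps keep the context.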

User · Answer

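You can always use the DATALENGTH function to determine whether you have extra white space characters in text fields. This won't make the text visible, but it will show you where there are extra white space characters.

    SELECT DATALENGTH('MyTextData ') AS BinaryLength, LEN('MyTextData ') AS TextLength

This will produce 11 for BinaryLength and 10 for TextLength, because LEN does not count trailing spaces while DATALENGTH counts every byte.

In a table, your SQL would look like this:

    SELECT *
    FROM tblA
    WHERE DATALENGTH(MyTextField) > LEN(MyTextField)

Keep in mind that this only catches trailing spaces: LEN still counts tabs, CR and LF wherever they appear. For nvarchar columns DATALENGTH returns two bytes per character, so compare against LEN(MyTextField) * 2 instead.

This function is usable in all versions of SQL Server beginning with 2005.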
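To see exactly what the DATALENGTH/LEN comparison does and does not flag, here is a small sketch over three illustrative values (not real data). It uses a VALUES table constructor, which assumes SQL Server 2008 or later.

```sql
-- Illustrative values: no trailing space, a trailing space, a trailing TAB
SELECT v             AS Value,
       DATALENGTH(v) AS BinaryLength,
       LEN(v)        AS TextLength
FROM (VALUES ('abc'), ('abc '), ('abc' + CHAR(9))) AS t(v);
-- 'abc'        -> 3, 3
-- 'abc '       -> 4, 3  (trailing space: flagged by DATALENGTH > LEN)
-- 'abc' + TAB  -> 4, 4  (TAB is counted by LEN, so it is not flagged)
```

So the comparison is a quick filter for trailing spaces; other hidden characters need one of the approaches in the other answers.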

User · Answer

The way I did it was by selecting all of the data (SELECT * FROM myTable), then right-clicking on the result set and choosing "Save Results As..." to save it as a CSV file. Opening the CSV file in Notepad++, I saw the LF characters that were not visible in the SQL Server result set.

User · Answer

To find them, you can use this:

    WITH cte AS
    (
      SELECT 0 AS CharCode
      UNION ALL
      SELECT CharCode + 1 FROM cte WHERE CharCode < 31
    )
    SELECT
       *
    FROM
       mytable T
         CROSS JOIN cte
    WHERE
       EXISTS (SELECT *
            FROM mytable Tx
            WHERE Tx.PKCol = T.PKCol
                 AND
                 Tx.MyField LIKE '%' + CHAR(cte.CharCode) + '%'
             )

Replacing the EXISTS with a JOIN will allow you to REPLACE them, but you'll get multiple rows. I can't think of a way around that...
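At least for reporting, the multiple rows can be collapsed into one row per record by aggregating the matching codes. A minimal sketch, assuming SQL Server 2017 or later (for STRING_AGG); the table variable @mytable with PKCol and MyField is hypothetical sample data standing in for the real table. It matches with CHARINDEX under the binary collation Latin1_General_BIN, so each control character is compared by its exact byte, and it starts at CHAR(1), leaving CHAR(0) out.

```sql
-- Hypothetical sample data standing in for mytable(PKCol, MyField)
DECLARE @mytable TABLE (PKCol int PRIMARY KEY, MyField varchar(100));
INSERT INTO @mytable (PKCol, MyField)
VALUES (1, 'clean'),
       (2, 'tab' + CHAR(9) + 'here'),
       (3, 'line' + CHAR(13) + CHAR(10) + 'break');

-- Control-character codes 1..31 (CHAR(0) is left out of this sketch)
WITH cte AS
(
    SELECT 1 AS CharCode
    UNION ALL
    SELECT CharCode + 1 FROM cte WHERE CharCode < 31
)
-- One row per record: STRING_AGG collapses the per-code matches
SELECT T.PKCol,
       STRING_AGG(CAST(cte.CharCode AS varchar(2)), ',')
           WITHIN GROUP (ORDER BY cte.CharCode) AS ControlCharCodes
FROM @mytable AS T
JOIN cte
  -- binary collation: match the exact byte, not a linguistic equivalent
  ON CHARINDEX(CHAR(cte.CharCode), T.MyField COLLATE Latin1_General_BIN) > 0
GROUP BY T.PKCol;
-- PKCol 2 -> '9', PKCol 3 -> '10,13'; PKCol 1 has no control characters, so no row
```

This lists what is there without duplicating rows; making the characters visible in place still needs a REPLACE per code, as in the function-based answer.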

User · Answer

I have faced the same problem with a character that I never managed to match with a WHERE clause: CHARINDEX, LIKE, REPLACE, etc. did not work. So I used a brute-force solution, which is awful and heavy, but works.

Step 1: Make a copy of the complete data set. Keep track of the original names with a source id referencing the PK of the source table, and keep this source id in all the subsequent tables.

Step 2: LTRIM/RTRIM the data, and replace all double spaces, tabs, etc. (basically everything from CHAR(1) to CHAR(32)) with a single space. Lowercase the whole set as well.

Step 3: Replace all the special characters that you know (get the list of all the quotes, double quotes, etc.) with something from a-z; I suggest 'z'. Basically, replace everything that is not a standard English character with a 'z', using nested REPLACE calls in a loop.

Step 4: Split by word into a second copy, where each word is in a separate row. The split is a SUBSTRING based on the position of the space characters; at this point we should only miss the ones where there is a hidden space that we did not catch earlier.

Step 5: Split each word into a third copy, where each letter is in a separate row (I know it makes a very large table). Keep track of the CHARINDEX of each letter in a separate column.

Step 6: Select everything in the above table which is NOT LIKE '[a-z]'. This is the list of the unidentified characters we want to exclude.

From the output of step 6 we have enough data to build a series of SUBSTRINGs of the source that select everything but the unknown character we want to exclude.

Note 1: There are smart ways to optimize this. Depending on the size of the original expression, steps 4, 5 and 6 can be done in one go.

Note 2: This is not very fast, but it is the fastest way to get this done for a large data set, because the split of lines into words and words into letters is made by SUBSTRING, which slices the whole table into one-character slices. However, it is quite heavy to build. With a smaller set, it may be enough to parse each record one by one and search for any character that is not in a list of all English characters plus all special characters.
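As Note 1 suggests, the character split (step 5) and the filter (step 6) can be folded into a single query: a numbers list from a recursive CTE slices each value into one row per character, and only characters outside an allowed set are kept. A minimal sketch, assuming a single-byte varchar column of at most 100 characters; @source, SourceId and MyField are hypothetical names for your copy of the data. Instead of lowercasing first (step 2), it compares under the binary collation Latin1_General_BIN, so [a-zA-Z0-9 ] means exactly those code points.

```sql
-- Hypothetical sample data standing in for the copied source table
DECLARE @source TABLE (SourceId int PRIMARY KEY, MyField varchar(100));
INSERT INTO @source (SourceId, MyField)
VALUES (1, 'plain text'),
       (2, 'odd' + CHAR(11) + 'tab'),   -- vertical TAB
       (3, 'bell' + CHAR(7));           -- BEL

-- Positions 1..100 (the column's maximum length)
WITH Positions AS
(
    SELECT 1 AS Pos
    UNION ALL
    SELECT Pos + 1 FROM Positions WHERE Pos < 100
)
SELECT S.SourceId,
       P.Pos,
       ASCII(SUBSTRING(S.MyField, P.Pos, 1)) AS CharCode
FROM @source AS S
JOIN Positions AS P
  ON P.Pos <= DATALENGTH(S.MyField)     -- one byte per character for single-byte varchar
WHERE SUBSTRING(S.MyField, P.Pos, 1) COLLATE Latin1_General_BIN
      NOT LIKE '[a-zA-Z0-9 ]'
ORDER BY S.SourceId, P.Pos
OPTION (MAXRECURSION 200);
-- SourceId 2, Pos 4, CharCode 11 and SourceId 3, Pos 5, CharCode 7
```

Each returned row gives the position and code of one unexpected character, which is the information the final SUBSTRING-based cleanup needs. Widen the allowed set (or the position limit) to match your data.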

User · Answer

Create a function that addresses all the whitespace possibilities and enable only those that seem appropriate:

    SELECT dbo.ShowWhiteSpace(myfield) FROM mytable

Uncomment only those whitespace cases you want to test for:

    CREATE FUNCTION dbo.ShowWhiteSpace (@str varchar(8000))
    RETURNS varchar(8000)
    AS
    BEGIN
         DECLARE @ShowWhiteSpace varchar(8000);
         SET @ShowWhiteSpace = @str
         SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(32), '[?]')
         SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(13), '[CR]')
         SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(10), '[LF]')
         SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(9),  '[TAB]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(1),  '[SOH]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(2),  '[STX]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(3),  '[ETX]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(4),  '[EOT]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(5),  '[ENQ]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(6),  '[ACK]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(7),  '[BEL]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(8),  '[BS]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(11), '[VT]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(12), '[FF]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(14), '[SO]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(15), '[SI]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(16), '[DLE]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(17), '[DC1]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(18), '[DC2]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(19), '[DC3]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(20), '[DC4]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(21), '[NAK]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(22), '[SYN]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(23), '[ETB]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(24), '[CAN]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(25), '[EM]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(26), '[SUB]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(27), '[ESC]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(28), '[FS]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(29), '[GS]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(30), '[RS]')
    --   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(31), '[US]')
         RETURN(@ShowWhiteSpace)
    END
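If uncommenting lines one by one is tedious, the same idea can be written as a loop over every control code from 1 to 31, which also covers the less common characters such as vertical TAB. A minimal sketch, assuming SQL Server 2008 or later (for DECLARE with an initializer and +=); dbo.ShowControlChars is a hypothetical name, and it emits numeric tokens such as [9] instead of the mnemonic names above. Ordinary spaces are left alone.

```sql
-- Hypothetical helper: replaces every control character CHAR(1)..CHAR(31)
-- with a visible token such as [9] (TAB) or [13] (CR).
CREATE FUNCTION dbo.ShowControlChars (@str varchar(8000))
RETURNS varchar(8000)
AS
BEGIN
    DECLARE @code int = 1;
    WHILE @code < 32
    BEGIN
        -- binary collation: replace the exact byte only
        SET @str = REPLACE(@str COLLATE Latin1_General_BIN,
                           CHAR(@code),
                           '[' + CAST(@code AS varchar(2)) + ']');
        SET @code += 1;
    END;
    RETURN @str;
END;
```

Use it like the function above, e.g. SELECT dbo.ShowControlChars(myfield) FROM mytable. For example, dbo.ShowControlChars('a' + CHAR(9) + 'b' + CHAR(13) + CHAR(10)) returns 'a[9]b[13][10]'. Because each control character grows into up to four characters, very long values can be truncated at 8000 characters.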
