I had a one-off data migration where the source system could not correctly output some unusual/technical characters, plus the ubiquitous extra commas in CSVs.
We decided that the source extract should replace each such character with something that both the source system and the loading SQL Server would recognise, but which would not otherwise appear in the data.
It did mean, however, that these replacement characters would appear in various columns across various tables and I would have to swap them back. Nesting multiple REPLACE functions made the import code look scary and made it easy to misjudge the placement and number of brackets, so I wrote the following function. I know it can process a column in a table of 3,000 rows in less than a second, though I'm not sure how well it will scale up to multi-million row tables.
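create function [dbo].[udf_ReplaceMultipleChars]
(
    @OriginalString nvarchar(4000)
  , @ReplaceTheseChars nvarchar(100)
  , @LengthOfReplacement int = 1
)
returns nvarchar(4000)
as
begin
    declare @RevisedString nvarchar(4000) = N'';
    /* NB len() ignores trailing spaces, so they are not carried into the result */
    declare @lengthofinput int = len(@OriginalString);

    /* recursive CTE giving one row per character position in @OriginalString */
    with AllNumbers
    as (select 1 as Number
        union all
        select Number + 1
        from AllNumbers
        where Number < @lengthofinput)
    /* builds the result one char at a time; this relies on the rows being read in Number order */
    select @RevisedString += case
                                 /* found at an odd position (1, 3, 5...) in @ReplaceTheseChars, so swap it for the char that follows */
                                 when (charindex(substring(@OriginalString, Number, 1), @ReplaceTheseChars, 1) - 1) % 2 = 0 then
                                     substring(
                                         @ReplaceTheseChars
                                       , charindex(substring(@OriginalString, Number, 1), @ReplaceTheseChars, 1) + 1
                                       , @LengthOfReplacement
                                     )
                                 /* not found, or found at an even position, so keep the original char */
                                 else
                                     substring(@OriginalString, Number, 1)
                             end
    from AllNumbers
    option (maxrecursion 4000);

    return (@RevisedString);
end;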
It works by taking the string to be evaluated (@OriginalString) along with a string of paired characters (@ReplaceTheseChars) in which the first character is to be replaced by the second, the third by the fourth, the fifth by the sixth and so on.
Here is the string of chars that I needed to replace and their replacements... [']"~,{Ø}°$±|¼¦¼ª½¬½^¾#?
i.e. an opening square bracket denotes an apostrophe, a closing one a double quote. You can see that there were vulgar fractions as well as degree and diameter symbols in there.
@LengthOfReplacement, which defaults to 1, is included as a starting point in case anyone needs to replace with longer strings. I played around with that in my project but single-char replacement was the main use.
The condition of the case expression is important. It ensures that a character is only replaced if it is found in your @ReplaceTheseChars variable and is found at an odd-numbered position. Subtracting 1 from the charindex result means that an odd position gives a modulo of 0, an even position gives 1, and anything NOT found gives a negative value. i.e. if it finds a tilde (~) at position 5 it will replace it with a comma, but if on a subsequent run it finds that comma at position 6 it will not replace it with a curly bracket ({).
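To watch the condition at work outside the function, the same check can be run on its own for a handful of characters. This is only a sketch: the column aliases and the choice of test characters are made up for illustration, and it assumes a collation in which these characters only match themselves.

```sql
declare @ReplaceTheseChars nvarchar(100) = N'['']"~,{Ø}°$±|¼¦¼ª½¬½^¾#?';

/* TestChar, FoundAt and ModuloResult are illustrative names only */
select c.TestChar
     , charindex(c.TestChar, @ReplaceTheseChars, 1) as FoundAt
     , (charindex(c.TestChar, @ReplaceTheseChars, 1) - 1) % 2 as ModuloResult
from (values (N'~'), (N','), (N'd')) as c (TestChar);

/* ~ : FoundAt 5, ModuloResult  0  => replaced by the char at position 6
   , : FoundAt 6, ModuloResult  1  => left alone
   d : FoundAt 0, ModuloResult -1  => left alone */
```

Only a result of exactly 0 triggers a replacement, which is why a character that is merely a replacement value (an even position) is never swapped again.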
This can be best demonstrated with an example...
declare @ProductDescription nvarchar(20) = N'abc~def[¦][123';

/* first pass: overwrite the variable with the cleaned value */
select @ProductDescription
    = dbo.udf_ReplaceMultipleChars(
          @ProductDescription
          /* NB the doubling up of the apostrophe is necessary in the string but resolves to a single apostrophe when passed to the function */
        , N'['']"~,{Ø}°$±|¼¦¼ª½¬½^¾#?'
        , default
      );

/* second pass: show the cleaned value next to the result of running it through again */
select @ProductDescription
     , dbo.udf_ReplaceMultipleChars(
          @ProductDescription
        , N'['']"~,{Ø}°$±|¼¦¼ª½¬½^¾#?'
          /* if you didn't know how to type those peculiar chars in then you can build a string like this... '[' + nchar(0x0027) + ']"~,{' + nchar(0x00D8) + '}' + nchar(0x00B0) etc */
        , default
      );
This returns the value after the first pass through the function alongside the result of a second pass, and the two are identical: abc,def'¼"'123 abc,def'¼"'123
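The nchar() approach mentioned in the code comment above can be spelled out in full for the pair string used here. Treat it as a sketch: the code points are the standard Unicode values for these characters, and the column aliases are just illustrative.

```sql
/* builds [']"~,{Ø}°$±|¼¦¼ª½¬½^¾#? without typing any of the awkward chars */
declare @ReplaceTheseChars nvarchar(100)
    = N'[' + nchar(0x0027)                                  /* [ --> apostrophe */
    + N']"~,{' + nchar(0x00D8)                              /* ] --> ", ~ --> comma, { --> Ø */
    + N'}' + nchar(0x00B0)                                  /* } --> ° */
    + N'$' + nchar(0x00B1)                                  /* $ --> ± */
    + N'|' + nchar(0x00BC)                                  /* | --> ¼ */
    + nchar(0x00A6) + nchar(0x00BC)                         /* ¦ --> ¼ */
    + nchar(0x00AA) + nchar(0x00BD)                         /* ª --> ½ */
    + nchar(0x00AC) + nchar(0x00BD)                         /* ¬ --> ½ */
    + N'^' + nchar(0x00BE)                                  /* ^ --> ¾ */
    + N'#?';                                                /* # --> ? */

select @ReplaceTheseChars as PairString
     , len(@ReplaceTheseChars) as PairStringLength; /* an even length means every char has a partner */
```

Checking that the length is even is a cheap way to catch a missing character, since one dropped char shifts every later pair.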
A table update would just be
update a
set a.Col1 = dbo.udf_ReplaceMultipleChars(a.Col1, N'~,]"', 1)
from TestTable a;
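For anyone wanting to try the update without touching a real table, here is a small self-contained sketch. The temp table #TestTable and its rows are invented for the demo, and the where clause is an optional extra (not part of the original update) that skips rows with nothing to replace.

```sql
/* #TestTable and its contents are made up for this demo */
create table #TestTable (Col1 nvarchar(50));

insert into #TestTable (Col1)
values (N'abc~def'), (N']quoted]'), (N'nothing to fix');

update a
set a.Col1 = dbo.udf_ReplaceMultipleChars(a.Col1, N'~,]"', 1)
from #TestTable a
/* optional: only touch rows that contain a ~ or a ] */
where charindex(N'~', a.Col1) > 0
   or charindex(N']', a.Col1) > 0;

/* expected: abc,def / "quoted" / nothing to fix */
select Col1
from #TestTable;

drop table #TestTable;
```

Filtering first may save work on a large table where few rows are affected, although that has not been measured here.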
Finally (I hear you say!), although I've not had access to the TRANSLATE function, I believe that this function can process the example shown in its documentation quite easily. The TRANSLATE function demo is
SELECT TRANSLATE('2*[3+4]/{7-2}', '[]{}', '()()');
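which returns 2*(3+4)/(7-2). Going by the documentation, the same call should cope just as well with 2*[3+4]/[7-2], since characters in the second argument that never occur in the input are simply not used.
My function would approach this as follows, listing each char to be replaced followed by its replacement: [ --> (, { --> ( etc.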
select dbo.udf_ReplaceMultipleChars('2*[3+4]/{7-2}', '[({(])})', 1);
which will also work for
select dbo.udf_ReplaceMultipleChars('2*[3+4]/[7-2]', '[({(])})', 1);
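For comparison, this is roughly what the same four replacements look like as nested REPLACE calls, the style the function was written to avoid. It is a sketch for illustration only; the column alias is made up.

```sql
/* one REPLACE per pair, nested inside-out: [ --> (, { --> (, ] --> ), } --> ) */
select replace(replace(replace(replace(
           N'2*[3+4]/{7-2}'
         , N'[', N'(')
         , N'{', N'(')
         , N']', N')')
         , N'}', N')') as NestedReplaceResult;
/* returns 2*(3+4)/(7-2) */
```

With four pairs this is still readable, but the twelve pairs in the migration string would need twelve levels of nesting. Nested REPLACE can also re-replace a character that an inner call has just produced, which the odd-position check in the function avoids.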
I hope someone finds this useful and if you get to test its performance against larger tables do let us know one way or another!