Possible Duplicate:
Split string in SQL
I have seen a couple of questions related to string concatenation in SQL. I wonder how would you approach the opposite problem: splitting coma delimited string into rows of data:
Lets say I have tables:
userTypedTags(userID,commaSeparatedTags) 'one entry per user
tags(tagID,name)
And want to insert data into table
userTag(userID,tagID) 'multiple entries per user
Inspired by Which tags are not in the database? question
EDIT
Thanks for the answers, actually more then one deserves to be accepted but I can only pick one, and the solution presented by Cade Roux with recursions seems pretty clean to me. It works on SQL Server 2005 and above.
For earlier version of SQL Server the solution provided by miies can be used. For working with text data type wcm answer will be helpful. Thanks again.
This question is related to
sql-server
sql-server-2005
tsql
SELECT substring(commaSeparatedTags,0,charindex(',',commaSeparatedTags))
will give you the first tag. You can proceed similarly to get the second one and so on by combining substring and charindex one layer deeper each time. That's an immediate solution but it works only with very few tags as the query grows very quickly in size and becomes unreadable. Move on to functions then, as outlined in other, more sophisticated answers to this post.
I usually do this with the following code:
create function [dbo].[Split](@string varchar(max), @separator varchar(10))
returns @splited table ( stringPart varchar(max) )
with execute as caller
as
begin
declare @stringPart varchar(max);
set @stringPart = '';
while charindex(@separator, @string) > 0
begin
set @stringPart = substring(@string, 0, charindex(@separator, @string));
insert into @splited (stringPart) values (@stringPart);
set @string = substring(@string, charindex(@separator, @string) + len(@separator), len(@string) + 1);
end
return;
end
go
You can test it with this query:
declare @example varchar(max);
set @example = 'one;string;to;rule;them;all;;';
select * from [dbo].[Split](@example, ';');
Using CLR, here's a much simpler alternative that works in all cases, yet 40% faster than the accepted answer:
using System;
using System.Collections;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;
public class UDF
{
[SqlFunction(FillRowMethodName="FillRow")]
public static IEnumerable RegexSplit(SqlString s, SqlString delimiter)
{
return Regex.Split(s.Value, delimiter.Value);
}
public static void FillRow(object row, out SqlString str)
{
str = new SqlString((string) row);
}
}
Of course, it is still 8 times slower than PostgreSQL's regexp_split_to_table
.
I up-voted "Nathan Wheeler" answer as I found "Cade Roux" answer did not work above a certain string size.
Couple of points
-I found adding the DISTINCT keyword improved performance for me.
-Nathan's answer only works if your identifiers are 5 characters or less, of course you can adjust that...If the items you are splitting are INT identifiers as I am you can us same as me below:
CREATE FUNCTION [dbo].Split
(
@sep VARCHAR(32),
@s VARCHAR(MAX)
)
RETURNS
@result TABLE (
Id INT NULL
)
AS
BEGIN
DECLARE @xml XML
SET @XML = N'<root><r>' + REPLACE(@s, @sep, '</r><r>') + '</r></root>'
INSERT INTO @result(Id)
SELECT DISTINCT r.value('.','int') as Item
FROM @xml.nodes('//root//r') AS RECORDS(r)
RETURN
END
You can also achieve this effect using XML, as seen here, which removes the limitation of the answers provided which all seem to include recursion in some fashion. The particular use I've made here allows for up to a 32-character delimiter, but that could be increased however large it needs to be.
create FUNCTION [dbo].[Split] (@sep VARCHAR(32), @s VARCHAR(MAX))
RETURNS TABLE
AS
RETURN
(
SELECT r.value('.','VARCHAR(MAX)') as Item
FROM (SELECT CONVERT(XML, N'<root><r>' + REPLACE(REPLACE(REPLACE(@s,'& ','& '),'<','<'), @sep, '</r><r>') + '</r></root>') as valxml) x
CROSS APPLY x.valxml.nodes('//root/r') AS RECORDS(r)
)
Then you can invoke it using:
SELECT * FROM dbo.Split(' ', 'I hate bunnies')
Which returns:
-----------
|I |
|---------|
|hate |
|---------|
|bunnies |
-----------
CREATE FUNCTION [dbo].[Split] (@sep VARCHAR(32), @s VARCHAR(MAX))
RETURNS TABLE
AS
RETURN
(
SELECT r.value('.','VARCHAR(MAX)') as Item
FROM (SELECT CONVERT(XML, N'<root><r>' + REPLACE(@s, @sep, '</r><r>') + '</r></root>') as valxml) x
CROSS APPLY x.valxml.nodes('//root/r') AS RECORDS(r)
)
Slight modification of the solution above so it works with variable-length delimiters.
create FUNCTION dbo.fn_Split2 (@sep nvarchar(10), @s nvarchar(4000))
RETURNS table
AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(@sep, @s)
UNION ALL
SELECT pn + 1, stop + (datalength(@sep)/2), CHARINDEX(@sep, @s, stop + (datalength(@sep)/2))
FROM Pieces
WHERE stop > 0
)
SELECT pn,
SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 4000 END) AS s
FROM Pieces
)
NB: I've used datalength() since len() reports incorrectly if there are trailing spaces.
Here's a Split
function that is compatible with SQL Server versions prior to 2005.
CREATE FUNCTION dbo.Split(@data nvarchar(4000), @delimiter nvarchar(100))
RETURNS @result table (Id int identity(1,1), Data nvarchar(4000))
AS
BEGIN
DECLARE @pos INT
DECLARE @start INT
DECLARE @len INT
DECLARE @end INT
SET @len = LEN('.' + @delimiter + '.') - 2
SET @end = LEN(@data) + 1
SET @start = 1
SET @pos = 0
WHILE (@pos < @end)
BEGIN
SET @pos = CHARINDEX(@delimiter, @data, @start)
IF (@pos = 0) SET @pos = @end
INSERT @result (data) SELECT SUBSTRING(@data, @start, @pos - @start)
SET @start = @pos + @len
END
RETURN
END
I wrote this awhile back. It assumes the delimiter is a comma and that the individual values aren't bigger than 127 characters. It could be modified pretty easily.
It has the benefit of not being limited to 4,000 characters.
Good luck!
ALTER Function [dbo].[SplitStr] (
@txt text
)
Returns @tmp Table
(
value varchar(127)
)
as
BEGIN
declare @str varchar(8000)
, @Beg int
, @last int
, @size int
set @size=datalength(@txt)
set @Beg=1
set @str=substring(@txt,@Beg,8000)
IF len(@str)<8000 set @Beg=@size
ELSE BEGIN
set @last=charindex(',', reverse(@str))
set @str=substring(@txt,@Beg,8000-@last)
set @Beg=@Beg+8000-@last+1
END
declare @workingString varchar(25)
, @stringindex int
while @Beg<=@size Begin
WHILE LEN(@str) > 0 BEGIN
SELECT @StringIndex = CHARINDEX(',', @str)
SELECT
@workingString = CASE
WHEN @StringIndex > 0 THEN SUBSTRING(@str, 1, @StringIndex-1)
ELSE @str
END
INSERT INTO
@tmp(value)
VALUES
(cast(rtrim(ltrim(@workingString)) as varchar(127)))
SELECT @str = CASE
WHEN CHARINDEX(',', @str) > 0 THEN SUBSTRING(@str, @StringIndex+1, LEN(@str))
ELSE ''
END
END
set @str=substring(@txt,@Beg,8000)
if @Beg=@size set @Beg=@Beg+1
else IF len(@str)<8000 set @Beg=@size
ELSE BEGIN
set @last=charindex(',', reverse(@str))
set @str=substring(@txt,@Beg,8000-@last)
set @Beg=@Beg+8000-@last+1
END
END
return
END
For the particular case of splitting strings into words I've come across another solution for SQL Server 2008.
with testTable AS
(
SELECT 1 AS Id, N'how now brown cow' AS txt UNION ALL
SELECT 2, N'she sells sea shells upon the sea shore' UNION ALL
SELECT 3, N'red lorry yellow lorry' UNION ALL
SELECT 4, N'the quick brown fox jumped over the lazy dog'
)
SELECT display_term, COUNT(*) As Cnt
FROM testTable
CROSS APPLY sys.dm_fts_parser('"' + txt + '"', 1033, 0,0)
GROUP BY display_term
HAVING COUNT(*) > 1
ORDER BY Cnt DESC
Returns
display_term Cnt
------------------------------ -----------
the 3
brown 2
lorry 2
sea 2
SELECT substring(commaSeparatedTags,0,charindex(',',commaSeparatedTags))
will give you the first tag. You can proceed similarly to get the second one and so on by combining substring and charindex one layer deeper each time. That's an immediate solution but it works only with very few tags as the query grows very quickly in size and becomes unreadable. Move on to functions then, as outlined in other, more sophisticated answers to this post.
I up-voted "Nathan Wheeler" answer as I found "Cade Roux" answer did not work above a certain string size.
Couple of points
-I found adding the DISTINCT keyword improved performance for me.
-Nathan's answer only works if your identifiers are 5 characters or less, of course you can adjust that...If the items you are splitting are INT identifiers as I am you can us same as me below:
CREATE FUNCTION [dbo].Split
(
@sep VARCHAR(32),
@s VARCHAR(MAX)
)
RETURNS
@result TABLE (
Id INT NULL
)
AS
BEGIN
DECLARE @xml XML
SET @XML = N'<root><r>' + REPLACE(@s, @sep, '</r><r>') + '</r></root>'
INSERT INTO @result(Id)
SELECT DISTINCT r.value('.','int') as Item
FROM @xml.nodes('//root//r') AS RECORDS(r)
RETURN
END
I use this function (SQL Server 2005 and above).
create function [dbo].[Split]
(
@string nvarchar(4000),
@delimiter nvarchar(10)
)
returns @table table
(
[Value] nvarchar(4000)
)
begin
declare @nextString nvarchar(4000)
declare @pos int, @nextPos int
set @nextString = ''
set @string = @string + @delimiter
set @pos = charindex(@delimiter, @string)
set @nextPos = 1
while (@pos <> 0)
begin
set @nextString = substring(@string, 1, @pos - 1)
insert into @table
(
[Value]
)
values
(
@nextString
)
set @string = substring(@string, @pos + len(@delimiter), len(@string))
set @nextPos = @pos
set @pos = charindex(@delimiter, @string)
end
return
end
I usually do this with the following code:
create function [dbo].[Split](@string varchar(max), @separator varchar(10))
returns @splited table ( stringPart varchar(max) )
with execute as caller
as
begin
declare @stringPart varchar(max);
set @stringPart = '';
while charindex(@separator, @string) > 0
begin
set @stringPart = substring(@string, 0, charindex(@separator, @string));
insert into @splited (stringPart) values (@stringPart);
set @string = substring(@string, charindex(@separator, @string) + len(@separator), len(@string) + 1);
end
return;
end
go
You can test it with this query:
declare @example varchar(max);
set @example = 'one;string;to;rule;them;all;;';
select * from [dbo].[Split](@example, ';');
Using CLR, here's a much simpler alternative that works in all cases, yet 40% faster than the accepted answer:
using System;
using System.Collections;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;
public class UDF
{
[SqlFunction(FillRowMethodName="FillRow")]
public static IEnumerable RegexSplit(SqlString s, SqlString delimiter)
{
return Regex.Split(s.Value, delimiter.Value);
}
public static void FillRow(object row, out SqlString str)
{
str = new SqlString((string) row);
}
}
Of course, it is still 8 times slower than PostgreSQL's regexp_split_to_table
.
I use this function (SQL Server 2005 and above).
create function [dbo].[Split]
(
@string nvarchar(4000),
@delimiter nvarchar(10)
)
returns @table table
(
[Value] nvarchar(4000)
)
begin
declare @nextString nvarchar(4000)
declare @pos int, @nextPos int
set @nextString = ''
set @string = @string + @delimiter
set @pos = charindex(@delimiter, @string)
set @nextPos = 1
while (@pos <> 0)
begin
set @nextString = substring(@string, 1, @pos - 1)
insert into @table
(
[Value]
)
values
(
@nextString
)
set @string = substring(@string, @pos + len(@delimiter), len(@string))
set @nextPos = @pos
set @pos = charindex(@delimiter, @string)
end
return
end
Slight modification of the solution above so it works with variable-length delimiters.
create FUNCTION dbo.fn_Split2 (@sep nvarchar(10), @s nvarchar(4000))
RETURNS table
AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(@sep, @s)
UNION ALL
SELECT pn + 1, stop + (datalength(@sep)/2), CHARINDEX(@sep, @s, stop + (datalength(@sep)/2))
FROM Pieces
WHERE stop > 0
)
SELECT pn,
SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 4000 END) AS s
FROM Pieces
)
NB: I've used datalength() since len() reports incorrectly if there are trailing spaces.
I use this function (SQL Server 2005 and above).
create function [dbo].[Split]
(
@string nvarchar(4000),
@delimiter nvarchar(10)
)
returns @table table
(
[Value] nvarchar(4000)
)
begin
declare @nextString nvarchar(4000)
declare @pos int, @nextPos int
set @nextString = ''
set @string = @string + @delimiter
set @pos = charindex(@delimiter, @string)
set @nextPos = 1
while (@pos <> 0)
begin
set @nextString = substring(@string, 1, @pos - 1)
insert into @table
(
[Value]
)
values
(
@nextString
)
set @string = substring(@string, @pos + len(@delimiter), len(@string))
set @nextPos = @pos
set @pos = charindex(@delimiter, @string)
end
return
end
Here's a Split
function that is compatible with SQL Server versions prior to 2005.
CREATE FUNCTION dbo.Split(@data nvarchar(4000), @delimiter nvarchar(100))
RETURNS @result table (Id int identity(1,1), Data nvarchar(4000))
AS
BEGIN
DECLARE @pos INT
DECLARE @start INT
DECLARE @len INT
DECLARE @end INT
SET @len = LEN('.' + @delimiter + '.') - 2
SET @end = LEN(@data) + 1
SET @start = 1
SET @pos = 0
WHILE (@pos < @end)
BEGIN
SET @pos = CHARINDEX(@delimiter, @data, @start)
IF (@pos = 0) SET @pos = @end
INSERT @result (data) SELECT SUBSTRING(@data, @start, @pos - @start)
SET @start = @pos + @len
END
RETURN
END
SELECT substring(commaSeparatedTags,0,charindex(',',commaSeparatedTags))
will give you the first tag. You can proceed similarly to get the second one and so on by combining substring and charindex one layer deeper each time. That's an immediate solution but it works only with very few tags as the query grows very quickly in size and becomes unreadable. Move on to functions then, as outlined in other, more sophisticated answers to this post.
I wrote this awhile back. It assumes the delimiter is a comma and that the individual values aren't bigger than 127 characters. It could be modified pretty easily.
It has the benefit of not being limited to 4,000 characters.
Good luck!
ALTER Function [dbo].[SplitStr] (
@txt text
)
Returns @tmp Table
(
value varchar(127)
)
as
BEGIN
declare @str varchar(8000)
, @Beg int
, @last int
, @size int
set @size=datalength(@txt)
set @Beg=1
set @str=substring(@txt,@Beg,8000)
IF len(@str)<8000 set @Beg=@size
ELSE BEGIN
set @last=charindex(',', reverse(@str))
set @str=substring(@txt,@Beg,8000-@last)
set @Beg=@Beg+8000-@last+1
END
declare @workingString varchar(25)
, @stringindex int
while @Beg<=@size Begin
WHILE LEN(@str) > 0 BEGIN
SELECT @StringIndex = CHARINDEX(',', @str)
SELECT
@workingString = CASE
WHEN @StringIndex > 0 THEN SUBSTRING(@str, 1, @StringIndex-1)
ELSE @str
END
INSERT INTO
@tmp(value)
VALUES
(cast(rtrim(ltrim(@workingString)) as varchar(127)))
SELECT @str = CASE
WHEN CHARINDEX(',', @str) > 0 THEN SUBSTRING(@str, @StringIndex+1, LEN(@str))
ELSE ''
END
END
set @str=substring(@txt,@Beg,8000)
if @Beg=@size set @Beg=@Beg+1
else IF len(@str)<8000 set @Beg=@size
ELSE BEGIN
set @last=charindex(',', reverse(@str))
set @str=substring(@txt,@Beg,8000-@last)
set @Beg=@Beg+8000-@last+1
END
END
return
END
For the particular case of splitting strings into words I've come across another solution for SQL Server 2008.
with testTable AS
(
SELECT 1 AS Id, N'how now brown cow' AS txt UNION ALL
SELECT 2, N'she sells sea shells upon the sea shore' UNION ALL
SELECT 3, N'red lorry yellow lorry' UNION ALL
SELECT 4, N'the quick brown fox jumped over the lazy dog'
)
SELECT display_term, COUNT(*) As Cnt
FROM testTable
CROSS APPLY sys.dm_fts_parser('"' + txt + '"', 1033, 0,0)
GROUP BY display_term
HAVING COUNT(*) > 1
ORDER BY Cnt DESC
Returns
display_term Cnt
------------------------------ -----------
the 3
brown 2
lorry 2
sea 2
You can also achieve this effect using XML, as seen here, which removes the limitation of the answers provided which all seem to include recursion in some fashion. The particular use I've made here allows for up to a 32-character delimiter, but that could be increased however large it needs to be.
create FUNCTION [dbo].[Split] (@sep VARCHAR(32), @s VARCHAR(MAX))
RETURNS TABLE
AS
RETURN
(
SELECT r.value('.','VARCHAR(MAX)') as Item
FROM (SELECT CONVERT(XML, N'<root><r>' + REPLACE(REPLACE(REPLACE(@s,'& ','& '),'<','<'), @sep, '</r><r>') + '</r></root>') as valxml) x
CROSS APPLY x.valxml.nodes('//root/r') AS RECORDS(r)
)
Then you can invoke it using:
SELECT * FROM dbo.Split(' ', 'I hate bunnies')
Which returns:
-----------
|I |
|---------|
|hate |
|---------|
|bunnies |
-----------
CREATE FUNCTION [dbo].[Split] (@sep VARCHAR(32), @s VARCHAR(MAX))
RETURNS TABLE
AS
RETURN
(
SELECT r.value('.','VARCHAR(MAX)') as Item
FROM (SELECT CONVERT(XML, N'<root><r>' + REPLACE(@s, @sep, '</r><r>') + '</r></root>') as valxml) x
CROSS APPLY x.valxml.nodes('//root/r') AS RECORDS(r)
)
I wrote this awhile back. It assumes the delimiter is a comma and that the individual values aren't bigger than 127 characters. It could be modified pretty easily.
It has the benefit of not being limited to 4,000 characters.
Good luck!
ALTER Function [dbo].[SplitStr] (
@txt text
)
Returns @tmp Table
(
value varchar(127)
)
as
BEGIN
declare @str varchar(8000)
, @Beg int
, @last int
, @size int
set @size=datalength(@txt)
set @Beg=1
set @str=substring(@txt,@Beg,8000)
IF len(@str)<8000 set @Beg=@size
ELSE BEGIN
set @last=charindex(',', reverse(@str))
set @str=substring(@txt,@Beg,8000-@last)
set @Beg=@Beg+8000-@last+1
END
declare @workingString varchar(25)
, @stringindex int
while @Beg<=@size Begin
WHILE LEN(@str) > 0 BEGIN
SELECT @StringIndex = CHARINDEX(',', @str)
SELECT
@workingString = CASE
WHEN @StringIndex > 0 THEN SUBSTRING(@str, 1, @StringIndex-1)
ELSE @str
END
INSERT INTO
@tmp(value)
VALUES
(cast(rtrim(ltrim(@workingString)) as varchar(127)))
SELECT @str = CASE
WHEN CHARINDEX(',', @str) > 0 THEN SUBSTRING(@str, @StringIndex+1, LEN(@str))
ELSE ''
END
END
set @str=substring(@txt,@Beg,8000)
if @Beg=@size set @Beg=@Beg+1
else IF len(@str)<8000 set @Beg=@size
ELSE BEGIN
set @last=charindex(',', reverse(@str))
set @str=substring(@txt,@Beg,8000-@last)
set @Beg=@Beg+8000-@last+1
END
END
return
END
Source: Stackoverflow.com