[sql] Optimal way to concatenate/aggregate strings

I'm finding a way to aggregate strings from different rows into a single row. I'm looking to do this in many different places, so having a function to facilitate this would be nice. I've tried solutions using COALESCE and FOR XML, but they just don't cut it for me.

String aggregation would do something like this:

id | Name                    Result: id | Names
-- - ----                            -- - -----
1  | Matt                            1  | Matt, Rocks
1  | Rocks                           2  | Stylus
2  | Stylus

I've taken a look at CLR-defined aggregate functions as a replacement for COALESCE and FOR XML, but apparently SQL Azure does not support CLR-defined stuff, which is a pain for me because I know being able to use it would solve a whole lot of problems for me.

Is there any possible workaround, or similarly optimal method (which might not be as optimal as CLR, but hey I'll take what I can get) that I can use to aggregate my stuff?

The answer is


Update: Ms SQL Server 2017+, Azure SQL Database

You can use: STRING_AGG.

Usage is pretty simple for OP's request:

SELECT id, STRING_AGG(name, ', ') AS names
FROM some_table
GROUP BY id

Read More

Well my old non-answer got rightfully deleted (left in-tact below), but if anyone happens to land here in the future, there is good news. They have implimented STRING_AGG() in Azure SQL Database as well. That should provide the exact functionality originally requested in this post with native and built in support. @hrobky mentioned this previously as a SQL Server 2016 feature at the time.

--- Old Post: Not enough reputation here to reply to @hrobky directly, but STRING_AGG looks great, however it is only available in SQL Server 2016 vNext currently. Hopefully it will follow to Azure SQL Datababse soon as well..


Are methods using FOR XML PATH like below really that slow? Itzik Ben-Gan writes that this method has good performance in his T-SQL Querying book (Mr. Ben-Gan is a trustworthy source, in my view).

create table #t (id int, name varchar(20))

insert into #t
values (1, 'Matt'), (1, 'Rocks'), (2, 'Stylus')

select  id
        ,Names = stuff((select ', ' + name as [text()]
        from #t xt
        where xt.id = t.id
        for xml path('')), 1, 2, '')
from #t t
group by id

You can use += to concatenate strings, for example:

declare @test nvarchar(max)
set @test = ''
select @test += name from names

if you select @test, it will give you all names concatenated


Although @serge answer is correct but i compared time consumption of his way against xmlpath and i found the xmlpath is so faster. I'll write the compare code and you can check it by yourself. This is @serge way:

DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @counter INT;
SET @counter = 1;

set nocount on;

declare @YourTable table (ID int, Name nvarchar(50))

WHILE @counter < 1000
BEGIN
    insert into @YourTable VALUES (ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
    SET @counter = @counter + 1;
END

SET @startTime = GETDATE()

;WITH Partitioned AS
(
    SELECT 
        ID,
        Name,
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
        COUNT(*) OVER (PARTITION BY ID) AS NameCount
    FROM @YourTable
),
Concatenated AS
(
    SELECT ID, CAST(Name AS nvarchar) AS FullName, Name, NameNumber, NameCount FROM Partitioned WHERE NameNumber = 1

    UNION ALL

    SELECT 
        P.ID, CAST(C.FullName + ', ' + P.Name AS nvarchar), P.Name, P.NameNumber, P.NameCount
    FROM Partitioned AS P
        INNER JOIN Concatenated AS C ON P.ID = C.ID AND P.NameNumber = C.NameNumber + 1
)
SELECT 
    ID,
    FullName
FROM Concatenated
WHERE NameNumber = NameCount

SET @endTime = GETDATE();

SELECT DATEDIFF(millisecond,@startTime, @endTime)
--Take about 54 milliseconds

And this is xmlpath way:

DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @counter INT;
SET @counter = 1;

set nocount on;

declare @YourTable table (RowID int, HeaderValue int, ChildValue varchar(5))

WHILE @counter < 1000
BEGIN
    insert into @YourTable VALUES (@counter, ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
    SET @counter = @counter + 1;
END

SET @startTime = GETDATE();

set nocount off
SELECT
    t1.HeaderValue
        ,STUFF(
                   (SELECT
                        ', ' + t2.ChildValue
                        FROM @YourTable t2
                        WHERE t1.HeaderValue=t2.HeaderValue
                        ORDER BY t2.ChildValue
                        FOR XML PATH(''), TYPE
                   ).value('.','varchar(max)')
                   ,1,2, ''
              ) AS ChildValues
    FROM @YourTable t1
    GROUP BY t1.HeaderValue

SET @endTime = GETDATE();

SELECT DATEDIFF(millisecond,@startTime, @endTime)
--Take about 4 milliseconds

I found Serge's answer to be very promising, but I also encountered performance issues with it as-written. However, when I restructured it to use temporary tables and not include double CTE tables, the performance went from 1 minute 40 seconds to sub-second for 1000 combined records. Here it is for anyone who needs to do this without FOR XML on older versions of SQL Server:

DECLARE @STRUCTURED_VALUES TABLE (
     ID                 INT
    ,VALUE              VARCHAR(MAX) NULL
    ,VALUENUMBER        BIGINT
    ,VALUECOUNT         INT
);

INSERT INTO @STRUCTURED_VALUES
SELECT   ID
        ,VALUE
        ,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY VALUE) AS VALUENUMBER
        ,COUNT(*) OVER (PARTITION BY ID)    AS VALUECOUNT
FROM    RAW_VALUES_TABLE;

WITH CTE AS (
    SELECT   SV.ID
            ,SV.VALUE
            ,SV.VALUENUMBER
            ,SV.VALUECOUNT
    FROM    @STRUCTURED_VALUES SV
    WHERE   VALUENUMBER = 1

    UNION ALL

    SELECT   SV.ID
            ,CTE.VALUE + ' ' + SV.VALUE AS VALUE
            ,SV.VALUENUMBER
            ,SV.VALUECOUNT
    FROM    @STRUCTURED_VALUES SV
    JOIN    CTE 
        ON  SV.ID = CTE.ID
        AND SV.VALUENUMBER = CTE.VALUENUMBER + 1

)
SELECT   ID
        ,VALUE
FROM    CTE
WHERE   VALUENUMBER = VALUECOUNT
ORDER BY ID
;

For those of us who found this and are not using Azure SQL Database:

STRING_AGG() in PostgreSQL, SQL Server 2017 and Azure SQL
https://www.postgresql.org/docs/current/static/functions-aggregate.html
https://docs.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql

GROUP_CONCAT() in MySQL
http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat

(Thanks to @Brianjorden and @milanio for Azure update)

Example Code:

select Id
, STRING_AGG(Name, ', ') Names 
from Demo
group by Id

SQL Fiddle: http://sqlfiddle.com/#!18/89251/1


Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to sql-server

Passing multiple values for same variable in stored procedure SQL permissions for roles Count the Number of Tables in a SQL Server Database Visual Studio 2017 does not have Business Intelligence Integration Services/Projects ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database How to create temp table using Create statement in SQL Server? SQL Query Where Date = Today Minus 7 Days How do I pass a list as a parameter in a stored procedure? SQL Server date format yyyymmdd

Examples related to azure

How is VIP swapping + CNAMEs better than IP swapping + A records? 'Connect-MsolService' is not recognized as the name of a cmdlet Install-Module : The term 'Install-Module' is not recognized as the name of a cmdlet Could not load file or assembly "System.Net.Http, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a" How to view the roles and permissions granted to any database user in Azure SQL server instance? How to get the azure account tenant Id? Azure SQL Database "DTU percentage" metric Could not load file or assembly System.Web.Http.WebHost after published to Azure web site How to re-create database for Entity Framework? How do I create a new user in a SQL Azure database?

Examples related to aggregate-functions

Spark SQL: apply aggregate functions to a list of columns GROUP BY without aggregate function GROUP BY + CASE statement must appear in the GROUP BY clause or be used in an aggregate function Naming returned columns in Pandas aggregate function? Concatenate multiple result rows of one column into one, group by another column How to include "zero" / "0" results in COUNT aggregate? Apply multiple functions to multiple groupby columns Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause Optimal way to concatenate/aggregate strings

Examples related to azure-sql-database

How to view the roles and permissions granted to any database user in Azure SQL server instance? Azure SQL Database "DTU percentage" metric Reset identity seed after deleting records in SQL Server How to re-create database for Entity Framework? The SELECT permission was denied on the object 'Users', database 'XXX', schema 'dbo' How do I create a new user in a SQL Azure database? How to emulate a BEFORE INSERT trigger in T-SQL / SQL Server for super/subtype (Inheritance) entities? Optimal way to concatenate/aggregate strings How do I copy SQL Azure database to my local development server?