[sql] Select statement to find duplicates on certain fields

Can you help me with SQL statements to find duplicates on multiple fields?

For example, in pseudo code:

select count(field1,field2,field3) 
from table 
where the combination of field1, field2, field3 occurs multiple times

and from the above statement if there are multiple occurrences I would like to select every record except the first one.

This question is related to sql sql-server tsql sql-server-2008

The answer is


You mention "the first one", so I assume that you have some kind of ordering on your data. Let's assume that your data is ordered by some field ID.

This SQL should get you the duplicate entries except for the first one. It basically selects all rows for which another row with (a) the same fields and (b) a lower ID exists. Performance won't be great, but it might solve your problem.

SELECT A.ID, A.field1, A.field2, A.field3
  FROM myTable A
 WHERE EXISTS (SELECT B.ID
                 FROM myTable B
                WHERE B.field1 = A.field1
                  AND B.field2 = A.field2
                  AND B.field3 = A.field3
                  AND B.ID < A.ID)

Try this query to find duplicate records on multiple fields

SELECT a.column1, a.column2
FROM dbo.a a
JOIN (SELECT column1, 
       column2, count(*) as countC
FROM dbo.a 
GROUP BY column4, column5
HAVING count(*) > 1 ) b
ON a.column1 = b.column1
AND a.column2 = b.column2

This is a fun solution with SQL Server 2005 that I like. I'm going to assume that by "for every record except for the first one", you mean that there is another "id" column that we can use to identify which row is "first".

SELECT id
    , field1
    , field2
    , field3
FROM
(
    SELECT id
        , field1
        , field2
        , field3
        , RANK() OVER (PARTITION BY field1, field2, field3 ORDER BY id ASC) AS [rank]
    FROM table_name
) a
WHERE [rank] > 1

CREATE TABLE #tmp
(
    sizeId Varchar(MAX)
)

INSERT  #tmp 
    VALUES ('44'),
        ('44,45,46'),
        ('44,45,46'),
        ('44,45,46'),
        ('44,45,46'),
        ('44,45,46'),
        ('44,45,46')


SELECT * FROM #tmp
DECLARE @SqlStr VARCHAR(MAX)

SELECT @SqlStr = STUFF((SELECT ',' + sizeId
              FROM #tmp
              ORDER BY sizeId
              FOR XML PATH('')), 1, 1, '') 


SELECT TOP 1 * FROM (
select items, count(*)AS Occurrence
  FROM dbo.Split(@SqlStr,',')
  group by items
  having count(*) > 1
  )K
  ORDER BY K.Occurrence DESC    

To see duplicate values:

with MYCTE  as (
    select row_number() over ( partition by name  order by name) rown, *
    from tmptest  
    ) 
select * from MYCTE where rown <=1

try this query to have sepratley count of each SELECT statements :

select field1,count(field1) as field1Count,field2,count(field2) as field2Counts,field3, count(field3) as field3Counts
from table_name
group by field1,field2,field3
having count(*) > 1

If you're using SQL Server 2005 or later (and the tags for your question indicate SQL Server 2008), you can use ranking functions to return the duplicate records after the first one if using joins is less desirable or impractical for some reason. The following example shows this in action, where it also works with null values in the columns examined.

create table Table1 (
 Field1 int,
 Field2 int,
 Field3 int,
 Field4 int 
)

insert  Table1 
values    (1,1,1,1)
        , (1,1,1,2)
        , (1,1,1,3)
        , (2,2,2,1)
        , (3,3,3,1)
        , (3,3,3,2)
        , (null, null, 2, 1)
        , (null, null, 2, 3)

select    *
from     (select      Field1
                    , Field2
                    , Field3
                    , Field4
                    , row_number() over (partition by   Field1
                                                      , Field2
                                                      , Field3
                                         order by       Field4) as occurrence
          from      Table1) x
where     occurrence > 1

Notice after running this example that the first record out of every "group" is excluded, and that records with null values are handled properly.

If you don't have a column available to order the records within a group, you can use the partition-by columns as the order-by columns.


Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to sql-server

Passing multiple values for same variable in stored procedure SQL permissions for roles Count the Number of Tables in a SQL Server Database Visual Studio 2017 does not have Business Intelligence Integration Services/Projects ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database How to create temp table using Create statement in SQL Server? SQL Query Where Date = Today Minus 7 Days How do I pass a list as a parameter in a stored procedure? SQL Server date format yyyymmdd

Examples related to tsql

Passing multiple values for same variable in stored procedure Count the Number of Tables in a SQL Server Database Change Date Format(DD/MM/YYYY) in SQL SELECT Statement Stored procedure with default parameters Format number as percent in MS SQL Server EXEC sp_executesql with multiple parameters SQL Server after update trigger How to compare datetime with only date in SQL Server Text was truncated or one or more characters had no match in the target code page including the primary key in an unpivot Printing integer variable and string on same line in SQL

Examples related to sql-server-2008

Violation of PRIMARY KEY constraint. Cannot insert duplicate key in object How to Use Multiple Columns in Partition By And Ensure No Duplicate Row is Returned SQL Server : How to test if a string has only digit characters Conversion of a varchar data type to a datetime data type resulted in an out-of-range value in SQL query Get last 30 day records from today date in SQL Server How to subtract 30 days from the current date using SQL Server Calculate time difference in minutes in SQL Server SQL Connection Error: System.Data.SqlClient.SqlException (0x80131904) SQL Server Service not available in service list after installation of SQL Server Management Studio How to delete large data of table in SQL without log?