[sql] Find duplicate records in a table using SQL Server

I am validating a table which has a transaction level data of an eCommerce site and find the exact errors.

I want your help to find duplicate records in a 50 column table on SQL Server.

Suppose my data is:

OrderNo shoppername amountpayed city Item       
1       Sam         10          A    Iphone
1       Sam         10          A    Iphone--->>Duplication to be detected
1       Sam         5           A    Ipod
2       John        20          B    Macbook
3       John        25          B    Macbookair
4       Jack        5           A    Ipod

Suppose I use the below query:

Select shoppername,count(*) as cnt
from dbo.sales
having count(*) > 1
group by shoppername

will return me

Sam  2
John 2

But I don't want to find duplicate just over 1 or 2 columns. I want to find the duplicate over all the columns together in my data. I want the result as:

1       Sam         10          A    Iphone

This question is related to sql sql-server sql-server-2005

The answer is


Try this instead

SELECT MAX(shoppername), COUNT(*) AS cnt
FROM dbo.sales
GROUP BY CHECKSUM(*)
HAVING COUNT(*) > 1

Read about the CHECKSUM function first, as there can be duplicates.


SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as cnt
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1

Try this

with T1 AS
(
SELECT LASTNAME, COUNT(1) AS 'COUNT' FROM Employees GROUP BY LastName HAVING  COUNT(1) > 1
)
SELECT E.*,T1.[COUNT] FROM Employees E INNER JOIN T1 ON T1.LastName = E.LastName

Select EventID,count() as cnt from dbo.EventInstances group by EventID having count() > 1


The following is running code:

SELECT abnno, COUNT(abnno)
FROM tbl_Name
GROUP BY abnno
HAVING ( COUNT(abnno) > 1 )

with x as   (select  *,rn = row_number()
            over(PARTITION BY OrderNo,item  order by OrderNo)
            from    #temp1)

select * from x
where rn > 1

you can remove duplicates by replacing select statement by

delete x where rn > 1

First of all, I doubt that the result it not accurate? Seem like there are Three 'Sam' from the original table. But it is not critical to the question.

Then here we come for the question itself. Based on your table, the best way to show duplicate value is to use count(*) and Group by clause. The query would look like this

SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as RepeatTimes FROM dbo.sales GROUP BY OrderNo, shoppername, amountPayed, city, item HAVING COUNT(*) > 1

The reason is that all columns together from your table uniquely identified each record, which means the records will be considered as duplicate only when all values from each column are exactly the same, also you want to show all fields for duplicate records, so the group by will not miss any column, otherwise yes because you can only select columns that participate in the 'group by' clause.

Now I would like to give you any example for With...Row_Number()Over(...), which is using table expression together with Row_Number function.

Suppose you have a nearly same table but with one extra column called Shipping Date, and the value may change even the rest are the same. Here it is:

OrderNo shoppername amountpayed city Item Shipping Date
1 Sam 10 A Iphone 2016-01-01 1 Sam 10 A Iphone 2016-02-02 1 Sam 5 A Ipod 2016-03-03 2 John 20 B Macbook 2016-04-04 3 John 25 B Macbookair 2016-05-05 4 Jack 5 A Ipod 2016-06-06

Notice that row# 2 is not a duplicate one if you still take all columns as a unit. But what if you want to treat them as duplicate as well in this case? You should use With...Row_Number()Over(...), and the query would look like this:

WITH TABLEEXPRESSION AS (SELECT *,ROW_NUMBER() OVER (PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date] as Identifier) --if you consider the one with late shipping date as the duplicate FROM dbo.sales) SELECT * FROM TABLEEXPRESSION WHERE Identifier !=1 --or use '>1'

The above query will give result together with Shipping Date, for example:

OrderNo shoppername amountpayed city Item Shipping Date Identifier 1 Sam 10 A Iphone 2016-02-02 2

Note this one is different from the one with 2016-01-01, and the reason why 2016-02-02 has been filtered out is PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date] as Identifier, and Shipping Date is NOT one of the column that need to be took care of for duplicate records, which means the one with 2016-02-02 still could be a perfect result for your question.

Now summarize it little bit, using count(*) and Group by clause together is the best choice when you only want to show all columns from Group byclause as the result, otherwise you will miss the columns that do not participate in group by.

While For With...Row_Number()Over(...), it is suitable in every scenario that you want to find duplicate records, however, it is little bit complicated to write the query and little bit over engineered compared to the former one.

If your purpose is to delete duplicate records from table, you have to use the later WITH...ROW_NUMBER()OVER(...)...DELETE FROM...WHERE one.

Hope this helps!


You can use below methods to find the output

 with Ctec AS
 (
select *,Row_number() over(partition by name order by Name)Rnk
 from Table_A
)
select  Name from ctec
where rnk>1

select name from Table_A
 group by name
 having count(*)>1

SQL> SELECT JOB,COUNT(JOB) FROM EMP GROUP BY JOB;

JOB       COUNT(JOB)
--------- ----------
ANALYST            2
CLERK              4
MANAGER            3
PRESIDENT          1
SALESMAN           4

Select * from dbo.sales group by shoppername having(count(Item) > 1)


with x as (
select shoppername,count(shoppername)
              from sales
              having count(shoppername)>1
            group by shoppername)
select t.* from x,win_gp_pin1510 t
where x.shoppername=t.shoppername
order by t.shoppername

Just add all fields to the query and remember to add them to Group By as well.

Select shoppername, a, b, amountpayed, item, count(*) as cnt
from dbo.sales
group by shoppername, a, b, amountpayed, item
having count(*) > 1

To get the list of multiple records use following command

select field1,field2,field3, count(*)
  from table_name
  group by field1,field2,field3
  having count(*) > 1

Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to sql-server

Passing multiple values for same variable in stored procedure SQL permissions for roles Count the Number of Tables in a SQL Server Database Visual Studio 2017 does not have Business Intelligence Integration Services/Projects ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database How to create temp table using Create statement in SQL Server? SQL Query Where Date = Today Minus 7 Days How do I pass a list as a parameter in a stored procedure? SQL Server date format yyyymmdd

Examples related to sql-server-2005

Add a row number to result set of a SQL query SQL Server : Transpose rows to columns Select info from table where row has max date How to query for Xml values and attributes from table in SQL Server? How to restore SQL Server 2014 backup in SQL Server 2008 SQL Server 2005 Using CHARINDEX() To split a string Is it necessary to use # for creating temp tables in SQL server? SQL Query to find the last day of the month JDBC connection to MSSQL server in windows authentication mode How to convert the system date format to dd/mm/yy in SQL Server 2008 R2?