T-SQL Deleting all duplicate rows but keeping one

Question

I have a table with a very large amount of rows  Duplicates are not allowed but due to a problem with how the rows were created I know there are some duplicates in this table  I need to eliminate the extra rows from the perspective of the key columns  Some other columns may have slightly different data but I do not care about that  I still need to keep one of these rows however  SELECT DISTINCT won t work because it operates on all columns and I need to suppress duplicates based on the key columns   How can I delete the extra rows but still keep one efficiently

User · Accepted Answer

You didn t say what version you were using  but in SQL 2005 and above  you can use a common table expression with the OVER Clause   It goes a little something like this   WITH cte AS     SELECT foo    bar         row number   OVER PARTITION BY foo  bar ORDER BY baz  AS  rn    FROM TABLE   DELETE cte WHERE  rn   gt  1   Play around with it and see what you get    Edit  In an attempt to be helpful  someone edited the ORDER BY clause within the CTE  To be clear  you can order by anything you want here  it needn t be one of the columns returned by the cte  In fact  a common use-case here is that  foo  bar  are the group identifier and  baz  is some sort of time stamp  In order to keep the latest  you d do ORDER BY baz desc

User · Answer

Example query   DELETE FROM Table WHERE ID NOT IN   SELECT MIN ID  FROM Table GROUP BY Field1  Field2  Field3          Here fields are column on which you want to group the duplicate rows

User · Answer

Here s my twist on it  with a runnable example  Note this will only work in the situation where Id is unique  and you have duplicate values in other columns   DECLARE  SampleData AS TABLE  Id int  Duplicate varchar 20    INSERT INTO  SampleData SELECT 1   ABC  UNION ALL SELECT 2   ABC  UNION ALL SELECT 3   LMN  UNION ALL SELECT 4   XYZ  UNION ALL SELECT 5   XYZ   DELETE FROM  SampleData WHERE Id IN       SELECT Id FROM           SELECT              Id              ROW NUMBER   OVER  PARTITION BY  Duplicate  ORDER BY Id  AS  ItemNumber              -- Change the partition columns to include the ones that make the row distinct         FROM               SampleData       a WHERE ItemNumber  gt  1 -- Keep only the first unique item    SELECT   FROM  SampleData   And the results   Id          Duplicate ----------- --------- 1           ABC 3           LMN 4           XYZ   Not sure why that s what I thought of first    definitely not the simplest way to go but it works

[sql] T-SQL: Deleting all duplicate rows but keeping one

Examples related to sql

Examples related to sql-server

Examples related to tsql