How to delete duplicate rows in SQL Server

Question

How can I delete duplicate rows where no unique row id exists   My table is  col1  col2 col3 col4 col5 col6 col7 john  1    1    1    1    1    1  john  1    1    1    1    1    1 sally 2    2    2    2    2    2 sally 2    2    2    2    2    2   I want to be left with the following after the duplicate removal   john  1    1    1    1    1    1 sally 2    2    2    2    2    2   I ve tried a few queries but I think they depend on having a row id as I don t get the desired result  For example   DELETE FROM table WHERE col1 IN       SELECT id     FROM table     GROUP BY id     HAVING  COUNT col1   gt  1

User · Answer

should work equally as in other SQL servers  like Postgres  DELETE FROM table WHERE id NOT IN      select min id  from table    group by col1  col2  col3  col4  col5  col6  col7

User · Answer

Deleting duplicates from a huge several millions of records  table might take long time   I suggest that you do a bulk insert into a temp table of the selected rows rather than deleting    --REWRITING YOUR CODE TAKE NOTE OF THE 3RD LINE  WITH CTE AS SELECT NAME ROW NUMBER    OVER  PARTITION BY NAME ORDER BY NAME  ID FROM  TB  SELECT   INTO  unique records FROM  CTE WHERE ID  1

User · Answer

After trying the suggested solution above  that works for small medium tables   I can suggest that solution for very large tables  since it runs in iterations    Drop all dependency views of the LargeSourceTable you can find the dependecies by using sql managment studio  right click on the table and click  View Dependencies   Rename the table   sp rename  LargeSourceTable    LargeSourceTable Temp   GO Create the LargeSourceTable again  but now  add a primary key with all the columns that define the duplications add WITH  IGNORE DUP KEY   ON  For example     CREATE TABLE  dbo   LargeSourceTable                ID int IDENTITY 1 1            CreateDate         DATETIME CONSTRAINT  DF LargeSourceTable CreateDate  DEFAULT  getdate    NOT NULL           Column1     CHAR  36       NOT NULL           Column2   NVARCHAR  100  NOT NULL           Column3         CHAR  36       NOT NULL          PRIMARY KEY  Column1  Column2  WITH  IGNORE DUP KEY   ON             GO Create again the views that you dropped in the first place for the new created table Now  Run the following sql script  you will see the results in 1 000 000 rows per page  you can change the row number per page to see the results more often  Note  that I set the IDENTITY INSERT on and off because one the columns contains auto incremental id  which I m also copying   SET IDENTITY INSERT LargeSourceTable ON DECLARE  PageNumber AS INT   RowspPage AS INT DECLARE  TotalRows AS INT  declare  dt varchar 19  SET  PageNumber   0 SET  RowspPage   1000000  select  TotalRows   count     from LargeSourceTable TEMP  While    PageNumber - 1     RowspPage  lt   TotalRows   Begin     begin transaction tran inner           with cte as                       SELECT   FROM LargeSourceTable TEMP ORDER BY ID             OFFSET    PageNumber     RowspPage  ROWS             FETCH NEXT  RowspPage ROWS ONLY                    INSERT INTO LargeSourceTable                         ID                                    CreateDate                       Column1                   Column2                 Column3                                  select               ID                                    CreateDate                       Column1                   Column2                 Column3                 from cte      commit transaction tran inner      PRINT  Page      convert varchar 10    PageNumber      PRINT  Transfered      convert varchar 20    PageNumber    RowspPage      PRINT  Of      convert varchar 20    TotalRows       SELECT  dt   convert varchar 19   getdate    121      RAISERROR  Inserted on   s   0  1   dt  WITH NOWAIT     SET  PageNumber    PageNumber   1 End   SET IDENTITY INSERT LargeSourceTable OFF

User · Answer

This might help in your case DELETE t1 FROM table t1 INNER JOIN table t2 WHERE t1 id  gt  t2 id AND t1 col1   t2 col1

User · Answer

Without using CTE and ROW NUMBER   you can just delete the records just by using group by with MAX function here is and example  DELETE FROM MyDuplicateTable WHERE ID NOT IN   SELECT MAX ID  FROM MyDuplicateTable GROUP BY DuplicateColumn1  DuplicateColumn2  DuplicateColumn3

User · Answer

with myCTE as    select productName ROW NUMBER   over PARTITION BY productName order by slno  as Duplicate from productDetails   Delete from myCTE where Duplicate gt 1

User · Answer

If you have the ability to add a column to the table temporarily  this was a solution that worked for me   ALTER TABLE dbo DUPPEDTABLE ADD RowID INT NOT NULL IDENTITY 1 1    Then perform a DELETE using a combination of MIN and GROUP BY  DELETE b FROM dbo DUPPEDTABLE b WHERE b RowID NOT IN                        SELECT MIN RowID  AS RowID                      FROM dbo DUPPEDTABLE a WITH  NOLOCK                       GROUP BY a ITEM NUMBER                                a CHARACTERISTIC                                a INTVALUE                                a FLOATVALUE                                a STRINGVALUE                       Verify that the DELETE performed correctly   SELECT a ITEM NUMBER      a CHARACTERISTIC      a INTVALUE      a FLOATVALUE      a STRINGVALUE  COUNT   --MIN RowID  AS RowID FROM dbo DUPPEDTABLE a WITH  NOLOCK  GROUP BY a ITEM NUMBER      a CHARACTERISTIC      a INTVALUE      a FLOATVALUE      a STRINGVALUE ORDER BY COUNT    DESC    The result should have no rows with a count greater than 1   Finally  remove the rowid column   ALTER TABLE dbo DUPPEDTABLE DROP COLUMN RowID

User · Answer

-- this query will keep only one instance of a duplicate record   WITH cte      AS  SELECT ROW NUMBER   OVER  PARTITION BY col1  col2  col3-- based on what  --can be multiple columns                                        ORDER BY   SELECT 0   RN          FROM   Mytable     delete  FROM cte WHERE  RN  gt  1

User · Answer

Another way of removing dublicated rows without loosing information in one step is like following   delete from dublicated table t1  nolock  join       select t2 dublicated field       min len t2 field kept   as min field kept     from dublicated table t2  nolock      group by t2 dublicated field having COUNT    gt 1   t3  on t1 dublicated field t3 dublicated field      and len t1 field kept  t3 min field kept

User · Answer

Please see the below way of deletion too   Declare  table table  col1 varchar 10  col2 int col3 int  col4 int  col5 int  col6 int  col7 int  Insert into  table values    john  1 1 1 1 1 1     john  1 1 1 1 1 1     sally  2 2 2 2 2 2     sally  2 2 2 2 2 2    Created a sample table named  table and loaded it with given data     Delete  aliasName from   Select             ROW NUMBER   over  Partition by col1 col2 col3 col4 col5 col6 col7 order by col1  as rowNumber From     table  aliasName  Where   rowNumber  gt  1  Select   from  table     Note  If you are giving all columns in the Partition by part  then order by do not have much significance   I know  the question is asked three years ago  and my answer is another version of what Tim has posted  But posting just incase it is helpful for anyone

User · Answer

DELETE FROM TBL1  WHERE ID  IN  SELECT ID FROM TBL1  a WHERE ID    select MAX ID  from TBL1  where DUPVAL a DUPVAL  group by DUPVAL having count DUPVAL  gt 1

User · Answer

You need to group by the duplicate records according to the field s   then hold one of the records and delete the rest  For example   DELETE prg Person WHERE Id IN   SELECT dublicateRow Id FROM   select MIN Id  MinId  NationalCode  from  prg Person group by NationalCode  having count NationalCode    gt  1    GroupSelect  JOIN  prg Person dublicateRow ON dublicateRow NationalCode   GroupSelect NationalCode   WHERE dublicateRow Id  lt  gt  GroupSelect MinId

User · Answer

Oh wow  i feel so stupid by ready all this answers  they are like experts  answer with all CTE and temp table and etc    And all I did to get it working was simply aggregated the ID column by using MAX   DELETE FROM table WHERE col1 IN       SELECT MAX id  FROM table GROUP BY id HAVING   COUNT col1   gt  1       NOTE  you might need to run it multiple time to remove duplicate as this will only delete one set of duplicate rows at a time

User · Answer

With reference to https   support microsoft com en-us help 139444 how-to-remove-duplicate-rows-from-a-table-in-sql-server  The idea of removing duplicate involves    a  Protecting those rows that are not duplicate b  Retain one of the many rows that qualified together as duplicate    Step-by-step   1  First identify the rows those satisfy the definition of duplicate and insert them into temp table  say  tableAll   2  Select non-duplicate single-rows  or distinct rows into temp table say  tableUnique  3  Delete from source table joining  tableAll to delete the duplicates  4  Insert into source table all the rows from  tableUnique  5  Drop  tableAll and  tableUnique

User · Answer

Try to Use    SELECT linkorder      Row Number   OVER           PARTITION BY linkorder ORDER BY linkorder DESC           AS RowNum FROM u links

User · Answer

DECLARE  TB TABLE NAME VARCHAR 100    INSERT INTO  TB VALUES   Red     Red     Green     Blue     White     White   --  Delete by Rank    WITH CTE AS SELECT NAME DENSE RANK   OVER  PARTITION BY NAME ORDER BY NEWID    ID FROM  TB  DELETE FROM CTE WHERE ID gt 1 SELECT NAME FROM  TB  --  Delete by Row Number     WITH CTE AS SELECT NAME ROW NUMBER   OVER  PARTITION BY NAME ORDER BY NAME  ID FROM  TB  DELETE FROM CTE WHERE ID gt 1  SELECT NAME FROM  TB

User · Answer

I would prefer CTE for deleting duplicate rows from sql server table  strongly recommend to follow this article   http   codaffection com sql-server-article delete-duplicate-rows-in-sql-server      by keeping original   WITH CTE AS   SELECT   ROW NUMBER   OVER  PARTITION BY col1 col2 col3 ORDER BY col1 col2 col3  AS RN FROM MyTable    DELETE FROM CTE WHERE RN lt  gt 1      without keeping original   WITH CTE AS  SELECT   R RANK   OVER  ORDER BY col1 col2 col3  FROM MyTable     DELETE CTE WHERE R IN  SELECT R FROM CTE GROUP BY R HAVING COUNT    gt 1

User · Answer

Microsoft has a vey ry neat guide on how to remove duplicates  Check out  http   support microsoft com kb 139444  In brief  here is the easiest way to delete duplicates when you have just a few rows to delete   SET rowcount 1  DELETE FROM t1 WHERE myprimarykey 1    myprimarykey is the identifier for the row   I set rowcount to 1 because I only had two rows that were duplicated  If I had had 3 rows duplicated then I would have set rowcount to 2 so that it deletes the first two that it sees and only leaves one in table t1   Hope it helps anyone

User · Answer

If you have no references  like foreign keys  you can do this   I do it a lot when testing proofs of concept and the test data gets duplicated  SELECT DISTINCT  col1   col2   col3   col4   col5   col6   col7   INTO  newTable   FROM  oldTable   Go into the object explorer and delete the old table  Rename the new table with the old table s name

User · Answer

There are two solutions in mysql   A  Delete duplicate rows using DELETE JOIN statement  DELETE t1 FROM contacts t1 INNER JOIN contacts t2  WHERE      t1 id  lt  t2 id AND      t1 email   t2 email    This query references the contacts table twice  therefore  it uses the table alias t1 and t2   The output is      1   Query OK  4 rows affected  0 10 sec    In case you want to delete duplicate rows and keep the lowest id  you can use the following statement   DELETE c1 FROM contacts c1 INNER JOIN contacts c2  WHERE     c1 id  gt  c2 id AND      c1 email   c2 email     nbsp   nbsp   B  Delete duplicate rows using an intermediate table  The following shows the steps for removing duplicate rows using an intermediate table    nbsp   nbsp  1  Create a new table with the structure the same as the original table that you want to delete duplicate rows    nbsp   nbsp  2  Insert distinct rows from the original table to the immediate table    nbsp   nbsp  3  Insert distinct rows from the original table to the immediate table    nbsp   Step 1  Create a new table whose structure is the same as the original table   CREATE TABLE source copy LIKE source    Step 2  Insert distinct rows from the original table to the new table   INSERT INTO source copy SELECT   FROM source GROUP BY col  -- column that has duplicate values   Step 3  drop the original table and rename the immediate table to the original one  DROP TABLE source  ALTER TABLE source copy RENAME TO source    Source  http   www mysqltutorial org mysql-delete-duplicate-rows

User · Answer

I like CTEs and ROW NUMBER as the two combined allow us to see which rows are deleted  or updated   therefore just change the DELETE FROM CTE    to SELECT   FROM CTE   WITH CTE AS     SELECT  col1    col2    col3    col4    col5    col6    col7          RN   ROW NUMBER  OVER PARTITION BY col1 ORDER BY col1     FROM dbo Table1   DELETE FROM CTE WHERE RN  gt  1   DEMO   result is different  I assume that it s due to a typo on your part   COL1    COL2    COL3    COL4    COL5    COL6    COL7 john    1        1       1       1       1       1 sally   2        2       2       2       2       2   This example determines duplicates by a single column col1 because of the PARTITION BY col1  If you want to include multiple columns simply add them to the PARTITION BY   ROW NUMBER  OVER PARTITION BY Col1  Col2      ORDER BY OrderColumn

User · Answer

DELETE from search where id not in      select min id  from search    group by url    having count    1     union     SELECT min id  FROM search    group by url    having count     gt  1

User · Answer

It can be done by many ways in sql server the most simplest way to do so is  Insert the distinct rows from the duplicate rows table to new temporary table  Then delete all the data from duplicate rows  table then insert all data from temporary table which has no duplicates as shown below   select distinct   into  tmp From table    delete from table    insert into table    select   from  tmp drop table  tmp     select   from table   Delete duplicate rows using Common Table Expression CTE   With CTE Duplicates as   select id name   row number    over partition by id name order by id name   rownumber  from table     delete from CTE Duplicates where rownumber  1

[sql] How to delete duplicate rows in SQL Server?

Examples related to sql

Examples related to sql-server-2008

Examples related to duplicates

Examples related to sql-delete