How to delete duplicates on a MySQL table?

Question

I need to DELETE duplicated rows for a specified sid on a MySQL table. How can I do this with an SQL query?

    DELETE (DUPLICATED TITLES) FROM table WHERE SID = "1"

Something like this, but I don't know how to do it.

User · Answer

This keeps the row with the highest id for each url and deletes the others:

    delete p from
    product p
    inner join (
        select max(id) as id, url from product
        group by url
        having count(*) > 1
    ) unik on unik.url = p.url and unik.id != p.id;

User · Answer

This procedure will remove all duplicates (including multiples) in a table, keeping the last duplicate. This is an extension of Retrieving last record in each group. Hope this is useful to someone.

    DROP TABLE IF EXISTS UniqueIDs;
    CREATE Temporary table UniqueIDs (id Int(11));

    INSERT INTO UniqueIDs
        (SELECT T1.ID FROM Table T1 LEFT JOIN Table T2 ON
        (T1.Field1 = T2.Field1 AND T1.Field2 = T2.Field2 -- comparison fields
        AND T1.ID < T2.ID)
        WHERE T2.ID IS NULL);

    DELETE FROM Table WHERE id NOT IN (SELECT ID FROM UniqueIDs);

Replace Table, Field1 and Field2 with your own table and comparison columns.

User · Answer

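    DELETE T2
    FROM   table_name T1
    JOIN   table_name T2 ON (T1.title = T2.title AND T1.ID < T2.ID);

Both aliases refer to the same table. Using < keeps the row with the lowest ID in each group; with <> instead, every row that has a duplicate matches as T2, so all copies would be deleted.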

User · Answer

After running into this issue myself on a huge database, I wasn't completely impressed with the performance of any of the other answers. I want to keep only the latest duplicate row and delete the rest.

In a one-query statement, without a temp table, this worked best for me:

    DELETE e.*
    FROM employee e
    WHERE id IN
     (SELECT id
       FROM (SELECT MIN(id) AS id
              FROM employee e2
              GROUP BY first_name, last_name
              HAVING COUNT(*) > 1) x);

The extra derived table x is what lets MySQL read from the table it is deleting from. The only caveat is that I have to run the query multiple times, because each run deletes only the lowest id in each duplicate group (a name with n copies needs n - 1 runs), but even with that, I found it worked better for me than the other options.

User · Answer

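The following works for any table, but it only removes rows that are identical in every column (so it won't help if each row has its own auto-increment id):

    CREATE TABLE `noDup` LIKE `Dup`;
    INSERT `noDup` SELECT DISTINCT * FROM `Dup`;
    DROP TABLE `Dup`;
    ALTER TABLE `noDup` RENAME `Dup`;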

User · Answer

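This works for large tables:

    CREATE Temporary table duplicates AS
        select max(id) as id, url from links group by url having count(*) > 1;

    DELETE l from links l
        inner join duplicates ld on ld.id = l.id
        WHERE ld.id IS NOT NULL;

To delete the oldest instead, change max(id) to min(id). Each run removes one row per duplicated url, so repeat it (dropping the temporary table first) if a url appears more than twice.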

User · Answer

Here is how I usually eliminate duplicates:

1. Add a temporary column; name it whatever you want (I'll refer to it as active).
2. Group by the fields that you think shouldn't be duplicated and set their active to 1. Grouping will select only one of the duplicate values, not the duplicates, for those columns.
3. Delete the rows with active = 0.
4. Drop the column active.
5. Optionally, if it fits your purposes, add a unique index on those columns so you don't get duplicates again.
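The steps above can be sketched in MySQL as follows. This is a minimal sketch, assuming a hypothetical demo_posts table with an id primary key and (sid, title) as the columns that must not repeat; active comes from the answer, while keep_rows and uq_sid_title are made-up names. MySQL's UPDATE has no GROUP BY, so step 2 joins to a grouped derived table that picks the lowest id of each group instead.

```sql
-- Hypothetical demo table: id is the primary key, (sid, title) defines a duplicate.
CREATE TABLE demo_posts (
  id    INT AUTO_INCREMENT PRIMARY KEY,
  sid   INT NOT NULL,
  title VARCHAR(100) NOT NULL
);
INSERT INTO demo_posts (sid, title) VALUES
  (1, 'alpha'), (1, 'alpha'), (1, 'beta'), (2, 'alpha'), (1, 'alpha');

-- 1. Add the temporary marker column (0 = not yet kept).
ALTER TABLE demo_posts ADD COLUMN active TINYINT NOT NULL DEFAULT 0;

-- 2. Mark one row (the lowest id) per (sid, title) group. A derived table
--    with GROUP BY is materialized, so MySQL allows reading the table
--    that is being updated.
UPDATE demo_posts p
JOIN (SELECT MIN(id) AS id FROM demo_posts GROUP BY sid, title) keep_rows
  ON keep_rows.id = p.id
SET p.active = 1;

-- 3. Delete the unmarked rows, then 4. drop the marker column.
DELETE FROM demo_posts WHERE active = 0;
ALTER TABLE demo_posts DROP COLUMN active;

-- 5. Optional: reject future duplicates.
ALTER TABLE demo_posts ADD UNIQUE INDEX uq_sid_title (sid, title);
```

Picking MIN(id) keeps the oldest row; MAX(id) would keep the newest.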

User · Answer

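    delete from `table` where `table`.`SID` in (
        select SID from (
            select distinct t.SID from `table` t join `table` t1 on t.title = t1.title
            where t.SID > t1.SID
        ) x
    );

MySQL refuses an IN subquery that reads directly from the table being deleted from (error 1093), so the self-join is wrapped in the derived table x; the DISTINCT keeps MySQL from merging it back into the outer query.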

User · Answer

This always seems to work for me:

    CREATE TABLE NoDupeTable LIKE DupeTable;
    INSERT NoDupeTable SELECT * FROM DupeTable GROUP BY CommonField1, CommonFieldN;

This keeps the lowest ID of each of the dupes and the rest of the non-dupe records. (Strictly, MySQL doesn't guarantee which row of each group SELECT * ... GROUP BY returns, and since MySQL 5.7.5 the default ONLY_FULL_GROUP_BY SQL mode rejects this query.)

I've also taken to doing the following so that the dupe issue no longer occurs after the removal:

    CREATE TABLE NoDupeTable LIKE DupeTable;
    ALTER TABLE NoDupeTable ADD UNIQUE `Unique` (CommonField1, CommonField2);
    INSERT IGNORE NoDupeTable SELECT * FROM DupeTable;

In other words, I create a duplicate of the first table, add a unique index on the fields I don't want duplicates of, and then do an INSERT IGNORE. Unlike a normal INSERT, which would fail the first time it tried to add a duplicate record based on the two fields, INSERT IGNORE simply skips any such records.

Moving forward, it becomes impossible to create any duplicate records based on those two fields.
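A variant that makes "keep the lowest ID" explicit, so the result doesn't depend on which row MySQL happens to return for a group. This is a minimal sketch assuming hypothetical dupe_demo and nodupe_demo tables (standing in for DupeTable and NoDupeTable) with an id primary key: a derived table picks MIN(id) per group, and only those rows are copied.

```sql
-- Hypothetical demo table; id is assumed to be the primary key.
CREATE TABLE dupe_demo (
  id           INT AUTO_INCREMENT PRIMARY KEY,
  CommonField1 VARCHAR(50) NOT NULL,
  CommonField2 VARCHAR(50) NOT NULL
);
INSERT INTO dupe_demo (CommonField1, CommonField2) VALUES
  ('a', 'x'), ('a', 'x'), ('b', 'y'), ('a', 'x');

CREATE TABLE nodupe_demo LIKE dupe_demo;

-- Only MIN(id) is selected next to GROUP BY, so this is valid under
-- ONLY_FULL_GROUP_BY, and the kept row is always the lowest id.
INSERT INTO nodupe_demo
SELECT d.*
FROM dupe_demo d
JOIN (SELECT MIN(id) AS id
      FROM dupe_demo
      GROUP BY CommonField1, CommonField2) keep_rows
  ON keep_rows.id = d.id;
```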

User · Answer

Deleting duplicates on MySQL tables is a common issue that usually comes with specific needs. In case anyone is interested, here (Remove duplicate rows in MySQL) I explain how to use a temporary table to delete MySQL duplicates in a reliable and fast way that also works for big data sources, with examples for different use cases.

Ali, in your case, you can run something like this:

    -- create a new temporary table
    CREATE TABLE tmp_table1 LIKE table1;

    -- add a unique constraint
    ALTER TABLE tmp_table1 ADD UNIQUE(sid, title);

    -- scan over the table to insert entries
    INSERT IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY sid;

    -- rename tables
    RENAME TABLE table1 TO backup_table1, tmp_table1 TO table1;

User · Answer

This works for me to remove old records:

    delete from `table` where id in
      (select min(e.id)
         from (select * from `table`) e
         group by column1, column2
         having count(*) > 1
      );

You can replace min(e.id) with max(e.id) to remove the newest records.

User · Answer

Could it work if you count them and then add a limit to your delete query, leaving just one?

For example, if you have two copies, write your query like this (with n copies, use LIMIT n - 1):

    DELETE FROM `table` WHERE SID = 1 LIMIT 1;

User · Answer

I find Werner's solution above to be the most convenient because it works regardless of the presence of a primary key, doesn't mess with tables, uses future-proof plain SQL, and is very understandable.

As I stated in my comment, that solution hasn't been properly explained, though. So this is mine, based on it.

1. Add a new boolean column:

    alter table mytable add tokeep boolean;

2. Add a constraint on the duplicated columns AND the new column:

    alter table mytable add constraint preventdupe unique (mycol1, mycol2, tokeep);

3. Set the boolean column to true. This will succeed only on one of the duplicated rows because of the new constraint (the other rows keep NULL, and NULLs never collide in a unique index):

    update ignore mytable set tokeep = true;

4. Delete the rows that have not been marked as tokeep:

    delete from mytable where tokeep is null;

5. Drop the added column:

    alter table mytable drop tokeep;

I suggest that you keep the constraint you added, so that new duplicates are prevented in the future (dropping tokeep removes it from the index, leaving it unique on mycol1, mycol2).

User · Answer

You could just use a DISTINCT clause to select the "cleaned up" list, and here is a very easy example of how to do that.
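A minimal sketch of the DISTINCT approach, assuming a hypothetical demo_titles table with the question's SID and title columns. Keep in mind that DISTINCT only filters the query result; the duplicate rows stay in the table unless you copy the result somewhere.

```sql
-- Hypothetical demo table with one exact duplicate pair.
CREATE TABLE demo_titles (sid INT NOT NULL, title VARCHAR(100) NOT NULL);
INSERT INTO demo_titles VALUES
  (1, 'alpha'), (1, 'alpha'), (1, 'beta'), (2, 'alpha');

-- Returns each (sid, title) pair once: (1, alpha), (1, beta), (2, alpha).
-- The table itself still holds all 4 rows.
SELECT DISTINCT sid, title FROM demo_titles ORDER BY sid, title;
```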

User · Answer

Love @eric's answer, but it doesn't seem to work if you have a really big table. I'm getting "The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay" when I try to run it. So I limited the join query to only consider the duplicate rows, and I ended up with:

    DELETE a FROM penguins a
        LEFT JOIN (SELECT COUNT(baz) AS num, MIN(baz) AS keepBaz, foo
            FROM penguins
            GROUP BY foo HAVING num > 1) b
            ON a.baz != b.keepBaz
            AND a.foo = b.foo
        WHERE b.foo IS NOT NULL;

The WHERE clause in this case allows MySQL to ignore any row that doesn't have a duplicate, and the join condition also skips the first instance of each duplicate, so only the subsequent duplicates are deleted.

Change MIN(baz) to MAX(baz) to keep the last instance instead of the first.

User · Answer

There are just a few basic steps when removing duplicate data from your table:

1. Back up your table.
2. Find the duplicate rows.
3. Remove the duplicate rows.

Here is the full tutorial: https://blog.teamsql.io/deleting-duplicate-data-3541485b3473
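The three steps can be sketched in MySQL as follows. This is a minimal example, not the tutorial's own code: it assumes a hypothetical demo_contacts table where a duplicate means a repeated email, and the delete keeps the lowest id per email.

```sql
-- Hypothetical demo table.
CREATE TABLE demo_contacts (
  id    INT AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(100) NOT NULL
);
INSERT INTO demo_contacts (email) VALUES
  ('a@example.com'), ('b@example.com'), ('a@example.com'), ('a@example.com');

-- 1. Back up the table: copy the structure, then the rows.
CREATE TABLE demo_contacts_backup LIKE demo_contacts;
INSERT INTO demo_contacts_backup SELECT * FROM demo_contacts;

-- 2. Find the duplicates: each repeated email and its number of copies.
SELECT email, COUNT(*) AS copies
FROM demo_contacts
GROUP BY email
HAVING COUNT(*) > 1;

-- 3. Remove them: delete every row of a duplicated email except its lowest id.
DELETE c FROM demo_contacts c
JOIN (SELECT email, MIN(id) AS keep_id
      FROM demo_contacts
      GROUP BY email
      HAVING COUNT(*) > 1) dup
  ON c.email = dup.email AND c.id <> dup.keep_id;
```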

User · Answer

I think this will work by basically copying the table and emptying it, then putting only the distinct values back into it, but please double-check it before doing it on large amounts of data.

    -- Creates a carbon copy of your table
    create table temp_table like oldtablename;
    insert temp_table select * from oldtablename;

    -- Empties your original table
    DELETE FROM oldtablename;

    -- Copies all distinct values from the copied table back to your original table
    INSERT oldtablename SELECT * FROM temp_table GROUP BY firstname, lastname, dob;

    -- Deletes your temp table
    DROP TABLE temp_table;

You need to group by all the fields that you want to keep distinct.

User · Answer

Deleting duplicate rows in MySQL in-place (assuming you have a timestamp column to sort by), walkthrough:

Create the table and insert some rows:

    create table penguins(foo int, bar varchar(15), baz datetime);
    insert into penguins values(1, 'skipper', now());
    insert into penguins values(1, 'skipper', now());
    insert into penguins values(3, 'kowalski', now());
    insert into penguins values(3, 'kowalski', now());
    insert into penguins values(3, 'kowalski', now());
    insert into penguins values(4, 'rico', now());
    select * from penguins;
    +------+----------+---------------------+
    | foo  | bar      | baz                 |
    +------+----------+---------------------+
    |    1 | skipper  | 2014-08-25 14:21:54 |
    |    1 | skipper  | 2014-08-25 14:21:59 |
    |    3 | kowalski | 2014-08-25 14:22:09 |
    |    3 | kowalski | 2014-08-25 14:22:13 |
    |    3 | kowalski | 2014-08-25 14:22:15 |
    |    4 | rico     | 2014-08-25 14:22:22 |
    +------+----------+---------------------+
    6 rows in set (0.00 sec)

Remove the duplicates in place:

    delete a
        from penguins a
        left join(
        select max(baz) maxtimestamp, foo, bar
        from penguins
        group by foo, bar) b
        on a.baz = maxtimestamp and
        a.foo = b.foo and
        a.bar = b.bar
        where b.maxtimestamp IS NULL;
    Query OK, 3 rows affected (0.01 sec)

    select * from penguins;
    +------+----------+---------------------+
    | foo  | bar      | baz                 |
    +------+----------+---------------------+
    |    1 | skipper  | 2014-08-25 14:21:59 |
    |    3 | kowalski | 2014-08-25 14:22:15 |
    |    4 | rico     | 2014-08-25 14:22:22 |
    +------+----------+---------------------+
    3 rows in set (0.00 sec)

You're done. The duplicate rows are removed, and the last one by timestamp is kept.

For those of you without a timestamp or unique column:

You don't have a timestamp or a unique index column to sort by? You're living in a state of degeneracy. You'll have to do additional steps to delete duplicate rows.

Create the penguins table and add some rows:

    create table penguins(foo int, bar varchar(15));
    insert into penguins values(1, 'skipper');
    insert into penguins values(1, 'skipper');
    insert into penguins values(3, 'kowalski');
    insert into penguins values(3, 'kowalski');
    insert into penguins values(3, 'kowalski');
    insert into penguins values(4, 'rico');
    select * from penguins;
    +------+----------+
    | foo  | bar      |
    +------+----------+
    |    1 | skipper  |
    |    1 | skipper  |
    |    3 | kowalski |
    |    3 | kowalski |
    |    3 | kowalski |
    |    4 | rico     |
    +------+----------+

Make a clone of the first table and copy into it:

    drop table if exists penguins_copy;
    create table penguins_copy as ( SELECT foo, bar FROM penguins );

Add an auto-incrementing primary key:

    ALTER TABLE penguins_copy ADD moo int AUTO_INCREMENT PRIMARY KEY first;

    select * from penguins_copy;
    +-----+------+----------+
    | moo | foo  | bar      |
    +-----+------+----------+
    |   1 |    1 | skipper  |
    |   2 |    1 | skipper  |
    |   3 |    3 | kowalski |
    |   4 |    3 | kowalski |
    |   5 |    3 | kowalski |
    |   6 |    4 | rico     |
    +-----+------+----------+

The max aggregate operates upon the new moo index:

    delete a from penguins_copy a left join(
        select max(moo) myindex, foo, bar
        from penguins_copy
        group by foo, bar) b
        on a.moo = b.myindex and
        a.foo = b.foo and
        a.bar = b.bar
        where b.myindex IS NULL;

Drop the extra column on the copied table:

    alter table penguins_copy drop moo;
    select * from penguins_copy;

Drop the first table and put the copy table back:

    drop table penguins;
    create table penguins select * from penguins_copy;

Observe and clean up:

    drop table penguins_copy;
    select * from penguins;
    +------+----------+
    | foo  | bar      |
    +------+----------+
    |    1 | skipper  |
    |    3 | kowalski |
    |    4 | rico     |
    +------+----------+
    Elapsed: 1458.359 milliseconds

What's that big SQL delete statement doing?

Table penguins with alias 'a' is left joined on a subset of table penguins called alias 'b'. The right-hand table 'b', which is a subset, finds the max timestamp (or max moo) grouped by columns foo and bar. This is matched to the left-hand table 'a'. (foo, bar, baz) on the left has every row in the table. The right-hand subset 'b' has a (maxtimestamp, foo, bar), which is matched to the left only on the row that IS the max. Every row that is not that max gets a maxtimestamp of NULL. Filter down on those NULL rows and you have the set of all rows, grouped by foo and bar, that aren't the latest timestamp baz. Delete those.

Make a backup of the table before you run this.

Prevent this problem from ever happening again on this table:

If you got this to work and it put out your "duplicate row" fire, great. Now define a new composite unique key on your table (on those two columns) to prevent more duplicates from being added in the first place.

Like a good immune system, the bad rows shouldn't even be allowed into the table at the time of insert. Later on, all those programs adding duplicates will broadcast their protest, and when you fix them, this issue never comes up again.
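The closing advice can be sketched as follows. This is a minimal example on a hypothetical penguins_demo table (a stand-in so running it doesn't touch a real penguins table) that is already free of duplicates; uq_foo_bar is an arbitrary index name.

```sql
-- Hypothetical stand-in for the de-duplicated penguins table.
CREATE TABLE penguins_demo (foo INT, bar VARCHAR(15), baz DATETIME);
INSERT INTO penguins_demo VALUES
  (1, 'skipper', NOW()), (3, 'kowalski', NOW()), (4, 'rico', NOW());

-- Composite unique key on the two columns that define a duplicate.
ALTER TABLE penguins_demo ADD UNIQUE KEY uq_foo_bar (foo, bar);

-- From now on, a plain INSERT of an existing (foo, bar) pair fails with
-- ERROR 1062 (duplicate entry), e.g.:
--   INSERT INTO penguins_demo VALUES (1, 'skipper', NOW());
-- A writer that should quietly skip duplicates can use INSERT IGNORE,
-- which turns that error into a warning and inserts nothing:
INSERT IGNORE INTO penguins_demo VALUES (1, 'skipper', NOW());
```

Note that a unique key allows several rows whose foo or bar is NULL, so declare those columns NOT NULL if NULLs should count as duplicates too.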

User · Answer

If you want to keep the row with the lowest id value:

    DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id > n2.id AND n1.email = n2.email;

If you want to keep the row with the highest id value:

    DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id < n2.id AND n1.email = n2.email;

User · Answer

The following removes duplicates for all SIDs, not only a single one.

With a temp table:

    CREATE TABLE table_temp AS
    SELECT * FROM `table` GROUP BY title, SID;

    DROP TABLE `table`;
    RENAME TABLE table_temp TO `table`;

Since table_temp is freshly created, it has no indexes. You'll need to recreate them after removing duplicates. You can check which indexes the table has with SHOW INDEXES IN `table`.

Without a temp table:

    DELETE FROM `table` WHERE id IN (
      SELECT all_duplicates.id FROM (
        SELECT id FROM `table` WHERE (`title`, `SID`) IN (
          SELECT `title`, `SID` FROM `table` GROUP BY `title`, `SID` having count(*) > 1
        )
      ) AS all_duplicates
      LEFT JOIN (
        SELECT id FROM `table` GROUP BY `title`, `SID` having count(*) > 1
      ) AS grouped_duplicates
      ON all_duplicates.id = grouped_duplicates.id
      WHERE grouped_duplicates.id IS NULL
    );

User · Answer

Here is a simple answer:

    delete a from target_table a left JOIN (select max(id_field) as id, field_being_repeated
        from target_table GROUP BY field_being_repeated) b
        on a.field_being_repeated = b.field_being_repeated
          and a.id_field = b.id
        where b.id is null;

The subquery renames max(id_field) to id, so the join and the WHERE clause must refer to b.id.

User · Answer

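This removes duplicates in place, without making a new table:

    ALTER IGNORE TABLE `table_name` ADD UNIQUE (title, SID);

Note: this only works well if the index fits in memory. Also, ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only runs on MySQL 5.6 and earlier.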
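On servers where ALTER IGNORE TABLE is unavailable, a minimal sketch of the same in-place result is to delete the duplicates first and then add the unique key with a plain ALTER TABLE. It assumes a hypothetical demo_items table with an id primary key standing in for `table_name`; uq_title_sid is an arbitrary index name.

```sql
-- Hypothetical demo table standing in for `table_name`.
CREATE TABLE demo_items (
  id    INT AUTO_INCREMENT PRIMARY KEY,
  SID   INT NOT NULL,
  title VARCHAR(100) NOT NULL
);
INSERT INTO demo_items (SID, title) VALUES
  (1, 'alpha'), (1, 'alpha'), (1, 'beta'), (2, 'alpha'), (1, 'alpha');

-- 1. Delete in place every row that has a twin with a lower id,
--    so the lowest id of each (title, SID) group survives.
DELETE t1 FROM demo_items t1
JOIN demo_items t2
  ON t1.title = t2.title AND t1.SID = t2.SID AND t1.id > t2.id;

-- 2. No duplicates remain, so a plain ALTER TABLE can add the key.
ALTER TABLE demo_items ADD UNIQUE KEY uq_title_sid (title, SID);
```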

User · Answer

Suppose you have a table employee with the following columns:

    employee (id, first_name, last_name, start_date)

In order to delete the rows with a duplicate first_name column:

    delete
    from employee using employee,
        employee e1
    where employee.id > e1.id
        and employee.first_name = e1.first_name;

User · Answer

Another easy way... using UPDATE IGNORE:

You have to use a unique index on one or more columns. Create a new temporary reference column, make it nullable, and include it in that index (NULLs never collide in a unique index). In this column, you mark the uniques by updating it with the IGNORE clause. Step by step:

Add a temporary reference column to mark the uniques:

    ALTER TABLE `yourtable` ADD `unique` VARCHAR(3) NULL AFTER `lastcolname`;

=> this will add a column to your table.

Add a unique index on the columns that must not repeat plus the new column (col1 and col2 are placeholders for your own columns):

    ALTER TABLE `yourtable` ADD UNIQUE `tmp_unique` (`col1`, `col2`, `unique`);

Update the table: try to mark everything as unique, but ignore possible errors due to duplicate key issues (those records will be skipped):

    UPDATE IGNORE `yourtable` SET `unique` = 'Yes' WHERE 1;

=> you will find your duplicate records will not be marked as unique = 'Yes'; in other words, only one of each set of duplicate records will be marked as unique.

Delete everything that's not unique:

    DELETE FROM `yourtable` WHERE `unique` IS NULL;

=> This will remove all duplicate records.

Drop the column:

    ALTER TABLE `yourtable` DROP `unique`;

User · Answer

This will make the column column_name into a primary key and, in the meantime, ignore all errors, so it will delete the rows with a duplicate value for column_name:

    ALTER IGNORE TABLE `table_name` ADD PRIMARY KEY (`column_name`);

ALTER IGNORE TABLE only exists up to MySQL 5.6.
