Remove duplicate rows in MySQL

Question

I have a table with the following fields:

    id (Unique)
    url (Unique)
    title
    company
    site_id

Now, I need to remove rows having the same title, company and site_id. One way to do it will be using the following SQL along with a script (PHP):

    SELECT title, site_id, location, id, count(*)
    FROM jobs
    GROUP BY site_id, company, title, location
    HAVING count(*) > 1

After running this query, I can remove duplicates using a server side script.

But, I want to know if this can be done only using SQL query.

User · Answer

I like to be a bit more specific as to which records I delete, so here is my solution:

    delete
    from jobs c1
    where not c1.location = 'Paris'
    and   c1.site_id > 64218
    and exists
    (
        select *
        from jobs c2
        where c2.site_id = c1.site_id
        and   c2.company = c1.company
        and   c2.location = c1.location
        and   c2.title = c1.title
        and   c2.site_id > 63412
        and   c2.site_id < 64219
    )

User · Answer

To delete the duplicate records in a table:

    delete from job s
    where rowid < any
    (select rowid from job k
     where s.site_id = k.site_id and
     s.title = k.title and
     s.company = k.company);

or

    delete from job s
    where rowid not in
    (select max(rowid) from job k
     where s.site_id = k.site_id and
     s.title = k.title and
     s.company = k.company);

User · Answer

Delete duplicate rows using the DELETE JOIN statement

MySQL provides you with the DELETE JOIN statement that you can use to remove duplicate rows quickly.

The following statement deletes duplicate rows and keeps the highest id:

    DELETE t1 FROM contacts t1
        INNER JOIN contacts t2
    WHERE t1.id < t2.id AND t1.email = t2.email;
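If you want to convince yourself which row survives before touching real data, here is a minimal sketch using Python's built-in sqlite3 module. It is not the MySQL syntax: SQLite has no multi-table DELETE, so the self-join is rewritten as a correlated EXISTS (my substitution), and the sample rows are made up.

```python
import sqlite3

def dedupe_keep_highest_id(conn):
    """Delete every contacts row that has a same-email row with a larger id."""
    # SQLite has no DELETE ... JOIN, so the self-join condition
    # (t1.id < t2.id AND t1.email = t2.email) becomes a correlated EXISTS.
    conn.execute(
        """
        DELETE FROM contacts
        WHERE EXISTS (
            SELECT 1 FROM contacts AS t2
            WHERE t2.email = contacts.email
              AND t2.id > contacts.id
        )
        """
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO contacts (id, email) VALUES (?, ?)",
    [
        (1, "a@example.com"),
        (2, "b@example.com"),
        (3, "a@example.com"),
        (4, "a@example.com"),
    ],
)

dedupe_keep_highest_id(conn)
remaining = conn.execute("SELECT id, email FROM contacts ORDER BY id").fetchall()
print(remaining)  # ids 2 and 4 are left: the highest id per email
```

Flipping the comparison to `t2.id < contacts.id` would keep the lowest id instead.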

User · Answer

I had to do this with text fields and came across the limit of 100 bytes on the index.

I solved this by adding a column, doing an MD5 hash of the fields, and then doing the alter.

    ALTER TABLE table ADD `merged` VARCHAR( 40 ) NOT NULL;
    UPDATE table SET `merged` = MD5(CONCAT(`col1`, `col2`, `col3`));
    ALTER IGNORE TABLE table ADD UNIQUE INDEX idx_name (`merged`);

User · Answer

The faster way is to insert distinct rows into a temporary table. Using delete, it took me a few hours to remove duplicates from a table of 8 million rows. Using insert and distinct, it took just 13 minutes.

    CREATE TABLE tempTableName LIKE tableName;
    CREATE INDEX ix_all_id ON tableName(cellId, attributeId, entityRowId, value);
    INSERT INTO tempTableName(cellId, attributeId, entityRowId, value)
        SELECT DISTINCT cellId, attributeId, entityRowId, value FROM tableName;
    TRUNCATE TABLE tableName;
    INSERT INTO tableName SELECT * FROM tempTableName;
    DROP TABLE tempTableName;
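The copy-distinct-then-refill idea can be tried on a toy table. This is only a sketch in Python's built-in sqlite3 module, not MySQL: SQLite has no TRUNCATE TABLE or CREATE TABLE ... LIKE, so an unqualified DELETE and CREATE TABLE ... AS SELECT stand in for them, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableName "
    "(cellId INTEGER, attributeId INTEGER, entityRowId INTEGER, value TEXT)"
)
conn.executemany(
    "INSERT INTO tableName VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "a"),
        (1, 1, 1, "a"),
        (2, 1, 1, "b"),
        (1, 1, 1, "a"),
        (2, 1, 1, "b"),
    ],
)

# Copy one representative of each distinct row into a scratch table.
conn.execute(
    "CREATE TABLE tempTableName AS "
    "SELECT DISTINCT cellId, attributeId, entityRowId, value FROM tableName"
)
# SQLite has no TRUNCATE TABLE; a DELETE without WHERE empties the table.
conn.execute("DELETE FROM tableName")
# Refill the original table from the de-duplicated copy, then clean up.
conn.execute("INSERT INTO tableName SELECT * FROM tempTableName")
conn.execute("DROP TABLE tempTableName")

rows = conn.execute(
    "SELECT cellId, attributeId, entityRowId, value FROM tableName ORDER BY cellId"
).fetchall()
print(rows)
```

Note that DISTINCT only collapses rows that match on every selected column, so a differing id column would have to be left out of the SELECT list, as the answer does.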

User · Answer

This will delete the duplicate rows with the same values for title, company and site. Because the condition is t1.id < t2.id, the row with the highest id in each group will be kept and all the other duplicates will be deleted.

    DELETE t1 FROM tablename t1
    INNER JOIN tablename t2
    WHERE
        t1.id < t2.id AND
        t1.title = t2.title AND
        t1.company = t2.company AND
        t1.site_ID = t2.site_ID;

User · Answer

There is another solution:

    DELETE t1 FROM my_table t1, my_table t2
    WHERE t1.id < t2.id
        AND t1.my_field = t2.my_field
        AND t1.my_field_2 = t2.my_field_2
        AND ...

User · Answer

I found a simple way (keep latest):

    DELETE t1 FROM tablename t1 INNER JOIN tablename t2
    WHERE t1.id < t2.id AND t1.column1 = t2.column1 AND t1.column2 = t2.column2;

User · Answer

A solution that is simple to understand and works with no primary key:

1) add a new boolean column

    alter table mytable add tokeep boolean;

2) add a constraint on the duplicated columns AND the new column

    alter table mytable add constraint preventdupe unique (mycol1, mycol2, tokeep);

3) set the boolean column to true. This will succeed only on one of the duplicated rows because of the new constraint

    update ignore mytable set tokeep = true;

4) delete rows that have not been marked as tokeep

    delete from mytable where tokeep is null;

5) drop the added column

    alter table mytable drop tokeep;

I suggest that you keep the constraint you added, so that new duplicates are prevented in the future.
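Steps 1 to 4 can be walked through on a toy table. This is a hedged sketch with Python's built-in sqlite3 module rather than MySQL: SQLite spells the statements differently (CREATE UNIQUE INDEX instead of ADD CONSTRAINT, UPDATE OR IGNORE instead of UPDATE IGNORE), step 5 is left out, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mycol1 TEXT, mycol2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("b", "y"), ("a", "x")],
)

# Step 1: add the marker column; every existing row gets NULL.
conn.execute("ALTER TABLE mytable ADD COLUMN tokeep BOOLEAN")

# Step 2: unique index over the duplicated columns plus the marker.
# NULL markers never collide with each other, so the index can be
# created even though the data still contains duplicates.
conn.execute(
    "CREATE UNIQUE INDEX preventdupe ON mytable (mycol1, mycol2, tokeep)"
)

# Step 3: try to mark every row. A row whose (mycol1, mycol2, 1) already
# exists violates the index and is skipped instead of aborting the update.
conn.execute("UPDATE OR IGNORE mytable SET tokeep = 1")

# Step 4: whatever could not be marked is a duplicate.
conn.execute("DELETE FROM mytable WHERE tokeep IS NULL")

rows = conn.execute(
    "SELECT mycol1, mycol2 FROM mytable ORDER BY mycol1"
).fetchall()
print(rows)
```

The trick relies on NULLs being treated as distinct inside a unique index, which is why the constraint can be added before any row is marked.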

User · Answer

I have this query snippet for SQL Server, but I think it can be used in other DBMSs with little changes:

    DELETE
    FROM Table
    WHERE Table.idTable IN (
        SELECT MAX(idTable)
        FROM Table
        GROUP BY field1, field2, field3
        HAVING COUNT(*) > 1)

I forgot to tell you that this query doesn't remove the row with the lowest id of the duplicated rows. If this works for you, try this query:

    DELETE
    FROM jobs
    WHERE jobs.id IN (
        SELECT MAX(id)
        FROM jobs
        GROUP BY site_id, company, title, location
        HAVING COUNT(*) > 1)
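One thing worth checking with the MAX(id) form is what happens when a group has three or more copies: each run removes only the highest id per group, so the statement has to be repeated until nothing is deleted. A small sketch with Python's built-in sqlite3 module (SQLite, not SQL Server or MySQL; the rows are made up, and MySQL may reject a subquery on the table being deleted from):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, site_id INTEGER, "
    "company TEXT, title TEXT, location TEXT)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "acme", "dev", "Paris"),
        (2, 10, "acme", "dev", "Paris"),
        (3, 10, "acme", "dev", "Paris"),
        (4, 11, "initech", "qa", "Lyon"),
    ],
)

DELETE_MAX = """
    DELETE FROM jobs
    WHERE id IN (
        SELECT MAX(id)
        FROM jobs
        GROUP BY site_id, company, title, location
        HAVING COUNT(*) > 1
    )
"""

# Repeat the delete until a pass removes nothing.
passes = 0
while True:
    deleted = conn.execute(DELETE_MAX).rowcount
    if deleted == 0:
        break
    passes += 1

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM jobs ORDER BY id")]
print(passes, remaining_ids)  # two passes are needed for the triple
```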

User · Answer

As of version 8.0 (2018), MySQL finally supports window functions.

Window functions are both handy and efficient. Here is a solution that demonstrates how to use them to solve this assignment.

In a subquery, we can use ROW_NUMBER() to assign a position to each record in the table within column1/column2 groups, ordered by id. If there are no duplicates, the record will get row number 1. If duplicates exist, they will be numbered by ascending id (starting at 1).

Once records are properly numbered in the subquery, the outer query just deletes all records whose row number is not 1.

Query:

    DELETE FROM tablename
    WHERE id IN (
        SELECT id
        FROM (
            SELECT
                id,
                ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id) rn
            FROM tablename
        ) t
        WHERE rn > 1
    )
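To see the numbering before deleting anything, here is a small sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is 3.25 or newer (the first release with window functions) and uses invented rows; the SQL is close to the MySQL 8.0 query but was only exercised against SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "b", "y"), (4, "a", "x"), (5, "b", "z")],
)

# Number the rows inside each (column1, column2) group by ascending id.
numbered = conn.execute(
    """
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
    FROM tablename
    ORDER BY id
    """
).fetchall()
print(numbered)  # (id, rn) pairs; rn > 1 marks a duplicate

# Delete everything that is not the first row of its group.
conn.execute(
    """
    DELETE FROM tablename
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
            FROM tablename
        ) AS t
        WHERE rn > 1
    )
    """
)
kept = [row[0] for row in conn.execute("SELECT id FROM tablename ORDER BY id")]
print(kept)
```

Changing the window to `ORDER BY id DESC` would keep the newest row of each group instead of the oldest.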

User · Answer

If the IGNORE statement won't work, like in my case, you can use the below statements:

    CREATE TABLE your_table_deduped LIKE your_table;

    INSERT your_table_deduped
    SELECT *
    FROM your_table
    GROUP BY index1_id,
             index2_id;

    RENAME TABLE your_table TO your_table_with_dupes;

    RENAME TABLE your_table_deduped TO your_table;

    #OPTIONAL
    ALTER TABLE `your_table` ADD UNIQUE `unique_index` (`index1_id`, `index2_id`);

    #OPTIONAL
    DROP TABLE your_table_with_dupes;

User · Answer

If you have a large table with a huge number of records, then the above solutions will not work or will take too much time. Then we have a different solution:

    -- Create temporary table
    CREATE TABLE temp_table LIKE table1;

    -- Add constraint
    ALTER TABLE temp_table ADD UNIQUE(title, company, site_id);

    -- Copy data
    INSERT IGNORE INTO temp_table SELECT * FROM table1;

    -- Rename and drop
    RENAME TABLE table1 TO old_table1, temp_table TO table1;
    DROP TABLE old_table1;

User · Answer

MySQL has restrictions about referring to the table you are deleting from. You can work around that with a temporary table, like:

    create temporary table tmpTable (id int);

    insert  into tmpTable
            (id)
    select  id
    from    YourTable yt
    where   exists
            (
            select  *
            from    YourTable yt2
            where   yt2.title = yt.title
                    and yt2.company = yt.company
                    and yt2.site_id = yt.site_id
                    and yt2.id > yt.id
            );

    delete
    from    YourTable
    where   ID in (select id from tmpTable);

From Kostanos' suggestion in the comments: the only slow query above is the DELETE, for cases where you have a very large database. This query could be faster:

    DELETE FROM YourTable USING YourTable, tmpTable WHERE YourTable.id = tmpTable.id

User · Answer

A really easy way to do this is to add a UNIQUE index on the 3 columns. When you write the ALTER statement, include the IGNORE keyword. Like so:

    ALTER IGNORE TABLE jobs
    ADD UNIQUE INDEX idx_name (site_id, title, company);

This will drop all the duplicate rows. As an added benefit, future INSERTs that are duplicates will error out. As always, you may want to take a backup before running something like this.

User · Answer

I have a table where I forgot to add a primary key on the id column, though it has auto_increment on the id. One day, someone replayed the MySQL bin log on the database, which inserted some duplicate rows.

I removed the duplicate rows as follows:

1) select the unique duplicate rows and export them

    select T1.* from table_name T1
    inner join (select count(*) as c, id from table_name group by id) T2
        on T1.id = T2.id
    where T2.c > 1
    group by T1.id;

2) delete the duplicate rows by id

3) insert the rows from the exported data

4) then add the primary key on id

User · Answer

I keep visiting this page anytime I google "remove duplicates from mysql", but for me the IGNORE solutions don't work because I have InnoDB MySQL tables.

This code works better anytime:

    CREATE TABLE tableToclean_temp LIKE tableToclean;
    ALTER TABLE tableToclean_temp ADD UNIQUE INDEX (fontsinuse_id);
    INSERT IGNORE INTO tableToclean_temp SELECT * FROM tableToclean;
    DROP TABLE tableToclean;
    RENAME TABLE tableToclean_temp TO tableToclean;

tableToclean = the name of the table you need to clean

tableToclean_temp = a temporary table created and deleted

User · Answer

Delete duplicate rows with the DELETE JOIN statement:

    DELETE t1 FROM table_name t1
    JOIN table_name t2
    WHERE
        t1.id < t2.id AND
        t1.title = t2.title AND t1.company = t2.company AND t1.site_id = t2.site_id;

User · Answer

    -- Here is what I used, and it works:
    create table temp_table like my_table;
    -- t_id is my unique column
    insert into temp_table (id) select id from my_table GROUP by t_id;
    delete from my_table where id not in (select id from temp_table);
    drop table temp_table;

User · Answer

Simple and fast for all cases:

    CREATE TEMPORARY TABLE IF NOT EXISTS _temp_duplicates AS (
        SELECT dub.id
        FROM table_with_duplications dub
        GROUP BY dub.field_must_be_uniq_1, dub.field_must_be_uniq_2
        HAVING COUNT(*) > 1
    );

    DELETE FROM table_with_duplications WHERE id IN (SELECT id FROM _temp_duplicates);

User · Answer

This solution will move the duplicates into one table and the uniques into another.

    -- speed up creating uniques table if dealing with many rows
    CREATE INDEX temp_idx ON jobs(site_id, company, title, location);

    -- create the table with unique rows
    INSERT jobs_uniques SELECT * FROM
        (
        SELECT *
        FROM jobs
        GROUP BY site_id, company, title, location
        HAVING count(1) > 1
        UNION
        SELECT *
        FROM jobs
        GROUP BY site_id, company, title, location
        HAVING count(1) = 1
    ) x;

    -- create the table with duplicate rows
    INSERT jobs_dupes
    SELECT *
    FROM jobs
    WHERE id NOT IN
    (SELECT id FROM jobs_uniques);

    -- confirm the difference between uniques and dupes tables
    SELECT COUNT(1)
    AS jobs,
    (SELECT COUNT(1) FROM jobs_dupes) + (SELECT COUNT(1) FROM jobs_uniques)
    AS sum
    FROM jobs;

User · Answer

Deleting duplicates on MySQL tables is a common issue that is generally the result of a missing constraint to avoid those duplicates beforehand. But this common issue usually comes with specific needs that require specific approaches. The approach should be different depending on, for example, the size of the data, the duplicated entry that should be kept (generally the first or the last one), whether there are indexes to be kept, or whether we want to perform any additional action on the duplicated data.

There are also some specificities of MySQL itself, such as not being able to reference the same table in a FROM clause when performing a table UPDATE (it'll raise MySQL error #1093). This limitation can be overcome by using an inner query with a temporary table (as suggested in some approaches above). But this inner query won't perform especially well when dealing with big data sources.

However, a better approach does exist to remove duplicates, one that is both efficient and reliable, and that can be easily adapted to different needs.

The general idea is to create a new temporary table, usually adding a unique constraint to avoid further duplicates, and to INSERT the data from your former table into the new one, while taking care of the duplicates. This approach relies on simple MySQL INSERT queries, creates a new constraint to avoid further duplicates, and skips the need for an inner query to search for duplicates and a temporary table that should be kept in memory (thus fitting big data sources too).

This is how it can be achieved. Given we have a table employee, with the following columns:

    employee (id, first_name, last_name, start_date, ssn)

In order to delete the rows with a duplicate ssn column, keeping only the first entry found, the following process can be followed:

    -- create a new tmp_employee table
    CREATE TABLE tmp_employee LIKE employee;

    -- add a unique constraint
    ALTER TABLE tmp_employee ADD UNIQUE(ssn);

    -- scan over the employee table to insert employee entries
    INSERT IGNORE INTO tmp_employee SELECT * FROM employee ORDER BY id;

    -- rename tables
    RENAME TABLE employee TO backup_employee, tmp_employee TO employee;

Technical explanation

- Line #1 creates a new tmp_employee table with exactly the same structure as the employee table
- Line #2 adds a UNIQUE constraint to the new tmp_employee table to avoid any further duplicates
- Line #3 scans over the original employee table by id, inserting new employee entries into the new tmp_employee table, while ignoring duplicated entries
- Line #4 renames tables, so that the new employee table holds all the entries without the duplicates, and a backup copy of the former data is kept in the backup_employee table

Using this approach, 1.6M registers were converted into 6k in less than 200s.

Chetan, following this process, you could quickly and easily remove all your duplicates and create a UNIQUE constraint by running:

    CREATE TABLE tmp_jobs LIKE jobs;

    ALTER TABLE tmp_jobs ADD UNIQUE(site_id, title, company);

    INSERT IGNORE INTO tmp_jobs SELECT * FROM jobs ORDER BY id;

    RENAME TABLE jobs TO backup_jobs, tmp_jobs TO jobs;

Of course, this process can be further modified to adapt it to different needs when deleting duplicates. Some examples follow.

Variation for keeping the last entry instead of the first one

Sometimes we need to keep the last duplicated entry instead of the first one.

    CREATE TABLE tmp_employee LIKE employee;

    ALTER TABLE tmp_employee ADD UNIQUE(ssn);

    INSERT IGNORE INTO tmp_employee SELECT * FROM employee ORDER BY id DESC;

    RENAME TABLE employee TO backup_employee, tmp_employee TO employee;

- On line #3, the ORDER BY id DESC clause makes the last IDs get priority over the rest

Variation for performing some tasks on the duplicates, for example keeping a count of the duplicates found

Sometimes we need to perform some further processing on the duplicated entries that are found (such as keeping a count of the duplicates).

    CREATE TABLE tmp_employee LIKE employee;

    ALTER TABLE tmp_employee ADD UNIQUE(ssn);

    ALTER TABLE tmp_employee ADD COLUMN n_duplicates INT DEFAULT 0;

    -- the column list is spelled out because tmp_employee now has one more column than employee
    INSERT INTO tmp_employee (id, first_name, last_name, start_date, ssn)
        SELECT id, first_name, last_name, start_date, ssn FROM employee ORDER BY id
        ON DUPLICATE KEY UPDATE n_duplicates = n_duplicates + 1;

    RENAME TABLE employee TO backup_employee, tmp_employee TO employee;

- On line #3, a new column n_duplicates is created
- On line #4, the INSERT INTO ... ON DUPLICATE KEY UPDATE query is used to perform an additional update when a duplicate is found (in this case, increasing a counter). The INSERT INTO ... ON DUPLICATE KEY UPDATE query can be used to perform different types of updates for the duplicates found.

Variation for regenerating the auto-incremental field id

Sometimes we use an auto-incremental field and, in order to keep the index as compact as possible, we can take advantage of the deletion of the duplicates to regenerate the auto-incremental field in the new temporary table.

    CREATE TABLE tmp_employee LIKE employee;

    ALTER TABLE tmp_employee ADD UNIQUE(ssn);

    INSERT IGNORE INTO tmp_employee (first_name, last_name, start_date, ssn)
        SELECT first_name, last_name, start_date, ssn FROM employee ORDER BY id;

    RENAME TABLE employee TO backup_employee, tmp_employee TO employee;

- On line #3, instead of selecting all the fields of the table, the id field is skipped so that the DB engine generates a new one automatically

Further variations

Many further modifications are also doable depending on the desired behavior. As an example, the following queries will use a second temporary table to, besides 1) keeping the last entry instead of the first one and 2) increasing a counter on the duplicates found, also 3) regenerate the auto-incremental field id while keeping the entry order as it was in the former data.

    CREATE TABLE tmp_employee LIKE employee;

    ALTER TABLE tmp_employee ADD UNIQUE(ssn);

    ALTER TABLE tmp_employee ADD COLUMN n_duplicates INT DEFAULT 0;

    INSERT INTO tmp_employee (id, first_name, last_name, start_date, ssn)
        SELECT id, first_name, last_name, start_date, ssn FROM employee ORDER BY id DESC
        ON DUPLICATE KEY UPDATE n_duplicates = n_duplicates + 1;

    CREATE TABLE tmp_employee2 LIKE tmp_employee;

    INSERT INTO tmp_employee2 (first_name, last_name, start_date, ssn, n_duplicates)
        SELECT first_name, last_name, start_date, ssn, n_duplicates FROM tmp_employee ORDER BY id;

    DROP TABLE tmp_employee;

    RENAME TABLE employee TO backup_employee, tmp_employee2 TO employee;
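The effect of the ORDER BY direction and of the duplicate counter can be checked on a toy table. The following is only a sketch in Python's built-in sqlite3 module, not MySQL: INSERT OR IGNORE stands in for INSERT IGNORE, and SQLite's ON CONFLICT ... DO UPDATE (assumed SQLite 3.24 or newer) stands in for ON DUPLICATE KEY UPDATE. The helper names, the reduced column set and the sample rows are all made up.

```python
import sqlite3

def make_db():
    """Return an in-memory database with three rows sharing ssn '111'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, first_name TEXT, ssn TEXT)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(1, "Ann", "111"), (2, "Bob", "222"), (3, "Anne", "111"),
         (4, "Cy", "333"), (5, "Annie", "111")],
    )
    return conn

def dedupe(conn, order):
    """Rebuild employee without duplicate ssn values; order is 'ASC' or 'DESC'."""
    if order not in ("ASC", "DESC"):
        raise ValueError("order must be 'ASC' or 'DESC'")
    conn.execute(
        "CREATE TABLE tmp_employee "
        "(id INTEGER PRIMARY KEY, first_name TEXT, ssn TEXT UNIQUE)"
    )
    # The first row seen for each ssn wins; later ones are silently skipped.
    conn.execute(
        f"INSERT OR IGNORE INTO tmp_employee SELECT * FROM employee ORDER BY id {order}"
    )
    conn.execute("ALTER TABLE employee RENAME TO backup_employee")
    conn.execute("ALTER TABLE tmp_employee RENAME TO employee")
    return conn.execute("SELECT id, first_name FROM employee ORDER BY id").fetchall()

def count_duplicates(conn):
    """Fill tmp_employee and count how many rows were skipped per ssn."""
    conn.execute(
        "CREATE TABLE tmp_employee (id INTEGER PRIMARY KEY, first_name TEXT, "
        "ssn TEXT UNIQUE, n_duplicates INTEGER DEFAULT 0)"
    )
    # 'WHERE true' is needed so SQLite does not read ON as a join clause.
    conn.execute(
        """
        INSERT INTO tmp_employee (id, first_name, ssn)
        SELECT id, first_name, ssn FROM employee WHERE true ORDER BY id
        ON CONFLICT(ssn) DO UPDATE SET n_duplicates = n_duplicates + 1
        """
    )
    return conn.execute(
        "SELECT ssn, n_duplicates FROM tmp_employee ORDER BY ssn"
    ).fetchall()

keep_first = dedupe(make_db(), "ASC")
keep_last = dedupe(make_db(), "DESC")
counts = count_duplicates(make_db())
print(keep_first, keep_last, counts)
```

With ascending order the row with id 1 represents ssn 111; with descending order it is id 5, and the counter for 111 ends at 2 because two later rows collided with the first one.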

User · Answer

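You can easily delete the duplicate records with this code:

    $qry = mysql_query("SELECT * from cities");
    while ($qry_row = mysql_fetch_array($qry))
    {
        $qry2 = mysql_query("SELECT * from cities2 where city = '".$qry_row['city']."'");

        if (mysql_num_rows($qry2) > 1) {
            // start with an empty list for every city, otherwise rows
            // collected for a previous city would be deleted again
            $city_arry = array();
            while ($row = mysql_fetch_array($qry2)) {
                $city_arry[] = $row;
            }

            // keep the first row (index 0) and delete the rest
            $total = sizeof($city_arry) - 1;
            for ($i = 1; $i <= $total; $i++) {
                mysql_query("delete from cities2 where town_id = '".$city_arry[$i][0]."'");
            }
        }
        //exit;
    }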

User · Answer

If you don't want to alter the column properties, then you can use the query below.

Since you have a column which has unique IDs (e.g., auto_increment columns), you can use it to remove the duplicates:

    DELETE `a`
    FROM
        `jobs` AS `a`,
        `jobs` AS `b`
    WHERE
        -- IMPORTANT: Ensures one version remains
        -- Change "ID" to your unique column's name
        `a`.`ID` < `b`.`ID`

        -- Any duplicates you want to check for
        AND (`a`.`title` = `b`.`title` OR `a`.`title` IS NULL AND `b`.`title` IS NULL)
        AND (`a`.`company` = `b`.`company` OR `a`.`company` IS NULL AND `b`.`company` IS NULL)
        AND (`a`.`site_id` = `b`.`site_id` OR `a`.`site_id` IS NULL AND `b`.`site_id` IS NULL);

In MySQL, you can simplify it even more with the NULL-safe equal operator (aka "spaceship operator"):

    DELETE `a`
    FROM
        `jobs` AS `a`,
        `jobs` AS `b`
    WHERE
        -- IMPORTANT: Ensures one version remains
        -- Change "ID" to your unique column's name
        `a`.`ID` < `b`.`ID`

        -- Any duplicates you want to check for
        AND `a`.`title` <=> `b`.`title`
        AND `a`.`company` <=> `b`.`company`
        AND `a`.`site_id` <=> `b`.`site_id`;
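Why the NULL handling matters is easy to miss, so here is a small sketch with Python's built-in sqlite3 module. SQLite has no `<=>` operator; its `IS` operator is used as the NULL-safe comparison instead (my substitution), and the helper name and rows are invented. The query only lists the rows the DELETE would target; it does not delete anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, company TEXT, site_id INTEGER)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [
        (1, "dev", None, 7),    # duplicate of row 2, company is NULL
        (2, "dev", None, 7),
        (3, "dev", "acme", 7),  # duplicate of row 4
        (4, "dev", "acme", 7),
    ],
)

def ids_with_later_duplicate(op):
    """Return ids that have a matching row with a higher id, comparing with op."""
    if op not in ("=", "IS"):
        raise ValueError("op must be '=' or 'IS'")
    sql = f"""
        SELECT a.id
        FROM jobs AS a
        WHERE EXISTS (
            SELECT 1 FROM jobs AS b
            WHERE a.id < b.id
              AND a.title {op} b.title
              AND a.company {op} b.company
              AND a.site_id {op} b.site_id
        )
        ORDER BY a.id
    """
    return [row[0] for row in conn.execute(sql)]

plain = ids_with_later_duplicate("=")       # NULL = NULL is not true, row 1 is missed
null_safe = ids_with_later_duplicate("IS")  # NULLs compare as equal, row 1 is found
print(plain, null_safe)
```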

User · Answer

In order to remove duplicate records on columns that should be unique, e.g. COL1, COL2, COL3 should not be replicated (suppose we missed the 3-column unique constraint in the table structure and multiple duplicate entries have been made into the table):

    DROP TABLE TABLE_NAME_copy;
    CREATE TABLE TABLE_NAME_copy LIKE TABLE_NAME;
    INSERT INTO TABLE_NAME_copy
    SELECT * FROM TABLE_NAME
    GROUP BY COLUMN1, COLUMN2, COLUMN3;
    DROP TABLE TABLE_NAME;
    ALTER TABLE TABLE_NAME_copy RENAME TO TABLE_NAME;

Hope this will help.
