[mysql] "INSERT IGNORE" vs "INSERT ... ON DUPLICATE KEY UPDATE"

While executing an INSERT statement with many rows, I want to skip duplicate entries that would otherwise cause failure. After some research, my options appear to be the use of either:

  • ON DUPLICATE KEY UPDATE which implies an unnecessary update at some cost, or
  • INSERT IGNORE which implies an invitation for other kinds of failure to slip in unannounced.

Am I right in these assumptions? What's the best way to simply skip the rows that might cause duplicates and just continue on to the other rows?

This question is related to mysql insert

The answer is


I routinely use INSERT IGNORE, and it sounds like exactly the kind of behavior you're looking for as well. As long as you know that rows which would cause index conflicts will not be inserted and you plan your program accordingly, it shouldn't cause any trouble.


If you want to insert in the table and on the conflict of the primary key or unique index it will update the conflicting row instead of inserting that row.

Syntax:

insert into table1 set column1 = a, column2 = b on duplicate update column2 = c;

Now here, this insert statement may look different what you have seen earlier. This insert statement trying to insert a row in table1 with the value of a and b into column column1 and column2 respectively.

Let's understand this statement in depth:

For example: here column1 is defined as the primary key in table1.

Now if in table1 there is no row having the value “a” in column1. So this statement will insert a row in the table1.

Now if in table1 there is a row having the value “a” in column2. So this statement will update the row’s column2 value with “c” where the column1 value is “a”.

So if you want to insert a new row otherwise update that row on the conflict of the primary key or unique index.
Read more on this link


ON DUPLICATE KEY UPDATE is not really in the standard. It's about as standard as REPLACE is. See SQL MERGE.

Essentially both commands are alternative-syntax versions of standard commands.


Potential danger of INSERT IGNORE. If you are trying to insert VARCHAR value longer then column was defined with - the value will be truncated and inserted EVEN IF strict mode is enabled.


Replace Into seems like an option. Or you can check with

IF NOT EXISTS(QUERY) Then INSERT

This will insert or delete then insert. I tend to go for a IF NOT EXISTS check first.


Adding to this. If you use both INSERT IGNORE and ON DUPLICATE KEY UPDATE in the same statement, the update will still happen if the insert finds a duplicate key. In other words, the update takes precedence over the ignore. However, if the ON DUPLICATE KEY UPDATE clause itself causes a duplicate key error, that error will be ignored.

This can happen if you have more than one unique key, or if your update attempts to violate a foreign key constraint.

CREATE TABLE test 
 (id BIGINT (20) UNSIGNED AUTO_INCREMENT, 
  str VARCHAR(20), 
  PRIMARY KEY(id), 
  UNIQUE(str));

INSERT INTO test (str) VALUES('A'),('B');

/* duplicate key error caused not by the insert, 
but by the update: */
INSERT INTO test (str) VALUES('B') 
 ON DUPLICATE KEY UPDATE str='A'; 

/* duplicate key error is suppressed */
INSERT IGNORE INTO test (str) VALUES('B') 
 ON DUPLICATE KEY UPDATE str='A';

If using insert ignore having a SHOW WARNINGS; statement at the end of your query set will show a table with all the warnings, including which IDs were the duplicates.


As mentioned above, if you use INSERT..IGNORE, errors that occur while executing the INSERT statement are treated as warnings instead.

One thing which is not explicitly mentioned is that INSERT..IGNORE will cause invalid values will be adjusted to the closest values when inserted (whereas invalid values would cause the query to abort if the IGNORE keyword was not used).


INSERT...ON DUPLICATE KEY UPDATE is prefered to prevent unexpected Exceptions management.

This solution works when you have **1 unique constraint** only

In my case I know that col1 and col2 make a unique composite index.

It keeps track of the error, but does not throw an exception on duplicate. Regarding the performance, the update by the same value is efficient as MySQL notices this and does not update it

INSERT INTO table
  (col1, col2, col3, col4)
VALUES
  (?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
    col1 = VALUES(col1),
    col2 = VALUES(col2)

The idea to use this approach came from the comments at phpdelusions.net/pdo.


In case you want to see what this all means, here is a blow-by-blow of everything:

CREATE TABLE `users_partners` (
  `uid` int(11) NOT NULL DEFAULT '0',
  `pid` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`uid`,`pid`),
  KEY `partner_user` (`pid`,`uid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

Primary key is based on both columns of this quick reference table. A Primary key requires unique values.

Let's begin:

INSERT INTO users_partners (uid,pid) VALUES (1,1);
...1 row(s) affected

INSERT INTO users_partners (uid,pid) VALUES (1,1);
...Error Code : 1062
...Duplicate entry '1-1' for key 'PRIMARY'

INSERT IGNORE INTO users_partners (uid,pid) VALUES (1,1);
...0 row(s) affected

INSERT INTO users_partners (uid,pid) VALUES (1,1) ON DUPLICATE KEY UPDATE uid=uid
...0 row(s) affected

note, the above saved too much extra work by setting the column equal to itself, no update actually needed

REPLACE INTO users_partners (uid,pid) VALUES (1,1)
...2 row(s) affected

and now some multiple row tests:

INSERT INTO users_partners (uid,pid) VALUES (1,1),(1,2),(1,3),(1,4)
...Error Code : 1062
...Duplicate entry '1-1' for key 'PRIMARY'

INSERT IGNORE INTO users_partners (uid,pid) VALUES (1,1),(1,2),(1,3),(1,4)
...3 row(s) affected

no other messages were generated in console, and it now has those 4 values in the table data. I deleted everything except (1,1) so I could test from the same playing field

INSERT INTO users_partners (uid,pid) VALUES (1,1),(1,2),(1,3),(1,4) ON DUPLICATE KEY UPDATE uid=uid
...3 row(s) affected

REPLACE INTO users_partners (uid,pid) VALUES (1,1),(1,2),(1,3),(1,4)
...5 row(s) affected

So there you have it. Since this was all performed on a fresh table with nearly no data and not in production, the times for execution were microscopic and irrelevant. Anyone with real-world data would be more than welcome to contribute it.


Something important to add: When using INSERT IGNORE and you do have key violations, MySQL does NOT raise a warning!

If you try for instance to insert 100 records at a time, with one faulty one, you would get in interactive mode:

Query OK, 99 rows affected (0.04 sec)

Records: 100 Duplicates: 1 Warnings: 0

As you see: No Warnings! This behavior is even wrongly described in the official Mysql Documentation.

If your script needs to be informed, if some records have not been added (due to key violations) you have to call mysql_info() and parse it for the "Duplicates" value.