[sql] MySQL "NOT IN" query

I wanted to run a simple query to throw up all the rows of Table1 where a principal column value is not present in a column in another table (Table2).

I tried using:

SELECT * FROM Table1 WHERE Table1.principal NOT IN Table2.principal

This is instead throwing a syntax error. Google search led me to forums where people were saying that MySQL does not support NOT IN and something extremely complex needs to be used. Is this true? Or am I making a horrendous mistake?

This question is related to sql mysql

The answer is


NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL in MySQL

MySQL, as well as all other systems except SQL Server, is able to optimize LEFT JOIN / IS NULL to return FALSE as soon the matching value is found, and it is the only system that cared to document this behavior. […] Since MySQL is not capable of using HASH and MERGE join algorithms, the only ANTI JOIN it is capable of is the NESTED LOOPS ANTI JOIN

[…]

Essentially, [NOT IN] is exactly the same plan that LEFT JOIN / IS NULL uses, despite the fact these plans are executed by the different branches of code and they look different in the results of EXPLAIN. The algorithms are in fact the same in fact and the queries complete in same time.

[…]

It’s hard to tell exact reason for [performance drop when using NOT EXISTS], since this drop is linear and does not seem to depend on data distribution, number of values in both tables etc., as long as both fields are indexed. Since there are three pieces of code in MySQL that essentialy do one job, it is possible that the code responsible for EXISTS makes some kind of an extra check which takes extra time.

[…]

MySQL can optimize all three methods to do a sort of NESTED LOOPS ANTI JOIN. […] However, these three methods generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient […]

That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.

(emphases added)


Be carefull NOT IN is not an alias for <> ANY, but for <> ALL!

http://dev.mysql.com/doc/refman/5.0/en/any-in-some-subqueries.html

SELECT c FROM t1 LEFT JOIN t2 USING (c) WHERE t2.c IS NULL

cant' be replaced by

SELECT c FROM t1 WHERE c NOT IN (SELECT c FROM t2)

You must use

SELECT c FROM t1 WHERE c <> ANY (SELECT c FROM t2)

Unfortunately it seems to be a issue with MySql usage of "NOT IN" clause, the screen-shoot below shows the sub-query option returning wrong results:

mysql> show variables like '%version%';
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| innodb_version          | 1.1.8                        |
| protocol_version        | 10                           |
| slave_type_conversions  |                              |
| version                 | 5.5.21                       |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | Linux                        |
+-------------------------+------------------------------+
7 rows in set (0.07 sec)

mysql> select count(*) from TABLE_A where TABLE_A.Pkey not in (select distinct TABLE_B.Fkey from TABLE_B );
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.07 sec)

mysql> select count(*) from TABLE_A left join TABLE_B on TABLE_A.Pkey = TABLE_B.Fkey where TABLE_B.Pkey is null;
+----------+
| count(*) |
+----------+
|      139 |
+----------+
1 row in set (0.06 sec)

mysql> select count(*) from TABLE_A where NOT EXISTS (select * FROM TABLE_B WHERE TABLE_B.Fkey = TABLE_A.Pkey );
+----------+
| count(*) |
+----------+
|      139 |
+----------+
1 row in set (0.06 sec)

mysql> 

The subquery option has already been answered, but note that in many cases a LEFT JOIN can be a faster way to do this:

SELECT table1.*
FROM table1 LEFT JOIN table2 ON table2.principal=table1.principal
WHERE table2.principal IS NULL

If you want to check multiple tables to make sure it's not present in any of the tables (like in SRKR's comment), you can use this:

SELECT table1.*
FROM table1
LEFT JOIN table2 ON table2.name=table1.name
LEFT JOIN table3 ON table3.name=table1.name
WHERE table2.name IS NULL AND table3.name IS NULL