NOT IN vs NOT EXISTS

Question

Which of these queries is the faster?

NOT EXISTS:

    SELECT ProductID, ProductName
    FROM Northwind..Products p
    WHERE NOT EXISTS (
        SELECT 1
        FROM Northwind..[Order Details] od
        WHERE p.ProductId = od.ProductId)

Or NOT IN:

    SELECT ProductID, ProductName
    FROM Northwind..Products p
    WHERE p.ProductID NOT IN (
        SELECT ProductID
        FROM Northwind..[Order Details])

The query execution plan says they both do the same thing. If that is the case, which is the recommended form?

This is based on the NorthWind database.

Edit: Just found this helpful article: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

I think I'll stick with NOT EXISTS.

User · Answer

Actually, I believe this would be the fastest:

    SELECT p.ProductID, p.ProductName
    FROM Northwind..Products p
         LEFT OUTER JOIN Northwind..[Order Details] od ON p.ProductID = od.ProductID
    WHERE od.ProductID IS NULL
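To check that the LEFT OUTER JOIN ... IS NULL anti-join returns the same rows as the two forms in the question, here is a minimal sketch that runs all three against SQLite (through Python's built-in sqlite3 module) standing in for SQL Server. The tables and rows are a hypothetical miniature of Northwind, with both ProductID columns declared NOT NULL; under that assumption the three forms must agree, although which one is fastest still depends on the engine and its plans.

```python
import sqlite3

# Hypothetical miniature of the Northwind tables (both ProductID columns NOT NULL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER NOT NULL PRIMARY KEY,
                           ProductName TEXT NOT NULL);
    CREATE TABLE [Order Details] (OrderID INTEGER NOT NULL,
                                  ProductID INTEGER NOT NULL);
    INSERT INTO Products VALUES (1, 'Chai'), (2, 'Chang'), (3, 'Aniseed Syrup');
    INSERT INTO [Order Details] VALUES (10, 1), (11, 1), (12, 3);
""")

QUERIES = {
    # Anti-join written as LEFT OUTER JOIN + IS NULL.
    "left_join": """
        SELECT p.ProductID, p.ProductName
        FROM Products p
        LEFT OUTER JOIN [Order Details] od ON p.ProductID = od.ProductID
        WHERE od.ProductID IS NULL
        ORDER BY p.ProductID""",
    # Correlated NOT EXISTS, as in the question.
    "not_exists": """
        SELECT p.ProductID, p.ProductName
        FROM Products p
        WHERE NOT EXISTS (SELECT 1 FROM [Order Details] od
                          WHERE p.ProductID = od.ProductID)
        ORDER BY p.ProductID""",
    # Uncorrelated NOT IN, as in the question.
    "not_in": """
        SELECT p.ProductID, p.ProductName
        FROM Products p
        WHERE p.ProductID NOT IN (SELECT ProductID FROM [Order Details])
        ORDER BY p.ProductID""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)  # each form lists only product 2 (Chang)
```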

User · Answer

Also be aware that NOT IN is not equivalent to NOT EXISTS when it comes to NULL.

This post explains it very well: http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

> When the subquery returns even one null, NOT IN will not match any rows.
>
> The reason for this can be found by looking at the details of what the NOT IN operation actually means.
>
> Let's say, for illustration purposes, that there are 4 rows in the table called t; there's a column called ID with values 1..4.
>
>     WHERE SomeValue NOT IN (SELECT AVal FROM t)
>
> is equivalent to
>
>     WHERE SomeValue <> (SELECT AVal FROM t WHERE ID=1)
>     AND SomeValue <> (SELECT AVal FROM t WHERE ID=2)
>     AND SomeValue <> (SELECT AVal FROM t WHERE ID=3)
>     AND SomeValue <> (SELECT AVal FROM t WHERE ID=4)
>
> Let's further say that AVal is NULL where ID = 4. Hence that <> comparison returns UNKNOWN. The logical truth table for AND states that UNKNOWN and TRUE is UNKNOWN, UNKNOWN and FALSE is FALSE. There is no value that can be AND'd with UNKNOWN to produce the result TRUE.
>
> Hence, if any row of that subquery returns NULL, the entire NOT IN operator will evaluate to either FALSE or NULL and no records will be returned.
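The three-valued logic in the quote can be reproduced in a few lines. This sketch assumes SQLite (through Python's sqlite3 module) as the engine; table t follows the quote, with AVal NULL where ID = 4, and s is a hypothetical table holding the SomeValue candidates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (ID INTEGER PRIMARY KEY, AVal INTEGER);  -- AVal is nullable
    INSERT INTO t VALUES (1, 1), (2, 2), (3, 3), (4, NULL); -- AVal is NULL where ID = 4
    CREATE TABLE s (SomeValue INTEGER);                     -- hypothetical outer table
    INSERT INTO s VALUES (2), (5);
""")

# The expansion from the quote, for SomeValue = 5:
# TRUE AND TRUE AND TRUE AND UNKNOWN evaluates to UNKNOWN (None in Python).
expanded = conn.execute(
    "SELECT 5 <> 1 AND 5 <> 2 AND 5 <> 3 AND 5 <> NULL"
).fetchone()[0]

# NOT IN: 2 is found (FALSE) and 5 runs into the NULL (UNKNOWN), so no rows survive.
not_in = conn.execute(
    "SELECT SomeValue FROM s WHERE SomeValue NOT IN (SELECT AVal FROM t)"
).fetchall()

# NOT EXISTS: the NULL row simply never matches, so 5 is returned.
not_exists = conn.execute(
    "SELECT SomeValue FROM s WHERE NOT EXISTS "
    "(SELECT 1 FROM t WHERE t.AVal = s.SomeValue)"
).fetchall()

print(expanded, not_in, not_exists)  # None [] [(5,)]
```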

User · Answer

They are very similar but not really the same.

In terms of efficiency, I've found the LEFT JOIN ... IS NULL statement more efficient (when a large number of rows is to be selected, that is).

User · Answer

It depends.

    SELECT x.col FROM big_table x WHERE x.key IN (SELECT key FROM really_big_table);

would be relatively slow: there isn't much to limit the size of the set the query has to check the key against, so EXISTS would be preferable in this case. But, depending on the DBMS's optimizer, this could be no different.

As an example of when EXISTS is better:

    SELECT x.col FROM big_table x
    WHERE EXISTS (SELECT key FROM really_big_table
                  WHERE key = x.key
                    AND id = very_limiting_criteria)
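A minimal sketch of the two shapes, written with the same limiting predicate so their results can be compared; in the EXISTS form the predicate sits inside the correlated subquery. It assumes SQLite (through Python's sqlite3 module), hypothetical sample rows, and a bound parameter VERY_LIMITING_ID standing in for very_limiting_criteria; key is quoted because it is an SQL keyword. It only shows that both forms return the same rows; whether EXISTS is actually faster depends on the optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for big_table and really_big_table.
conn.executescript("""
    CREATE TABLE big_table (col TEXT, "key" INTEGER NOT NULL);
    CREATE TABLE really_big_table ("key" INTEGER NOT NULL, id INTEGER NOT NULL);
    INSERT INTO big_table VALUES ('a', 1), ('b', 2), ('c', 3);
    INSERT INTO really_big_table VALUES (1, 100), (2, 200), (3, 100), (3, 300);
""")

VERY_LIMITING_ID = 100  # hypothetical value for very_limiting_criteria

# IN form: the limiting predicate filters the uncorrelated subquery.
in_rows = conn.execute("""
    SELECT x.col FROM big_table x
    WHERE x."key" IN (SELECT "key" FROM really_big_table WHERE id = ?)
    ORDER BY x.col""", (VERY_LIMITING_ID,)).fetchall()

# EXISTS form: correlated on key, with the same predicate inside the subquery,
# so each probe can stop at the first qualifying row.
exists_rows = conn.execute("""
    SELECT x.col FROM big_table x
    WHERE EXISTS (SELECT 1 FROM really_big_table r
                  WHERE r."key" = x."key" AND r.id = ?)
    ORDER BY x.col""", (VERY_LIMITING_ID,)).fetchall()

print(in_rows, exists_rows)  # [('a',), ('c',)] [('a',), ('c',)]
```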

User · Answer

In your specific example they are the same, because the optimizer has figured out that what you are trying to do is the same in both examples. But it is possible that in non-trivial examples the optimizer may not do this, and in that case there are reasons to prefer one over the other on occasion.

NOT IN should be preferred if you are testing multiple rows in your outer select. The subquery inside the NOT IN statement can be evaluated at the beginning of the execution, and the temporary table can be checked against each value in the outer select, rather than re-running the subselect every time as would be required with the NOT EXISTS statement.

If the subquery must be correlated with the outer select, then NOT EXISTS may be preferable, since the optimizer may discover a simplification that prevents the creation of any temporary tables to perform the same function.
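The trade-off described above can be modelled in plain Python. This is a conceptual sketch, not how SQL Server executes either query: the hypothetical not_in_hashed reads the uncorrelated subquery once into a set, while the hypothetical not_exists_nested re-scans the inner rows for every outer row and stops at the first match. Outer values are assumed non-NULL, and None stands for SQL NULL.

```python
from typing import List, Optional, Tuple


def not_in_hashed(outer: List[int], inner: List[Optional[int]]) -> Tuple[List[int], int]:
    """Uncorrelated NOT IN: read the subquery once into a set, then probe it per outer row.

    Returns (surviving outer values, number of inner rows read).
    """
    inner_set = set(inner)  # the subquery is evaluated a single time
    reads = len(inner)
    if None in inner_set:
        # Any NULL makes every non-matching comparison UNKNOWN, so nothing survives.
        return [], reads
    return [v for v in outer if v not in inner_set], reads


def not_exists_nested(outer: List[int], inner: List[Optional[int]]) -> Tuple[List[int], int]:
    """Correlated NOT EXISTS evaluated naively: re-scan the inner rows per outer row.

    Returns (surviving outer values, number of inner rows read).
    """
    survivors: List[int] = []
    reads = 0
    for value in outer:
        found = False
        for candidate in inner:
            reads += 1
            if candidate == value:  # None never equals an int, like NULL = x in SQL
                found = True
                break               # stop at the first match
        if not found:
            survivors.append(value)
    return survivors, reads


if __name__ == "__main__":
    print(not_in_hashed([1, 2, 3, 4], [1, 3, 3]))      # ([2, 4], 3)
    print(not_exists_nested([1, 2, 3, 4], [1, 3, 3]))  # ([2, 4], 9)
```

A real optimizer can often avoid the naive re-scan (for example with a hash or merge anti semi join), so the read counts illustrate the argument rather than predict actual cost.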

User · Answer

I always default to NOT EXISTS.

The execution plans may be the same at the moment, but if either column is altered in the future to allow NULLs, the NOT IN version will need to do more work (even if no NULLs are actually present in the data), and the semantics of NOT IN if NULLs are present are unlikely to be the ones you want anyway.

When neither Products.ProductID nor [Order Details].ProductID allows NULLs, the NOT IN will be treated identically to the following query.

    SELECT ProductID,
           ProductName
    FROM   Products p
    WHERE  NOT EXISTS (SELECT *
                       FROM   [Order Details] od
                       WHERE  p.ProductId = od.ProductId)

The exact plan may vary, but for my example data I get the following.

[Execution plan image]

A reasonably common misconception seems to be that correlated subqueries are always "bad" compared to joins. They certainly can be when they force a nested loops plan (subquery evaluated row by row), but this plan includes an anti semi join logical operator. Anti semi joins are not restricted to nested loops but can use hash or merge (as in this example) joins too.

    /* Not valid syntax, but better reflects the plan */
    SELECT p.ProductID,
           p.ProductName
    FROM   Products p
           LEFT ANTI SEMI JOIN [Order Details] od
             ON p.ProductId = od.ProductId

If [Order Details].ProductID is NULL-able, the query then becomes

    SELECT ProductID,
           ProductName
    FROM   Products p
    WHERE  NOT EXISTS (SELECT *
                       FROM   [Order Details] od
                       WHERE  p.ProductId = od.ProductId)
           AND NOT EXISTS (SELECT *
                           FROM   [Order Details]
                           WHERE  ProductId IS NULL)

The reason for this is that the correct semantics, if [Order Details] contains any NULL ProductIds, is to return no results. See the extra anti semi join and row count spool added to the plan to verify this.

[Execution plan image]

If Products.ProductID is also changed to become NULL-able, the query then becomes

    SELECT ProductID,
           ProductName
    FROM   Products p
    WHERE  NOT EXISTS (SELECT *
                       FROM   [Order Details] od
                       WHERE  p.ProductId = od.ProductId)
           AND NOT EXISTS (SELECT *
                           FROM   [Order Details]
                           WHERE  ProductId IS NULL)
           AND NOT EXISTS (SELECT *
                           FROM   (SELECT TOP 1 *
                                   FROM   [Order Details]) S
                           WHERE  p.ProductID IS NULL)

The reason for that one is that a NULL Products.ProductId should not be returned in the results unless the NOT IN subquery returns no results at all (i.e. the [Order Details] table is empty), in which case it should. In the plan for my sample data this is implemented by adding another anti semi join, as below.

[Execution plan image]

The effect of this is shown in the blog post already linked by Buckley. In the example there, the number of logical reads increases from around 400 to 500,000.

Additionally, the fact that a single NULL can reduce the row count to zero makes cardinality estimation very difficult. If SQL Server assumes that this will happen but in fact there are no NULL rows in the data, the rest of the execution plan may be catastrophically worse if this is just part of a larger query (with inappropriate nested loops causing repeated execution of an expensive subtree, for example).

This is not the only possible execution plan for a NOT IN on a NULL-able column, however. This article shows another one for a query against the AdventureWorks2008 database.

For the NOT IN on a NOT NULL column, or the NOT EXISTS against either a nullable or non-nullable column, it gives the following plan.

[Execution plan image]

When the column changes to NULL-able, the NOT IN plan now looks like this:

[Execution plan image]

It adds an extra inner join operator to the plan. This apparatus is explained here. It is all there to convert the previous single correlated index seek on Sales.SalesOrderDetail.ProductID = <correlated_product_id> into two seeks per outer row. The additional one is on WHERE Sales.SalesOrderDetail.ProductID IS NULL.

As this is under an anti semi join, if that one returns any rows the second seek will not occur. However, if Sales.SalesOrderDetail does not contain any NULL ProductIDs, it will double the number of seek operations required.
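The NULL-guard rewrite in this answer can be checked against NOT IN on a few small data sets. This sketch assumes SQLite (through Python's sqlite3 module), LIMIT 1 in place of SQL Server's TOP 1, and hypothetical rows including one product with a NULL ProductID; it compares results only, not plans or logical reads.

```python
import sqlite3

NOT_IN = """
    SELECT ProductID, ProductName FROM Products p
    WHERE p.ProductID NOT IN (SELECT ProductID FROM [Order Details])
    ORDER BY ProductName"""

# The NOT EXISTS rewrite with both NULL guards; LIMIT 1 replaces SQL Server's TOP 1.
REWRITE = """
    SELECT ProductID, ProductName FROM Products p
    WHERE NOT EXISTS (SELECT * FROM [Order Details] od
                      WHERE p.ProductID = od.ProductID)
      AND NOT EXISTS (SELECT * FROM [Order Details]
                      WHERE ProductID IS NULL)
      AND NOT EXISTS (SELECT * FROM (SELECT * FROM [Order Details] LIMIT 1) S
                      WHERE p.ProductID IS NULL)
    ORDER BY ProductName"""

# Hypothetical products, one of them with a NULL ProductID.
PRODUCTS = [(1, "Chai"), (2, "Chang"), (None, "Mystery")]


def compare(details):
    """Load PRODUCTS plus the given [Order Details] ProductIDs (both columns nullable),
    then return (NOT IN rows, rewrite rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Products (ProductID INTEGER, ProductName TEXT NOT NULL);
        CREATE TABLE [Order Details] (ProductID INTEGER);
    """)
    conn.executemany("INSERT INTO Products VALUES (?, ?)", PRODUCTS)
    conn.executemany("INSERT INTO [Order Details] VALUES (?)", [(d,) for d in details])
    return conn.execute(NOT_IN).fetchall(), conn.execute(REWRITE).fetchall()


# No NULL details, a NULL in the details, and an empty details table.
for details in ([1], [1, None], []):
    not_in_rows, rewrite_rows = compare(details)
    print(details, not_in_rows, rewrite_rows)
    assert not_in_rows == rewrite_rows
```

The empty-table case is the one the TOP 1 guard exists for: with no order details at all, even the NULL-keyed product is returned by both forms.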

User · Answer

If the execution planner says they're the same, they're the same. Use whichever one will make your intention more obvious: in this case, the second.

User · Answer

Database table model

Let's assume we have the following two tables in our database, which form a one-to-many table relationship.

The student table is the parent, and student_grade is the child table, since it has a student_id Foreign Key column referencing the id Primary Key column in the student table.

The student table contains the following two records:

| id | first_name | last_name | admission_score |
|----|------------|-----------|-----------------|
| 1  | Alice      | Smith     | 8.95            |
| 2  | Bob        | Johnson   | 8.75            |

And the student_grade table stores the grades the students received:

| id | class_name | grade | student_id |
|----|------------|-------|------------|
| 1  | Math       | 10    | 1          |
| 2  | Math       | 9.5   | 1          |
| 3  | Math       | 9.75  | 1          |
| 4  | Science    | 9.5   | 1          |
| 5  | Science    | 9     | 1          |
| 6  | Science    | 9.25  | 1          |
| 7  | Math       | 8.5   | 2          |
| 8  | Math       | 9.5   | 2          |
| 9  | Math       | 9     | 2          |
| 10 | Science    | 10    | 2          |
| 11 | Science    | 9.4   | 2          |

SQL EXISTS

Let's say we want to get all students that have received a 10 grade in Math class. If we are only interested in the student identifier, then we can run a query like this one:

    SELECT
        student_grade.student_id
    FROM
        student_grade
    WHERE
        student_grade.grade = 10 AND
        student_grade.class_name = 'Math'
    ORDER BY
        student_grade.student_id

But the application is interested in displaying the full name of a student, not just the identifier, so we need info from the student table as well.

In order to filter the student records that have a 10 grade in Math, we can use the EXISTS SQL operator, like this:

    SELECT
        id, first_name, last_name
    FROM
        student
    WHERE EXISTS (
        SELECT 1
        FROM
            student_grade
        WHERE
            student_grade.student_id = student.id AND
            student_grade.grade = 10 AND
            student_grade.class_name = 'Math'
    )
    ORDER BY id

When running the query above, we can see that only the Alice row is selected:

| id | first_name | last_name |
|----|------------|-----------|
| 1  | Alice      | Smith     |

The outer query selects the student row columns we are interested in returning to the client. However, the WHERE clause is using the EXISTS operator with an associated inner subquery.

The EXISTS operator returns true if the subquery returns at least one record and false if no row is selected. The database engine does not have to run the subquery entirely. If a single record is matched, the EXISTS operator returns true, and the associated outer query row is selected.

The inner subquery is correlated because the student_id column of the student_grade table is matched against the id column of the outer student table.

SQL NOT EXISTS

Let's consider we want to select all students that have no grade lower than 9. For this, we can use NOT EXISTS, which negates the logic of the EXISTS operator.

Therefore, the NOT EXISTS operator returns true if the underlying subquery returns no record. However, if a single record is matched by the inner subquery, the NOT EXISTS operator will return false, and the subquery execution can be stopped.

To match all student records that have no associated student_grade with a value lower than 9, we can run the following SQL query:

    SELECT
        id, first_name, last_name
    FROM
        student
    WHERE NOT EXISTS (
        SELECT 1
        FROM
            student_grade
        WHERE
            student_grade.student_id = student.id AND
            student_grade.grade < 9
    )
    ORDER BY id

When running the query above, we can see that only the Alice record is matched:

| id | first_name | last_name |
|----|------------|-----------|
| 1  | Alice      | Smith     |

So, the advantage of using the SQL EXISTS and NOT EXISTS operators is that the inner subquery execution can be stopped as soon as a matching record is found.
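Both student queries can be reproduced end to end with the sample rows from the two tables. This sketch assumes SQLite (through Python's sqlite3 module) as the engine; the column types are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema mirrors the student and student_grade tables; column types are assumptions.
conn.executescript("""
    CREATE TABLE student (
        id              INTEGER PRIMARY KEY,
        first_name      TEXT NOT NULL,
        last_name       TEXT NOT NULL,
        admission_score REAL NOT NULL
    );
    CREATE TABLE student_grade (
        id         INTEGER PRIMARY KEY,
        class_name TEXT NOT NULL,
        grade      REAL NOT NULL,
        student_id INTEGER NOT NULL REFERENCES student (id)
    );
    INSERT INTO student VALUES (1, 'Alice', 'Smith', 8.95), (2, 'Bob', 'Johnson', 8.75);
    INSERT INTO student_grade VALUES
        (1, 'Math', 10, 1),     (2, 'Math', 9.5, 1),    (3, 'Math', 9.75, 1),
        (4, 'Science', 9.5, 1), (5, 'Science', 9, 1),   (6, 'Science', 9.25, 1),
        (7, 'Math', 8.5, 2),    (8, 'Math', 9.5, 2),    (9, 'Math', 9, 2),
        (10, 'Science', 10, 2), (11, 'Science', 9.4, 2);
""")

# EXISTS: students with at least one 10 in Math.
has_math_10 = conn.execute("""
    SELECT id, first_name, last_name
    FROM student
    WHERE EXISTS (SELECT 1 FROM student_grade
                  WHERE student_grade.student_id = student.id
                    AND student_grade.grade = 10
                    AND student_grade.class_name = 'Math')
    ORDER BY id""").fetchall()

# NOT EXISTS: students with no grade below 9.
no_grade_below_9 = conn.execute("""
    SELECT id, first_name, last_name
    FROM student
    WHERE NOT EXISTS (SELECT 1 FROM student_grade
                      WHERE student_grade.student_id = student.id
                        AND student_grade.grade < 9)
    ORDER BY id""").fetchall()

print(has_math_10)       # [(1, 'Alice', 'Smith')]
print(no_grade_below_9)  # [(1, 'Alice', 'Smith')]
```

Bob is excluded from the first result because his only 10 is in Science, and from the second because of his 8.5 in Math.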

User · Answer

I have a table with about 120,000 records and need to select only those that do not exist (matched on a varchar column) in four other tables with approximately 1,500, 4,000, 40,000 and 200 rows. All the involved tables have a unique index on the varchar column concerned.

NOT IN took about 10 minutes; NOT EXISTS took 4 seconds.

I have a recursive query which might have had some untuned section that contributed to the 10 minutes, but the other option taking 4 seconds shows, at least to me, that NOT EXISTS is far better, or at least that IN and EXISTS are not exactly the same and always worth a check before going ahead with code.
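One way to express this kind of four-table check is a chain of NOT EXISTS predicates, one per lookup table. The table names main_table and lookup_a to lookup_d, the code column, and all rows are hypothetical, and SQLite (through Python's sqlite3 module) stands in for the poster's database; the sketch shows the query shape only, not the timings reported above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables; UNIQUE gives each code column an index, as in the answer.
conn.executescript("""
    CREATE TABLE main_table (code VARCHAR(20) NOT NULL UNIQUE);
    CREATE TABLE lookup_a (code VARCHAR(20) NOT NULL UNIQUE);
    CREATE TABLE lookup_b (code VARCHAR(20) NOT NULL UNIQUE);
    CREATE TABLE lookup_c (code VARCHAR(20) NOT NULL UNIQUE);
    CREATE TABLE lookup_d (code VARCHAR(20) NOT NULL UNIQUE);
    INSERT INTO main_table VALUES ('A1'), ('B2'), ('C3'), ('D4'), ('E5');
    INSERT INTO lookup_a VALUES ('A1');
    INSERT INTO lookup_b VALUES ('B2');
    INSERT INTO lookup_c VALUES ('C3');
    INSERT INTO lookup_d VALUES ('D4');
""")

# One NOT EXISTS per lookup table; each probe can use that table's unique index
# and stop at the first match.
missing = conn.execute("""
    SELECT m.code FROM main_table m
    WHERE NOT EXISTS (SELECT 1 FROM lookup_a a WHERE a.code = m.code)
      AND NOT EXISTS (SELECT 1 FROM lookup_b b WHERE b.code = m.code)
      AND NOT EXISTS (SELECT 1 FROM lookup_c c WHERE c.code = m.code)
      AND NOT EXISTS (SELECT 1 FROM lookup_d d WHERE d.code = m.code)
    ORDER BY m.code""").fetchall()

print(missing)  # [('E5',)]
```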

User · Answer

I was using

    SELECT * FROM TABLE1 WHERE Col1 NOT IN (SELECT Col1 FROM TABLE2)

and found that it was giving wrong results (by wrong I mean no results), because there was a NULL in TABLE2.Col1.

Changing the query to

    SELECT * FROM TABLE1 T1 WHERE NOT EXISTS (SELECT Col1 FROM TABLE2 T2 WHERE T1.Col1 = T2.Col1)

gave me the correct results.

Since then I have started using NOT EXISTS everywhere.
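If NOT IN has to stay, filtering NULLs out of its subquery restores the expected result. This sketch assumes SQLite (through Python's sqlite3 module) and hypothetical rows in TABLE1 and TABLE2, including one NULL in TABLE2.Col1; it compares plain NOT IN, a NULL-filtered NOT IN, and NOT EXISTS correlated on Col1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (Col1 INTEGER);
    CREATE TABLE TABLE2 (Col1 INTEGER);
    INSERT INTO TABLE1 VALUES (1), (2), (3);
    INSERT INTO TABLE2 VALUES (1), (NULL);  -- the stray NULL
""")

queries = {
    # Plain NOT IN: the NULL in TABLE2.Col1 makes it return nothing.
    "not_in": "SELECT Col1 FROM TABLE1 WHERE Col1 NOT IN "
              "(SELECT Col1 FROM TABLE2) ORDER BY Col1",
    # NOT IN behaves as intended once NULLs are filtered out of the subquery.
    "not_in_filtered": "SELECT Col1 FROM TABLE1 WHERE Col1 NOT IN "
                       "(SELECT Col1 FROM TABLE2 WHERE Col1 IS NOT NULL) ORDER BY Col1",
    # NOT EXISTS correlated on Col1 ignores the NULL row.
    "not_exists": "SELECT Col1 FROM TABLE1 T1 WHERE NOT EXISTS "
                  "(SELECT 1 FROM TABLE2 T2 WHERE T1.Col1 = T2.Col1) ORDER BY Col1",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
print(results)  # not_in: [], the other two: [(2,), (3,)]
```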

User · Answer

If the optimizer says they are the same, then consider the human factor. I prefer to see NOT EXISTS.
