[sql] SQL "select where not in subquery" returns no results

Disclaimer: I have figured out the problem (I think), but I wanted to add this issue to Stack Overflow since I couldn't (easily) find it anywhere. Also, someone might have a better answer than I do.

I have a database where one table "Common" is referenced by several other tables. I wanted to see what records in the Common table were orphaned (i.e., had no references from any of the other tables).

I ran this query:

select *
from Common
where common_id not in (select common_id from Table1)
and common_id not in (select common_id from Table2)

I know that there are orphaned records, but no records were returned. Why not?

(This is SQL Server, if it matters.)

This question is related to sql sql-server tsql

The answer is


Update:

These articles in my blog describe the differences between the methods in more detail:


There are three ways to do such a query:

  • LEFT JOIN / IS NULL:

    SELECT  *
    FROM    common
    LEFT JOIN
            table1 t1
    ON      t1.common_id = common.common_id
    WHERE   t1.common_id IS NULL
    
  • NOT EXISTS:

    SELECT  *
    FROM    common
    WHERE   NOT EXISTS
            (
            SELECT  NULL
            FROM    table1 t1
            WHERE   t1.common_id = common.common_id
            )
    
  • NOT IN:

    SELECT  *
    FROM    common
    WHERE   common_id NOT IN
            (
            SELECT  common_id
            FROM    table1 t1
            )
    

When table1.common_id is not nullable, all these queries are semantically the same.

When it is nullable, NOT IN is different, since IN (and, therefore, NOT IN) return NULL when a value does not match anything in a list containing a NULL.

This may be confusing but may become more obvious if we recall the alternate syntax for this:

common_id = ANY
(
SELECT  common_id
FROM    table1 t1
)

The result of this condition is a boolean product of all comparisons within the list. Of course, a single NULL value yields the NULL result which renders the whole result NULL too.

We never cannot say definitely that common_id is not equal to anything from this list, since at least one of the values is NULL.

Suppose we have these data:

common

--
1
3

table1

--
NULL
1
2

LEFT JOIN / IS NULL and NOT EXISTS will return 3, NOT IN will return nothing (since it will always evaluate to either FALSE or NULL).

In MySQL, in case on non-nullable column, LEFT JOIN / IS NULL and NOT IN are a little bit (several percent) more efficient than NOT EXISTS. If the column is nullable, NOT EXISTS is the most efficient (again, not much).

In Oracle, all three queries yield same plans (an ANTI JOIN).

In SQL Server, NOT IN / NOT EXISTS are more efficient, since LEFT JOIN / IS NULL cannot be optimized to an ANTI JOIN by its optimizer.

In PostgreSQL, LEFT JOIN / IS NULL and NOT EXISTS are more efficient than NOT IN, sine they are optimized to an Anti Join, while NOT IN uses hashed subplan (or even a plain subplan if the subquery is too large to hash)


Just off the top of my head...

select c.commonID, t1.commonID, t2.commonID
from Common c
     left outer join Table1 t1 on t1.commonID = c.commonID
     left outer join Table2 t2 on t2.commonID = c.commonID
where t1.commonID is null 
     and t2.commonID is null

I ran a few tests and here were my results w.r.t. @patmortech's answer and @rexem's comments.

If either Table1 or Table2 is not indexed on commonID, you get a table scan but @patmortech's query is still twice as fast (for a 100K row master table).

If neither are indexed on commonID, you get two table scans and the difference is negligible.

If both are indexed on commonID, the "not exists" query runs in 1/3 the time.


Table1 or Table2 has some null values for common_id. Use this query instead:

select *
from Common
where common_id not in (select common_id from Table1 where common_id is not null)
and common_id not in (select common_id from Table2 where common_id is not null)

Let's suppose these values for common_id:

Common - 1
Table1 - 2
Table2 - 3, null

We want the row in Common to return, because it doesn't exist in any of the other tables. However, the null throws in a monkey wrench.

With those values, the query is equivalent to:

select *
from Common
where 1 not in (2)
and 1 not in (3, null)

That is equivalent to:

select *
from Common
where not (1=2)
and not (1=3 or 1=null)

This is where the problem starts. When comparing with a null, the answer is unknown. So the query reduces to

select *
from Common
where not (false)
and not (false or unkown)

false or unknown is unknown:

select *
from Common
where true
and not (unknown)

true and not unkown is also unkown:

select *
from Common
where unknown

The where condition does not return records where the result is unkown, so we get no records back.

One way to deal with this is to use the exists operator rather than in. Exists never returns unkown because it operates on rows rather than columns. (A row either exists or it doesn't; none of this null ambiguity at the row level!)

select *
from Common
where not exists (select common_id from Table1 where common_id = Common.common_id)
and not exists (select common_id from Table2 where common_id = Common.common_id)

SELECT T.common_id
  FROM Common T
       LEFT JOIN Table1 T1 ON T.common_id = T1.common_id
       LEFT JOIN Table2 T2 ON T.common_id = T2.common_id
 WHERE T1.common_id IS NULL
   AND T2.common_id IS NULL

Please follow the below example to understand the above topic:

Also you can visit the following link to know Anti join

select department_name,department_id from hr.departments dep
where not exists 
    (select 1 from hr.employees emp
    where emp.department_id=dep.department_id
    )
order by dep.department_name;
DEPARTMENT_NAME DEPARTMENT_ID
Benefits    160
Construction    180
Contracting 190
.......

But if we use NOT IN in that case we do not get any data.

select Department_name,department_id from hr.departments dep 
where department_id not in (select department_id from hr.employees );

no data found

This is happening as (select department_id from hr.employees) is returning a null value and the entire query is evaluated as false. We can see it if we change the SQL slightly like below and handle null values with NVL function.

select Department_name,department_id from hr.departments dep 
where department_id not in (select NVL(department_id,0) from hr.employees )

Now we are getting data:

DEPARTMENT_NAME DEPARTMENT_ID
Treasury    120
Corporate Tax   130
Control And Credit  140
Shareholder Services    150
Benefits    160
....

Again we are getting data as we have handled the null value with NVL function.


If you want the world to be a two-valued boolean place, you must prevent the null (third value) case yourself.

Don't write IN clauses that allow nulls in the list side. Filter them out!

common_id not in
(
  select common_id from Table1
  where common_id is not null
)

I had an example where I was looking up and because one table held the value as a double, the other as a string, they would not match (or not match without a cast). But only NOT IN. As SELECT ... IN ... worked. Weird, but thought I would share in case anyone else encounters this simple fix.


select *
from Common c
where not exists (select t1.commonid from table1 t1 where t1.commonid = c.commonid)
and not exists (select t2.commonid from table2 t2 where t2.commonid = c.commonid)

this worked for me :)

select * from Common

where

common_id not in (select ISNULL(common_id,'dummy-data') from Table1)

and common_id not in (select ISNULL(common_id,'dummy-data') from Table2)


select *,
(select COUNT(ID)  from ProductMaster where ProductMaster.CatID = CategoryMaster.ID) as coun 
from CategoryMaster

Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to sql-server

Passing multiple values for same variable in stored procedure SQL permissions for roles Count the Number of Tables in a SQL Server Database Visual Studio 2017 does not have Business Intelligence Integration Services/Projects ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database How to create temp table using Create statement in SQL Server? SQL Query Where Date = Today Minus 7 Days How do I pass a list as a parameter in a stored procedure? SQL Server date format yyyymmdd

Examples related to tsql

Passing multiple values for same variable in stored procedure Count the Number of Tables in a SQL Server Database Change Date Format(DD/MM/YYYY) in SQL SELECT Statement Stored procedure with default parameters Format number as percent in MS SQL Server EXEC sp_executesql with multiple parameters SQL Server after update trigger How to compare datetime with only date in SQL Server Text was truncated or one or more characters had no match in the target code page including the primary key in an unpivot Printing integer variable and string on same line in SQL