Theory question here:
Why does specifying table.field IS NULL or table.field IS NOT NULL not work on a join condition (left or right join for instance) but only in the where condition?
Non working Example:
-this should return all shipments with any returns (non null values) filtered out. However, this returns all shipments regardless if anything meets the [r.id is null] statement.
SELECT
*
FROM
shipments s
LEFT OUTER JOIN returns r
ON s.id = r.id
AND r.id is null
WHERE
s.day >= CURDATE() - INTERVAL 10 DAY
Working example:
-This returns the correct amount of rows which is total shipments, less any related to a returns (non null values).
SELECT
*
FROM
shipments s
LEFT OUTER JOIN returns r
ON s.id = r.id
WHERE
s.day >= CURDATE() - INTERVAL 10 DAY
AND r.id is null
Why is this the case? All other filter conditions between two tables being joined work just fine, but for some reason IS NULL and IS NOT NULL filters do not work unless in the where statement.
What is the reason for this?
Your execution plan should make this clear; the JOIN takes precedence, after which the results are filtered.
Actually NULL filter is not being ignored. Thing is this is how joining two tables work.
I will try to walk down with the steps performed by database server to make it understand.
For example when you execute the query which you said is ignoring the NULL condition.
SELECT
*
FROM
shipments s
LEFT OUTER JOIN returns r
ON s.id = r.id
AND r.id is null
WHERE
s.day >= CURDATE() - INTERVAL 10 DAY
1st thing happened is all the rows from table SHIPMENTS get selected
on next step database server will start selecting one by one record from 2nd(RETURNS) table.
on third step the record from RETURNS table will be qualified against the join conditions you have provided in the query which in this case is (s.id = r.id and r.id is NULL)
note that this qualification applied on third step only decides if server should accept or reject the current record of RETURNS table to append with the selected row of SHIPMENT table. It can in no way effect the selection of record from SHIPMENT table.
And once server is done with joining two tables which contains all the rows of SHIPMENT table and selected rows of RETURNS table it applies the where clause on the intermediate result. so when you put (r.id is NULL) condition in where clause than all the records from the intermediate result with r.id = null gets filtered out.
The WHERE
clause is evaluated after the JOIN
conditions have been processed.
The NULL
part is calculated AFTER the actual join, so that is why it needs to be in the where clause.
You're doing a LEFT OUTTER JOIN
which indicates that you want every tuple from the table on the LEFT of the statement regardless of it has a matching record in the RIGHT table. This being the case, your results are being pruned from the RIGHT table but you're ending up with the same results as if you didn't include the AND at all within the ON clause.
Performing the AND in the WHERE clause causes the prune to happen after the LEFT JOIN takes place.
Source: Stackoverflow.com