INNER JOIN vs LEFT JOIN performance in SQL Server

Question

I've created a SQL command that uses INNER JOIN on 9 tables, and this command takes a very long time (more than five minutes). A colleague suggested that I change INNER JOIN to LEFT JOIN because LEFT JOIN performs better, which contradicts what I know. After I changed it, the speed of the query improved significantly.

I would like to know why LEFT JOIN is faster than INNER JOIN.

My SQL command looks like this: SELECT * FROM A INNER JOIN B ON ... INNER JOIN C ON ... INNER JOIN D and so on.

Update: This is a brief outline of my schema.

    FROM sidisaleshdrmly a -- NOT HAVE PK AND FK
        INNER JOIN sidisalesdetmly b -- THIS TABLE ALSO HAVE NO PK AND FK
            ON a.CompanyCd = b.CompanyCd
               AND a.SPRNo = b.SPRNo
               AND a.SuffixNo = b.SuffixNo
               AND a.dnno = b.dnno
        INNER JOIN exFSlipDet h -- PK = CompanyCd, FSlipNo, FSlipSuffix, FSlipLine
            ON a.CompanyCd = h.CompanyCd
               AND a.sprno = h.AcctSPRNo
        INNER JOIN exFSlipHdr c -- PK = CompanyCd, FSlipNo, FSlipSuffix
            ON c.CompanyCd = h.CompanyCd
               AND c.FSlipNo = h.FSlipNo
               AND c.FSlipSuffix = h.FSlipSuffix
        INNER JOIN coMappingExpParty d -- NO PK AND FK
            ON c.CompanyCd = d.CompanyCd
               AND c.CountryCd = d.CountryCd
        INNER JOIN coProduct e -- PK = CompanyCd, ProductSalesCd
            ON b.CompanyCd = e.CompanyCd
               AND b.ProductSalesCd = e.ProductSalesCd
        LEFT JOIN coUOM i -- PK = UOMId
            ON h.UOMId = i.UOMId
        INNER JOIN coProductOldInformation j -- PK = CompanyCd, BFStatus, SpecCd
            ON a.CompanyCd = j.CompanyCd
               AND b.BFStatus = j.BFStatus
               AND b.ProductSalesCd = j.ProductSalesCd
        INNER JOIN coProductGroup1 g1 -- PK = CompanyCd, ProductCategoryCd, UsedDepartment, ProductGroup1Cd
            ON e.ProductGroup1Cd = g1.ProductGroup1Cd
        INNER JOIN coProductGroup2 g2 -- PK = CompanyCd, ProductCategoryCd, UsedDepartment, ProductGroup2Cd
            ON e.ProductGroup1Cd = g2.ProductGroup1Cd

User · Answer

There is one important scenario that can lead to an outer join being faster than an inner join that has not been discussed yet.

When using an outer join, the optimizer is always free to drop the outer joined table from the execution plan if the join columns are the PK of the outer table, and none of the outer table columns are referenced outside of the outer join itself. For example: SELECT A.* FROM A LEFT OUTER JOIN B ON A.KEY=B.KEY, where B.KEY is the PK for B. Both Oracle (I believe I was using release 10) and SQL Server (I used 2008 R2) prune table B from the execution plan.

The same is not necessarily true for an inner join: SELECT A.* FROM A INNER JOIN B ON A.KEY=B.KEY may or may not require B in the execution plan depending on what constraints exist.

If A.KEY is a nullable foreign key referencing B.KEY, then the optimizer cannot drop B from the plan because it must confirm that a B row exists for every A row.

If A.KEY is a mandatory foreign key referencing B.KEY, then the optimizer is free to drop B from the plan because the constraints guarantee the existence of the row. But just because the optimizer can drop the table from the plan, doesn't mean it will. SQL Server 2008 R2 does NOT drop B from the plan. Oracle 10 DOES drop B from the plan. It is easy to see how the outer join will out-perform the inner join on SQL Server in this case.

This is a trivial example, and not practical for a stand-alone query. Why join to a table if you don't need to?

But this could be a very important design consideration when designing views. Frequently a "do-everything" view is built that joins everything a user might need related to a central table (especially if there are naive users doing ad-hoc queries who do not understand the relational model). The view may include all the relevant columns from many tables. But the end users might only access columns from a subset of the tables within the view. If the tables are joined with outer joins, then the optimizer can (and does) drop the unneeded tables from the plan.

It is critical to make sure that the view using outer joins gives the correct results. As Aaronaught has said - you cannot blindly substitute OUTER JOIN for INNER JOIN and expect the same results. But there are times when it can be useful for performance reasons when using views.

One last note - I haven't tested the impact on performance in light of the above, but in theory it seems you should be able to safely replace an INNER JOIN with an OUTER JOIN if you also add the condition <FOREIGN_KEY> IS NOT NULL to the where clause.
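A minimal sketch of that last idea, checking only that the two forms return the same rows. It uses Python's built-in sqlite3 module as a stand-in, so it says nothing about SQL Server's execution plans or timings; the tables A and B and their columns are hypothetical, with A.b_id as a nullable foreign key to B.id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the FK so every non-NULL b_id has a match
conn.executescript("""
    CREATE TABLE B (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    CREATE TABLE A (
        id   INTEGER PRIMARY KEY,
        b_id INTEGER REFERENCES B(id)   -- nullable foreign key (hypothetical schema)
    );
    INSERT INTO B (id, label) VALUES (1, 'one'), (2, 'two');
    INSERT INTO A (id, b_id) VALUES (10, 1), (11, 2), (12, NULL), (13, 1);
""")

# Plain inner join: the row with a NULL foreign key has no match and is dropped.
inner_rows = conn.execute(
    "SELECT A.id, B.label FROM A INNER JOIN B ON A.b_id = B.id ORDER BY A.id"
).fetchall()

# Outer join plus an explicit IS NOT NULL filter on the foreign key.
outer_filtered_rows = conn.execute(
    "SELECT A.id, B.label FROM A LEFT OUTER JOIN B ON A.b_id = B.id "
    "WHERE A.b_id IS NOT NULL ORDER BY A.id"
).fetchall()

# Outer join without the filter keeps the NULL row, so it is NOT equivalent.
outer_rows = conn.execute(
    "SELECT A.id, B.label FROM A LEFT OUTER JOIN B ON A.b_id = B.id ORDER BY A.id"
).fetchall()

print(inner_rows == outer_filtered_rows)
print(outer_rows)
```

The equivalence relies on the foreign key being enforced: if orphaned non-NULL values can exist in A.b_id, the filtered outer join would still return them and the two queries would diverge.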

User · Answer

If everything works as it should, it shouldn't, BUT we all know everything doesn't work the way it should, especially when it comes to the query optimizer, query plan caching and statistics.

First I would suggest rebuilding indexes and statistics, then clearing the query plan cache, just to make sure that's not screwing things up. However, I've experienced problems even when that's done.

I've experienced some cases where a left join has been faster than an inner join.

The underlying reason is this: if you have two tables and you join on a column with an index (on both tables), the inner join will produce the same result no matter whether you loop over the entries in the index on table one and match with the index on table two, or do the reverse and loop over the entries in the index on table two and match with the index on table one. The problem is when you have misleading statistics: the query optimizer will use the statistics of the index to find the table with the fewest matching entries (based on your other criteria). Say you have two tables with 1 million rows in each; in table one you have 10 rows matching and in table two you have 100000 rows matching. The best way would be to do an index scan on table one and match 10 times in table two. The reverse would be an index scan that loops over 100000 rows and tries to match 100000 times, and only 10 succeed. So if the statistics aren't correct, the optimizer might choose the wrong table and index to loop over.

If the optimizer chooses to optimize the left join in the order it is written, it will perform better than the inner join.

BUT, the optimizer may also optimize a left join sub-optimally as a left semi join. To make it choose the one you want, you can use the force order hint.
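The 10-versus-100000 arithmetic above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how SQL Server executes a join: a Python set stands in for an index, one membership test stands in for one index probe, and the key ranges are made up so that exactly 10 keys match.

```python
def nested_loop_join(outer_keys, inner_index):
    """Toy index nested-loop join: one probe of the inner index per outer row."""
    probes = 0
    matches = []
    for key in outer_keys:
        probes += 1                 # each outer row costs one index probe
        if key in inner_index:
            matches.append(key)
    return matches, probes

# Hypothetical qualifying rows: 10 in table one, 100000 in table two, 10 in common.
table_one_keys = list(range(10))
table_two_keys = list(range(100_000))

# Drive the loop from the small side and probe the large side.
matches_a, probes_a = nested_loop_join(table_one_keys, set(table_two_keys))

# Drive the loop from the large side and probe the small side.
matches_b, probes_b = nested_loop_join(table_two_keys, set(table_one_keys))

print(len(matches_a), probes_a)
print(len(matches_b), probes_b)
```

Both orders produce the same matches; only the amount of work differs, which is why a wrong row estimate can push the optimizer toward the expensive order.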

User · Answer

From my comparisons, I find that they have the exact same execution plan. There are three scenarios:

1. If and when they return the same results, they have the same speed. However, we must keep in mind that they are not the same queries, and that LEFT JOIN will possibly return more results (when some ON conditions aren't met); this is why it's usually slower.
2. When the main table (the first non-const one in the execution plan) has a restrictive condition (a WHERE condition on id) and the corresponding ON condition is on a NULL value, the "right" table is not joined; this is when LEFT JOIN is faster.
3. As discussed in Point 1, INNER JOIN is usually more restrictive, returns fewer results, and is therefore faster.

Both use (the same) indices.

User · Answer

Your performance problems are more likely to be because of the number of joins you are doing and whether the columns you are joining on have indexes or not.

Worst case, you could easily be doing 9 whole table scans for each join.

User · Answer

A LEFT JOIN is absolutely not faster than an INNER JOIN. In fact, it's slower; by definition, an outer join (LEFT JOIN or RIGHT JOIN) has to do all the work of an INNER JOIN plus the extra work of null-extending the results. It would also be expected to return more rows, further increasing the total execution time simply due to the larger size of the result set.

(And even if a LEFT JOIN were faster in specific situations due to some difficult-to-imagine confluence of factors, it is not functionally equivalent to an INNER JOIN, so you cannot simply go replacing all instances of one with the other!)

Most likely your performance problems lie elsewhere, such as not having a candidate key or foreign key indexed properly. 9 tables is quite a lot to be joining, so the slowdown could literally be almost anywhere. If you post your schema, we might be able to provide more details.

Edit:

Reflecting further on this, I could think of one circumstance under which a LEFT JOIN might be faster than an INNER JOIN, and that is when:

- Some of the tables are very small (say, under 10 rows);
- The tables do not have sufficient indexes to cover the query.

Consider this example:

    CREATE TABLE #Test1
    (
        ID int NOT NULL PRIMARY KEY,
        Name varchar(50) NOT NULL
    )
    INSERT #Test1 (ID, Name) VALUES (1, 'One')
    INSERT #Test1 (ID, Name) VALUES (2, 'Two')
    INSERT #Test1 (ID, Name) VALUES (3, 'Three')
    INSERT #Test1 (ID, Name) VALUES (4, 'Four')
    INSERT #Test1 (ID, Name) VALUES (5, 'Five')

    CREATE TABLE #Test2
    (
        ID int NOT NULL PRIMARY KEY,
        Name varchar(50) NOT NULL
    )
    INSERT #Test2 (ID, Name) VALUES (1, 'One')
    INSERT #Test2 (ID, Name) VALUES (2, 'Two')
    INSERT #Test2 (ID, Name) VALUES (3, 'Three')
    INSERT #Test2 (ID, Name) VALUES (4, 'Four')
    INSERT #Test2 (ID, Name) VALUES (5, 'Five')

    SELECT *
    FROM #Test1 t1
    INNER JOIN #Test2 t2
        ON t2.Name = t1.Name

    SELECT *
    FROM #Test1 t1
    LEFT JOIN #Test2 t2
        ON t2.Name = t1.Name

    DROP TABLE #Test1
    DROP TABLE #Test2

If you run this and view the execution plan, you'll see that the INNER JOIN query does indeed cost more than the LEFT JOIN, because it satisfies the two criteria above. It's because SQL Server wants to do a hash match for the INNER JOIN, but does nested loops for the LEFT JOIN; the former is normally much faster, but since the number of rows is so tiny and there's no index to use, the hashing operation turns out to be the most expensive part of the query.

You can see the same effect by writing a program in your favourite programming language to perform a large number of lookups on a list with 5 elements vs. a hash table with 5 elements. Because of the size, the hash table version is actually slower. But increase it to 50 elements, or 5000 elements, and the list version slows to a crawl, because it's O(N) vs. O(1) for the hash table.

But change this query to be on the ID column instead of Name and you'll see a very different story. In that case, it does nested loops for both queries, but the INNER JOIN version is able to replace one of the clustered index scans with a seek, meaning that this will literally be an order of magnitude faster with a large number of rows.

So the conclusion is more or less what I mentioned several paragraphs above: this is almost certainly an indexing or index coverage problem, possibly combined with one or more very small tables. Those are the only circumstances under which SQL Server might sometimes choose a worse execution plan for an INNER JOIN than a LEFT JOIN.
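The list-versus-hash-table experiment described above could be tried with a sketch like the following. The helper names (build, list_lookup, dict_lookup, compare) are made up for illustration, and the timings are deliberately not asserted: they depend on the machine and the runtime, and in Python the dict may well win even at 5 elements because hashing a short string is cheap. What should hold is that the list scan's cost grows with the number of elements while the dict's stays roughly flat.

```python
import timeit

def build(n):
    """Build the same n name -> id pairs as a list of tuples and as a dict."""
    names = [f"name{i}" for i in range(n)]
    as_list = [(name, i) for i, name in enumerate(names)]
    as_dict = dict(as_list)
    return names, as_list, as_dict

def list_lookup(pairs, key):
    """Linear scan, O(N): comparable to nested loops without an index."""
    for k, v in pairs:
        if k == key:
            return v
    return None

def dict_lookup(table, key):
    """Hash lookup, O(1) on average: comparable to probing a hash table."""
    return table.get(key)

def compare(n, repeats=200):
    """Time both lookups for the last key (the worst case for the list scan)."""
    names, as_list, as_dict = build(n)
    probe = names[-1]
    t_list = timeit.timeit(lambda: list_lookup(as_list, probe), number=repeats)
    t_dict = timeit.timeit(lambda: dict_lookup(as_dict, probe), number=repeats)
    return t_list, t_dict

for n in (5, 50, 5000):
    t_list, t_dict = compare(n)
    print(f"n={n:>5}: list scan {t_list:.6f}s, hash lookup {t_dict:.6f}s")
```

The sizes 5, 50 and 5000 mirror the ones mentioned in the answer; the fixed cost of building the hash table, which is what hurts the tiny hash match in the query plan, is not included in these timings.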

User · Answer

Have done a number of comparisons between left outer and inner joins and have not been able to find a consistent difference. There are many variables. Am working on a reporting database with thousands of tables, many with a large number of fields and many changes over time (vendor versions and local workflow). It is not possible to create all of the combinations of covering indexes to meet the needs of such a wide variety of queries and handle historical data. Have seen inner join queries kill server performance because two large (millions to tens of millions of rows) tables are inner joined, both pulling a large number of fields, and no covering index exists.

The biggest issue, though, doesn't seem to appear in the discussions above. Maybe your database is well designed, with triggers and well designed transaction processing to ensure good data. Mine frequently has NULL values where they aren't expected. Yes, the table definitions could enforce no-NULLs, but that isn't an option in my environment.

So the question is: do you design your query only for speed, a higher priority for transaction processing that runs the same code thousands of times a minute? Or do you go for the accuracy that a left outer join will provide? Remember that inner joins must find matches on both sides, so an unexpected NULL will not only remove data from the two tables but possibly entire rows of information. And it happens so nicely, with no error messages.

You can be very fast at getting 90% of the needed data and not discover that the inner joins have silently removed information. Sometimes inner joins can be faster, but I don't believe anyone should make that assumption unless they have reviewed the execution plan. Speed is important, but accuracy is more important.
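A small sketch of the silent data loss described above, using Python's sqlite3 module purely to check result semantics (the orders and customers tables are hypothetical, and nothing here reflects SQL Server performance). One order has an unexpected NULL customer_id; the inner join drops it without any error, and a total computed over the join comes out short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    -- order 102 has an unexpected NULL where a customer id was supposed to be
    INSERT INTO orders VALUES (100, 1, 50), (101, 2, 70), (102, NULL, 30);
""")

join_condition = "ON o.customer_id = c.customer_id"

# Inner join: order 102 disappears, so the total is silently too low.
inner_total = conn.execute(
    f"SELECT SUM(o.amount) FROM orders o INNER JOIN customers c {join_condition}"
).fetchone()[0]

# Left outer join: every order survives, matched or not.
left_total = conn.execute(
    f"SELECT SUM(o.amount) FROM orders o LEFT JOIN customers c {join_condition}"
).fetchone()[0]

# The outer join also makes it easy to list the rows the inner join would lose.
unmatched = conn.execute(
    f"SELECT o.order_id FROM orders o LEFT JOIN customers c {join_condition} "
    "WHERE c.customer_id IS NULL"
).fetchall()

print(inner_total, left_total, unmatched)
```

The last query is a cheap audit: running it before switching a report to inner joins shows exactly which rows would go missing.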

User · Answer

Try both queries (the one with inner and the one with left join) with OPTION (FORCE ORDER) at the end and post the results. OPTION (FORCE ORDER) is a query hint that forces the optimizer to build the execution plan with the join order you provided in the query.

If INNER JOIN starts performing as fast as LEFT JOIN, it's because:

- In a query composed entirely of INNER JOINs, the join order doesn't matter. This gives the query optimizer freedom to order the joins as it sees fit, so the problem might lie with the optimizer.
- With LEFT JOIN, that's not the case, because changing the join order will alter the results of the query. This means the engine must follow the join order you provided in the query, which might be better than the optimized one.

Don't know if this answers your question, but I was once on a project that featured highly complex queries making calculations, which completely messed up the optimizer. We had cases where a FORCE ORDER would reduce the execution time of a query from 5 minutes to 10 seconds.
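The point about join order can be checked at the level of results with a short sketch. It uses Python's sqlite3 module and two hypothetical single-column tables a and b; it only shows that swapping the operands leaves an INNER JOIN's result unchanged while changing a LEFT JOIN's result, and it does not model FORCE ORDER or SQL Server's optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k INTEGER);
    CREATE TABLE b (k INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

def rows(sql):
    """Run a query and return its rows as a set, so row order is irrelevant."""
    return set(conn.execute(sql).fetchall())

# INNER JOIN: swapping the two tables gives the same rows.
inner_ab = rows("SELECT a.k, b.k FROM a INNER JOIN b ON a.k = b.k")
inner_ba = rows("SELECT a.k, b.k FROM b INNER JOIN a ON a.k = b.k")

# LEFT JOIN: the preserved side changes, and so does the result.
left_ab = rows("SELECT a.k, b.k FROM a LEFT JOIN b ON a.k = b.k")
left_ba = rows("SELECT a.k, b.k FROM b LEFT JOIN a ON a.k = b.k")

print(inner_ab == inner_ba)
print(left_ab == left_ba)
```

Because the inner join is symmetric, an optimizer may pick either order; with the left join, each order is a different query, which restricts how freely it can be rearranged.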

User · Answer

I found something interesting in SQL Server when checking whether inner joins are faster than left joins.

If you don't include the items of the left joined table in the select statement, the left join will be faster than the same query with an inner join.

If you do include the left joined table in the select statement, the inner join with the same query was equal to or faster than the left join.

User · Answer

Outer joins can offer superior performance when used in views.

Say you have a query that involves a view, and that view is comprised of 10 tables joined together. Say your query only happens to use columns from 3 out of those 10 tables.

If those 10 tables had been inner-joined together, then the query optimizer would have to join them all even though your query itself doesn't need 7 out of 10 of the tables. That's because the inner joins themselves might filter down the data, making them essential to compute.

If those 10 tables had been outer-joined together instead, then the query optimizer would only actually join the ones that were necessary: 3 out of 10 of them in this case. That's because the joins themselves are no longer filtering the data, and thus unused joins can be skipped.

Source: http://www.sqlservercentral.com/blogs/sql_coach/2010/07/29/poor-little-misunderstood-views/
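The reason an unused outer join can be skipped, while an unused inner join cannot, shows up in the results of a toy view. The sketch below uses Python's sqlite3 module and hypothetical tables central and extra; it checks only what rows each view returns, not whether any particular optimizer actually prunes the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE central (id INTEGER PRIMARY KEY, name TEXT, extra_id INTEGER);
    CREATE TABLE extra (id INTEGER PRIMARY KEY, note TEXT);
    INSERT INTO extra VALUES (1, 'x');
    -- row 2 has no extra_id, row 3 points at an extra row that does not exist
    INSERT INTO central VALUES (1, 'first', 1), (2, 'second', NULL), (3, 'third', 99);

    CREATE VIEW v_outer AS
        SELECT c.id, c.name, e.note
        FROM central c LEFT JOIN extra e ON c.extra_id = e.id;

    CREATE VIEW v_inner AS
        SELECT c.id, c.name, e.note
        FROM central c INNER JOIN extra e ON c.extra_id = e.id;
""")

# The caller only asks for columns that come from the central table.
base = conn.execute("SELECT id, name FROM central ORDER BY id").fetchall()
via_outer = conn.execute("SELECT id, name FROM v_outer ORDER BY id").fetchall()
via_inner = conn.execute("SELECT id, name FROM v_inner ORDER BY id").fetchall()

print(via_outer == base)
print(via_inner)
```

Since extra.id is a primary key, the left join can neither remove nor duplicate central rows, so answering from central alone gives the same rows. The inner join filters rows, so it has to be evaluated even though none of its columns are selected.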
