SQL Joins Vs SQL Subqueries (Performance)?

Question

I wish to know: if I have a join query something like this:

    SELECT E.Id, E.Name FROM Employee E JOIN Dept D ON E.DeptId = D.Id

and a subquery something like this:

    SELECT E.Id, E.Name FROM Employee E WHERE E.DeptId IN (SELECT Id FROM Dept)

When I consider performance, which of the two queries would be faster, and why?

Also, is there a time when I should prefer one over the other?

Sorry if this is too trivial and has been asked before, but I am confused about it. Also, it would be great if you could suggest tools I should use to measure the performance of the two queries. Thanks a lot!

User · Answer

The performance should be the same; it's much more important to have the correct indexes and clustering applied on your tables (there exist some good resources on that topic).

(Edited to reflect the updated question.)

User · Answer

The two queries may not be semantically equivalent. If an employee works for more than one department (possible in the enterprise I work for; admittedly, this would imply your table is not fully normalized), then the first query would return duplicate rows whereas the second query would not. To make the queries equivalent in this case, the DISTINCT keyword would have to be added to the SELECT clause, which may have an impact on performance.

Note there is a design rule of thumb that states a table should model an entity/class or a relationship between entities/classes, but not both. Therefore, I suggest you create a third table, say OrgChart, to model the relationship between employees and departments.
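
The duplicate-row difference described above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the real database, and the OrgChart columns (EmployeeId, DeptId) and the sample rows are assumptions made up for illustration, not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Dept (Id INTEGER PRIMARY KEY, Name TEXT);
    -- Hypothetical relationship table: one row per (employee, department) pair.
    CREATE TABLE OrgChart (EmployeeId INTEGER, DeptId INTEGER);

    INSERT INTO Employee VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Dept VALUES (10, 'Sales'), (20, 'Support');
    -- Ann works for two departments, Bob for one.
    INSERT INTO OrgChart VALUES (1, 10), (1, 20), (2, 10);
""")

# JOIN: one output row per matching OrgChart row, so Ann is listed twice.
join_rows = conn.execute("""
    SELECT E.Id, E.Name
    FROM Employee E
    JOIN OrgChart O ON O.EmployeeId = E.Id
    ORDER BY E.Id
""").fetchall()

# IN: each employee is tested once, so there are no duplicates.
in_rows = conn.execute("""
    SELECT E.Id, E.Name
    FROM Employee E
    WHERE E.Id IN (SELECT EmployeeId FROM OrgChart)
    ORDER BY E.Id
""").fetchall()

# DISTINCT makes the JOIN return the same rows as the IN version.
distinct_join_rows = conn.execute("""
    SELECT DISTINCT E.Id, E.Name
    FROM Employee E
    JOIN OrgChart O ON O.EmployeeId = E.Id
    ORDER BY E.Id
""").fetchall()

print(join_rows)
print(in_rows)
print(distinct_join_rows)
```

Whether the extra DISTINCT costs anything noticeable depends on the engine and the data, so it is worth measuring rather than assuming.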

User · Answer

Performance is based on the amount of data you are executing on.

If it is less data, around 20k rows, JOIN works better.

If it is more data, like 100k rows or more, then IN works better.

If you do not need the data from the other table, IN is good, but it is always better to go for EXISTS.

I tested all of these criteria, and the tables have proper indexes.

User · Answer

Well, I believe it's an "Old but Gold" question. The answer is: "It depends!". Performance is such a delicate subject that it would be silly to say: "Never use subqueries, always join". In the following links, you'll find some basic best practices that I have found to be very helpful:

    Optimizing Subqueries
    Optimizing Subqueries with Semijoin Transformations
    Rewriting Subqueries as Joins

I have a table with 50000 elements; the result I was looking for was 739 elements.

My query at first was this:

    SELECT  p.id,
        p.fixedId,
        p.azienda_id,
        p.categoria_id,
        p.linea,
        p.tipo,
        p.nome
    FROM prodotto p
    WHERE p.azienda_id = 2699 AND p.anno = (
        SELECT MAX(p2.anno)
        FROM prodotto p2
        WHERE p2.fixedId = p.fixedId
    )

and it took 7.9s to execute.

My query at last is this:

    SELECT  p.id,
        p.fixedId,
        p.azienda_id,
        p.categoria_id,
        p.linea,
        p.tipo,
        p.nome
    FROM prodotto p
    WHERE p.azienda_id = 2699 AND (p.fixedId, p.anno) IN
    (
        SELECT p2.fixedId, MAX(p2.anno)
        FROM prodotto p2
        WHERE p.azienda_id = p2.azienda_id
        GROUP BY p2.fixedId
    )

and it took 0.0256s.

Good SQL, good.
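
To check that a rewrite like the one above still returns the same rows, the two forms can be run side by side on a tiny table. This is only a sketch: it uses Python's sqlite3 module instead of the original database, the sample rows are invented, and the column list is trimmed. It also assumes a SQLite version with row-value support (3.15 or later). Note that the two queries agree here because each fixedId belongs to a single azienda_id in the sample data; the first form takes the maximum year over all companies, the second only within the same company.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prodotto (
        id INTEGER PRIMARY KEY,
        fixedId INTEGER,
        azienda_id INTEGER,
        anno INTEGER,
        nome TEXT
    );
    -- Invented sample rows: two versions of product 100, one of product 200,
    -- and one product that belongs to a different company.
    INSERT INTO prodotto VALUES
        (1, 100, 2699, 2019, 'a-old'),
        (2, 100, 2699, 2021, 'a-new'),
        (3, 200, 2699, 2020, 'b'),
        (4, 300, 1234, 2022, 'other-company');
""")

# First form: scalar subquery correlated on fixedId.
correlated = conn.execute("""
    SELECT p.id, p.fixedId, p.anno
    FROM prodotto p
    WHERE p.azienda_id = 2699
      AND p.anno = (SELECT MAX(p2.anno)
                    FROM prodotto p2
                    WHERE p2.fixedId = p.fixedId)
    ORDER BY p.id
""").fetchall()

# Second form: (fixedId, anno) pair matched against a grouped subquery.
grouped = conn.execute("""
    SELECT p.id, p.fixedId, p.anno
    FROM prodotto p
    WHERE p.azienda_id = 2699
      AND (p.fixedId, p.anno) IN (
          SELECT p2.fixedId, MAX(p2.anno)
          FROM prodotto p2
          WHERE p2.azienda_id = p.azienda_id
          GROUP BY p2.fixedId)
    ORDER BY p.id
""").fetchall()

print(correlated)
print(grouped)
```

The timings quoted in the answer come from its author's own server; this sketch only checks that the results match and says nothing about speed.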

User · Answer

Start by looking at the execution plans to see the differences in how SQL Server will interpret them. You can also use Profiler to actually run the queries multiple times and get the difference.

I would not expect these to be so horribly different. Where you can get real, large performance gains from using joins instead of subqueries is when you use correlated subqueries.

EXISTS is often better than either of these two, and when you are talking about left joins where you want all records not in the left-joined table, then NOT EXISTS is often a much better choice.
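
For readers unfamiliar with the EXISTS and NOT EXISTS forms mentioned above, here is a minimal sketch of how they are written next to the LEFT JOIN ... IS NULL pattern. It runs on Python's sqlite3 module with made-up rows, so it shows only that the forms return the same results, not which one is faster on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, DeptId INTEGER);
    INSERT INTO Dept VALUES (10, 'Sales'), (20, 'Support');
    -- Cy points at a department id that does not exist.
    INSERT INTO Employee VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cy', 99);
""")

# Employees that have a matching department.
exists_rows = conn.execute("""
    SELECT E.Id, E.Name
    FROM Employee E
    WHERE EXISTS (SELECT 1 FROM Dept D WHERE D.Id = E.DeptId)
    ORDER BY E.Id
""").fetchall()

# Employees without a matching department, written as an anti-join.
left_join_rows = conn.execute("""
    SELECT E.Id, E.Name
    FROM Employee E
    LEFT JOIN Dept D ON D.Id = E.DeptId
    WHERE D.Id IS NULL
    ORDER BY E.Id
""").fetchall()

# The same question asked with NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT E.Id, E.Name
    FROM Employee E
    WHERE NOT EXISTS (SELECT 1 FROM Dept D WHERE D.Id = E.DeptId)
    ORDER BY E.Id
""").fetchall()

print(exists_rows)
print(left_join_rows)
print(not_exists_rows)
```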

User · Answer

I would EXPECT the first query to be quicker, mainly because you have an equivalence and an explicit JOIN. In my experience IN is a very slow operator, since SQL normally evaluates it as a series of WHERE clauses separated by "OR" (WHERE x=Y OR x=Z OR ...).

As with ALL THINGS SQL though, your mileage may vary. The speed will depend a lot on indexes (do you have indexes on both ID columns? That will help a lot...) among other things.

The only REAL way to tell with 100% certainty which is faster is to turn on performance tracking (IO Statistics is especially useful) and run them both. Make sure to clear your cache between runs!
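
The "run them both and measure" advice can be sketched as a small timing harness. Everything here is an assumption for illustration: the data is synthetic, the engine is SQLite through Python's sqlite3 module rather than SQL Server, and the helper name time_query is made up. Unlike the cold-cache procedure recommended above, it reports the best of several warm runs, so treat the numbers as a rough comparison only.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, DeptId INTEGER);
    CREATE INDEX idx_employee_deptid ON Employee (DeptId);
""")
# Synthetic data: 100 departments, 20000 employees, some with no department.
conn.executemany("INSERT INTO Dept VALUES (?, ?)",
                 [(d, f"dept{d}") for d in range(1, 101)])
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(e, f"emp{e}", e % 120 + 1) for e in range(1, 20001)])

JOIN_SQL = """
    SELECT E.Id, E.Name
    FROM Employee E JOIN Dept D ON E.DeptId = D.Id
"""
IN_SQL = """
    SELECT E.Id, E.Name
    FROM Employee E
    WHERE E.DeptId IN (SELECT Id FROM Dept)
"""

def time_query(sql, repeats=5):
    """Run sql several times; return (best elapsed seconds, row count)."""
    best = float("inf")
    row_count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        row_count = len(conn.execute(sql).fetchall())
        best = min(best, time.perf_counter() - start)
    return best, row_count

join_time, join_count = time_query(JOIN_SQL)
in_time, in_count = time_query(IN_SQL)

print(f"JOIN: {join_count} rows, best of 5: {join_time:.4f}s")
print(f"IN:   {in_count} rows, best of 5: {in_time:.4f}s")
```

Checking that both queries return the same number of rows before comparing times guards against measuring two queries that are not actually equivalent.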

User · Answer

You can use an Explain Plan to get an objective answer.

For your problem, an Exists filter would probably perform the fastest.
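
As a rough illustration of comparing plans, the sketch below asks SQLite (through Python's sqlite3 module) for its plan for the JOIN, IN and EXISTS forms of the question's query. SQLite's EXPLAIN QUERY PLAN is used here only as a stand-in; other engines expose plans through their own tools and the output looks different. The index name is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, DeptId INTEGER);
    CREATE INDEX idx_employee_deptid ON Employee (DeptId);
""")

queries = {
    "join": """SELECT E.Id, E.Name
               FROM Employee E JOIN Dept D ON E.DeptId = D.Id""",
    "in": """SELECT E.Id, E.Name
             FROM Employee E
             WHERE E.DeptId IN (SELECT Id FROM Dept)""",
    "exists": """SELECT E.Id, E.Name
                 FROM Employee E
                 WHERE EXISTS (SELECT 1 FROM Dept D WHERE D.Id = E.DeptId)""",
}

plans = {}
for name, sql in queries.items():
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    plans[name] = [row[-1] for row in rows]
    print(name)
    for step in plans[name]:
        print("   ", step)
```

The exact wording of each step varies between SQLite versions, so the output is best read for its shape (scan versus index search) rather than compared character by character.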

User · Answer

I know this is an old post, but I think this is a very important topic, especially nowadays when we have 10M+ records and talk about terabytes of data.

I will also weigh in with the following observations. I have about 45M records in my table ([data]), and about 300 records in my [cats] table. I have extensive indexing for all of the queries I am about to talk about.

Consider Example 1:

    UPDATE d SET category = c.categoryname
    FROM [data] d
    JOIN [cats] c ON c.id = d.catid

versus Example 2:

    UPDATE d SET category = (SELECT TOP(1) c.categoryname FROM [cats] c WHERE c.id = d.catid)
    FROM [data] d

Example 1 took about 23 mins to run. Example 2 took around 5 mins.

So I would conclude that the subquery in this case is much faster. Of course, keep in mind that I am using M.2 SSD drives capable of I/O on the order of 1GB/sec (that's bytes, not bits), so my indexes are really fast too. This may affect the speeds in your circumstances as well.

If it's a one-off data cleansing, it is probably best to just leave it to run and finish. I use TOP(10000) and see how long it takes, then multiply by the number of records before I hit the big query.

If you are optimizing production databases, I would strongly suggest pre-processing data, i.e. use triggers or job-broker to async update records, so that real-time access retrieves static data.
