Join vs sub-query

Question

I am an old-school MySQL user and have always preferred JOIN over sub-query. But nowadays everyone uses sub-queries, and I hate it; I don't know why. I lack the theoretical knowledge to judge for myself if there is any difference. Is a sub-query as good as a JOIN, and therefore is there nothing to worry about?

User · Answer

Sub-queries are the logically correct way to solve problems of the form, "Get facts from A, conditional on facts from B". In such instances, it makes more logical sense to stick B in a sub-query than to do a join. It is also safer, in a practical sense, since you don't have to be cautious about getting duplicated facts from A due to multiple matches against B.

Practically speaking, however, the answer usually comes down to performance. Some optimisers suck lemons when given a join vs a sub-query, and some suck lemons the other way, and this is optimiser-specific, DBMS-version-specific and query-specific.

Historically, explicit joins usually win, hence the established wisdom that joins are better, but optimisers are getting better all the time, and so I prefer to write queries first in a logically coherent way, and then restructure if performance constraints warrant this.

User · Answer

In most cases JOINs are faster than sub-queries, and it is very rare for a sub-query to be faster.

With JOINs, the RDBMS can create an execution plan that is better for your query: it can predict what data should be loaded for processing and save time, unlike with a sub-query, where it will run all the queries and load all their data to do the processing.

The good thing about sub-queries is that they are more readable than JOINs. That's why most people new to SQL prefer them; it is the easy way. But when it comes to performance, JOINs are better in most cases, and they are not hard to read either.

User · Answer

I think what has been under-emphasized in the cited answers is the issue of duplicates and problematic results that may arise from specific use cases (although Marcelo Cantos does mention it).

I will cite the example from Stanford's Lagunita courses on SQL.

Student Table

    +-----+--------+-----+--------+
    | sID | sName  | GPA | sizeHS |
    +-----+--------+-----+--------+
    | 123 | Amy    | 3.9 |   1000 |
    | 234 | Bob    | 3.6 |   1500 |
    | 345 | Craig  | 3.5 |    500 |
    | 456 | Doris  | 3.9 |   1000 |
    | 567 | Edward | 2.9 |   2000 |
    | 678 | Fay    | 3.8 |    200 |
    | 789 | Gary   | 3.4 |    800 |
    | 987 | Helen  | 3.7 |    800 |
    | 876 | Irene  | 3.9 |    400 |
    | 765 | Jay    | 2.9 |   1500 |
    | 654 | Amy    | 3.9 |   1000 |
    | 543 | Craig  | 3.4 |   2000 |
    +-----+--------+-----+--------+

Apply Table (applications made to specific universities and majors)

    +-----+----------+----------------+----------+
    | sID | cName    | major          | decision |
    +-----+----------+----------------+----------+
    | 123 | Stanford | CS             | Y        |
    | 123 | Stanford | EE             | N        |
    | 123 | Berkeley | CS             | Y        |
    | 123 | Cornell  | EE             | Y        |
    | 234 | Berkeley | biology        | N        |
    | 345 | MIT      | bioengineering | Y        |
    | 345 | Cornell  | bioengineering | N        |
    | 345 | Cornell  | CS             | Y        |
    | 345 | Cornell  | EE             | N        |
    | 678 | Stanford | history        | Y        |
    | 987 | Stanford | CS             | Y        |
    | 987 | Berkeley | CS             | Y        |
    | 876 | Stanford | CS             | N        |
    | 876 | MIT      | biology        | Y        |
    | 876 | MIT      | marine biology | N        |
    | 765 | Stanford | history        | Y        |
    | 765 | Cornell  | history        | N        |
    | 765 | Cornell  | psychology     | Y        |
    | 543 | MIT      | CS             | N        |
    +-----+----------+----------------+----------+

Let's try to find the GPA scores for students that have applied to the CS major (regardless of the university).

Using a subquery:

    select GPA from Student where sID in (select sID from Apply where major = 'CS');

    +-----+
    | GPA |
    +-----+
    | 3.9 |
    | 3.5 |
    | 3.7 |
    | 3.9 |
    | 3.4 |
    +-----+

The average value for this result set is:

    select avg(GPA) from Student where sID in (select sID from Apply where major = 'CS');

    +--------------------+
    | avg(GPA)           |
    +--------------------+
    | 3.6800000000000006 |
    +--------------------+

Using a join:

    select GPA from Student, Apply where Student.sID = Apply.sID and Apply.major = 'CS';

    +-----+
    | GPA |
    +-----+
    | 3.9 |
    | 3.9 |
    | 3.5 |
    | 3.7 |
    | 3.7 |
    | 3.9 |
    | 3.4 |
    +-----+

The average value for this result set is:

    select avg(GPA) from Student, Apply where Student.sID = Apply.sID and Apply.major = 'CS';

    +-------------------+
    | avg(GPA)          |
    +-------------------+
    | 3.714285714285714 |
    +-------------------+

It is obvious that the second attempt yields misleading results in our use case, given that it counts duplicates for the computation of the average value. It is also evident that using distinct with the join-based statement will not eliminate the problem, given that it will erroneously keep only one out of three occurrences of the 3.9 score. The correct result accounts for TWO (2) occurrences of the 3.9 score, given that we actually have TWO (2) students with that score who match our query criteria.

It seems that in some cases a sub-query is the safest way to go, besides any performance issues.
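For anyone who wants to reproduce these numbers, here is a minimal sketch. It assumes Python's built-in sqlite3 module rather than MySQL, so the exact floating-point digits may differ slightly from the output above. The third query (a join against a de-duplicated derived table, aliased `cs`) is my own addition for comparison and is not from the course material.

```python
import sqlite3

# In-memory SQLite database; table and column names follow the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (sID INTEGER PRIMARY KEY, sName TEXT, GPA REAL, sizeHS INTEGER);
    CREATE TABLE Apply (sID INTEGER, cName TEXT, major TEXT, decision TEXT);
""")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", [
    (123, "Amy", 3.9, 1000), (234, "Bob", 3.6, 1500), (345, "Craig", 3.5, 500),
    (456, "Doris", 3.9, 1000), (567, "Edward", 2.9, 2000), (678, "Fay", 3.8, 200),
    (789, "Gary", 3.4, 800), (987, "Helen", 3.7, 800), (876, "Irene", 3.9, 400),
    (765, "Jay", 2.9, 1500), (654, "Amy", 3.9, 1000), (543, "Craig", 3.4, 2000),
])
conn.executemany("INSERT INTO Apply VALUES (?, ?, ?, ?)", [
    (123, "Stanford", "CS", "Y"), (123, "Stanford", "EE", "N"),
    (123, "Berkeley", "CS", "Y"), (123, "Cornell", "EE", "Y"),
    (234, "Berkeley", "biology", "N"), (345, "MIT", "bioengineering", "Y"),
    (345, "Cornell", "bioengineering", "N"), (345, "Cornell", "CS", "Y"),
    (345, "Cornell", "EE", "N"), (678, "Stanford", "history", "Y"),
    (987, "Stanford", "CS", "Y"), (987, "Berkeley", "CS", "Y"),
    (876, "Stanford", "CS", "N"), (876, "MIT", "biology", "Y"),
    (876, "MIT", "marine biology", "N"), (765, "Stanford", "history", "Y"),
    (765, "Cornell", "history", "N"), (765, "Cornell", "psychology", "Y"),
    (543, "MIT", "CS", "N"),
])

# Sub-query: each student is counted once, however many CS applications exist.
avg_via_subquery = conn.execute(
    "SELECT avg(GPA) FROM Student "
    "WHERE sID IN (SELECT sID FROM Apply WHERE major = 'CS')"
).fetchone()[0]

# Plain join: a student with two CS applications contributes two rows.
avg_via_join = conn.execute(
    "SELECT avg(GPA) FROM Student, Apply "
    "WHERE Student.sID = Apply.sID AND Apply.major = 'CS'"
).fetchone()[0]

# Join against de-duplicated student ids (illustrative addition, alias cs).
avg_via_dedup_join = conn.execute(
    "SELECT avg(GPA) FROM Student "
    "JOIN (SELECT DISTINCT sID FROM Apply WHERE major = 'CS') AS cs "
    "ON cs.sID = Student.sID"
).fetchone()[0]

print(avg_via_subquery, avg_via_join, avg_via_dedup_join)
```

De-duplicating the student ids before the join gives the same average as the IN sub-query, because the duplicates come from Apply and not from Student.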

User · Answer

In the year 2010 I would have joined the author of this question and would have strongly voted for JOIN, but with much more experience (especially in MySQL) I can state: yes, subqueries can be better. I've read multiple answers here; some stated that subqueries are faster, but they lacked a good explanation. I hope I can provide one with this (very) late answer.

First of all, let me say the most important thing: there are different forms of sub-queries.

And the second important statement: size matters.

If you use sub-queries, you should be aware of how the DB server executes the sub-query, especially whether the sub-query is evaluated once or for every row. On the other side, a modern DB server is able to optimize a lot. In some cases a subquery helps optimize a query, but a newer version of the DB server might make the optimization obsolete.

Sub-queries in Select-Fields

    SELECT moo, (SELECT roger FROM wilco WHERE moo = me) AS bar FROM foo

Be aware that a sub-query is executed for every resulting row from foo. Avoid this if possible; it may drastically slow down your query on huge datasets. However, if the sub-query has no reference to foo, it can be optimized by the DB server as static content and could be evaluated only once.

Sub-queries in the Where-statement

    SELECT moo FROM foo WHERE bar = (SELECT roger FROM wilco WHERE moo = me)

If you are lucky, the DB optimizes this internally into a JOIN. If not, your query will become very, very slow on huge datasets, because it will execute the sub-query for every row in foo, not just for the result rows as in the select-field case.

Sub-queries in the Join-statement

    SELECT moo, bar
      FROM foo
        LEFT JOIN (
          SELECT MIN(bar) AS bar, me FROM wilco GROUP BY me
        ) AS w ON moo = me

This is interesting. We combine JOIN with a sub-query, and here we get the real strength of sub-queries. Imagine a dataset with millions of rows in wilco but only a few distinct values of me. Instead of joining against a huge table, we now have a smaller temporary table to join against. This can result in much faster queries, depending on database size. You can have the same effect with CREATE TEMPORARY TABLE ... and INSERT INTO ... SELECT ..., which might provide better readability on very complex queries (but can lock datasets in a repeatable read isolation level).

Nested sub-queries

    SELECT moo, bar
      FROM (
        SELECT moo, CONCAT(roger, wilco) AS bar
          FROM foo
          GROUP BY moo
          HAVING bar LIKE 'SpaceQ%'
      ) AS temp_foo
      ORDER BY bar

You can nest sub-queries in multiple levels. This can help on huge datasets if you have to group or sort the results. Usually the DB server creates a temporary table for this, but sometimes you do not need sorting on the whole table, only on the result set. This might provide much better performance, depending on the size of the table.

Conclusion

Sub-queries are no replacement for a JOIN and you should not use them like this (although it is possible). In my humble opinion, the correct use of a sub-query is as a quick replacement for CREATE TEMPORARY TABLE .... A good sub-query reduces a dataset in a way you cannot accomplish in the ON clause of a JOIN. If a sub-query has one of the keywords GROUP BY or DISTINCT, and is preferably not situated in the select fields or the where statement, then it might improve performance a lot.
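To make the "sub-query in the join" case concrete, here is a small sketch. It assumes Python's built-in sqlite3 module instead of MySQL, and the sample rows, the alias `w` and the column name `min_bar` are made up for illustration. It only shows the shape of the query and its result; it does not measure speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (moo INTEGER);
    CREATE TABLE wilco (me INTEGER, bar INTEGER);
""")
conn.executemany("INSERT INTO foo VALUES (?)", [(1,), (2,), (3,)])

# Many rows in wilco, but only two distinct values of me (1 and 2).
conn.executemany(
    "INSERT INTO wilco VALUES (?, ?)",
    [(1 + i % 2, i) for i in range(1000)],
)

# The derived table collapses wilco to one row per me before the join,
# so foo is joined against 2 rows instead of 1000.
rows = conn.execute("""
    SELECT foo.moo, w.min_bar
      FROM foo
      LEFT JOIN (
        SELECT me, MIN(bar) AS min_bar FROM wilco GROUP BY me
      ) AS w ON foo.moo = w.me
     ORDER BY foo.moo
""").fetchall()

for moo, min_bar in rows:
    print(moo, min_bar)
```

The row for moo = 3 comes back with a NULL (None in Python) because the LEFT JOIN keeps rows of foo that have no match in the aggregated table.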

User · Answer

Subqueries have the ability to calculate aggregate functions on the fly. E.g. find the minimal price of a book and get all books which are sold at this price.

1) Using subqueries:

    SELECT titles, price
    FROM Books, Orders
    WHERE price = (SELECT MIN(price) FROM Orders)
      AND (Books.ID = Orders.ID);

2) Using JOINs:

    SELECT MIN(price)
    FROM Orders;
    -----------------
    2.99

    SELECT titles, price
    FROM Books b
    INNER JOIN Orders o ON b.ID = o.ID
    WHERE o.price = 2.99;
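A runnable sketch of the one-statement version, assuming Python's built-in sqlite3 module and a few made-up rows (the book titles and prices here are hypothetical; only the table and column names come from the answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Books (ID INTEGER PRIMARY KEY, titles TEXT);
    CREATE TABLE Orders (ID INTEGER, price REAL);
""")
# Hypothetical sample data: two books share the lowest price.
conn.executemany("INSERT INTO Books VALUES (?, ?)",
                 [(1, "Book A"), (2, "Book B"), (3, "Book C")])
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(1, 2.99), (2, 9.50), (3, 2.99)])

# The scalar sub-query computes MIN(price) inside the same statement,
# so the minimum never has to be looked up first and pasted in by hand.
cheapest = conn.execute("""
    SELECT b.titles, o.price
      FROM Books b
      JOIN Orders o ON b.ID = o.ID
     WHERE o.price = (SELECT MIN(price) FROM Orders)
     ORDER BY b.titles
""").fetchall()

print(cheapest)
```

The join is still there to combine the two tables; the sub-query only supplies the aggregate value to compare against.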

User · Answer

If you want to speed up your query using a join:

For "inner join/join", don't use a WHERE condition; put the condition in the "ON" clause instead. E.g.:

    select id, name from table1 a
      join table2 b on a.name = b.name
      where id = '123'

Try:

    select id, name from table1 a
      join table2 b on a.name = b.name and a.id = '123'

For "left/right join", don't put the condition in "ON", because a left/right join returns all rows of one table anyway, so there is no use filtering in "ON". Use the "WHERE" condition instead.
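A note on what moving the condition actually changes. The sketch below assumes Python's built-in sqlite3 module and made-up rows, and it uses an integer id instead of the quoted '123' from the answer. It only compares results, not speed: for an inner join the two placements return the same rows, while for a left join they return different rows, so there the choice is about correctness first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (name TEXT);
""")
# Hypothetical sample rows.
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(123, "ann"), (456, "bob")])
conn.executemany("INSERT INTO table2 VALUES (?)", [("ann",), ("bob",)])

def run(sql):
    return conn.execute(sql).fetchall()

# Inner join: filter in WHERE vs filter in ON gives identical rows.
inner_where = run("SELECT a.id, a.name FROM table1 a "
                  "JOIN table2 b ON a.name = b.name WHERE a.id = 123")
inner_on = run("SELECT a.id, a.name FROM table1 a "
               "JOIN table2 b ON a.name = b.name AND a.id = 123")

# Left join: the ON filter only limits which rows of table2 match;
# every row of table1 is still returned.
left_on = run("SELECT a.id, b.name FROM table1 a "
              "LEFT JOIN table2 b ON a.name = b.name AND a.id = 123 "
              "ORDER BY a.id")
left_where = run("SELECT a.id, b.name FROM table1 a "
                 "LEFT JOIN table2 b ON a.name = b.name WHERE a.id = 123")

print(inner_where, inner_on, left_on, left_where)
```

Whether the ON placement is any faster for the inner join depends on the optimiser; the example makes no claim about that.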

User · Answer

From my observation there are two cases: if a table has fewer than 100,000 records, then the join will work fast. But if a table has more than 100,000 records, then a subquery gives the best result.

I have one table with 500,000 records, on which I ran the queries below, with these timings:

    SELECT *
    FROM crv.workorder_details wd
    inner join crv.workorder wr on wr.workorder_id = wd.workorder_id;

Result: 13.3 seconds

    select *
    from crv.workorder_details
    where workorder_id in (select workorder_id from crv.workorder)

Result: 1.65 seconds

User · Answer

Taken from the MySQL manual (13.2.10.11 Rewriting Subqueries as Joins):

"A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better, a fact that is not specific to MySQL Server alone."

So subqueries can be slower than LEFT [OUTER] JOIN, but in my opinion their strength is slightly higher readability.

User · Answer

The difference is only seen when the second joined table has significantly more data than the primary table. I had an experience like the one below.

We had a users table of one hundred thousand entries and their membership data (friendship) of about three hundred thousand entries. A join statement was used to fetch friends and their data, but with a great delay. It had been working fine while there was only a small amount of data in the membership table. Once we changed it to use a sub-query, it worked fine.

Meanwhile, the join queries keep working fine against other tables that have fewer entries than the primary table.

So I think both join and sub-query statements work fine, and it depends on the data and the situation.

User · Answer

I was just thinking about the same problem, but I am using a subquery in the FROM part. I need to connect and query large tables; the "slave" table has 28 million records but the result is only 128 rows, so small result, big data! I am using the MAX() function on it.

First I used LEFT JOIN because I thought that was the correct way, MySQL can optimize it, etc. The second time, just for testing, I rewrote it as a sub-select in place of the JOIN.

LEFT JOIN runtime: 1.12s
SUB-SELECT runtime: 0.06s

The sub-select is 18 times faster than the join! Just in the chokito adv. The sub-select looks terrible, but the result ...

User · Answer

Run on a very large database from an old Mambo CMS:

    SELECT id, alias
    FROM
      mos_categories
    WHERE
      id IN (
        SELECT
          DISTINCT catid
        FROM mos_content
      );

0 seconds

    SELECT
      DISTINCT mos_content.catid,
      mos_categories.alias
    FROM
      mos_content, mos_categories
    WHERE
      mos_content.catid = mos_categories.id;

3 seconds

An EXPLAIN shows that they examine the exact same number of rows, but one takes 3 seconds and one is near instant. Moral of the story? If performance is important (when isn't it?), try it multiple ways and see which one is fastest.

And...

    SELECT
      DISTINCT mos_categories.id,
      mos_categories.alias
    FROM
      mos_content, mos_categories
    WHERE
      mos_content.catid = mos_categories.id;

0 seconds

Again, same results, same number of rows examined. My guess is that DISTINCT mos_content.catid takes far longer to figure out than DISTINCT mos_categories.id does.

User · Answer

MySQL version: 5.5.28-0ubuntu0.12.04.2-log

I was also under the impression that JOIN is always better than a sub-query in MySQL, but EXPLAIN is a better way to make a judgment. Here is an example where sub-queries work better than JOINs.

Here is my query with 3 sub-queries:

    EXPLAIN SELECT vrl.list_id, vrl.ontology_id, vrl.position, l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date
    FROM `vote-ranked-listory` vrl
    INNER JOIN lists l ON l.list_id = vrl.list_id
    INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id = vrlih.ontology_id AND vrlih.type = 'PREVIOUS_POSITION'
    INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5
    WHERE vrl.position <= 15 AND l.status = 'ACTIVE' AND l.is_public = 1 AND vrl.ontology_id < 1000000000
      AND (SELECT list_id FROM list_tag WHERE list_id = l.list_id AND tag_id = 43) IS NULL
      AND (SELECT list_id FROM list_tag WHERE list_id = l.list_id AND tag_id = 55) IS NULL
      AND (SELECT list_id FROM list_tag WHERE list_id = l.list_id AND tag_id = 246403) IS NOT NULL
    ORDER BY vrl.moved_date DESC LIMIT 200;

EXPLAIN shows:

    +----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
    | id | select_type        | table    | type   | possible_keys                                       | key          | key_len | ref                                             | rows | Extra                    |
    +----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+
    |  1 | PRIMARY            | vrl      | index  | PRIMARY                                             | moved_date   | 8       | NULL                                            |  200 | Using where              |
    |  1 | PRIMARY            | l        | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY      | 4       | ranker.vrl.list_id                              |    1 | Using where              |
    |  1 | PRIMARY            | vrlih    | eq_ref | PRIMARY                                             | PRIMARY      | 9       | ranker.vrl.list_id,ranker.vrl.ontology_id,const |    1 | Using where              |
    |  1 | PRIMARY            | lbs      | eq_ref | PRIMARY,idx_list_burial_state,burial_score          | PRIMARY      | 4       | ranker.vrl.list_id                              |    1 | Using where              |
    |  4 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
    |  3 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
    |  2 | DEPENDENT SUBQUERY | list_tag | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.l.list_id,const                          |    1 | Using where; Using index |
    +----+--------------------+----------+--------+-----------------------------------------------------+--------------+---------+-------------------------------------------------+------+--------------------------+

The same query with JOINs is:

    EXPLAIN SELECT vrl.list_id, vrl.ontology_id, vrl.position, l.name AS list_name, vrlih.position AS previous_position, vrl.moved_date
    FROM `vote-ranked-listory` vrl
    INNER JOIN lists l ON l.list_id = vrl.list_id
    INNER JOIN `vote-ranked-list-item-history` vrlih ON vrl.list_id = vrlih.list_id AND vrl.ontology_id = vrlih.ontology_id AND vrlih.type = 'PREVIOUS_POSITION'
    INNER JOIN list_burial_state lbs ON lbs.list_id = vrl.list_id AND lbs.burial_score < 0.5
    LEFT JOIN list_tag lt1 ON lt1.list_id = vrl.list_id AND lt1.tag_id = 43
    LEFT JOIN list_tag lt2 ON lt2.list_id = vrl.list_id AND lt2.tag_id = 55
    INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403
    WHERE vrl.position <= 15 AND l.status = 'ACTIVE' AND l.is_public = 1 AND vrl.ontology_id < 1000000000
      AND lt1.list_id IS NULL AND lt2.tag_id IS NULL
    ORDER BY vrl.moved_date DESC LIMIT 200;

and the output is:

    +----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
    | id | select_type | table | type   | possible_keys                                       | key          | key_len | ref                                         | rows | Extra                                        |
    +----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+
    |  1 | SIMPLE      | lt3   | ref    | list_tag_key,list_id,tag_id                         | tag_id       | 5       | const                                       | 2386 | Using where; Using temporary; Using filesort |
    |  1 | SIMPLE      | l     | eq_ref | PRIMARY,status,ispublic,idx_lookup,is_public_status | PRIMARY      | 4       | ranker.lt3.list_id                          |    1 | Using where                                  |
    |  1 | SIMPLE      | vrlih | ref    | PRIMARY                                             | PRIMARY      | 4       | ranker.lt3.list_id                          |  103 | Using where                                  |
    |  1 | SIMPLE      | vrl   | ref    | PRIMARY                                             | PRIMARY      | 8       | ranker.lt3.list_id,ranker.vrlih.ontology_id |   65 | Using where                                  |
    |  1 | SIMPLE      | lt1   | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.lt3.list_id,const                    |    1 | Using where; Using index; Not exists         |
    |  1 | SIMPLE      | lbs   | eq_ref | PRIMARY,idx_list_burial_state,burial_score          | PRIMARY      | 4       | ranker.vrl.list_id                          |    1 | Using where                                  |
    |  1 | SIMPLE      | lt2   | ref    | list_tag_key,list_id,tag_id                         | list_tag_key | 9       | ranker.lt3.list_id,const                    |    1 | Using where; Using index                     |
    +----+-------------+-------+--------+-----------------------------------------------------+--------------+---------+---------------------------------------------+------+----------------------------------------------+

A comparison of the rows column tells the difference, and the query with JOINs is using "Using temporary; Using filesort".

Of course, when I run both queries, the first one is done in 0.02 secs and the second one does not complete even after 1 min, so EXPLAIN explained these queries properly.

If I do not have the INNER JOIN on the list_tag table, i.e. if I remove

    AND (SELECT list_id FROM list_tag WHERE list_id = l.list_id AND tag_id = 246403) IS NOT NULL

from the first query and correspondingly

    INNER JOIN list_tag lt3 ON lt3.list_id = vrl.list_id AND lt3.tag_id = 246403

from the second query, then EXPLAIN returns the same number of rows for both queries, and both queries run equally fast.
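For readers who want to try the two formulations on a small scale, here is a reduced sketch of the same pattern: "has tag 246403 and does not have tag 43", written once with correlated sub-queries and once with joins. It assumes Python's built-in sqlite3 module, a cut-down two-table schema and made-up rows; it checks only that both forms return the same lists, and says nothing about the timings reported above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lists (list_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE list_tag (list_id INTEGER, tag_id INTEGER,
                           UNIQUE (list_id, tag_id));
""")
# Hypothetical sample data.
conn.executemany("INSERT INTO lists VALUES (?, ?)",
                 [(1, "one"), (2, "two"), (3, "three")])
conn.executemany("INSERT INTO list_tag VALUES (?, ?)",
                 [(1, 246403), (2, 246403), (2, 43), (3, 55)])

# Correlated sub-queries, in the style of the first query above.
with_subqueries = conn.execute("""
    SELECT l.list_id
      FROM lists l
     WHERE (SELECT list_id FROM list_tag
             WHERE list_id = l.list_id AND tag_id = 43) IS NULL
       AND (SELECT list_id FROM list_tag
             WHERE list_id = l.list_id AND tag_id = 246403) IS NOT NULL
     ORDER BY l.list_id
""").fetchall()

# INNER JOIN for "must have", LEFT JOIN ... IS NULL for "must not have".
with_joins = conn.execute("""
    SELECT l.list_id
      FROM lists l
      INNER JOIN list_tag lt3
              ON lt3.list_id = l.list_id AND lt3.tag_id = 246403
      LEFT JOIN list_tag lt1
             ON lt1.list_id = l.list_id AND lt1.tag_id = 43
     WHERE lt1.list_id IS NULL
     ORDER BY l.list_id
""").fetchall()

print(with_subqueries, with_joins)
```

The UNIQUE constraint matters for the equivalence: without it, the join form could return a list more than once.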

User · Answer

First of all, to compare the two you should first divide queries with subqueries into:

1. a class of subqueries that always have a corresponding equivalent query written with joins
2. a class of subqueries that cannot be rewritten using joins

For the first class of queries, a good RDBMS will see joins and subqueries as equivalent and will produce the same query plans.

These days even MySQL does that.

Still, sometimes it does not, but this does not mean that joins will always win: I have had cases where using subqueries in MySQL improved performance. (For example, if something prevents the MySQL planner from correctly estimating the cost, and the planner doesn't see the join variant and the subquery variant as the same, then subqueries can outperform the joins by forcing a certain path.)

The conclusion is that you should test your queries in both join and subquery variants if you want to be sure which one will perform better.

For the second class the comparison makes no sense, as those queries cannot be rewritten using joins. In these cases subqueries are the natural way to do the required tasks, and you should not discriminate against them.

User · Answer

It depends on several factors, including the specific query you're running and the amount of data in your database. A subquery runs the inner queries first and then filters the actual results out of that result set, whereas a join runs and produces the result in one go.

The best strategy is to test both the join solution and the subquery solution to find the optimized one.

User · Answer

Use EXPLAIN to see how your database executes the query on your data. There is a huge "it depends" in this answer...

PostgreSQL can rewrite a subquery to a join, or a join to a subquery, when it thinks one is faster than the other. It all depends on the data, indexes, correlation, amount of data, query, etc.
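As a rough illustration of comparing plans, the sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's built-in sqlite3 module. That is an assumption made to keep the example self-contained: MySQL and PostgreSQL have their own EXPLAIN with different output, and the tables `t1`/`t2` and the helper `query_plan` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, parent INTEGER);
    CREATE INDEX t2_parent ON t2 (parent);
""")

def query_plan(sql):
    """Return the plan steps SQLite reports for a statement (hypothetical helper)."""
    # Each result row has four columns; the last one is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

subquery_plan = query_plan(
    "SELECT label FROM t1 WHERE id IN (SELECT parent FROM t2)"
)
join_plan = query_plan(
    "SELECT DISTINCT t1.label FROM t1 JOIN t2 ON t2.parent = t1.id"
)

print("sub-query variant:")
for step in subquery_plan:
    print("  ", step)
print("join variant:")
for step in join_plan:
    print("  ", step)
```

The exact plan text varies between SQLite versions, so read it rather than match on it; the point is simply to look at both variants on your own data before deciding.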

User · Answer

The MSDN documentation for SQL Server says:

"Many Transact-SQL statements that include subqueries can be alternatively formulated as joins. Other questions can be posed only with subqueries. In Transact-SQL, there is usually no performance difference between a statement that includes a subquery and a semantically equivalent version that does not. However, in some cases where existence must be checked, a join yields better performance. Otherwise, the nested query must be processed for each result of the outer query to ensure elimination of duplicates. In such cases, a join approach would yield better results."

So if you need something like

    select * from t1 where exists (select * from t2 where t2.parent = t1.id)

try to use a join instead. In other cases, it makes no difference.

I say: creating functions for subqueries eliminates the problem of clutter and allows you to implement additional logic in subqueries. So I recommend creating functions for subqueries whenever possible.

Clutter in code is a big problem, and the industry has been working on avoiding it for decades.
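One thing to watch when turning an EXISTS check into a join: the join repeats a t1 row for every matching t2 row, so it usually needs DISTINCT (or a unique parent) to return the same result. A minimal sketch, assuming Python's built-in sqlite3 module rather than SQL Server, with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY);
    CREATE TABLE t2 (parent INTEGER);
""")
# Hypothetical data: t1 row 1 has two children, row 2 has one, row 3 none.
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(1,), (1,), (2,)])

exists_rows = conn.execute(
    "SELECT id FROM t1 "
    "WHERE EXISTS (SELECT * FROM t2 WHERE t2.parent = t1.id) ORDER BY id"
).fetchall()

# A plain join returns one row per match, so id 1 appears twice.
join_rows = conn.execute(
    "SELECT t1.id FROM t1 JOIN t2 ON t2.parent = t1.id ORDER BY t1.id"
).fetchall()

# DISTINCT restores the "does a match exist" semantics.
distinct_join_rows = conn.execute(
    "SELECT DISTINCT t1.id FROM t1 JOIN t2 ON t2.parent = t1.id ORDER BY t1.id"
).fetchall()

print(exists_rows, join_rows, distinct_join_rows)
```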

User · Answer

Subqueries are generally used to return a single row as an atomic value, though they may be used to compare values against multiple rows with the IN keyword. They are allowed at nearly any meaningful point in a SQL statement, including the target list, the WHERE clause, and so on. A simple sub-query could be used as a search condition. For example, between a pair of tables:

    SELECT title
    FROM books
    WHERE author_id = (
        SELECT id
        FROM authors
        WHERE last_name = 'Bar' AND first_name = 'Foo'
    );

Note that using a normal value operator on the results of a sub-query requires that only one field be returned. If you're interested in checking for the existence of a single value within a set of other values, use IN:

    SELECT title
    FROM books
    WHERE author_id IN (
        SELECT id FROM authors WHERE last_name ~ '^[A-E]'
    );

This is obviously different from, say, a LEFT JOIN, where you just want to join stuff from tables A and B even if the join condition doesn't find any matching record in table B, etc.

If you're just worried about speed, you'll have to check with your database, write a good query, and see if there's any significant difference in performance.
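Both forms can be tried out with a few rows. The sketch below assumes Python's built-in sqlite3 module and made-up data. Because SQLite has no built-in `~` regex operator, the A-to-E test is written with `substr(...) BETWEEN 'A' AND 'E'` instead, which is my substitution and not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT);
    CREATE TABLE books (title TEXT, author_id INTEGER);
""")
# Hypothetical sample data.
conn.executemany("INSERT INTO authors VALUES (?, ?, ?)",
                 [(1, "Bar", "Foo"), (2, "Zed", "Ann"), (3, "Cole", "Kim")])
conn.executemany("INSERT INTO books VALUES (?, ?)",
                 [("B1", 1), ("B2", 2), ("B3", 3)])

# Scalar sub-query: must yield a single value for the = comparison.
single_author_titles = conn.execute("""
    SELECT title FROM books
     WHERE author_id = (
         SELECT id FROM authors
          WHERE last_name = 'Bar' AND first_name = 'Foo'
     )
""").fetchall()

# IN sub-query: may yield any number of ids.
a_to_e_titles = conn.execute("""
    SELECT title FROM books
     WHERE author_id IN (
         SELECT id FROM authors
          WHERE substr(last_name, 1, 1) BETWEEN 'A' AND 'E'
     )
     ORDER BY title
""").fetchall()

print(single_author_titles, a_to_e_titles)
```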

User · Answer

A general rule is that joins are faster in most cases (99%). The more data the tables have, the slower subqueries are. The less data the tables have, the closer subqueries come to the speed of joins. Subqueries are simpler, easier to understand, and easier to read.

Most web and app frameworks and their "ORM"s and "Active record"s generate queries with subqueries, because with subqueries it is easier to split responsibility, maintain code, etc. For smaller web sites or apps subqueries are OK, but for larger web sites and apps you will often have to rewrite generated queries as join queries, especially if a query uses many subqueries.

Some people say "some RDBMS can rewrite a subquery to a join or a join to a subquery when it thinks one is faster than the other", but this statement applies to simple cases, surely not to complicated queries with subqueries, which are the ones that actually cause performance problems.

User · Answer

These days, many databases can optimize both subqueries and joins. So you just have to examine your query using EXPLAIN and see which one is faster. If there is not much difference in performance, I prefer to use a subquery, as subqueries are simpler and easier to understand.
