SQL Server IN vs EXISTS Performance

Question

I'm curious which of the following would be more efficient. I've always been a bit cautious about using IN because I believe SQL Server turns the result set into a big IF statement. For a large result set, this could result in poor performance. For small result sets, I'm not sure either is preferable. For large result sets, wouldn't EXISTS be more efficient?

    WHERE EXISTS (SELECT * FROM Base WHERE bx.BoxID = Base.BoxID AND [Rank] = 2)

vs.

    WHERE bx.BoxID IN (SELECT BoxID FROM Base WHERE [Rank] = 2)

User · Answer

To optimize the EXISTS, be very literal: something just has to be there, but you don't actually need any data returned from the correlated sub-query. You're just evaluating a Boolean condition. So:

    WHERE EXISTS (SELECT TOP 1 1 FROM Base WHERE bx.BoxID = Base.BoxID AND [Rank] = 2)

Because the correlated sub-query is RBAR (row by agonizing row), the first result hit makes the condition true, and it is processed no further.
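A small sketch of the "you don't need any data returned" point. The table variables `@Box` and `@Base` are hypothetical stand-ins for the question's tables, and the multi-row VALUES syntax assumes SQL Server 2008 or later. As far as I know, the select list inside EXISTS is not evaluated at all, which is why the deliberately broken `1/0` expression below does not raise a divide-by-zero error. If that holds on your version, `SELECT *`, `SELECT 1`, and `SELECT TOP 1 1` inside EXISTS should behave the same way.

```sql
-- Hypothetical sample data; names are assumptions, not from the original question.
DECLARE @Box  TABLE (BoxID INT NOT NULL);
DECLARE @Base TABLE (BoxID INT NOT NULL, [Rank] INT NOT NULL);

INSERT INTO @Box  (BoxID) VALUES (1), (2), (3);
INSERT INTO @Base (BoxID, [Rank]) VALUES (1, 2), (2, 1);

-- The select list of the EXISTS subquery is only a placeholder:
-- 1/0 is never computed, so no error is expected here.
SELECT bx.BoxID
FROM @Box AS bx
WHERE EXISTS (SELECT 1/0
              FROM @Base AS b
              WHERE b.BoxID = bx.BoxID
                AND b.[Rank] = 2);
```

With the sample rows above, only BoxID 1 has a matching row with Rank 2.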

User · Answer

I've done some testing on SQL Server 2005 and 2008, and on both, the EXISTS and the IN come back with the exact same actual execution plan, as others have stated. The Optimizer is optimal.

Something to be aware of, though: EXISTS, IN, and JOIN can sometimes return different results if you don't phrase your query just right: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx
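One case where the phrasing changes the result is the negated form when the subquery can return NULL. The sketch below uses hypothetical table variables `@t1` and `@t2` and assumes the default ANSI_NULLS ON setting and SQL Server 2008 or later for the multi-row VALUES syntax.

```sql
-- Hypothetical sample data for illustration only.
DECLARE @t1 TABLE (id INT NULL);
DECLARE @t2 TABLE (id INT NULL);

INSERT INTO @t1 (id) VALUES (1), (2), (3);
INSERT INTO @t2 (id) VALUES (1), (NULL);

-- NOT IN: "id <> NULL" evaluates to UNKNOWN, so no row can pass the filter.
SELECT a.id
FROM @t1 AS a
WHERE a.id NOT IN (SELECT b.id FROM @t2 AS b);

-- NOT EXISTS: the NULL row simply never matches, so 2 and 3 are returned.
SELECT a.id
FROM @t1 AS a
WHERE NOT EXISTS (SELECT 1 FROM @t2 AS b WHERE b.id = a.id);
```

The non-negated IN and EXISTS forms return the same rows for this data; the difference only shows up with NOT.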

User · Answer

Off the top of my head and not guaranteed to be correct: I believe the second will be faster in this case.

In the first, the correlated subquery will likely cause the subquery to be run for each row. In the second example, the subquery should only run once, since it is not correlated. In the second example, the IN will short-circuit as soon as it finds a match.

User · Answer

There are many misleading answers here, including the highly upvoted one (although I don't believe their authors meant harm). The short answer is: these are the same.

There are many keywords in the (T-)SQL language, but in the end, the only thing that really happens on the hardware is the operations as seen in the execution query plan.

The relational (maths theory) operation we do when we invoke [NOT] IN and [NOT] EXISTS is the semi join (anti-join when using NOT). It is not a coincidence that the corresponding sql-server operations have the same name. There is no operation that mentions IN or EXISTS anywhere, only (anti-)semi joins. Thus, there is no way that a logically-equivalent IN vs EXISTS choice could affect performance, because there is one and only one way, the (anti-)semi join execution operation, to get their results.

An example:

Query 1:

    select * from dt where dt.customer in (select c.code from customer c where c.active=0)

Query 2:

    select * from dt where exists (select 1 from customer c where c.code=dt.customer and c.active=0)
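To check this on your own instance, something like the following sketch may help. The temp tables `#dt` and `#customer` and their columns are assumptions modelled on the two queries above, and the multi-row VALUES syntax assumes SQL Server 2008 or later. `SET STATISTICS PROFILE ON` returns the plan operators alongside each result set, so the two plans can be compared side by side. `#customer.code` is deliberately left without a unique constraint, since a unique key may let the optimizer simplify the semi join into a plain inner join.

```sql
-- Hypothetical tables mirroring the answer's dt and customer.
CREATE TABLE #customer (code INT NOT NULL, active BIT NOT NULL);
CREATE TABLE #dt (id INT NOT NULL PRIMARY KEY, customer INT NOT NULL);

INSERT INTO #customer (code, active) VALUES (1, 0), (2, 1), (3, 0);
INSERT INTO #dt (id, customer) VALUES (10, 1), (11, 2), (12, 3), (13, 1);

SET STATISTICS PROFILE ON;  -- emit the executed plan with each result

-- Query 1: IN
SELECT dt.id, dt.customer
FROM #dt AS dt
WHERE dt.customer IN (SELECT c.code FROM #customer AS c WHERE c.active = 0);

-- Query 2: EXISTS
SELECT dt.id, dt.customer
FROM #dt AS dt
WHERE EXISTS (SELECT 1 FROM #customer AS c
              WHERE c.code = dt.customer AND c.active = 0);

SET STATISTICS PROFILE OFF;

DROP TABLE #dt;
DROP TABLE #customer;
```

Both queries return the rows with id 10, 12, and 13. Which physical join the optimizer picks depends on row counts and indexes, so compare the operator names in the two profile outputs rather than expecting a particular one.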

User · Answer

The accepted answer is shortsighted and the question a bit loose in that:

1. Neither explicitly mentions whether a covering index is present on the left, right, or both sides.
2. Neither takes into account the size of the input left side set and the input right side set. (The question just mentions an overall large result set.)

I believe the optimizer is smart enough to convert between "in" and "exists" when there is a significant cost difference due to (1) and (2); otherwise it may just be used as a hint (e.g. exists to encourage use of a seekable index on the right side).

Both forms can be converted to join forms internally, have the join order reversed, and run as loop, hash, or merge, based on the estimated row counts (left and right) and index existence on the left, right, or both sides.
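A rough sketch of how the index point could be explored for the question's predicates. The temp tables `#Box` and `#Base` and the index name `IX_Base_Rank_BoxID` are hypothetical, and the multi-row VALUES syntax assumes SQL Server 2008 or later. The index covers the subquery on the right side, since it contains both `[Rank]` and `BoxID`. `SET STATISTICS IO ON` reports logical reads per table in the Messages output, which gives one way to compare the two forms with and without the index.

```sql
-- Hypothetical schema; the original question does not show its tables.
CREATE TABLE #Box  (BoxID INT NOT NULL PRIMARY KEY);
CREATE TABLE #Base (BoxID INT NOT NULL, [Rank] INT NOT NULL);

INSERT INTO #Box  (BoxID) VALUES (1), (2), (3), (4);
INSERT INTO #Base (BoxID, [Rank]) VALUES (1, 1), (1, 2), (2, 1), (3, 2), (3, 2);

-- Covering index for the right side: filter column first, then the join column.
CREATE NONCLUSTERED INDEX IX_Base_Rank_BoxID ON #Base ([Rank], BoxID);

SET STATISTICS IO ON;  -- logical reads appear in the Messages output

SELECT bx.BoxID
FROM #Box AS bx
WHERE EXISTS (SELECT * FROM #Base AS b
              WHERE b.BoxID = bx.BoxID AND b.[Rank] = 2);

SELECT bx.BoxID
FROM #Box AS bx
WHERE bx.BoxID IN (SELECT b.BoxID FROM #Base AS b WHERE b.[Rank] = 2);

SET STATISTICS IO OFF;

DROP TABLE #Base;
DROP TABLE #Box;
```

Both queries return BoxID 1 and 3. With only a handful of rows the optimizer may ignore the index entirely, so a realistic comparison needs data volumes close to your own.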

User · Answer

I'd go with EXISTS over IN; see the link below:

SQL Server: JOIN vs IN vs EXISTS - the logical difference

There is a common misconception that IN behaves equally to EXISTS or JOIN in terms of returned results. This is simply not true.

IN: Returns true if a specified value matches any value in a subquery or a list.

Exists: Returns true if a subquery contains any rows.

Join: Joins 2 resultsets on the joining column.

Blog credit: https://stackoverflow.com/users/31345/mladen-prajdic
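The difference between a JOIN and the other two forms is easiest to see when the right side contains duplicates. A minimal sketch, using hypothetical table variables `@t1` and `@t2` and assuming SQL Server 2008 or later for the multi-row VALUES syntax:

```sql
-- Hypothetical sample data: the value 1 appears twice on the right side.
DECLARE @t1 TABLE (id INT NOT NULL);
DECLARE @t2 TABLE (id INT NOT NULL);

INSERT INTO @t1 (id) VALUES (1), (2);
INSERT INTO @t2 (id) VALUES (1), (1);

-- IN: each @t1 row is tested once, so id 1 comes back once.
SELECT a.id FROM @t1 AS a
WHERE a.id IN (SELECT b.id FROM @t2 AS b);

-- EXISTS: same result, id 1 once.
SELECT a.id FROM @t1 AS a
WHERE EXISTS (SELECT 1 FROM @t2 AS b WHERE b.id = a.id);

-- INNER JOIN: one output row per matching pair, so id 1 comes back twice.
SELECT a.id FROM @t1 AS a
INNER JOIN @t2 AS b ON b.id = a.id;
```

IN and EXISTS only filter the left side, while the JOIN multiplies rows by the number of matches on the right.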

User · Answer

The execution plans are typically going to be identical in these cases, but until you see how the optimizer factors in all the other aspects of indexes etc., you really will never know.

User · Answer

EXISTS will be faster because once the engine has found a hit, it will quit looking, as the condition has proved true.

With IN, it will collect all the results from the sub-query before further processing.

User · Answer

So, IN is not the same as EXISTS, nor will it produce the same execution plan.

Usually EXISTS is used in a correlated subquery, which means you will JOIN the EXISTS inner query with your outer query. That adds more steps to produce a result, as you need to solve the outer query joins and the inner query joins, then match their WHERE clauses to join both.

Usually IN is used without correlating the inner query with the outer query, and that can be solved in only one step (in the best case scenario).

Consider this:

1. If you use IN and the inner query result is millions of rows of distinct values, it will probably perform SLOWER than EXISTS, given that the EXISTS query is performant (has the right indexes to join with the outer query).
2. If you use EXISTS and the join with your outer query is complex (takes more time to perform, no suitable indexes), it will slow the query by the number of rows in the outer table; sometimes the estimated time to complete can be in days. If the number of rows is acceptable for your given hardware, or the cardinality of data is correct (for example, fewer DISTINCT values in a large data set), IN can perform faster than EXISTS.
3. All of the above will be noticeable when you have a fair amount of rows in each table (by fair I mean something that exceeds your CPU processing and/or RAM thresholds for caching).

So the ANSWER is: it DEPENDS. You can write a complex query inside IN or EXISTS, but as a rule of thumb, you should try to use IN with a limited set of distinct values and EXISTS when you have a lot of rows with a lot of distinct values.

The trick is to limit the number of rows to be scanned.

Regards,

MarianoC
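The smallest "limited set of distinct values" is a literal list. As a hedged aside, and as far as I know, a literal IN list is logically equivalent to a chain of OR comparisons, which may be where the question's "big IF statement" intuition comes from; an IN with a subquery is typically handled as a semi join instead. The table variable `@Box` below is hypothetical, and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample data.
DECLARE @Box TABLE (BoxID INT NOT NULL PRIMARY KEY);
INSERT INTO @Box (BoxID) VALUES (1), (2), (3), (4), (5);

-- Literal IN list ...
SELECT BoxID FROM @Box WHERE BoxID IN (2, 4);

-- ... and the equivalent chain of OR comparisons.
SELECT BoxID FROM @Box WHERE BoxID = 2 OR BoxID = 4;
```

Both statements return BoxID 2 and 4.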
