In this excellent SO question, differences between CTE
and sub-queries
were discussed.
I would like to specifically ask:
In what circumstance is each of the following more efficient/faster?
Traditionally, I've used lots of temp tables
in developing stored procedures
- as they seem more readable than lots of intertwined sub-queries.
Non-recursive CTE
s encapsulate sets of data very well, and are very readable, but are there specific circumstances where one can say they will always perform better? or is it a case of having to always fiddle around with the different options to find the most efficient solution?
EDIT
I've recently been told that in terms of efficiency, temporary tables are a good first choice as they have an associated histogram i.e. statistics.
This question is related to
sql
sql-server
tsql
subquery
common-table-expression
Just 2 things I think make it ALWAYS preferable to use a # Temp Table rather then a CTE are:
You can not put a primary key on a CTE so the data being accessed by the CTE will have to traverse each one of the indexes in the CTE's tables rather then just accessing the PK or Index on the temp table.
Because you can not add constraints, indexes and primary keys to a CTE they are more prone to bugs creeping in and bad data.
-onedaywhen yesterday
Here is an example where #table constraints can prevent bad data which is not the case in CTE's
DECLARE @BadData TABLE (
ThisID int
, ThatID int );
INSERT INTO @BadData
( ThisID
, ThatID
)
VALUES
( 1, 1 ),
( 1, 2 ),
( 2, 2 ),
( 1, 1 );
IF OBJECT_ID('tempdb..#This') IS NOT NULL
DROP TABLE #This;
CREATE TABLE #This (
ThisID int NOT NULL
, ThatID int NOT NULL
UNIQUE(ThisID, ThatID) );
INSERT INTO #This
SELECT * FROM @BadData;
WITH This_CTE
AS (SELECT *
FROM @BadData)
SELECT *
FROM This_CTE;
#temp is materalized and CTE is not.
CTE is just syntax so in theory it is just a subquery. It is executed. #temp is materialized. So an expensive CTE in a join that is execute many times may be better in a #temp. On the other side if it is an easy evaluation that is not executed but a few times then not worth the overhead of #temp.
The are some people on SO that don't like table variable but I like them as the are materialized and faster to create than #temp. There are times when the query optimizer does better with a #temp compared to a table variable.
The ability to create a PK on a #temp or table variable gives the query optimizer more information than a CTE (as you cannot declare a PK on a CTE).
There is no rule. I find CTEs more readable, and use them unless they exhibit some performance problem, in which case I investigate the actual problem rather than guess that the CTE is the problem and try to re-write it using a different approach. There is usually more to the issue than the way I chose to declaratively state my intentions with the query.
There are certainly cases when you can unravel CTEs or remove subqueries and replace them with a #temp table and reduce duration. This can be due to various things, such as stale stats, the inability to even get accurate stats (e.g. joining to a table-valued function), parallelism, or even the inability to generate an optimal plan because of the complexity of the query (in which case breaking it up may give the optimizer a fighting chance). But there are also cases where the I/O involved with creating a #temp table can outweigh the other performance aspects that may make a particular plan shape using a CTE less attractive.
Quite honestly, there are way too many variables to provide a "correct" answer to your question. There is no predictable way to know when a query may tip in favor of one approach or another - just know that, in theory, the same semantics for a CTE or a single subquery should execute the exact same. I think your question would be more valuable if you present some cases where this is not true - it may be that you have discovered a limitation in the optimizer (or discovered a known one), or it may be that your queries are not semantically equivalent or that one contains an element that thwarts optimization.
So I would suggest writing the query in a way that seems most natural to you, and only deviate when you discover an actual performance problem the optimizer is having. Personally I rank them CTE, then subquery, with #temp table being a last resort.
Source: Stackoverflow.com