[sql] Is there a performance difference between CTE , Sub-Query, Temporary Table or Table Variable?

In this excellent SO question, differences between CTE and sub-queries were discussed.

I would like to specifically ask:

In what circumstance is each of the following more efficient/faster?

  • CTE
  • Sub-Query
  • Temporary Table
  • Table Variable

Traditionally, I've used lots of temp tables in developing stored procedures - as they seem more readable than lots of intertwined sub-queries.

Non-recursive CTEs encapsulate sets of data very well, and are very readable, but are there specific circumstances where one can say they will always perform better? or is it a case of having to always fiddle around with the different options to find the most efficient solution?


EDIT

I've recently been told that in terms of efficiency, temporary tables are a good first choice as they have an associated histogram i.e. statistics.

This question is related to sql sql-server tsql subquery common-table-expression

The answer is


Just 2 things I think make it ALWAYS preferable to use a # Temp Table rather then a CTE are:

  1. You can not put a primary key on a CTE so the data being accessed by the CTE will have to traverse each one of the indexes in the CTE's tables rather then just accessing the PK or Index on the temp table.

  2. Because you can not add constraints, indexes and primary keys to a CTE they are more prone to bugs creeping in and bad data.


-onedaywhen yesterday

Here is an example where #table constraints can prevent bad data which is not the case in CTE's

DECLARE @BadData TABLE ( 
                       ThisID int
                     , ThatID int );
INSERT INTO @BadData
       ( ThisID
       , ThatID
       ) 
VALUES
       ( 1, 1 ),
       ( 1, 2 ),
       ( 2, 2 ),
       ( 1, 1 );

IF OBJECT_ID('tempdb..#This') IS NOT NULL
    DROP TABLE #This;
CREATE TABLE #This ( 
             ThisID int NOT NULL
           , ThatID int NOT NULL
                        UNIQUE(ThisID, ThatID) );
INSERT INTO #This
SELECT * FROM @BadData;
WITH This_CTE
     AS (SELECT *
           FROM @BadData)
     SELECT *
       FROM This_CTE;

#temp is materalized and CTE is not.

CTE is just syntax so in theory it is just a subquery. It is executed. #temp is materialized. So an expensive CTE in a join that is execute many times may be better in a #temp. On the other side if it is an easy evaluation that is not executed but a few times then not worth the overhead of #temp.

The are some people on SO that don't like table variable but I like them as the are materialized and faster to create than #temp. There are times when the query optimizer does better with a #temp compared to a table variable.

The ability to create a PK on a #temp or table variable gives the query optimizer more information than a CTE (as you cannot declare a PK on a CTE).


There is no rule. I find CTEs more readable, and use them unless they exhibit some performance problem, in which case I investigate the actual problem rather than guess that the CTE is the problem and try to re-write it using a different approach. There is usually more to the issue than the way I chose to declaratively state my intentions with the query.

There are certainly cases when you can unravel CTEs or remove subqueries and replace them with a #temp table and reduce duration. This can be due to various things, such as stale stats, the inability to even get accurate stats (e.g. joining to a table-valued function), parallelism, or even the inability to generate an optimal plan because of the complexity of the query (in which case breaking it up may give the optimizer a fighting chance). But there are also cases where the I/O involved with creating a #temp table can outweigh the other performance aspects that may make a particular plan shape using a CTE less attractive.

Quite honestly, there are way too many variables to provide a "correct" answer to your question. There is no predictable way to know when a query may tip in favor of one approach or another - just know that, in theory, the same semantics for a CTE or a single subquery should execute the exact same. I think your question would be more valuable if you present some cases where this is not true - it may be that you have discovered a limitation in the optimizer (or discovered a known one), or it may be that your queries are not semantically equivalent or that one contains an element that thwarts optimization.

So I would suggest writing the query in a way that seems most natural to you, and only deviate when you discover an actual performance problem the optimizer is having. Personally I rank them CTE, then subquery, with #temp table being a last resort.


Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to sql-server

Passing multiple values for same variable in stored procedure SQL permissions for roles Count the Number of Tables in a SQL Server Database Visual Studio 2017 does not have Business Intelligence Integration Services/Projects ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database How to create temp table using Create statement in SQL Server? SQL Query Where Date = Today Minus 7 Days How do I pass a list as a parameter in a stored procedure? SQL Server date format yyyymmdd

Examples related to tsql

Passing multiple values for same variable in stored procedure Count the Number of Tables in a SQL Server Database Change Date Format(DD/MM/YYYY) in SQL SELECT Statement Stored procedure with default parameters Format number as percent in MS SQL Server EXEC sp_executesql with multiple parameters SQL Server after update trigger How to compare datetime with only date in SQL Server Text was truncated or one or more characters had no match in the target code page including the primary key in an unpivot Printing integer variable and string on same line in SQL

Examples related to subquery

What is the difference between LATERAL and a subquery in PostgreSQL? Postgres Error: More than one row returned by a subquery used as an expression How does Subquery in select statement work in oracle Difference between Subquery and Correlated Subquery How to do this in Laravel, subquery where in SQL LEFT JOIN Subquery Alias Subquery returned more than 1 value.This is not permitted when the subquery follows =,!=,<,<=,>,>= or when the subquery is used as an expression subquery in FROM must have an alias Is there a performance difference between CTE , Sub-Query, Temporary Table or Table Variable? How can I insert values into a table, using a subquery with more than one result?

Examples related to common-table-expression

SQL Server CTE and recursion example getting "No column was specified for column 2 of 'd'" in sql server cte? Update records in table from CTE How to create Temp table with SELECT * INTO tempTable FROM CTE Query Is there a performance difference between CTE , Sub-Query, Temporary Table or Table Variable? The maximum recursion 100 has been exhausted before statement completion When to use Common Table Expression (CTE) Combining INSERT INTO and WITH/CTE Keeping it simple and how to do multiple CTE in a query Can you create nested WITH clauses for Common Table Expressions?