Why use the INCLUDE clause when creating an index

Question

While studying for the 70-433 exam I noticed you can create a covering index in one of the following two ways:

CREATE INDEX idx1 ON MyTable (Col1, Col2, Col3)

-- OR --

CREATE INDEX idx1 ON MyTable (Col1) INCLUDE (Col2, Col3)

The INCLUDE clause is new to me. Why would you use it, and what guidelines would you suggest in determining whether to create a covering index with or without the INCLUDE clause?

User · Answer

You would use the INCLUDE to add one or more columns to the leaf level of a non-clustered index, if by doing so, you can "cover" your queries.

Imagine you need to query for an employee's ID, department ID, and lastname.

SELECT EmployeeID, DepartmentID, LastName
FROM Employee
WHERE DepartmentID = 5

If you happen to have a non-clustered index on (EmployeeID, DepartmentID), once you find the employees for a given department, you now have to do a "bookmark lookup" to get the actual full employee record, just to get the lastname column. That can get pretty expensive in terms of performance if you find a lot of employees.

If you had included that lastname in your index:

CREATE NONCLUSTERED INDEX NC_EmpDep 
  ON Employee(EmployeeID, DepartmentID)
  INCLUDE (Lastname)

then all the information you need is available in the leaf level of the non-clustered index. Just by seeking in the non-clustered index and finding your employees for a given department, you have all the necessary information, and the bookmark lookup for each employee found in the index is no longer necessary, so you save a lot of time.

Obviously, you cannot include every column in every non-clustered index - but if you do have queries which are missing just one or two columns to be "covered" (and that get used a lot), it can be very helpful to INCLUDE those into a suitable non-clustered index.
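One way to check whether an index really covers a query is to compare the logical reads and the execution plan before and after adding the included column. The following is only a rough sketch, assuming the Employee table and the NC_EmpDep index from the example above; what the plan actually shows will depend on your data and the other indexes on the table.

```sql
-- Report logical reads per statement in the Messages output.
SET STATISTICS IO ON;

-- Run this once before and once after creating NC_EmpDep with INCLUDE (LastName),
-- then compare the logical reads and check the plan for a Key Lookup
-- (or a scan of the clustered index) that disappears once the query is covered.
SELECT EmployeeID, DepartmentID, LastName
FROM Employee
WHERE DepartmentID = 5;

SET STATISTICS IO OFF;
```

If the reads do not drop and the plan does not change, the extra included column is probably not buying you anything for this query.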

User · Answer

The reasons why (including the data in the leaf level of the index) have been nicely explained. The reason that you give two shakes about this is that when you run your query, if you don't have the additional columns included (new feature in SQL 2005), the SQL Server has to go to the clustered index to get the additional columns, which takes more time and adds more load to the SQL Server service, the disks, and the memory (buffer cache, to be specific) as new data pages are loaded into memory, potentially pushing other, more often needed data out of the buffer cache.

User · Answer

One reason to prefer INCLUDE over key columns, if you don't need that column in the key, is documentation. That makes evolving indexes much easier in the future.

Considering your example:

CREATE INDEX idx1 ON MyTable (Col1) INCLUDE (Col2, Col3)

That index is best if your query looks like this:

SELECT col2, col3
  FROM MyTable
 WHERE col1 = ...

Of course you should not put columns in INCLUDE if you can get an additional benefit from having them in the key part. Both of the following queries would actually prefer the col2 column in the key of the index:

SELECT col2, col3
  FROM MyTable
 WHERE col1 = ...
   AND col2 = ...

SELECT TOP 1 col2, col3
  FROM MyTable
 WHERE col1 = ...
 ORDER BY col2

Let's assume this is not the case and we have col2 in the INCLUDE clause because there is just no benefit of having it in the tree part of the index.

Fast forward some years.

You need to tune this query:

SELECT TOP 1 col2
  FROM MyTable
 WHERE col1 = ...
 ORDER BY another_col

To optimize that query, the following index would be great:

CREATE INDEX idx1 ON MyTable (Col1, another_col) INCLUDE (Col2)

If you check what indexes you have on that table already, your previous index might still be there:

CREATE INDEX idx1 ON MyTable (Col1) INCLUDE (Col2, Col3)

Now you know that Col2 and Col3 are not part of the index tree and are thus not used to narrow the read index range nor for ordering the rows. It is rather safe to add another column to the end of the key part of the index (after col1). There is little risk of breaking anything:

DROP INDEX idx1 ON MyTable;
CREATE INDEX idx1 ON MyTable (Col1, another_col) INCLUDE (Col2, Col3);

That index will become bigger, which still has some risks, but it is generally better to extend existing indexes than to introduce new ones.

If you had an index without INCLUDE, you could not know what queries you would break by adding another_col right after Col1:

CREATE INDEX idx1 ON MyTable (Col1, Col2, Col3)

What happens if you add another_col between Col1 and Col2? Will other queries suffer?

There are other "benefits" of INCLUDE vs. key columns if you add those columns just to avoid fetching them from the table. However, I consider the documentation aspect the most important one.

To answer your question:

"what guidelines would you suggest in determining whether to create a covering index with or without the INCLUDE clause?"

If you add a column to the index for the sole purpose of having that column available in the index without visiting the table, put it into the INCLUDE clause.

If adding the column to the index key brings additional benefits (e.g. for ORDER BY, or because it can narrow the read index range), add it to the key.

You can read a longer discussion about this here:

https://use-the-index-luke.com/blog/2019-04/include-columns-in-btree-indexes

User · Answer

If the column is not in the WHERE/JOIN/GROUP BY/ORDER BY, but only in the column list in the SELECT clause, that is where you use INCLUDE.

The INCLUDE clause adds the data at the lowest (leaf) level, rather than in the index tree. This makes the index smaller because it's not part of the tree.

INCLUDE columns are not key columns in the index, so they are not ordered. This means it isn't really useful for predicates, sorting etc. as I mentioned above. However, it may be useful if you have a residual lookup in a few rows from the key column(s).

Another MSDN article with a worked example

User · Answer

There is a limit to the total size of all columns inlined into the index definition. That said, I have never had to create an index that wide.

To me, the bigger advantage is that you can cover more queries with one index that has included columns, as they don't have to be defined in any particular order. Think about it as an index within the index. One example would be StoreID (where StoreID has low selectivity, meaning that each store is associated with a lot of customers) and then customer demographics data (LastName, FirstName, DOB). If you just inline those columns in this order (StoreID, LastName, FirstName, DOB), you can only efficiently search for customers for which you know StoreID and LastName.

On the other hand, defining the index on StoreID and including the LastName, FirstName, DOB columns would let you seek on StoreID and then filter on any of the included columns inside the index. Note that included columns are not sorted, so this second step is a residual predicate evaluated against the leaf rows rather than a true seek, but it still avoids going back to the table. This would let you cover all possible search permutations as long as they start with StoreID.
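To make the StoreID example concrete, here is a small sketch. The table name, column types, index name, and literal values are all made up for illustration; only the column names come from the description above. As far as I understand it, StoreID is what narrows the range in the index, and the second condition is then checked against the included values stored in the leaf rows.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE Customer (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    StoreID    INT          NOT NULL,
    LastName   NVARCHAR(50) NOT NULL,
    FirstName  NVARCHAR(50) NOT NULL,
    DOB        DATETIME     NOT NULL
);

-- Key on StoreID only; the demographic columns ride along at the leaf level.
CREATE NONCLUSTERED INDEX IX_Customer_StoreID
    ON Customer (StoreID)
    INCLUDE (LastName, FirstName, DOB);

-- Both queries reference only columns stored in the index, so neither
-- needs to visit the base table, regardless of which demographic column
-- appears in the WHERE clause.
SELECT LastName, FirstName, DOB
FROM Customer
WHERE StoreID = 7 AND FirstName = N'Ann';

SELECT LastName, FirstName, DOB
FROM Customer
WHERE StoreID = 7 AND DOB = '19800101';
```

With a key of (StoreID, LastName, FirstName, DOB) instead, the first query could still be covered, but only StoreID would help narrow the range, since LastName is not specified.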

User · Answer

Basic index columns are sorted, but included columns are not sorted. This saves resources in maintaining the index, while still making it possible to provide the data in the included columns to cover a query. So, if you want to cover queries, you can put the search criteria to locate rows into the sorted columns of the index, but then "include" additional, unsorted columns with non-search data. It definitely helps with reducing the amount of sorting and fragmentation in index maintenance.

User · Answer

This discussion is missing out on the important point: the question is not whether the "non-key columns" are better added as index key columns or as included columns.

The question is how expensive it is to use the include mechanism to include columns that are not really needed in the index (typically not part of WHERE clauses, but often included in selects). So your dilemma is always:

1. Use an index on id1, id2 ... idN alone, or
2. Use an index on id1, id2 ... idN plus include col1, col2 ... colN

Where id1, id2 ... idN are columns often used in restrictions and col1, col2 ... colN are columns often selected, but typically not used in restrictions.

(The option to include all of these columns as part of the index key is just always silly, unless they are also used in restrictions, because it would always be more expensive to maintain: the index must be updated and sorted even when the "keys" have not changed.)

So use option 1 or 2?

Answer: If your table is rarely updated (mostly inserted into/deleted from), then it is relatively inexpensive to use the include mechanism to include some "hot columns" (that are often used in selects, but not often used in restrictions), since inserts/deletes require the index to be updated/sorted anyway and thus little extra overhead is associated with storing off a few extra columns while already updating the index. The overhead is the extra memory and CPU used to store redundant info on the index.

If the columns you consider adding as included columns are often updated (without the index key columns being updated), or if there are so many of them that the index becomes close to a copy of your table, use option 1, I'd suggest. Also, if adding certain included columns turns out to make no performance difference, you might want to skip the idea of adding them. Verify that they are useful!

The average number of rows per same values in the keys (id1, id2 ... idN) can be of some importance as well.

Notice that if a column that is added as an included column of the index is used in the restriction, then as long as the index as such can be used (based on restrictions against the index key columns), SQL Server matches the column restriction against the index (leaf node values) instead of going the expensive way around via the table itself.

User · Answer

An additional consideration that I have not seen in the answers already given is that included columns can be of data types that are not allowed as index key columns, such as varchar(max).

This allows you to include such columns in a covering index. I recently had to do this to provide an NHibernate-generated query, which had a lot of columns in the SELECT, with a useful index.
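A minimal sketch of that situation follows. The table, column, and index names are hypothetical and only there to show the shape of it; the point is simply that the varchar(max) column is rejected as a key column but accepted in INCLUDE.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE CustomerNote (
    NoteID     INT IDENTITY(1,1) PRIMARY KEY,
    CustomerID INT          NOT NULL,
    Body       VARCHAR(MAX) NULL
);

-- This would fail: a varchar(max) column is not valid as an index key column.
-- CREATE NONCLUSTERED INDEX IX_CustomerNote_Bad
--     ON CustomerNote (CustomerID, Body);

-- This works: the same column is allowed as an included (non-key) column.
CREATE NONCLUSTERED INDEX IX_CustomerNote_Customer
    ON CustomerNote (CustomerID)
    INCLUDE (Body);

-- The query touches only CustomerID and Body, both of which are in the index.
SELECT CustomerID, Body
FROM CustomerNote
WHERE CustomerID = 42;
```

Keep in mind that including a large column like this can make the index considerably bigger, so it is worth checking that the covered query is used often enough to justify it.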
