What do Clustered and Non clustered index actually mean?

Question

I have limited exposure to databases and have only used them as an application programmer. I want to know about clustered and non-clustered indexes. I googled, and what I found was:

    "A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore a table can have only one clustered index. The leaf nodes of a clustered index contain the data pages. A nonclustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a nonclustered index does not consist of the data pages. Instead, the leaf nodes contain index rows."

What I found on SO was "What are the differences between a clustered and a non-clustered index?".

Can someone explain this in plain English?

User · Answer

Clustered Index

A clustered index determines the physical order of the data in a table. For this reason, a table can have only one clustered index.

Example: a dictionary. It needs no other index, because it is already indexed by word.

Nonclustered Index

A nonclustered index is analogous to the index in a book. The data is stored in one place, the index is stored in another place, and the index has pointers to the storage location of the data. For this reason, a table can have more than one nonclustered index.

Example: a chemistry book. At the start there is a separate index pointing to the chapter locations, and at the end there is another index pointing to the locations of common words.

User · Answer

With a clustered index, the rows are stored physically on the disk in the same order as the index. Therefore, there can be only one clustered index.

With a non-clustered index, there is a second list that has pointers to the physical rows. You can have many non-clustered indices, although each new index will increase the time it takes to write new records.

It is generally faster to read from a clustered index if you want to get back all the columns. You do not have to go first to the index and then to the table.

Writing to a table with a clustered index can be slower if there is a need to rearrange the data.
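The "second list with pointers" idea above can be sketched in a few lines of Python. This is only a toy in-memory model under my own assumptions (the sample rows and the names `clustered`, `by_city`, `find_by_id` and `find_by_city` are made up), not how any real engine lays out pages:

```python
from bisect import bisect_left

# Toy "clustered index": the table itself, full rows kept in key (id) order.
clustered = [
    (1, "Ann", "Oslo"),
    (2, "Bob", "Rome"),
    (3, "Cid", "Lima"),
    (4, "Dee", "Oslo"),
]

# Toy "non-clustered index" on city: a second sorted list holding
# (city, id) pairs, i.e. the indexed value plus a pointer back to the row.
by_city = sorted((row[2], row[0]) for row in clustered)


def find_by_id(key):
    """One search: the clustered structure already holds the whole row."""
    ids = [row[0] for row in clustered]
    pos = bisect_left(ids, key)
    if pos < len(ids) and ids[pos] == key:
        return clustered[pos]
    return None


def find_by_city(city):
    """Two steps: search the second list, then fetch each row by its id."""
    pos = bisect_left(by_city, (city,))
    rows = []
    while pos < len(by_city) and by_city[pos][0] == city:
        rows.append(find_by_id(by_city[pos][1]))
        pos += 1
    return rows
```

Here `find_by_id(3)` needs a single search, while `find_by_city("Oslo")` searches `by_city` first and then goes back to the table once per match. Every insert would also have to update both lists, which is the write cost mentioned above.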

User · Answer

Clustered Index

A primary key constraint creates a clustered index automatically if no clustered index already exists on the table. The actual data of a clustered index is stored at the leaf level of the index.

Non-Clustered Index

The actual data is not found directly at the leaf node of a non-clustered index. Instead, an additional step is needed to find it, because the leaf node holds only row locators pointing to the actual data. A non-clustered index does not sort the table's rows the way a clustered index does. There can be multiple non-clustered indexes per table; the exact limit depends on the SQL Server version in use. SQL Server 2005 allows 249 non-clustered indexes per table, and later versions such as 2008 and 2016 allow 999.

User · Answer

In SQL Server, for row-oriented storage, both clustered and nonclustered indexes are organized as B-trees.

The key difference between clustered indexes and nonclustered indexes is that the leaf level of the clustered index is the table. This has two implications:

1. The rows on the clustered index leaf pages always contain something for each of the (non-sparse) columns in the table (either the value or a pointer to the actual value).
2. The clustered index is the primary copy of a table.

Nonclustered indexes can also do point 1 by using the INCLUDE clause (since SQL Server 2005) to explicitly include all non-key columns, but they are secondary representations and there is always another copy of the data around (the table itself).

    CREATE TABLE T(A INT, B INT, C INT, D INT)

    CREATE UNIQUE CLUSTERED INDEX ci ON T(A, B)
    CREATE UNIQUE NONCLUSTERED INDEX nci ON T(A, B) INCLUDE (C, D)

The two indexes above will be nearly identical, with the upper-level index pages containing values for the key columns A, B and the leaf-level pages containing A, B, C, D.

    "There can be only one clustered index per table, because the data rows themselves can be sorted in only one order."

The above quote from SQL Server Books Online causes much confusion. In my opinion, it would be much better phrased as:

    "There can be only one clustered index per table because the leaf level rows of the clustered index are the table rows."

The Books Online quote is not incorrect, but you should be clear that the "sorting" of both nonclustered and clustered indices is logical, not physical. If you read the pages at leaf level by following the linked list, and read the rows on each page in slot array order, then you will read the index rows in sorted order, but physically the pages may not be sorted. The commonly held belief that with a clustered index the rows are always stored physically on the disk in the same order as the index key is false.

This would be an absurd implementation. For example, if a row is inserted into the middle of a 4GB table, SQL Server does not have to copy 2GB of data up in the file to make room for the newly inserted row.

Instead, a page split occurs. Each page at the leaf level of both clustered and nonclustered indexes has the address (File:Page) of the next and previous page in logical key order. These pages need not be either contiguous or in key order.

E.g. the linked page chain might be 1:2000 <-> 1:157 <-> 1:7053

When a page split happens, a new page is allocated from anywhere in the filegroup (from either a mixed extent, for small tables, or a non-empty uniform extent belonging to that object, or a newly allocated uniform extent). This might not even be in the same file if the filegroup contains more than one.

The degree to which the logical order and contiguity differ from the idealized physical version is the degree of logical fragmentation.

In a newly created database with a single file, I ran the following:

    CREATE TABLE T
      (
         X TINYINT NOT NULL,
         Y CHAR(3000) NULL
      );

    CREATE CLUSTERED INDEX ix
      ON T(X);

    GO

    --Insert 100 rows with values 1 - 100 in random order
    DECLARE @C1 AS CURSOR,
            @X  AS INT

    SET @C1 = CURSOR FAST_FORWARD
    FOR SELECT number
        FROM   master..spt_values
        WHERE  type = 'P'
               AND number BETWEEN 1 AND 100
        ORDER  BY CRYPT_GEN_RANDOM(4)

    OPEN @C1;

    FETCH NEXT FROM @C1 INTO @X;

    WHILE @@FETCH_STATUS = 0
      BEGIN
          INSERT INTO T (X)
          VALUES        (@X);

          FETCH NEXT FROM @C1 INTO @X;
      END

Then I checked the page layout with:

    SELECT page_id,
           X,
           geometry::Point(page_id, X, 0).STBuffer(1)
    FROM   T
           CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%)
    ORDER  BY page_id

The results were all over the place. The first row in key order (with value 1, highlighted with an arrow in the accompanying image) was on nearly the last physical page.

Fragmentation can be reduced or removed by rebuilding or reorganizing an index to increase the correlation between logical order and physical order.

After running

    ALTER INDEX ix ON T REBUILD;

I got the following (image not reproduced here).

If the table has no clustered index, it is called a heap.

Nonclustered indexes can be built on either a heap or a clustered index. They always contain a row locator back to the base table. In the case of a heap, this is a physical row identifier (RID) and consists of three components (File:Page:Slot). In the case of a clustered index, the row locator is logical (the clustered index key).

For the latter case, if the nonclustered index already naturally includes the CI key column(s), either as NCI key columns or INCLUDE-d columns, then nothing is added. Otherwise, the missing CI key column(s) silently get added to the NCI.

SQL Server always ensures that the key columns are unique for both types of indexes. The mechanism by which this is enforced for indexes not declared as unique differs between the two index types, however.

Clustered indexes get a uniquifier added for any rows with key values that duplicate an existing row. This is just an ascending integer.

For nonclustered indexes not declared as unique, SQL Server silently adds the row locator into the nonclustered index key. This applies to all rows, not just those that are actually duplicates.

The clustered vs nonclustered nomenclature is also used for column store indexes. The paper Enhancements to SQL Server Column Stores states:

    "Although column store data is not really "clustered" on any key, we decided to retain the traditional SQL Server convention of referring to the primary index as a clustered index."
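To make the "logical, not physical" point concrete, here is a minimal Python sketch of a leaf level that handles a full page by splitting it. Everything in it is a simplifying assumption of mine (the four-key `PAGE_CAPACITY`, the `Page` and `LeafLevel` classes, new pages always being appended at the end of the file, a forward-only link); it is not SQL Server's allocation logic, only the linked-list idea described above:

```python
PAGE_CAPACITY = 4  # assumed tiny page size so that splits happen quickly


class Page:
    def __init__(self, page_id):
        self.page_id = page_id  # physical position in the toy "file"
        self.keys = []          # kept sorted within the page
        self.next = None        # next page in logical key order


class LeafLevel:
    """Toy leaf level: pages are linked in key order, wherever they live."""

    def __init__(self):
        self.pages = [Page(0)]  # list index == physical page id
        self.head = self.pages[0]

    def insert(self, key):
        page = self.head
        # Walk the chain to the last page whose first key is <= key.
        while page.next is not None and page.next.keys[0] <= key:
            page = page.next
        page.keys.append(key)
        page.keys.sort()
        if len(page.keys) > PAGE_CAPACITY:
            self._split(page)

    def _split(self, page):
        # Page split: move the upper half of the keys to a newly
        # allocated page and fix the links. No existing page is moved.
        new_page = Page(len(self.pages))
        self.pages.append(new_page)
        half = len(page.keys) // 2
        new_page.keys, page.keys = page.keys[half:], page.keys[:half]
        new_page.next, page.next = page.next, new_page

    def logical_scan(self):
        """Follow the linked list; return (keys, page ids visited)."""
        keys, visited, page = [], [], self.head
        while page is not None:
            keys.extend(page.keys)
            visited.append(page.page_id)
            page = page.next
        return keys, visited


leaf = LeafLevel()
for k in [10, 20, 30, 40, 50, 11, 12, 13]:
    leaf.insert(k)
keys, visited = leaf.logical_scan()
# keys    -> [10, 11, 12, 13, 20, 30, 40, 50]  (sorted)
# visited -> [0, 2, 1]                          (not in physical order)
```

Following the links gives the keys in sorted order even though the chain visits page 2 before page 1, which is the toy equivalent of a chain such as 1:2000 <-> 1:157 <-> 1:7053.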

User · Answer

A very simple, non-technical rule of thumb would be that clustered indexes are usually used for your primary key (or, at least, a unique column) and that non-clustered indexes are used for other situations (maybe a foreign key). Indeed, SQL Server will by default create a clustered index on your primary key column(s). As you will have learnt, the clustered index relates to the way data is physically sorted on disk, which means it's a good all-round choice for most situations.

User · Answer

Let me offer a textbook definition of "clustering index", taken from Section 15.6.1 of Database Systems: The Complete Book:

    "We may also speak of clustering indexes, which are indexes on an attribute or attributes such that all of the tuples with a fixed value for the search key of this index appear on roughly as few blocks as can hold them."

To understand the definition, let's take a look at Example 15.10 provided by the textbook:

    "A relation R(a,b) that is sorted on attribute a and stored in that order, packed into blocks, is surely clustered. An index on a is a clustering index, since for a given a-value a1, all the tuples with that value for a are consecutive. They thus appear packed into blocks, except possibly for the first and last blocks that contain a-value a1, as suggested in Fig. 15.14. However, an index on b is unlikely to be clustering, since the tuples with a fixed b-value will be spread all over the file unless the values of a and b are very closely correlated."

Note that the definition does not require the data blocks to be contiguous on the disk; it only says that tuples with the same search key are packed into as few data blocks as possible.

A related concept is the clustered relation. A relation is "clustered" if its tuples are packed into roughly as few blocks as can possibly hold those tuples. In other words, from a disk block perspective, if a block contains tuples from different relations, then those relations cannot be clustered (i.e., there is a more tightly packed way to store such a relation, by swapping tuples of that relation from other disk blocks with the tuples in the current disk block that don't belong to it). Clearly, R(a,b) in the example above is clustered.

To connect the two concepts: a clustered relation can have a clustering index and a nonclustering index. However, for a non-clustered relation, a clustering index is not possible unless the index is built on top of the primary key of the relation.

"Cluster", as a word, is spammed across all abstraction levels of the database storage side (three levels of abstraction: tuples, blocks, file). There is a concept called "clustered file", which describes whether a file (an abstraction for a group of blocks, i.e. one or more disk blocks) contains tuples from one relation or from different relations. It doesn't relate to the clustering index concept, as it is at the file level.

However, some teaching material likes to define a clustering index based on the clustered file definition. Those two types of definitions are the same at the clustered relation level, no matter whether they define a clustered relation in terms of data disk blocks or files. From one such piece of teaching material:

    "An index on attribute(s) A on a file is a clustering index when: All tuples with attribute value A = a are stored sequentially (= consecutively) in the data file."

Storing tuples consecutively is the same as saying "tuples are packed into roughly as few blocks as can possibly hold those tuples" (with the minor difference that one talks about a file and the other about disk). That is because storing tuples consecutively is the way to achieve "packed into roughly as few blocks as can possibly hold those tuples".
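The "as few blocks as can hold them" test is easy to check numerically. Below is a small Python sketch in the spirit of Example 15.10; the relation contents, the block size of 4 and the helper names `blocks_holding` and `min_blocks` are my own assumptions for illustration, not taken from the textbook:

```python
def blocks_holding(tuples, column, value, block_size):
    """Block numbers that contain a tuple whose `column` equals `value`,
    with tuples packed `block_size` per block in the order given."""
    return {
        position // block_size
        for position, row in enumerate(tuples)
        if row[column] == value
    }


def min_blocks(count, block_size):
    """Fewest blocks that could possibly hold `count` tuples."""
    return -(-count // block_size)  # ceiling division


# A made-up relation R(a, b), stored sorted on a: a repeats in runs of
# four, while b cycles 0, 1, 2, 3 within each run.
R = [(i // 4, i % 4) for i in range(16)]
BLOCK_SIZE = 4

a_blocks = blocks_holding(R, 0, 1, BLOCK_SIZE)  # tuples with a = 1
b_blocks = blocks_holding(R, 1, 1, BLOCK_SIZE)  # tuples with b = 1
# len(a_blocks) -> 1, which equals min_blocks(4, BLOCK_SIZE)
# len(b_blocks) -> 4, one tuple in every block
```

Both values match four tuples, but the a = 1 tuples sit in the minimum possible number of blocks, so an index on a is clustering, while the b = 1 tuples are spread over every block, so an index on b is not.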

User · Answer

A clustered index means you are telling the database to store close values actually close to one another on the disk. This has the benefit of rapid scan/retrieval of records falling into some range of clustered index values.

For example, you have two tables, Customer and Order:

    Customer
    ----------
    ID
    Name
    Address

    Order
    ----------
    ID
    CustomerID
    Price

If you wish to quickly retrieve all orders of one particular customer, you may wish to create a clustered index on the "CustomerID" column of the Order table. This way the records with the same CustomerID will be physically stored close to each other on disk (clustered), which speeds up their retrieval.

P.S. The index on CustomerID will obviously not be unique, so you either need to add a second field to "uniquify" the index or let the database handle that for you, but that's another story.

Regarding multiple indexes: you can have only one clustered index per table because this defines how the data is physically arranged. If you wish for an analogy, imagine a big room with many tables in it. You can either put these tables to form several rows or pull them all together to form a big conference table, but not both ways at the same time. A table can have other indexes; they will then point to the entries in the clustered index, which in its turn will finally say where to find the actual data.
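As a rough illustration of why clustering the Order table on CustomerID helps, here is a Python sketch with made-up order rows. The `(CustomerID, ID)` sort key stands in for the "uniquified" clustering key from the P.S., integer customer IDs are assumed, and a sorted list is only a stand-in for pages on disk:

```python
from bisect import bisect_left

# Hypothetical Order rows: (ID, CustomerID, Price)
orders = [
    (1, 7, 9.5),
    (2, 3, 20.0),
    (3, 7, 4.25),
    (4, 5, 12.0),
    (5, 3, 1.0),
]

# "Clustering" on CustomerID: keep the rows sorted by (CustomerID, ID).
# Adding ID as a second field makes the key unique.
clustered = sorted(orders, key=lambda row: (row[1], row[0]))
keys = [(row[1], row[0]) for row in clustered]


def orders_for(customer_id):
    """All orders of one customer form one contiguous slice."""
    lo = bisect_left(keys, (customer_id,))
    hi = bisect_left(keys, (customer_id + 1,))  # assumes integer ids
    return clustered[lo:hi]
```

`orders_for(7)` finds the start of the customer's run with one binary search and then reads neighbouring rows, instead of picking matching rows from all over the table.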

User · Answer

Clustered Index - A clustered index defines the order in which data is physically stored in a table. Table data can be sorted in only one way; therefore, there can be only one clustered index per table. In SQL Server, the primary key constraint automatically creates a clustered index on that particular column.

Non-Clustered Index - A non-clustered index doesn't sort the physical data inside the table. In fact, a non-clustered index is stored in one place and the table data is stored in another place. This is similar to a textbook, where the book content is located in one place and the index is located in another. This allows for more than one non-clustered index per table.

It is important to mention here that inside the table the data will be sorted by the clustered index. However, inside the non-clustered index, data is stored in the specified order. The index contains the column values on which the index is created and the address of the record that each column value belongs to.

When a query is issued against a column on which the index is created, the database will first go to the index and look for the address of the corresponding row in the table. It will then go to that row address and fetch the other column values. It is due to this additional step that non-clustered indexes are slower than clustered indexes.

Differences between clustered and non-clustered indexes:

- There can be only one clustered index per table. However, you can create multiple non-clustered indexes on a single table.
- Clustered indexes only sort tables. Therefore, they do not consume extra storage. Non-clustered indexes are stored in a separate place from the actual table, claiming more storage space.
- Clustered indexes are faster than non-clustered indexes, since they don't involve any extra lookup step.

User · Answer

Find below some characteristics of clustered and non-clustered indexes.

Clustered Indexes

- Clustered indexes are typically built on the column(s) that uniquely identify the rows in an SQL table.
- Every table can have at most one clustered index.
- You can create a clustered index that covers more than one column. For example (index_name and table_name are placeholders):

      CREATE CLUSTERED INDEX index_name ON table_name (col1, col2, ...)

- By default (in SQL Server), a primary key column already has a clustered index.

Non-clustered Indexes

- Non-clustered indexes are like simple indexes. They are just used for fast retrieval of data.
- They are not guaranteed to contain unique data.

User · Answer

I realize this is a very old question, but I thought I would offer an analogy to help illustrate the fine answers above.

CLUSTERED INDEX

If you walk into a public library, you will find that the books are all arranged in a particular order (most likely the Dewey Decimal System, or DDS). This corresponds to the "clustered index" of the books. If the DDS# for the book you want was 005.7565 F736s, you would start by locating the row of bookshelves that is labeled 001-099 or something like that. (This endcap sign at the end of the stack corresponds to an "intermediate node" in the index.) Eventually you would drill down to the specific shelf labelled 005.7450 - 005.7600, then you would scan until you found the book with the specified DDS#, and at that point you have found your book.

NON-CLUSTERED INDEX

But if you didn't come into the library with the DDS# of your book memorized, then you would need a second index to assist you. In the olden days you would find at the front of the library a wonderful bureau of drawers known as the "Card Catalog". In it were thousands of 3x5 cards, one for each book, sorted in alphabetical order (by title, perhaps). This corresponds to the "non-clustered index". These card catalogs were organized in a hierarchical structure, so that each drawer would be labeled with the range of cards it contained (Ka - Kl, for example; i.e., the "intermediate node"). Once again, you would drill in until you found your book, but in this case, once you have found it (i.e., the "leaf node"), you don't have the book itself, but just a card with an index number (the DDS#) with which you could find the actual book in the clustered index.

Of course, nothing would stop the librarian from photocopying all the cards and sorting them in a different order in a separate card catalog. (Typically there were at least two such catalogs: one sorted by author name, and one by title.) In principle, you could have as many of these "non-clustered" indexes as you want.

User · Answer

Clustered Index

Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order.

The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.

Nonclustered

Nonclustered indexes have a structure separate from the data rows. A nonclustered index contains the nonclustered index key values, and each key value entry has a pointer to the data row that contains the key value. The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or a clustered table. For a heap, a row locator is a pointer to the row. For a clustered table, the row locator is the clustered index key.

You can add nonkey columns to the leaf level of the nonclustered index to by-pass existing index key limits, and execute fully covered, indexed, queries. For more information, see Create Indexes with Included Columns. For details about index key limits, see Maximum Capacity Specifications for SQL Server.

Reference: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described
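The two kinds of row locator described above can be contrasted with a small Python sketch. The rows, the (file, page, slot) values and all names here are invented for illustration, and plain dictionaries stand in for the real B-tree and page structures:

```python
# Toy heap: rows live wherever there was room and are addressed by a
# (file, page, slot) row identifier. The values are made up.
heap = {
    (1, 157, 0): {"id": 42, "name": "Ann"},
    (1, 90, 3): {"id": 7, "name": "Bob"},
}
# Nonclustered index on name over the heap: the locator is the RID.
nci_on_heap = {row["name"]: rid for rid, row in heap.items()}

# Toy clustered table: rows are addressed by the clustered index key (id).
clustered_table = {row["id"]: row for row in heap.values()}
# Nonclustered index on name over the clustered table: the locator is
# the clustered index key, not a physical address.
nci_on_clustered = {
    row["name"]: row["id"] for row in clustered_table.values()
}


def lookup_heap(name):
    """Index entry -> physical RID -> row."""
    return heap[nci_on_heap[name]]


def lookup_clustered(name):
    """Index entry -> clustered key -> row (found via the clustered index)."""
    return clustered_table[nci_on_clustered[name]]
```

In the heap case the index entry for "Ann" holds `(1, 157, 0)`, a physical address. In the clustered case it holds `42`, a logical key that stays valid even if the row later moves to another page.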
