Understanding Database Indexes – (Part – 1)
Author: Basit A. Farooq
Editor’s Note: In this two part article series, you will learn about SQL Server indexes, and how they help to achieve optimal query performance while reducing the overall response time.
As databases administrator (DBA) or developer, your one of the most important goal is to ensure that query times are consistent with the service level agreement (SLA) and are meeting user expectations. Along with other performance enhancement techniques, creating indexes for your queries on underlying tables is one of the most effective and common way to achieve this objective.
The indexes of the underlying relational tables are very similar in purpose to an index section in the back of the book. For example, instead of flipping through each page of the book, you use the index section in the back of the book to quickly find the particular information or topic within the book. In the same way, instead of scanning each individual row on the data page, SQL Server uses indexes to quickly find the data for the qualifying query. Therefore, by indexing an underlying relational table, you can significantly enhance the performance of your database.
Indexing affects processing speed for both OLTP (online transaction processing) and OLAP (online analytical processing) and help you achieve optimum query performance and response time.
Cost associated with indexes
As mentioned earlier, SQL Server uses indexes to optimize overall query performance. However, there is also a cost associated with indexes, that is, indexes slow down insert, update, and delete operations. Therefore, it is important to consider cost and benefits associated with indexes when you plan your indexing strategy.
How SQL Server uses indexes
A table that doesn’t have a clustered index is stored in a set of data pages called a heap. Initially, the data in heaps are stored in the order in which the rows are inserted into the table. However, SQL Server Database Engine moves the data around the heap to store the rows efficiently. Therefore, for heaps, you cannot predict the order of the rows because data pages are not sequenced in any particular order. The only way to guarantee the order of the rows from a heap is to use SELECT statement with ORDER BY clause.
Access without an index
When you access the data, SQL Server first determines if there is a suitable index available for the submitted SELECT statement. If no suitable index is found for the submitted SELECT statement, SQL Server retrieves the data by scanning the entire table. The database engine begins scanning at the physical beginning of the table, and scans through the full table page-by-page and row-by-row to look for qualifying data specified in the submitted SELECT statement. Then extracts and returns rows meeting the criteria in the format specified in the submitted SELECT statement.
Access with an index
The process is improved when indexes are present. If the appropriate index available, SQL Server uses it to locate the data. An index improves the search process by sorting data on key columns. The database engine begins scanning at the first page of the index and only scans those pages that potentially contain qualifying data, based on the index structure and key columns. Finally retrieves the data rows or pointers that contain the locations of the data rows to allow direct row retrieval.
Indexes structure
In SQL Server, all indexes except full-text, XML, in-memory optimized and columnstore indexes are organized as a balanced tree (B-tree). This is because, full-text indexes use its own engine to manage and query full-text catalog, XML indexes are stored as internal SQL Server tables, in-memory optimized indexes uses Bw-tree structure, and columnstore indexes utilize SQL Server in-memory technology.
In B-tree structure, each page is called a node. The top page of the B-tree structure is called root node. Non-leaf nodes are also referred to as the intermediate levels are hierarchical tree nodes that comprise the index sort order. Non-leaf nodes point to other non-leaf nodes one step down in the B-tree hierarchy until reaching the leaf nodes. The leaf nodes are at the bottom of the B-tree hierarchy. The following figure illustrates the typical B-tree structure.
Index Types
In SQL Server 2014, you can create following types of indexes:
Clustered Indexes
A clustered index sorts table or views rows in order based on the clustered index key column values. In short, a leaf node of clustered index contains data pages, and scanning them will return the actual data rows. Therefore, table can have only one clustered index. Unless explicitly specified as Nonclustered, SQL Server automatically creates the clustered index when you define a PRIMARY KEY constraint on a table.
When to have a clustered index on a table?
Although it is not mandatory to have a clustered index per table, but, according to the MSDN article “Clustered Index Design Guidelines”, with few exceptions, every table should have a clustered index defined on the column or columns that used as follows:
- The table is large and does not have nonclustered index. Having clustered index improves performance. Because, without it, all rows of the table should be read to find any row.
- Column or columns are frequently queried, and data is returned in sorted ordered. Having clustered index on the sorting column or columns prevents sorting operation and returns the data in sorted order.
- Column or columns are frequently queried, and data is grouped together. As data must be sorted before it is grouped, having clustered index on the sorting column or columns prevents sorting operation.
- Column or columns data that are frequently used in queries to search data ranges from the table. Having clustered indexes on the range column will avoid sorting entire table data.
Nonclustered indexes
A nonclustered does not sort or store data of the underlying table. This is because, the leaf nodes of the nonclustered indexes are index pages containing pointer to data rows. SQL Server automatically creates nonclustered index, when you define a UNIQUE KEY constraint on a table. A table can have up to 249 nonclustered indexes.
You use CREATE INDEX statement to create clustered and nonclustered indexes. A detailed discussion of the CREATE INDEX statement and its parameters is beyond the scope of this chapter. For help with this, refer to SQL Server Books Online topic: CREATE INDEX (Transact-SQL).
SQL Server 2014 also supports new inline index creation syntax for a standard disk based database tables, temp tables, and table variables. For more information, refer to SQL Server Books Online topic: “CREATE TABLE (SQL Server)”.
Single column index
As its name implies, single column indexes are based on a single key column. You can define it as either clustered or nonclustered. You cannot drop the index key column, or change the data type of the underlying table column without dropping the index first. Single column indexes are useful for queries that are searching the data based on single column value.
Composite index
A composite index is an index that includes two or more columns from the same table. You can define composite index as either clustered or nonclustered. You use composite indexes when you have two or more columns that would be searched together. You typically place the most unique key (the key with the highest degree of selectivity) first in the key list.
For example, examine the following query that returns the list of account numbers and name from Purchasing.Vendor table where name and account number starts with character ‘A’:
USE [AdventureWorks2012];
SELECT [AccountNumber] ,
[Name]
FROM [Purchasing].[Vendor]
WHERE [AccountNumber] LIKE ‘A%’
AND [Name] LIKE ‘A%’;
GO
If you look at the execution plan of this query without modify the existing indexes of the table you will notice that SQL Server query optimizer is using table’s clustered index to retrieve query result:
As our search is based on Name and AccountNumber columns, having the following composite index will improve the query execution time significantly:
USE [AdventureWorks2012];
GO
CREATE NONCLUSTERED INDEX [AK_Vendor _ AccountNumber_Name]
ON [Purchasing].[Vendor] ([AccountNumber] ASC, [Name] ASC) ON [PRIMARY];
GO
Now, examine the query execution plan of this query once again after creating the above composite index on Purchasing.Vendor table:
As you can see, SQL Server is now doing seek on this composite index to retrieve the qualifying data.
Continue to Part-2…