Editorials

Index Optimization

While considering the different options for primary keys, and the validation of Composite Keys Ravi asks the question regarding how indexing may work. He writes:

I would want to know your thoughts on the clustered indexes. I would create my clustered index

  • a) on a transactional table
    • column/columns which is heavily queried upon, I would prefer it to be a numeric and sequential most of the time and does not allow nulls.
    • on a date/time column if the table is kind of logging table
  • on a static/lookup table
    • I would primarily do it on a column which is very frequently queried upon and does not allow nulls. I would be okay even if the column/columns are non-numeric.


Because of the Microsoft’s recommendation on ID (auto increment) column on every table, people often considers the ID column to be the column for clustered index because there is no need to worry about fragmentation during insert because it is sequential. But my question is,


  • insert happens only once, what about all the SELECT/UPDATE/DELETE s which needs to seek a row, Do we not want to consider the clustered index being on a heavily queried column on?

  • What about the fill factor we leave on each page to avoid page split just in case if the data is not sequential?


Please provide me your thoughts on the clustered index column selection and correct me if I am wrong.

I have a lot of thoughts for you to consider on your index implementation. Today I’m going to hold them allowing others to share their experience first with this one exception. You mentioned in your description of a clustered index based on a sequential value as resolving the issue of fragmentation. This is not a correct assertion.

Fragmentation occurs when table data is not contained on contiguous disk allocations. When a table is stored on a heap it is possible for an 8k page to contain data from multiple tables, resulting in fairly granular fragmentation. When there is a clustered index on a table, data is stored in that clustered index. A clustered index will acquire complete pages to store data. Therefore, table having a clustered index will not share a page with another table.

However, even though each page is completely allocated to a specific table, it does not mean as pages are allocated for a table that they are contiguous. Tables can be fragmented on a page level as well, sometimes requiring defragmentation. Defragmentation is performed by re-building the clustered index. Non-clustered indexes may become fragmented as well, and require rebuilding in order to optimize them for performance.

I have some high volume databases with millions of records where fragmentation becomes painful, and I really don’t have a window in which to defragment the data. For this situation I have found that creating a distinct file for that single table allows it to acquire contiguous space. It does not become fragmented because it does not share any of the disk allocation with any other table. I give up a lot of free disk space in order to reduce fragmentation. I gain high performance during data entry and retrieval without having to rebuild the index.

In extreme situations I have even separated out database files for non-clustered indexes on large tables. Even though the files are on a SAN sharing the same spindles, the performance increase is very visible due to the reduction in fragmentation.

Do you have additional insight for Ravi’s question? Drop me a note to btaylor@sswug.org or leave a comment here online.

Cheers,

Ben