Editorials

What’s Your Cluster?

Webcast: SQL Service Broker Advanced Performance Tips and Tricks
We will be looking at some of the advanced features of SQL Service Broker as well as some of the advanced techniques which can be used to maximize the performance of the SQL Service Broker.

> Register Now
> Live date: 11/3/2010 at 12:00 Pacific

New SelecTViews Show Posted
SelectViews – William Pearson joins us on the show today – talking about learning SQL Server, experiences and more.
[Watch the Show Here]

Featured Article(s)
Cross Join, Cartesian Join, and the non-Join
A cross join, as it is most commonly named, is one such types of query that many steer clear of; assuming that it is a faulty method to go after data. This is not always the case.

What’s Your Cluster?

The documented best practice for a clustered index has changed over the years.

When I was studying SQL Server 6.0 (and earlier) the documented best practice for a clustered index was to use a column that had a low degree of selectivity. For example, in a customer table the state column would be a good candidate because it had a low number of possible options in the United States. Even including territories, etc there would be less than 100 options. This was one of the questions I still remember from the SQL Server 2000 certification exam. I worked hard to pass that exam so it stuck with me.

The best practice, since the release of SQL Server 2005, is to use a highly unique, and preferably incremental value for a clustered index such as an Identity column. In this case, all of the data is added to the right most node of the BTree which would have created a hot spot in SQL Server 2000.

In SQL Server 2000, the thought was that if you used a non-unique clustered index, then your inserts would spread out more evenly in the BTree, resulting in less need for re-balancing. Especially if you did not use a fill factor of 100%.

Non-clustered indexes now had problems with the clustered index not having a unique value. They required something unique allowing them to point to the original record. So, the engine provides a hidden uniquifier (same kind of thing as an identity column) that is not visible to the end user. This was combined with the data on which you created the clustered index in order for non-clustered indexes to have pointers to the physical data.

So, why add the extra hidden column? Why not just use the identity column in the first place? When you create a primary key on an identity column (sometimes a surrogate key) then the result is optimum according to MS current best practices. This is the most common data design in the SQL Server world today, the use of surrogate keys.

Using the SQL Server visual table designer automatically creates a primary key if the user specifies one. Supporting that primary key, it creates a unique clustered index. This results in an index that supports the unique requirement for the primary key, and also provides the clustered index so that when tables are joined, the underlying data is located in the quickest fashion possible.

What happens when you use a natural data element instead of a surrogate key? The designer uses whatever you choose as your primary key to generate a unique clustered index showing up as CONSTRAINT [pk name] PRIMARY KEY CLUSTERED (Column List).

When you use a composite set of columns, as is common for a many to many intersection table, each of the key values is now part of the clustered index. If you have non-clustered indexes on this table, each of those columns is now also a part of that non-clustered index.

Simply because an identity column is a unique sequential value makes it a great candidate for a clustered index. Often we don’t include them in tables where natural keys are used because the uniqueness would be redundant. Using an identity column on a table performing many to many joins makes it possible to have non-unique values, without an additional unique index.

Some designers included the identity column in many to many tables while not making them a primary key. In this design, the identity column can be used for a clustered index. Making it unique allows it to be used as the pointer for all non-clustered indexes without requiring a uniquifier. The unique index still remains on the natural, composite key, and it still performs as the primary key. So, the argument that an identity column violates data integrity is no longer an issue. The fact that the identity column is unique allows it to be used as a secondary key, and searched quickly by a single value, or used for auditing.

So, what about the hot spot? Did MS make modifications to the storage engine so that we no longer need to re-balance the BTree due to hot spots? Or is that just the price we pay for having more efficient non-clustered indexes? When you start working with data warehousing or other highly changing data stores, you really need to take this into consideration. Having a hot spot can really slow down imports.

What are your experiences? Want to share your insight or history? Send me a note at btaylor@sswug.org. I’d love to share your experiences and insights with the rest of our readers.

Cheers,

Ben