Editorials

Is Fill Factor Important?

In SQL Server, data is stored in a B-Tree structure. If you are not familiar with a B-Tree you may want to take a look at Wikipedia for more details.

As you can see in the example of the image below (Copied from the Wikipedia entry referenced above), not every node is completely filled in with values. The less dense the values are in every node, the easier it is to insert new data. Whenever a node becomes completely full, and a new value needs to be introduced into that node, then the tree needs to be re-balanced in order to make room for the new data.

Each table can have one clustered index. It is still stored in a B-Tree structure. The difference is that the actual table data is stored in the clustered B-Tree. For this reason a table may only have one clustered index. It also means that if a clustered index does not have room on a node for new data, it must be re-balanced, and the re-balancing effort not only impacts the key for the index, but all the data contained in that clustered index. In short, re-balancing a clustered index is a much heavier load than re-balancing a non-clustered index.

For this reason, Microsoft by default recommends you create a clustered index based on a sequential value such that every new record always falls at the right most node in the B-Tree. This means that the index rarely has to rebalance the upper nodes, and all low level nodes simply fill up and a new nodes are started as needed. Because this is the nature of a sequential key, then the clustered index may have a very high percentage fill factor, usually 100%. This means that no space is retained in the index for new data to be inserted into the middle of the B-Tree somewhere. The resulting B-Tree is very dense with the highest amount of data in the smallest footprint so select queries are faster.

But not all data is sequential, dealing with an artificially generated key. This is usually the case in non-clustered indexes, or clustered indexes derived from composite key values. In this case, data is added into the B-Tree all over, not simply at the last node. When this happens, if each node is fully populated, fill factor of 100%, then for every insert the B-Tree must be re-balanced. Since non-clustered indexes may be smaller this may be an acceptable value. However, on a system with a high degree of insert activity, it sometimes makes sense to reduce the select performance in place of more efficiency by having a lower fill factor allowing room for new records to insert in the middle of the B-Tree structure.

It is also for this reason that most enterprise systems separate reporting from transaction processing. The needs of the two systems have very different index requirements. By having a reporting database you are not as interested in performance for inserts, because your customer is not waiting for them to complete. So, you can have non-clustered indexes that have a high fill factor, causing a lot more re-balancing of the trees, but having even higher performance when selecting data.

The separation also allows for your transactions to be entered quickly, while allowing a slower time for the data to be optimized for reporting.

Is fill factor working for you? Share you comments with us by leaving your notes below, or drop me an email at btaylor@sswug.org.

Cheers,

Ben