Editorials

Fill Factor

We have touched on Fill Factor a couple of times recently. This is a concept I thought would prove useful to dig into a little deeper for some of our readers who are not as familiar with how data is stored in SQL Server.

As I stated earlier, data is stored physically inside a table that is designed internally as a b-tree. If you have taken a data structures course you may remember that a b-tree is one of the most efficient data structures for storing and retrieving constantly changing data. Here’s an introduction to b-trees with a little math to help you understand how they work for GeeksForGeeks. All indexes in SQL Server are formed in a b-tree. If a table has a CLUSTERED index, then that index is also organized as a b-tree. However, the data is physically contained in the CLUSTERED b-tree structure. All other indexes are organized by the keys, and then have a pointer to the physical location of the complete record stored in the CLUSTERED index.

Fill factor is a technique used to allow rapid changes the keys in a b-tree to be completed more quickly. If your b-tree nodes allow 10 keys, then once you have added 10 items to a node, and another records needs to be inserted, then the tree must be re-balanced to make room for the new key value. GeeksForGeeks has another article demonstrating re-balancing a tree. Tree balancing can be a costly process. While it is occurring, your database isn’t doing anything else (in theory). So, in order to reduce the number of re-balancing events on a table, a fill factor setting was designed, allowing you to reserve spaces on nodes to add new values. If you have a 90% fill factor on an index, and it allows a maximum of 10 key values per node, then the table will start with only 9 of the key values populated when the index is created.

You can set the fill factor to a smaller value. The appropriate value will depend on the kind of traffic your table/index experiences. If you are adding a lot of data consistently, then a lower fill factor may work better, unless you are using a sequential value as the key in your index. I worked on a very high volume OLTP system where we used a random key generation technique to spread data across the index. We then used an 80% fill factor, resulting in high performance reads and writes. This is not an easy solution to implement, and is most likely not even necessary with hardware available today.

Another thing about fill factor; it can be used with any index, not just the clustered index. In this case, your data may very well be more distributed. Even though non-clustered indexes take less work to balance, if they don’t have additional include columns, a fill factor of less than 100 percent may reduce how often non-clustered indexes need to be re-balanced.

Take a look into fill factor, if you have a database that generates a lot of data. If the data is introduced in a bulk format, fill factor has limited value. It may not be useful in data warehousing. We’ll talk a little more about that tomorrow.

Cheers,

Ben