Editorials

Final Thoughts on Composite Keys

Today Tai adds his insight to our composite key example. He brings up 4 points which I interact with inline. My intention is not to discount the thoughts of others, but to add some counter point and stimulate thought and discussion.

1) The article doesn’t take the amount of data in the table as a factor. Surely, we don’t want a transaction table that has millions of rows, and has CustomerId, RegionId, StateId, StoreId, and TransactionId as composite keys. That will kill data insertion.

This is a great point. You always want great performance. Is insertion the highest activity in your database, then it may be important. Having a composite key you have only one index to manage including the Customer, Region, State and Store. If they are simply other columns in the table, you must create separate indexes supporting those additional relationships. In my example, one index takes the place of five. Which one will perform better for inserts?

My experience shows one clustered composite index performs better than one clustered index and four non-clustered indexes.

2) Having said that, it means the look up tables such as regions, states, etc. have only a handful of data. Would there be a significant differences between using and not using composite keys?

I agree that the other tables are simple facts. The composite keys are not as useful for searching within them as there will not be millions of records. What pays off is that they can be joined at any point within the hierarchy just as transaction tables.

3) If someone wants to report on the Store table by region only, does the table need another index on RegionId to prevent table scan? In my experience, that would be a yes; and if that is the case, the composite is not helping search performance at all.

Since the data is hierarchical, it is unlikely a region will be known without also knowing the Customer to which the region belongs. A search on a clustered index with Customer and Region ID will be extremely fast as a composite clustered index. Indeed, the additional data will be stored in that node of the btree. You will actually gain better database performance because you have one index that has high performance, rather than having to maintain multiple indexes for special purposes.

What will not perform well would be searching for all transactions that were from a specific store without supplying all the ancestor keys (customerID, RegionID, StateId) as well.

4) The medical database is an example of bad design or design needed to be refactored due to new business changes. For our example, I would have CustomerId and StoreId in the transaction table instead of StoreID only.

I agree. This medical database example would benefit from de-normalization and including the key values for ancestor tables. It is generally considered a bad practice to allow for circular relationships as you suggest. However, if the additional columns are passed through in a single composite foreign keys you are not creating circular relationships.

To your point, in a data warehouse, using a star schema, the method you recommend would more likely be one implementation. Another implementation would be to have a single fact table with all four location attributes included, much like a time dimension table would include year, quarter, month, day, hour, minute, as a single fact.

I’m sure many of you have different experiences and opinions. Feel free to add your thoughts online. Tomorrow we are going to pick up a different topic in the newsletter. If you have a different topic you’d like to see considered in our daily editorials, drop a note to btaylor@sswug.org.

Cheers,

Ben