Editorials

A Conversation on Composite Keys

Aaron and I have ended up with a bit of a conversation regarding composite keys. With his permission I am including more of that conversation today as a followup from the editorial yesterday.

Aaron:

Thanks for considering my opinion yesterday. I think one of the other issues I’ve found with composite keys is the composite key needs to be included in multiple tables, which amounts to the same data in multiple tables. In my experience duplicate data can be as dangerous as duplicate code. If we want to change the data type of one of the columns in the composite keys, we have multiple tables that will need to have their columns modified.

Ben:

I’m confused. The purpose for having id columns is to have a key that is system assigned, and never changes. A table with a composite key often has the keys defined as system assigned IDs from more than one parent table.

To help communicate my confusion I am including a simple database design using composite keys. There is a hierarchy of Company/Region/State/Store. The following ERD demonstrates this hierarchy, each child table having the primary key of the parent above. As you can see, as you go deeper into the hierarchy, each table assigns it’s own identifier, while including the ID(s) of the direct parent. As a result, the lowest level table, Store, may be joined to the highest level table, Customer, without joining any of the intermediate tables.

Aaron has a point about having to join on multiple columns. In the case of joining a Store to a State, the immediate parent, one would have to join on the CustomerId, RegionId and StateId values.

In tests I have run, the additional load for including the additional key values in the join is insignificant when compare to the extra requirement to join store to state to region simply for the purpose of joining the Store to the Company. This may be a frequent need in many systems, and the need to go through that entire set of relationships so that I may join a Store to a Company is disappointing.

The reality is that on many more complicated databases, this series of joins can be exacerbated having only surrogate keys. One medical claims database I worked on required the join of 24 table in order to gather the necessary 4 required tables, which, if composite keys had been used, would readily be joined excluding 20 other tables.

You smart readers are wondering why we don’t simply include the columns from the ancestor tables so that we may join back to them, but not include them in the primary key? You could do that. In the Store table you could simply have the CustomerId and RegionId columns as foreign keys. However, probably the best clustered index on the Store table would be based on an index of CustomerId, RegionId, StateId, StoreID, resulting in organization of the data based on the same hierarchy defining it, and most likely the way it will be retrieved. So the one clustered index supports both a primary key and a clustered index. It also enforces the relationships to all ancestors with a single relationship.

Additionally, were you to maintain independent relationships to ancestor tables, it is possible to have misalligned relationships with ancestors. For example, a state may be related to both a Customer and a Region independently. However, the Region it relates to may have a relationship with a different Customer than the Customer in the State Relationship. Using composite keys that is an impossible condition, and the point of a referential database.

The point that there may be data issues defies the concept of surrogate keys. A surrogate key lives for the life of the record and is never changed, and it’s data type is never modified. As a result, referential integrity resolves issues of data corruption.

The one area where I agree fully is that there are a lot more columns included in the join of these tables. A join of the State to the Store table requires three columns instead of just one. So, if that is the only kind of join you are ever going to perform within your database, it can make sense. However, if you find you need to jump levels in your joins, the composite key greatly simplifies your queries and improves your performance.

Aaron, thank you so much for your collegial responses. You certainly have plenty more room available to you if I have not understood your position correctly.

Like Aaron you can be a part of this or any conversation here on SSWUG by leaving your thoughts online, or sending an Email to btaylor@sswug.org.

Cheers,

Ben