Editorials

Denormalization – Just Say No…?

When I wrote about denormalizing your tables, it was with an eye to optimizing things for different data use cases. While do I do think there are cases where it can be helpful, one of the comments received was important as you consider how you want to address this.

From Jim:

"***[E]very*** time I have denormalized, I have later regretted it.

When changes to the database schema are undertaken, at best, things are more complicated. At worst, it can be a nightmare.

I was a senior DBA for a large financial services company and early on for a particular high volume (billions of rows), high availability (never down) application was optionally going to record text data. Since it was optional, it would make sense that it would be a one-to-many logical child having an identifying relationship to the main table. But no, I was persuaded for performance reasons to include the text columns in the main table. After all, one big I/O is better than several smaller ones, correct?

The problem is, we were at the ragged edge of the number of rows to the database page. Add a new column, and you would slip to the next lower number of rows to a page. This would necessitate a restructuring of all the data before the users/processes would be able to access the data.

Just for one seemingly innocuous two or three byte column. Needless to say, adding a new column was out of the question.

So I ask your readers, was it really worth it? Think about it and count the real cost to your business and to your mental health."

Great points. Clearly this points to testing – and to other options that could be considered as well. I have to wonder if there is a better way to go after the necessary denormalization you see with some data warehouse applications and solutions. I think perhaps setting up views that do the work might be just the trick. While the underlying database tables are still doing the work managing the relationships, indexes and so-on, perhaps views are the right way to abstract data in support of BI systems.

What do you think? Is there ever a time when you should denormalize or are the costs simply too high?

My email: swynk@sswug.org – let me know what you think.