Can Normalization be a Bad Thing?
Today I have the opportunity to share the second of David’s un-votes from the survey of “Worst Things to Do in a Database.” David does not agree that De-normalizing a database for ease of consumption by a business layer should automatically be considered a de-facto "Bad thing". He does a good job of explaining some of the complications we experience when developing software.
David Writes:
More heresy, I know. Making it easy for the business layer to consume data is DESIRABLE. Phone numbers are a great example. If I have a contact record, yes…theoretically a contact could have N-number of phone numbers, and a highly normalized model would put all phone numbers in their own table. But if the use-case requires that contacts be retrieved with phone numbers, having those numbers in the contact record could in fact be a better solution.
Reasons include:
- A phone number is more like an attribute of a contact than a real-world entity
- N-number of phone numbers, where N < 3 or so is the norm. The complexity and overhead of architecting, storing, joining, retrieving, documenting, and creating reports for a separate phone number table is not warranted
- Greater understandability of data: A fax number in a FaxNumber field is far easier to understand than a PhoneNumber with PhoneType=3 in a separate phone table
- Ambiguity in meaning of foreign key: if ContactA and ContactB have the same phone number, should they reference the same phone number record? If not, you still have duplication of data. But if so, you have a harder problem to solve when one person’s phone number changes, but not the other.
This data model is not self-describing or intuitive. Ideological purity should not trump usability: in fact usability
should win out, even at the cost of modestly-reduced purity or performance.
Editors Thoughts:
I’m really glad David chose the phone number example for his argument. This is one that most developers understand. Many times I have had to fight with the data when phone numbers are normalized or de-normalized in order to find the data I am looking for.
When finding a number in a normalized form, it is often difficult to know what is “THE” number to use on a report or screen…because that is not automatically stated. David also brings out the point that the schema allows for having say multiple cell phones. Which one is the phone number to be used? Free form text descriptions don’t help.
A common method of data presentation is to show all phone numbers for an entity by class of number such as work, home, cell, fax. Normalization requires the need to pivot data when returning all phone numbers for an individual. This is extra work.
When storing phone numbers in multiple columns of a single row you experience different issues. What if a person, like me, doesn’t have a land line anymore? Do you put my cell phone in the home phone number field…it isn’t my home phone. When you want to print out a report with my phone number, do you have to do gymnastics to find the first phone number that is populated? There is nothing more frustrating than having to write a query with a case statement or coalesce in order to find the first populated phone number.
What do you do when you need to record multiple phones of the same type? Personally, I have three cell phones used for different purposes. There are times I wish for people to know all three. Do we simply have Phone1, Phone2, Phone3 in our schema? How many phones are too many or are enough? How do we know (if we need to) which type of connectivity that number represents?
Either model has difficulties for storage and/or consumption. All of the issues with having multiple phone numbers can be resolved as business rules. Implement those business rules in stored procedures, functions, etc. if you must.
I have found an application business layer to be the great equalizer. Data storage design, at this point, becomes a non issue. If you want to restrict your objects to having three phone numbers such as work, home, and cell, then implement that design in your business objects and business rules.
If your database doesn’t force you into that business rule, then when you get a new requirement to allow for tracking another number, your database schema is not impacted; only your business layer. In my experience, it is a lot easier to change business objects and logic than it is to change the schema of a production database.
Regarding the last point about the uniqueness of a phone number, and it being shared by multiple people; I agree. Unless I had a unique situation, I wouldn’t normalize to the point of having a many to many relationship between different entities and phone numbers with which they may be associated.
Historically SSWUG does not respond to reader comments. It is my desire that providing a response here demonstrates sincere appreciation to David’s comments, and that the difficulties of managing and consuming data are demonstrated for either technique.
SQL Server Trivia
Microsoft SQL Server didn’t always have a CASE function. When was the CASE function introduced? How did you implement CASE logic without a CASE function?
Coming Soon:
The TOP (10) Worst Things to Do in a Database Survey Results.
David’s Last Un-Vote…"Embed non-structured data with structured data in the same table and/or database"
Your comments and experiences are always valued. Please feel free to share any time by sending an Email to btaylor@sswug.org.
Cheers,
Ben
$$SWYNK$$
Featured Article(s)
What Is Going On? (Part 2 of 2)
Have you ever wondered what is happening on your server? Did the backup run? Did it fail? Why is Outlook installed on my Sql Server? I don’t like MAPI. Get is off my Sql Server! SMTP who? Can I use SMTP to send my mail? Let’s use this new feature called Database Mail. But wait, what is this profile? I did not get my email. How can I tell if my email has been sent? When and where should I use email? Come learn about Database Mail.
Featured White Paper(s)
Upgrading to Microsoft SQL Server 2008 R2 from Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000
More than ever, organizations rely on data storage and analysis for business operations. Companies need the ability to deploy… (read more)
Featured Script
dba3_sys_ProceduresInDBMSCacheViewScript
generates DDL for a view on procedures in DBMS cache… (read more)