Editorials

Normalizing to Need

There are a lot of symptoms that appear when your database hasn’t been normalized according to your needs. Notice I didn’t say it has been normalized to 3rd nrmal form, or boyce-codd normal form, which is a good place to start any database. Usually you will find issues with missing indexes or statistics, or complicated queries. Sometimes you find yourself creating covering indexes or indexed views. We’ll talk about those in future editorials.

Today I want to talk about a database that is not normalized efficiently. Remember that I said you should start at third normal form, when creating a database schema. Take a tour on the forms of normalization if this is a new term for you. There are plenty of good examples to be found in a Google search. I have often seen tables in second normal form, where all of the data in the table is dependent on the primary key, but there are repeating columns. Some examples are medical records with multiple family members in the same table. Another is an entity with multiple phone numbers in the same table.

Let’s take the multiple phone number example. If you have a person table with three phone numbers, home phone, work phone, an cell phone. What if you want to find any person with the phone number [some phone number]. The query will have in the where clause:

Where HomePhone = [Some Phone Nnumber]
OR WorkPhone = [Some Phone Nnumber]
OR CellPhone = [Some Phone Nnumber]

Your query performance degraes as you add more and more people. The reason is that you have statistics to maintain on three columns. Additionally, you will ultimately require indexes on HomePhone, WorkPhone and CellPhone. Your qury plan will show that you have to make three passes through your person table, once for each column.

A more performant schema would be to have a person phone number table. This way, each person may have zero or many phone numbers; not just three. Additionally, there is only one set of statistics to maintain, and one index. Finding that person by their phone will be a breeze.

Lets go back to the statement that the best structure depends on how you use the data. What if you don’t search by phone number, and do not require the ability to have zero or more phone numbers. Perhaps a table with three phone numbers performs better for you. Instead of having to pivot three phone records into three phone columns for your report, the table already contains just three columns. If you don’t have to use an OR operator to search on all three columns, this form (often used in data marts) may be your better choice.

A good rule of thumb is to start with third normal form, and then denormalize or normalize as your requirements evolve.

Cheers,

Ben