Database in Need of Normalization - We Like It That Way

Webcast: Understanding the new culture of SharePoint
You have a sponsor for SharePoint development, you have a budget, now where are all the users? Online Collaboration powered by SharePoint is not only a new technology, but a new culture of doing work. In this session we’ll explore change management (the fluffy kind) strategies that can be tailored to your user base. Attract users, and keep your budget!

Presented by: Adam Levithan

> Register Now
> Live date: 7/28/2010 at 12:00 Pacific

Last Friday I opened the topic of Database Normalization. My goal is to help our community recognize situations they experience in their day to day development or maintenance tasks that can be resolved through database normalization.

I’m one of those poor people who learn better from mistakes than success. I know there are a lot of others like myself. So, I’m looking for those problems that you have solved, or would like to solve through database normalization.

I had some interesting feedback from the Friday newsletter. In that I shared how one of the rules of normalization (no repeating columns) expresses itself with problems you have to solve. Here are a few great responses.

Don writes:

"…the "repeating columns" thing can be a bit of a grey area: are a phone number and fax number both members of the same domain? Depending on your business model it might be perfectly valid to store these as separate columns in one row. On the other hand, if the values do indeed come from the same domain and they are simply suffixed (phone1, phone2, phone3…) then I would normalize that out, but that’s because I like to reserve null specifically for the semantic value "unknown" (which I think is a valid use of null)."

David writes:

"Your phone number case is interesting; we’ve wrestled with the same thing. We have ended up almost always storing them as flat "denormalized" fields.

We think you have to consider user interface in the application when designing a system, not just design data structures in a vacuum. What will the user’s experience on a screen be…. will they simply have 4 fields to enter "home phone, work phone, cell phone, and fax", or will they have a master-detail interface with the ability to enter N number of phones (and choose a Type on each one). Master-detail is a more complex interface but more flexible… if it meets user needs, then normalized is great.

But if users don’t need that complexity and you decide to just put 4 fields on the screen, then the overhead of translating between that and a pure normal structure gets pretty ugly… tracking history of changes, triggers that do things on data changes, importing data, handling multi-user concurrency issues, etc.

If, every time you read or write the parent record, you’re going to treat it as if it were a column in a flat table, then just go down that path."

Dave writes:

"I totally agree with your sentiments about normalization. I keep running into an interesting problem with migrating old data….

We recently completed the migration of 5 old systems (AS400/DB2, Unix/Oracle, Windows/Oracle) and countless spreadsheets, access databases and other manual workarounds onto a single SQL Server database designed to move the company forward for the next 5 years (or that is the plan).

Every so often I have developers (who are always looking to prove me wrong) asking me about one of our main tables (Contract); they ask because when they look at the data they see the same value repeated in 4 columns:

– MembershipReference
– CardReference
– LoadReference
– ExternalReference

The first thing they say is “You got that design wrong, you only need one column”…..The answer to this is twofold:

1 – yes, we started with one column
2 – no, we have downstream systems that can’t be changed and upstream customers who won’t change

It is the really the second point that is the key: There was no possible way to map the data from the source systems into one column and have only a minimal impact on the upstream customers (who supply data to us) and the downstream systems who consume the data.

So, we created explicit columns to support the various needs and, for new business created on our system, we duplicate the ExternalReference into the other fields – wasteful? Maybe; denormalised? Definitely; essential? Yes, without a doubt.

Anyway, I’d be interested to hear your thoughts…"

Well, since you asked for thoughts, I’ll give some feedback. Others are sure welcome to get into the conversation too.

There is one basic thought that seems to come from all three of these responses…some other system treats the data differently. In this case, since you are in control of the data and the schema, it may make sense to normalize and have the expense of converting that data into the form needed for presentation or conversion to other systems.

My question to you would be, "Do you want external requirements such as user presentation or other systems driving your database design?" I could see treating phone and fax as two different entities or attributes, and therefore simply separate columns. In this case, you are saying that these are not repeating attributes, and that can work as a design choice.

One of the things that helps with this dissonance between the database and the business requirements is to have a separate software layer responsible for converting a well tuned and optimized for performance database. If that layer were present, you would easily pivot phone numbers into for columns in a row. Pivots have been in report tools for years. It’s easy to do in a business layer if you don’t want your report jockeys having to do the work (Anyone want to send me some tool hints here I’d be glad to post them in our newsletter).

Regarding the Reference numbers. You are developing a new system. So, make if extensible. It’s not hard to pivot/unpivot the data coming in and going out. Your system is now extensible, the other systems are not. So, when changes are requested to add another reference number, your estimate comes in way below all the other systems because you have a design that is extensible. You simply update your pivot/upivot queries. Or better yet, this isn’t a database problem. Your ETL tools should handle this for you.

A second option that works in tandem with this is a business rule you establish that maintains a data mart so that you don’t have to repeatedly pivot the data from 4 rows of phone numbers or reference numbers, into a single row with multiple columns. So, pre-process for your presentation.

One final thought. Normalizing out repeating columns has benefits:

when you audit change…your repeating data is audited separtely from the parent table. It doesn’t slow down the parent table audit, and since it is more granular, auditing your normalized data is faster.
when you search for two or more instances belonging to the same parent
when you require indexing to optimize search capabilities…one index works for all repeating column values that are normalized
when you want to speed up the maintenance of the data in the parent table…Insert, Update and delete are simplified because it doesn’t have to handle nulls and it doesn’t have to check these columns for change when updating data. I acknowledge there is extra work maintaining the relationships
when you require uniqueness, normalizing the data simplifies the process

Narrow, normailzed tables generally perform better in an OLTP system. There have been instances when I have had to use different methods to optimize peformance or meet unique requirements; those intances have been very few.

Do you have an opinion, brainstorm, examples of other systems that are solved by database normalization, or tips for solving some of the issues that occur with normaliztion? Get into the discussion. Send your thoughts to me at btaylor@sswug.org.

Cheers,

Ben

PS. Don’t forget to take a look at the upcoming workshop from Kalen Delaney on July 23rd. That’s next Friday. It will be a great virtual workshop to help you update or increase your knowledge of database indexing, especially as applied to SQL Server 2008. Go to www.vconferenceonline.com/shows/workshops/indexing.asp for more information or to register.

Featured Article(s)
DBA Learning Strategies (Part 2 of 4)
Part 2 in the series of things to think about as a DBA.

Database in Need of Normalization – We Like It That Way

Recent Posts

Debugging Multi-Cloud Performance

Mixing Flavors of SQL Server

July Spotlight – Db2 LUW: Types of I/O

Part II: Overview of B- Tree and B+ Tree

Is it undermining or rude to email the boss to ask him to get his act together?

Debugging Multi-Cloud Performance

Mixing Flavors of SQL Server

July Spotlight – Db2 LUW: Types of I/O

Part II: Overview of B- Tree and B+ Tree

Is it undermining or rude to email the boss to ask him to get his act together?

Getting Started With Deep Learning in Your Browser Using TensorFlow.js

SQL Server Collation Overview and Examples

MySQL Escaping on the Client-Side With Go

VS Code Gets New Python Language Server, Named After Monty Python Character

How Hello World! changed – top level statements and functions (C# 9)

SSWUG.ORG