Editorials

Data Normalization Again!

Data Normalization Again!
I find it interesting that so many have written regarding the editorial yesterday. What I find most interesting is the respect shown by everyone to differing opinions. This is definitely an area that has a huge impact on many individuals, and I have seen discussion become personl to the point of a complete communication breakdown. My thanks to the prefessionalism from all who have written.

Here are some responses representing extremes of thought from our readers.

Thys Writes:
That was an interesting read from a number of aspects, the most intriguing of which is that the subject is still being debated.

Back in the mid-90’s I wrote a CRM system which kept track of products, licensing and financial information on customers. We already had a proliferation of phone numbers and email addresses. I put all those single-field contact points into one table. The columns included a local key, a key to the contact *and the company tables (since a company may have central phone numbers etc), and a name for the contact point. I totally agree – no secret codes like “PhoneType3=girlfriend’s phone”. There was also a field so preferred contact point could be identified.

We could search on any contact-point information without first having to identify what the nature of the data was. Who cares whether you are entering a phone number or an ISBN or email address or a nickname – just take the entry and search.

There is no need to differentiate email addresses from phone numbers from nicknames etc etc and place them in separate tables. It is certainly not smart to produce multiple columns for these things in the main record table. Either of these tactics would lead to multiple searches to find something.

David Writes:
We wrote a theoretically-pure, perfectly normalized contact management system for a client many years ago: one person, multiple company records, multiple addresses (some inherited from the companies, some not), multiple phone numbers (some inherited from the companies, some not), etc. etc. etc. The client screen and the data structure both supported all these features. It was great.

But it was unusable. The number of clicks the user had to go through to enter data, and the amount of training they needed, killed it.

Since then, we keep our contact systems flat and simple.

Editor Notes:
This has been great feedback. It shows that many of you have experienced the same kind of problems in systems you wrote, purchased, or support.

Taking all of these responses, and trying to fit them into a continuum I find a number of positions along the line of options.

DB Not DB Partially DB Fully
Normalized Normalized Normalized
|————————————————————|

If your user interface is tied to a partially or fully normalize database you experience the kind of nightmare David writes about.

If you write a completely de-normalized solution, you end up with dirty data, or data that is difficult to query in certain scenarios. For example, if you want to find the person associated with a phone number, and you don’t know if that number is a land line, mobile or fax (a real scenario) you must then look for that value in all three phone number columns. This requires Three Passes through three separate columns…and you must have three indexes on that table for each phone number to support any performance.

If you write a completely un-normalized database as described, then you can experience scalability issues. Basically a good chunk of your data is stored in a single table. At least with that model you have few indexes allowing performance for certain circumstances.

So what do we do? One option is to not make this a database problem. A middle tier business object may store phone numbers in a collection readily. That same business object may have a set of properties returning a Home Number, Work Number, Mobile Number, and Fax Number of some entity. Searching for phone numbers is also easy…simple iteration.

So, here’s my list for today. A list of the benefits of pushing this work to a middle tier:

  • The Pivoting of data is something readily done through properties
  • Since the data is physically contained in memory in a collection, it is easily persisted in a normalized database
  • Searching for data is no longer constrained to a specific structure since business objects readily support any structure
  • When the need to store additional number occurs, the database schema is not modified, You simply add a new property to the business object…which is readily replaced in a production system.
  • Many different business objects may be defined with a different number of phone numbers…but you have one, completely re-usable persistence method completely tested and perfected…allowing a jagged number of attributes to be stored.
  • The end user experience is simplified, not constrained to your physical schema
  • If you choose to move to a completely normalized database, as David described, having a single phone number table where the phone number is the primary key, and all entities associated with that number are done through relationships, you could do that without any requirement of the end user to know that is your physical storage. They enter the number as they know it…your database persistence layer handles the normalization
  • If you choose to use a completely de-normalized data storage that is no longer an issue for the business objects or the code that consumes them
  • Most modern reporting and user interface development tools support working with business objects just as easily as connecting directly to a database

I’m out of room. I’ll have to follow up with the TOP (10) survey results and the trivia question from yesterday in the next newsletter.

As always, share your thoughts or questions with us at SSWUG. Write to me at btaylor@sswug.org. Or, go to the SSWUG Facebook page to leave questions for Stephen to answer in a future SelecTViews or editorials.

Don’t forget you can write for SSWUG too! See the web site for more details.

Cheers,

Ben

SelecTViews
With Stephen Wynkoop
All the news of the day that counts is provided here.. also, find out about normalization, what types of skills might be good for specialization and upcoming learning events…We keep you informed and up to speed on tips tricks and more.
[Watch the Show]

$$SWYNK$$

Featured Article(s)
Game-changing Features in the Silverlight 5 Beta (Part 2)
Explore another feature called Multiple Window Support, where Michael Crump, MPCD, will review its abilities within a Silverlight 5 application.

Featured White Paper(s)
Upgrading to Microsoft SQL Server 2008 R2 from Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000
More than ever, organizations rely on data storage and analysis for business operations. Companies need the ability to deploy… (read more)