Editorials

Addresses Around The World

Today I thought I want to continue the topic of data structures for multiple regions. Let’s consider physical addresses, as that is one topic having the greatest variance when it comes to storage.

The biggest problem, when it comes to addresses and multiple regions, is that the address elements vary considerably. Often, you want your system to use referential integrity to manage accurate data. In the United States of America you have the following address properties:

Adressee – the person/organization represented by the address

Street Address – The address number and street name. This could also be a post office box

Street Address 2 – An overflow line if all the address information cannot be fit in the Street Address

City

State – On of the Fifty States, or territories

Zip Code – 9 Digit Identifier that provides a numeric representation of the State/City, and sometimes portion of a city

Given a Zip Code, you can identify the City and State for an Address. So, using referential integrity, you can validate the accuracy of an address quite well. When you start to work with other addressing methods, these properties may need to be named different. Perhaps the term state could be named prefect, etc. However, there are other mail stops that are defined with a completely different set of properties.

One technique I have found works with international systems is to have a database containing the different properties as they are defined for a country. They are normalized as needed to allow for referential, and thus data integrity. However, to enable reporting and all the rules that put together address information, I use a de-normalized table for the purpose of reporting. So, the normalized table stored the data according to the data structure needed for the addressee. The de-normalized structure contains the data in a common form that works for any destination.

The de-normalized structure simply contains generic properties of Line1, Line2, Line3, Line4, Line5. Any destination can fit into 5 lines. So, the rules regarding what properties make up an address are handled in one structure, while the final results of those rules are handled in the generic structure.

Using the USA example again:

  • Line1 = Addressee
  • Line2 = Street Info
  • Line 3 = Additional Street info
    Or
    City, State ZipCode
  • Line 4 = City, State ZipCode if Line3 Contains Additional Street Info
    or
    USA
  • Line5 = USA if Line3 Contains Additional Street Info

When I was in Africa my address was:

  • Line1 = Ben Taylor
  • Line2 = BP 240
  • Line3 = Bangui, CAR (City, Country)

You can see, with these two examples, how the generic storage simplifies reporting. The different rules that determine what properties are stored in each line are executed when the data is stored. Then, when you report, or utilize the address information, you don’t have to compute the different rules every time the address is utilized. However, if you wanted to find all the addresses in the country CAR, this structure would be horrible.

This violates data normalizing rules, where a data property is stored in only one place. However, this duplication benefits the ability to mine data at a later time, validate the accuracy based on locality, and produces enhanced performance when consuming data where the rules of address configuration are not relevant, such as printing reports.

You can add your experience or thoughts in comments here, or drop an Email to btaylor@SSWUG.ORG.

Cheers,

Ben