In any database in order to have consistency it is important to have one ultimate source of truth. Many times we want to have that data in different forms. So, do we maintain the data in different forms, or derive it from a single form?
For example, you have data in relational tables in a database. You wish to see that data in an XML document form. Do you store both versions, or do you store only one version, and transform the data in the persisted version into the desired format.
You could store the data in an XML data type in your database. Then, indexing the documents you could have reasonable performance just as you would in relational tables. Another approach is to normalized the data, and generate an XML document if so desired. In reality any one of these transformations may be done inside of an engine such as SQL Server, or externally in a procedural code language.
Another form of transformation that is often performed would be converting the normarlized data into data marts or data warehouse schemas, again the definitions of which may be quite different. However, in those cases I would be less concerned with having duplication of the base truth since those tables always derive from the root source. The only difference is that instead of deriving the results when requested, they tend to be derived at a snapshot of time.
One source of truth has been a strongly held principle for many years. Those who violate the principle find themselves dealing with issues when external processes outside of their design impact only a single source of truth if multiple definitions are persisted.
Why not share your wisdom or war stories on this topic? Leave a note here, or drop an Email to btaylor@sswug.org.
Cheers,
Ben
