Editorials

Thinking Through Single Source of Data with SQL Server (and other systems)

Featured Article(s)
Data mining: The New Gold Fever.
Information today, more than ever, has incredible value in the market. The race to get the best and most accurate data for business intelligence purposes resembles the famous gold fever, when everybody and their dog moved to wherever the action was.

Thinking Through Single Source of Data with SQL Server (and other systems)
I’ve received some really comprehensive and thoughtful responses to the idea of having a "single source" of information. I wanted to start with Ralph’s thoughts:

"I have worked in situations where the primary purpose of the database was to house someone else’s data. I have also worked in situations where the data was what I guess you would call proprietary in that it was either entered, derived, or created internally for internal use. This has led to what I refer to as The Rule of Data Ownership. Briefly stated, The Rule of Data Ownership is that only the owner of the data, the person or entity which has created the data, has the right to modify the data or to delegate, in specific terms, that right to another person or entity.

In other words, if my system houses (or, in effect, warehouses) someone else’s data for whatever purposes (whether processing, displaying, or anything else), then my system must accept that the data supplied by that other person or entity is, by definition, "correct". Also, my system should not make any modifications that have not been specifically agreed to, either via contract or some other form of authorization, by that other person or entity. Because my system only warehouses the information, it is incumbent upon the person or entity supplying the data to determine the accuracy of the data. However, I may choose to provide a means for that person or entity to perform data analysis that identifies inconsistencies in the data.


On the other hand, if the data is entered, derived, or created internally to my organization, then my organization has the need to ensure the integrity of the data. That probably means all manner of cross-checking, data integrity rules in the database, and policies and procedures that target making sure that the data that is entered, derived, or created is accurate and complete.

In either case, though, the users of that data are, in effect, forced to accept (until proven otherwise) that the data is accurate and complete.

Now, having worked with both kinds of data and having seen the inconsistencies and inaccuracies, to put it politely, of the data within systems, to say nothing of data sources such as Wikipedia articles, I tend not to immediately accept that the data within any database is accurate, much less complete. As databases proliferate and data sharing becomes more and more rampant, I believe that the inherent lag in the sharing process will begin to introduce greater and greater discrepancies between databases. (This is one of my concerns with universal health data sharing plans.)

However, perhaps of greater concern is the Wikipedia Effect. While Wikipedia is a great tool for some things, its greatest strength is actually also its greatest weakness. Essentially, the strength of Wikipedia is that anyone who really knows about a topic can post information that others can find and use. The greatest weakness of Wikipedia is that anyone, whether they actually know anything about a topic or not, can post information that others can find and use. The standard response when this weakness is mentioned is that there is a consensus mechanism that will prevent invalid information from remaining in Wikipedia. However, if there had been a Wikipedia when Copernicus started talking about a solar-centric rather than a terra-centric planetary system, wouldn’t his "outrageous" theory have been suppressed? In other words, if enough people are wrong in the same general way, then the correct information will be suppressed and the incorrect information will be propagated."
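Ralph's point about "cross-checking, data integrity rules in the database" for internally created data can be sketched concretely. The table names, columns, and rules below are all hypothetical, and SQLite stands in for SQL Server purely to keep the example self-contained; SQL Server's constraint syntax is close but not identical.

```python
import sqlite3

# Hypothetical schema illustrating database-enforced integrity rules.
# SQLite is used here only so the sketch runs anywhere; the same NOT NULL,
# CHECK, and FOREIGN KEY ideas apply in SQL Server with minor syntax changes.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL CHECK (amount > 0)  -- reject nonsense values
    )""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO invoice VALUES (10, 1, 250.0)")  # valid row

# Both of these violate an integrity rule and are rejected by the database:
for bad_row in [(11, 99, 50.0),   # unknown customer_id -> FK violation
                (12, 1, -5.0)]:   # negative amount -> CHECK violation
    try:
        conn.execute("INSERT INTO invoice VALUES (?, ?, ?)", bad_row)
    except sqlite3.IntegrityError:
        print("rejected:", bad_row)
```

The point of pushing these rules into the database, rather than into application code, is exactly Ralph's: no matter which internal process enters or derives the data, the bad rows never land.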

Featured White Paper(s)
Keeping it Simple: How to Protect Growing VMware Environments
This whitepaper explains why you don’t need to understand VCB internals or labor over complex scripts to create comprehensive… (read more)