
Automation of Data Source Validation

Caroline wrote in with a great summary and overview of what’s happening in the world of data at this point. I’ll share this and then I think we’ve run the full range of feedback…

"The assumption that information coming out of a database can be trusted is fundamentally false. Its initial validation could have failed. It could have been migrated multiple times through multiple instances of an application (life insurance data is particularly prone to this as it is a) complex and b) long-lived). It could have been messed up by various processes, either operationally or as part of its movement through the business intelligence life cycle.

But I think it’s a dubious proposition to make the end user responsible for verifying the source and integrity of the data.

Hence the growth of the whole area of data quality and master data management, an area in which MS does not, as far as I am aware, have an offering. The big players are IBM (particularly after its purchase of Ascential), Trillium and SAP, and Informatica also has a good product offering.

Data cleansing, standardisation, deduplication and consolidation, along with column-level audit trails, fuzzy-logic matching, trust/rank allocation and merging capability, are becoming an industry in themselves as businesses realise that poor-quality data and information lead to poor decision support. This grew out of CRM/CDI (customer data integration), because many large companies still have separate silos of customer data which may or may not tie up properly. It’s also vital for accurate mining, predictive analytics, etc., and the ultimate goal is a transactional model where data inputs in operational systems are cleansed and matched before being created in operational databases.

Most organisations are way behind this goal, though – those that have actually started the process have seldom got beyond the cleanse-per-silo model, and if they are starting to match and merge, it’s as part of preparation for warehousing and the data is not accessible by operational systems.

It is really fascinating work. It usually involves computing the probability that two data items actually refer to the same entity, so that the computer can decide which items are the same, which are not, and which need to be referred to a (human) data steward. Then, if the decision is made to merge two rows, it means computing, attribute by attribute, the probability that the value from one row is more accurate than the value from the other. It is not for the faint-hearted, and it is typically done only in high-end installations because of the difficulty and expense (automating this sort of thing is not a trivial exercise), but it is one of the most interesting areas in data management today."

I agree.
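
For readers who want to see the shape of the matching process Caroline describes, here is a rough Python sketch. It is not taken from any of the products she mentions. The field names, weights, thresholds and trust figures are all made up for illustration, and the weighted similarity score is only a stand-in for a properly calibrated match probability. It shows the three steps in her description: score a pair of rows, decide between match, non-match and referral to a data steward, and then pick the more trusted value for each attribute when merging.

```python
from difflib import SequenceMatcher

# Hypothetical field weights and thresholds, chosen only for illustration.
WEIGHTS = {"name": 0.5, "postcode": 0.3, "birth_date": 0.2}
MATCH_AT = 0.90      # at or above this score: treat as the same entity
NON_MATCH_AT = 0.60  # below this score: treat as different entities


def normalise(value):
    """Basic standardisation: lowercase and collapse whitespace."""
    return " ".join(str(value).lower().split())


def similarity(a, b):
    """Fuzzy string similarity between 0 and 1."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def match_score(row_a, row_b):
    """Weighted similarity; a stand-in for a calibrated match probability."""
    return sum(w * similarity(row_a[f], row_b[f]) for f, w in WEIGHTS.items())


def classify(score):
    """Three-way decision: match, non-match, or hand off to a human."""
    if score >= MATCH_AT:
        return "match"
    if score < NON_MATCH_AT:
        return "non-match"
    return "refer to data steward"


def merge(row_a, row_b, trust):
    """For each attribute keep the value from the more trusted source.

    `trust` maps (source, field) to a number; missing entries count as 0,
    and ties favour row_a.
    """
    merged = {}
    for field in WEIGHTS:
        trust_a = trust.get((row_a["source"], field), 0.0)
        trust_b = trust.get((row_b["source"], field), 0.0)
        merged[field] = row_a[field] if trust_a >= trust_b else row_b[field]
    return merged
```

In practice the weights and thresholds would be tuned against pairs that a data steward has already reviewed, which is a large part of why this kind of work is expensive to automate.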
