
Saving Data – It just is not that simple

In yesterday’s post, I mentioned that it’s becoming both more common and more important to store information in its “root,” or most basic, form. If you’re processing information from different sources (data flows, separate databases, and so on), the value of having that raw information available later can far outweigh the storage costs. My initial point was simply: “keep it around in case you find new things to do with it.”

But, as with so many things we do, it’s just not that simple. And it’s getting more complex all the time.

Compliance, corporate responsibility and the legal system are changing right in front of our eyes. AZJim raised the compliance angle specifically: how responsible are we, collectively, for keeping the “proof” of how we arrived at our information? Regulations generally aren’t there yet, but it seems clear that this will have to change.

If you make decisions based on information derived from a data flow, from multiple sources, or from any of the other inputs we each deal with every day, it’s naive to think that compliance, archiving and data-responsibility requirements won’t expand to cover archiving that source information. If you thought email servers were tough to manage under compliance regulations, consider the implications of an instrument data flow: it could involve thousands of data points or more.

The issue, of course, is that if you’re making choices based on information derived from that data, it’s only a matter of time before someone needs to see the basis for those choices. They’ll need to see the root information. The whole data-use and “chain of evidence” question lands squarely back on us as we manage the information archive.

This means that you, as a data professional, will need to be keenly aware of how information is used, where it came from, where it lives in archived form, and so on. You’ll have to be able to produce the raw data in case of a dispute, or even when someone simply wants to vet a decision made in your company.

This is a big deal. I suspect we can all figure out how to store information; that part isn’t so bad. The hard part, I think, is knowing the data USES: which pieces of information out there are based on which bits of data. It’s almost going to take a map of uses.
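To make the idea of a “map of uses” a little more concrete, here is a minimal Python sketch, assuming each derived item (a report, a summary table, a worksheet) can be recorded along with the items it was built from. The `UsageMap` class and the item names are hypothetical illustrations, not an existing tool. The same map answers both questions raised here: which raw sources sit behind a given result, and what depends on a given source.

```python
from collections import defaultdict


class UsageMap:
    """Hypothetical lineage map: each derived item remembers its direct inputs."""

    def __init__(self):
        self._inputs = defaultdict(set)  # derived item -> set of direct inputs

    def record(self, derived, *inputs):
        """Note that `derived` was computed from `inputs`."""
        self._inputs[derived].update(inputs)

    def root_sources(self, item):
        """Walk back through derivations; items with no recorded inputs are raw sources."""
        roots, seen, stack = set(), set(), [item]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            parents = self._inputs.get(current)  # .get avoids creating empty entries
            if parents:
                stack.extend(parents)
            else:
                roots.add(current)
        return roots

    def uses_of(self, source):
        """Return every item that depends on `source`, directly or indirectly."""
        dependents = defaultdict(set)  # input -> items built from it
        for derived, inputs in self._inputs.items():
            for i in inputs:
                dependents[i].add(derived)
        found, stack = set(), [source]
        while stack:
            current = stack.pop()
            for d in dependents.get(current, ()):
                if d not in found:
                    found.add(d)
                    stack.append(d)
        return found


# Example with made-up item names.
m = UsageMap()
m.record("daily_summary", "instrument_feed_raw", "calibration_table")
m.record("quarterly_report", "daily_summary", "sales_db")
print(m.root_sources("quarterly_report"))  # the three raw sources behind the report
print(m.uses_of("instrument_feed_raw"))    # daily_summary and quarterly_report
```

In practice the hard part is the recording step, capturing every worksheet and departmental database as it is built; the traversal itself is simple once the map exists.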

Of course, it’s not that simple. Even today we don’t really know all the uses of our data, from worksheets to departmental databases to that latest cloud database that aggregates information from our corporate network. It all counts and, in the right legal or discovery situation, it all matters and has to be known.

What I don’t know is whether liability can be limited the way it is with email. Can you put policies in place stating that you’ll archive information for a set period and then delete it? Is that the answer? I’m not an attorney, but it seems that email systems and their archive policies provide a starting point, if not an outright precedent, for handling this. I also suspect it’s not yet on the radar of many corporations.
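As a rough illustration of an email-style policy applied to source data, here is a minimal Python sketch of a retention rule: a record is archived for a set period and then becomes eligible for deletion, unless it is being held for a dispute or discovery request. The seven-year window, the `eligible_for_deletion` function and the `on_legal_hold` flag are placeholder assumptions, not drawn from any regulation or product.

```python
from datetime import date, timedelta

# Placeholder retention window; a real period would come from your own policy and counsel.
RETENTION = timedelta(days=7 * 365)


def eligible_for_deletion(archived_on: date, today: date, on_legal_hold: bool = False) -> bool:
    """True once a record has been archived longer than RETENTION,
    unless it is held for a dispute or discovery request."""
    if on_legal_hold:
        return False
    return today - archived_on > RETENTION


print(eligible_for_deletion(date(2010, 1, 1), date(2020, 1, 1)))        # past the window
print(eligible_for_deletion(date(2019, 1, 1), date(2020, 1, 1)))        # still retained
print(eligible_for_deletion(date(2010, 1, 1), date(2020, 1, 1), True))  # held, so retained
```

The legal-hold check comes first on purpose: a deletion schedule only limits liability if it stops when a dispute makes the data relevant.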

Perhaps it’s time to start planning for “when” this happens rather than “if.”