
Data Hoarding Feedback

I received some great feedback from people facing down the data hoarding issues that plague so many systems. The original question was essentially "we all hoard data… great. But what do you do about it??"

It seems that the people who succeed get there through a multi-step process for dealing with too much stored information.

First, figure out which data is really more valuable in a summarized form. You might, for example, summarize sales data by city. You can still report by state by rolling up the cities, and you can still drill down to the city level. But you can remove the detail of the individual sales and still provide the reporting you’ll need.

Next, build the data warehouses at the summarized level and update your reporting to use that summary information, rather than the details.
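To make the two steps above concrete, here is a minimal sketch of the idea in Python. The sample rows, the column layout, and the function names (`summarize_by_city`, `roll_up_to_state`) are all made up for illustration; a real system would do this in the database or warehouse load, not in application code.

```python
from collections import defaultdict

# Hypothetical detail rows: (state, city, sale_amount).
detail_sales = [
    ("AZ", "Tucson", 120.0),
    ("AZ", "Tucson", 80.0),
    ("AZ", "Phoenix", 200.0),
    ("CA", "Fresno", 50.0),
]

def summarize_by_city(rows):
    """Collapse individual sales into one total per (state, city)."""
    totals = defaultdict(float)
    for state, city, amount in rows:
        totals[(state, city)] += amount
    return dict(totals)

def roll_up_to_state(city_totals):
    """Derive state totals from the city summary, not from the detail."""
    totals = defaultdict(float)
    for (state, _city), amount in city_totals.items():
        totals[state] += amount
    return dict(totals)

# The city summary is what would be kept; state reporting reads from it.
city_summary = summarize_by_city(detail_sales)
state_summary = roll_up_to_state(city_summary)
```

The point of the sketch is that state-level reporting never touches the detail rows, so it keeps working after the detail is gone. Anything finer than city, such as individual transactions, is no longer recoverable from the summary.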

Sounds pretty straightforward, but I have to say it makes me nervous. What if you choose wrong? What if you summarize at too high a level? There isn’t any going back once you’ve made the change and removed the data. And, given that summary information is often reviewed less frequently, enough time may have passed since the information was originally collected that you can no longer use a backup to restore the original detail. Ugh.

I guess the net-net is to start with small strokes of summarization: very small, focused areas where you can pull together the things you propose summarizing. Then, let your users work against that information for a bit. Make sure they get the right elements at the right levels to meet their reporting and analysis requirements.
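One way to back up that trial period with a quick check is to compare what each report needs against what the proposed summary actually keeps, before any detail is removed. The sketch below is only an illustration: the report names, column names, and the `unmet_reports` helper are hypothetical, and it is no substitute for users actually working with the data.

```python
def unmet_reports(summary_columns, report_needs):
    """Return the reports that need columns the summary does not keep."""
    available = set(summary_columns)
    gaps = {}
    for report, needed in report_needs.items():
        missing = sorted(set(needed) - available)
        if missing:
            gaps[report] = missing
    return gaps

# Hypothetical columns kept in the proposed summary table.
summary_columns = ["state", "city", "month", "total_sales"]

# Hypothetical reports and the columns each one reads.
report_needs = {
    "sales_by_state": ["state", "total_sales"],
    "sales_by_city": ["state", "city", "total_sales"],
    "sales_by_rep": ["sales_rep", "total_sales"],
}

# Any entry here is a hole to resolve before the detail is deleted.
gaps = unmet_reports(summary_columns, report_needs)
```

In this made-up case the sales-by-rep report would break, because the summary dropped the sales rep. That is exactly the kind of hole you want to find while the detail still exists.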

From there, you should both learn what works and what doesn’t, and find out about any holes you may not have anticipated. It’s not going to be a quick process, but if you have an established data set, you may not have many choices.

What do you think? Let me know…

swynk@sswug.org