Editorials

Retaining Stale Data

Data is one of those resources that tends to stick around long after its usefulness has diminished or been exhausted completely. In my observation this is because it isn’t too expensive to keep around, and because many of the techniques for reducing the cost of retention require a redesign of your systems to move or access the data.

Oftentimes nothing is done to remove stale data from front-line production systems until performance degrades. The longer you wait, the higher the penalty, because performance doesn’t seem to degrade in a linear fashion. Things roll along nicely, and then over a short period degradation picks up speed dramatically until action is taken. By then system resources are being consumed just to keep the lights on, and there is little capacity left for you to address the issue.

What strategies have you found effective for handling stale data? I have seen strategies that include:

  1. Deleting old data altogether
  2. Summarizing old data in data marts or summary tables, and only retaining detail data for contemporary periods (see the sketch after this list)
  3. Moving the data to reporting databases as normalized data or data marts
  4. Using partitioned tables so that older data remains available but has less impact on contemporary data
  5. Having storage tiers with different levels of performance (such as SSD, high-speed disk, lower-performing disk, DVD) and placing data on slower devices as it ages
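
As a concrete illustration of items 1 and 2, here is a minimal sketch in Python (using the standard-library sqlite3 module) that rolls detail rows older than a cutoff into a monthly summary table and then deletes them from the hot table. The table and column names (orders, order_summary, order_date, amount) and the one-year retention window are assumptions for illustration; a real archival job would run inside your own database's scheduling and transaction facilities and would guard against re-summarizing months that have already been rolled up.

```python
import sqlite3
from datetime import date, timedelta

# Assumption: a hot "orders" detail table and a monthly "order_summary"
# rollup table, with order_date stored as ISO-format text (YYYY-MM-DD).
RETENTION_DAYS = 365  # keep one year of detail; illustrative value only


def summarize_and_purge(conn: sqlite3.Connection) -> None:
    cutoff = (date.today() - timedelta(days=RETENTION_DAYS)).isoformat()
    with conn:  # one transaction: summarize, then delete
        # Roll stale detail rows up by month into the summary table.
        # A production job would upsert or skip months already summarized.
        conn.execute(
            """
            INSERT INTO order_summary (month, order_count, total_amount)
            SELECT strftime('%Y-%m', order_date), COUNT(*), SUM(amount)
            FROM orders
            WHERE order_date < ?
            GROUP BY strftime('%Y-%m', order_date)
            """,
            (cutoff,),
        )
        # Purge the detail rows now represented by the summary.
        conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))


if __name__ == "__main__":
    conn = sqlite3.connect("sales.db")
    summarize_and_purge(conn)
    conn.close()
```

Doing the summarization and the delete in a single transaction means a failure part-way through leaves the detail data untouched, so the job can simply be rerun.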

I’m sure this list is only representative of the creativity found in the real world. Is there something that works well for you that you’d like to share? Maybe you don’t find the topic relevant in your experience. Leave a comment here or drop an email to btaylor@sswug.org.

Cheers,

Ben