An interesting issue is showing up as I’ve talked with a couple of different people about their systems, and as we’ve been working with our own data and storage profiles.
With Big Data on the scene, the emphasis is on getting information through, processed and reported on – looking for trends, looking for usable information from the raw data. But one of the things that sometimes is overlooked when putting together systems is managing that data over time.
In a couple of cases now, we’ve been surprised by the need to come back and review information – whether the more understandable transactional information or more esoteric information, such as data collected from devices or base data that is later massaged and made usable.
The reason some of these slip through the “keep it – we may need it” cracks is that this information often arrives as raw data – not very useful in its base state – and is then massaged just a bit to make it usable. This might be transactional data (individual sales of a specific sub-set SKU) or other items that seem helpful only when taken as part of the entire transaction.
With Big Data elements, the lines are even fuzzier – much of the information coming through is extremely trivial. Taken as a whole, it presents an interesting picture that you can act on. But individually, each piece is really quite meaningless.
What has tripped us up is that later we’ll discover new uses for aggregated information taken from those raw bits – things we didn’t think we could glean from the data, but that we later figure out how to get. It’s critical to have those raw bits available; otherwise, we won’t be doing much analysis except at the higher levels.
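To make that concrete, here is a minimal sketch using made-up sales records. The field layout, SKU names, and amounts are all hypothetical and only there for illustration. The first function stands in for the report you planned for; the second is a question that only comes up later. If only the daily totals had been kept, the second question could not be answered at all.

```python
from collections import defaultdict

# Hypothetical raw sale events: (day, hour, sku, amount).
# Illustrative data only, not taken from any real system.
RAW_SALES = [
    ("2024-01-01", 9, "A-100", 20.0),
    ("2024-01-01", 9, "B-200", 5.0),
    ("2024-01-01", 17, "A-100", 20.0),
    ("2024-01-02", 17, "A-100", 20.0),
    ("2024-01-02", 17, "B-200", 5.0),
]


def daily_totals(events):
    """The report planned up front: revenue per day."""
    totals = defaultdict(float)
    for day, _hour, _sku, amount in events:
        totals[day] += amount
    return dict(totals)


def revenue_by_hour(events):
    """A question asked later: revenue per hour of day.

    This needs the hour field, which the daily totals no longer carry.
    """
    totals = defaultdict(float)
    for _day, hour, _sku, amount in events:
        totals[hour] += amount
    return dict(totals)


if __name__ == "__main__":
    print(daily_totals(RAW_SALES))
    print(revenue_by_hour(RAW_SALES))
```

Both functions run against the same raw rows, which is the whole point: the aggregate is cheap to rebuild from the raw bits, but the raw bits cannot be rebuilt from the aggregate.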
The moral of the story is to become a bit of a packrat – set up data warehouses to manage and hold the discrete information bits. Sure, it comes at a cost – and perhaps “costs” that aren’t obvious (more tomorrow) – but it can save you huge trouble in the future.
Actually, it’s less about saving you in the future and more about keeping your options open as you discover new uses for the data. With storage costs so reasonable, it can be worth considering holding on to those initial bits just a bit longer.