Editorials

How Do We Handle the Volume?

I read Seth Godin’s blog pretty regularly (me and about a billion other people) and his recent post about volumes of information really struck a nerve.

You can read the short post here.

The gist of the post is that the rapidly growing volume of information isn’t going to slow down any time soon. Given that, there are implications for us as data professionals.

First, storage. Storing all that information has to start with (and end with) databases. There aren’t many other options out there, at least not today. So storage requirements are going to continue to skyrocket. I think we’re going to have to start thinking not only about "master data management" in the sense of your own information stores, but also about more global senses of the process. Would we ever consider normalizing data across data sources? Is there a longer-term consideration there? How do we handle having copy after copy of information in our collective systems? If we centralize, do we lose our edge? If we centralize, do we necessarily impair privacy?

Second, processing that information becomes a real challenge. How do we handle those volumes of information? If we ignore them, we risk not learning from the information. If we try to take it all in, we are forced to pick and choose what we pay attention to, at the risk of missing key bits of information. I wrote on my Facebook page that I wonder if the answer is, for lack of a better term, the Borg. Is it collective processing (crowd-sourcing) of data and information? That seems like a real possibility. It would be something like massive grid computing (the World Community Grid, for example). Would it work to submit grander projects for data aggregation and analysis?

I can feel the privacy knee-jerk too. I do get it, but what’s the answer?

Drop me a note – let me know how you see it playing out.

Email me here.

Featured Script
Extract and compare date "YYMM"
Posted May 4, 2004. Here’s a script I use in a DTS package. I need to extract data from a table based on current year and pr…