Right on Cue: Facebook and Hadoop (Data Hoarding)
Now, I’m not saying the data on Facebook amounts to "hoarding," but consider what’s there. They are setting a fine example of how to manage a massive store, keep it readily available, and deal with exponential growth, all at once. They’ve just moved their Hadoop cluster to a new facility and, I have to say, shown off some of the strengths of their architecture in the process.
I’ve talked in the past about data partitioning, and I find the details of the architecture requirements fascinating. Imagine making that move without downtime, all while planning to grow capacity (which means further data partitioning down the road) without giving up performance.
Read more about the environment here.
So, to close out the data hoarding theme: hearing about the success of this environment, you have to wonder. Are we headed toward a world where technology solutions outpace the need to decide which data to keep and which to get rid of? Sure, this solution isn’t cheap. But at some point there is a crossover between the costs and risks of data management and the ability to "just keep it all and make it searchable…"
How’s that for flip-flopping on the whole topic? Still, I think for most shops it comes down to budget and disk-space realities.
What do you think? Let me know…