The article goes on to discuss the challenges facing the defense industry in this kind of situation, from battlefield intelligence to the more routine requirements of watching for things that happen (or that shouldn't happen). Pretty interesting, at least I thought so. I've talked before about the huge amounts of data involved, the processing requirements around it, and how much work we still have to do there.
What was interesting about this piece on the drones is that, when I started researching it, it turned out the defense world has been talking about this issue literally for years; I found articles in industry magazines and the like going back several years. We still haven't cracked this one.
I ran a Map/Reduce job today on some logs (IIS and cloud access logs), trying to distill them down to meaningful information. I wasn't running it on massive boxes, but the process still took 9 machine hours. Nine! And that was only 4 days of data from a virtual event. It's pretty incredible that it takes that much work to boil raw logs down to something meaningful. I was surprised, but it certainly illustrates one of the challenges at these volumes of information.
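To make the idea concrete, here's a rough sketch (not the actual job) of what that kind of distillation step can look like in Python. It collapses W3C-style IIS access log lines into hit counts per date, URI, and status code; the field positions, file names, and local driver loop are assumptions for illustration, and a real run would hand the map and reduce steps to Hadoop or a similar framework across many machines instead of looping locally.

```python
import sys
from collections import defaultdict

def map_line(line):
    """Map one IIS (W3C extended) log line to a (date, uri, status) key.

    The field positions here are an assumption for illustration; real IIS
    logs declare their layout in a '#Fields:' directive, which a production
    job would parse instead of hard-coding indexes.
    """
    line = line.strip()
    if not line or line.startswith("#"):      # skip blank lines and directives
        return None
    fields = line.split()
    if len(fields) < 5:
        return None
    date, uri, status = fields[0], fields[3], fields[4]
    return (date, uri, status)

def reduce_counts(keys):
    """Reduce mapped keys down to hit counts per (date, uri, status)."""
    counts = defaultdict(int)
    for key in keys:
        if key is not None:
            counts[key] += 1
    return counts

if __name__ == "__main__":
    # Usage (hypothetical file names): python distill_logs.py u_ex*.log
    mapped = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            mapped.extend(map_line(line) for line in f)
    for (date, uri, status), count in sorted(reduce_counts(mapped).items()):
        print("%s\t%s\t%s\t%d" % (date, uri, status, count))
```

Even in this toy form you can see why the real thing gets expensive: every one of millions of lines has to be parsed, shuffled, and aggregated before anything meaningful falls out the other end.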
Yes, you can analyze in real time, in "flow" mode, watching the data as it streams in. But your analysis of that information in real time will necessarily be limited, because there is other work to be done at the same time (crunching the data, for example) and you can't pay attention to all of the comparative details. While you'll certainly gain from analyzing the stream, IMHO the really extreme value comes from a step or two back: looking over things with a wider lens, particularly when it comes to trending and analysis of broader data flows.
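For contrast with the batch job above, here's a minimal sketch of what the "flow" side might look like: a rolling hit counter that watches events as they arrive. The class, the one-minute window, and the sample events are all made up for illustration; the point is simply that this kind of analysis is cheap to keep current and can flag a spike the moment it happens, but it has no view of the multi-day trends the batch job surfaces.

```python
from collections import deque, Counter
from datetime import datetime, timedelta

class RollingHitCounter:
    """Counts hits per URI over a sliding window (here, the last minute).

    A sketch of 'flow mode' analysis: cheap to update as events stream in,
    but blind to anything older than the window.
    """
    def __init__(self, window=timedelta(minutes=1)):
        self.window = window
        self.events = deque()          # (timestamp, uri) pairs, oldest first
        self.counts = Counter()

    def add(self, timestamp, uri):
        self.events.append((timestamp, uri))
        self.counts[uri] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and now - self.events[0][0] > self.window:
            _, old_uri = self.events.popleft()
            self.counts[old_uri] -= 1
            if self.counts[old_uri] == 0:
                del self.counts[old_uri]

    def top(self, n=5):
        return self.counts.most_common(n)

# Usage sketch with made-up events: feed parsed (timestamp, uri) pairs as they arrive.
base = datetime(2020, 1, 1, 12, 0, 0)          # arbitrary sample timestamp
counter = RollingHitCounter()
counter.add(base, "/register")
counter.add(base + timedelta(seconds=30), "/register")
counter.add(base + timedelta(minutes=2), "/home")   # pushes the earlier hits out of the window
print(counter.top())                                 # only /home remains in the window
```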
We clearly have some work to do in processing the data and making it available in both the "stream" mode and the bigger picture mode.
What do you see in your own work? What works, what doesn’t? Are you faced with making decisions on this type of solution?