The article goes on to discuss the challenges facing the defense industry in this kind of situation, from battlefield intelligence to the more routine requirements of watching for things that happen (or that shouldn't happen). Pretty interesting, at least I thought so. I've talked before about the huge amounts of data involved, the processing requirements around it, and how much work we still have to do there.
What was interesting about this piece on the drones is that, when I started researching it, it turned out the defense world has been talking about this issue literally for years; I found articles in industry magazines and the like going back several years. We still haven't cracked this one.
I ran a Map/Reduce job today on some logs (IIS and cloud access logs), trying to distill them down to meaningful information. I wasn't running it on massive boxes, but the process still took 9 machine hours. Nine! And that was only 4 days of data from a virtual event. It's pretty incredible that it takes that much work to boil raw logs down to something meaningful. I was surprised, but it certainly illustrates one of the challenges at these volumes of information.
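To make the idea concrete, here's a rough sketch (not the actual job) of what that kind of distillation step can look like in Python. It collapses W3C-style IIS access log lines into hit counts per date, URI, and status code; the field positions, file names, and local driver loop are assumptions for illustration, and a real run would hand the map and reduce steps to Hadoop or a similar framework across many machines instead of looping locally.

```python
import sys
from collections import defaultdict

def map_line(line):
    """Map one IIS (W3C extended) log line to a (date, uri, status) key.

    The field positions here are an assumption for illustration; real IIS
    logs declare their layout in a '#Fields:' directive, which a production
    job would parse instead of hard-coding indexes.
    """
    line = line.strip()
    if not line or line.startswith("#"):      # skip blank lines and directives
        return None
    fields = line.split()
    if len(fields) < 5:
        return None
    date, uri, status = fields[0], fields[3], fields[4]
    return (date, uri, status)

def reduce_counts(keys):
    """Reduce mapped keys down to hit counts per (date, uri, status)."""
    counts = defaultdict(int)
    for key in keys:
        if key is not None:
            counts[key] += 1
    return counts

if __name__ == "__main__":
    # Usage (hypothetical file names): python distill_logs.py u_ex*.log
    mapped = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            mapped.extend(map_line(line) for line in f)
    for (date, uri, status), count in sorted(reduce_counts(mapped).items()):
        print("%s\t%s\t%s\t%d" % (date, uri, status, count))
```

Even in this toy form you can see why the real thing gets expensive: every one of millions of lines has to be parsed, shuffled, and aggregated before anything meaningful falls out the other end.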
Yes, you can analyze in real time, in "flow" mode, watching the data as it streams in. But your analysis of that information in real time will necessarily be limited, because there is other work to be done at the same time (crunching the data, for example) and you can't pay attention to all of the comparative details. While you'll certainly gain from analyzing the stream, IMHO the really extreme value comes from a step or two back: looking over things with a wider lens, particularly when it comes to trending and analysis of broader data flows.
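For contrast with the batch job above, here's a minimal sketch of what the "flow" side might look like: a rolling hit counter that watches events as they arrive. The class, the one-minute window, and the sample events are all made up for illustration; the point is simply that this kind of analysis is cheap to keep current and can flag a spike the moment it happens, but it has no view of the multi-day trends the batch job surfaces.

```python
from collections import deque, Counter
from datetime import datetime, timedelta

class RollingHitCounter:
    """Counts hits per URI over a sliding window (here, the last minute).

    A sketch of 'flow mode' analysis: cheap to update as events stream in,
    but blind to anything older than the window.
    """
    def __init__(self, window=timedelta(minutes=1)):
        self.window = window
        self.events = deque()          # (timestamp, uri) pairs, oldest first
        self.counts = Counter()

    def add(self, timestamp, uri):
        self.events.append((timestamp, uri))
        self.counts[uri] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and now - self.events[0][0] > self.window:
            _, old_uri = self.events.popleft()
            self.counts[old_uri] -= 1
            if self.counts[old_uri] == 0:
                del self.counts[old_uri]

    def top(self, n=5):
        return self.counts.most_common(n)

# Usage sketch with made-up events: feed parsed (timestamp, uri) pairs as they arrive.
base = datetime(2020, 1, 1, 12, 0, 0)          # arbitrary sample timestamp
counter = RollingHitCounter()
counter.add(base, "/register")
counter.add(base + timedelta(seconds=30), "/register")
counter.add(base + timedelta(minutes=2), "/home")   # pushes the earlier hits out of the window
print(counter.top())                                 # only /home remains in the window
```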
We clearly have some work to do in processing the data and making it available in both the "stream" mode and the bigger picture mode.
What do you see in your own work? What works, what doesn’t? Are you faced with making decisions on this type of solution?