
Data Responsibility and Data Architecture

There have been a few posts now talking about a corporate thirst for big data (I love that analogy – because it IS a thirst) and the fact that companies have this all-out push to get as much information as possible.

I don’t mean to vilify companies.  The promise is that information can be used to better serve customers.  With tighter and tighter margins and so much on the line in terms of costs, optimizing that process just makes sense.  Plus, like I mentioned yesterday, it is what customers are expecting, even if it’s not what they say they want.

One reader yesterday made a good point and brought up a different movie in addition to Minority Report:

The article mentions Minority Report (2002) but could have also mentioned The Circle (2017). This more recent film has a darker feel to it than the earlier one and really makes one wonder about the things we are doing with technology[.]

The Circle, if you haven’t seen it, takes big data to the extreme.  It saves everything about you: your health, your location, all of it.  Not just creepy stuff, but good things that can really be used to make life better.  Of course it keeps going from there and, being a movie, it shows the darker side of how this type of information gets used.  They do talk about data center requirements as well (not at a technical level, just the vast storage and management that would be required).  In the film, wearable and placeable cameras capture all sorts of things, and there are good and bad aspects to that.

But the data.  The data is an issue.  There are privacy concerns, use concerns (talk about search warrant issues) and even crowd-sourcing challenges that come from having all of this information.  “Wisdom of the Crowd,” a current TV series, speaks to this too.  The show keeps coming back to the idea of doubting individual data points but not the “wisdom of the crowd”: once you start averaging out what people submit (in that case to solve crimes), the answers tend to fall out and the noise is replaced with more reliable information.

I propose that, as database people, we need to figure out how to store information and then present it in layers.  We need to create the stored procedures, the views and the protections that keep people from going around those layers, so that information can be managed, kept and analyzed, but not abused.  Boy, does that sound easy in a single sentence.
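
To make the "layers" idea a little more concrete, here is a rough sketch in Python using the standard library's sqlite3 module.  Everything in it is made up for illustration: the customer_raw table, the two views, the sample rows and the read_layer helper are all assumptions, not anything from a real system.  SQLite has no stored procedures or permission grants, so an allow-list in application code stands in for them here; on a full database server you would more likely grant access on the views and procedures themselves and deny it on the base table.

```python
import sqlite3

# Hypothetical allow-list: the only "layers" callers may read.
# The raw table is deliberately left out.
ALLOWED_LAYERS = {
    "masked": "customer_masked",
    "summary": "region_health_summary",
}


def build_demo_db():
    """Create an in-memory database with a raw table and two view layers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer_raw (
            customer_id INTEGER PRIMARY KEY,
            full_name   TEXT NOT NULL,
            email       TEXT NOT NULL,
            postal_code TEXT NOT NULL,
            heart_rate  INTEGER NOT NULL
        );

        -- Layer 1: row-level data with identifying fields masked or dropped.
        CREATE VIEW customer_masked AS
        SELECT customer_id,
               substr(email, 1, 1) || '***' || substr(email, instr(email, '@'))
                   AS email_masked,
               substr(postal_code, 1, 3) AS region
        FROM customer_raw;

        -- Layer 2: aggregates only, no individual rows.
        CREATE VIEW region_health_summary AS
        SELECT substr(postal_code, 1, 3) AS region,
               COUNT(*)                  AS customers,
               AVG(heart_rate)           AS avg_heart_rate
        FROM customer_raw
        GROUP BY substr(postal_code, 1, 3);
    """)
    # Made-up sample rows.
    conn.executemany(
        "INSERT INTO customer_raw VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Ann Example", "ann@example.com", "85701", 62),
            (2, "Bob Example", "bob@example.com", "85719", 78),
            (3, "Cy Example", "cy@example.org", "10001", 70),
        ],
    )
    conn.commit()
    return conn


def read_layer(conn, layer):
    """Return all rows from an approved layer; refuse anything else."""
    view = ALLOWED_LAYERS.get(layer)
    if view is None:
        raise PermissionError(f"layer {layer!r} is not exposed")
    # The view name comes from the allow-list above, never from user input.
    return conn.execute(f"SELECT * FROM {view}").fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(read_layer(db, "masked"))
    print(read_layer(db, "summary"))
```

The point of the sketch is the shape, not the specifics: the raw data is kept, analysts get a masked or aggregated layer, and the only way in is through a gate that someone designed on purpose.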

We need to think about things like whether it’s going to live in SQL on Azure or on-premises.  Things like searchability (both legally and in the course of business) come into play – and protecting information from more questionable uses is going to be tested and retested.  At what point are you, as a DBA or data science person, responsible for saying “no” to requests to pull information?  At what point does the information tip over from useful in managing your work or your customers into just too much?  Is there even a line?

There are incredible uses for this information.  I suspect that we have our work cut out for us to carefully design systems that will support what is on the very near horizon, protect information, limit unwarranted access and yet provide for the great utility of the information.  This is not easy.  We’ll have to be actively involved with development projects, we’ll have to insert ourselves into the management roles and goal-setting processes, and we’ll have to walk a very fine line between too much and too little information.

Plus, the mechanics of all of this will be huge.  Storage.  Access controls.  Performance.  Tools.  IoT input and data flows.  Analysis.  Reporting… Sound familiar?  Call it data science, being a DBA, whatever.  The fact is that your design and management skills are desperately needed here.