It’s the dream, really, to know more about your customers. To know how they use your services or goods and how you can better serve them. To have some sort of predictive knowledge about what they’ll want and to be able to be right there, right on time, just as they need, so they’ll never stray. This requires information (data) on those customers, pulled together in a way that is actionable both on an individual basis and in the aggregate for overall planning purposes.
But this is messy, especially with GDPR and other data ownership issues that are both here now and coming soon to data systems near you. Those requirements mean you have to provide not only for the protection of the data but also for controls on how it’s used and the ability to scrub it system-wide, right down to the core where that data actually lives.
We’ve been working with several SQL Server customers that are approaching this in a distributed manner. Back in the day (cue the old-guard music), when databases got “really big” there were some ingenious options to split the database so different segments resided on different servers, different storage systems. We saw this deployed as social media took hold and servers and server farms grew exponentially at a seemingly non-stop rate. It was a good solution, tough to maintain at the extreme fringes of growth, but doable.
Now, people are doing a similar thing with their databases. Setting up server instances to handle functions and analysis of particular information. Call them data warehouses, lakes, repositories, or just applications – all depending on how you’re doing what you’re doing. But this is introducing some real challenges in a few different areas.
First, there’s the data ownership issue: you have to have a way to track down the references to John and Mary Smith in your database. If they want out, you have to honor that. So we’ve been working with different people to ensure that this is possible and that the data ownership chain is accessible by the company. This has strong implications for how that information is used. In some cases, we can anonymize that information and move it between systems at the summary level, with no personally identifiable information retained.
In other locations and for other uses, we need that information (it’s hard to send out a coupon if you don’t have the name and address, to take a simple example).
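To make the "track down John and Mary Smith" requirement concrete, here is a minimal sketch in Python rather than T-SQL, just to keep it self-contained. Everything in it is an assumption for illustration: the DataStore class, the store names, and the sample records are made up, and each in-memory dictionary simply stands in for a database or instance that holds customer rows. The idea is only that if you keep a list of every place personal data can live, finding and erasing one person becomes a loop instead of a hunt.

```python
from dataclasses import dataclass, field


@dataclass
class DataStore:
    """Hypothetical stand-in for one database or instance holding customer rows."""
    name: str
    rows: dict = field(default_factory=dict)  # customer_id -> personal record

    def erase(self, customer_id):
        # True only if this store actually held something for the customer.
        return self.rows.pop(customer_id, None) is not None


def locate(stores, customer_id):
    """Names of every store that still references the customer."""
    return [s.name for s in stores if customer_id in s.rows]


def erase_everywhere(stores, customer_id):
    """Remove the customer from every store; report where data was found."""
    return [s.name for s in stores if s.erase(customer_id)]


# Made-up sample data, purely for illustration.
sales = DataStore("sales", {101: {"name": "John Smith"}, 102: {"name": "Mary Smith"}})
marketing = DataStore("marketing", {101: {"name": "John Smith", "address": "1 Main St"}})
traffic = DataStore("traffic", {})  # summary-only store, no personal rows

stores = [sales, marketing, traffic]
found_in = locate(stores, 101)           # where customer 101 lives right now
erased_from = erase_everywhere(stores, 101)
remaining = locate(stores, 101)          # should be empty after erasure
```

In a real deployment the hard part is the list itself: the erase step is only as complete as your inventory of places where that customer's data can end up.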
I saw this post talking about whether it’s even feasible now to have a truly comprehensive look at your customers. I think it’s more difficult, but with some architecture, it’s certainly possible. It’s also possible that it’s not worth the effort, and that taking a step back from the detailed information and pre-processing it down to the data you really need may be the solution.
Second, systems are becoming more distributed, not less. Really, the only way to lock this down and provide functionality and protection at the same time is to be highly restrictive on the specifics, but less so on the summarized information. Let’s face it, for that quarterly sales report by district, you don’t really need personal information in the tables underlying the report.
This summarization will be a real challenge to define. It certainly has been on the systems we’ve been working on. We end up setting up SQL Server instances that serve a particular type of information (sales, traffic, etc.) and then work from there to pull things from those aggregated sources, rather than the detailed level information. This lets us control the personal information, have fewer locations that have that data, and allow for reporting.
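As a rough illustration of that split between detailed and aggregated sources, here is a small Python sketch. The field names, districts, and amounts are all invented for the example, and a real system would do this with views or an ETL step between SQL Server instances rather than in application code. The point is just that the summary that leaves the detailed store carries no name or address at all.

```python
from collections import defaultdict

# Assumed names of the personal fields in the detailed rows (hypothetical).
PII_FIELDS = {"name", "address"}

# Made-up detail rows, standing in for the restricted, detailed source.
detail_rows = [
    {"name": "John Smith", "address": "1 Main St", "district": "North", "amount": 120.0},
    {"name": "Mary Smith", "address": "1 Main St", "district": "North", "amount": 80.0},
    {"name": "Ann Jones", "address": "9 Oak Ave", "district": "South", "amount": 50.0},
]


def summarize_by_district(rows):
    """Aggregate to district level, reading only non-personal fields."""
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for row in rows:
        bucket = totals[row["district"]]
        bucket["orders"] += 1
        bucket["revenue"] += row["amount"]
    # Emit one row per district; personal fields are never copied across.
    return [{"district": d, **t} for d, t in sorted(totals.items())]


summary = summarize_by_district(detail_rows)

# Sanity check: nothing personal made it into the reporting data.
leaked = [f for row in summary for f in row if f in PII_FIELDS]
```

The quarterly report by district can then run entirely against the summary, and the detailed rows stay in the one place you have locked down.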
Retrofitting those data sources when you miss a bit of information in the planning, though, is difficult and potentially messy.
You also run the risk of silos of information that get “lodged” in the various data summary segments: silos that aren’t getting updated, aren’t getting shared as they should be, or are outright orphaned and left to digitally rot if they’re not maintained over time.
No doubt it’s a challenge, but planning now will mitigate your risk and your legal exposure, and should still provide most (perhaps all) of the functionality you could need. Careful work and thought about the protection, ownership, and management of personal information will pay big rewards going forward.