A few people have written to me asking what can be done to protect the information in our systems. The fact is that companies are thirsty for every bit of information they can get their hands on, whether they have a use for it now or not. The thinking is that it may be useful later, as different reporting and analytics pieces come online.
There are some very clear things that we can start spearheading though.
I take some inspiration from the different storage classes available in the cloud. Please bear with me even if you’re not using a cloud provider; the provider isn’t important, but perhaps the idea will help. With these storage classes, you choose the type of access you need: immediate, infrequent, delayed, even significantly delayed.
I think this is a powerful model we could easily (!) apply to the data we’re all tasked with managing. With these storage models, the idea is that you store information in the “immediate” zone while you’re actively using it. Then, as the information ages (files, objects, whatever), you move it down through the other tiers, probably deleting it eventually as it ages out of usefulness.
I think it’s important that we start approaching our own information stores the same way, and we have some interesting tools to apply. If you think about the data in our systems, we need to start aging out data elements that are no longer “active.” Some ideas:
Use views to control access to “current” information: this lets you decide what information is seen and who sees it, and you can modify the views to change what is returned. This is pretty typical of most systems today, so day-to-day operation stays the same.
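To make the view idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module only so that it runs anywhere; the same CREATE VIEW approach carries over to SQL Server. The table, the column names, the sample rows and the cutoff date are all made up for illustration.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_name TEXT,
        card_number   TEXT,
        state         TEXT,
        order_date    TEXT
    );
    INSERT INTO orders VALUES
        (1, 'Ann Lee', '4111111111111111', 'AZ', '2024-05-01'),
        (2, 'Bob Ray', '5500000000000004', 'CA', '2019-03-15');

    -- The view exposes only "current" rows and leaves out the card number.
    -- The cutoff date is an assumed value; pick one that fits your business.
    CREATE VIEW current_orders AS
        SELECT order_id, customer_name, state, order_date
        FROM orders
        WHERE order_date >= '2024-01-01';
""")

# Applications query the view, never the base table.
current_rows = conn.execute("SELECT * FROM current_orders").fetchall()
print(current_rows)
```

If you then grant access to the view rather than to the underlying table, changing what counts as “current” becomes a change to one view definition.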
As information ages into the next phase, mask it. If you have to keep a credit card number to look up a warranty, for example, keep the last 4 digits and mask the rest. You can even apply the mask in the database tables themselves (rather than just in the queries) by updating the columns, which protects information that, in reality, you’ll likely never HAVE to have. You have other ways of getting it, like asking the customer for the last 4 and moving on from there. Masking is a feature of SQL Server (read more here) that you can apply to take a big bite out of the liability in your database while still providing great reporting. Still need to know sales by region or whatever? Great. Keep “State” and keep “ZipCode,” but get rid of the specific address and other information that is really not of much use. (You’ll have to tailor this to what’s important, of course.)
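Here is a rough sketch of masking in the tables themselves, again using Python's sqlite3 so it is runnable as-is. Note that this overwrites the stored values, which is a different thing from SQL Server's Dynamic Data Masking feature, where the mask is applied when the data is queried. The mask_card helper, the schema and the cutoff date are assumptions for the example.

```python
import sqlite3

def mask_card(card_number):
    """Hypothetical helper: keep the last 4 characters, replace the rest with X."""
    if card_number is None:
        return None
    return "X" * max(len(card_number) - 4, 0) + card_number[-4:]

conn = sqlite3.connect(":memory:")
# Make the Python helper callable from SQL.
conn.create_function("mask_card", 1, mask_card)
conn.executescript("""
    CREATE TABLE orders (
        order_id       INTEGER PRIMARY KEY,
        card_number    TEXT,
        street_address TEXT,
        state          TEXT,
        zip_code       TEXT,
        order_date     TEXT
    );
    INSERT INTO orders VALUES
        (1, '4111111111111111', '12 Elm St', 'AZ', '85001', '2021-06-30'),
        (2, '5500000000000004', '9 Oak Ave', 'CA', '94016', '2024-05-01');
""")

MASK_CUTOFF = "2023-01-01"  # assumed boundary between "active" and "aged" rows

# Mask the card and drop the street address on aged rows;
# state and zip code stay so regional reporting still works.
with conn:
    conn.execute(
        "UPDATE orders "
        "SET card_number = mask_card(card_number), street_address = NULL "
        "WHERE order_date < ?",
        (MASK_CUTOFF,),
    )

masked_rows = conn.execute(
    "SELECT order_id, card_number, street_address, state, zip_code "
    "FROM orders ORDER BY order_id"
).fetchall()
print(masked_rows)
```

Once the update has run, the full card number for the older order is gone from the table, so make sure the cutoff reflects how long you genuinely need it.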
As information ages yet again into the next phase, REMOVE column content that is now meaningless. Get it out of the database. Clean the tables so they can still be used for statistical review (how many orders in Arizona), where the detail is much less important. As soon as you can say a piece of information is no longer important or helpful, get rid of it. This is a scary thing to do (“But what if we need it later?”), but the reality is you won’t. You know when information is no longer valid. You can create new views that show aggregated and summarized information, keeping the utility of the data while removing the specifics. Move summaries into a data warehouse.
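A small sketch of this last phase might look like the following: the detail columns on old rows are cleared, and a summary view keeps the numbers you still care about. As before, it uses Python's sqlite3 purely for portability, and the schema, sample rows and cutoff are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table; older rows are assumed to be masked already.
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_name TEXT,
        card_number   TEXT,
        state         TEXT,
        total         REAL,
        order_date    TEXT
    );
    INSERT INTO orders VALUES
        (1, 'Ann Lee', 'XXXXXXXXXXXX1111', 'AZ', 120.0, '2018-02-10'),
        (2, 'Bob Ray', 'XXXXXXXXXXXX0004', 'AZ',  80.0, '2019-07-04'),
        (3, 'Cy Dunn', '4000000000000002', 'CA',  50.0, '2024-05-01');

    -- Summary view: keeps the statistical value, none of the personal detail.
    CREATE VIEW orders_by_state AS
        SELECT state, COUNT(*) AS order_count, SUM(total) AS revenue
        FROM orders
        GROUP BY state;
""")

SCRUB_CUTOFF = "2020-01-01"  # assumed age at which the detail stops being useful

# Remove the column content outright on rows past the cutoff.
with conn:
    conn.execute(
        "UPDATE orders SET customer_name = NULL, card_number = NULL "
        "WHERE order_date < ?",
        (SCRUB_CUTOFF,),
    )

summary = conn.execute(
    "SELECT state, order_count, revenue FROM orders_by_state ORDER BY state"
).fetchall()
print(summary)
```

The scrubbed rows still count toward “how many orders in Arizona,” which is the point: the aggregate survives even though the specifics are gone.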
If you apply the same archival progression to your data, you won’t impair its usefulness, you will decrease the risk to your customers and your company, and you will do those whose information you’re storing a big favor. Along the way, you make your systems and information less tasty to nefarious types, increase security, and lower the liability of the information you are keeping. All of this can be automated based on usage dates and on how your business really uses information.
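As one way to picture the automation, a scheduled job could classify each record by how long ago it was last used and then apply the matching treatment. The sketch below shows only the classification step; the phase names and the one-year and three-year thresholds are placeholders, not recommendations.

```python
from datetime import date

# Assumed thresholds; tune these to how your business really uses the data.
MASK_AFTER_DAYS = 365
SCRUB_AFTER_DAYS = 3 * 365

def phase_for(last_used, today):
    """Return the (hypothetical) lifecycle phase for a record, based on its age."""
    age_days = (today - last_used).days
    if age_days >= SCRUB_AFTER_DAYS:
        return "scrub"   # remove column content, keep only aggregates
    if age_days >= MASK_AFTER_DAYS:
        return "mask"    # keep last 4 of the card, state and zip code
    return "active"      # served through the "current" views

today = date(2024, 6, 1)  # fixed date so the example is repeatable
for last_used in (date(2024, 1, 1), date(2022, 6, 1), date(2020, 1, 1)):
    print(last_used, phase_for(last_used, today))
```

A nightly job could run something like this against a last-used date column and hand each batch of rows to the masking or scrubbing step.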
In short, you’ll be a hero.