Word to the Wise – Old Rules Still Apply On Testing Systems

I was talking with someone today who was under the impression that backups and restores were a thing of the past. Yikes. I never really anticipated someone feeling that way.

Their systems are largely on-premises, but they do use the cloud for some bigger workloads, storage and such. Their feeling was that things are so stable these days, and that systems in the cloud are “automatically managed” to such a level, that they didn’t need to worry about knowing and understanding the process to restore and recover systems.

I kept asking questions because I was so surprised by this thought process – but it was genuine. They felt like the likelihood of systems failing has gone down so dramatically, yadda, yadda, yadda.

I’m here to tell you that you need to understand how to recover your systems, and you need to have tested that process WITH your current setup and configuration. I’ve worked with others who have done all the documentation, the testing, the restores, all of that – but NOT since they split systems up into the cloud, or even between cloud providers. They did all that work when the systems were in-house, so surely they’re prepared.

Yikes.

One of the gotchas in all of this wonderful “overflow” to the cloud (or using services where they’re best deployed and available, or whatever your rules of engagement are when it comes to systems configuration) is that mixed systems can be much more complex to recover. You can have one side of a system fail while the other side keeps running, and then you’re faced with recovering the failed side and getting everything back to a known state.

That “known state” is the key – you have to know that things are talking to each other again, and that systems on all sides are where they should be, both independently and in concert with the other systems in your configuration. This step is something that I’ve seen more than a few people leave out. Too often the only consideration is the ability to restore on one side or the other of the “where do things live” equation.

I’ve always suggested you have a process of testing after a restore, so you know things are working correctly again. This is still true. Be sure you know the process of restoring your entire environment (and the independent pieces of it) and that you know what “done” is – how do you know it’s working correctly?

Many times this is a report, or some other sort of aggregate processing that touches on the different pieces to get its results. Whatever it is, know it ahead of time. And yes, you still need to know how to recover and restore and bring systems back.
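To make “done” a little more concrete, here is a minimal sketch in Python of what a post-restore check could look like. Everything in it is hypothetical: the `Check` type, the `verify_restore` function, and the in-memory dictionaries standing in for an on-premises system and a cloud system are assumptions for illustration, not anyone’s actual tooling. The idea is simply to check each piece on its own first, and then run the checks that touch more than one piece.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    """A named yes/no test to run after a restore (hypothetical helper)."""
    name: str
    run: Callable[[], bool]


def _passes(check: Check) -> bool:
    # A check that blows up counts as a failure, not as a crash of the verifier.
    try:
        return bool(check.run())
    except Exception:
        return False


def verify_restore(component_checks: List[Check],
                   cross_checks: List[Check]) -> List[str]:
    """Return the names of failing checks; an empty list means "done".

    Each piece is checked independently first. The cross-system checks only
    run once every piece passes on its own, since comparing a broken side
    against a healthy one tells you little.
    """
    failures = [c.name for c in component_checks if not _passes(c)]
    if failures:
        return failures
    return [c.name for c in cross_checks if not _passes(c)]


# Stand-ins for real systems (assumed data, for illustration only).
on_prem = {"orders": 120, "reachable": True}
cloud = {"orders": 118, "reachable": True}

component_checks = [
    Check("on-prem database reachable", lambda: on_prem["reachable"]),
    Check("cloud store reachable", lambda: cloud["reachable"]),
]
cross_checks = [
    # The "aggregate" style of check: both sides must agree.
    Check("order counts agree", lambda: on_prem["orders"] == cloud["orders"]),
]

failed = verify_restore(component_checks, cross_checks)
print("restore verified" if not failed else f"not done yet: {failed}")
```

In this made-up example both sides come back up fine on their own, but the order counts disagree, so the restore is not “done” even though each individual restore succeeded. That is exactly the case that gets missed when only one side is tested.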

Stuff happens. Be prepared. Still.