The first time this happened, I didn’t believe it was real – someone deleted their production environment… oops – and no recovery process in sight?
But it’s happened again… at least apparently. Perhaps it’s one of those “fake news” stories used for publicity or whatever, in which case it’s just a reminder to make sure it can’t happen to you. If it’s for real, just… wow.
You may recall a prior story some time ago about a hosting company that dropped all of their web site accounts from their servers, presumably with a single command-line typo. No backups. I’m fairly certain that company isn’t hosting anything anymore.
But this is similar, and happening… again? It seems to me that incremental backups would be available and that protections would be in place. I get that, with the right privileges, you can indeed drop massive amounts of data and destroy systems.
I think at some point in nearly every database person’s life, a WHERE clause has been missed or entered incorrectly. OK, certainly not ME, but, you know, others. Those are the times when you discover that ice can instantly flow through your veins, that you can panic in a flash as you realize the delete or update statement you just issued had far more substantial impact than intended.
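As a minimal sketch of one habit that takes the edge off those moments, here’s the kind of thing I mean, written as T-SQL against a made-up table (the table, column, and cutoff date are purely for illustration): wrap the destructive statement in an explicit transaction and look at the row count before you commit.

-- Hypothetical example: dbo.Orders, OrderDate, and the cutoff are made up.
-- The point is to make the damage visible before it becomes permanent.
BEGIN TRANSACTION;

DELETE FROM dbo.Orders
WHERE OrderDate < '20150101';       -- the WHERE clause you hope you didn't forget

SELECT @@ROWCOUNT AS RowsAffected;  -- sanity-check how many rows were touched

-- If the count looks right:
--   COMMIT TRANSACTION;
-- If it doesn't:
--   ROLLBACK TRANSACTION;

It costs a few seconds per statement, and it turns “ice in the veins” into “roll it back and try again.”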
But backups, people! And knowledge of how to restore them, and the ability to bring things back online. To be fair, they mention that they do have backups. And the fact that they’re still offline as of this writing probably indicates that they’re triaging the issue and trying to see what they can recover without having to drop back to those backups.
I would suggest, though, that this is a case of misestimating the “how much can you lose” window. Clearly it was a big deal; clearly the window of data loss was punishing enough that they didn’t just restore and be happy with the results. They are working (I assume, as I have no insider information) to mitigate the losses, and the backups are the last resort, it seems.
I’m not pointing fingers too harshly, and I’m not one to throw stones. It’s just that there is a huge lesson to learn here. It’s critical, important, and just plain a key responsibility to think through your recovery window and what you can, and cannot, lose. Make sure you can recover. Make sure you can recover in a way that makes sense for your environment, your users, your stakeholders. If you cannot, make sure they know it, and that you have plans.
From my experience, transparency is extremely important here. If your users and stakeholders know what to expect, and you know how to execute if and when the time comes, you won’t be left scrambling to make up for unexpected results. Instead, you’ll be executing on the plan you have in place, and people will know what to expect. That might mean a data entry effort to recover the last little bit of information lost in a transactional environment, or it might mean a longer recovery time to allow for incremental restores up to a very short recovery window. Either way, transparency, planning and testing are your friends.
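To make that last scenario concrete, here’s a rough sketch of what a point-in-time restore sequence can look like in T-SQL. The database name, file paths, and STOPAT time are all placeholders; your own backup chain and tooling will dictate the details.

-- Hypothetical restore chain: full backup first, then log backups,
-- stopping just before the bad statement was issued.
RESTORE DATABASE SalesDB
    FROM DISK = N'X:\Backups\SalesDB_full.bak'
    WITH NORECOVERY, REPLACE;

RESTORE LOG SalesDB
    FROM DISK = N'X:\Backups\SalesDB_log_01.trn'
    WITH NORECOVERY;

RESTORE LOG SalesDB
    FROM DISK = N'X:\Backups\SalesDB_log_02.trn'
    WITH STOPAT = N'2017-02-01T14:32:00', RECOVERY;

The exact syntax isn’t the point; the point is that somebody on your team has actually run it, against a real backup, before the day it matters.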
Of course, it could also lead to more budget for a better solution.