Alert Numbness – How Do You Cope?
When you get that shiny new monitoring software set up, one of the first things you’ll usually do is create new alerts. "Let the system watch things and let me know how it’s going…" is the thought.
This is great, but you can quickly find yourself facing a steady flow of alerts, both informational and critical. Most of that volume may well be informational, but you don’t want to miss anything. The first step? Set up a rule that funnels the alerts into a folder for review and monitoring.
Now you’re monitoring the monitor folder. Ugh. This gets old pretty quickly. Eventually you stop monitoring the alerts at all because there’s so much "noise" in the queue. When it’s nearly impossible to pull useful information out of it, it’s time to re-evaluate how you’re doing this.
First, you want to avoid alert numbness. You do this by trimming your alerts down to the things that require attention. In management, this is sometimes called "managing by exception." You want to know when something really needs to be addressed, corrected, reviewed or otherwise acted on.
In a recent webinar, presenter Kevin Kline mentioned that he sometimes turns off the notification that a backup completed successfully. This keeps it out of the logs, so you don’t have to stare at a constant stream of "everything is OK" messages. The same is true for alerts: avoid "everything’s fine, nothing to see here" alerts and focus on failures instead. The "All Clear" alerts are a great comfort, but you’ll quickly grow tired of them. If you have them now, go through your existing alerts before you clean things up. Make sure you’ve seen what you need to see up to this point; then you can start working out what you really want to be notified of going forward.
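To make "managing by exception" concrete, here is a minimal sketch of a triage rule that only notifies on alerts needing action. The `Alert` record, its fields and the status values (`success`, `info`, `warning`, `failure`) are hypothetical; real monitoring tools have their own schemas and rule engines. In this sketch the all-clear alerts are set aside rather than discarded, so you can still review them before deciding to turn them off entirely.

```python
from dataclasses import dataclass


# Hypothetical alert record; real monitoring tools use their own schemas.
@dataclass
class Alert:
    source: str   # e.g. "nightly-backup"
    status: str   # assumed values: "success", "info", "warning", "failure"
    message: str


# Statuses that amount to "everything's fine, nothing to see here".
ALL_CLEAR = {"success", "info"}


def should_notify(alert: Alert) -> bool:
    """Manage by exception: notify only when something may need action."""
    return alert.status.lower() not in ALL_CLEAR


def triage(alerts):
    """Split alerts into those to send now and those to set aside quietly."""
    notify, set_aside = [], []
    for alert in alerts:
        (notify if should_notify(alert) else set_aside).append(alert)
    return notify, set_aside


if __name__ == "__main__":
    incoming = [
        Alert("nightly-backup", "success", "Backup completed successfully"),
        Alert("index-maintenance", "info", "Rebuild finished"),
        Alert("nightly-backup", "failure", "Backup failed: disk full"),
    ]
    notify, set_aside = triage(incoming)
    for alert in notify:
        print(f"NOTIFY {alert.source}: {alert.message}")
    print(f"{len(set_aside)} all-clear alerts set aside without notification")
```

Keeping the "all clear" statuses in one set makes the rule easy to tighten or loosen as you learn which alerts you actually act on.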
If you’re interested, that webinar, which is free to attend, is available on demand for a week.
Second, make sure your truly critical items raise alerts, and that you gather and log evidence-style information. "What happened?" is a much easier question to answer if you have logs in place and can look for blocking or other issues you may be working through. Make sure you have a system that sends you meaningful information about what’s happening and gives you the tools to understand it. That way, when you get a notification that a job failed, you can investigate, see the situation in full and figure out what’s needed.
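One way to get that "full view" is to keep a rolling buffer of recent activity and attach it to every failure notification, so the alert already carries some evidence. This is a minimal sketch under assumptions: `EvidenceBuffer` and `build_failure_notification` are hypothetical names, and the log lines in the example are made up for illustration. It uses only the Python standard library (`collections.deque` with `maxlen`).

```python
from collections import deque
from datetime import datetime, timezone


class EvidenceBuffer:
    """Hypothetical helper: keeps the most recent log lines for context."""

    def __init__(self, max_lines: int = 50):
        # A deque with maxlen drops the oldest entry once it is full.
        self._lines = deque(maxlen=max_lines)

    def record(self, line: str) -> None:
        # Prefix each line with a UTC time so the sequence of events is clear.
        stamp = datetime.now(timezone.utc).strftime("%H:%M:%S")
        self._lines.append(f"{stamp} {line}")

    def snapshot(self) -> list:
        return list(self._lines)


def build_failure_notification(job: str, error: str, evidence: EvidenceBuffer) -> str:
    """Compose a notification that answers "what happened?" up front."""
    context = "\n".join(evidence.snapshot()) or "(no recent log lines captured)"
    return f"Job '{job}' failed: {error}\n\nRecent activity:\n{context}"


if __name__ == "__main__":
    buffer = EvidenceBuffer(max_lines=3)
    buffer.record("nightly-etl started")
    buffer.record("waiting on lock held by session 57")  # hypothetical blocking
    buffer.record("nightly-etl timed out")
    print(build_failure_notification("nightly-etl", "timeout", buffer))
```

Capping the buffer keeps notifications short while still showing the lead-up to a failure, such as a blocking wait just before a timeout.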
So, trim your alerts to "need to know" information and make sure you’re gathering the right information along the way. There are some great tools out there that help in this area; you don’t have to go it alone. Monitoring, alerting and even suggested solutions are all available.