Editorials

There’s Monitoring and then There’s Monitoring

I think I need to clarify some terms when it comes to monitoring your systems. I find there may be two different kinds of monitoring.

  1. Monitoring for signs that your system is coming down: user activities are very slow, or dropped on the floor
  2. Monitoring the continuous operations of your ENTIRE system, from the user’s perspective

There are great tools from third-party vendors, as David Eaton shares, which monitor things like server CPU capacity, queries gone bad, which processes are causing issues, memory allocation, disk utilization, and much, MUCH more. All of the tools he presents are focused on databases and database services.

The issue is that a user doesn’t care whether your database is humming along nicely if they can’t use their application for some other reason.

Any number of components, usually on multiple servers, can go down or lose performance: routers, switches, encryption devices, load balancers, network cards, servers, disk drives, antivirus software, application software, internet connectivity if you have a world-facing application, and the list goes on. All of these things (if present), along with outside pressures such as denial-of-service attempts, impact the final user experience.

You need both kinds of monitoring, when possible, for these different kinds of devices. It is essential to monitor trends across your entire application infrastructure. Trends are what you mine in order to be proactive when you see signs of degradation at any tier of your application. Trend tracking is a monitoring tool, not an alerting tool: you have to look at the results, identify statistics that are trending down, and determine whether they need to be addressed.

An example of a reactive type of monitoring would be a bot that checks, on a scheduled basis, whether it can get to your web site. It doesn’t tell you if your site is getting slower in response time. It tells you whether it was able to connect to your site before timing out. This is essential monitoring, and you want it to notify you immediately.
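
As a rough illustration, a reachability check like the one described above could be sketched in Python as follows. This is a minimal sketch, not a recommendation of any particular tool: the function names, the `notify` callback, the 10-second timeout, and the choice to count an HTTP error status as a failure are all assumptions made for the example.

```python
import urllib.request


def site_is_reachable(url, timeout_seconds=10.0, opener=urllib.request.urlopen):
    """Return True if the site answered before the timeout, False otherwise.

    `opener` is a hypothetical injection point so the check can be tried
    without a network; by default it is the standard urlopen.
    """
    try:
        with opener(url, timeout=timeout_seconds) as response:
            # Assumption: only 2xx/3xx answers count as "the site is up".
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError, timeouts, and refused connections are all
        # subclasses of OSError, so every one of them lands here.
        return False


def check_and_notify(url, notify, timeout_seconds=10.0, opener=urllib.request.urlopen):
    """Run one check and call notify(message) right away if it fails."""
    reachable = site_is_reachable(url, timeout_seconds, opener)
    if not reachable:
        notify(f"ALERT: {url} could not be reached (timeout {timeout_seconds}s)")
    return reachable
```

A scheduler such as cron would run `check_and_notify` every few minutes, with `notify` wired to whatever pages or emails you. While trying it out, `check_and_notify("https://example.com/", print)` is enough. Note that it answers only one question, up or down, and says nothing about speed.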

In contrast, you can benefit from a bot that exercises your web site and records its response times. Tracking these times produces a trend that tells you when to start looking for adjustments that need to be made. If you have monitoring on the different layers, you can quickly identify the bottleneck and begin remediation.
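
The trend-tracking idea can be sketched the same way. The example below times one request and fits a simple least-squares line through a series of recorded durations; a positive slope suggests the site is getting slower. The function names, the `opener` and `clock` parameters, and the use of a plain least-squares slope as the trend measure are assumptions for illustration, not a description of any specific product.

```python
import time
import urllib.request


def timed_request(url, timeout_seconds=10.0,
                  opener=urllib.request.urlopen, clock=time.perf_counter):
    """Fetch the page once and return how many seconds it took."""
    start = clock()
    with opener(url, timeout=timeout_seconds) as response:
        response.read()  # include the time needed to download the body
    return clock() - start


def trend_slope(durations):
    """Least-squares slope of durations against their sample index.

    The result is in seconds per sample: positive means response times
    are growing, negative means they are shrinking, zero means flat.
    """
    n = len(durations)
    if n < 2:
        return 0.0  # not enough data to call anything a trend
    mean_x = (n - 1) / 2
    mean_y = sum(durations) / n
    numerator = sum((i - mean_x) * (d - mean_y) for i, d in enumerate(durations))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator
```

In practice each measurement would be stored with a timestamp and reviewed over days or weeks. Nothing here sends an alert; someone still has to look at the slope and decide whether it needs attention.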

You need both kinds of monitoring. They aren’t as black and white as I have described them; their capabilities sometimes overlap.

Am I missing something here? I’m not the expert on everything, so please feel free to share your perspective in comments. I really appreciate constructive criticism. So, get into the conversation if you think I’m off base here.

Cheers,

Ben