Increasingly, as we work with people to help out with infrastructure, setup and considerations for their servers and solutions (wow, that’s a long breath), the trend is toward handing things off to cloud providers…
That’s all well and good, but there’s a bit of a devil in the details, so to speak. One of the things that seems to be missed in so many of the plans and processes to get things either on- or off-premise and so on is monitoring. It seems like so many people are moving things to this or that server, to this or that platform, and forgetting that, once it’s set up, it needs to be managed and monitored. So much effort goes into bringing up that big data processing monster application, that analytics tool or those reporting solutions… to get them up and running and then lose them because things broke down would be a real problem.
With cloud, it’s a completely different game when it comes to monitoring and proactive response compared to what you do on-premise. On-premise, there are native and third-party tools you can use – we’ve done several sessions on options and have talked about all sorts of different ways you can approach it. When it comes to your cloud provider, there are built-in tools and some third-party tools, but the variables come in when you consider the type of monitoring and the tools you want.
Monitoring a VM-based instance is pretty straightforward. But once you move to a “platform as a service” solution or a mixed solution, things get interesting. Add in that you may actually be using more than one application in your overall environment (different databases in our specific cases) and the complexity multiplies. The tools provided by the providers seem to be focused on instance health, with other facets as well, but you need to set them up, understand what they’re reporting on and know what it means.
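Just as a hypothetical sketch: if your provider happens to be AWS, CloudWatch is the built-in tool, and a few lines like the ones below (the instance ID is a placeholder) will at least show you which metrics are actually being reported for a given instance, which is a good first step toward understanding what you’re getting out of the box.

```python
# Rough sketch: list the CloudWatch metrics available for one EC2 instance.
# Assumes AWS credentials are already configured; the instance ID is a placeholder.
import boto3

cloudwatch = boto3.client("cloudwatch")

response = cloudwatch.list_metrics(
    Namespace="AWS/EC2",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
)

# Print each metric name so you can see what the provider reports by default.
for metric in response["Metrics"]:
    print(metric["MetricName"])
```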
We’ve been bitten a few times by not knowing that new, cloud-specific points of monitoring (new compared with on-premise) were needed to keep track of things like performance and utilization. You find out by trial and error, and sometimes by a flat-out slap upside the head when you miss a specific piece of information.
The point in all of this is that it’s critical to make sure you understand the tools provided by your hosting solutions. It’s also important to figure out what the levers are, and what the limiters are for your chosen solutions. Do they lock down connections? Memory? CPU utilization? Storage? How does your solution respond when these resources start red-lining? How does your solution respond if some related thing goes down?
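To make that concrete with a hypothetical example: if you’re on AWS, something like the sketch below would set a CloudWatch alarm to fire when CPU starts red-lining on an instance. The instance ID and the SNS topic are placeholders, and the 80% threshold is an assumption you’d tune to whatever your own limiters turn out to be.

```python
# Rough sketch: alarm when CPU utilization red-lines on an EC2 instance.
# The instance ID, SNS topic ARN and 80% threshold are placeholder assumptions.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="cpu-red-line",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,                      # evaluate in 5-minute windows
    EvaluationPeriods=2,             # two consecutive breaches before alarming
    Threshold=80.0,                  # percent CPU that counts as "red-lining"
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder topic
)
```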
One example of this is when S3 has gone down at Amazon. So many things rely on it for storage that it can be surprising just how critical it is to your solutions, and having an alert that something is wrong can be really helpful, letting you start responding or notifying your stakeholders more quickly.
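A small, scheduled check along the lines of the sketch below (the bucket name is a placeholder, and the alerting hook is up to you) is one way to find out that your storage dependency has a problem before your users do.

```python
# Rough sketch: a scheduled check that a bucket your solution depends on is reachable.
# The bucket name is a placeholder; wire the failure branch into whatever
# paging or notification channel you already use.
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

s3 = boto3.client("s3")

def storage_is_healthy(bucket: str = "my-critical-bucket") -> bool:
    try:
        s3.head_bucket(Bucket=bucket)   # cheap call that succeeds only if the bucket is reachable
        return True
    except (ClientError, EndpointConnectionError) as err:
        print(f"Storage check failed for {bucket}: {err}")  # swap for your alerting hook
        return False

if __name__ == "__main__":
    if not storage_is_healthy():
        # Notify stakeholders / start your response process here.
        raise SystemExit(1)
```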
Knowing what happens, and how the provider responds, if your solution starts outgrowing what you have in place will help you drive toward the things you want to monitor. If CPU, storage and connections are the determining factors, you can use that information to set up appropriate monitoring.
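Sticking with the hypothetical AWS example: if connections are one of those determining factors and your databases happen to be on RDS, a sketch like this one pulls the recent connection counts so you can compare them against whatever limit your instance class enforces (the instance identifier is made up).

```python
# Rough sketch: pull recent connection counts for an RDS instance so they can be
# compared against the limit for your instance class. The identifier is a placeholder.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="DatabaseConnections",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-database"}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=["Average"],
)

# Print the last hour of average connection counts, oldest first.
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```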
The issue we’ve been seeing is that people assume the cloud is “set it and forget it” – it’ll just run. While that’s a lot of what you’re paying for – not having to manage it – at the same time, stuff happens. Flash crowds, surprise workloads, bugs (gasp!) and more. It’s just as important to have the right things in place for your off-premise infrastructure as it is for your on-premise infrastructure.