Editorials

Resilience Replaces Reliability

While watching today's presentations from Microsoft Virtual Training Academy on Modern Cloud Architectures, I noticed one key word being emphasized: resilience.

In terms of software architecture, the need for resilience has overtaken the need for reliability. Resilience is the ability to continue operations when something fails. Reliability is the ability to not fail. The presenters demonstrated the concept by comparing the performance metrics we used previously to those used today. Historically, we measured how long a server had remained operational. Today we measure how long it takes to recover from a failure.
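To put rough numbers on that shift, here is a minimal sketch using the standard steady-state availability formula, uptime divided by uptime plus recovery time. The names MTBF (mean time between failures) and MTTR (mean time to recovery) are my assumption for the two metrics described above; the presentation may have used other terms, and the hour figures are invented purely for illustration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a service is up, given the average time between
    failures (MTBF) and the average time to recover from one (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Reliability-focused (illustrative numbers): failures are rare,
# about every 1000 hours, but each one takes 10 hours to fix.
rare_but_slow = availability(mtbf_hours=1000.0, mttr_hours=10.0)

# Resilience-focused (illustrative numbers): failures are ten times
# more frequent, but recovery takes about six minutes.
frequent_but_fast = availability(mtbf_hours=100.0, mttr_hours=0.1)

print(f"rare but slow to recover:     {rare_but_slow:.4%}")
print(f"frequent but fast to recover: {frequent_but_fast:.4%}")
```

With these made-up figures, the system that fails more often but recovers quickly ends up with the higher availability, which is the point of measuring recovery time rather than uptime alone.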

The difference is a point of view. Historically, we tried to keep things from failing. Now we expect things to fail, and we build the architecture around that expectation. That doesn’t mean we want things to break. It means we assume they eventually will, and we are prepared to handle it when they do.
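In code, designing for expected failure often looks something like the sketch below: try the primary operation a few times, then fall back to a degraded but working alternative instead of letting the error take the whole request down. This is only one common pattern, not something taken from the presentation, and every name here (`call_with_fallback`, `flaky_lookup`, `cached_lookup`) is hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.0,
) -> T:
    """Try `primary` up to `attempts` times, then use `fallback`.

    Failure is treated as a normal outcome: each error is caught,
    retried after an exponentially growing delay, and finally
    answered by the fallback rather than propagated.
    """
    for attempt in range(attempts):
        try:
            return primary()
        except Exception:
            # Wait before the next try, but not after the last one.
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    return fallback()


# Hypothetical stand-ins for a remote call and a local cache.
def flaky_lookup() -> str:
    raise ConnectionError("primary service unavailable")


def cached_lookup() -> str:
    return "cached value"


result = call_with_fallback(flaky_lookup, cached_lookup, attempts=2)
print(result)
```

A real system would likely catch narrower exception types and log each failure, but the shape is the same: the failure path is written up front instead of being left to chance.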

A fully resilient system is not feasible. The more redundancy and failover you build into a system, the greater the cost in development and maintenance. To keep the idea of complete failover from becoming overwhelming, the presentation demonstrated a technique to identify and weigh possible failures, and then build an architecture whose resilience meets the business needs with the least cost and complexity.

You can find out more about the methodology for defining the requirements of a resilient system in their white paper at http://aka.ms/resiliency.

As architectures continue to spread work across different systems performing different tasks, resilience is going to become increasingly important. It will be worth your time to look further into the practice and decide how it impacts you and your company.

Cheers,

Ben