Redundancy, Resilience, Recoverability

Redundancy, Resilience and Recoverability of Cloud Workloads

Aug 24, 2023

Recently, I talked with a technology expert about this topic and realised how many choices are available. It's easy to get confused by the terminology. You might set up very strong (and expensive) backup systems, but if the system cannot keep running through failures or recover on its own, the service may still not work as well as you expect.

Think of redundancy, resilience, and recoverability as parts of a plan to keep your business going. You need to consider all three together to make sure your service works as it should and that lost data can be recovered.

Redundancy (often measured by the Recovery Point Objective, or RPO) means having backup components or processes in place to prevent unacceptable data loss. Greater redundancy generally means less data lost when the service is recovered.

Resilience (often expressed as availability and governed by Service Level Agreements) means that your system can endure adverse conditions without significant downtime. Greater resilience generally means higher service uptime.
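To make the link between an availability target and downtime concrete, here is a minimal Python sketch. The function names are illustrative and not from any library, and the series calculation assumes that component failures are independent, which real systems only approximate. For example, a 99.9% target leaves roughly 525.6 minutes of downtime per 365-day year.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, implied by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


def series_availability_percent(*component_percents: float) -> float:
    """Availability of a chain of components that must all be up.

    Assumption: failures are independent, so the availabilities multiply.
    """
    result = 1.0
    for percent in component_percents:
        result *= percent / 100
    return result * 100


if __name__ == "__main__":
    print(f"{allowed_downtime_minutes(99.9):.1f} minutes per year")
    print(f"{series_availability_percent(99.9, 99.9):.4f} % for two components in series")
```

The second function shows why chaining services lowers the overall figure: every additional dependency in the request path multiplies in its own availability.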

Recoverability (often measured by the Recovery Time Objective, or RTO) means that, in a worst-case scenario, automated mechanisms are in place to restore normal operations. Examples include setting up highly available (Multi-AZ) Amazon RDS instances or using Azure Virtual Machine Scale Sets to automatically restore services after a hardware failure.

Collectively, these three concepts are fundamental to ensuring business continuity of cloud workloads. In the broader context, the Business Continuity Plan (BCP) is an essential part of the planning process for CIOs and CTOs. That means the service continuity requirements for cloud workloads (or, better, applications) are usually already defined.

The complexity always lies in the connection between higher-level Enterprise Architecture and the BCP on one side and the actual configuration of workloads on the other. With the rapidly changing technology landscape and the move towards the agile enterprise, responsibility for ensuring business continuity is also shifting left, towards product teams. In my experience, no single tool provides end-to-end visibility and governance of the BCP.
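One way a product team could make this connection explicit is to record the BCP requirements next to the workload configuration and compare the two. The Python sketch below is only an illustration under stated assumptions: the class names, field names, and example values are all hypothetical, and it treats the backup interval as the worst-case data loss and an estimated restore time as the recovery time.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ContinuityRequirement:
    """Hypothetical BCP targets for one application."""
    rpo_minutes: float            # maximum tolerable data loss
    rto_minutes: float            # maximum tolerable time to restore service
    availability_percent: float   # minimum availability target


@dataclass(frozen=True)
class WorkloadConfig:
    """Hypothetical summary of how a workload is actually configured."""
    name: str
    backup_interval_minutes: float         # assumed worst-case data loss
    estimated_restore_minutes: float       # assumed time to recover
    designed_availability_percent: float   # e.g. taken from provider SLAs


def find_gaps(req: ContinuityRequirement, cfg: WorkloadConfig) -> List[str]:
    """Return a human-readable list of requirements the configuration misses."""
    gaps = []
    if cfg.backup_interval_minutes > req.rpo_minutes:
        gaps.append(f"{cfg.name}: RPO gap, backups every "
                    f"{cfg.backup_interval_minutes} min, target {req.rpo_minutes} min")
    if cfg.estimated_restore_minutes > req.rto_minutes:
        gaps.append(f"{cfg.name}: RTO gap, restore takes "
                    f"{cfg.estimated_restore_minutes} min, target {req.rto_minutes} min")
    if cfg.designed_availability_percent < req.availability_percent:
        gaps.append(f"{cfg.name}: availability gap, designed for "
                    f"{cfg.designed_availability_percent}%, target {req.availability_percent}%")
    return gaps


if __name__ == "__main__":
    requirement = ContinuityRequirement(rpo_minutes=15, rto_minutes=60, availability_percent=99.9)
    workload = WorkloadConfig("orders-db", backup_interval_minutes=60,
                              estimated_restore_minutes=30, designed_availability_percent=99.95)
    for gap in find_gaps(requirement, workload):
        print(gap)
```

A check like this does not replace governance tooling, but it gives architects and product teams a shared, reviewable statement of where a workload falls short of the plan.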

One of the key roles of cloud architects is to help development teams choose the right approach and architecture to balance security, cost, availability and other aspects of the service.

Would you like to leave a comment or a like? You can do so on the version of this article published on LinkedIn.

© 2023-08-24, Farid Gurbanov