Redundancy is the linchpin of resilience. Organizations commonly implement redundant IT infrastructure to keep applications available in the event of a hardware failure or network outage, and the same principles apply to cloud applications. Although public cloud services are highly reliable, major cloud providers have suffered service interruptions lasting several hours or more, such as the Amazon Web Services (AWS) outage in its US-EAST-1 region in December 2021. During that outage, several web service providers that run their applications in the AWS cloud experienced disruptions of their own.
A multi-hour outage could be devastating to an organization that cannot afford any downtime, and the compensation available from the service provider is limited. Cloud providers' service-level agreements (SLAs) typically cover the providers' own services, not the customers' applications that run on them. Distributing cloud applications across multiple virtual machines (VMs), availability zones or regions can reduce the risk of downtime when a provider outage strikes. There are multiple ways to design a distributed cloud service, and they vary in both cost and level of protection.
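A back-of-the-envelope estimate shows why spreading an application across several VMs, zones or regions helps. If each replica is available a fraction a of the time and the service stays up as long as at least one replica is up, the combined availability is 1 - (1 - a)^n for n replicas. The sketch below is illustrative only: it assumes replicas fail independently and uses a hypothetical 99.9% per-replica figure, neither of which comes from any provider's SLA. Real outages are often correlated (an entire region can fail at once), so treat the result as an optimistic estimate.

```python
def combined_availability(per_replica: float, replicas: int) -> float:
    """Availability of a service that stays up while at least one replica is up.

    Assumption: replicas fail independently. Correlated failures (such as a
    whole region going down) make real-world availability lower than this.
    """
    if not 0.0 <= per_replica <= 1.0:
        raise ValueError("per_replica must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Probability that every replica is down at the same moment.
    all_down = (1.0 - per_replica) ** replicas
    return 1.0 - all_down


MINUTES_PER_YEAR = 365 * 24 * 60

if __name__ == "__main__":
    # Hypothetical per-replica availability of 99.9%, for illustration only.
    for n in (1, 2, 3):
        a = combined_availability(0.999, n)
        downtime = (1.0 - a) * MINUTES_PER_YEAR
        print(f"{n} replica(s): availability {a:.9f}, about {downtime:.2f} min of downtime per year")
```

Each added replica multiplies the chance of a total outage by (1 - a), but it also adds cost, which is the trade-off among the design options for a distributed cloud service.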