Both disaster recovery (DR) and high availability (HA) are designed to give you (and your customers and workers) almost continuous access to your data and services. You probably already know that they differ, since they rely on different underlying policies and technologies. When executives look at budget priorities, however, they may well see DR and HA as two words for the same thing, and then choose to invest in only one.
If you focus on high availability without investing in disaster recovery, you will struggle to recover when a major unplanned outage takes your systems down. On the other hand, investing in disaster recovery without investing in high availability means that it will be harder to meet your mission-critical service-level agreements (SLAs). That makes the following explanation worth keeping in your back pocket.
Disaster Recovery and High Availability Focus on Different Problems
In a nutshell, disaster recovery is a technology that’s based primarily on storage, while high availability is a technology that’s based mostly on networking.
With disaster recovery, you’re assuming that one day something terrible will happen to your primary data storage, whether it’s at an offsite data center or in your home office. For that reason, you want to create tightly compressed (“cold”) copies of your most important data and applications and store them somewhere offsite. In the event of a massive failure, the hope is that this compressed storage will be out of range of whatever calamity (asteroid, earthquake, forest fire) happens to strike.
In addition, businesses rely on disaster recovery systems that help them hit recovery time objectives (RTOs): basically, the amount of time a system is allowed to be down. For many systems, this time is just seconds. This means that in addition to offsite backups, businesses need to store “hot” data: barely compressed duplicates of data that were created just minutes or seconds ago during the application’s runtime.
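To make the recovery time objective concrete, here is a minimal Python sketch that checks whether a recovery plan fits inside the allowed downtime. The function name, the three recovery phases, and all of the timings are hypothetical and chosen only for illustration; your own plan may break down differently.

```python
from datetime import timedelta


def meets_rto(detect: timedelta, restore: timedelta,
              verify: timedelta, rto: timedelta) -> bool:
    """Return True if the total expected downtime fits within the objective."""
    return detect + restore + verify <= rto


# Hypothetical numbers: a 30-second recovery time objective.
rto = timedelta(seconds=30)

# Restoring from a "cold" offsite copy takes hours in this made-up example.
cold = meets_rto(timedelta(seconds=10), timedelta(hours=4),
                 timedelta(minutes=5), rto)

# Restoring from a "hot" duplicate takes seconds in this made-up example.
hot = meets_rto(timedelta(seconds=10), timedelta(seconds=15),
                timedelta(seconds=5), rto)

print(cold, hot)  # False True
```

With these assumed timings, only the hot copy gets the system back inside the objective, which is why cold backups alone are not enough for systems that can only be down for seconds.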
High availability systems often make use of the same “hot” backups as disaster recovery. Life happens to servers—they overheat, their parts break, someone accidentally trips over the power cable, etc. Similar things happen to networks. Shutting down a malfunctioning workload and restoring from a recent backup is much faster than waiting for a technician to diagnose and fix an issue. This can also help rectify small errors that have the potential to cascade and cause outages.
If one server goes down, then the HA system spins up another server running the same workload. If a network goes down, then the HA system finds a different route to the same destination. The goal is to use hardware redundancies (RAID arrays, uninterruptible power supplies) and software redundancies (decomposed applications, self-healing orchestration) to ensure that users and customers never notice an interruption in service.
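The failover idea described above can be sketched in a few lines of Python. This is a simplified illustration and not how any particular HA product works: the server names are made up, and the health check is a stand-in for whatever probe your environment actually uses.

```python
from typing import Callable, Optional, Sequence


def pick_healthy(servers: Sequence[str],
                 is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first server that passes its health check, or None."""
    for server in servers:
        if is_healthy(server):
            return server
    return None


# Hypothetical server names; the primary is simulated as being down.
down = {"primary"}
servers = ["primary", "standby-1", "standby-2"]

active = pick_healthy(servers, lambda name: name not in down)
print(active)  # standby-1
```

A real system would run a check like this continuously and redirect traffic automatically, so that the switch happens before users notice anything.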
To summarize, disaster recovery systems operate from the standpoint of a network or data center that has experienced an outage, with the goal of getting back up and running as fast as possible. Meanwhile, high availability operates from the standpoint of a network or data center that is running, with the goal of mitigating errors before they turn into outages.
High Availability and Disaster Recovery Reinforce One Another
To put it a different way, disaster recovery often makes use of the systems underpinning high availability, and vice versa. Any high availability system must be able to fail over, and it usually does so from stored assets within the disaster recovery solution. Meanwhile, any disaster recovery solution will rely on high availability systems in order to meet its recovery time objective.
Ideally, companies would create solutions where HA and DR reinforce one another, as opposed to being relegated to separate teams competing for budget. What does this look like?
One way is to use IT asset management (ITAM) tools to map a data center and its hardware and application dependencies. Because mission-critical systems can be highly complicated, it’s important to capture every component when you back them up. Mapping the data center allows you to do that. You also want to plan for failures in supporting systems such as cooling and power. ITAM lets you understand whether these systems are as redundant as you hope they are.
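As a rough illustration of a redundancy check, the Python sketch below flags any supporting system that has only one unit behind it. The inventory structure and unit names are hypothetical; they stand in for whatever your asset map actually exports.

```python
def single_points_of_failure(inventory: dict[str, list[str]]) -> list[str]:
    """Return the systems that have fewer than two units backing them."""
    return sorted(name for name, units in inventory.items() if len(units) < 2)


# Hypothetical inventory of supporting systems and the units behind each one.
inventory = {
    "power": ["ups-a", "ups-b"],
    "cooling": ["crac-1"],
    "network-uplink": ["isp-a", "isp-b"],
}

at_risk = single_points_of_failure(inventory)
print(at_risk)  # ['cooling']
```

Counting units is only a first pass. Two units that share the same circuit or the same room may not be truly redundant, which is where a full dependency map helps.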
Meanwhile, application dependencies can also be complicated. Systems that fail need to be restored in a specific order and using specific procedures. Mapping the ways that your applications rely on each other and their underlying infrastructure can help you create and automate procedures that will get them back online that much faster, letting you meet your critical SLAs.
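Restoring systems in the right order is a dependency-ordering problem, and a minimal Python sketch of it is shown below using the standard library's `graphlib` module (Python 3.9 or later). The service names and the dependency map are hypothetical; in practice they would come from your own application mapping.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be running first.
dependencies = {
    "storage": set(),
    "database": {"storage"},
    "auth": {"database"},
    "web-app": {"database", "auth"},
}

# static_order() yields each service only after everything it depends on.
restore_order = list(TopologicalSorter(dependencies).static_order())
print(restore_order)  # ['storage', 'database', 'auth', 'web-app']
```

If the map contains a circular dependency, `static_order()` raises `graphlib.CycleError`, which is itself a useful signal that the recovery procedure needs a manual step.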
For more information on Device42 and how we can help you create and improve your disaster recovery and high availability systems, feel free to download our 30-day free trial today!