Creating a disaster recovery plan with Azure in 5 steps

Ben Franklin | 21 Jun 2021

Planning for the worst is one of the best ways to protect your data from attack or loss. But it’s easy to deprioritise when leadership is focused on growth. To help you get back to basics, we’ve put together a five-step how-to for creating a disaster recovery plan with Azure.

Turn it on

We’ve put this first because disaster recovery in Azure has to be enabled manually, like flipping a digital switch. So before anything else, make sure some kind of backup is running, even if a fully fleshed-out disaster recovery plan isn’t in place yet. That way, you’ll at least have something to work with in an emergency while you build a more structured plan.

Many applications save your work automatically, which is why you can often recover the files and data you were working on moments before a crash. Virtual machines are different: you have to turn backup on yourself. So, step one: check that backup is enabled for each of your VMs. In Azure, you do this from the VM’s settings.
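To make that check concrete, here is a minimal Python sketch of auditing a VM inventory for backup coverage. The `VirtualMachine` record, its fields and the sample VM names are hypothetical; in practice you would fill the inventory from the Azure portal or your own tooling rather than hard-coding it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VirtualMachine:
    """Hypothetical inventory record (not an Azure SDK type)."""
    name: str
    resource_group: str
    backup_enabled: bool


def vms_without_backup(inventory: list[VirtualMachine]) -> list[str]:
    """Return 'resource_group/name' for every VM that has no backup enabled."""
    return [
        f"{vm.resource_group}/{vm.name}"
        for vm in inventory
        if not vm.backup_enabled
    ]


# Illustrative inventory; replace with your real VM list.
inventory = [
    VirtualMachine("web-01", "prod-rg", backup_enabled=True),
    VirtualMachine("db-01", "prod-rg", backup_enabled=False),
    VirtualMachine("build-agent", "tools-rg", backup_enabled=False),
]

for vm_id in vms_without_backup(inventory):
    print(f"Backup NOT enabled: {vm_id}")
```

Any VM the audit reports is one to fix in its settings before moving on to the plan itself.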

Make a protocol

As soon as the most basic backup is running, it’s time to put a proper plan in place. Enabling backup and assuming the job’s done is a mistake. To ensure business continuity and protect mission-critical systems during a service outage, you need a specific, coherent recovery plan that is documented and tested regularly.

If the term disaster recovery feels too dramatic, call it IT risk mitigation. Whatever causes your database to fail, a good disaster recovery plan keeps data loss to a minimum and service downtime as short as possible. The key takeaway is to have a plan, to communicate it to everyone in the company who needs it, and to test it regularly: at least once a year.

As with any risk mitigation protocol (a fire evacuation plan, for example), your IT disaster recovery plan should be written down, communicated clearly to the responsible managers, and stored securely. That way, if you ever need it, everyone will know what to do and any disruption will be as brief as possible. But disaster recovery isn’t just about the actions you take once something happens. It’s a series of ongoing actions: something to start now and keep doing.

One of the most important first steps in the plan is to determine which parts of the business are mission critical. In other words, which parts of your IT would be catastrophic for the business to lose? The threat could be irretrievable data loss, a security breach or a shutdown of software on the factory floor: anything that would grind things to a halt, possibly for good. Once you’ve identified these systems, you can put mitigating actions in place that target them specifically.
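One way to make that triage repeatable is to score each system and flag the ones that cross a threshold. The Python sketch below assumes a hypothetical impact score from 1 to 5 and a rough tolerable-downtime estimate for each system; the thresholds and sample systems are illustrative, not a standard.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class System:
    name: str
    impact_if_lost: int              # assumed scale: 1 (minor) to 5 (catastrophic)
    tolerable_downtime_hours: float  # rough business estimate


def mission_critical(systems: list[System],
                     impact_threshold: int = 4,
                     max_downtime_hours: float = 4.0) -> list[System]:
    """Flag systems that would be severe to lose or can barely tolerate downtime,
    ordered by impact (highest first), then by downtime tolerance (lowest first)."""
    flagged = [
        s for s in systems
        if s.impact_if_lost >= impact_threshold
        or s.tolerable_downtime_hours <= max_downtime_hours
    ]
    return sorted(flagged, key=lambda s: (-s.impact_if_lost, s.tolerable_downtime_hours))


systems = [
    System("customer database", 5, 1.0),
    System("factory floor control software", 5, 0.5),
    System("marketing website", 2, 48.0),
    System("internal wiki", 1, 72.0),
]

for s in mission_critical(systems):
    print(s.name)
```

The flagged list is the starting point for deciding where the plan’s strongest protections go.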

Schedule and storage

The plan should specify how often you back up the database, how much of it you back up, where the backups are stored and how long they’re kept. The point of backups is to let you return to the last point before the service outage and restore everything to how it was at that time. If you only back up once a week, say on a Friday, and something goes wrong on Thursday night, you’ll lose nearly a week’s worth of data. If you have clients, you’ll lose their data as well.
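The arithmetic behind the Friday and Thursday example is simply the gap between the last good backup and the failure. Here is a minimal Python sketch; the dates and times are chosen purely for illustration (18 June 2021 was a Friday).

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Everything written after the last good backup is lost in a restore."""
    if failure < last_backup:
        raise ValueError("the failure must happen after the last backup")
    return failure - last_backup


# Weekly backup taken on Friday evening; failure the following Thursday night.
last_backup = datetime(2021, 6, 18, 18, 0)  # Friday 18:00
failure = datetime(2021, 6, 24, 22, 0)      # Thursday 22:00

print(data_loss_window(last_backup, failure))  # 6 days, 4:00:00
```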

A better approach is to split things up. Web assets can be backed up daily, because they don’t change as often as the database. The database, on the other hand, needs full backups supplemented by much more frequent transaction log backups, even hourly, so you can restore everything to almost the moment things went wrong.
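With a split schedule, each asset has its own most recent restore point. The Python sketch below assumes backups run on a fixed interval from a known start time; the asset names, intervals and dates are illustrative, not Azure defaults.

```python
from datetime import datetime, timedelta

# Hypothetical schedule following the split above: web assets daily,
# database transaction logs hourly.
SCHEDULE = {
    "web assets": timedelta(days=1),
    "database (transaction log)": timedelta(hours=1),
}


def latest_restore_point(first_backup: datetime, interval: timedelta,
                         failure: datetime) -> datetime:
    """Most recent scheduled backup taken at or before the failure time."""
    if failure < first_backup:
        raise ValueError("no backup exists before the failure")
    completed = (failure - first_backup) // interval  # whole intervals elapsed
    return first_backup + completed * interval


first_backup = datetime(2021, 6, 21, 0, 0)  # schedule starts Monday 00:00
failure = datetime(2021, 6, 24, 22, 40)     # failure on Thursday at 22:40

for asset, interval in SCHEDULE.items():
    restore_to = latest_restore_point(first_backup, interval, failure)
    print(f"{asset}: restore to {restore_to:%a %H:%M}, lose {failure - restore_to}")
```

For the same failure, the daily web-asset backup loses most of a day, while the hourly database backups lose only 40 minutes.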

Smart backup storage allows for the unpredictable and lets you return the database to a fairly precise moment in time. If a natural disaster cuts the power, it will probably be obvious when things went awry. But a malware infection or other malicious attack can go unnoticed for days. Even then, you’ll want to restore the database to a point before the problem began, so make sure you keep backups long enough: at least two weeks’ worth.
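To see why retention matters, consider how long an infection might go undetected. This Python sketch assumes backups run on a fixed schedule and that anything older than the retention period is deleted; the ten-day detection delay is an illustrative figure, not data.

```python
from datetime import timedelta


def clean_backup_retained(detection_delay: timedelta,
                          retention: timedelta,
                          backup_interval: timedelta) -> bool:
    """Conservative check that a pre-compromise backup still exists.

    Backups older than `retention` have been deleted, and the compromise began
    `detection_delay` ago. Retained backups that predate the compromise fall in
    a window `retention - detection_delay` long; a window at least one backup
    interval long is guaranteed to contain a scheduled backup.
    """
    return retention - detection_delay >= backup_interval


hourly = timedelta(hours=1)
found_after = timedelta(days=10)  # malware spotted ten days after it got in

print(clean_backup_retained(found_after, timedelta(days=7), hourly))   # False
print(clean_backup_retained(found_after, timedelta(days=14), hourly))  # True
```

With seven days of retention, every backup old enough to be clean has already been deleted by the time the infection is found; two weeks leaves a margin.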

Let Azure do the heavy lifting

Full server backups are huge, which makes the cloud the only practical place to store them. And because restoring a full server from a backup disk takes a very long time, a better option is to set up virtual machine failover (in Azure, this is handled by Azure Site Recovery). Failover replicates your VM to another server elsewhere, so that if a server is severely degraded or goes down completely, whether through a catastrophic natural disaster or a human attack, Azure can power up a copy of the virtual machine on another server.

Azure’s global reach really comes to the fore with server failover. Its massive network of servers already lets it divert traffic continuously for load balancing, which keeps performance at its peak. Failover works in a similar way, but instead of spreading traffic across various places, the entire site is replicated elsewhere.

Failover can be set up so that servers are geographically near each other, in the same Azure zone, but housed separately to reduce the likelihood of losing both. You can also set up zone-to-zone failover, which pushes the replica a bit further afield while keeping it within the same geographical region. Azure also offers region-to-region failover. For example, an international company operating in both Germany and Dubai could run its virtual machines in one region and replicate them in the other. This keeps things close to each team, which can benefit performance. You can configure the failover to route however you want.
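The routing choice can be pictured as picking the most preferred healthy replica. This is a toy Python model, not how Azure implements failover: the `Replica` record, the scope labels, the replica names and the preference order are all assumptions, and in Azure you configure failover through the platform rather than with code like this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    scope: str     # hypothetical labels: "same-zone", "zone-to-zone", "region-to-region"
    healthy: bool


# Assumed preference: stay as close as possible, go further afield only if needed.
PREFERENCE = ["same-zone", "zone-to-zone", "region-to-region"]


def choose_failover_target(replicas: list[Replica]) -> Replica | None:
    """Return the healthy replica with the most preferred scope, or None."""
    candidates = [r for r in replicas if r.healthy and r.scope in PREFERENCE]
    if not candidates:
        return None
    return min(candidates, key=lambda r: PREFERENCE.index(r.scope))


# A zone-wide outage takes out the same-zone replica along with the primary.
replicas = [
    Replica("germany-zone1-copy", "same-zone", healthy=False),
    Replica("germany-zone2-copy", "zone-to-zone", healthy=True),
    Replica("dubai-copy", "region-to-region", healthy=True),
]

target = choose_failover_target(replicas)
print(target.name if target else "no healthy replica")
```

Reordering `PREFERENCE` (for example, putting region-to-region first for the Germany and Dubai case) is this sketch’s equivalent of configuring failover to route the way you want.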

Don’t make promises you can’t keep

Lastly, try not to get carried away when marketing your uptime. No one can guarantee 100% availability, even with a great disaster recovery plan. Even Azure doesn’t promise 100% uptime.
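It helps to translate availability percentages into hours. This short Python sketch converts an uptime percentage into the downtime it still permits over a 365-day year; the percentages are illustrative examples, not figures from any particular service level agreement.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Downtime per year that still meets the given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.0, 99.9, 99.95, 99.99):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% uptime still allows {hours:.2f} hours "
          f"({hours * 60:.0f} minutes) of downtime a year")
```

Even 99.9% uptime leaves room for almost nine hours of downtime a year, which is worth remembering before promising clients more.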

It’s better to be realistic about what’s possible, with both your clients and your team. That realism can also drive how you evolve the disaster recovery plan at each annual review. It’s a reminder, too, to put user roles and access controls in place on the system, and to train team members properly on the software to help prevent human error.

Need more assistance with your disaster recovery plan in Azure? Speak to Ben.