Building a Microsoft SQL Server Always On Availability Group on Compute Engine

This white paper shows you how to build a Microsoft SQL Server Always On Availability Group on Google Compute Engine. This paper might be of interest if you’re a database administrator, Windows administrator, or developer, and you want to build a Microsoft SQL Server environment that’s highly available by using Always On Availability Groups.

In this paper, you'll learn how to:

  • Build your first Availability Group on Compute Engine.
  • Test your work with four failure simulations.
  • Tell whether your databases will work well on Compute Engine.

This paper skips a lot of common SQL Server setup tasks that don't change when installing on a virtual machine (VM) in the cloud.

You'll use PowerShell to accomplish some of the setup and configuration tasks, but you don’t need to be familiar with PowerShell in order to follow along.

About the architecture

You will build a 2-node, Always On Availability Group that uses two SQL Server virtual machines in the same zone.

To keep it simple, you're not going to deal with network security, VPNs, application servers, and so on, and you're not going to expose SQL Server to the Internet.

Always On Availability Groups (AGs) can make sense in the cloud for a few reasons:

  • Automatic failover of multiple databases.​ This is a common high-availability requirement, and on premises, DBAs usually solve this with failover-cluster instances (FCIs). Unfortunately, that requires shared storage, which isn’t usually available in today’s infrastructure-as-a-service (IaaS) cloud. There are some workarounds involving third-party software or UNC paths, but neither of those are great options in IaaS. Always On Availability Groups can pull this off without shared storage.
  • Automatic repair of corrupt data pages.​ Each replica has its own copy of the database’s data pages. Only logged transactions are sent from one replica to another, not data pages. When a replica encounters a corrupt data page, it can request a clean copy of the page from another replica, and heal itself live. It doesn’t protect from every kind of corruption, but it can be a real lifesaver in many situations.
  • Potentially lower downtime for patching.​ For minimal downtime, patch the secondary replica, make sure it patched successfully, then take a brief outage to fail over from the primary replica to the newly patched secondary. After that, you can patch the former primary instance. If something goes wrong when patching either replica, simply build a new one to replace it and join it into the AG.
  • Possibly easier scalability with lower downtime​. When you want to switch to a bigger instance type, you can provision a new one, add it into the AG, fail over to it, and then remove the previous instance. You can scale up and down to handle load with this approach, although this isn't commonly used on a scripted basis to handle daily peaks and valleys. It’s more of a seasonal approach.

Our 2-node Availability Group design can come with a few drawbacks:

  • Higher cost.​ Having a warm standby means basically doubling the Compute Engine costs.
  • More complexity​. This is a little harder to manage than a conventional single SQL Server VM and requires Windows clustering knowledge.
  • No disaster recovery​. If you want the ability to fail over to another zone or region, you need to add additional VMs, but you can do that later.

To read the full white paper, click the button:

Download the PDF

Send feedback about...