Deploy a cold recoverable web server with regional persistent disks

Last reviewed 2021-08-04 UTC

This document describes how to deploy a cold failover topology for a web server by using a managed instance group with regional persistent disks. This document is intended for architects and people who work in operations and administrative teams.

You create a managed instance group that runs a single VM with a regional persistent disk that stores data. An external Application Load Balancer directs users to the VM that runs in the managed instance group, as shown in the following diagram:

An external Application Load Balancer directs users to a single VM that runs in a managed instance group, and a regional persistent disk is attached to the VM.

If there's an instance failure, the managed instance group tries to recreate the VM in the same zone. If the failure is at the zone level, Cloud Monitoring or a similar monitoring tool can alert you to the problem, and you manually create another managed instance group in another, working zone. In either failover scenario, the platform attaches the regional persistent disk to the new VM in the instance group.

In this document, you use the external IP address of the VM or the load balancer to view a basic page on the web server. This approach lets you test the cold failover pattern if you don't have a registered domain name, and without any DNS changes. In a production environment, create and configure a Cloud DNS zone and record that resolves to the external IP address assigned to the load balancer.

This scenario balances the cost savings of not running multiple VMs against maintaining a certain level of data protection. Your storage costs are higher because a regional persistent disk continuously replicates data between two zones in a region, but you minimize the risk of data loss if there's a failure at the zone level. To reduce your storage costs, consider deploying a cold recoverable web server using persistent disk snapshots instead.

The following table outlines some high-level differences in data protection options for cold recoverable approaches that use regional persistent disks or persistent disk snapshots. For more information, see High availability options using persistent disks.

| | Regional persistent disks | Persistent disk snapshots |
| --- | --- | --- |
| Data loss: recovery point objective (RPO) | Zero for a single failure, such as a sustained outage in a zone or a network disconnect. | Any data since the last snapshot was taken, which is typically one hour or more. The potential data loss depends on your snapshot schedule, which controls how frequently snapshots are taken. |
| Recovery time objective (RTO) | Deployment time for a new VM, plus several seconds for the regional persistent disk to be reattached. | Deployment time for a new VM, plus time to create a new persistent disk from the latest snapshot. The disk creation time depends on the size of the snapshot, and could take tens of minutes or hours. |
| Cost | Storage costs double as the regional persistent disk is replicated continuously to another zone. | You only pay for the amount of snapshot space consumed. |

For more information about costs, see Disks and images pricing.

Objectives

Create a managed instance group to run a VM with a regional persistent disk.
Create an instance template and startup script.
Create and configure an external Application Load Balancer.
Test the cold web server failover with a replacement managed instance group.

Costs

In this document, you use the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Make sure that billing is enabled for your Google Cloud project.

Enable the Compute Engine API.

Install the Google Cloud CLI.

To initialize the gcloud CLI, run the following command:

gcloud init

You can also run gcloud CLI commands from the Google Cloud console without installing the Google Cloud CLI. To do so, use Cloud Shell.

Prepare the environment

In this section, you define some variables for your resource names and locations. These variables are used by the Google Cloud CLI commands as you deploy the resources.

Throughout this document, unless otherwise noted, you enter all commands in Cloud Shell or your local development environment.

Replace PROJECT_ID with your own project ID. If required, provide your own name suffix for resources, such as app.

Specify a region, such as us-central1, and two zones within that region, such as us-central1-a and us-central1-f. These zones define where the regional persistent disk and initial managed instance group are deployed, and where you can manually fail over to if needed.
```
PROJECT_ID=PROJECT_ID
NAME_SUFFIX=app
REGION=us-central1
ZONE1=us-central1-a
ZONE2=us-central1-f
```
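Because the regional persistent disk is replicated between exactly these two zones, a quick sanity check of the values can catch a typo before any resources are created. The following is a minimal sketch, not part of the original steps: the helper name `check_zones` is hypothetical, and it assumes that a zone name is the region name followed by a one-letter suffix such as `-a`.

```shell
# Hypothetical helper (an assumption, not part of the original steps):
# returns 0 when both zones are distinct and belong to the given region.
check_zones() {
  region="$1"; zone1="$2"; zone2="$3"
  # The two replica zones must differ.
  [ "$zone1" != "$zone2" ] || return 1
  for zone in "$zone1" "$zone2"; do
    # Assumption: a zone name is the region name plus a one-letter suffix.
    case "$zone" in
      "$region"-[a-z]) ;;
      *) return 1 ;;
    esac
  done
  return 0
}

# Example values from this document
if check_zones us-central1 us-central1-a us-central1-f; then
  echo "zones OK"
else
  echo "use two different zones in the same region" >&2
fi
```

With your own values, the call could look like `check_zones "$REGION" "$ZONE1" "$ZONE2"`.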

Create a VPC and subnet

To provide network access to the VMs, create a Virtual Private Cloud (VPC) and subnet. As the managed instance group works across zones within a single region, only one subnet is created. For more information on the advantages of the custom subnet mode to manage IP address ranges in use in your environment, see Use custom mode VPC networks.

Create the VPC with a custom subnet mode:
```
gcloud compute networks create network-$NAME_SUFFIX \
    --subnet-mode=custom
```
If you see a Cloud Shell prompt, authorize this first request to make API calls.

Create a subnet in the new VPC. Define your own address range, such as 10.1.0.0/20, that fits in your network range:

gcloud compute networks subnets create subnet-$NAME_SUFFIX-$REGION \
    --network=network-$NAME_SUFFIX \
    --range=10.1.0.0/20 \
    --region=$REGION

Create firewall rules

Create firewall rules to allow web traffic and health checks for the load balancer and managed instance groups:

gcloud compute firewall-rules create allow-http-$NAME_SUFFIX \
    --network=network-$NAME_SUFFIX \
    --direction=INGRESS \
    --priority=1000 \
    --action=ALLOW \
    --rules=tcp:80 \
    --source-ranges=0.0.0.0/0 \
    --target-tags=http-server

gcloud compute firewall-rules create allow-health-check-$NAME_SUFFIX \
    --network=network-$NAME_SUFFIX \
    --action=allow \
    --direction=ingress \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 \
    --target-tags=allow-health-check \
    --rules=tcp:80

The HTTP rule allows traffic to any VM where the http-server tag is applied, and from any source using the 0.0.0.0/0 range. For the health check rule, default ranges for Google Cloud are set to allow the platform to correctly check the health of resources.

To allow SSH traffic for the initial configuration of a base VM image, scope the firewall rule to your environment using the --source-ranges parameter. You might need to work with your network team to determine what source ranges your organization uses.

Caution: We don't recommend using a broad 0.0.0.0/0 range that would allow all traffic. To scope traffic to a single IP address, use a /32 network mask, such as 35.230.62.163/32.

Replace IP_ADDRESS_SCOPE with your own IP address scopes:
```
gcloud compute firewall-rules create allow-ssh-$NAME_SUFFIX \
    --network=network-$NAME_SUFFIX \
    --direction=INGRESS \
    --priority=1000 \
    --action=ALLOW \
    --rules=tcp:22 \
    --source-ranges=IP_ADDRESS_SCOPE
```
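If you scope SSH access to a single workstation, the value for IP_ADDRESS_SCOPE is that address with a /32 mask. The following sketch shows one way to build and validate such a value; the helper name `to_host_cidr` is hypothetical and handles IPv4 addresses only.

```shell
# Hypothetical helper: print ADDRESS/32 for a dotted-quad IPv4 address,
# or return a non-zero status if the input doesn't look like one.
to_host_cidr() {
  addr="$1"
  # Reject empty input, characters other than digits and dots, and empty octets.
  case "$addr" in
    ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;
  esac
  # Split on dots and require exactly four octets in the range 0-255.
  old_ifs="$IFS"; IFS=.
  set -- $addr
  IFS="$old_ifs"
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
  done
  printf '%s/32\n' "$addr"
}

# Example address from the caution above
to_host_cidr 35.230.62.163
```

The printed value can then be passed to the `--source-ranges` parameter of the SSH firewall rule.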

After you create the firewall rules, verify that the three rules have been added:

gcloud compute firewall-rules list \
    --project=$PROJECT_ID \
    --filter="NETWORK=network-$NAME_SUFFIX"

The following example output shows the three rules have been correctly created:

NAME                    NETWORK      DIRECTION  PRIORITY  ALLOW
allow-health-check-app  network-app  INGRESS    1000      tcp:80
allow-http-app          network-app  INGRESS    1000      tcp:80
allow-ssh-app           network-app  INGRESS    1000      tcp:22

Create a regional persistent disk and VM

A regional persistent disk provides continuous replication of data between two zones in a region. Managed instance groups that run in the same two zones as the regional persistent disk can then attach the disk to a VM.

Create a 10 GiB SSD regional persistent disk. You pay for the provisioned space, not the consumed space, so understand your storage needs and the associated costs before you choose a size. For more information, see persistent disk pricing.
```
gcloud compute disks create disk-$NAME_SUFFIX \
    --region $REGION \
    --replica-zones $ZONE1,$ZONE2 \
    --size=10 \
    --type=pd-ssd
```

Create a base VM with the attached regional persistent disk:

gcloud compute instances create vm-base-$NAME_SUFFIX \
    --zone=$ZONE1 \
    --machine-type=n1-standard-1 \
    --subnet=subnet-$NAME_SUFFIX-$REGION \
    --tags=http-server \
    --image=debian-10-buster-v20210721 \
    --image-project=debian-cloud \
    --boot-disk-size=10GB \
    --boot-disk-type=pd-balanced \
    --boot-disk-device-name=vm-base-$NAME_SUFFIX \
    --disk=mode=rw,name=disk-$NAME_SUFFIX,device-name=disk-$NAME_SUFFIX,scope=regional

You use parameters defined at the start of this document to name the VM and connect to the correct subnet. Names are also assigned from the parameters for the boot disk and data disk.

To install and configure the simple website, first connect to the base VM using SSH:
```
gcloud compute ssh vm-base-$NAME_SUFFIX --zone=$ZONE1
```

In your SSH session to the VM, create a script to configure the VM in an editor of your choice. The following example uses Nano as the editor:

nano configure-vm.sh

Paste the following configuration script into the file. Update the NAME_SUFFIX variable to match the value set at the start of this document, such as app:

#!/bin/bash

NAME_SUFFIX=app

# Create directory for the basic website files
sudo mkdir -p /var/www/example.com
sudo chmod a+w /var/www/example.com
sudo chown -R www-data: /var/www/example.com

# Find the disk name, then format and mount it
DISK_NAME="google-disk-$NAME_SUFFIX"
DISK_PATH="$(find /dev/disk/by-id -name "${DISK_NAME}" | xargs -I '{}' readlink -f '{}')"

sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard $DISK_PATH
sudo mount -o discard,defaults $DISK_PATH /var/www/example.com

# Install Apache
sudo apt-get update && sudo apt-get -y install apache2

# Write out a basic HTML file to the mounted persistent disk
sudo tee -a /var/www/example.com/index.html >/dev/null <<'EOF'
<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
    <title>HA / DR example</title>
</head>
<body>
    <p>Welcome to a test web server with regional persistent disks!</p>
</body>
</html>
EOF

# Write out an Apache configuration file
sudo tee -a /etc/apache2/sites-available/example.com.conf >/dev/null <<'EOF'
<VirtualHost *:80>
        ServerName www.example.com

        ServerAdmin webmaster@localhost
        DocumentRoot /var/www/example.com

        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>
EOF

# Enable the Apache configuration file and reload service
sudo a2dissite 000-default
sudo a2ensite example.com.conf
sudo systemctl reload apache2

Write out the file and exit your editor. For example, in Nano you use Ctrl-O to write out the file, then exit with Ctrl-X.
Make the configuration script executable, then run it:
```
chmod +x configure-vm.sh
./configure-vm.sh
```
Exit the SSH session to the VM:
```
exit
```
Get the IP address of the VM and use curl to see the basic web page:
```
curl $(gcloud compute instances describe vm-base-$NAME_SUFFIX \
 --zone $ZONE1 \
 --format="value(networkInterfaces.accessConfigs.[0].natIP)")
```
The basic website is returned, as shown in the following example output:
```
<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
    <title>HA / DR example</title>
</head>
<body>
    <p>Welcome to a test web server with regional persistent disks!</p>
</body>
</html>
```
This step confirms that Apache is configured correctly, and the page is loaded from the attached regional persistent disk. In the following sections, you create an image using this base VM and configure an instance template with a startup script.
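The configuration script formats the data disk unconditionally, which is appropriate for a new, empty disk but would erase the website if the script were rerun against a disk that already holds data. The following sketch shows one possible guard. The helper names `has_filesystem` and `format_if_blank` are hypothetical and aren't part of the original script, and the sketch assumes that it runs as root on the VM.

```shell
# Hypothetical guard: succeeds when the device already carries a filesystem
# signature, so the format step can be skipped.
has_filesystem() {
  # blkid prints the filesystem type, or nothing for a blank device.
  fs_type="$(blkid -s TYPE -o value "$1" 2>/dev/null || true)"
  [ -n "$fs_type" ]
}

# Format the device only when it's blank. The mkfs options are the same
# ones that the configuration script uses.
format_if_blank() {
  if has_filesystem "$1"; then
    echo "skip: $1 already has a filesystem"
  else
    mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard "$1"
  fi
}
```

In the configuration script, a call such as `format_if_blank "$DISK_PATH"` could stand in for the direct `mkfs.ext4` line.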

Create a VM image and instance template

To create identical VMs that can be automatically deployed without additional configuration required, you use a custom VM image. This image captures the OS and Apache configuration. Each VM created in the managed instance group in the next steps uses this image.

Before you can create an image, you must stop the VM:

gcloud compute instances stop vm-base-$NAME_SUFFIX --zone=$ZONE1

Create an image of the base VM configured in the previous section:

gcloud compute images create image-$NAME_SUFFIX \
    --source-disk=vm-base-$NAME_SUFFIX \
    --source-disk-zone=$ZONE1 \
    --storage-location=$REGION

Create an instance template that defines the configuration for each VM:

gcloud compute instance-templates create template-$NAME_SUFFIX \
    --machine-type=n1-standard-1 \
    --subnet=projects/$PROJECT_ID/regions/$REGION/subnetworks/subnet-$NAME_SUFFIX-$REGION \
    --tags=http-server \
    --image=image-$NAME_SUFFIX \
    --region=$REGION \
    --metadata=^,@^startup-script=\#\!/bin/bash$'\n'echo\ UUID=\`blkid\ -s\ UUID\ -o\ value\ /dev/sdb\`\ /var/www/example.com\ ext4\ discard,defaults,nofail\ 0\ 2\ \|\ tee\ -a\ /etc/fstab$'\n'mount\ -a

The image created in the previous step is defined as the source for each VM, and a startup script is also defined that mounts the regional persistent disk.
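The escaped startup script in the instance template is hard to read inline. As an illustration, the following sketch writes an equivalent script to a local file, where the filename `startup.sh` is a hypothetical choice. It keeps the template's assumption that the regional persistent disk appears as `/dev/sdb`, and it adds one variation that isn't in the original: the `/etc/fstab` entry is appended only if it isn't already present.

```shell
# Write the startup script to a local file (hypothetical name: startup.sh).
cat > startup.sh <<'EOF'
#!/bin/bash
# Assumption carried over from the template: the regional persistent disk
# is the second disk on the VM and appears as /dev/sdb.
DISK_UUID="$(blkid -s UUID -o value /dev/sdb)"

# Variation on the original: add the fstab entry only if it's missing, so
# that reboots of the same boot disk don't append duplicate lines.
if ! grep -q "UUID=${DISK_UUID} " /etc/fstab; then
  echo "UUID=${DISK_UUID} /var/www/example.com ext4 discard,defaults,nofail 0 2" >> /etc/fstab
fi

mount -a
EOF

# Check the file for shell syntax errors before using it.
bash -n startup.sh && echo "startup.sh parses cleanly"
```

A file like this could be supplied to `gcloud compute instance-templates create` with `--metadata-from-file=startup-script=startup.sh` in place of the escaped `--metadata` value.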

Create a managed instance group

A managed instance group runs the VM. The managed instance group runs in a defined zone and monitors the health of the VM. If there's an instance failure and the VM stops running, the managed instance group tries to recreate the VM in the same zone and attaches the regional persistent disk. If the failure is at the zone level, you must manually perform the cold failover and create another managed instance group in a different zone. The same custom image and instance template automatically configure the replacement VM in an identical way.

Create a health check to monitor the VMs in the managed instance group. This health check makes sure the VM responds on port 80. For your own applications, monitor the appropriate ports to check the VM health.
```
gcloud compute health-checks create http http-basic-check-$NAME_SUFFIX --port 80
```

Create a managed instance group, initially with zero VMs, that uses the instance template created in the previous step.

gcloud compute instance-groups managed create instance-group-$NAME_SUFFIX-$ZONE1 \
    --base-instance-name=instance-vm-$NAME_SUFFIX \
    --template=template-$NAME_SUFFIX \
    --size=0 \
    --zone=$ZONE1 \
    --health-check=http-basic-check-$NAME_SUFFIX

Create a single VM in the managed instance group and attach the regional persistent disk. If there's a failure of this VM, the managed instance group tries to recreate it in the same zone and reattach the persistent disk.
```
gcloud compute instance-groups managed create-instance instance-group-$NAME_SUFFIX-$ZONE1 \
    --instance instance-vm-$NAME_SUFFIX \
    --zone=$ZONE1 \
    --stateful-disk device-name=disk-$NAME_SUFFIX,source=projects/$PROJECT_ID/regions/$REGION/disks/disk-$NAME_SUFFIX
```
Note: You can't have more than one VM running in the managed instance group at the same time. The regional persistent disk can only be connected to one VM at a time.

For this cold recoverable application scenario, don't create autoscale rules to increase the number of VMs that run in the managed instance group.
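Because the regional persistent disk can be attached to only one VM at a time, it can help to check which VM currently uses the disk before you create another one. The following is a minimal sketch; the helper name `disk_attached_to` is hypothetical, and it assumes that the `users` field of the disk resource lists the attached instances as URLs.

```shell
# Hypothetical helper: print the name of each VM that currently has the
# regional persistent disk attached. Empty output means it's detached.
# Arguments: disk name, region.
disk_attached_to() {
  # Assumption: "users" holds instance URLs, separated by ";" in value format.
  gcloud compute disks describe "$1" \
      --region="$2" \
      --format="value(users)" |
    tr ';' '\n' |
    sed -e 's|.*/||' -e '/^$/d'
}
```

With the variables from this document, the call could look like `disk_attached_to disk-$NAME_SUFFIX $REGION`.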

Create and configure a load balancer

For users to access your website, you need to allow traffic through to the VMs that run in the managed instance group. You also want to automatically redirect traffic to new VMs if there's a zone failure in a managed instance group.

In the following section, you create an external load balancer with a backend service for HTTP traffic on port 80, use the health check created in the previous steps, and map an external IP address through to the backend service.

For more information, see How to set up a simple external HTTP load balancer.

Create and configure the load balancer for your application:

# Configure port rules for HTTP port 80
gcloud compute instance-groups set-named-ports \
    instance-group-$NAME_SUFFIX-$ZONE1 \
    --named-ports http:80 \
    --zone $ZONE1

# Create a backend service and add the managed instance group to it
gcloud compute backend-services create \
    web-backend-service-$NAME_SUFFIX \
    --protocol=HTTP \
    --port-name=http \
    --health-checks=http-basic-check-$NAME_SUFFIX \
    --global

gcloud compute backend-services add-backend \
    web-backend-service-$NAME_SUFFIX \
    --instance-group=instance-group-$NAME_SUFFIX-$ZONE1 \
    --instance-group-zone=$ZONE1 \
    --global

# Create a URL map for the backend service
gcloud compute url-maps create web-map-http-$NAME_SUFFIX \
    --default-service web-backend-service-$NAME_SUFFIX

# Configure forwarding for the HTTP traffic
gcloud compute target-http-proxies create \
    http-lb-proxy-$NAME_SUFFIX \
    --url-map web-map-http-$NAME_SUFFIX

gcloud compute forwarding-rules create \
    http-content-rule-$NAME_SUFFIX \
    --global \
    --target-http-proxy=http-lb-proxy-$NAME_SUFFIX \
    --ports=80

Get the IP address of the forwarding rule for the web traffic:

IP_ADDRESS=$(gcloud compute forwarding-rules describe http-content-rule-$NAME_SUFFIX \
    --global \
    --format="value(IPAddress)")

Use curl, or open your web browser, to view the website using the IP address of the load balancer from the previous step:
```
curl $IP_ADDRESS
```
It takes a few minutes for the load balancer to finish deploying and to correctly direct traffic to your backend. An HTTP 404 or 502 error is returned if the load balancer is still deploying. If needed, wait a few minutes and try to access the website again.

The basic website is returned, as shown in the following example output:
```
<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
    <title>HA / DR example</title>
</head>
<body>
    <p>Welcome to a test web server with regional persistent disks!</p>
</body>
</html>
```
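Because the load balancer can take a few minutes to start serving traffic, repeating the request by hand can be replaced with a small polling loop. The following is a sketch only; the helper name `wait_for_http` and its arguments are hypothetical.

```shell
# Hypothetical helper: poll a URL until it returns HTTP 200.
# Arguments: URL, maximum attempts, seconds to wait between attempts.
wait_for_http() {
  url="$1"; attempts="$2"; delay="$3"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Print only the status code; curl reports 000 if the request itself fails.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)"
    if [ "$code" = "200" ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    echo "attempt $i: HTTP ${code:-000}, retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_for_http "http://$IP_ADDRESS" 20 30` would retry every 30 seconds for up to 20 attempts.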

Simulate a zone failure and recovery

Review the resource deployments before simulating a failure at the zone level. All of the resources have been created to support the environment shown in the following image:

An external Application Load Balancer directs users to a single VM that runs in a managed instance group, and a regional persistent disk is attached to the VM.

One VM runs in a managed instance group, with a regional persistent disk attached to it that stores a basic website.
A startup script is applied to an instance template so any VMs created in the managed instance group mount the regional persistent disk.
A health check monitors the status of the VM inside the managed instance group.
The external Application Load Balancer directs users to the VM that runs in the managed instance group.
If the VM fails, the managed instance group tries to recreate a VM in the same zone. If the failure is at the zone level, you must manually create a replacement managed instance group in another, working zone.

In a production environment, you might get an alert from Cloud Monitoring or another monitoring solution when there's a problem. This alert prompts a person to assess the scope of the failure before manually creating a replacement managed instance group in another, working zone. An alternative approach is to use your monitoring solution to automatically respond to outages with the managed instance group.

When you or your monitoring solution determine the most appropriate action is to fail over, create a replacement managed instance group. In this document, you manually create this replacement resource.

To simulate a failure at the zone level, delete the load balancer backend and managed instance group:

gcloud compute backend-services remove-backend \
    web-backend-service-$NAME_SUFFIX \
    --instance-group=instance-group-$NAME_SUFFIX-$ZONE1 \
    --instance-group-zone=$ZONE1 \
    --global

gcloud compute instance-groups managed delete instance-group-$NAME_SUFFIX-$ZONE1 \
      --zone=$ZONE1

When prompted, confirm the request to delete the managed instance group.

In a production environment, your monitoring system would now generate an alert that prompts you to perform the cold failover.

Use curl or your web browser again to access the IP address of the load balancer:
```
curl $IP_ADDRESS --max-time 5
```
The curl request fails as there are no healthy targets for the load balancer.

To simulate the cold failover, create a managed instance group in a different zone:

gcloud compute instance-groups managed create instance-group-$NAME_SUFFIX-$ZONE2 \
    --base-instance-name=instance-vm-$NAME_SUFFIX \
    --template=template-$NAME_SUFFIX \
    --size=0 \
    --zone=$ZONE2 \
    --health-check=http-basic-check-$NAME_SUFFIX

gcloud compute instance-groups managed create-instance instance-group-$NAME_SUFFIX-$ZONE2 \
    --instance instance-vm-$NAME_SUFFIX \
    --zone=$ZONE2 \
    --stateful-disk device-name=disk-$NAME_SUFFIX,source=projects/$PROJECT_ID/regions/$REGION/disks/disk-$NAME_SUFFIX

The VM image, instance template, and regional persistent disk maintain all the configuration for the application instance.

Update the load balancer to add the new managed instance group and VM:

gcloud compute instance-groups set-named-ports \
    instance-group-$NAME_SUFFIX-$ZONE2 \
    --named-ports http:80 \
    --zone $ZONE2

gcloud compute backend-services add-backend \
    web-backend-service-$NAME_SUFFIX \
    --instance-group=instance-group-$NAME_SUFFIX-$ZONE2 \
    --instance-group-zone=$ZONE2 \
    --global

Use curl or your web browser one more time to access the IP address of the load balancer that directs traffic to the VM that runs in the managed instance group:
```
curl $IP_ADDRESS
```
It takes a few minutes for the VM to finish deploying and attach the regional persistent disk. If needed, wait a few minutes and try to access the website again.

The following example response shows the web page correctly running on the VM:
```
<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
    <title>HA / DR example</title>
</head>
<body>
    <p>Welcome to a test web server with regional persistent disks!</p>
</body>
</html>
```
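The manual failover steps in this section can be collected into a single function so that they're easier to repeat during a real outage. The following is a sketch under stated assumptions, not a production runbook: the function name `fail_over_to_zone` and its argument order are hypothetical, and the gcloud commands and resource names are the same ones used earlier in this section.

```shell
# Hypothetical wrapper around the manual failover steps from this section.
# Arguments: name suffix, project ID, region, target zone.
fail_over_to_zone() {
  suffix="$1"; project="$2"; region="$3"; zone="$4"
  group="instance-group-$suffix-$zone"

  # 1. Create an empty managed instance group in the target zone.
  gcloud compute instance-groups managed create "$group" \
      --base-instance-name="instance-vm-$suffix" \
      --template="template-$suffix" \
      --size=0 \
      --zone="$zone" \
      --health-check="http-basic-check-$suffix" || return 1

  # 2. Create the single VM and attach the regional persistent disk.
  gcloud compute instance-groups managed create-instance "$group" \
      --instance="instance-vm-$suffix" \
      --zone="$zone" \
      --stateful-disk="device-name=disk-$suffix,source=projects/$project/regions/$region/disks/disk-$suffix" || return 1

  # 3. Name the HTTP port so that the backend service can reach it.
  gcloud compute instance-groups set-named-ports "$group" \
      --named-ports=http:80 \
      --zone="$zone" || return 1

  # 4. Add the new group as a backend of the load balancer.
  gcloud compute backend-services add-backend "web-backend-service-$suffix" \
      --instance-group="$group" \
      --instance-group-zone="$zone" \
      --global
}
```

With the variables from this document, the call could look like `fail_over_to_zone "$NAME_SUFFIX" "$PROJECT_ID" "$REGION" "$ZONE2"`. The function stops at the first command that fails, so a partial failover is visible from its exit status.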

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

To delete the individual resources created in this document, complete the following steps.

Delete the load balancer configuration:

gcloud compute forwarding-rules delete \
    http-content-rule-$NAME_SUFFIX --global --quiet

gcloud compute target-http-proxies delete \
    http-lb-proxy-$NAME_SUFFIX --quiet

gcloud compute url-maps delete web-map-http-$NAME_SUFFIX --quiet

gcloud compute backend-services delete \
    web-backend-service-$NAME_SUFFIX --global --quiet

Delete the managed instance group and health check:

gcloud compute instance-groups managed delete instance-group-$NAME_SUFFIX-$ZONE2 \
    --zone=$ZONE2 --quiet

gcloud compute health-checks delete http-basic-check-$NAME_SUFFIX --quiet

Delete the instance template, image, base VM, and persistent disk:

gcloud compute instance-templates delete template-$NAME_SUFFIX --quiet

gcloud compute images delete image-$NAME_SUFFIX --quiet

gcloud compute instances delete vm-base-$NAME_SUFFIX --zone=$ZONE1 --quiet

gcloud compute disks delete disk-$NAME_SUFFIX --region=$REGION --quiet

Delete the firewall rules:

gcloud compute firewall-rules delete allow-health-check-$NAME_SUFFIX --quiet

gcloud compute firewall-rules delete allow-ssh-$NAME_SUFFIX --quiet

gcloud compute firewall-rules delete allow-http-$NAME_SUFFIX --quiet

Delete the subnet and VPC:

gcloud compute networks subnets delete \
    subnet-$NAME_SUFFIX-$REGION --region=$REGION --quiet

gcloud compute networks delete network-$NAME_SUFFIX --quiet

What's next

For an alternative approach to reduce your storage costs, see Deploy a cold recoverable web server using persistent disk snapshots.
To learn how to determine the best approach for your own applications and which recovery method to use, see the Disaster recovery planning guide.
To see other patterns for applications, such as cold and hot failover, see Disaster recovery scenarios for applications.
For more ways to handle scale and availability, see Patterns for scalable and resilient apps.
Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.