This interactive tutorial shows how to use autohealing to build highly available apps on Compute Engine.
Highly available apps are designed to serve clients with minimal latency and downtime. Availability is compromised when an app crashes or freezes. Clients of a compromised app can experience high latency or downtime.
Autohealing lets you automatically restart apps that are compromised. It promptly detects failed instances and recreates them automatically, so clients can be served again. With autohealing, you no longer need to manually bring an app back to service after a failure.
Objectives
- Configure a health check and an autohealing policy.
- Set up a demo web service on a managed instance group.
- Simulate health check failures and witness the autohealing recovery process.
Costs
This tutorial uses billable components of Google Cloud, including:
- Compute Engine
Before you begin
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Make sure that billing is enabled for your Google Cloud project.
Enable the Compute Engine API.
If you prefer to work from the command line, install and initialize the Google Cloud CLI:
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:
gcloud init
App architecture
The app includes the following Compute Engine components:
- Health check: An HTTP health check policy used by the autohealer to detect failed VM instances.
- Firewall rules: Google Cloud firewall rules let you allow or deny traffic to your instances.
- Managed instance group: A group of instances running the same demo web service.
- Instance template: A template used to create each instance in the instance group.
How the health check probes the demo web service
A health check sends probe requests to an instance using a specified protocol, such as HTTP(S), SSL, or TCP. For more information, see how health checks work and health check categories, protocols, and ports.
The health check in this tutorial is an HTTP health check that probes the HTTP path /health on port 80. For an HTTP health check, the probe request passes only if the path returns an HTTP 200 (OK) response. For this tutorial, the demo web server defines the path /health to return an HTTP 200 (OK) response when healthy or an HTTP 500 (Internal Server Error) response when unhealthy.
For more information, see success criteria for HTTP, HTTPS, and HTTP/2.
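The pass/fail rule described above can be sketched in a few lines of shell. This is only an illustration of the rule as this tutorial states it, not how Google Cloud implements its probes. The helper names probe_passes and probe_instance are hypothetical, and probe_instance assumes curl is installed and that the instance address is reachable from where you run it.

```shell
# Sketch: classify a probe result the way this tutorial describes an HTTP
# health check: only an HTTP 200 (OK) status counts as a pass.
# probe_passes and probe_instance are hypothetical helper names.

probe_passes() {
  local status_code="$1"
  [ "$status_code" = "200" ]
}

# Fetch only the status code of /health on port 80 for a given address.
# --max-time 5 mirrors the 5-second timeout used later in this tutorial.
# This function is defined but not called here, so the sketch runs offline.
probe_instance() {
  local address="$1"
  local status_code
  status_code="$(curl --silent --output /dev/null --max-time 5 \
    --write-out '%{http_code}' "http://${address}:80/health")"
  probe_passes "$status_code"
}

# Offline demonstration with the two responses the demo web server can return.
for code in 200 500; do
  if probe_passes "$code"; then
    echo "HTTP ${code}: probe passes"
  else
    echo "HTTP ${code}: probe fails"
  fi
done
```

Once the demo web server is running, calling probe_instance with an instance's external IP address would apply the same rule to a live response.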
Create the health check
To set up autohealing, create a custom health check and configure the network firewall to allow health check probes. You can use either a regional or a global health check. Regional health checks reduce cross-region dependencies and help to achieve data residency. Global health checks are convenient if you want to use the same health check for MIGs in multiple regions. In this tutorial, you create a global health check.
Console
Create a health check.
In the Google Cloud console, go to the Health checks page.
Click Create health check.
In the Name field, enter autohealer-check.
Set the Scope to Global. For autohealing, you can use either a regional or a global health check.
For Protocol, select HTTP.
Set Request path to /health. This indicates what HTTP path the health check uses. For this tutorial, the demo web server defines the path /health to return either an HTTP 200 (OK) response when healthy or an HTTP 500 (Internal Server Error) response when unhealthy.
Set the Health criteria:
- Set Check interval to 10. This defines the amount of time from the start of one probe to the start of the next one.
- Set Timeout to 5. This defines the amount of time that Google Cloud waits for a response to a probe. This value must be less than or equal to the check interval.
- Set Healthy threshold to 2. This defines the number of sequential probes that must succeed for the instance to be considered healthy.
- Set Unhealthy threshold to 3. This defines the number of sequential probes that must fail for the instance to be considered unhealthy.
Click Create at the bottom.
Create a firewall rule to allow health check probes to make HTTP requests.
In the Google Cloud console, go to the Create firewall rule page.
For Name, enter default-allow-http-health-check.
For Network, select default.
For Targets, select All instances in the network.
For Source filter, select IP ranges.
For Source IP ranges, enter 130.211.0.0/22 and 35.191.0.0/16.
In Protocols and ports, select tcp and enter 80.
Click Create.
gcloud
Create a global health check using the health-checks create http command.
gcloud compute health-checks create http autohealer-check \
    --global \
    --check-interval 10 \
    --timeout 5 \
    --healthy-threshold 2 \
    --unhealthy-threshold 3 \
    --request-path "/health"
- check-interval defines the amount of time from the start of one probe to the start of the next one.
- timeout defines the amount of time that Google Cloud waits for a response to a probe. This value must be less than or equal to the check interval.
- healthy-threshold defines the number of sequential probes that must succeed for the instance to be considered healthy.
- unhealthy-threshold defines the number of sequential probes that must fail for the instance to be considered unhealthy.
- request-path indicates what HTTP path the health check uses. For this tutorial, the demo web server defines the path /health to return either an HTTP 200 (OK) response when healthy or an HTTP 500 (Internal Server Error) response when unhealthy.
Create a firewall rule to allow health check probes to make HTTP requests.
gcloud compute firewall-rules create default-allow-http-health-check \
    --network default \
    --allow tcp:80 \
    --source-ranges 130.211.0.0/22,35.191.0.0/16
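To see how check-interval and unhealthy-threshold interact, the following sketch estimates how long a failing instance might go undetected. It assumes that probes start exactly one check interval apart and that each failing probe fails immediately (as with an HTTP 500 response). The function name estimate_detection_seconds is hypothetical, and the result is a rough approximation, not a documented guarantee.

```shell
# Rough estimate only: assumes one probe per check interval and that the
# instance is marked unhealthy once unhealthy-threshold probes in a row fail.
# estimate_detection_seconds is a hypothetical helper name.
estimate_detection_seconds() {
  local check_interval="$1"       # seconds between probe starts
  local unhealthy_threshold="$2"  # sequential failures required
  echo $(( check_interval * unhealthy_threshold ))
}

# Values used by the autohealer-check health check in this tutorial.
echo "Approximate time to mark an instance unhealthy: $(estimate_detection_seconds 10 3) seconds"
```

Lowering either value shortens this window, at the cost of making the health check more aggressive.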
What makes a good autohealing health check
Health checks used for autohealing should be conservative so they don't preemptively delete and recreate your instances. When an autohealer health check is too aggressive, the autohealer might mistake busy instances for failed instances and unnecessarily restart them, reducing availability.
- unhealthy-threshold. Should be more than 1. Ideally, set this value to 3 or more. This protects against rare failures like a network packet loss.
- healthy-threshold. A value of 2 is sufficient for most apps.
- timeout. Set this time value to a generous amount (five times or more than the expected response time). This protects against unexpected delays like busy instances or a slow network connection.
- check-interval. This value should be between 1 second and two times the timeout (not too long nor too short). When a value is too long, a failed instance is not caught soon enough. When a value is too short, the instances and the network can become measurably busy, given the high number of health check probes being sent every second.
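The guidelines above can be written as a small checklist script. This is a minimal sketch, not an official validation tool. The function name check_autohealing_settings is hypothetical, and the expected response time of 200 milliseconds in the example call is an assumed figure chosen for illustration, not a measurement of the demo web server.

```shell
# Sketch: compare health check settings against the guidelines in this
# section. check_autohealing_settings is a hypothetical helper name.
check_autohealing_settings() {
  local check_interval="$1"        # seconds
  local timeout="$2"               # seconds
  local unhealthy_threshold="$3"   # sequential failed probes
  local expected_response_ms="$4"  # assumed typical response time
  local warnings=0

  if [ "$unhealthy_threshold" -le 1 ]; then
    echo "unhealthy-threshold should be more than 1 (ideally 3 or more)"
    warnings=$(( warnings + 1 ))
  fi
  if [ $(( timeout * 1000 )) -lt $(( expected_response_ms * 5 )) ]; then
    echo "timeout should be at least five times the expected response time"
    warnings=$(( warnings + 1 ))
  fi
  if [ "$timeout" -gt "$check_interval" ]; then
    echo "timeout must be less than or equal to check-interval"
    warnings=$(( warnings + 1 ))
  fi
  if [ "$check_interval" -lt 1 ] || [ "$check_interval" -gt $(( timeout * 2 )) ]; then
    echo "check-interval should be between 1 second and two times the timeout"
    warnings=$(( warnings + 1 ))
  fi
  if [ "$warnings" -eq 0 ]; then
    echo "settings follow the guidelines"
  fi
}

# The values used in this tutorial, with an assumed 200 ms response time.
check_autohealing_settings 10 5 3 200
```

The function only prints advice and always returns successfully, so it can be dropped into a setup script without stopping it.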
Set up the web service
This tutorial uses a web app that is stored on GitHub. If you would like to learn more about how the app was implemented, see the GoogleCloudPlatform/python-docs-samples GitHub repository.
To set up the demo web service, create an instance template that launches the demo web server on startup. Then, use this instance template to deploy a managed instance group and enable autohealing.
Console
Create an instance template. Include a startup script that starts up the demo web server.
In the Google Cloud console, go to the Instance templates page.
Click Create instance template.
Set the Name to webserver-template.
For Machine configuration, select micro (e2-micro).
Under Firewall, select the Allow HTTP traffic checkbox.
Click Management, security, disks, networking, sole tenancy to reveal advanced settings. Several tabs appear.
On the Management tab, find Automation and enter the following Startup script:
sudo apt update && sudo apt -y install git gunicorn3 python3-pip
git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
cd python-docs-samples/compute/managed-instances/demo
sudo pip3 install -r requirements.txt
sudo gunicorn3 --bind 0.0.0.0:80 app:app --daemon
Click Create.
Deploy the web server as a managed instance group.
In the Google Cloud console, go to the Instance groups page.
Click Create instance group.
Set the Name to webserver-group.
For Region, select europe-west1.
For Zone, select europe-west1-b.
For Instance template, select webserver-template.
For Autoscaling, select Don't autoscale.
Set Number of instances to 3.
For Health check, select autohealer-check.
Set Initial delay to 90.
Click Create.
Create a firewall rule that allows HTTP requests to the web servers.
In the Google Cloud console, go to the Create firewall rule page.
For Name, enter default-allow-http.
For Network, select default.
For Targets, select Specified target tags.
For Target Tags, enter http-server.
For Source filter, select IP ranges.
For Source IP ranges, enter 0.0.0.0/0.
In Protocols and ports, select tcp and enter 80.
Click Create.
gcloud
Create an instance template. Include a startup script that starts the demo web server.
gcloud compute instance-templates create webserver-template \
    --machine-type e2-micro \
    --tags http-server \
    --metadata startup-script='
  sudo apt update && sudo apt -y install git gunicorn3 python3-pip
  git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
  cd python-docs-samples/compute/managed-instances/demo
  sudo pip3 install -r requirements.txt
  sudo gunicorn3 --bind 0.0.0.0:80 app:app --daemon'
Create an instance group.
gcloud compute instance-groups managed create webserver-group \
    --zone europe-west1-b \
    --template webserver-template \
    --size 3 \
    --health-check autohealer-check \
    --initial-delay 90
Create a firewall rule that allows HTTP requests to the web servers.
gcloud compute firewall-rules create default-allow-http \
    --network default \
    --allow tcp:80 \
    --target-tags http-server
Simulate health check failures
The demo web server provides ways for you to force a health check failure.
Console
Navigate to a web server instance.
In the Google Cloud console, go to the VM instances page.
Under the External IP column, click the IP address for any webserver-group instance. A new tab opens in your web browser. If the request times out or the web page is not available, wait a minute to let the server finish setting up and try again.
The demo web server displays a page with controls such as Make unhealthy and Check health.
On the demo web page, click Make unhealthy.
This causes the web server to fail the health check. Specifically, the web server makes the /health path return an HTTP 500 (Internal Server Error). You can verify this yourself by quickly clicking the Check health button (this stops working after the autohealer has started rebooting the instance).
Wait for the autohealer to take action.
In the Google Cloud console, go to the VM instances page.
Wait for the status of the web server instance to change. The green checkmark next to the instance name should change to a grey square, indicating that the autohealer has started rebooting the unhealthy instance.
Click Refresh at the top of the page periodically to get the most recent status.
The autohealing process is finished when the grey square changes back to a green checkmark, indicating the instance is healthy again.
gcloud
Monitor the status of the instance group. (When you have finished, stop by pressing Ctrl+C.)
while : ; do \
    gcloud compute instance-groups managed list-instances webserver-group \
    --zone europe-west1-b \
    ; done
NAME                  ZONE            STATUS   ACTION  INSTANCE_TEMPLATE   VERSION_NAME  LAST_ERROR
webserver-group-d5tz  europe-west1-b  RUNNING  NONE    webserver-template
webserver-group-q6t9  europe-west1-b  RUNNING  NONE    webserver-template
webserver-group-tbpj  europe-west1-b  RUNNING  NONE    webserver-template
If any instances show a status that is not RUNNING, such as STAGING, wait a minute to let the instance finish setting up and try again.
Open a new Cloud Shell session with the Google Cloud CLI installed.
Get the address of a web server instance.
gcloud compute instances list --filter webserver-group
Under the EXTERNAL_IP column, copy the IP address of any web server instance and save it as a local bash variable.
export IP_ADDRESS=EXTERNAL_IP_ADDRESS
Verify the web server has finished setting up. The server returns an HTTP 200 OK response.
curl --head $IP_ADDRESS/health
HTTP/1.1 200 OK
Server: gunicorn/19.6.0
...
If you get a Connection refused error, wait a minute to let the server finish setting up and try again.
Make the web server unhealthy.
curl $IP_ADDRESS/makeUnhealthy > /dev/null
This causes the web server to fail the health check. Specifically, the web server makes the /health path return an HTTP 500 INTERNAL SERVER ERROR. You can verify this yourself by quickly making a request to /health (this stops working after the autohealer has started rebooting the instance).
curl --head $IP_ADDRESS/health
HTTP/1.1 500 INTERNAL SERVER ERROR
Server: gunicorn/19.6.0
...
Return to your first shell session to monitor the instance group and wait for the autohealer to take action.
When the autohealing process has started, the STATUS and ACTION columns update, indicating that the autohealer has started rebooting the unhealthy instance.
NAME                  ZONE            STATUS    ACTION      INSTANCE_TEMPLATE   VERSION_NAME  LAST_ERROR
webserver-group-d5tz  europe-west1-b  RUNNING   NONE        webserver-template
webserver-group-q6t9  europe-west1-b  RUNNING   NONE        webserver-template
webserver-group-tbpj  europe-west1-b  STOPPING  RECREATING  webserver-template
The autohealing process has finished when the instance again reports a STATUS of RUNNING and an ACTION of NONE, indicating the instance is successfully restarted.
NAME                  ZONE            STATUS   ACTION  INSTANCE_TEMPLATE   VERSION_NAME  LAST_ERROR
webserver-group-d5tz  europe-west1-b  RUNNING  NONE    webserver-template
webserver-group-q6t9  europe-west1-b  RUNNING  NONE    webserver-template
webserver-group-tbpj  europe-west1-b  RUNNING  NONE    webserver-template
When you have finished monitoring the instance group, stop by pressing Ctrl+C.
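Instead of reading the table by eye, you could filter it for instances that are still being healed. The following sketch assumes the list-instances output keeps the column order shown in this tutorial (NAME, ZONE, STATUS, ACTION). The function name instances_in_progress is hypothetical, and the sample text is copied from the example output above so that the sketch runs without calling gcloud.

```shell
# Print the names of instances that are not yet RUNNING with ACTION NONE.
# Reads list-instances table output on stdin and skips the header row.
# instances_in_progress is a hypothetical helper name.
instances_in_progress() {
  awk 'NR > 1 && ($3 != "RUNNING" || $4 != "NONE") { print $1 }'
}

# Sample output taken from this tutorial, used here instead of a live call.
sample_output='NAME ZONE STATUS ACTION INSTANCE_TEMPLATE VERSION_NAME LAST_ERROR
webserver-group-d5tz europe-west1-b RUNNING NONE webserver-template
webserver-group-q6t9 europe-west1-b RUNNING NONE webserver-template
webserver-group-tbpj europe-west1-b STOPPING RECREATING webserver-template'

printf '%s\n' "$sample_output" | instances_in_progress
```

Against a live group, the same function could receive the output of the list-instances command used earlier. An empty result would then suggest that every instance is RUNNING with no pending action.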
Feel free to repeat this exercise. Here are some ideas:
What happens if you make all instances unhealthy at one time? For more information about autohealing behavior during concurrent failures, see autohealing behavior.
Can you update the health check configuration to heal instances as fast as possible? (In practice, you should set the health check parameters to use conservative values as explained in this tutorial. Otherwise, you may risk instances being mistakenly deleted and restarted when there is no real problem.)
The instance group has an initial delay configuration setting. Can you determine the minimum delay needed for this demo web server? (In practice, you should set the delay to somewhat longer (10%–20%) than it takes for an instance to boot and start serving app requests. Otherwise, you risk the instance getting stuck in an autohealing boot loop.)
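The 10% to 20% margin mentioned above can be turned into a small calculation. This is a minimal sketch: the function name suggest_initial_delay is hypothetical, and the 75-second boot time in the example is an assumed figure for illustration, not a measurement of the demo web server.

```shell
# Suggest an initial delay that is a given percentage longer than the
# measured boot time, rounded up to a whole second.
# suggest_initial_delay is a hypothetical helper name.
suggest_initial_delay() {
  local boot_seconds="$1"        # measured time until the app serves requests
  local margin_percent="${2:-20}" # safety margin, defaults to 20%
  echo $(( (boot_seconds * (100 + margin_percent) + 99) / 100 ))
}

# Assumed boot time of 75 seconds with the default 20% margin.
echo "Suggested --initial-delay: $(suggest_initial_delay 75) seconds"
```

To apply this, you would first measure how long a fresh instance takes to answer /health successfully, then pass that measurement as the first argument.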
View autohealer history (optional)
To view a history of autohealer operations, use the following gcloud command:
gcloud compute operations list --filter='operationType~compute.instances.repair.*'
For more information, see viewing historical autohealing operations.
Clean up
After you finish the tutorial, you can clean up the resources that you created so that they stop using quota and incurring charges. The following sections describe how to delete or turn off these resources.
If you created a separate project for this tutorial, delete the entire project. Otherwise, if the project has resources that you want to keep, only delete the specific resources created in this tutorial.
Deleting the project
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Deleting specific resources
If you can't delete the project used for this tutorial, delete the tutorial resources individually.
Deleting the instance group
Console
- In the Google Cloud console, go to the Instance groups page.
- Select the checkbox for your webserver-group instance group.
- To delete the instance group, click Delete.
gcloud
gcloud compute instance-groups managed delete webserver-group --zone europe-west1-b -q
Deleting the instance template
Console
In the Google Cloud console, go to the Instance templates page.
Click the checkbox next to the instance template.
Click Delete at the top of the page. In the new window, click Delete to confirm the deletion.
gcloud
gcloud compute instance-templates delete webserver-template -q
Deleting the health check
Console
In the Google Cloud console, go to the Health checks page.
Click the checkbox next to the health check.
Click Delete at the top of the page. In the new window, click Delete to confirm the deletion.
gcloud
gcloud compute health-checks delete autohealer-check -q
Deleting the firewall rules
Console
In the Google Cloud console, go to the Firewall rules page.
Click the checkboxes next to the firewall rules named default-allow-http and default-allow-http-health-check.
Click Delete at the top of the page. In the new window, click Delete to confirm the deletion.
gcloud
gcloud compute firewall-rules delete default-allow-http default-allow-http-health-check -q
What's next
- Try another tutorial:
- Learn more about managed instance groups.
- Learn more about designing robust systems.
- Learn more about building scalable and resilient web apps on Google Cloud.