To test that your regional managed instance group (MIG) is overprovisioned enough and can survive a zone outage, you can use the following example to simulate a zonal failure.
Before you begin
- If you want to use the command-line examples in this guide, install the Google Cloud CLI.
- If you haven't already, then set up authentication. Authentication is the process by which your identity is verified for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine by selecting one of the following options:
gcloud
- Install the Google Cloud CLI, then initialize it by running the following command:
gcloud init
- Set a default region and zone.
REST
To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.
Install the Google Cloud CLI, then initialize it by running the following command:
gcloud init
For more information, see Authenticate for using REST in the Google Cloud authentication documentation.
Use a script to simulate a zone outage
This script stops and starts Apache as the default scenario. If this doesn't apply to your application, replace the commands that stop and start Apache with your own failure and recovery scenario.
Deploy and run this script continuously in every VM in the group. You can do this by adding the script to the instance template or by including the script in a custom image and using the image in the instance template.
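As a rough illustration of the behavior described above, the following is a minimal sketch, not the script from this guide. It assumes that each VM polls the Compute Engine metadata server for the `failed_zone` and `failed_instance_names` project metadata keys, that Apache runs as the `apache2` systemd service, and that a 10-second polling interval is acceptable. The function names (`get_metadata`, `should_fail`, `fail_app`, `recover_app`, `main_loop`) and the `--run` argument are hypothetical.

```shell
#!/bin/sh
# Sketch only: polls project metadata and stops or starts Apache accordingly.

METADATA_URL="http://metadata.google.internal/computeMetadata/v1"

# Print a metadata value, or nothing if the key is not set.
get_metadata() {
  curl -s -f -H "Metadata-Flavor: Google" "${METADATA_URL}/$1" || true
}

# Succeed if this VM should simulate a failure.
# Args: $1 VM zone, $2 VM name, $3 failed_zone, $4 failed_instance_names.
should_fail() {
  [ -n "$3" ] && [ "$1" = "$3" ] || return 1
  # Assumption: an empty failed_instance_names matches every VM in the zone.
  case "$2" in
    *"$4"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Default scenario. Replace these with your own failure and recovery commands.
# Assumption: Apache is managed as the "apache2" systemd service.
fail_app()    { systemctl stop apache2; }
recover_app() { systemctl start apache2; }

main_loop() {
  # instance/zone returns projects/PROJECT_NUMBER/zones/ZONE; keep only ZONE.
  zone="$(get_metadata instance/zone)"
  zone="${zone##*/}"
  name="$(get_metadata instance/name)"
  while true; do
    if should_fail "$zone" "$name" \
        "$(get_metadata project/attributes/failed_zone)" \
        "$(get_metadata project/attributes/failed_instance_names)"; then
      fail_app
    else
      recover_app
    fi
    sleep 10
  done
}

# Poll only when explicitly asked to, so the functions can be tested alone.
if [ "${1:-}" = "--run" ]; then
  main_loop
fi
```

Keeping the decision in `should_fail` separate from the metadata lookups makes it possible to check the matching logic without access to a metadata server.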
Simulate a zone failure by setting these two project metadata fields:
- failed_zone: Sets the zone where you want to simulate the outage (limits the failure to just one zone).
- failed_instance_names: Selects the VMs to take offline by name (limits the failure to VMs whose names contain this string).
You can set this metadata using the gcloud CLI. For example, the following command sets the zone outage to the europe-west1-b zone and affects VMs that have names starting with base-instance-name:
gcloud compute project-info add-metadata --metadata failed_zone='europe-west1-b',failed_instance_names='base-instance-name-'
After you are done simulating the outage, recover from the failure by removing the metadata keys:
gcloud compute project-info remove-metadata --keys failed_zone,failed_instance_names
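The two commands above can be combined into a timed test run. The following is a minimal sketch that sets the metadata, waits, and then removes the keys again, including when the run is interrupted. The function names (`start_outage`, `clear_outage`, `simulate_zone_outage`) and the duration argument are hypothetical; only the two `gcloud compute project-info` commands come from this guide.

```shell
#!/bin/sh
# Sketch only: wraps the add-metadata and remove-metadata commands.

# Args: $1 zone to fail, $2 VM name filter.
start_outage() {
  gcloud compute project-info add-metadata \
    --metadata "failed_zone=$1,failed_instance_names=$2"
}

clear_outage() {
  gcloud compute project-info remove-metadata \
    --keys failed_zone,failed_instance_names
}

# Simulate an outage for a fixed time, then recover.
# Args: $1 zone, $2 VM name filter, $3 duration in seconds.
simulate_zone_outage() {
  start_outage "$1" "$2" || return 1
  # Remove the metadata keys even if the run is interrupted.
  trap 'clear_outage; exit 130' INT TERM
  sleep "$3"
  trap - INT TERM
  clear_outage
}
```

For example, `simulate_zone_outage europe-west1-b base-instance-name- 600` would keep the simulated outage active for ten minutes.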
Here are some ideas for failure scenarios you can run using this script:
- Stop your application completely to see how the MIG responds.
- Make your VMs report as "unhealthy" to load balancing health checks.
- Modify iptables to block some of the traffic to and from the VM.
- Shut down the VMs. By default, the regional MIG recreates each VM shortly afterward, but the new VM shuts itself down as soon as the script runs, for as long as the metadata values are set. This results in a crash loop.
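Two of the scenarios above could be implemented as replacement failure and recovery commands along the following lines. This is a sketch under stated assumptions: the application serves HTTP on port 80, the health check requests a file served from the hypothetical path `/var/www/html/health`, and the commands run as root. All function names are hypothetical. The functions are written to be safe to call repeatedly, because a polling script would invoke them on every iteration.

```shell
#!/bin/sh
# Sketch only: alternative failure and recovery commands.

# Scenario: fail health checks by removing the file that the check requests.
# Assumption: the health check expects HTTP 200 for this file.
HEALTH_FILE="${HEALTH_FILE:-/var/www/html/health}"
mark_unhealthy() { rm -f "$HEALTH_FILE"; }
mark_healthy()   { touch "$HEALTH_FILE"; }

# Scenario: drop inbound HTTP traffic with iptables.
# Assumption: the application listens on TCP port 80.
block_http() {
  # Add the rule only if it is not already present.
  iptables -C INPUT -p tcp --dport 80 -j DROP 2>/dev/null ||
    iptables -I INPUT -p tcp --dport 80 -j DROP
}

unblock_http() {
  # Delete the rule until no copy of it remains.
  while iptables -C INPUT -p tcp --dport 80 -j DROP 2>/dev/null; do
    iptables -D INPUT -p tcp --dport 80 -j DROP
  done
}
```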
What's next
- Learn how to build scalable and resilient web applications.
- Learn about disaster recovery on Google Cloud Platform.