Simulate a host maintenance event


This page describes how to test the effects of your virtual machine (VM) instance's host maintenance policy on your applications.

You might simulate a maintenance event on your VMs in the following situations:

  • You have VMs that are configured to live migrate during maintenance events, and you need to test the effects of live migration on your applications.
  • You have batch jobs running on preemptible VM instances, and you need to test how your applications handle preemption and shutdown of one or more instances.
  • Your instances are configured to stop and restart during maintenance events rather than live migrate, and you need to test how your applications handle this shutdown and restart process.
  • You want to test how workloads that are running on sole-tenant nodes behave during a host maintenance event, and see the effects of the sole-tenant VM's host maintenance policy on the applications running on the VMs.

Before you begin

  • Review the API rate limit for the simulate_maintenance_event_requests metric.
  • If you haven't already, set up authentication. Authentication is the process by which your identity is verified for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine as follows.

    Follow the instructions for how you plan to use the samples on this page:

    Console

    When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.

    gcloud

    1. Install the Google Cloud CLI, then initialize it by running the following command:

      gcloud init
    2. Set a default region and zone.

    REST

    To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

      Install the Google Cloud CLI, then initialize it by running the following command:

      gcloud init

Limitations

  • If you try to simulate a host maintenance event on VMs that do not support live migration, they are either terminated or restarted depending on the configured host maintenance policy.

  • When you simulate a host maintenance event on a node group whose host maintenance policy is set to migrate within node group, the outcome depends on the number of nodes that you specify. If that number is less than or equal to the total number of reserved holdback nodes, the simulation runs on all the specified nodes simultaneously. If that number is greater than the total number of reserved holdback nodes, the simulation fails.

  • To correctly simulate a maintenance event on a node group that has host maintenance policy set to migrate within node group, you need to trigger the maintenance event sequentially on each node.
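
The holdback rule in the preceding list amounts to a simple comparison. The following minimal sketch encodes it as a shell function so that you can check a planned request before you send it. The function name and its arguments are hypothetical and are not part of the gcloud CLI; you supply the holdback count yourself.

```shell
# Hypothetical helper: report how a simulation request is expected to behave on
# a node group whose policy is "migrate within node group".
# $1: number of nodes you plan to specify
# $2: number of reserved holdback nodes in the node group
check_simulation_request() {
  requested="$1"
  holdback="$2"
  if [ "$requested" -le "$holdback" ]; then
    echo "runs simultaneously on all $requested specified nodes"
  else
    echo "fails: $requested nodes requested, only $holdback holdback nodes reserved"
    return 1
  fi
}
```

For example, check_simulation_request 3 2 reports a failure and returns a non-zero status, which suggests splitting the request into smaller calls.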

Simulate host maintenance events

You can simulate a maintenance event on a VM using either the Google Cloud CLI or an API request.

During the simulation of the host maintenance event, the maintenance-event metadata key of the VM goes through the following changes:

  1. At the start of the simulation, the value of the maintenance-event metadata key changes from NONE to MIGRATE_ON_HOST_MAINTENANCE.
  2. For the duration of the simulation, the value remains MIGRATE_ON_HOST_MAINTENANCE.
  3. After the simulation ends, the value returns to NONE.

To query the maintenance event key, see Query the maintenance event metadata key.
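
To observe the sequence of values described above, you can poll the key from inside the VM while the simulation runs. The following is a minimal sketch, not a definitive implementation: it assumes that curl is installed on the VM and that the key is read from the metadata server path shown in the script. The function names, the polling interval, and the poll limit are hypothetical.

```shell
# Metadata server path for the maintenance-event key; reachable only from inside the VM.
METADATA_URL="http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event"

# Print the current value of the maintenance-event key.
get_maintenance_event() {
  curl -s -H "Metadata-Flavor: Google" "$METADATA_URL"
}

# Poll until the key leaves NONE and then returns to NONE.
# $1: seconds between polls (default 1); $2: maximum number of polls (default 600).
# Returns 0 when the full NONE -> event -> NONE cycle was seen, 1 on timeout.
wait_for_simulation() {
  interval="${1:-1}"
  max_polls="${2:-600}"
  seen_event=0
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    value="$(get_maintenance_event)"
    echo "maintenance-event: $value"
    case "$value" in
      NONE)
        # Back to NONE after an event value was seen: the simulation ended.
        if [ "$seen_event" -eq 1 ]; then
          return 0
        fi
        ;;
      "")
        # No response from the metadata server; keep polling.
        ;;
      *)
        seen_event=1
        ;;
    esac
    polls=$((polls + 1))
    sleep "$interval"
  done
  return 1
}
```

Start wait_for_simulation on the VM, then trigger the simulation from another terminal.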

gcloud

Run the instances simulate-maintenance-event command to force an instance to activate its configured maintenance policy action:

gcloud compute instances simulate-maintenance-event VM_NAME \
    --zone ZONE

Replace the following:

  • VM_NAME: the name of the VM where you want to simulate the maintenance event.

    You can specify multiple VM names separated by single spaces to simulate maintenance events on more than one VM in the same zone. For example, instance-1 instance-2 instance-3.

  • ZONE: the zone where the instance is located.
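
As an illustration of passing several VM names in one call, the following sketch wraps the command in a small shell function. The function name is hypothetical, and the zone and VM names in the usage line are placeholders.

```shell
# Hypothetical wrapper: simulate a maintenance event on one or more VMs in one zone.
# $1: zone; remaining arguments: VM names.
simulate_on_vms() {
  zone="$1"
  shift
  # "$@" expands to the VM names as separate, space-delimited arguments.
  gcloud compute instances simulate-maintenance-event "$@" --zone "$zone"
}

# Usage (placeholder values):
#   simulate_on_vms us-central1-a instance-1 instance-2 instance-3
```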

REST

In the Compute Engine API, make a request to the compute.instances.simulateMaintenanceEvent method:

POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/simulateMaintenanceEvent

Replace the following:

  • PROJECT_ID: the project ID for this request.
  • VM_NAME: the name of the instance where you want to simulate the maintenance event.

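    Each request targets a single instance, because the instance name is part of the request URL. To simulate maintenance events on more than one instance in the same zone, send a separate request for each instance name, for example, one request each for instance-1, instance-2, and instance-3.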

  • ZONE: the zone where the instance is located.
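
One way to send this request from a local development environment is with curl, using an access token from the gcloud CLI. The following is a sketch under those assumptions rather than the only way to call the method. The function names are hypothetical, and the loop sends one POST request for each VM name that you pass.

```shell
# Build the request URL for one VM.
# $1: project ID; $2: zone; $3: VM name.
simulate_url() {
  printf 'https://compute.googleapis.com/compute/v1/projects/%s/zones/%s/instances/%s/simulateMaintenanceEvent' "$1" "$2" "$3"
}

# Hypothetical helper: send one POST request per VM.
# $1: project ID; $2: zone; remaining arguments: VM names.
simulate_via_rest() {
  project="$1"
  zone="$2"
  shift 2
  # Reuse the credentials that you provided to the gcloud CLI.
  token="$(gcloud auth print-access-token)"
  for vm in "$@"; do
    # The method takes no request body, so send an empty one.
    curl -s -X POST \
      -H "Authorization: Bearer $token" \
      -H "Content-Type: application/json" \
      -d "" \
      "$(simulate_url "$project" "$zone" "$vm")"
  done
}

# Usage (placeholder values):
#   simulate_via_rest my-project us-central1-a instance-1 instance-2
```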

Simulate host maintenance events on sole-tenant nodes

You can simulate a host maintenance event on sole-tenant nodes using either the Google Cloud CLI or an API request. During the simulation of the host maintenance event on a sole-tenant VM, the maintenance-event metadata key value doesn't change and remains NONE throughout the simulation.

gcloud

Run the sole-tenancy node-groups simulate-maintenance-event command to force sole-tenant nodes to activate their configured maintenance policy:

gcloud compute sole-tenancy node-groups simulate-maintenance-event NODE_GROUP \
    --nodes=NODE_NAMES \
    --zone=ZONE \
    --async

Replace the following:

  • NODE_GROUP: the name of the node group where you want to simulate the maintenance event.

  • NODE_NAMES: the names of the nodes where you want to simulate the maintenance event. To specify multiple nodes, separate the names with commas, for example, node-1,node-2,node-3.

  • ZONE: the zone where the nodes are located.
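
For node groups where you need to trigger the event on one node at a time, you could loop over the node names and issue one command per node. The following sketch assumes that omitting --async makes each gcloud call wait for its operation to finish before the next one starts. The function name is hypothetical, and the values in the usage line are placeholders.

```shell
# Hypothetical helper: simulate maintenance on one node at a time.
# $1: node group name; $2: zone; remaining arguments: node names.
simulate_nodes_sequentially() {
  node_group="$1"
  zone="$2"
  shift 2
  for node in "$@"; do
    echo "Simulating maintenance on $node"
    # Without --async, gcloud waits for the operation to complete, so the
    # next node starts only after the current one is done. Stop on failure.
    gcloud compute sole-tenancy node-groups simulate-maintenance-event "$node_group" \
      --nodes="$node" \
      --zone="$zone" || return 1
  done
}

# Usage (placeholder values):
#   simulate_nodes_sequentially my-node-group us-central1-a node-1 node-2 node-3
```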

REST

In the Compute Engine API, make a request to the compute.nodeGroups.simulateMaintenanceEvent method:

POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/nodeGroups/NODE_GROUP/simulateMaintenanceEvent

{
  "nodes": [NODE_NAMES]
}

Replace the following:

  • PROJECT_ID: the project ID for this request.

  • ZONE: the zone where the nodes are located.

  • NODE_GROUP: the name of the node group where you want to simulate the maintenance event.

  • NODE_NAMES: the names of the nodes where you want to simulate the maintenance event. Enclose each node name in double quotes, for example, "node-1". To specify multiple nodes, separate the quoted names with commas, for example, "node-1","node-2","node-3".
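
Quoting each node name by hand is error-prone, so the following sketch builds the request body from a list of names and sends it with curl. It assumes an access token from the gcloud CLI and node names that need no JSON escaping. Both function names are hypothetical.

```shell
# Build the JSON request body from node names,
# for example: {"nodes": ["node-1","node-2"]}
nodes_body() {
  body=""
  for node in "$@"; do
    # Append a comma before every name except the first.
    body="${body:+$body,}\"$node\""
  done
  printf '{"nodes": [%s]}' "$body"
}

# Hypothetical helper: send the simulateMaintenanceEvent request for a node group.
# $1: project ID; $2: zone; $3: node group; remaining arguments: node names.
simulate_nodes_via_rest() {
  project="$1"
  zone="$2"
  node_group="$3"
  shift 3
  curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "$(nodes_body "$@")" \
    "https://compute.googleapis.com/compute/v1/projects/$project/zones/$zone/nodeGroups/$node_group/simulateMaintenanceEvent"
}

# Usage (placeholder values):
#   simulate_nodes_via_rest my-project us-central1-a my-node-group node-1 node-2
```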

What's next