Manage host events across VMs

This document explains how to use the host maintenance features that are available from the Cluster Director suite. It explains how to monitor, plan for, and perform scheduled maintenance on virtual machine (VM) instances. To manage maintenance on your reserved blocks of capacity instead, whether or not VMs are running on them, see Manage host events across reservations.

When you proactively manage upcoming host maintenance events on your VMs, you can minimize disruptions and maintain optimal performance.

Before you begin

Select the tab for how you plan to use the samples on this page:

Console

When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.

gcloud

In the Google Cloud console, activate Cloud Shell.

At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

REST

To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

Install the Google Cloud CLI. After installation, initialize the Google Cloud CLI by running the following command:

gcloud init

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

For more information, see Authenticate for using REST in the Google Cloud authentication documentation.

Required roles

To get the permissions that you need to manage host maintenance events across VMs, ask your administrator to grant you the following IAM roles:

Compute Admin (roles/compute.admin) on the project
For read-only access to System Event audit logs: Logs Viewer (roles/logging.viewer) on the project

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to manage host maintenance events across VMs. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to manage host maintenance events across VMs:

To view the details of a VM: compute.instances.get on the project

You might also be able to get these permissions with custom roles or other predefined roles.

Overview

To optimize the maintenance of your VMs, complete the following steps:

Understand host maintenance. Learn about the frequency and maintenance behavior of your VMs based on their machine series. This information helps you minimize disruptions to your workloads.
Set up notification alerts. Create log-based alerts to receive notifications when maintenance for your VMs is scheduled, started, or completed. This approach helps you proactively plan your activities and avoid unexpected downtime.
Manage maintenance across VMs. Check whether maintenance is scheduled for your VMs. If needed, you can manually start maintenance across your VMs. This process helps you increase the resilience of your workloads to host events, prevent downtime, and maximize the availability of your applications.

Understand host maintenance

During the lifecycle of a Compute Engine instance, the host machine that your instance runs on undergoes multiple host events. A host event can include the regular maintenance of Compute Engine infrastructure, or in rare cases, a host error. Compute Engine also applies some non-disruptive lightweight upgrades for the hypervisor and network in the background.

The following table describes the host maintenance features for accelerator-optimized machine types:

Machine type	Maintenance frequency	Behavior	Advanced notification	On-demand maintenance	Simulate maintenance
A4	Minimum of 90 days	Terminates with Local SSD data persistence	90 days	Yes	No
A3 Ultra	Minimum of 90 days	Terminates with Local SSD data persistence	90 days	Yes	No

The maintenance frequencies in the table are approximations.

Compute Engine might perform maintenance more frequently.

Set up notification alerts for VMs

You can get notified about scheduled, started, or completed maintenance events for your VMs by creating log-based alerting policies.

To create an alert for the maintenance events of your VMs, complete the following procedure. Repeat this procedure for each alert that you want to create.

In the Google Cloud console, go to the Logs Explorer page:
Go to Logs Explorer

If you use the search bar to find this page, then select the result whose subheading is Logging.
Click the Show query toggle to the on position.

In the Query pane, build one of the following queries. These queries filter log entries to identify specific maintenance events. If you want to use multiple queries, repeat this procedure to create a unique alert for each query.

To receive alerts when maintenance for a VM is scheduled:

protoPayload.methodName="compute.instances.upcomingMaintenance" severity>=DEFAULT
protoPayload.status.message =~ "scheduled"

To receive alerts when the maintenance window for a VM has opened:

protoPayload.methodName="compute.instances.upcomingMaintenance" severity>=DEFAULT
protoPayload.status.message =~ "ongoing"

To receive alerts when maintenance for a VM has started:

protoPayload.methodName="compute.instances.terminateOnHostMaintenance" severity>=DEFAULT

To receive alerts when maintenance for a VM has completed:

protoPayload.methodName="compute.instances.upcomingMaintenance" severity>=DEFAULT
protoPayload.status.message =~ "completed"

To validate the query, click Run query. If the query is valid, then the Query results pane displays log entries that match the query.
In the Query results toolbar, click the Actions list, and then select Create log alert. The Create logs-based alert policy pane appears.
In the Alert details section, do the following:
1. In the Alert Policy Name field, enter a name for the policy.
2. In the Policy severity level list, select Warning (or a higher severity).
3. Click Next.
In the Choose logs to include in the alert section, click Next.
In the Set notification frequency and autoclose duration section, specify the following:
1. In the Time between notifications list, select how often you want to be notified.
2. In the Incident autoclose duration list, select how long to wait before Cloud Logging stops sending notifications and automatically closes the incident.
3. Click Next.
In the Who should be notified? section, specify a notification channel for Logging to send notifications to.
Click Save.
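
If you create several alerts, it can help to generate the query text instead of retyping it. The following Python sketch builds the three queries from the preceding procedure that filter on the status message. The function name and the event keys are hypothetical and exist only for this illustration; the method name and keywords are copied from the queries in this section.

```python
# Hypothetical helper: builds the Logs Explorer queries shown in the procedure.
METHOD_NAME = "compute.instances.upcomingMaintenance"

# Maps an alert purpose (hypothetical key) to the status-message keyword
# used in the documented queries.
STATUS_KEYWORDS = {
    "scheduled": "scheduled",
    "window_opened": "ongoing",
    "completed": "completed",
}


def upcoming_maintenance_filter(event: str) -> str:
    """Return the two-line Logs Explorer query for the given event key."""
    keyword = STATUS_KEYWORDS[event]  # raises KeyError for unknown keys
    return (
        f'protoPayload.methodName="{METHOD_NAME}" severity>=DEFAULT\n'
        f'protoPayload.status.message =~ "{keyword}"'
    )


if __name__ == "__main__":
    for event in STATUS_KEYWORDS:
        print(f"# {event}")
        print(upcoming_maintenance_filter(event))
```

You can paste each printed query into the Query pane and continue with the remaining steps of the procedure.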

To view examples of maintenance event notifications in the Logs Explorer, see Examples of maintenance notifications in the Compute Engine documentation.

Manage maintenance across VMs

You can view and control maintenance for your VMs by doing one or more of the following:

To check the state and scheduled time of upcoming maintenance for your VMs, view the maintenance state of VMs.
To immediately start maintenance on your VMs, rather than waiting for their scheduled maintenance time, manually start maintenance on VMs.

View the maintenance state of VMs

You can view the state and scheduled time of upcoming maintenance for your VMs by checking the value of the upcomingMaintenance field in the instance's metadata. If a VM doesn't contain the upcomingMaintenance field, then no host maintenance event is scheduled for the VM. For more information about the fields in upcomingMaintenance, see Maintenance status definitions in the Compute Engine documentation.

You can view the maintenance state for multiple VMs simultaneously or for individual VMs. For multiple VMs, use the Google Cloud console or REST API. For individual VMs, select any of the following options:

Console

In the Google Cloud console, go to the VM instances page.

Go to VM instances
In the Maintenance status column, Compute Engine displays the maintenance state of your VMs. If you don't see this column in the VM instances table, then click Column display options, select the Maintenance status checkbox, and then click OK.

gcloud

To view the maintenance state of a VM, use the gcloud compute instances describe command with the --flatten=resourceStatus.upcomingMaintenance flag:

gcloud compute instances describe VM_NAME \
    --flatten=resourceStatus.upcomingMaintenance \
    --zone=ZONE

Replace the following:

VM_NAME: the VM name.
ZONE: the zone where the VM exists.

The output depends on whether a host maintenance event is scheduled:

If a host maintenance event is scheduled for your VM, then the output is similar to the following:

---
canReschedule: true
latestWindowStartTime: '2024-12-01T19:00:00Z'
machineType: 'a4-highgpu-8g'
maintenanceStatus: 'PENDING'
type: 'SCHEDULED'
windowEndTime: '2024-12-01T22:00:00Z'
windowStartTime: '2024-12-01T19:00:00Z'

If a host maintenance event isn't scheduled for your VM, then the output is similar to the following:
```
---
null
```
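
To plan around a scheduled event, you might want to turn the timestamps in this output into durations. The following Python sketch assumes that you already parsed the command output into a dictionary with the field names shown in the preceding example. The function names and the shape of the returned summary are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the example output in this section.
SAMPLE = {
    "maintenanceStatus": "PENDING",
    "windowStartTime": "2024-12-01T19:00:00Z",
    "windowEndTime": "2024-12-01T22:00:00Z",
}


def parse_rfc3339(value: str) -> datetime:
    """Parse a timestamp such as '2024-12-01T19:00:00Z' into an aware datetime."""
    # datetime.fromisoformat() accepts a trailing 'Z' only in Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def window_summary(upcoming: dict, now: datetime) -> dict:
    """Summarize how far away the maintenance window is and how long it lasts."""
    start = parse_rfc3339(upcoming["windowStartTime"])
    end = parse_rfc3339(upcoming["windowEndTime"])
    return {
        "status": upcoming["maintenanceStatus"],
        # Clamp to zero if the window has already opened.
        "time_until_start": max(start - now, timedelta(0)),
        "window_length": end - start,
    }


if __name__ == "__main__":
    print(window_summary(SAMPLE, datetime.now(timezone.utc)))
```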

REST

To view the maintenance state of your VMs, make one of the following GET requests. When you make a request, you must include the fields query parameter to show only the name, machine type, and upcoming maintenance for each VM. You must also include the filter query parameter to list only the VMs that use a specific machine type.

To view VMs across all zones: instances.aggregatedList method.

GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/aggregated/instances?fields=items.name,items.machineType,items.upcomingMaintenance&filter=machineType%3AMACHINE_TYPE

To view VMs in a specific zone: instances.list method.

GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances?fields=items.name,items.machineType,items.upcomingMaintenance&filter=machineType%3AMACHINE_TYPE

Replace the following:

PROJECT_ID: the ID of the project where you created VMs.
ZONE: the zone where the VMs exist.
MACHINE_TYPE: the machine type that you want to filter the VMs by.
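
If you build these request URLs in a script, the filter value must be URL-encoded, as the %3A in the preceding requests shows. The following Python sketch assembles either URL with the standard library. The function name is hypothetical, and the sketch only builds the URL; it doesn't send the request or handle authentication.

```python
from typing import Optional
from urllib.parse import urlencode

BASE = "https://compute.googleapis.com/compute/v1"
# Same fields value as in the documented requests.
FIELDS = "items.name,items.machineType,items.upcomingMaintenance"


def list_instances_url(
    project_id: str, machine_type: str, zone: Optional[str] = None
) -> str:
    """Build the instances.list URL, or the aggregatedList URL if zone is None."""
    if zone is None:
        path = f"/projects/{project_id}/aggregated/instances"
    else:
        path = f"/projects/{project_id}/zones/{zone}/instances"
    # safe="," keeps the commas in the fields list readable; the colon in
    # the filter expression is still percent-encoded as %3A.
    query = urlencode(
        {"fields": FIELDS, "filter": f"machineType:{machine_type}"}, safe=","
    )
    return f"{BASE}{path}?{query}"


if __name__ == "__main__":
    print(list_instances_url("example-project", "a3-ultragpu-8g", "europe-west1-b"))
```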

If a host maintenance event is scheduled for a VM, then the VM contains the upcomingMaintenance field:

{
  "items": [
    {
      "name": "vm-01",
      "machineType": "https://www.googleapis.com/compute/v1/projects/example-project/zones/europe-west1-b/machineTypes/a3-ultragpu-8g",
      "resourceStatus": {
        "upcomingMaintenance": {
          "canReschedule": true,
          "latestWindowStartTime": "2024-12-01T19:00:00Z",
          "machineType": "a3-ultragpu-8g",
          "maintenanceStatus": "PENDING",
          "type": "SCHEDULED",
          "windowEndTime": "2024-12-01T22:00:00Z",
          "windowStartTime": "2024-12-01T19:00:00Z"
        }
      }
    },
    ...
  ]
}

Optionally, to further narrow down a list of VMs, set the filter query parameter to a different filter expression.
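
After you receive a response, you might want to list only the VMs that have maintenance scheduled. The following Python sketch assumes a response that is already decoded into a dictionary and shaped like the preceding example, where items is a list of VMs. The function name is hypothetical, and the second VM in the sample data is invented to show the case where no maintenance is scheduled.

```python
from typing import List, Tuple

# First entry mirrors the example response; "vm-02" is illustrative only.
SAMPLE_RESPONSE = {
    "items": [
        {
            "name": "vm-01",
            "resourceStatus": {
                "upcomingMaintenance": {
                    "maintenanceStatus": "PENDING",
                    "windowStartTime": "2024-12-01T19:00:00Z",
                }
            },
        },
        {"name": "vm-02"},
    ]
}


def vms_with_upcoming_maintenance(response: dict) -> List[Tuple[str, str, str]]:
    """Return (name, maintenanceStatus, windowStartTime) for VMs with maintenance."""
    results = []
    for vm in response.get("items", []):
        upcoming = vm.get("resourceStatus", {}).get("upcomingMaintenance")
        if upcoming is None:
            # No upcomingMaintenance field: no host maintenance is scheduled.
            continue
        results.append(
            (vm["name"], upcoming["maintenanceStatus"], upcoming["windowStartTime"])
        )
    return results


if __name__ == "__main__":
    for row in vms_with_upcoming_maintenance(SAMPLE_RESPONSE):
        print(row)
```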

Metadata server

To view the maintenance state of a VM, do the following:

If you haven't already, then connect to your Linux or Windows VM.

Query the metadata server as follows:

curl "http://metadata.google.internal/computeMetadata/v1/instance/upcoming-maintenance?alt=json" -H "Metadata-Flavor: Google"

If a host maintenance event is scheduled for your VM, then the output is similar to the following:

"Upcoming maintenance": {
  "can_reschedule": "true",
  "latest_window_start_time": "2024-12-01T19:00:01Z",
  "machineType": "a4-highgpu-8g",
  "maintenance_status": "PENDING",
  "type": "SCHEDULED",
  "window_end_time": "2024-12-01T21:00:01Z",
  "window_start_time": "2024-12-01T19:00:01Z"
}

If a host maintenance event isn't scheduled, then the output is similar to the following:

{ }
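
If a workload running on the VM needs to check for maintenance itself, it can send the same request from code. The following Python sketch uses only the standard library and the URL and header from the preceding curl command. The function names and the timeout value are assumptions, and the fetch function only works from inside a Compute Engine VM.

```python
import urllib.request

# Same endpoint and query parameter as the curl command in this section.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/upcoming-maintenance?alt=json"
)


def build_metadata_request() -> urllib.request.Request:
    """Build the GET request, including the required Metadata-Flavor header."""
    return urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )


def fetch_upcoming_maintenance(timeout: float = 5.0) -> str:
    """Return the raw response body as text. Call this only from inside a VM."""
    with urllib.request.urlopen(build_metadata_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

The sketch returns the body as text rather than parsing it, because the exact response format might differ from the example output.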

Manually start maintenance on VMs

You can manually start maintenance for your VMs instead of waiting for the scheduled time.

Depending on the maintenance state of a VM, the following occurs:

Maintenance state	Description	What you see
Scheduled	Compute Engine has scheduled maintenance for the VM. You can manually start maintenance before the scheduled time.	In the Google Cloud console, the maintenance state shows as Ready to run - will run on `DATE`. In the gcloud CLI or REST API, Compute Engine sets the `maintenanceStatus` field to `PENDING`.
In progress	Maintenance is underway. You can't reschedule it.	In the Google Cloud console, the maintenance state shows as Running. In the gcloud CLI or REST API, Compute Engine sets the `maintenanceStatus` field to `ONGOING`.
Complete	Maintenance is finished. Compute Engine has removed all maintenance notifications from the VM.	In the Google Cloud console, the maintenance state shows as Up-to-date. In the gcloud CLI or REST API, Compute Engine sets the `maintenanceStatus` field to `COMPLETE`.
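
In a script, you can encode the preceding table as a lookup. The following Python sketch maps each maintenanceStatus value to the label that the Google Cloud console shows. The helper that decides whether to start maintenance manually is an assumption based on the table, which mentions a manual start only for the scheduled state.

```python
# maintenanceStatus value -> label shown in the Google Cloud console,
# taken from the table in this section.
CONSOLE_LABELS = {
    "PENDING": "Ready to run",
    "ONGOING": "Running",
    "COMPLETE": "Up-to-date",
}


def can_start_manually(maintenance_status: str) -> bool:
    """Assumption: only scheduled (PENDING) maintenance can be started early."""
    return maintenance_status == "PENDING"


if __name__ == "__main__":
    for status, label in CONSOLE_LABELS.items():
        print(f"{status}: {label}, manual start: {can_start_manually(status)}")
```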

You can manually start maintenance for multiple VMs simultaneously or for individual VMs. For multiple VMs, use the Google Cloud console or, for VMs located in the same zone, the gcloud CLI. For individual VMs, select any of the following options:

Console

In the Google Cloud console, go to the VM instances page.

Go to VM instances
Select the rows for the VMs where you want to start maintenance.
Click Run maintenance.
To confirm, click Run maintenance.

gcloud

To manually start maintenance for one or more VMs within the same zone, use the gcloud compute instances perform-maintenance command:

gcloud compute instances perform-maintenance VM_NAMES \
    --zone=ZONE

Replace the following:

VM_NAMES: a list of VM names separated by spaces; for example, vm-01 vm-02 vm-03.
ZONE: the zone where the VMs exist.
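
If you start maintenance from an automation script, you can assemble the same command as an argument list. The following Python sketch assumes that the gcloud CLI is installed and authenticated on the machine that runs it. The function names are hypothetical, and only the second function actually runs the command.

```python
import subprocess
from typing import List, Sequence


def perform_maintenance_command(vm_names: Sequence[str], zone: str) -> List[str]:
    """Build the gcloud argument list for one or more VMs in the same zone."""
    if not vm_names:
        raise ValueError("at least one VM name is required")
    return [
        "gcloud", "compute", "instances", "perform-maintenance",
        *vm_names,
        f"--zone={zone}",
    ]


def run_perform_maintenance(vm_names: Sequence[str], zone: str) -> None:
    """Run the command; raises CalledProcessError if gcloud exits with an error."""
    subprocess.run(perform_maintenance_command(vm_names, zone), check=True)


if __name__ == "__main__":
    # Prints the command without running it.
    print(" ".join(perform_maintenance_command(["vm-01", "vm-02"], "ZONE")))
```

Passing the arguments as a list, rather than as a single shell string, avoids quoting problems with VM names.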

REST

To manually start maintenance for a VM, make a POST request to the instances.performMaintenance method:

POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/performMaintenance

Replace the following:

PROJECT_ID: the ID of the project where you created the VM.
ZONE: the zone where the VM exists.
VM_NAME: the VM name.
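
To send this request from code, you need the URL and an access token. The following Python sketch builds the POST request with the standard library. It assumes that you obtained an OAuth 2.0 access token separately, for example from the gcloud auth print-access-token command. The function name is hypothetical, and the empty request body with a JSON content type is an assumption rather than a documented requirement.

```python
import urllib.request


def perform_maintenance_request(
    project_id: str, zone: str, vm_name: str, access_token: str
) -> urllib.request.Request:
    """Build the POST request for the instances.performMaintenance method."""
    url = (
        "https://compute.googleapis.com/compute/v1"
        f"/projects/{project_id}/zones/{zone}/instances/{vm_name}/performMaintenance"
    )
    return urllib.request.Request(
        url,
        data=b"",  # assumption: the method takes no request body
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def start_maintenance(request: urllib.request.Request) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return resp.read().decode("utf-8")
```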

What's next

Query the metadata server for maintenance event notices
