This document explains how to monitor, plan for, and perform pending maintenance on the virtual machine (VM) instances that are running on Hypercompute Cluster. To learn more about host maintenance events, see About host events in the Compute Engine documentation.
By proactively managing upcoming maintenance in your VMs, you can minimize disruptions to your workloads and maintain optimal performance.
Before you begin
Set up your environment based on how you plan to use the samples on this page:
gcloud
In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
REST
To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.
Install the Google Cloud CLI, then initialize it by running the following command:
gcloud init
For more information, see Authenticate for using REST in the Google Cloud authentication documentation.
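The REST samples on this page send an OAuth 2.0 access token in the Authorization header. The following minimal sketch assumes Python 3 and a gcloud CLI that is installed and initialized as described above. It obtains a token with the gcloud auth print-access-token command and builds the request headers from it. The function names are hypothetical.

```python
import subprocess


def get_access_token() -> str:
    """Return an access token for the active gcloud account.

    Assumes that the gcloud CLI is installed and `gcloud init` has been run.
    """
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if gcloud fails
    )
    return result.stdout.strip()


def auth_headers(token: str) -> dict:
    """Build the HTTP headers for a Compute Engine REST request."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

For example, `auth_headers(get_access_token())` returns headers that you can attach to the GET and POST requests later on this page.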
Required roles
To get the permissions that you need to manage host maintenance events, ask your administrator to grant you the following IAM roles:
- Compute Admin (roles/compute.admin) on the project
- For read-only access to System Event audit logs: Logs Viewer (roles/logging.viewer) on the project
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to manage host maintenance events. The following section lists the exact permissions that are required:
Required permissions
The following permissions are required to manage host maintenance events:
- To view the details of a VM: compute.instances.get on the project
You might also be able to get these permissions with custom roles or other predefined roles.
Overview
To optimize the maintenance of your VMs and minimize disruptions to your workloads, complete the following steps:
Set up notification alerts. Create log-based alerts to receive notifications when maintenance is scheduled, started, or completed for your VMs. This helps you proactively plan your activities and avoid unexpected downtime.
For instructions, see Set up notification alerts in this document.
Manage maintenance across VMs. View and, optionally, manually start maintenance across your VMs. This helps you increase the resilience of your workload to host errors, prevent downtime, and ensure that your applications remain available.
For instructions, see Manage maintenance across VMs in this document.
Set up notification alerts
You can receive notifications when maintenance for your VMs is scheduled, started, ongoing, or completed by creating log-based alerting policies.
To create an alert for the maintenance events of your VMs, complete the following procedure. If you want to create multiple alerts, then repeat this procedure for each alert that you want to create.
In the Google Cloud console, go to the Logs Explorer page.
If you use the search bar to find this page, then select the result whose subheading is Logging.
Click the Show query toggle to the on position.
In the Query pane, build one of the following queries. These queries filter log entries to identify specific maintenance events. If you want to use multiple queries, then repeat this procedure to create a unique alert for each query.
To receive alerts when maintenance for a VM is scheduled:
protoPayload.methodName="compute.instances.upcomingMaintenance" severity>=DEFAULT protoPayload.status.message =~ "scheduled"
To receive alerts when the maintenance window for a VM has opened:
protoPayload.methodName="compute.instances.upcomingMaintenance" severity>=DEFAULT protoPayload.status.message =~ "ongoing"
To receive alerts when maintenance for a VM has started:
protoPayload.methodName="compute.instances.terminateOnHostMaintenance" severity>=DEFAULT
To receive alerts when maintenance for a VM has completed:
protoPayload.methodName="compute.instances.upcomingMaintenance" severity>=DEFAULT protoPayload.status.message =~ "completed"
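Three of the queries above differ only in the status message that they match. The following Python sketch is illustrative and is not part of any Google Cloud library. It generates those three queries from a keyword, so you can create the matching alerts from a script. The event labels in STATUS_KEYWORDS are hypothetical. The query for started maintenance uses a different method name, so the sketch doesn't cover it.

```python
# Hypothetical event labels mapped to the status-message keyword that each
# query matches. The "maintenance started" query uses a different
# methodName, so this helper doesn't cover it.
STATUS_KEYWORDS = {
    "scheduled": "scheduled",
    "window_opened": "ongoing",
    "completed": "completed",
}


def upcoming_maintenance_query(event: str) -> str:
    """Return the Logging query for an upcomingMaintenance status event."""
    keyword = STATUS_KEYWORDS[event]  # KeyError for unsupported events
    return (
        'protoPayload.methodName="compute.instances.upcomingMaintenance" '
        "severity>=DEFAULT "
        f'protoPayload.status.message =~ "{keyword}"'
    )


if __name__ == "__main__":
    for event in STATUS_KEYWORDS:
        print(f"{event}: {upcoming_maintenance_query(event)}")
```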
To validate the query, click Run query.
In the Query results toolbar, click the Actions list, and then select Create log alert. The Create logs-based alert policy pane appears.
In the Alert details section, do the following:
In the Alert Policy Name field, enter a name for the policy.
In the Policy severity level list, select Warning (or a higher severity).
Click Next.
In the Choose logs to include in the alert section, click Next.
In the Set notification frequency and autoclose duration section, specify the following:
In the Time between notifications list, select how often you want to be notified.
In the Incident autoclose duration list, select how long to wait before the incident is automatically closed and notifications stop.
Click Next.
In the Who should be notified? section, select one or more notification channels to receive the notifications.
Click Save.
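Log-based alerts are Cloud Monitoring alerting policies with a log-match condition. If you prefer to create them programmatically, the following sketch builds a request body for the Cloud Monitoring API projects.alertPolicies.create method. It only builds the body and doesn't send it. The function name, the 5-minute notification interval, and the 30-minute autoclose duration are illustrative assumptions. Each comment names the console setting that the field corresponds to.

```python
import json
from typing import Iterable


def build_log_alert_policy(
    display_name: str,
    query: str,
    channels: Iterable[str],
    rate_limit_seconds: int = 300,
    autoclose_seconds: int = 1800,
) -> dict:
    """Return an AlertPolicy request body with a single log-match condition."""
    return {
        "displayName": display_name,
        "severity": "WARNING",  # Policy severity level
        "combiner": "OR",
        "conditions": [
            {
                "displayName": f"{display_name} (log match)",
                # The Logging query that you validated in Logs Explorer.
                "conditionMatchedLog": {"filter": query},
            }
        ],
        "alertStrategy": {
            # Time between notifications.
            "notificationRateLimit": {"period": f"{rate_limit_seconds}s"},
            # Incident autoclose duration.
            "autoClose": f"{autoclose_seconds}s",
        },
        # Full channel names: projects/PROJECT_ID/notificationChannels/CHANNEL_ID
        "notificationChannels": list(channels),
    }


if __name__ == "__main__":
    policy = build_log_alert_policy(
        "vm-maintenance-scheduled",
        'protoPayload.methodName="compute.instances.upcomingMaintenance" '
        'severity>=DEFAULT protoPayload.status.message =~ "scheduled"',
        ["projects/PROJECT_ID/notificationChannels/CHANNEL_ID"],
    )
    print(json.dumps(policy, indent=2))
```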
To view examples of maintenance event notifications in the Logs Explorer, see Examples of maintenance notifications in the Compute Engine documentation.
Manage maintenance across VMs
You can view and control maintenance for your VMs by doing one or more of the following:
To check the state and scheduled time of upcoming maintenance for your VMs, view the maintenance state of VMs.
To immediately start maintenance on your VMs, rather than waiting for their scheduled maintenance time, manually start maintenance on VMs.
View the maintenance state of VMs
You can view the state and scheduled time of upcoming maintenance for your VMs by checking the value of their upcomingMaintenance field. If a VM doesn't contain the upcomingMaintenance field, then no host maintenance event is scheduled for the VM. For more information about the fields in upcomingMaintenance, see Maintenance status definitions in the Compute Engine documentation.
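A missing upcomingMaintenance field means that nothing is scheduled, so a script can treat an absent field as "no maintenance." The following sketch assumes that you already have an instance resource as a Python dict, for example decoded from the JSON output of gcloud or the REST API. It summarizes the maintenance window and computes how long until the window opens. The function names and summary keys are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2024-12-01T19:00:00Z'."""
    # datetime.fromisoformat() accepts a trailing 'Z' only on Python 3.11+,
    # so replace it with an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def maintenance_summary(instance: dict, now: Optional[datetime] = None) -> Optional[dict]:
    """Summarize a VM's upcomingMaintenance field.

    Returns None when the field is absent, which means that no host
    maintenance event is scheduled for the VM.
    """
    upcoming = instance.get("upcomingMaintenance")
    if upcoming is None:
        return None
    now = now or datetime.now(timezone.utc)
    start_raw = upcoming.get("windowStartTime")
    start = parse_timestamp(start_raw) if start_raw else None
    return {
        "name": instance.get("name"),
        "status": upcoming.get("maintenanceStatus"),
        "type": upcoming.get("type"),
        "can_reschedule": bool(upcoming.get("canReschedule", False)),
        "window_start": start,
        # Negative once the maintenance window has opened.
        "starts_in": (start - now) if start else None,
    }
```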
You can view the maintenance state for multiple VMs simultaneously or for individual VMs. For multiple VMs, use the Google Cloud console or the REST API. For individual VMs, select any of the following options:
Console
In the Google Cloud console, go to the VM instances page.
In the Maintenance status column, Compute Engine displays the maintenance state of your VMs.
gcloud
To view the maintenance state of a VM, use the gcloud beta compute instances describe command with the --flatten=upcomingMaintenance flag:
gcloud beta compute instances describe VM_NAME \
--flatten=upcomingMaintenance \
--zone=ZONE
Replace the following:
VM_NAME: the VM name.
ZONE: the zone where the VM is located.
If a host maintenance event is scheduled, then the output is similar to the following:
---
canReschedule: true
latestWindowStartTime: '2024-12-01T19:00:00Z'
maintenanceStatus: 'PENDING'
type: 'SCHEDULED'
windowEndTime: '2024-12-01T22:00:00Z'
windowStartTime: '2024-12-01T19:00:00Z'
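The --flatten output is easy to read but awkward to parse in a script. The following minimal sketch assumes Python 3.7 or later and the gcloud CLI on your PATH. It requests the full instance as JSON by using the global --format=json flag and then reads upcomingMaintenance from the result. The helper names are hypothetical.

```python
import json
import subprocess
from typing import Optional


def parse_upcoming_maintenance(describe_json: str) -> Optional[dict]:
    """Return upcomingMaintenance from `describe --format=json` output, or None."""
    instance = json.loads(describe_json)
    # A missing field means that no host maintenance event is scheduled.
    return instance.get("upcomingMaintenance")


def describe_upcoming_maintenance(vm_name: str, zone: str) -> Optional[dict]:
    """Run gcloud for one VM and return its upcomingMaintenance field, if any."""
    result = subprocess.run(
        [
            "gcloud", "beta", "compute", "instances", "describe", vm_name,
            f"--zone={zone}",
            "--format=json",  # full instance resource as JSON
        ],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return parse_upcoming_maintenance(result.stdout)
```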
REST
To view the maintenance state of your VMs, make one of the following GET requests, using URL-encoded values for the filter query parameter. The sample requests filter for VMs that use the a3-ultragpu-8g machine type.
To view VMs across all zones, use the beta instances.aggregatedList method:
GET https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/aggregated/instances?fields=items.name,items.machineType,items.upcomingMaintenance&filter=machineType%20eq%20%2E%2Aa3-ultragpu-8g
To view VMs in a specific zone, use the beta instances.list method:
GET https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/instances?fields=items.name,items.machineType,items.upcomingMaintenance&filter=machineType%20eq%20%2E%2Aa3-ultragpu-8g
Replace the following:
PROJECT_ID: the ID of the project where the VMs are located.
ZONE: the zone where the VMs are located.
If a host maintenance event is scheduled for your VMs, then the output is similar to the following:
{
"items": [
{
"name": "vm-01",
"machineType": "https://www.googleapis.com/compute/beta/projects/example-project/zones/europe-west1-b/machineTypes/a3-ultragpu-8g",
"upcomingMaintenance": {
"canReschedule": true,
"latestWindowStartTime": "2024-12-01T19:00:00Z",
"maintenanceStatus": "PENDING",
"type": "SCHEDULED",
"windowEndTime": "2024-12-01T22:00:00Z",
"windowStartTime": "2024-12-01T19:00:00Z"
}
},
{
"name": "vm-02",
"machineType": "https://www.googleapis.com/compute/beta/projects/example-project/zones/europe-west1-b/machineTypes/a3-ultragpu-8g",
"upcomingMaintenance": {
"canReschedule": true,
"latestWindowStartTime": "2024-12-01T19:00:00Z",
"maintenanceStatus": "PENDING",
"type": "SCHEDULED",
"windowEndTime": "2024-12-01T22:00:00Z",
"windowStartTime": "2024-12-01T19:00:00Z"
}
}
]
}
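The two methods return VMs in different shapes. In a full response, instances.list returns items as a list of instances. instances.aggregatedList instead returns items as a map from a scope, such as zones/europe-west1-b, to an object that contains an instances list. The following minimal sketch assumes that you have already decoded the JSON response into a dict. It handles both shapes and lists the VMs whose maintenance status is PENDING, earliest window first. The helper names are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple


def iter_instances(response: dict) -> Iterator[dict]:
    """Yield instances from an instances.list or instances.aggregatedList response."""
    items = response.get("items", [])
    if isinstance(items, list):
        # instances.list: items is a list of instances.
        yield from items
    else:
        # instances.aggregatedList: items maps each scope to an object that
        # holds an "instances" list (or only a warning if the scope is empty).
        for scoped in items.values():
            yield from scoped.get("instances", [])


def pending_maintenance(response: dict) -> List[Tuple[str, Optional[str]]]:
    """Return (name, windowStartTime) pairs for VMs with PENDING maintenance."""
    pending = []
    for instance in iter_instances(response):
        upcoming = instance.get("upcomingMaintenance")
        if upcoming and upcoming.get("maintenanceStatus") == "PENDING":
            pending.append((instance["name"], upcoming.get("windowStartTime")))
    # Timestamps in the same RFC 3339 UTC format sort correctly as strings.
    return sorted(pending, key=lambda pair: pair[1] or "")
```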
Optionally, to further narrow down the list of VMs, set the filter query parameter to a different filter expression.
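The filter value must be URL-encoded. To combine several expressions, wrap each one in parentheses. Expressions are joined with AND by default, and you can also write AND or OR explicitly. The following minimal sketch uses only Python's standard library and assumes that you obtain an access token separately, for example with gcloud auth print-access-token. It builds an instances.list URL and sends the GET request. The function names are hypothetical, and the status expression in the usage note is only an illustration. Python's quote() leaves "." unencoded, so the URL contains "." where the sample requests use %2E. Both decode to the same filter.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

BASE = "https://compute.googleapis.com/compute/beta"
FIELDS = "items.name,items.machineType,items.upcomingMaintenance"


def combine_filters(*expressions: str, operator: str = "AND") -> str:
    """Wrap each filter expression in parentheses and join them."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(f"({expr})" for expr in expressions)


def list_instances_url(project_id: str, zone: str, filter_expr: str) -> str:
    """Build a beta instances.list URL with URL-encoded query parameters."""
    query = urlencode(
        {"fields": FIELDS, "filter": filter_expr},
        safe=",",         # keep the commas in the field list as-is
        quote_via=quote,  # encode spaces as %20 instead of '+'
    )
    return f"{BASE}/projects/{project_id}/zones/{zone}/instances?{query}"


def get_json(url: str, token: str) -> dict:
    """Send an authenticated GET request and decode the JSON response."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `get_json(list_instances_url("PROJECT_ID", "ZONE", combine_filters("machineType eq .*a3-ultragpu-8g", "status eq RUNNING")), token)` lists running a3-ultragpu-8g VMs in one zone.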
Metadata server
To view the maintenance state of a VM, do the following:
If you haven't already, then connect to your Linux or Windows VM.
Query the metadata server as follows:
curl http://metadata.google.internal/computeMetadata/beta/instance/upcoming-maintenance?alt=json -H "Metadata-Flavor: Google"
If a host maintenance event is scheduled for the VM, then the output is similar to the following:
"Upcoming maintenance": { "can_reschedule": "true", "latest_window_start_time": "2024-12-01T19:00:01Z", "maintenance_status": "PENDING", "type": "SCHEDULED", "window_end_time": "2024-12-01T21:00:01Z", "window_start_time": "2024-12-01T19:00:01Z" }
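To check for maintenance from a script that runs inside the VM, the following sketch sends the same request as the curl command above by using only Python's standard library. It assumes, without confirmation from this page, that the metadata server responds with HTTP 404 when no maintenance entry exists. It also includes a small helper, because the sample output encodes booleans such as can_reschedule as strings. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Endpoint copied from the curl command above.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/beta/"
    "instance/upcoming-maintenance?alt=json"
)


def fetch_upcoming_maintenance(timeout: float = 5.0):
    """Query the metadata server from inside the VM; return parsed JSON or None."""
    request = urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Assumption: 404 means that no maintenance entry is published.
            return None
        raise


def is_true(value) -> bool:
    """Interpret values such as "true" or True from the metadata output."""
    return str(value).strip().lower() == "true"
```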
Manually start maintenance on VMs
You can manually start maintenance for your VMs instead of waiting for the scheduled time.
Depending on the maintenance state of a VM, the following occurs:
Maintenance state | Description |
---|---|
Scheduled | Compute Engine has scheduled maintenance for the VM. You can manually start maintenance before the scheduled time. |
In progress | Maintenance is underway. You can't reschedule it. |
Complete | Maintenance is finished. Compute Engine has removed all maintenance notifications from the VM. |
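Before you start maintenance from a script, you can check the VM's upcomingMaintenance field. The following sketch encodes one reading of the preceding table: you can start maintenance manually only while it is scheduled. It makes three assumptions. A maintenanceStatus of PENDING corresponds to the Scheduled state, ONGOING corresponds to In progress, and canReschedule indicates whether you can trigger the maintenance yourself. Verify these mappings against the Maintenance status definitions before you rely on them. The function name is hypothetical.

```python
from typing import Optional


def can_start_maintenance_now(upcoming: Optional[dict]) -> bool:
    """Return True if manual maintenance appears possible for this VM.

    `upcoming` is the VM's upcomingMaintenance field, or None if absent.
    """
    if not upcoming:
        return False  # no maintenance is scheduled
    # Assumption: PENDING is the "Scheduled" state, and canReschedule says
    # whether the maintenance can be customer triggered.
    return upcoming.get("maintenanceStatus") == "PENDING" and bool(
        upcoming.get("canReschedule", False)
    )
```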
You can manually start maintenance for multiple VMs simultaneously or for individual VMs. For multiple VMs, use the Google Cloud console or, for VMs located in the same zone, the gcloud CLI. For individual VMs, select any of the following options:
Console
In the Google Cloud console, go to the VM instances page.
Select the rows for the VMs where you want to start maintenance.
Click Run maintenance.
To confirm, click Run maintenance.
gcloud
To manually start maintenance for one or more VMs within the same zone, use the gcloud beta compute instances perform-maintenance command:
gcloud beta compute instances perform-maintenance VM_NAMES \
--zone=ZONE
Replace the following:
VM_NAMES: a list of VM names separated by spaces; for example, vm-01 vm-02 vm-03.
ZONE: the zone where the VMs are located.
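A single perform-maintenance command accepts multiple VMs only when they're in the same zone, so VMs spread across zones need one command per zone. The following minimal sketch assumes Python 3.8 or later, which provides shlex.join. It groups (name, zone) pairs by zone and builds the corresponding commands. The function name is hypothetical. The sketch only prints the commands for review and doesn't run them.

```python
import shlex
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def perform_maintenance_commands(vms: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Return one gcloud perform-maintenance command per zone.

    `vms` is an iterable of (vm_name, zone) pairs.
    """
    by_zone: Dict[str, List[str]] = defaultdict(list)
    for name, zone in vms:
        by_zone[zone].append(name)  # keep the input order within each zone
    return [
        ["gcloud", "beta", "compute", "instances", "perform-maintenance",
         *names, f"--zone={zone}"]
        for zone, names in sorted(by_zone.items())
    ]


if __name__ == "__main__":
    vms = [("vm-01", "europe-west1-b"), ("vm-02", "europe-west1-b"),
           ("vm-03", "us-central1-a")]
    for command in perform_maintenance_commands(vms):
        # Print for review; pass the list to subprocess.run(command, check=True)
        # to run it instead.
        print(shlex.join(command))
```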
REST
To manually start maintenance for a VM, make a POST request to the beta instances.performMaintenance method:
POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/performMaintenance
Replace the following:
PROJECT_ID: the ID of the project where the VM is located.
ZONE: the zone where the VM is located.
VM_NAME: the VM name.
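The following sketch sends the same POST request from Python by using only the standard library. It assumes that you pass in an access token, for example one obtained with gcloud auth print-access-token. It also assumes that the response is a Compute Engine Operation resource, which you can poll, for example with the zoneOperations.get method, to track progress. The function names are hypothetical.

```python
import json
import urllib.request

BASE = "https://compute.googleapis.com/compute/beta"


def perform_maintenance_url(project_id: str, zone: str, vm_name: str) -> str:
    """Build the beta instances.performMaintenance URL for one VM."""
    return f"{BASE}/projects/{project_id}/zones/{zone}/instances/{vm_name}/performMaintenance"


def perform_maintenance(project_id: str, zone: str, vm_name: str, token: str) -> dict:
    """Send the POST request and return the decoded response (an Operation)."""
    request = urllib.request.Request(
        perform_maintenance_url(project_id, zone, vm_name),
        data=b"",  # the method takes no request body
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```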