Manage host events

This document explains how to monitor, plan for, and perform pending maintenance on the virtual machine (VM) instances that are running on Hypercompute Cluster. To learn more about host maintenance events, see About host events in the Compute Engine documentation.

By proactively managing upcoming maintenance in your VMs, you can minimize disruptions to your workloads and maintain optimal performance.

Before you begin

  • Select the tab for how you plan to use the samples on this page:

    gcloud

    In the Google Cloud console, activate Cloud Shell.

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

    REST

    To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

      Install the Google Cloud CLI, then initialize it by running the following command:

      gcloud init

    For more information, see Authenticate for using REST in the Google Cloud authentication documentation.

Required roles

To get the permissions that you need to manage host maintenance events, ask your administrator to grant you the following IAM roles:

  • Compute Admin (roles/compute.admin) on the project
  • For read-only access to System Event audit logs: Logs Viewer (roles/logging.viewer) on the project

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to manage host maintenance events. The exact permissions that are required are listed in the following Required permissions section:

Required permissions

The following permissions are required to manage host maintenance events:

  • To view the details of a VM: compute.instances.get on the project

You might also be able to get these permissions with custom roles or other predefined roles.

Overview

To optimize the maintenance of your VMs and minimize disruptions to your workloads, complete the following steps:

  1. Set up notification alerts. Create log-based alerts to receive notifications when maintenance is scheduled, started, or completed for your VMs. This helps you proactively plan your activities and avoid unexpected downtime.

    For instructions, see Set up notification alerts in this document.

  2. Manage maintenance across VMs. View and, optionally, manually start maintenance across your VMs. This helps you increase the resilience of your workload to host errors, prevent downtime, and ensure that your applications remain available.

    For instructions, see Manage maintenance across VMs in this document.

Set up notification alerts

By creating log-based alerting policies, you can receive notifications when maintenance for your VMs is scheduled, ongoing, started, or completed.

To create an alert for the maintenance events of your VMs, complete the following procedure. If you want to create multiple alerts, then repeat this procedure for each alert that you want to create.

  1. In the Google Cloud console, go to the Logs Explorer page.

    If you use the search bar to find this page, then select the result whose subheading is Logging.

  2. Click the Show query toggle to the on position.

  3. In the Query pane, build one of the following queries. These queries filter log entries to identify specific maintenance events. If you want to use multiple queries, then repeat this procedure to create a separate alert for each query.

    • To receive alerts when maintenance for a VM is scheduled:

      protoPayload.methodName="compute.instances.upcomingMaintenance" severity>=DEFAULT
      protoPayload.status.message =~ "scheduled"
      
    • To receive alerts when the maintenance window for a VM has opened:

      protoPayload.methodName="compute.instances.upcomingMaintenance" severity>=DEFAULT
      protoPayload.status.message =~ "ongoing"
      
    • To receive alerts when maintenance for a VM has started:

      protoPayload.methodName="compute.instances.terminateOnHostMaintenance" severity>=DEFAULT
      
    • To receive alerts when maintenance for a VM has completed:

      protoPayload.methodName="compute.instances.upcomingMaintenance" severity>=DEFAULT
      protoPayload.status.message =~ "completed"
      
  4. To validate the query, click Run query.

  5. In the Query results toolbar, click the Actions list, and then select Create log alert.

    The Create logs-based alert policy pane appears.

  6. In the Alert details section, do the following:

    1. In the Alert Policy Name field, enter a name for the policy.

    2. In the Policy severity level list, select Warning (or a higher severity).

    3. Click Next.

  7. In the Choose logs to include in the alert section, click Next.

  8. In the Set notification frequency and autoclose duration section, specify the following:

    1. In the Time between notifications list, select how often you want to be notified.

    2. In the Incident autoclose duration list, select the duration after which Cloud Logging stops sending notifications and automatically closes the incident.

    3. Click Next.

  9. In the Who should be notified? section, specify a notification channel for Logging to send notifications to.

  10. Click Save.

To view examples of maintenance event notifications in the Logs Explorer, see Examples of maintenance notifications in the Compute Engine documentation.
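
The three queries in step 3 that use the upcomingMaintenance method differ only in the status message that they match, so they can be generated from one small helper. The following Python sketch is illustrative only: the function names are hypothetical, and the query text is copied from this document. It also builds a `gcloud logging read` command line so that a query can be tried from a terminal before an alert is created; the `--limit` value shown is an arbitrary choice.

```python
import shlex

# Method name and status keywords are copied from the queries in step 3.
UPCOMING_METHOD = "compute.instances.upcomingMaintenance"
STATUS_KEYWORDS = ("scheduled", "ongoing", "completed")


def upcoming_maintenance_filter(keyword: str) -> str:
    """Return the Logging query that matches one status keyword."""
    if keyword not in STATUS_KEYWORDS:
        raise ValueError(f"unknown status keyword: {keyword!r}")
    return (
        f'protoPayload.methodName="{UPCOMING_METHOD}" severity>=DEFAULT\n'
        f'protoPayload.status.message =~ "{keyword}"'
    )


def logging_read_command(keyword: str, limit: int = 5) -> str:
    """Return a shell-quoted `gcloud logging read` command for the query.

    The command is only built here, not executed.
    """
    args = [
        "gcloud", "logging", "read",
        upcoming_maintenance_filter(keyword),
        f"--limit={limit}",
    ]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    for kw in STATUS_KEYWORDS:
        print(upcoming_maintenance_filter(kw), end="\n\n")
```

The printed queries can be pasted into the Query pane as-is. Keeping the keywords in one tuple makes it harder for the alerts to drift apart if a query is edited later.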

Manage maintenance across VMs

You can view and control maintenance for your VMs by doing one or more of the following:

  • View the maintenance state of VMs
  • Manually start maintenance on VMs

View the maintenance state of VMs

You can view the state and scheduled time of upcoming maintenance for your VMs by checking the value of their upcomingMaintenance field. If a VM doesn't contain the upcomingMaintenance field, then no host maintenance event is scheduled for the VM. For more information about the fields in upcomingMaintenance, see Maintenance status definitions in the Compute Engine documentation.

You can view the maintenance state for multiple VMs simultaneously or for individual VMs. For multiple VMs, use the Google Cloud console or the REST API. For individual VMs, select any of the following options:

Console

  1. In the Google Cloud console, go to the VM instances page.

  2. In the Maintenance status column, Compute Engine displays the maintenance state of your VMs.

gcloud

To view the maintenance state of a VM, use the gcloud beta compute instances describe command with the --flatten=upcomingMaintenance flag:

gcloud beta compute instances describe VM_NAME \
    --flatten=upcomingMaintenance \
    --zone=ZONE

Replace the following:

  • VM_NAME: the VM name.

  • ZONE: the zone where the VM is located.

If a host maintenance event is scheduled, then the output is similar to the following:

---
canReschedule: true
latestWindowStartTime: '2024-12-01T19:00:00Z'
maintenanceStatus: 'PENDING'
type: 'SCHEDULED'
windowEndTime: '2024-12-01T22:00:00Z'
windowStartTime: '2024-12-01T19:00:00Z'
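
When planning around a maintenance window, it can help to turn the timestamps in this output into durations. The following Python sketch is a minimal illustration, assuming the fields have already been loaded into a dictionary and that every timestamp uses the exact `YYYY-MM-DDTHH:MM:SSZ` form shown in the sample. The function names are hypothetical, and the sample values are copied from the output above.

```python
from datetime import datetime, timedelta, timezone

# Sample values copied from the example output above.
SAMPLE = {
    "canReschedule": True,
    "latestWindowStartTime": "2024-12-01T19:00:00Z",
    "maintenanceStatus": "PENDING",
    "type": "SCHEDULED",
    "windowEndTime": "2024-12-01T22:00:00Z",
    "windowStartTime": "2024-12-01T19:00:00Z",
}


def parse_utc(timestamp: str) -> datetime:
    """Parse a timestamp such as 2024-12-01T19:00:00Z (assumed format)."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def window_length(upcoming: dict) -> timedelta:
    """Length of the maintenance window."""
    start = parse_utc(upcoming["windowStartTime"])
    end = parse_utc(upcoming["windowEndTime"])
    return end - start


def time_until_window(upcoming: dict, now: datetime) -> timedelta:
    """Time left before the window opens; negative if it is already open."""
    return parse_utc(upcoming["windowStartTime"]) - now


if __name__ == "__main__":
    print("window length:", window_length(SAMPLE))
    print("opens in:", time_until_window(SAMPLE, datetime.now(timezone.utc)))
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the calculation easy to check against known values.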

REST

To view the maintenance state of your VMs, make one of the following GET requests using URL-encoded values for the filter query parameter:

  • To view VMs across all zones, use the beta instances.aggregatedList method:

    GET https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/aggregated/instances?fields=items.name,items.machineType,items.upcomingMaintenance&filter=machineType%20eq%20%2E%2Aa3-ultragpu-8g
    
  • To view VMs in a specific zone, use the beta instances.list method:

    GET https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/instances?fields=items.name,items.machineType,items.upcomingMaintenance&filter=machineType%20eq%20%2E%2Aa3-ultragpu-8g
    

Replace the following:

  • PROJECT_ID: the ID of the project where the VMs are located.

  • ZONE: the zone where the VMs are located.

If a host maintenance event is scheduled for your VMs, then the output is similar to the following:

{
  "items": [
    {
      "name": "vm-01",
      "machineType": "https://www.googleapis.com/compute/beta/projects/example-project/zones/europe-west1-b/machineTypes/a3-ultragpu-8g",
      "upcomingMaintenance": {
        "canReschedule": true,
        "latestWindowStartTime": "2024-12-01T19:00:00Z",
        "maintenanceStatus": "PENDING",
        "type": "SCHEDULED",
        "windowEndTime": "2024-12-01T22:00:00Z",
        "windowStartTime": "2024-12-01T19:00:00Z"
      }
    },
    {
      "name": "vm-02",
      "machineType": "https://www.googleapis.com/compute/beta/projects/example-project/zones/europe-west1-b/machineTypes/a3-ultragpu-8g",
      "upcomingMaintenance": {
        "canReschedule": true,
        "latestWindowStartTime": "2024-12-01T19:00:00Z",
        "maintenanceStatus": "PENDING",
        "type": "SCHEDULED",
        "windowEndTime": "2024-12-01T22:00:00Z",
        "windowStartTime": "2024-12-01T19:00:00Z"
      }
    }
  ]
}

Optionally, to further narrow down a list of VMs, set the filter query parameter to a different filter expression.
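
Because the filter value must be URL-encoded, building the request URL by hand is error-prone. The following Python sketch shows one way to do it with the standard library; the function names are hypothetical, and the base URL, field list, and filter expression are copied from the requests above. Note that `urllib.parse.quote` leaves the dot unencoded (it is an unreserved character), which is equivalent to the `%2E` form shown in the requests.

```python
from urllib.parse import quote

BASE_URL = "https://compute.googleapis.com/compute/beta"
FIELDS = "items.name,items.machineType,items.upcomingMaintenance"


def _query(filter_expr: str) -> str:
    """Build the query string, percent-encoding the filter expression."""
    return f"fields={FIELDS}&filter={quote(filter_expr, safe='')}"


def aggregated_list_url(project_id: str, filter_expr: str) -> str:
    """URL for listing VMs across all zones."""
    return f"{BASE_URL}/projects/{project_id}/aggregated/instances?{_query(filter_expr)}"


def zonal_list_url(project_id: str, zone: str, filter_expr: str) -> str:
    """URL for listing VMs in one zone."""
    return f"{BASE_URL}/projects/{project_id}/zones/{zone}/instances?{_query(filter_expr)}"


if __name__ == "__main__":
    print(zonal_list_url("example-project", "europe-west1-b",
                         "machineType eq .*a3-ultragpu-8g"))
```

Only the filter value is encoded here; the field list is left as written in the requests above.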

Metadata server

To view the maintenance state of a VM, do the following:

  1. If you haven't already, then connect to your Linux or Windows VM.

  2. Query the metadata server as follows:

    curl "http://metadata.google.internal/computeMetadata/beta/instance/upcoming-maintenance?alt=json" -H "Metadata-Flavor: Google"
    

    If a host maintenance event is scheduled for the VM, then the output is similar to the following:

    "Upcoming maintenance": {
      "can_reschedule": "true",
      "latest_window_start_time": "2024-12-01T19:00:01Z",
      "maintenance_status": "PENDING",
      "type": "SCHEDULED",
      "window_end_time": "2024-12-01T21:00:01Z",
      "window_start_time": "2024-12-01T19:00:01Z"
    }
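
    In this sample, the metadata server uses snake_case keys and a string for the boolean value, whereas the gcloud CLI and REST API use camelCase keys and a real boolean. The following Python sketch shows one way to normalize the metadata fields so that the same downstream logic can handle both sources. It assumes that the response has the shape shown in the sample above, which might not hold for every response; the function names are hypothetical.

```python
# Sample fields copied from the metadata server output above.
SAMPLE_METADATA = {
    "can_reschedule": "true",
    "latest_window_start_time": "2024-12-01T19:00:01Z",
    "maintenance_status": "PENDING",
    "type": "SCHEDULED",
    "window_end_time": "2024-12-01T21:00:01Z",
    "window_start_time": "2024-12-01T19:00:01Z",
}


def snake_to_camel(name: str) -> str:
    """Convert a key such as window_start_time to windowStartTime."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def normalize(metadata_fields: dict) -> dict:
    """Return a copy with camelCase keys and "true"/"false" as booleans."""
    normalized = {}
    for key, value in metadata_fields.items():
        if value in ("true", "false"):
            value = value == "true"
        normalized[snake_to_camel(key)] = value
    return normalized


if __name__ == "__main__":
    for key, value in normalize(SAMPLE_METADATA).items():
        print(f"{key}: {value}")
```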
    

Manually start maintenance on VMs

You can manually start maintenance for your VMs instead of waiting for the scheduled time.

Depending on the maintenance state of a VM, the following occurs:

  • Maintenance state: Scheduled
    Description: Compute Engine has scheduled maintenance for the VM. You can manually start maintenance before the scheduled time.
    What you see:
      • In the Google Cloud console, the maintenance state shows as Ready to run - will run on DATE.
      • In the gcloud CLI or REST API, Compute Engine sets the maintenanceStatus field to PENDING.

  • Maintenance state: In progress
    Description: Maintenance is underway. You can't reschedule it.
    What you see:
      • In the Google Cloud console, the maintenance state shows as Running.
      • In the gcloud CLI or REST API, Compute Engine sets the maintenanceStatus field to ONGOING.

  • Maintenance state: Complete
    Description: Maintenance is finished. Compute Engine has removed all maintenance notifications from the VM.
    What you see:
      • In the Google Cloud console, the maintenance state shows as Up-to-date.
      • In the gcloud CLI or REST API, Compute Engine sets the maintenanceStatus field to COMPLETE.
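
The states above can be turned into a simple check before attempting a manual start. The following Python sketch is illustrative, not an official rule set: it assumes that a manual start is only worth attempting while maintenanceStatus is PENDING, which is how this document describes the Scheduled state. The function name and the dictionary are hypothetical.

```python
from typing import Optional

# Console labels paired with maintenanceStatus values, as listed above.
CONSOLE_LABEL = {
    "PENDING": "Ready to run",
    "ONGOING": "Running",
    "COMPLETE": "Up-to-date",
}


def can_start_manually(upcoming: Optional[dict]) -> bool:
    """Return True if the VM has scheduled maintenance that has not started.

    `upcoming` is the VM's upcomingMaintenance field, or None when the VM
    has no such field (no host maintenance event is scheduled).
    """
    if not upcoming:
        return False
    return upcoming.get("maintenanceStatus") == "PENDING"


if __name__ == "__main__":
    for status in ("PENDING", "ONGOING", "COMPLETE"):
        vm = {"maintenanceStatus": status}
        print(status, CONSOLE_LABEL[status], can_start_manually(vm))
```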

You can manually start maintenance for multiple VMs simultaneously or for individual VMs. For multiple VMs, use the Google Cloud console or, for VMs located in the same zone, the gcloud CLI. For individual VMs, select any of the following options:

Console

  1. In the Google Cloud console, go to the VM instances page.

  2. Select the rows for the VMs where you want to start maintenance.

  3. Click Run maintenance.

  4. To confirm, click Run maintenance.

gcloud

To manually start maintenance for one or more VMs within the same zone, use the gcloud beta compute instances perform-maintenance command:

gcloud beta compute instances perform-maintenance VM_NAMES \
    --zone=ZONE

Replace the following:

  • VM_NAMES: a list of VM names separated by spaces; for example, vm-01 vm-02 vm-03.

  • ZONE: the zone where the VMs are located.
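
Because one invocation of the command covers VMs in a single zone only, a fleet that spans several zones needs one command per zone. The following Python sketch groups VM names by zone and builds the corresponding command lines without running them. The function name and the input mapping are hypothetical; the command itself is the one shown above.

```python
import shlex
from collections import defaultdict


def perform_maintenance_commands(vm_zones: dict) -> list:
    """Build one perform-maintenance command line per zone.

    `vm_zones` maps each VM name to its zone. The commands are returned
    as strings and are not executed here.
    """
    by_zone = defaultdict(list)
    for vm_name, zone in sorted(vm_zones.items()):
        by_zone[zone].append(vm_name)

    commands = []
    for zone in sorted(by_zone):
        args = [
            "gcloud", "beta", "compute", "instances", "perform-maintenance",
            *by_zone[zone],
            f"--zone={zone}",
        ]
        commands.append(" ".join(shlex.quote(arg) for arg in args))
    return commands


if __name__ == "__main__":
    example = {
        "vm-01": "europe-west1-b",
        "vm-02": "europe-west1-b",
        "vm-03": "us-central1-a",
    }
    for command in perform_maintenance_commands(example):
        print(command)
```

Printing the commands first, instead of running them directly, leaves room to review which VMs are affected before maintenance starts.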

REST

To manually start maintenance for a VM, make a POST request to the beta instances.performMaintenance method:

POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/performMaintenance

Replace the following:

  • PROJECT_ID: the ID of the project where the VM is located.

  • ZONE: the zone where the VM is located.

  • VM_NAME: the VM name.
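
As a rough illustration, the request above can be prepared with the Python standard library as follows. This is a sketch under stated assumptions: the function name is hypothetical, the empty request body is an assumption, and the access token is assumed to come from a command such as `gcloud auth print-access-token`. The request object is only built here; sending it is left as a commented-out line.

```python
import urllib.request

BASE_URL = "https://compute.googleapis.com/compute/beta"


def perform_maintenance_request(
    project_id: str, zone: str, vm_name: str, access_token: str
) -> urllib.request.Request:
    """Build (but do not send) the performMaintenance POST request."""
    url = (
        f"{BASE_URL}/projects/{project_id}/zones/{zone}"
        f"/instances/{vm_name}/performMaintenance"
    )
    return urllib.request.Request(
        url,
        data=b"",  # assumption: the method is called with an empty body
        method="POST",
        headers={"Authorization": f"Bearer {access_token}"},
    )


if __name__ == "__main__":
    request = perform_maintenance_request(
        "example-project", "europe-west1-b", "vm-01", "ACCESS_TOKEN"
    )
    print(request.get_method(), request.full_url)
    # To send the request with a real token:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode())
```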

What's next