Monitor and plan for a host maintenance event


Each virtual machine (VM) instance or bare metal instance uses a host maintenance policy to determine the behavior of the instance during a maintenance operation. Some instances offer the additional option of viewing the maintenance schedule ahead of time.

This page explains how to monitor and plan for a host maintenance event on Compute Engine instances.

Before you begin

  • If you haven't already, then set up authentication. Authentication is the process by which your identity is verified for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine by selecting one of the following options:

    Console

    When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.

    gcloud

    1. Install the Google Cloud CLI, then initialize it by running the following command:

      gcloud init
    2. Set a default region and zone.

    REST

    To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

      Install the Google Cloud CLI, then initialize it by running the following command:

      gcloud init

    For more information, see Authenticate for using REST in the Google Cloud authentication documentation.

Required roles

To get the permissions that you need to create instances and manage instance maintenance, ask your administrator to grant you the following IAM roles on the project:

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to create instances and manage instance maintenance. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to create instances and manage instance maintenance:

  • To get information about an instance, including metadata: compute.instances.get

You might also be able to get these permissions with custom roles or other predefined roles.

Limitations

You can view notifications for an instance's upcoming maintenance event only if the instance uses a machine type from one of the following machine families:

  • Accelerator-optimized machine families:

  • General-purpose machine families:

  • Memory-optimized machine families:

  • Storage-optimized machine families:

Overview of maintenance notifications

Google sends notifications for upcoming host maintenance through several methods. When the maintenance window opens, Google Cloud automatically performs maintenance on your instance. By monitoring your instance's upcoming maintenance windows, you can proactively prepare your workloads to handle upcoming maintenance with minimal disruption.

Compute instances that support maintenance event notifications have the following characteristics:

  • Fewer maintenance events: In general, instances with recurrent maintenance intervals should see fewer maintenance events.
  • Longer maintenance notification: Get notified of maintenance events well in advance for planning purposes.
  • Monitoring and planning: Use Cloud Logging to track your maintenance schedule. Use incidents and alerts to stay informed.
  • On-demand maintenance control: Start maintenance during the notification period to update your instances when it fits your schedule.

The information about an upcoming maintenance event is presented in a format similar to the following:

upcomingMaintenance:{
    "canReschedule":True
    "latestWindowStartTime": "2024-12-01T19:00:01Z"
    "maintenanceStatus":"PENDING"
    "type":"SCHEDULED"
    "windowEndTime": "2024-12-01T22:00:00Z"
    "windowStartTime": "2024-12-01T19:00:00Z"
}

If there is no upcoming maintenance event, instead you see a message similar to the following:

{ "error": "no notifications have been received yet, try again later" }

Maintenance status definitions

The following status definitions explain the responses to queries about host maintenance for an instance. They provide information related to the maintenance event. The Google Cloud CLI, REST, and the metadata server use these same responses:

  • canReschedule: whether the maintenance can be started manually during the notification period for this instance.
    • TRUE: customer-triggered maintenance can be performed during the notification period.
    • FALSE: customer-triggered maintenance can't be performed on this instance. This often occurs during the period in which the instance is undergoing maintenance or if the instance type doesn't support on-demand maintenance.
  • latestWindowStartTime: the latest time the maintenance window can be moved to.
  • maintenanceStatus: the current status of the maintenance event.
    • ONGOING: the maintenance operation is underway.
    • PENDING: the maintenance operation is scheduled, but has not yet started.
  • type: the type of maintenance to be performed.
    • NONE: No maintenance is scheduled for this instance.
    • SCHEDULED: For disruptive maintenance, Compute Engine provides a minimum of 7 days' notice for most instances; X4 instances get approximately 60 days' advance notice.
    • UNSCHEDULED: Because the maintenance represents critical updates, Compute Engine tries to provide as much advance notice as possible, but the notice period is usually much shorter than for scheduled maintenance events.
  • windowEndTime: the end of the time window in which maintenance occurs.
  • windowStartTime: the start of the time window in which maintenance occurs.
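
Because windowStartTime and windowEndTime are UTC timestamps in a fixed-width layout, a script can compare them as plain strings to check whether a given moment falls inside the maintenance window. The following is a minimal sketch, not an official tool. The function names in_maintenance_window and current_utc are hypothetical, and the sketch assumes that every timestamp uses the same YYYY-MM-DDThh:mm:ssZ layout as the sample response.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if NOW falls inside [START, END].
# Assumes all arguments are UTC timestamps in the same fixed-width
# YYYY-MM-DDThh:mm:ssZ layout, so string order matches time order.
in_maintenance_window() {
  now=$1
  start=$2
  end=$3
  # expr prints 1 when the comparison holds. The "x" prefix keeps expr
  # from reading an argument as an operator or an integer.
  [ "$(LC_ALL=C expr "x$now" \>= "x$start")" = 1 ] &&
    [ "$(LC_ALL=C expr "x$now" \<= "x$end")" = 1 ]
}

# Prints the current time in the same layout, for use as the first argument.
current_utc() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}
```

For example, in_maintenance_window "$(current_utc)" "2024-12-01T19:00:00Z" "2024-12-01T22:00:00Z" succeeds only while the current time is within the sample window.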

Maintenance status behaviors

When managing maintenance events, check the values of canReschedule and maintenanceStatus. Together, these fields indicate whether you can manually start a maintenance event ahead of its scheduled time:

  • canReschedule=TRUE and maintenanceStatus=PENDING: you can manually start the maintenance event for the instance before the scheduled start time.
  • canReschedule=FALSE and maintenanceStatus=ONGOING: the maintenance is underway and can't be rescheduled.
  • canReschedule=FALSE and maintenanceStatus=PENDING: your instance doesn't support manually triggered maintenance events.
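
The combinations above can be expressed as a small lookup in a script. This is only a sketch: the function name maintenance_action and the labels it prints are hypothetical, chosen for illustration. The sketch normalizes letter case because the boolean appears as True or TRUE in different places on this page.

```shell
#!/bin/sh
# Hypothetical helper: maps the two fields to the outcomes listed above.
# Arguments: canReschedule value, maintenanceStatus value.
maintenance_action() {
  # Normalize case so that True, TRUE, and true are treated alike.
  can=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  status=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
  case "$can:$status" in
    true:PENDING)  echo "can-start-early" ;;
    false:ONGOING) echo "in-progress" ;;
    false:PENDING) echo "manual-start-unsupported" ;;
    # Any combination not described on this page.
    *)             echo "unknown" ;;
  esac
}
```

A script could call maintenance_action with the values read from a response and branch on the printed label, for example to decide whether to offer an operator the option of starting maintenance early.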

View maintenance notifications

You can find maintenance notifications by querying your compute instances, by querying the metadata server, or by using Cloud Logging.

Check instances for a maintenance event notification

Use the Google Cloud CLI, REST, or the metadata server to check whether your instance has an upcoming host maintenance event.

gcloud

To see the upcoming maintenance window for an instance, use the gcloud compute instances describe command.

gcloud compute instances describe INSTANCE_NAME \
   --zone=ZONE_NAME --format="yaml(upcomingMaintenance)"

Replace the following:

  • INSTANCE_NAME: The name of the compute instance.
  • ZONE_NAME: The zone where the instance resides.

If there is an upcoming maintenance event, then the response contains a section similar to the following:

  upcomingMaintenance:
    canReschedule: true
    latestWindowStartTime: '2024-12-01T19:00:01Z'
    maintenanceStatus: PENDING
    type: SCHEDULED
    windowEndTime: '2024-12-01T22:00:00Z'
    windowStartTime: '2024-12-01T19:00:00Z'

In this response:

  • The maintenance is scheduled for the date and time shown in windowStartTime.
  • canReschedule is set to True and maintenanceStatus is set to PENDING. These settings indicate you can manually start the scheduled maintenance event before the date shown in latestWindowStartTime.
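
When a script needs only one field, the value() output format of the gcloud CLI can print it without the surrounding structure. The following sketch wraps the describe command shown earlier. The function name next_window_start is hypothetical, and the sketch assumes that the field is simply empty when the instance has no upcoming maintenance.

```shell
#!/bin/sh
# Hypothetical helper: prints the start of the upcoming maintenance window
# for an instance, or "none" if the response has no such value.
# Arguments: instance name, zone.
next_window_start() {
  start=$(gcloud compute instances describe "$1" \
    --zone="$2" \
    --format="value(upcomingMaintenance.windowStartTime)") || return 1
  if [ -n "$start" ]; then
    echo "$start"
  else
    # Assumption: an empty value means no maintenance is scheduled.
    echo "none"
  fi
}
```

For example, next_window_start INSTANCE_NAME ZONE_NAME prints a timestamp such as 2024-12-01T19:00:00Z when maintenance is pending.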

REST

To see if there is upcoming maintenance for an instance, construct a GET request using the instances.get method:

GET https://compute.googleapis.com/compute/v1/projects/PROJECT_NAME/zones/ZONE/instances/INSTANCE_NAME

Replace the following:

  • PROJECT_NAME: The name of the project that contains the compute instance.
  • ZONE: The zone where the instance is located.
  • INSTANCE_NAME: The name of the instance.

If there is an upcoming maintenance event, then the response contains a section similar to the following:

  "upcomingMaintenance": {
    "canReschedule": true,
    "latestWindowStartTime": "2023-12-01T19:00:01Z",
    "maintenanceStatus": "PENDING",
    "type": "SCHEDULED",
    "windowEndTime": "2023-12-01T22:00:00Z",
    "windowStartTime": "2023-12-01T19:00:00Z"
  }

In this response:

  • The maintenance is scheduled for the date and time shown in windowStartTime.
  • canReschedule is set to True and maintenanceStatus is set to PENDING. These settings indicate you can manually start the scheduled maintenance event before the date shown in latestWindowStartTime.
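
One way to send this request from a local environment is with curl, using an access token from the gcloud CLI credentials described in Before you begin. The sketch below is illustrative only. The function names are hypothetical, and it assumes that the standard fields query parameter can be used to limit the response to upcomingMaintenance.

```shell
#!/bin/sh
# Hypothetical helper: builds the instances.get URL for an instance.
# Assumption: the fields parameter trims the response to one field.
# Arguments: project, zone, instance name.
instance_url() {
  printf 'https://compute.googleapis.com/compute/v1/projects/%s/zones/%s/instances/%s?fields=upcomingMaintenance\n' \
    "$1" "$2" "$3"
}

# Sends the GET request, authenticating with the gcloud CLI credentials.
# --fail makes curl return a nonzero exit code on HTTP errors.
get_upcoming_maintenance() {
  curl --silent --fail \
    --header "Authorization: Bearer $(gcloud auth print-access-token)" \
    "$(instance_url "$1" "$2" "$3")"
}
```

Calling get_upcoming_maintenance PROJECT_NAME ZONE INSTANCE_NAME prints the JSON response body to standard output.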

Metadata server

From the guest OS, query the metadata server to see the next maintenance event.

curl "http://metadata.google.internal/computeMetadata/v1/instance/upcoming-maintenance?alt=json" -H "Metadata-Flavor: Google"
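
A script running in the guest OS usually needs to tell an actual maintenance record apart from the "no notifications" message described in the overview. The following is a minimal sketch under stated assumptions. The function names are hypothetical, and the check relies only on the presence of an "error" key in the response body rather than on full JSON parsing.

```shell
#!/bin/sh
# Fetches the upcoming maintenance record from the metadata server.
# Works only from inside the guest OS of a Compute Engine instance.
fetch_upcoming_maintenance() {
  curl --silent \
    --header "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/upcoming-maintenance?alt=json"
}

# Hypothetical helper: succeeds if the response body appears to describe a
# maintenance event, and fails if it is empty or contains an "error" key.
# A plain substring match keeps the sketch free of JSON tooling.
has_maintenance_event() {
  case "$1" in
    ''|*'"error"'*) return 1 ;;
    *)              return 0 ;;
  esac
}
```

A monitoring script could run has_maintenance_event "$(fetch_upcoming_maintenance)" on a schedule and begin preparing the workload only when the check succeeds.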

Check Cloud Logging for a maintenance event notification

Compute Engine records maintenance events for an instance as system events in Cloud Audit Logs. You can view these events before, during, and after a maintenance event by using Logs Explorer in Cloud Logging.

Console

To query the audit logs for maintenance notifications for an instance, complete the following steps:

  1. Go to the VM instances page.

    Go to VM instances

  2. Click the Name of the instance for which you want to view maintenance notifications.

    The Instance details page opens.

  3. In the Logs section, click the link labeled Logging.

    The Logs Explorer query editor page opens. In the Query pane, the resource.type and instance ID are already populated for your instance.

  4. In the Query pane, add the following line to the query:

    operation.producer="compute.instances.upcomingMaintenance" OR
    "compute.instances.terminateOnHostMaintenance" OR
    "compute.instances.migrateOnHostMaintenance"
    
  5. Click Run query. The matching maintenance notification events appear in the query results pane.

    In the query results pane you can click Edit time to expand the search timeframe, or to narrow the results to specific dates or times.

  6. Click a log entry to view the maintenance notification details.

    1. For upcoming maintenance notifications, expand the heading metadata to view information such as the current status, type, and scheduled maintenance window start and end times.
    2. Expand the heading status to view the descriptive message for the notification.
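
The same query can also be run from a terminal with the gcloud logging read command. The sketch below is one possible approach, not a documented procedure. The function names are hypothetical, and it assumes that the query the console pre-populates is equivalent to matching the gce_instance resource type and the numeric instance ID. The three producer values are grouped in parentheses so that each one is compared against operation.producer.

```shell
#!/bin/sh
# Hypothetical helper: builds a Logging filter for one instance's
# maintenance events. Argument: the numeric instance ID.
maintenance_log_filter() {
  printf 'resource.type="gce_instance" AND resource.labels.instance_id="%s" AND operation.producer=("compute.instances.upcomingMaintenance" OR "compute.instances.terminateOnHostMaintenance" OR "compute.instances.migrateOnHostMaintenance")\n' "$1"
}

# Reads up to 20 matching entries from the past 7 days as JSON.
read_maintenance_logs() {
  gcloud logging read "$(maintenance_log_filter "$1")" \
    --freshness=7d --limit=20 --format=json
}
```

The --freshness value plays the same role as Edit time in the console: widen it to look further back, or narrow it to reduce the number of results.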

Examples of maintenance notifications

A maintenance event notification for an instance appears in the Logs Explorer with values similar to the following:

  • methodName: "compute.instances.upcomingMaintenance"
  • metadata:
    • maintenanceStatus: "PENDING"
    • windowStartTime: "2024-07-23T20:00:00Z"

When the maintenance event starts, a new informational event appears in the logs with values similar to the following:

  • methodName: "compute.instances.upcomingMaintenance"
  • metadata:
    • maintenanceStatus: "ONGOING"
    • windowStartTime: "2024-07-23T20:00:00Z"

During the maintenance event, depending on the host maintenance policy configuration for the instance, one of the following system events is logged to the audit logs:

  • For instances configured to use live migration during maintenance events, a system event with methodName: "compute.instances.migrateOnHostMaintenance".
  • For instances configured to terminate during maintenance events, a system event with methodName: "compute.instances.terminateOnHostMaintenance".

When the maintenance event ends, a new informational event appears in the audit logs with values similar to the following:

  • methodName: "compute.instances.upcomingMaintenance"
  • status: { message: "Maintenance window has completed for this instance. All maintenance notifications on the instance have been removed." }
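
When processing exported log entries, a script can use methodName and maintenanceStatus to work out which stage of the maintenance lifecycle an entry represents. The following sketch is illustrative. The function name maintenance_phase and its labels are hypothetical, and it assumes that an upcomingMaintenance entry without a maintenanceStatus value is the completion entry shown above.

```shell
#!/bin/sh
# Hypothetical helper: names the lifecycle stage for a log entry.
# Arguments: methodName value, maintenanceStatus value (may be empty).
maintenance_phase() {
  case "$1" in
    compute.instances.upcomingMaintenance)
      case "$2" in
        PENDING) echo "scheduled" ;;
        ONGOING) echo "started" ;;
        # Assumption: no status means the window has completed.
        "")      echo "completed" ;;
        *)       echo "unknown" ;;
      esac ;;
    compute.instances.migrateOnHostMaintenance)   echo "live-migrated" ;;
    compute.instances.terminateOnHostMaintenance) echo "terminated" ;;
    # Entries that are not maintenance events.
    *) echo "unrelated" ;;
  esac
}
```

For example, maintenance_phase compute.instances.upcomingMaintenance ONGOING prints started.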

Configure alerts for host maintenance notifications

You can set up a log-based alerting policy to search for specific maintenance notification events and send alerts using a notification channel.

Console

To create an alert for a maintenance event for your instance, complete the following steps:

  1. Go to the VM instances page.

    Go to VM instances

  2. Click the Name of the instance for which you want to create a maintenance event alert.

    The Instance details page opens.

  3. In the Logs section, click the link labeled Logging.

    The Logs Explorer query editor page opens. In the Query pane, the resource.type and instance ID are already populated for your instance.

  4. In the Query pane, add the following line to the query:

    operation.producer="compute.instances.upcomingMaintenance"
    
  5. Click Run query. The matching maintenance notification events appear in the query results pane.

  6. In the query results pane, click Edit time.

    1. On the left-hand side of the edit window, in the Relative time field, enter 1d to view the log entries for the past day.
    2. Click Apply.
  7. In the header of the Query results pane, click Create alert. If your viewing window is narrow, the Create alert option might appear on the Actions menu instead.

  8. In the Create logs-based alert policy pane, in the Alert details section, do the following:

    1. Enter a name for the alert policy, for example Upcoming maintenance for my-c3d-vm@us-central1-b.
    2. From the Policy severity level menu, select No severity.

    3. In the Documentation field, you can enter a description for your alerting policy. You can also include information that might help the recipient of a notification diagnose the problem. The following string summarizes the reason for the notification:

      Log-based alerting policy in project ${project} to monitor upcoming
      maintenance notifications. See also "Host maintenance alerts" and
      "onHostMaintenance actions" alerting policies.
      

      For information about how you can format and tailor the content of this field, see Using Markdown and variables in documentation templates.

    4. To advance to the next step, click Next.

  9. In the Choose logs to include in the alert section, check the query and results by clicking Preview logs.

    The query you built in the Query pane is also displayed on this pane. We recommend building the query in the Logs Explorer Query pane first.

    You can edit the query in this pane, if necessary. If you edit the query, then check the results by clicking Preview logs.

  10. Click Next.

  11. In the Set notification frequency and autoclose duration pane, do the following:

    1. Select the minimum time between notifications. This value lets you control the number of notifications you get from Monitoring if this condition is met multiple times. For this example, select 1 day from the options.

    2. For the Incident autoclose duration, use the maximum value of 7 days.

    3. Click Next.

  12. If you already have an email notification channel configured, then you can select it from the list. If not, click Manage notification channels and add an email channel. For information about creating notification channels, see Create and manage notification channels.

  13. Click Save.

    Your log-based alerting policy is now ready to test as described in Test the example log-based alerting policy.

To learn more, read Configure log-based alerts and Create and manage notification channels.
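
For teams that keep alerting policies in version control, the console settings above can be approximated as a policy definition file. This is a hedged sketch rather than the documented procedure. The function names are hypothetical, the condition display name is made up, and the file omits notification channels, which you would need to add. The two durations correspond to the 1 day notification interval and the 7 day autoclose duration chosen in the steps.

```shell
#!/bin/sh
# Escapes backslashes and double quotes so a string can sit inside JSON.
# Assumes the input contains no newlines or control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: prints a log-based alerting policy definition.
# Arguments: policy display name, Logging filter.
alert_policy_json() {
  name=$(json_escape "$1")
  filter=$(json_escape "$2")
  cat <<EOF
{
  "displayName": "$name",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "Upcoming maintenance log entry",
      "conditionMatchedLog": { "filter": "$filter" }
    }
  ],
  "alertStrategy": {
    "notificationRateLimit": { "period": "86400s" },
    "autoClose": "604800s"
  }
}
EOF
}
```

You could redirect the output to a file such as policy.json and then create the policy with a command such as gcloud alpha monitoring policies create --policy-from-file=policy.json. Check the current gcloud CLI reference before relying on that command, because its release track might differ.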

What's next