View maintenance notifications
A host maintenance event occurs when Google Cloud has to perform a maintenance or repair activity on your TPU. Google sends notifications for upcoming host maintenance before the maintenance is performed. When the maintenance window opens, Google Cloud automatically performs maintenance on your instance. By monitoring your instance's upcoming maintenance windows, you can prepare your workloads to handle maintenance with minimal disruption.
Cloud TPU lets you view maintenance notifications using the Google Cloud CLI and by querying the metadata server. You can also view upcoming maintenance events in Cloud Logging. For information about viewing maintenance notifications for TPUs in GKE, see Manage GKE node disruption for GPUs and TPUs.
Maintenance notification fields
Maintenance notifications contain the following fields:
windowStartTime
: The start of the time window in which maintenance will occur.
windowEndTime
: The end of the time window in which maintenance will occur.
latestWindowStartTime
: The latest time that the maintenance window can be moved to.
maintenanceType
: The type of maintenance that will be performed.
  SCHEDULED
  : Maintenance for which you get seven days' notice.
  UNSCHEDULED
  : Maintenance that represents critical updates, for which less notice is given than for scheduled maintenance events.
canReschedule
: Whether you can manually start maintenance during the notification period for this VM.
  TRUE
  : You can manually start maintenance during the notification period.
  FALSE
  : You can't manually start maintenance on this VM. This is typically observed during the period in which the VM is actively undergoing maintenance.
maintenanceStatus
: The current maintenance operation's status.
  ONGOING
  : The maintenance operation is underway.
  PENDING
  : The maintenance operation has not yet started, but is scheduled.
If there is no maintenance notification, the response looks similar to the following:
{ "error": "no notifications have been received yet, try again later" }
Maintenance status behaviors
When managing maintenance events, check the values of canReschedule and maintenanceStatus. Combined, these fields indicate which actions you can or can't take with regard to manually starting a maintenance event:
- canReschedule=True and maintenanceStatus=Pending: you can manually start the maintenance event for the instance before the scheduled start time.
- canReschedule=False and maintenanceStatus=Ongoing: the maintenance is underway and can't be rescheduled.
- canReschedule=False and maintenanceStatus=Pending: your instance doesn't support manually-triggered maintenance events.
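The three combinations above can be expressed as a small lookup table. The following is a minimal sketch, not part of any Google Cloud library: the function name and the returned labels are hypothetical, and any combination not listed above is treated as an error rather than guessed at.

```python
# Hypothetical helper: maps the documented (canReschedule, maintenanceStatus)
# combinations to a short label. The labels are illustrative only.
_ACTIONS = {
    (True, "PENDING"): "can_start_manually",
    (False, "ONGOING"): "in_progress",
    (False, "PENDING"): "manual_start_unsupported",
}


def manual_start_action(can_reschedule: bool, maintenance_status: str) -> str:
    """Return a label for one of the three documented field combinations."""
    # Normalize case, since the status appears as both "Pending" and "PENDING".
    key = (bool(can_reschedule), maintenance_status.upper())
    try:
        return _ACTIONS[key]
    except KeyError:
        # Combinations the text does not describe are surfaced, not guessed.
        raise ValueError(f"undocumented combination: {key!r}") from None


print(manual_start_action(True, "PENDING"))  # can_start_manually
```

Raising on undocumented combinations keeps the helper from implying behavior that the notification fields don't promise.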
View maintenance notifications
You can view maintenance notifications by:
- Calling the Cloud TPU API using the Google Cloud CLI
- Querying the metadata server on your VM
- Checking Cloud Logging
Check TPUs for a maintenance notification
gcloud
Use the gcloud alpha compute tpus tpu-vm describe command to view maintenance notifications:
gcloud alpha compute tpus tpu-vm describe TPU_NAME \
    --zone=ZONE
If there is an upcoming maintenance event, the response will contain a section like the following:
upcomingMaintenance:
  canReschedule: true
  latestWindowStartTime: "2025-12-01T19:00:00Z"
  maintenanceStatus: PENDING
  type: SCHEDULED
  windowEndTime: "2025-12-01T22:00:00Z"
  windowStartTime: "2025-12-01T19:00:00Z"
In this response:
- The maintenance is scheduled for the date and time shown in windowStartTime.
- canReschedule is set to true and maintenanceStatus is set to PENDING. These settings indicate that you can manually start the scheduled maintenance event before the date shown in latestWindowStartTime.
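To act on this response in a script, you could capture the describe output as JSON and read the upcomingMaintenance section. The sketch below assumes the output was obtained with gcloud's general --format=json flag and that the JSON keys mirror the fields shown above. The helper names are hypothetical, and the sample is built from the example response rather than from real output.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def extract_upcoming_maintenance(describe_json: str) -> Optional[dict]:
    """Return the upcomingMaintenance section, or None if it is absent."""
    return json.loads(describe_json).get("upcomingMaintenance")


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2025-12-01T19:00:00Z into an aware datetime."""
    # Older Python versions reject a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_until_window(maintenance: dict, now: datetime) -> float:
    """Seconds from `now` until windowStartTime (negative if already past)."""
    return (parse_timestamp(maintenance["windowStartTime"]) - now).total_seconds()


# Sample assembled from the example response above (assumed JSON shape).
SAMPLE = """
{
  "upcomingMaintenance": {
    "canReschedule": true,
    "latestWindowStartTime": "2025-12-01T19:00:00Z",
    "maintenanceStatus": "PENDING",
    "type": "SCHEDULED",
    "windowEndTime": "2025-12-01T22:00:00Z",
    "windowStartTime": "2025-12-01T19:00:00Z"
  }
}
"""

maintenance = extract_upcoming_maintenance(SAMPLE)
one_hour_before = datetime(2025, 12, 1, 18, 0, tzinfo=timezone.utc)
print(seconds_until_window(maintenance, one_hour_before))  # 3600.0
```

Passing the current time in as an argument, instead of reading the clock inside the helper, keeps the calculation easy to test.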
Metadata server
From a TPU VM, query the metadata server to see the next maintenance event:
curl "http://metadata.google.internal/computeMetadata/v1/instance/upcoming-maintenance?alt=json" -H "Metadata-Flavor: Google"
If there is an upcoming maintenance event, the response will contain a section similar to the following:
Upcoming maintenance:
{
  "can_reschedule" : "true",
  "latest_window_start_time" : "2024-06-12T16:00:01+00:00",
  "maintenance_status" : "PENDING",
  "type" : "SCHEDULED",
  "window_end_time" : "2024-06-12T20:00:00+00:00",
  "window_start_time" : "2024-06-12T16:00:00+00:00"
}
You can query the metadata server from any TPU VM in the slice because the upcoming maintenance event notification is the same for all VMs in a slice.
For more information about VM metadata, see About VM metadata in the Compute Engine documentation.
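The metadata server query can also be made from a program running on the TPU VM. The following sketch uses only the Python standard library. The function names are hypothetical, the fetch works only from inside a VM, and the parsing makes two assumptions: that the "no notifications" case arrives as a JSON object with an error key, as in the example earlier on this page, and that any label before the opening brace can be skipped. How the server signals that case at the HTTP level isn't covered here.

```python
import json
import urllib.request
from typing import Optional

METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/upcoming-maintenance?alt=json"
)


def fetch_upcoming_maintenance(timeout: float = 5.0) -> str:
    """Fetch the raw response body. Only reachable from inside a VM."""
    request = urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_upcoming_maintenance(body: str) -> Optional[dict]:
    """Parse a response body; return None when there is no notification."""
    start = body.find("{")  # skip any label that precedes the JSON object
    if start == -1:
        return None
    payload = json.loads(body[start:])
    if "error" in payload:
        return None
    # The sample response carries can_reschedule as the string "true".
    payload["can_reschedule"] = str(payload.get("can_reschedule", "")).lower() == "true"
    return payload


no_event = '{ "error": "no notifications have been received yet, try again later" }'
print(parse_upcoming_maintenance(no_event))  # None
```

On a TPU VM, `parse_upcoming_maintenance(fetch_upcoming_maintenance())` would combine the two steps. Keeping them separate lets the parser be exercised without network access.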
Check Cloud Logging for a maintenance notification
When a maintenance event is scheduled on your Cloud TPU, Cloud Logging contains a system event log for the event, with the methodName compute.instances.upcomingMaintenance. To view logs for upcoming maintenance events:
In the Google Cloud console navigation menu, go to the Logs Explorer page.
Use the following search query to view any TPUs that have an upcoming maintenance event scheduled:
"compute.instances.upcomingMaintenance"
Cloud TPU logs upcoming maintenance events in Cloud Logging by the individual VM instance, for example, t1v-n-5bdca789-w-0.
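Because events are logged per VM instance, a narrower query can be useful when you care about a single worker. The sketch below builds a Logging query string from the field paths that appear in the log entries on this page (protoPayload.methodName and protoPayload.resourceName). The function name is hypothetical, and this structured filter is an alternative to the quoted-text search above, not a replacement prescribed by the documentation.

```python
from typing import Optional


def upcoming_maintenance_filter(instance_name: Optional[str] = None) -> str:
    """Build a Logs Explorer query for upcoming maintenance events."""
    clauses = ['protoPayload.methodName="compute.instances.upcomingMaintenance"']
    if instance_name:
        # ":" is the substring ("has") operator in the Logging query language.
        clauses.append(f'protoPayload.resourceName:"instances/{instance_name}"')
    return " AND ".join(clauses)


print(upcoming_maintenance_filter("t1v-n-5bdca789-w-0"))
```

Calling the function without an argument returns only the methodName clause, which matches every instance with an upcoming maintenance event.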
Examples of maintenance notification logs
A maintenance event notification appears in Logs Explorer with values similar to the following:
methodName: "compute.instances.upcomingMaintenance"
metadata:
  maintenanceStatus: "PENDING"
  windowStartTime: "2024-07-23T20:00:00Z"
The following is an example of a complete log entry for an upcoming maintenance event:
{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {
      "message": "Maintenance is scheduled for this instance. Review the maintenance schedule by describing the VM with gcloud CLI or querying the http://metadata.google.internal/computeMetadata/v1/instance/upcoming-maintenance metadata key."
    },
    "serviceName": "compute.googleapis.com",
    "methodName": "compute.instances.upcomingMaintenance",
    "resourceName": "projects/cloud-tpu-multipod-dev/zones/europe-west4-b/instances/t1v-n-9472280f-w-0",
    "request": {
      "@type": "type.googleapis.com/compute.instances.upcomingMaintenance"
    },
    "metadata": {
      "type": "SCHEDULED",
      "windowStartTime": "2024-11-15T04:00:00Z",
      "canReschedule": true,
      "latestWindowStartTime": "2024-11-15T04:00:01Z",
      "windowEndTime": "2024-11-15T08:00:00Z",
      "maintenanceStatus": "PENDING"
    }
  },
  "logName": "projects/cloud-tpu-multipod-dev/logs/cloudaudit.googleapis.com%2Fsystem_event",
  "operation": {
    "id": "systemevent-1731038451389-6265ecbfcd453-5127b81e-f40b8149",
    "producer": "compute.instances.upcomingMaintenance",
    "first": true,
    "last": true
  },
  "receiveTimestamp": "2024-11-08T04:00:54.457835088Z"
}
When the maintenance event starts, a new informational event appears in the logs with values similar to the following:
methodName: "compute.instances.upcomingMaintenance"
metadata:
  maintenanceStatus: "ONGOING"
  windowStartTime: "2024-07-23T20:00:00Z"
When the maintenance event ends, a new informational event appears in the audit logs with values similar to the following:
methodName: "compute.instances.upcomingMaintenance"
status:
  message: "Maintenance window has completed for this instance. All maintenance notifications on the instance have been removed."
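The three kinds of log entry described above (scheduled, started, completed) can be told apart by their maintenanceStatus and status message. The following sketch classifies an exported log entry. It assumes the abbreviated fields shown above sit under protoPayload, as they do in the complete entry, and that the completion message contains the phrase quoted above. The function name and the returned labels are hypothetical.

```python
def maintenance_stage(entry: dict) -> str:
    """Classify a log entry as pending, ongoing, completed, or other."""
    payload = entry.get("protoPayload", {})
    if payload.get("methodName") != "compute.instances.upcomingMaintenance":
        return "unrelated"
    # Assumption: the completion entry is recognizable by its status message.
    message = payload.get("status", {}).get("message", "")
    if "Maintenance window has completed" in message:
        return "completed"
    status = payload.get("metadata", {}).get("maintenanceStatus")
    if status == "PENDING":
        return "pending"
    if status == "ONGOING":
        return "ongoing"
    return "unknown"


started = {
    "protoPayload": {
        "methodName": "compute.instances.upcomingMaintenance",
        "metadata": {
            "maintenanceStatus": "ONGOING",
            "windowStartTime": "2024-07-23T20:00:00Z",
        },
    }
}
print(maintenance_stage(started))  # ongoing
```

Matching on message text is brittle, so treat the "completed" check as a convenience for reading logs and not as a stable contract.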