Set VM host maintenance policy


This document describes how to set a virtual machine (VM) instance's host maintenance policy to control how the VM behaves when a host event occurs.

Before you begin

  • If you haven't already, set up authentication. Authentication is the process by which your identity is verified for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine as follows.

    Follow the instructions for how you plan to use the samples on this page:

    Console

    When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.

    gcloud

    1. Install the Google Cloud CLI, then initialize it by running the following command:

      gcloud init
    2. Set a default region and zone.

    REST

    To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

      Install the Google Cloud CLI, then initialize it by running the following command:

      gcloud init

Limitations

  • You can't change the host maintenance policy of a preemptible VM. When a maintenance event occurs, the preemptible VM stops instead of migrating. You must manually restart the preempted VM.
  • After you create a VM using an E2 machine type, you can't change the VM's host maintenance settings from MIGRATE to TERMINATE or the other way around.

Available host maintenance properties

You can configure a VM's maintenance behavior, restart behavior, and behavior after a host error occurs with the following properties.

Compute Engine configures each VM with the default values unless you specify otherwise.

During host events, depending on the configured host maintenance policy, VMs that don't support live migration are terminated or automatically restarted.

  • onHostMaintenance: determines the behavior when a maintenance event occurs that might cause your VM to reboot.

    • MIGRATE (Default): causes Compute Engine to live migrate an instance when there is a maintenance event.
    • TERMINATE: stops a VM instead of migrating it.
  • automaticRestart: determines the behavior when a VM crashes or is stopped by the system.

    • true (Default): Compute Engine restarts an instance if the instance crashes or is stopped.
    • false: Compute Engine does not restart a VM if the VM crashes or is stopped.
  • localSsdRecoveryTimeout: Sets the Local SSD recovery timeout. This is the maximum amount of time, in hours, that Compute Engine waits to recover Local SSD data after a host error. This setting only applies to VMs with attached Local SSD disks.

    • Unset (Default): Compute Engine waits up to 1 hour to recover the disk. For Z3 VMs (Preview), the default wait time is 4 hours.
    • A number from 0 to 168: specifies how long, in hours, Compute Engine waits to recover the disk. The number must be an integer, in increments of 1 hour, with a maximum value of 168 hours (7 days). A value of 0 means that Compute Engine won't wait to recover the data.
  • hostErrorTimeoutSeconds (Preview): Sets the maximum amount of time, in seconds, that Compute Engine waits to restart or terminate a VM after detecting that the VM is unresponsive.

    • Unset (Default): Compute Engine waits up to 5.5 minutes (330 seconds) before restarting an unresponsive VM.
    • Number from 90 to 330: specifies the number of seconds, in increments of 30, that Compute Engine waits before restarting an unresponsive VM.
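The value ranges listed above can be checked before a request is sent. The following Python sketch is illustrative only: the function name validate_scheduling and its dictionary layout are hypothetical and not part of any Google Cloud library. It assumes the Local SSD recovery timeout is given in hours and the host error timeout in seconds, as described above.

```python
VALID_MAINTENANCE_POLICIES = {"MIGRATE", "TERMINATE"}


def validate_scheduling(settings):
    """Return a list of problems found in a dict of host maintenance settings.

    Hypothetical helper: keys mirror the property names in this document.
    Missing keys are treated as unset, so the documented defaults apply.
    """
    problems = []

    policy = settings.get("onHostMaintenance")
    if policy is not None and policy not in VALID_MAINTENANCE_POLICIES:
        problems.append("onHostMaintenance must be MIGRATE or TERMINATE")

    restart = settings.get("automaticRestart")
    if restart is not None and not isinstance(restart, bool):
        problems.append("automaticRestart must be true or false")

    hours = settings.get("localSsdRecoveryTimeout")
    if hours is not None:
        # Whole hours only, from 0 (don't wait) up to 168 (7 days).
        if (isinstance(hours, bool) or not isinstance(hours, int)
                or not 0 <= hours <= 168):
            problems.append(
                "localSsdRecoveryTimeout must be an integer from 0 to 168")

    seconds = settings.get("hostErrorTimeoutSeconds")
    if seconds is not None:
        # 90 to 330 seconds, in increments of 30.
        if (isinstance(seconds, bool) or not isinstance(seconds, int)
                or not 90 <= seconds <= 330 or seconds % 30 != 0):
            problems.append(
                "hostErrorTimeoutSeconds must be 90 to 330 in steps of 30")

    return problems
```

An empty list means every supplied value falls within the documented ranges. Compute Engine remains the authority on what it accepts, so treat this as a local pre-check only.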

Set host maintenance policy of a VM

You can change the host maintenance policy of a VM when you first create the VM or after the VM is created.

Set host maintenance policy during VM creation

The information in this section focuses on how to set the host maintenance policy when you create a VM. For more VM creation examples, see Create and start a VM instance.

You can set the host maintenance policy of a VM at creation by using the Google Cloud console, the gcloud CLI, or the Compute Engine API.

Console

  1. In the Google Cloud console, go to the Create an instance page.

    Go to Create an instance

  2. Specify a Name for the VM.

  3. Select a Region and Zone for the VM.

  4. In the Machine configuration section, do the following:

    1. Specify the details of the machine type for the VM.
    2. Expand the VM provisioning model advanced settings menu.
    3. In the On host maintenance menu, select one of the following options:
      • To migrate VMs during maintenance events, select Migrate VM instance.
      • To stop VMs during maintenance events, select Terminate VM instance.
  5. To create the VM, click Create.

gcloud

In the Google Cloud console, activate Cloud Shell.

Activate Cloud Shell

At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

To set the host maintenance policy of a new VM, use the gcloud compute instances create command. Include one or more of the following parameters:

  • --maintenance-policy: whether the VM is migrated or stopped during host maintenance. The VM is migrated by default if you omit this flag.
  • --no-restart-on-failure or --restart-on-failure: whether the VM restarts automatically after a host error. By default, the VM will always restart when a failure is detected.
  • --local-ssd-recovery-timeout: how much time Compute Engine spends recovering any attached Local SSD disks after a host error. The default is 1 hour.

Set the host maintenance policy of a new VM with the following command. If you omit any of the flags, the flag's default is used.

  gcloud compute instances create VM_NAME \
      --maintenance-policy=MAINTENANCE_POLICY \
      --RESTART_ON_FAILURE_BEHAVIOR \
      --local-ssd-recovery-timeout=SSD_RECOVERY_TIMEOUT

Replace the following:

  • VM_NAME: the VM name.
  • MAINTENANCE_POLICY: the maintenance policy for this VM, either TERMINATE or MIGRATE.
  • RESTART_ON_FAILURE_BEHAVIOR: the restart behavior for the VM, either no-restart-on-failure or restart-on-failure.
  • SSD_RECOVERY_TIMEOUT: the number of hours to spend recovering a Local SSD attached to an unresponsive VM. Valid values are from 0 to 168, in increments of 1 hour.
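As a rough illustration of how the placeholders above combine, the following Python sketch assembles the command line from concrete values. The helper name build_create_command and the VM name example-vm are hypothetical; only the gcloud flags themselves come from this section.

```python
import shlex


def build_create_command(vm_name, maintenance_policy=None,
                         restart_on_failure=None, ssd_recovery_timeout=None):
    """Build the argument list for `gcloud compute instances create`.

    Hypothetical helper: arguments left as None are omitted, so gcloud
    applies the flag's default.
    """
    args = ["gcloud", "compute", "instances", "create", vm_name]
    if maintenance_policy is not None:
        args.append(f"--maintenance-policy={maintenance_policy}")
    if restart_on_failure is not None:
        # The restart behavior is a pair of boolean flags, not a value.
        args.append("--restart-on-failure" if restart_on_failure
                    else "--no-restart-on-failure")
    if ssd_recovery_timeout is not None:
        args.append(f"--local-ssd-recovery-timeout={ssd_recovery_timeout}")
    return args


command = build_create_command("example-vm", "TERMINATE", False, 3)
print(shlex.join(command))
# prints: gcloud compute instances create example-vm --maintenance-policy=TERMINATE --no-restart-on-failure --local-ssd-recovery-timeout=3
```

To run the command rather than print it, the list could be passed to subprocess.run(command, check=True), which assumes the gcloud CLI is installed and initialized.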

Set the host error detection timeout

To set the maximum amount of time Compute Engine waits to restart or terminate an unresponsive VM, use the gcloud beta compute instances create command, because this feature is only available in Preview. Specify the timeout with the --host-error-timeout-seconds flag.

  gcloud beta compute instances create VM_NAME \
      --maintenance-policy=MAINTENANCE_POLICY \
      --RESTART_ON_FAILURE_BEHAVIOR \
      --local-ssd-recovery-timeout=SSD_RECOVERY_TIMEOUT \
      --host-error-timeout-seconds=ERROR_DETECTION_TIMEOUT

Replace the following:

  • VM_NAME: the VM name.
  • MAINTENANCE_POLICY: the maintenance policy for this VM, either TERMINATE or MIGRATE.
  • RESTART_ON_FAILURE_BEHAVIOR: the restart behavior for the VM, either no-restart-on-failure or restart-on-failure.
  • SSD_RECOVERY_TIMEOUT: the number of hours Compute Engine spends recovering a Local SSD that was attached to an unresponsive VM. Valid values are from 0 to 168, in increments of 1 hour.
  • ERROR_DETECTION_TIMEOUT: the number of seconds Compute Engine waits before restarting an unresponsive VM, from 90 to 330, in increments of 30.

REST

To set the host maintenance policy of a new VM using the Compute Engine API, use the instances.insert method. Include one or more of the following properties in the scheduling object of the request body:

  • onHostMaintenance: whether the VM is migrated or stopped during host maintenance. The VM is migrated by default.
  • automaticRestart: whether the VM restarts automatically after a host error. VMs are restarted automatically by default.
  • localSsdRecoveryTimeout: how much time Compute Engine spends recovering any attached Local SSD disks after detecting a host error. The default is 1 hour.

      POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances

      {
        "name": "VM_NAME",

        "scheduling": {
          "onHostMaintenance": "MAINTENANCE_POLICY",
          "automaticRestart": RESTART_POLICY,
          "localSsdRecoveryTimeout": SSD_RECOVERY_TIMEOUT
        }
      }

Replace the following:

  • PROJECT_ID: the project for the VM.
  • ZONE: the zone where you want to create the VM.
  • VM_NAME: the VM name.
  • MAINTENANCE_POLICY: the maintenance policy for this VM, either TERMINATE or MIGRATE.
  • RESTART_POLICY: the restart policy for this VM, either true or false.
  • SSD_RECOVERY_TIMEOUT: the number of hours Compute Engine spends recovering a Local SSD disk that was attached to an unresponsive VM. Valid values are from 0 to 168, in increments of 1 hour.
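The following Python sketch shows one way the request URL and body above might be assembled from concrete values. The helper name build_insert_request and the sample project, zone, and VM names are hypothetical. The body mirrors only the fields shown in this section; a complete request includes other VM properties that this document doesn't cover.

```python
import json


def build_insert_request(project_id, zone, vm_name, maintenance_policy,
                         restart_policy, ssd_recovery_timeout):
    """Return the (url, body) pair for an instances.insert call.

    Hypothetical helper that fills in the placeholders from this section.
    """
    url = (f"https://compute.googleapis.com/compute/v1/projects/{project_id}"
           f"/zones/{zone}/instances")
    body = {
        "name": vm_name,
        "scheduling": {
            "onHostMaintenance": maintenance_policy,
            # A Python bool is serialized as JSON true or false (unquoted).
            "automaticRestart": restart_policy,
            "localSsdRecoveryTimeout": ssd_recovery_timeout,
        },
    }
    return url, json.dumps(body, indent=2)


url, body = build_insert_request(
    "my-project", "us-central1-a", "example-vm", "TERMINATE", False, 3)
print(url)
print(body)
```

Building the body as a dictionary and serializing it with json.dumps avoids quoting mistakes, such as sending the restart policy as a string instead of a boolean.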

Set the host error detection timeout

To set the maximum amount of time Compute Engine waits to restart or terminate an unresponsive VM, use the beta instances.insert method, because this option is only available in Preview.

Add the hostErrorTimeoutSeconds property to the scheduling object of the request body.


  POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/instances

  {
    "name": "VM_NAME",

    "scheduling": {
      "onHostMaintenance": "MAINTENANCE_POLICY",
      "automaticRestart": RESTART_POLICY,
      "localSsdRecoveryTimeout": SSD_RECOVERY_TIMEOUT,
      "hostErrorTimeoutSeconds": HOST_ERROR_TIMEOUT
    }
  }

Replace the following:

  • PROJECT_ID: the project for the VM.
  • ZONE: the zone where you want to create the VM.
  • VM_NAME: the VM name.
  • MAINTENANCE_POLICY: the maintenance policy for this VM, either TERMINATE or MIGRATE.
  • RESTART_POLICY: the restart policy for this VM, either true or false.
  • SSD_RECOVERY_TIMEOUT: the number of hours Compute Engine spends recovering a Local SSD disk that was attached to an unresponsive VM. Valid values are from 0 to 168, in increments of 1 hour.
  • HOST_ERROR_TIMEOUT: the number of seconds Compute Engine waits before restarting or terminating an unresponsive VM. Valid values are from 90 to 330, in increments of 30.

Update the host maintenance policy of an existing VM

Console

  1. In the Google Cloud console, go to the VM instances page.

    Go to VM instances

  2. Click the name of the VM for which you want to change settings. The VM details page opens.

  3. On the VM details page, complete the following steps:

    1. Click the Edit button at the top of the page.
    2. Go to the Management section. From the Availability policies section, you can set the On host maintenance and Automatic restart options.
    3. Click Save.

gcloud

In the Google Cloud console, activate Cloud Shell.

Activate Cloud Shell


Update the host maintenance policy of an existing VM with the gcloud compute instances set-scheduling command. Use the same parameters described in the VM creation command in the preceding section.

    gcloud compute instances set-scheduling VM_NAME \
      --maintenance-policy=MAINTENANCE_POLICY \
      --RESTART_ON_FAILURE_BEHAVIOR \
      --local-ssd-recovery-timeout=SSD_RECOVERY_TIMEOUT

Replace the following:

  • VM_NAME: the VM name.
  • MAINTENANCE_POLICY: the policy for this VM, either TERMINATE or MIGRATE.
  • RESTART_ON_FAILURE_BEHAVIOR: the restart behavior for the VM, either no-restart-on-failure or restart-on-failure.
  • SSD_RECOVERY_TIMEOUT: the time, in hours, Compute Engine spends recovering a Local SSD disk attached to an unresponsive VM. Valid values are from 0 to 168.

Update the host error detection timeout

To update the maximum amount of time Compute Engine waits to restart or terminate an unresponsive VM, use the gcloud beta compute instances set-scheduling command, because this feature is only available in Preview.

Update the timeout with the --host-error-timeout-seconds parameter. For example:

    gcloud beta compute instances set-scheduling VM_NAME \
      --maintenance-policy=MAINTENANCE_POLICY \
      --RESTART_ON_FAILURE_BEHAVIOR \
      --local-ssd-recovery-timeout=SSD_RECOVERY_TIMEOUT \
      --host-error-timeout-seconds=NUMBER_OF_SECONDS

Replace the following:

  • VM_NAME: the VM name.
  • MAINTENANCE_POLICY: the maintenance policy for this VM, either TERMINATE or MIGRATE.
  • RESTART_ON_FAILURE_BEHAVIOR: the restart behavior for the VM, either no-restart-on-failure or restart-on-failure.
  • SSD_RECOVERY_TIMEOUT: the time, in hours, Compute Engine spends recovering a Local SSD disk that was attached to an unresponsive VM. Valid values are from 0 to 168.
  • NUMBER_OF_SECONDS: the number of seconds Compute Engine waits before restarting or terminating an unresponsive VM, from 90 to 330, in increments of 30.

REST

Update the host maintenance policy of an existing VM with a POST request to the instances.setScheduling method.

Include one or more of the following properties in the request body:

  • onHostMaintenance: whether the VM is migrated or stopped during host maintenance. The VM is migrated by default.
  • automaticRestart: whether the VM restarts automatically after a host error. VMs are restarted automatically by default.
  • localSsdRecoveryTimeout: how much time Compute Engine spends recovering any attached Local SSD disks after detecting a host error. If omitted, the default is 1 hour.

    POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/setScheduling

    {
      "onHostMaintenance": "MAINTENANCE_POLICY",
      "automaticRestart": RESTART_POLICY,
      "localSsdRecoveryTimeout": SSD_RECOVERY_TIMEOUT
    }

Replace the following:

  • PROJECT_ID: the project for the VM.
  • ZONE: the zone where the VM is located.
  • VM_NAME: the VM name.
  • MAINTENANCE_POLICY: the maintenance policy for this VM, either TERMINATE or MIGRATE.
  • RESTART_POLICY: the restart policy for this VM, either true or false.
  • SSD_RECOVERY_TIMEOUT: the time, in hours, that Compute Engine spends recovering a Local SSD disk that was attached to an unresponsive VM. Valid values are from 0 to 168.
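For readers calling the API from a script, the following Python sketch builds (but doesn't send) the setScheduling request by using only the standard library. The helper name build_set_scheduling_request is hypothetical, and the sketch assumes you already hold an OAuth 2.0 access token, for example one printed by gcloud auth print-access-token.

```python
import json
import urllib.request


def build_set_scheduling_request(project_id, zone, vm_name, scheduling,
                                 access_token):
    """Build a POST request for instances.setScheduling without sending it.

    Hypothetical helper. `access_token` is assumed to be an OAuth 2.0
    access token obtained separately.
    """
    url = (f"https://compute.googleapis.com/compute/v1/projects/{project_id}"
           f"/zones/{zone}/instances/{vm_name}/setScheduling")
    return urllib.request.Request(
        url,
        data=json.dumps(scheduling).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_set_scheduling_request(
    "my-project", "us-central1-a", "example-vm",
    {"onHostMaintenance": "MIGRATE", "automaticRestart": True},
    "ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it. The sketch stops short of that so it can be inspected without credentials or network access.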

Update the host error detection timeout

To update the maximum amount of time Compute Engine waits to restart or terminate an unresponsive VM, you must use the beta instances.setScheduling method, because this feature is only available in Preview.

Add the hostErrorTimeoutSeconds property to the request body.

  POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/setScheduling

  {
    "hostErrorTimeoutSeconds": NUMBER_OF_SECONDS
  }

Replace the following:

  • PROJECT_ID: the project for the VM.
  • ZONE: the zone where the VM is located.
  • VM_NAME: the VM name.
  • NUMBER_OF_SECONDS: the number of seconds Compute Engine waits before restarting or terminating an unresponsive VM, from 90 to 330, in increments of 30.

View host maintenance policy settings of a VM

Console

  1. Go to the VM instances page.

    Go to VM instances

  2. Click the Name of the VM for which you want to view settings. The VM instance details page opens.

  3. Go to the Management section. The Availability policies subsection shows your current settings for On host maintenance and Automatic restart.

gcloud

View the host maintenance option settings for a VM with the gcloud compute instances describe command:

    gcloud compute instances describe VM_NAME --format="yaml(scheduling)"

Replace VM_NAME with the VM name.

The output includes the VM's host maintenance settings, for example:

    scheduling:
      automaticRestart: true
      localSsdRecoveryTimeout:
        nanos: 0
        seconds: '10800'
      onHostMaintenance: MIGRATE
      preemptible: false
      provisioningModel: STANDARD

View the host error detection timeout setting

View the current value of the hostErrorTimeoutSeconds property with the gcloud beta compute instances describe command, because this option is only available in Preview.

  gcloud beta compute instances describe VM_NAME --format="yaml(scheduling)"

Replace VM_NAME with the VM name.

The output includes the VM's host error detection timeout, for example:

  scheduling:
    automaticRestart: true
    hostErrorTimeoutSeconds: 120
    localSsdRecoveryTimeout:
      nanos: 0
      seconds: '10800'
    onHostMaintenance: MIGRATE
    preemptible: false
    provisioningModel: STANDARD

REST

To view the host maintenance settings for a VM, use the instances.get method:

  GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME

Replace the following:

  • PROJECT_ID: the project where the VM is located.
  • ZONE: the zone where the VM is located.
  • VM_NAME: the VM name.

In the output, the scheduling object contains the VM's host maintenance policy, for example:

  "scheduling": {
    "onHostMaintenance": "MIGRATE",
    "automaticRestart": true,
    "preemptible": false,
    "provisioningModel": "STANDARD",
    "localSsdRecoveryTimeout": {
      "seconds": "10800",
      "nanos": 0
    }
  }
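Note that the response reports the Local SSD recovery timeout in seconds, as a string, whereas you set it in hours. The following Python sketch converts the value back to hours. The helper name local_ssd_recovery_hours is hypothetical, and the sample text wraps only the scheduling object from the example output rather than a full instances.get response.

```python
import json

response_text = """
{
  "scheduling": {
    "onHostMaintenance": "MIGRATE",
    "automaticRestart": true,
    "preemptible": false,
    "provisioningModel": "STANDARD",
    "localSsdRecoveryTimeout": {
      "seconds": "10800",
      "nanos": 0
    }
  }
}
"""


def local_ssd_recovery_hours(instance):
    """Return the Local SSD recovery timeout in hours, or None if it is unset."""
    timeout = instance.get("scheduling", {}).get("localSsdRecoveryTimeout")
    if timeout is None:
        return None
    # The value is reported in seconds, as a string.
    return int(timeout["seconds"]) // 3600


instance = json.loads(response_text)
print(local_ssd_recovery_hours(instance))  # prints 3
```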

View the host error detection timeout setting

View the current hostErrorTimeoutSeconds setting with a GET request to the beta instances.get method, because this option is only available in Preview.

  GET https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME

Replace the following:

  • PROJECT_ID: the project for the VM.
  • ZONE: the zone where the VM is located.
  • VM_NAME: the VM name.

In the output, the scheduling object includes the VM's host error detection timeout, for example:

  "scheduling": {
    "hostErrorTimeoutSeconds": 120
  }
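When reading this value in a script, it can help to fall back to the documented default of 330 seconds. The following Python sketch assumes that the hostErrorTimeoutSeconds field is absent from the scheduling object when the timeout is unset. That assumption, along with the helper name effective_host_error_timeout, is illustrative and not confirmed by this document.

```python
DEFAULT_HOST_ERROR_TIMEOUT_SECONDS = 330  # documented default: 5.5 minutes


def effective_host_error_timeout(scheduling):
    """Return the host error detection timeout in seconds.

    Falls back to the documented default when hostErrorTimeoutSeconds is
    missing from the scheduling object (assumed to mean "unset").
    """
    return scheduling.get("hostErrorTimeoutSeconds",
                          DEFAULT_HOST_ERROR_TIMEOUT_SECONDS)


print(effective_host_error_timeout({"hostErrorTimeoutSeconds": 120}))  # prints 120
print(effective_host_error_timeout({}))  # prints 330
```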

What's next