Create and use Spot VMs

This page explains how to create and manage Spot VMs, including the following:

  • How to create, start, and identify Spot VMs
  • How to detect, handle, and test preemption of Spot VMs
  • Best practices for Spot VMs

Spot VMs are virtual machine (VM) instances with the spot provisioning model. Spot VMs are available at a 60-91% discount compared to the price of standard VMs. However, Compute Engine might preempt Spot VMs if it needs to reclaim those resources for other tasks. Spot VMs are recommended only for fault-tolerant applications that can withstand VM preemption. Make sure your application can handle preemption before you decide to create Spot VMs.

Create a Spot VM

Create a Spot VM using the console, gcloud CLI, or the Compute Engine API. A Spot VM is any VM that is configured to use the spot provisioning model:

  • VM provisioning model set to Spot in the console
  • --provisioning-model=SPOT in the gcloud CLI
  • "provisioningModel": "SPOT" in the Compute Engine API

Console

  1. In the Google Cloud console, go to the Create an instance page.


  2. Expand the Networking, disks, security, management, sole tenancy section, and do the following:

    1. Expand the Management section.
    2. In the Availability policies section, select Spot from the VM provisioning model list. This setting disables automatic restart and host maintenance options for the VM and enables the termination action option.
    3. Optional: In the On VM termination list, select what happens when Compute Engine preempts the VM:
      • To stop the VM during preemption, select Stop (default).
      • To delete the VM during preemption, select Delete.
  3. Optional: Specify other VM options. For more information, see Creating and starting a VM instance.

  4. To create and start the VM, click Create.

gcloud

To create a VM from the gcloud CLI, use the gcloud compute instances create command. To create Spot VMs, you must include the --provisioning-model=SPOT flag. Optionally, you can specify a termination action for Spot VMs by including the --instance-termination-action flag.

gcloud compute instances create VM_NAME \
    --provisioning-model=SPOT \
    --instance-termination-action=TERMINATION_ACTION

Replace the following:

  • VM_NAME: name of the new VM.
  • TERMINATION_ACTION: Optional: specify which action to take when Compute Engine preempts the VM, either STOP (default behavior) or DELETE.

For more information about the options you can specify when creating a VM, see Creating and starting a VM instance. For example, to create Spot VMs with a specified machine type and image, use the following command:

gcloud compute instances create VM_NAME \
    --provisioning-model=SPOT \
    [--image=IMAGE | --image-family=IMAGE_FAMILY] \
    --image-project=IMAGE_PROJECT \
    --machine-type=MACHINE_TYPE \
    --instance-termination-action=TERMINATION_ACTION

Replace the following:

  • VM_NAME: name of the new VM.
  • IMAGE or IMAGE_FAMILY: specify one of the following:
    • IMAGE: a specific version of a public image. For example, --image=debian-10-buster-v20200309.
    • IMAGE_FAMILY: an image family. This creates the VM from the most recent, non-deprecated OS image. For example, if you specify --image-family=debian-10, Compute Engine creates a VM from the latest version of the OS image in the Debian 10 image family.
  • IMAGE_PROJECT: the project containing the image. For example, if you specify debian-10 as the image family, specify debian-cloud as the image project.
  • MACHINE_TYPE: the predefined or custom machine type for the new VM. To get a list of the machine types available in a zone, use the gcloud compute machine-types list command with the --zones flag.
  • TERMINATION_ACTION: Optional: specify which action to take when Compute Engine preempts the VM, either STOP (default behavior) or DELETE.

API

To create a VM from the Compute Engine API, use the instances.insert method. You must specify a machine type and name for the VM. Optionally, you can also specify an image for the boot disk.

To create Spot VMs, you must include the "provisioningModel": "SPOT" field. Optionally, you can specify a termination action for Spot VMs by including the "instanceTerminationAction" field.

POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances
{
 "machineType": "zones/ZONE/machineTypes/MACHINE_TYPE",
 "name": "VM_NAME",
 "disks": [
   {
     "initializeParams": {
       "sourceImage": "projects/IMAGE_PROJECT/global/images/IMAGE"
     },
     "boot": true
   }
 ],
 "scheduling":
 {
     "provisioningModel": "SPOT",
     "instanceTerminationAction": "TERMINATION_ACTION"
 },
 ...
}

Replace the following:

  • PROJECT_ID: the project ID of the project to create the VM in.
  • ZONE: the zone to create the VM in. The zone must also support the machine type to use for the new VM.
  • MACHINE_TYPE: the predefined or custom machine type for the new VM.
  • VM_NAME: the name of the new VM.
  • IMAGE_PROJECT: the project containing the image. For example, if you specify debian-10 as the image family, specify debian-cloud as the image project.
  • IMAGE: specify one of the following:
    • A specific version of a public image. For example, a specific image is "sourceImage": "projects/debian-cloud/global/images/debian-10-buster-v20200309" where debian-cloud is the IMAGE_PROJECT.
    • An image family. This creates the VM from the most recent, non-deprecated OS image. For example, if you specify "sourceImage": "projects/debian-cloud/global/images/family/debian-10" where debian-cloud is the IMAGE_PROJECT, Compute Engine creates a VM from the latest version of the OS image in the Debian 10 image family.
  • TERMINATION_ACTION: Optional: specify which action to take when Compute Engine preempts the VM, either STOP (default behavior) or DELETE.

For more information about the options you can specify when creating a VM, see Creating and starting a VM instance.
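The scheduling fields in the preceding request can also be assembled programmatically. The following is a minimal Python sketch, not an official client: the function name build_spot_instance_body and the example values are hypothetical, and the fields that the request above elides with ... are left out, so the body might not be sufficient on its own.

```python
import json


def build_spot_instance_body(name, zone, machine_type, source_image,
                             termination_action="STOP"):
    """Return an instances.insert request body that asks for a Spot VM.

    Hypothetical helper: only the fields shown in the request above are set.
    """
    if termination_action not in ("STOP", "DELETE"):
        raise ValueError("termination_action must be STOP or DELETE")
    return {
        "name": name,
        "machineType": f"zones/{zone}/machineTypes/{machine_type}",
        "disks": [
            {"boot": True, "initializeParams": {"sourceImage": source_image}}
        ],
        "scheduling": {
            # This field is what makes the VM a Spot VM.
            "provisioningModel": "SPOT",
            "instanceTerminationAction": termination_action,
        },
    }


# Example values are placeholders, not recommendations.
body = build_spot_instance_body(
    name="example-spot-vm",
    zone="us-central1-f",
    machine_type="n2-standard-16",
    source_image="projects/debian-cloud/global/images/family/debian-10",
    termination_action="DELETE",
)
print(json.dumps(body, indent=2))
```

Validating the termination action before sending the request surfaces a typo locally instead of as an API error.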

Terraform

To create a Spot VM with Terraform, use the google_compute_instance resource and set the provisioning model in its scheduling block:


resource "google_compute_instance" "spot_vm_instance" {
  name         = "spot-instance-name"
  machine_type = "f1-micro"
  zone         = "us-central1-c"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  scheduling {
    preemptible                 = true
    automatic_restart           = false
    provisioning_model          = "SPOT"
    instance_termination_action = "STOP"
  }

  network_interface {
    # A default network is created for all GCP projects
    network = "default"
    access_config {
    }
  }
}

To create multiple Spot VMs with the same properties, you can create an instance template, and use the template to create a managed instance group (MIG). For more information, see best practices.

Start Spot VMs

Like other VMs, Spot VMs start upon creation. Likewise, if Spot VMs are stopped, you can restart the VMs to resume the RUNNING state. You can stop and restart preempted Spot VMs as many times as you would like, as long as there is capacity. For more information, see VM instance life cycle.

If Compute Engine stops one or more Spot VMs in an autoscaling managed instance group (MIG) or Google Kubernetes Engine (GKE) cluster, the group restarts the VMs when the resources become available again.

Identify a VM's provisioning model and termination action

Identify a VM's provisioning model to see if it is a standard VM, Spot VM, or preemptible VM. For a Spot VM, you can also identify the termination action. You can identify a VM's provisioning model and termination action using the console, gcloud CLI, or the Compute Engine API.

Console

  1. Go to the VM instances page.


  2. Click the Name of the VM you want to identify. The VM instance details page opens.

  3. Go to the Management section at the bottom of the page. In the Availability policies subsection, check the following options:

    • If the VM provisioning model is set to Spot, the VM is a Spot VM.
      • On VM termination indicates which action to take when Compute Engine preempts the VM, either Stop or Delete the VM.
    • Otherwise, if the VM provisioning model is set to Standard or is blank:
      • If the Preemptibility option is set to On, the VM is a preemptible VM.
      • Otherwise, the VM is a standard VM.

gcloud

To describe a VM from the gcloud CLI, use the gcloud compute instances describe command:

gcloud compute instances describe VM_NAME

where VM_NAME is the name of the VM that you want to check.

In the output, check the scheduling field to identify the VM:

  • If the output includes the provisioningModel field set to SPOT, similar to the following, the VM is a Spot VM.

    ...
    scheduling:
      ...
      provisioningModel: SPOT
      instanceTerminationAction: TERMINATION_ACTION
    ...
    

    where TERMINATION_ACTION indicates which action to take when Compute Engine preempts the VM, either stop (STOP) or delete (DELETE) the VM. If the instanceTerminationAction field is missing, the default value is STOP.

  • Otherwise, if the output includes the provisioningModel field set to STANDARD or if the output omits the provisioningModel field:

    • If the output includes the preemptible field set to true, the VM is a preemptible VM.
    • Otherwise, the VM is a standard VM.

API

To describe a VM from the Compute Engine API, use the instances.get method:

GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME

Replace the following:

  • PROJECT_ID: the project ID of the project that the VM is in.
  • ZONE: the zone where the VM is located.
  • VM_NAME: the name of the VM that you want to check.

In the output, check the scheduling field to identify the VM:

  • If the output includes the provisioningModel field set to SPOT, similar to the following, the VM is a Spot VM.

    {
      ...
      "scheduling":
      {
         ...
         "provisioningModel": "SPOT",
         "instanceTerminationAction": "TERMINATION_ACTION"
         ...
      },
      ...
    }
    

    where TERMINATION_ACTION indicates which action to take when Compute Engine preempts the VM, either stop (STOP) or delete (DELETE) the VM. If the instanceTerminationAction field is missing, the default value is STOP.

  • Otherwise, if the output includes the provisioningModel field set to STANDARD or if the output omits the provisioningModel field:

    • If the output includes the preemptible field set to true, the VM is a preemptible VM.
    • Otherwise, the VM is a standard VM.
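The decision rules above can be expressed as a small function over the instance resource that the API returns. The following Python sketch is illustrative only: the function name classify_vm is hypothetical, and it treats any provisioningModel value other than SPOT the same way as a missing one.

```python
def classify_vm(instance):
    """Classify an instance resource (parsed JSON) by its scheduling field.

    Returns a (kind, termination_action) tuple, where kind is "spot",
    "preemptible", or "standard". termination_action is None unless the
    VM is a Spot VM.
    """
    scheduling = instance.get("scheduling", {})
    if scheduling.get("provisioningModel") == "SPOT":
        # A missing instanceTerminationAction means the default, STOP.
        return "spot", scheduling.get("instanceTerminationAction", "STOP")
    if scheduling.get("preemptible") is True:
        return "preemptible", None
    return "standard", None


print(classify_vm({"scheduling": {"provisioningModel": "SPOT"}}))
print(classify_vm({"scheduling": {"preemptible": True}}))
print(classify_vm({"scheduling": {"provisioningModel": "STANDARD"}}))
```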

Handle preemption with a shutdown script

When your Spot VMs are preempted by Compute Engine, you can use a shutdown script to perform cleanup actions before each VM is preempted. For example, you can gracefully stop a running process and copy a checkpoint file to Cloud Storage.

The following is an example of a shutdown script that you can add to running Spot VMs or add while creating new Spot VMs. This script runs when the VM starts to shut down, before the operating system's normal kill command stops all remaining processes. After gracefully stopping the desired program, the script performs a parallel upload of a checkpoint file to a Cloud Storage bucket.

#!/bin/bash

MY_PROGRAM="PROGRAM_NAME" # For example, "apache2" or "nginx"
MY_USER="LOCAL_USER"
CHECKPOINT="/home/$MY_USER/checkpoint.out"
GSUTIL_OPTS="-m -o GSUtil:parallel_composite_upload_threshold=32M"
BUCKET_NAME="BUCKET_NAME" # For example, "my-checkpoint-files" (without gs://)

echo "Shutting down!  Seeing if ${MY_PROGRAM} is running."

# Find the newest copy of $MY_PROGRAM
PID="$(pgrep -n "$MY_PROGRAM")"

if [[ "$?" -ne 0 ]]; then
  echo "${MY_PROGRAM} not running, shutting down immediately."
  exit 0
fi

echo "Sending SIGINT to $PID"
kill -2 "$PID"

# Portable waitpid equivalent
while kill -0 "$PID" 2>/dev/null; do
   sleep 1
done

echo "$PID is done, copying ${CHECKPOINT} to gs://${BUCKET_NAME} as ${MY_USER}"

su "${MY_USER}" -c "gsutil $GSUTIL_OPTS cp $CHECKPOINT gs://${BUCKET_NAME}/"

echo "Done uploading, shutting down."

This script assumes the following:

  • The VM was created with at least read/write access to Cloud Storage. For instructions about how to create a VM with the appropriate scopes, see the authentication documentation.

  • You have an existing Cloud Storage bucket and permission to write to it.

To add this script to a VM, configure the script to work with an application on your VM and add it to the VM's metadata.

  1. Copy the preceding shutdown script, or download it to your local workstation, and then replace the following variables:

    • PROGRAM_NAME: the name of the process or program you want to shut down. For example, apache2 or nginx.
    • LOCAL_USER: the username that you use to log in to the VM.
    • BUCKET_NAME: the name of the Cloud Storage bucket where you want to save the program's checkpoint file. The bucket name does not start with gs:// in this case.
  2. Add the shutdown script to a new VM or an existing VM.
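The shutdown script sends SIGINT to the program and waits for it to exit before uploading the checkpoint file, so the program itself needs to write that file when it is interrupted. The following Python sketch shows one way a worker might do this. It is an assumption about how your program could be structured, not part of the script above; the class name CheckpointingWorker, the checkpoint format, and the helper install_sigint_handler are all hypothetical.

```python
import json
import signal


class CheckpointingWorker:
    """Processes items in order and records where to resume after SIGINT."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        # Signal handlers should do very little: only set a flag here.
        self.stop_requested = True

    def process(self, item):
        """Placeholder for the real unit of work."""

    def save_checkpoint(self, next_index):
        # Hypothetical format: the index of the first unprocessed item.
        with open(self.checkpoint_path, "w") as checkpoint:
            json.dump({"next_index": next_index}, checkpoint)

    def run(self, items, start=0):
        next_index = start
        for item in items[start:]:
            if self.stop_requested:
                break
            self.process(item)
            next_index += 1
        # Written on both normal completion and interruption.
        self.save_checkpoint(next_index)
        return next_index


def install_sigint_handler(worker):
    """Route SIGINT (kill -2) to the worker so it stops between items."""
    signal.signal(signal.SIGINT, worker.request_stop)
```

A program would typically create the worker with the same path as CHECKPOINT in the shutdown script, call install_sigint_handler(worker), and then call worker.run(items, start) with the index read from the previous checkpoint, if one exists.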

Detect preemption of Spot VMs

Determine if Spot VMs were preempted by Compute Engine using the Google Cloud console, gcloud CLI, or the Compute Engine API.

Console

You can check if a VM was preempted by checking the system activity logs.

  1. In the Google Cloud console, go to the Logs page.


  2. Select your project and click Continue.

  3. Add compute.instances.preempted to the filter by label or text search field.

  4. Optionally, you can also enter a VM name if you want to see preemption operations for a specific VM.

  5. Press Enter to apply the specified filters. The console updates the list of logs to show only the operations where a VM was preempted.

  6. Select an operation in the list to see details about the VM that was preempted.

gcloud

Use the gcloud compute operations list command with a filter parameter to get a list of preemption events in your project.

gcloud compute operations list \
    --filter="operationType=compute.instances.preempted"

Optionally, you can use additional filter parameters to further scope the results. For example, to see preemption events only for instances within a managed instance group, use the following command:

gcloud compute operations list \
    --filter="operationType=compute.instances.preempted AND targetLink:instances/BASE_INSTANCE_NAME"

where BASE_INSTANCE_NAME is the base name specified as a prefix for the names of all the VMs in this managed instance group.

The output is similar to the following:

NAME                  TYPE                         TARGET                                        HTTP_STATUS STATUS TIMESTAMP
systemevent-xxxxxxxx  compute.instances.preempted  us-central1-f/instances/example-instance-xxx  200         DONE   2015-04-02T12:12:10.881-07:00

An operation type of compute.instances.preempted indicates that the VM instance was preempted. You can use the gcloud compute operations describe command to get more information about a specific preemption operation.

gcloud compute operations describe \
    SYSTEM_EVENT

where SYSTEM_EVENT is the system event from the output of the gcloud compute operations list command, for example systemevent-xxxxxxxx.

The output is similar to the following:

...
operationType: compute.instances.preempted
progress: 100
selfLink: https://compute.googleapis.com/compute/v1/projects/my-project/zones/us-central1-f/operations/systemevent-xxxxxxxx
startTime: '2015-04-02T12:12:10.881-07:00'
status: DONE
statusMessage: Instance was preempted.
...

API

To get a list of recent system operations for a specific project and zone, use the zoneOperations.list method.

GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/operations

Replace the following:

  • PROJECT_ID: the project ID.
  • ZONE: the zone.

Optionally, to scope the response to show only preemption operations, you can add a filter to your API request:

operationType="compute.instances.preempted"

Alternatively, to see preemption operations for a specific VM, add a targetLink parameter to the filter:

operationType="compute.instances.preempted" AND
targetLink="https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME"

Replace the following:

  • PROJECT_ID: the project ID.
  • ZONE: the zone.
  • VM_NAME: the name of a specific VM in this zone and project.

The response contains a list of recent operations. For example, a preemption looks similar to the following:

{
  "kind": "compute#operation",
  "id": "15041793718812375371",
  "name": "systemevent-xxxxxxxx",
  "zone": "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-f",
  "operationType": "compute.instances.preempted",
  "targetLink": "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-f/instances/example-instance",
  "targetId": "12820389800990687210",
  "status": "DONE",
  "statusMessage": "Instance was preempted.",
  ...
}
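The filter expressions and the response shape above can be combined into a short helper. The following Python sketch only builds the request URL and reads a parsed response; it does not send the request or handle authentication. The function names are hypothetical, and the sketch assumes that the filter is passed as a filter query parameter and that the response lists operations under an items key.

```python
from urllib.parse import urlencode

PREEMPTED = "compute.instances.preempted"


def preemption_list_url(project_id, zone, vm_name=None):
    """Build a list-operations URL filtered to preemption operations."""
    base = (
        "https://compute.googleapis.com/compute/v1/"
        f"projects/{project_id}/zones/{zone}/operations"
    )
    expression = f'operationType="{PREEMPTED}"'
    if vm_name:
        target = (
            "https://www.googleapis.com/compute/v1/"
            f"projects/{project_id}/zones/{zone}/instances/{vm_name}"
        )
        expression += f' AND targetLink="{target}"'
    # urlencode percent-encodes the quotes, slashes, and spaces.
    return base + "?" + urlencode({"filter": expression})


def preempted_instances(response):
    """Return the names of the VMs that the listed operations preempted."""
    return [
        operation["targetLink"].rsplit("/", 1)[-1]
        for operation in response.get("items", [])
        if operation.get("operationType") == PREEMPTED
    ]


print(preemption_list_url("my-project", "us-central1-f"))
```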

Alternatively, you can determine if a VM was preempted from inside the VM itself. This is useful if you want to handle a shutdown due to a Compute Engine preemption differently from a normal shutdown in a shutdown script. To do this, simply check the metadata server for the preempted value in your VM's default metadata.

For example, use curl from within your VM to obtain the value for preempted:

curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted" -H "Metadata-Flavor: Google"
TRUE

If this value is TRUE, the VM was preempted by Compute Engine. Otherwise, the value is FALSE.

If you want to use this outside of a shutdown script, you can append ?wait_for_change=true to the URL. This performs a hanging HTTP GET request that only returns when the metadata has changed and the VM has been preempted.

curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted?wait_for_change=true" -H "Metadata-Flavor: Google"
TRUE
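The same metadata check can be made from application code. The following Python sketch mirrors the two curl commands using only the standard library. The function names and the opener parameter (which exists so that the request can be replaced in tests) are hypothetical, and the request only succeeds when it runs on a Compute Engine VM.

```python
import urllib.request

METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
)


def preempted_url(wait_for_change=False):
    """Return the metadata URL, optionally as a hanging GET."""
    return METADATA_URL + ("?wait_for_change=true" if wait_for_change else "")


def fetch_preempted(wait_for_change=False, opener=urllib.request.urlopen):
    """Return True if the metadata server reports that the VM was preempted.

    With wait_for_change=True, the call blocks until the value changes.
    """
    request = urllib.request.Request(
        preempted_url(wait_for_change),
        headers={"Metadata-Flavor": "Google"},  # Required by the server.
    )
    with opener(request) as response:
        return response.read().decode().strip() == "TRUE"


# On a VM, a background thread could block on the hanging GET:
#     if fetch_preempted(wait_for_change=True):
#         start_cleanup()  # hypothetical application hook
```

No timeout is set because the hanging GET is expected to stay open; add one if your application needs to give up after a fixed period.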

Test preemption settings

You can run simulated maintenance events on your VMs to force them to be preempted. Use this feature to test how your apps handle preemption of Spot VMs. To learn how to test maintenance events on your instances, see Testing your availability policies.

You can also simulate a VM preemption by stopping the VM instance, which can be used instead of simulating a maintenance event and which avoids quota limits.

Best practices

Here are some best practices to help you get the most out of Spot VMs.

  • Use instance templates. Rather than creating Spot VMs one at a time, you can use instance templates to create multiple Spot VMs with the same properties. Instance templates are required for using MIGs. Alternatively, you can also create multiple Spot VMs using the bulk instance API.

  • Use MIGs to regionally distribute and automatically recreate Spot VMs. Use MIGs to make workloads on Spot VMs more flexible and resilient. For example, use regional MIGs to distribute VMs across multiple zones, which helps mitigate resource-availability errors. Additionally, use autohealing to automatically recreate Spot VMs after they are preempted.

  • Pick smaller machine types. Resources for Spot VMs come out of excess and backup Google Cloud capacity. Capacity for Spot VMs is often easier to get for smaller machine types, meaning machine types with fewer resources like vCPUs and memory. You might find more capacity for Spot VMs by selecting a smaller custom machine type, but capacity is even more likely for smaller predefined machine types. For example, compared to capacity for the n2-standard-32 predefined machine type, capacity for the n2-custom-24-96 custom machine type is more likely, but capacity for the n2-standard-16 predefined machine type is even more likely.

  • Run large clusters of Spot VMs during off-peak times. The load on Google Cloud data centers varies with location and time of day, but is generally lowest on nights and weekends. As such, nights and weekends are the best times to run large clusters of Spot VMs.

  • Design your applications to be fault and preemption tolerant. It's important to be prepared for the fact that there are changes in preemption patterns at different points in time. For example, if a zone suffers a partial outage, large numbers of Spot VMs could be preempted to make room for standard VMs that need to be moved as part of the recovery. In that small window of time, the preemption rate would look very different than on any other day. If your application assumes that preemptions are always done in small groups, you might not be prepared for such an event. You can test your application's behavior under a preemption event by stopping the VM.

  • Retry creating Spot VMs that have been preempted. If your Spot VMs have been preempted, try creating new Spot VMs once or twice before falling back to standard VMs. Depending on your requirements, it might be a good idea to combine standard VMs and Spot VMs in your clusters to ensure that work proceeds at an adequate pace.

  • Use shutdown scripts. Manage shutdown and preemption notices with a shutdown script that can save a job's progress so that it can pick up where it left off, rather than start over from scratch.
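The retry practice in the list above (try Spot once or twice, then fall back to standard VMs) can be sketched as follows. This is a minimal Python illustration under stated assumptions: create_vm stands in for whatever code you use to create a VM, and both it and the CapacityError exception are hypothetical names, not part of any Google Cloud library.

```python
class CapacityError(Exception):
    """Hypothetical error raised by create_vm when no capacity is available."""


def create_with_fallback(create_vm, spot_attempts=2):
    """Try to create a Spot VM, then fall back to a standard VM.

    create_vm is a caller-supplied function that takes a provisioning
    model ("SPOT" or "STANDARD") and returns the created VM, or raises
    CapacityError if the VM cannot be created.
    """
    for _ in range(spot_attempts):
        try:
            return create_vm("SPOT")
        except CapacityError:
            continue  # Retry Spot before giving up on the discount.
    # Spot capacity was unavailable on every attempt.
    return create_vm("STANDARD")
```

In practice you might add a delay between attempts or try a different zone. Those choices depend on your workload and are left out here.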
