Create and use Spot VMs


This page explains how to create and manage Spot VMs, including the following:

  • How to create, start, and identify Spot VMs
  • How to detect, handle, and test preemption of Spot VMs
  • Best practices for Spot VMs

Spot VMs are virtual machine (VM) instances with the spot provisioning model. Spot VMs are available at a 60-91% discount compared to the price of standard VMs. However, Compute Engine might reclaim the resources by preempting Spot VMs at any time. Spot VMs are recommended only for fault-tolerant applications that can withstand VM preemption. Make sure that your application can handle preemption before you decide to create Spot VMs.

Before you begin

  • Read the conceptual documentation for Spot VMs:
    • Review the limitations and pricing of Spot VMs.
    • To prevent Spot VMs from consuming your quotas for standard VMs' CPUs, GPUs, and disks, consider requesting preemptible quota for Spot VMs.
  • If you haven't already, then set up authentication. Authentication is the process by which your identity is verified for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine by selecting one of the following options:

    Select the tab for how you plan to use the samples on this page:

    Console

    When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.

    gcloud

    1. Install the Google Cloud CLI, then initialize it by running the following command:

      gcloud init
    2. Set a default region and zone.

    Terraform

    To use the Terraform samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.

    1. Install the Google Cloud CLI.
    2. To initialize the gcloud CLI, run the following command:

      gcloud init
    3. If you're using a local shell, then create local authentication credentials for your user account:

      gcloud auth application-default login

      You don't need to do this if you're using Cloud Shell.

    For more information, see Set up authentication for a local development environment.

    REST

    To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

      Install the Google Cloud CLI, then initialize it by running the following command:

      gcloud init

    For more information, see Authenticate for using REST in the Google Cloud authentication documentation.

Create a Spot VM

Create a Spot VM using the Google Cloud console, gcloud CLI, or the Compute Engine API. A Spot VM is any VM that is configured to use the spot provisioning model:

  • VM provisioning model set to Spot in the Google Cloud console
  • --provisioning-model=SPOT in the gcloud CLI
  • "provisioningModel": "SPOT" in the Compute Engine API

Console

  1. In the Google Cloud console, go to the Create an instance page.

    Go to Create an instance

  2. Do the following:

    1. In the Availability policies section, select Spot from the VM provisioning model list. This setting disables automatic restart and host maintenance options for the VM and enables the termination action option.
    2. Optional: In the On VM termination list, select what happens when Compute Engine preempts the VM:
      • To stop the VM during preemption, select Stop (default).
      • To delete the VM during preemption, select Delete.
  3. Optional: Specify other VM options. For more information, see Creating and starting a VM instance.

  4. To create and start the VM, click Create.

gcloud

To create a VM from the gcloud CLI, use the gcloud compute instances create command. To create Spot VMs, you must include the --provisioning-model=SPOT flag. Optionally, you can also specify a termination action for Spot VMs by including the --instance-termination-action flag.

gcloud compute instances create VM_NAME \
    --provisioning-model=SPOT \
    --instance-termination-action=TERMINATION_ACTION

Replace the following:

  • VM_NAME: name of the new VM.
  • TERMINATION_ACTION: Optional: specify which action to take when Compute Engine preempts the VM, either STOP (default behavior) or DELETE.

For more information about the options you can specify when creating a VM, see Creating and starting a VM instance. For example, to create Spot VMs with a specified machine type and image, use the following command:

gcloud compute instances create VM_NAME \
    --provisioning-model=SPOT \
    [--image=IMAGE | --image-family=IMAGE_FAMILY] \
    --image-project=IMAGE_PROJECT \
    --machine-type=MACHINE_TYPE \
    --instance-termination-action=TERMINATION_ACTION

Replace the following:

  • VM_NAME: name of the new VM.
  • IMAGE or IMAGE_FAMILY: specify one of the following:
    • IMAGE: a specific version of a public image. For example, a specific image is --image=debian-10-buster-v20200309.
    • IMAGE_FAMILY: an image family. This creates the VM from the most recent, non-deprecated OS image. For example, if you specify --image-family=debian-10, Compute Engine creates a VM from the latest version of the OS image in the Debian 10 image family.
  • IMAGE_PROJECT: the project containing the image. For example, if you specify debian-10 as the image family, specify debian-cloud as the image project.
  • MACHINE_TYPE: the predefined or custom machine type for the new VM.
  • TERMINATION_ACTION: Optional: specify which action to take when Compute Engine preempts the VM, either STOP (default behavior) or DELETE.

    To get a list of the machine types available in a zone, use the gcloud compute machine-types list command with the --zones flag.

Terraform

You can use a Terraform resource to create a Spot VM by using the scheduling block:


resource "google_compute_instance" "spot_vm_instance" {
  name         = "spot-instance-name"
  machine_type = "f1-micro"
  zone         = "us-central1-c"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  scheduling {
    preemptible                 = true
    automatic_restart           = false
    provisioning_model          = "SPOT"
    instance_termination_action = "STOP"
  }

  network_interface {
    # A default network is created for all GCP projects
    network = "default"
    access_config {
    }
  }
}
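Note that in the Terraform provider, setting provisioning_model = "SPOT" also expects preemptible = true and automatic_restart = false, which is why the preceding example sets all three.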

REST

To create a VM from the Compute Engine API, use the instances.insert method. You must specify a machine type and name for the VM. Optionally, you can also specify an image for the boot disk.

To create Spot VMs, you must include the "provisioningModel": "SPOT" field. Optionally, you can also specify a termination action for Spot VMs by including the "instanceTerminationAction" field.

POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances
{
 "machineType": "zones/ZONE/machineTypes/MACHINE_TYPE",
 "name": "VM_NAME",
 "disks": [
   {
     "initializeParams": {
       "sourceImage": "projects/IMAGE_PROJECT/global/images/IMAGE"
     },
     "boot": true
   }
 ],
 "scheduling":
 {
     "provisioningModel": "SPOT",
     "instanceTerminationAction": "TERMINATION_ACTION"
 },
 ...
}

Replace the following:

  • PROJECT_ID: the project ID of the project to create the VM in.
  • ZONE: the zone to create the VM in. The zone must also support the machine type to use for the new VM.
  • MACHINE_TYPE: the predefined or custom machine type for the new VM.
  • VM_NAME: the name of the new VM.
  • IMAGE_PROJECT: the project containing the image. For example, if you specify family/debian-10 as the image family, specify debian-cloud as the image project.
  • IMAGE: specify one of the following:
    • A specific version of a public image. For example, a specific image is "sourceImage": "projects/debian-cloud/global/images/debian-10-buster-v20200309" where debian-cloud is the IMAGE_PROJECT.
    • An image family. This creates the VM from the most recent, non-deprecated OS image. For example, if you specify "sourceImage": "projects/debian-cloud/global/images/family/debian-10" where debian-cloud is the IMAGE_PROJECT, Compute Engine creates a VM from the latest version of the OS image in the Debian 10 image family.
  • TERMINATION_ACTION: Optional: specify which action to take when Compute Engine preempts the VM, either STOP (default behavior) or DELETE.

For more information about the options you can specify when creating a VM, see Creating and starting a VM instance.

Go


import (
	"context"
	"fmt"
	"io"

	compute "cloud.google.com/go/compute/apiv1"
	"cloud.google.com/go/compute/apiv1/computepb"
	"google.golang.org/protobuf/proto"
)

// createSpotInstance creates a new Spot VM instance with Debian 11 operating system.
func createSpotInstance(w io.Writer, projectID, zone, instanceName string) error {
	// projectID := "your_project_id"
	// zone := "europe-central2-b"
	// instanceName := "your_instance_name"

	ctx := context.Background()
	imagesClient, err := compute.NewImagesRESTClient(ctx)
	if err != nil {
		return fmt.Errorf("NewImagesRESTClient: %w", err)
	}
	defer imagesClient.Close()

	instancesClient, err := compute.NewInstancesRESTClient(ctx)
	if err != nil {
		return fmt.Errorf("NewInstancesRESTClient: %w", err)
	}
	defer instancesClient.Close()

	req := &computepb.GetFromFamilyImageRequest{
		Project: "debian-cloud",
		Family:  "debian-11",
	}

	image, err := imagesClient.GetFromFamily(ctx, req)
	if err != nil {
		return fmt.Errorf("getImageFromFamily: %w", err)
	}

	diskType := fmt.Sprintf("zones/%s/diskTypes/pd-standard", zone)
	disks := []*computepb.AttachedDisk{
		{
			AutoDelete: proto.Bool(true),
			Boot:       proto.Bool(true),
			InitializeParams: &computepb.AttachedDiskInitializeParams{
				DiskSizeGb:  proto.Int64(10),
				DiskType:    proto.String(diskType),
				SourceImage: proto.String(image.GetSelfLink()),
			},
			Type: proto.String(computepb.AttachedDisk_PERSISTENT.String()),
		},
	}

	req2 := &computepb.InsertInstanceRequest{
		Project: projectID,
		Zone:    zone,
		InstanceResource: &computepb.Instance{
			Name:        proto.String(instanceName),
			Disks:       disks,
			MachineType: proto.String(fmt.Sprintf("zones/%s/machineTypes/%s", zone, "n1-standard-1")),
			NetworkInterfaces: []*computepb.NetworkInterface{
				{
					Name: proto.String("global/networks/default"),
				},
			},
			Scheduling: &computepb.Scheduling{
				ProvisioningModel: proto.String(computepb.Scheduling_SPOT.String()),
			},
		},
	}
	op, err := instancesClient.Insert(ctx, req2)
	if err != nil {
		return fmt.Errorf("insert: %w", err)
	}

	if err = op.Wait(ctx); err != nil {
		return fmt.Errorf("unable to wait for the operation: %w", err)
	}

	instance, err := instancesClient.Get(ctx, &computepb.GetInstanceRequest{
		Project:  projectID,
		Zone:     zone,
		Instance: instanceName,
	})

	if err != nil {
		return fmt.Errorf("createInstance: %w", err)
	}

	fmt.Fprintf(w, "Instance created: %v\n", instance)
	return nil
}

Java


import com.google.cloud.compute.v1.AccessConfig;
import com.google.cloud.compute.v1.AccessConfig.Type;
import com.google.cloud.compute.v1.Address.NetworkTier;
import com.google.cloud.compute.v1.AttachedDisk;
import com.google.cloud.compute.v1.AttachedDiskInitializeParams;
import com.google.cloud.compute.v1.ImagesClient;
import com.google.cloud.compute.v1.InsertInstanceRequest;
import com.google.cloud.compute.v1.Instance;
import com.google.cloud.compute.v1.InstancesClient;
import com.google.cloud.compute.v1.NetworkInterface;
import com.google.cloud.compute.v1.Scheduling;
import com.google.cloud.compute.v1.Scheduling.ProvisioningModel;
import java.io.IOException;
import java.util.UUID;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateSpotVm {
  public static void main(String[] args)
          throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to use.
    String projectId = "your-project-id";
    // Name of the virtual machine to check.
    String instanceName = "your-instance-name";
    // Name of the zone you want to use. For example: "us-west3-b"
    String zone = "your-zone";

    createSpotInstance(projectId, instanceName, zone);
  }

  // Create a new Spot VM instance with Debian 11 operating system.
  public static Instance createSpotInstance(String projectId, String instanceName, String zone)
          throws IOException, ExecutionException, InterruptedException, TimeoutException {
    String image;
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (ImagesClient imagesClient = ImagesClient.create()) {
      image = imagesClient.getFromFamily("debian-cloud", "debian-11").getSelfLink();
    }
    AttachedDisk attachedDisk = buildAttachedDisk(image, zone);
    String machineTypes = String.format("zones/%s/machineTypes/%s", zone, "n1-standard-1");

    // Send an instance creation request to the Compute Engine API and wait for it to complete.
    Instance instance =
            createInstance(projectId, zone, instanceName, attachedDisk, true, machineTypes, false);

    System.out.printf("Spot instance '%s' has been created successfully", instance.getName());

    return instance;
  }

  // disks: a list of compute_v1.AttachedDisk objects describing the disks
  //     you want to attach to your new instance.
  // machine_type: machine type of the VM being created. This value uses the
  //     following format: "zones/{zone}/machineTypes/{type_name}".
  //     For example: "zones/europe-west3-c/machineTypes/f1-micro"
  // external_access: boolean flag indicating if the instance should have an external IPv4
  //     address assigned.
  // spot: boolean value indicating if the new instance should be a Spot VM or not.
  private static Instance createInstance(String projectId, String zone, String instanceName,
                                         AttachedDisk disk, boolean isSpot, String machineType,
                                         boolean externalAccess)
          throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (InstancesClient client = InstancesClient.create()) {
      Instance instanceResource =
              buildInstanceResource(instanceName, disk, machineType, externalAccess, isSpot);

      InsertInstanceRequest build = InsertInstanceRequest.newBuilder()
              .setProject(projectId)
              .setRequestId(UUID.randomUUID().toString())
              .setZone(zone)
              .setInstanceResource(instanceResource)
              .build();
      client.insertCallable().futureCall(build).get(60, TimeUnit.SECONDS);

      return client.get(projectId, zone, instanceName);
    }
  }

  private static Instance buildInstanceResource(String instanceName, AttachedDisk disk,
                                                String machineType, boolean externalAccess,
                                                boolean isSpot) {
    NetworkInterface networkInterface =
            networkInterface(externalAccess);
    Instance.Builder builder = Instance.newBuilder()
            .setName(instanceName)
            .addDisks(disk)
            .setMachineType(machineType)
            .addNetworkInterfaces(networkInterface);

    if (isSpot) {
      // Set the Spot VM setting
      Scheduling.Builder scheduling = builder.getScheduling()
              .toBuilder()
              .setProvisioningModel(ProvisioningModel.SPOT.name())
              .setInstanceTerminationAction("STOP");
      builder.setScheduling(scheduling);
    }

    return builder.build();
  }

  private static NetworkInterface networkInterface(boolean externalAccess) {
    NetworkInterface.Builder build = NetworkInterface.newBuilder()
            .setNetwork("global/networks/default");

    if (externalAccess) {
      AccessConfig.Builder accessConfig = AccessConfig.newBuilder()
              .setType(Type.ONE_TO_ONE_NAT.name())
              .setName("External NAT")
              .setNetworkTier(NetworkTier.PREMIUM.name());
      build.addAccessConfigs(accessConfig.build());
    }

    return build.build();
  }

  private static AttachedDisk buildAttachedDisk(String sourceImage, String zone) {
    AttachedDiskInitializeParams initializeParams = AttachedDiskInitializeParams.newBuilder()
            .setSourceImage(sourceImage)
            .setDiskSizeGb(10)
            .setDiskType(String.format("zones/%s/diskTypes/pd-standard", zone))
            .build();
    return AttachedDisk.newBuilder()
            .setInitializeParams(initializeParams)
            // Remember to set auto_delete to True if you want the disk to be deleted
            // when you delete your VM instance.
            .setAutoDelete(true)
            .setBoot(true)
            .build();
  }
}

Python

from __future__ import annotations

import re
import sys
from typing import Any
import warnings

from google.api_core.extended_operation import ExtendedOperation
from google.cloud import compute_v1


def get_image_from_family(project: str, family: str) -> compute_v1.Image:
    """
    Retrieve the newest image that is part of a given family in a project.

    Args:
        project: project ID or project number of the Cloud project you want to get image from.
        family: name of the image family you want to get image from.

    Returns:
        An Image object.
    """
    image_client = compute_v1.ImagesClient()
    # List of public operating system (OS) images: https://cloud.google.com/compute/docs/images/os-details
    newest_image = image_client.get_from_family(project=project, family=family)
    return newest_image


def disk_from_image(
    disk_type: str,
    disk_size_gb: int,
    boot: bool,
    source_image: str,
    auto_delete: bool = True,
) -> compute_v1.AttachedDisk:
    """
    Create an AttachedDisk object to be used in VM instance creation. Uses an image as the
    source for the new disk.

    Args:
         disk_type: the type of disk you want to create. This value uses the following format:
            "zones/{zone}/diskTypes/(pd-standard|pd-ssd|pd-balanced|pd-extreme)".
            For example: "zones/us-west3-b/diskTypes/pd-ssd"
        disk_size_gb: size of the new disk in gigabytes
        boot: boolean flag indicating whether this disk should be used as a boot disk of an instance
        source_image: source image to use when creating this disk. You must have read access to this disk. This can be one
            of the publicly available images or an image from one of your projects.
            This value uses the following format: "projects/{project_name}/global/images/{image_name}"
        auto_delete: boolean flag indicating whether this disk should be deleted with the VM that uses it

    Returns:
        AttachedDisk object configured to be created using the specified image.
    """
    boot_disk = compute_v1.AttachedDisk()
    initialize_params = compute_v1.AttachedDiskInitializeParams()
    initialize_params.source_image = source_image
    initialize_params.disk_size_gb = disk_size_gb
    initialize_params.disk_type = disk_type
    boot_disk.initialize_params = initialize_params
    # Remember to set auto_delete to True if you want the disk to be deleted when you delete
    # your VM instance.
    boot_disk.auto_delete = auto_delete
    boot_disk.boot = boot
    return boot_disk


def wait_for_extended_operation(
    operation: ExtendedOperation, verbose_name: str = "operation", timeout: int = 300
) -> Any:
    """
    Waits for the extended (long-running) operation to complete.

    If the operation is successful, it will return its result.
    If the operation ends with an error, an exception will be raised.
    If there were any warnings during the execution of the operation
    they will be printed to sys.stderr.

    Args:
        operation: a long-running operation you want to wait on.
        verbose_name: (optional) a more verbose name of the operation,
            used only during error and warning reporting.
        timeout: how long (in seconds) to wait for operation to finish.
            If None, wait indefinitely.

    Returns:
        Whatever the operation.result() returns.

    Raises:
        This method will raise the exception received from `operation.exception()`
        or RuntimeError if there is no exception set, but there is an `error_code`
        set for the `operation`.

        In case of an operation taking longer than `timeout` seconds to complete,
        a `concurrent.futures.TimeoutError` will be raised.
    """
    result = operation.result(timeout=timeout)

    if operation.error_code:
        print(
            f"Error during {verbose_name}: [Code: {operation.error_code}]: {operation.error_message}",
            file=sys.stderr,
            flush=True,
        )
        print(f"Operation ID: {operation.name}", file=sys.stderr, flush=True)
        raise operation.exception() or RuntimeError(operation.error_message)

    if operation.warnings:
        print(f"Warnings during {verbose_name}:\n", file=sys.stderr, flush=True)
        for warning in operation.warnings:
            print(f" - {warning.code}: {warning.message}", file=sys.stderr, flush=True)

    return result


def create_instance(
    project_id: str,
    zone: str,
    instance_name: str,
    disks: list[compute_v1.AttachedDisk],
    machine_type: str = "n1-standard-1",
    network_link: str = "global/networks/default",
    subnetwork_link: str = None,
    internal_ip: str = None,
    external_access: bool = False,
    external_ipv4: str = None,
    accelerators: list[compute_v1.AcceleratorConfig] = None,
    preemptible: bool = False,
    spot: bool = False,
    instance_termination_action: str = "STOP",
    custom_hostname: str = None,
    delete_protection: bool = False,
) -> compute_v1.Instance:
    """
    Send an instance creation request to the Compute Engine API and wait for it to complete.

    Args:
        project_id: project ID or project number of the Cloud project you want to use.
        zone: name of the zone to create the instance in. For example: "us-west3-b"
        instance_name: name of the new virtual machine (VM) instance.
        disks: a list of compute_v1.AttachedDisk objects describing the disks
            you want to attach to your new instance.
        machine_type: machine type of the VM being created. This value uses the
            following format: "zones/{zone}/machineTypes/{type_name}".
            For example: "zones/europe-west3-c/machineTypes/f1-micro"
        network_link: name of the network you want the new instance to use.
            For example: "global/networks/default" represents the network
            named "default", which is created automatically for each project.
        subnetwork_link: name of the subnetwork you want the new instance to use.
            This value uses the following format:
            "regions/{region}/subnetworks/{subnetwork_name}"
        internal_ip: internal IP address you want to assign to the new instance.
            By default, a free address from the pool of available internal IP addresses of
            used subnet will be used.
        external_access: boolean flag indicating if the instance should have an external IPv4
            address assigned.
        external_ipv4: external IPv4 address to be assigned to this instance. If you specify
            an external IP address, it must live in the same region as the zone of the instance.
            This setting requires `external_access` to be set to True to work.
        accelerators: a list of AcceleratorConfig objects describing the accelerators that will
            be attached to the new instance.
        preemptible: boolean value indicating if the new instance should be preemptible
            or not. Preemptible VMs have been deprecated and you should now use Spot VMs.
        spot: boolean value indicating if the new instance should be a Spot VM or not.
        instance_termination_action: What action should be taken once a Spot VM is terminated.
            Possible values: "STOP", "DELETE"
        custom_hostname: Custom hostname of the new VM instance.
            Custom hostnames must conform to RFC 1035 requirements for valid hostnames.
        delete_protection: boolean value indicating if the new virtual machine should be
            protected against deletion or not.
    Returns:
        Instance object.
    """
    instance_client = compute_v1.InstancesClient()

    # Use the network interface provided in the network_link argument.
    network_interface = compute_v1.NetworkInterface()
    network_interface.network = network_link
    if subnetwork_link:
        network_interface.subnetwork = subnetwork_link

    if internal_ip:
        network_interface.network_i_p = internal_ip

    if external_access:
        access = compute_v1.AccessConfig()
        access.type_ = compute_v1.AccessConfig.Type.ONE_TO_ONE_NAT.name
        access.name = "External NAT"
        access.network_tier = access.NetworkTier.PREMIUM.name
        if external_ipv4:
            access.nat_i_p = external_ipv4
        network_interface.access_configs = [access]

    # Collect information into the Instance object.
    instance = compute_v1.Instance()
    instance.network_interfaces = [network_interface]
    instance.name = instance_name
    instance.disks = disks
    if re.match(r"^zones/[a-z\d\-]+/machineTypes/[a-z\d\-]+$", machine_type):
        instance.machine_type = machine_type
    else:
        instance.machine_type = f"zones/{zone}/machineTypes/{machine_type}"

    instance.scheduling = compute_v1.Scheduling()
    if accelerators:
        instance.guest_accelerators = accelerators
        instance.scheduling.on_host_maintenance = (
            compute_v1.Scheduling.OnHostMaintenance.TERMINATE.name
        )

    if preemptible:
        # Set the preemptible setting
        warnings.warn(
            "Preemptible VMs are being replaced by Spot VMs.", DeprecationWarning
        )
        instance.scheduling = compute_v1.Scheduling()
        instance.scheduling.preemptible = True

    if spot:
        # Set the Spot VM setting
        instance.scheduling.provisioning_model = (
            compute_v1.Scheduling.ProvisioningModel.SPOT.name
        )
        instance.scheduling.instance_termination_action = instance_termination_action

    if custom_hostname is not None:
        # Set the custom hostname for the instance
        instance.hostname = custom_hostname

    if delete_protection:
        # Set the delete protection bit
        instance.deletion_protection = True

    # Prepare the request to insert an instance.
    request = compute_v1.InsertInstanceRequest()
    request.zone = zone
    request.project = project_id
    request.instance_resource = instance

    # Wait for the create operation to complete.
    print(f"Creating the {instance_name} instance in {zone}...")

    operation = instance_client.insert(request=request)

    wait_for_extended_operation(operation, "instance creation")

    print(f"Instance {instance_name} created.")
    return instance_client.get(project=project_id, zone=zone, instance=instance_name)


def create_spot_instance(
    project_id: str, zone: str, instance_name: str
) -> compute_v1.Instance:
    """
    Create a new Spot VM instance with Debian 11 operating system.

    Args:
        project_id: project ID or project number of the Cloud project you want to use.
        zone: name of the zone to create the instance in. For example: "us-west3-b"
        instance_name: name of the new virtual machine (VM) instance.

    Returns:
        Instance object.
    """
    newest_debian = get_image_from_family(project="debian-cloud", family="debian-11")
    disk_type = f"zones/{zone}/diskTypes/pd-standard"
    disks = [disk_from_image(disk_type, 10, True, newest_debian.self_link)]
    instance = create_instance(project_id, zone, instance_name, disks, spot=True)
    return instance

To create multiple Spot VMs with the same properties, you can create an instance template and use the template to create a managed instance group (MIG). For more information, see Best practices.
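If you script template creation, a minimal sketch with the same google-cloud-compute Python library used in the preceding samples might look like the following. The template name, machine type, image family, and disk size here are illustrative placeholders, not required values.

from google.cloud import compute_v1


def create_spot_template(
    project_id: str, template_name: str
) -> compute_v1.InstanceTemplate:
    """Create an instance template whose VMs use the spot provisioning model."""
    disk = compute_v1.AttachedDisk(
        auto_delete=True,
        boot=True,
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/debian-cloud/global/images/family/debian-11",
            disk_size_gb=10,
        ),
    )

    template = compute_v1.InstanceTemplate(name=template_name)
    template.properties = compute_v1.InstanceProperties(
        machine_type="n1-standard-1",  # Templates use the bare machine type name.
        disks=[disk],
        network_interfaces=[
            compute_v1.NetworkInterface(network="global/networks/default")
        ],
        # The spot provisioning model is configured in the scheduling block.
        scheduling=compute_v1.Scheduling(
            provisioning_model=compute_v1.Scheduling.ProvisioningModel.SPOT.name,
            instance_termination_action="STOP",
        ),
    )

    template_client = compute_v1.InstanceTemplatesClient()
    operation = template_client.insert(
        project=project_id, instance_template_resource=template
    )
    operation.result()  # Wait for the insert operation to finish.
    return template_client.get(project=project_id, instance_template=template_name)

You can then reference the template when creating a MIG.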

Start Spot VMs

Like other VMs, Spot VMs start upon creation. Likewise, if Spot VMs are stopped, you can restart the VMs to resume the RUNNING state. You can stop and restart preempted Spot VMs as many times as you would like, as long as there is capacity. For more information, see VM instance life cycle.

If Compute Engine stops one or more Spot VMs in an autoscaling managed instance group (MIG) or Google Kubernetes Engine (GKE) cluster, the group restarts the VMs when the resources become available again.
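For example, the following minimal sketch, using the same Python client library as the earlier samples, restarts a stopped Spot VM. Whether the start succeeds depends on available Spot capacity at that time.

from google.cloud import compute_v1


def restart_spot_vm(project_id: str, zone: str, instance_name: str) -> None:
    """Restart a stopped Spot VM, subject to available Spot capacity."""
    instance_client = compute_v1.InstancesClient()
    operation = instance_client.start(
        project=project_id, zone=zone, instance=instance_name
    )
    # result() raises an exception if the start request fails, for example
    # due to a lack of Spot capacity in the zone.
    operation.result()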

Identify a VM's provisioning model and termination action

Identify a VM's provisioning model to see if it is a standard VM, Spot VM, or preemptible VM. For a Spot VM, you can also identify the termination action. You can identify a VM's provisioning model and termination action using the Google Cloud console, gcloud CLI, or the Compute Engine API.

Console

  1. Go to the VM instances page.

    Go to the VM instances page

  2. Click the Name of the VM you want to identify. The VM instance details page opens.

  3. Go to the Management section at the bottom of the page. In the Availability policies subsection, check the following options:

    • If the VM provisioning model is set to Spot, the VM is a Spot VM.
      • On VM termination indicates which action to take when Compute Engine preempts the VM, either Stop or Delete the VM.
    • Otherwise, if the VM provisioning model is set to Standard:
      • If the Preemptibility option is set to On, the VM is a preemptible VM.
      • Otherwise, the VM is a standard VM.

gcloud

To describe a VM from the gcloud CLI, use the gcloud compute instances describe command:

gcloud compute instances describe VM_NAME

where VM_NAME is the name of the VM that you want to check.

In the output, check the scheduling field to identify the VM:

  • If the output includes the provisioningModel field set to SPOT, similar to the following, the VM is a Spot VM.

    ...
    scheduling:
    ...
    provisioningModel: SPOT
    instanceTerminationAction: TERMINATION_ACTION
    ...
    

    where TERMINATION_ACTION indicates which action to take when Compute Engine preempts the VM, either stop (STOP) or delete (DELETE) the VM. If the instanceTerminationAction field is missing, the default value is STOP.

  • Otherwise, if the output includes the provisioningModel field set to standard or if the output omits the provisioningModel field:

    • If the output includes the preemptible field set to true, the VM is a preemptible VM.
    • Otherwise, the VM is a standard VM.

REST

To describe a VM from the Compute Engine API, use the instances.get method:

GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME

Replace the following:

  • PROJECT_ID: the project ID of the project that the VM is in.
  • ZONE: the zone where the VM is located.
  • VM_NAME: the name of the VM that you want to check.

In the output, check the scheduling field to identify the VM:

  • If the output includes the provisioningModel field set to SPOT, similar to the following, the VM is a Spot VM.

    {
      ...
      "scheduling":
      {
         ...
         "provisioningModel": "SPOT",
         "instanceTerminationAction": "TERMINATION_ACTION"
         ...
      },
      ...
    }
    

    where TERMINATION_ACTION indicates which action to take when Compute Engine preempts the VM, either stop (STOP) or delete (DELETE) the VM. If the instanceTerminationAction field is missing, the default value is STOP.

  • Otherwise, if the output includes the provisioningModel field set to standard or if the output omits the provisioningModel field:

    • If the output includes the preemptible field set to true, the VM is a preemptible VM.
    • Otherwise, the VM is a standard VM.

Go


import (
	"context"
	"fmt"
	"io"

	compute "cloud.google.com/go/compute/apiv1"
	"cloud.google.com/go/compute/apiv1/computepb"
)

// isSpotVM checks if a given instance is a Spot VM or not.
func isSpotVM(w io.Writer, projectID, zone, instanceName string) (bool, error) {
	// projectID := "your_project_id"
	// zone := "europe-central2-b"
	// instanceName := "your_instance_name"
	ctx := context.Background()
	client, err := compute.NewInstancesRESTClient(ctx)
	if err != nil {
		return false, fmt.Errorf("NewInstancesRESTClient: %w", err)
	}
	defer client.Close()

	req := &computepb.GetInstanceRequest{
		Project:  projectID,
		Zone:     zone,
		Instance: instanceName,
	}

	instance, err := client.Get(ctx, req)
	if err != nil {
		return false, fmt.Errorf("GetInstance: %w", err)
	}

	isSpot := instance.GetScheduling().GetProvisioningModel() == computepb.Scheduling_SPOT.String()

	var isSpotMessage string
	if !isSpot {
		isSpotMessage = " not"
	}
	fmt.Fprintf(w, "Instance %s is%s spot\n", instanceName, isSpotMessage)

	return isSpot, nil
}

Java


import com.google.cloud.compute.v1.Instance;
import com.google.cloud.compute.v1.InstancesClient;
import com.google.cloud.compute.v1.Scheduling;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class CheckIsSpotVm {
  public static void main(String[] args)
          throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to use.
    String projectId = "your-project-id";
    // Name of the virtual machine to check.
    String instanceName = "your-route-name";
    // Name of the zone you want to use. For example: "us-west3-b"
    String zone = "your-zone";

    boolean isSpotVm = isSpotVm(projectId, instanceName, zone);
    System.out.printf("Is %s spot VM instance - %s", instanceName, isSpotVm);
  }

  // Check if a given instance is Spot VM or not.
  public static boolean isSpotVm(String projectId, String instanceName, String zone)
          throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (InstancesClient client = InstancesClient.create()) {
      Instance instance = client.get(projectId, zone, instanceName);

      return instance.getScheduling().getProvisioningModel()
              .equals(Scheduling.ProvisioningModel.SPOT.name());
    }
  }
}

Python

from google.cloud import compute_v1


def is_spot_vm(project_id: str, zone: str, instance_name: str) -> bool:
    """
    Check if a given instance is Spot VM or not.
    Args:
        project_id: project ID or project number of the Cloud project you want to use.
        zone: name of the zone you want to use. For example: "us-west3-b"
        instance_name: name of the virtual machine to check.
    Returns:
        The Spot VM status of the instance.
    """
    instance_client = compute_v1.InstancesClient()
    instance = instance_client.get(
        project=project_id, zone=zone, instance=instance_name
    )
    return (
        instance.scheduling.provisioning_model
        == compute_v1.Scheduling.ProvisioningModel.SPOT.name
    )

Handle preemption with a shutdown script

When Compute Engine preempts a Spot VM, you can use a shutdown script to try to perform cleanup actions before the VM is preempted. For example, you can gracefully stop a running process and copy a checkpoint file to Cloud Storage. Notably, the maximum length of the shutdown period is shorter for a preemption notice than for a user-initiated shutdown. For more information about the shutdown period for a preemption notice, see Preemption process in the conceptual documentation for Spot VMs.

The following is an example of a shutdown script that you can add to a running Spot VM or add while creating a new Spot VM. This script runs when the VM starts to shut down, before the operating system's normal kill command stops all remaining processes. After gracefully stopping the desired program, the script performs a parallel upload of a checkpoint file to a Cloud Storage bucket.

#!/bin/bash

MY_PROGRAM="PROGRAM_NAME" # For example, "apache2" or "nginx"
MY_USER="LOCAL_USER"
CHECKPOINT="/home/$MY_USER/checkpoint.out"
BUCKET_NAME="BUCKET_NAME" # For example, "my-checkpoint-files" (without gs://)

echo "Shutting down!  Seeing if ${MY_PROGRAM} is running."

# Find the newest copy of $MY_PROGRAM
PID="$(pgrep -n "$MY_PROGRAM")"

if [[ "$?" -ne 0 ]]; then
  echo "${MY_PROGRAM} not running, shutting down immediately."
  exit 0
fi

echo "Sending SIGINT to $PID"
kill -2 "$PID"

# Portable waitpid equivalent
while kill -0 "$PID"; do
   sleep 1
done

echo "$PID is done, copying ${CHECKPOINT} to gs://${BUCKET_NAME} as ${MY_USER}"

su "${MY_USER}" -c "gcloud storage cp $CHECKPOINT gs://${BUCKET_NAME}/"

echo "Done uploading, shutting down."

This script assumes the following:

  • The VM was created with at least read/write access to Cloud Storage. For instructions about how to create a VM with the appropriate scopes, see the authentication documentation.

  • You have an existing Cloud Storage bucket and permission to write to it.

To add this script to a VM, configure the script to work with an application on your VM and add it to the VM's metadata.

  1. Copy or download the shutdown script:

    • Copy the preceding shutdown script after replacing the following:

      • PROGRAM_NAME is the name of the process or program you want to shut down. For example, apache2 or nginx.
      • LOCAL_USER is the username you are logged into the virtual machine as.
      • BUCKET_NAME is the name of the Cloud Storage bucket where you want to save the program's checkpoint file. Note the bucket name does not start with gs:// in this case.
    • Download the shutdown script to your local workstation and then replace the following variables in the file:

      • [PROGRAM_NAME] is the name of the process or program you want to shut down. For example, apache2 or nginx.
      • [LOCAL_USER] is the username you are logged into the virtual machine as.
      • [BUCKET_NAME] is the name of the Cloud Storage bucket where you want to save the program's checkpoint file. Note the bucket name does not start with gs:// in this case.
  2. Add the shutdown script to a new VM or an existing VM.
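If you prefer to perform step 2 programmatically for an existing VM, the following sketch with the Python client library reads the VM's current metadata, replaces any existing shutdown-script key, and writes the metadata back. The script path parameter is an assumption; point it at your edited copy of the preceding script.

from pathlib import Path

from google.cloud import compute_v1


def add_shutdown_script(
    project_id: str, zone: str, instance_name: str, script_path: str
) -> None:
    """Attach a local shutdown script to an existing VM's metadata."""
    instance_client = compute_v1.InstancesClient()
    instance = instance_client.get(
        project=project_id, zone=zone, instance=instance_name
    )

    # Reuse the VM's current metadata object; its fingerprint guards
    # against concurrent metadata updates.
    metadata = instance.metadata
    items = [item for item in metadata.items if item.key != "shutdown-script"]
    items.append(
        compute_v1.Items(key="shutdown-script", value=Path(script_path).read_text())
    )
    metadata.items = items

    operation = instance_client.set_metadata(
        project=project_id,
        zone=zone,
        instance=instance_name,
        metadata_resource=metadata,
    )
    operation.result()  # Wait for the metadata update to finish.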

Detect preemption of Spot VMs

Determine if Spot VMs were preempted by Compute Engine by using the Google Cloud console, gcloud CLI, or the Compute Engine API.

Console

You can determine if a VM was preempted by viewing the system activity logs.

  1. In the Google Cloud console, go to the Logs page.

    Go to Logs

  2. Select your project and click Continue.

  3. Add compute.instances.preempted to the filter by label or text search field.

  4. Optionally, you can also enter a VM name if you want to see preemption operations for a specific VM.

  5. Press enter to apply the specified filters. The Google Cloud console updates the list of logs to show only the operations where a VM was preempted.

  6. Select an operation in the list to see details about the VM that was preempted.

gcloud

Use the gcloud compute operations list command with a filter parameter to get a list of preemption events in your project.

gcloud compute operations list \
    --filter="operationType=compute.instances.preempted"

Optionally, you can use additional filter parameters to further scope the results. For example, to see preemption events only for instances within a managed instance group, use the following command:

gcloud compute operations list \
    --filter="operationType=compute.instances.preempted AND targetLink:instances/BASE_INSTANCE_NAME"

where BASE_INSTANCE_NAME is the base name specified as a prefix for the names of all the VMs in this managed instance group.

The output is similar to the following:

NAME                  TYPE                         TARGET                                        HTTP_STATUS STATUS TIMESTAMP
systemevent-xxxxxxxx  compute.instances.preempted  us-central1-f/instances/example-instance-xxx  200         DONE   2015-04-02T12:12:10.881-07:00

An operation type of compute.instances.preempted indicates that the VM instance was preempted. You can use the gcloud compute operations describe command to get more information about a specific preemption operation.

gcloud compute operations describe SYSTEM_EVENT \
    --zone=ZONE

Replace the following:

  • SYSTEM_EVENT: the system event from the output of the gcloud compute operations list command—for example, systemevent-xxxxxxxx.
  • ZONE: the zone of the system event—for example, us-central1-f.

The output is similar to the following:

...
operationType: compute.instances.preempted
progress: 100
selfLink: https://compute.googleapis.com/compute/v1/projects/my-project/zones/us-central1-f/operations/systemevent-xxxxxxxx
startTime: '2015-04-02T12:12:10.881-07:00'
status: DONE
statusMessage: Instance was preempted.
...

REST

To get a list of recent system operations for a specific project and zone, use the zoneOperations.list method.

GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/operations

Replace the following:

  • PROJECT_ID: the project ID of the project that the VMs are in.
  • ZONE: the zone where the VMs are located.

Optionally, to scope the response to show only preemption operations, you can add a filter to your API request:

operationType="compute.instances.preempted"

Alternatively, to see preemption operations for a specific VM, add a targetLink parameter to the filter:

operationType="compute.instances.preempted" AND
targetLink="https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME

Replace the following:

  • PROJECT_ID: the project ID.
  • ZONE: the zone.
  • VM_NAME: the name of a specific VM in this zone and project.

The response contains a list of recent operations. For example, a preemption looks similar to the following:

{
  "kind": "compute#operation",
  "id": "15041793718812375371",
  "name": "systemevent-xxxxxxxx",
  "zone": "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-f",
  "operationType": "compute.instances.preempted",
  "targetLink": "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-f/instances/example-instance",
  "targetId": "12820389800990687210",
  "status": "DONE",
  "statusMessage": "Instance was preempted.",
  ...
}
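The tabs for this task end at REST. If you want the same check from application code, a comparable sketch with the google-cloud-compute Python library (the library used in the creation samples) might look like the following; the filter string matches the one shown in the gcloud tab.

from __future__ import annotations

from google.cloud import compute_v1


def list_preemption_operations(
    project_id: str, zone: str
) -> list[compute_v1.Operation]:
    """Return the compute.instances.preempted operations in a zone."""
    operations_client = compute_v1.ZoneOperationsClient()
    request = compute_v1.ListZoneOperationsRequest(
        project=project_id,
        zone=zone,
        filter='operationType="compute.instances.preempted"',
    )
    # The pager lazily fetches additional pages as you iterate.
    return list(operations_client.list(request=request))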

Alternatively, you can determine if a VM was preempted from inside the VM itself. This is useful if you want to handle a shutdown due to a Compute Engine preemption differently from a normal shutdown in a shutdown script. To do this, check the metadata server for the preempted value in your VM's default metadata.

For example, use curl from within your VM to obtain the value for preempted:

curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted" -H "Metadata-Flavor: Google"
TRUE

If this value is TRUE, Compute Engine preempted the VM; otherwise, the value is FALSE.

If you want to use this outside of a shutdown script, you can append ?wait_for_change=true to the URL. This performs a hanging HTTP GET request that returns only when the metadata has changed and the VM has been preempted.

curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted?wait_for_change=true" -H "Metadata-Flavor: Google"
TRUE
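
The same check works from application code. The following sketch, which assumes the third-party requests library is installed on the VM, blocks on the wait_for_change form of the URL and reports whether a preemption notice arrived.

import requests

METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
)


def wait_for_preemption() -> bool:
    """Block until this VM's preempted metadata value changes."""
    response = requests.get(
        METADATA_URL,
        params={"wait_for_change": "true"},
        headers={"Metadata-Flavor": "Google"},
    )
    response.raise_for_status()
    # The metadata server returns the plain text TRUE once the VM is preempted.
    return response.text.strip() == "TRUE"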

How to test preemption settings

You can run simulated maintenance events on your VMs to force them to preempt. Use this feature to test how your apps handle Spot VMs. Read Simulate a host maintenance event to learn how to test maintenance events on your instances.

Alternatively, you can simulate a VM preemption by stopping the VM instance instead of simulating a maintenance event, which avoids quota limits.
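As a sketch of the first approach, the Python client library exposes the underlying instances.simulateMaintenanceEvent method; on a Spot VM, the simulated event triggers preemption.

from google.cloud import compute_v1


def simulate_preemption(project_id: str, zone: str, instance_name: str) -> None:
    """Simulate a maintenance event, which preempts a Spot VM."""
    instance_client = compute_v1.InstancesClient()
    operation = instance_client.simulate_maintenance_event(
        project=project_id, zone=zone, instance=instance_name
    )
    operation.result()  # The VM is preempted once the operation completes.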

Best practices

Here are some best practices to help you get the most out of Spot VMs.

  • Use instance templates. Rather than creating Spot VMs one at a time, you can use instance templates to create multiple Spot VMs with the same properties. Instance templates are required for using MIGs. Alternatively, you can also create multiple Spot VMs using the bulk instance API.

  • Use MIGs to regionally distribute and automatically recreate Spot VMs. Use MIGs to make workloads on Spot VMs more flexible and resilient. For example, use regional MIGs to distribute VMs across multiple zones, which helps mitigate resource-availability errors. Additionally, use autohealing to automatically recreate Spot VMs after they are preempted.

  • Pick smaller machine types. Resources for Spot VMs come out of excess and backup Google Cloud capacity. Capacity for Spot VMs is often easier to get for smaller machine types, meaning machine types with fewer resources, such as vCPUs and memory. You might find more capacity for Spot VMs by selecting a smaller custom machine type, but capacity is even more likely for smaller predefined machine types. For example, compared to capacity for the n2-standard-32 predefined machine type, capacity for the n2-custom-24-96 custom machine type is more likely, but capacity for the n2-standard-16 predefined machine type is even more likely.

  • Run large clusters of Spot VMs during off-peak times. The load on Google Cloud data centers varies with location and time of day, but it is generally lowest at night and on weekends. As such, nights and weekends are the best times to run large clusters of Spot VMs.

  • Design your applications to be fault and preemption tolerant. It's important to be prepared for the fact that there are changes in preemption patterns at different points in time. For example, if a zone suffers a partial outage, large numbers of Spot VMs could be preempted to make room for standard VMs that need to be moved as part of the recovery. In that small window of time, the preemption rate would look very different than on any other day. If your application assumes that preemptions are always done in small groups, you might not be prepared for such an event.

  • Retry creating Spot VMs that have been preempted. If your Spot VMs have been preempted, try creating new Spot VMs once or twice before falling back to standard VMs. Depending on your requirements, it might be a good idea to combine standard VMs and Spot VMs in your clusters to ensure that work proceeds at an adequate pace. A minimal sketch of this retry pattern follows this list.

  • Use shutdown scripts. Manage shutdown and preemption notices with a shutdown script that can save a job's progress so that it can pick up where it left off, rather than start over from scratch.
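
As a rough sketch of the retry-then-fall-back pattern, the following reuses the create_instance helper from the Python sample earlier on this page; the retry count and the caught exception types are illustrative only, not a recommended policy.

from google.api_core.exceptions import GoogleAPICallError


def create_vm_with_fallback(
    project_id: str, zone: str, instance_name: str, disks: list, spot_attempts: int = 2
):
    """Try to create a Spot VM a few times, then fall back to a standard VM."""
    for attempt in range(spot_attempts):
        try:
            return create_instance(project_id, zone, instance_name, disks, spot=True)
        except (GoogleAPICallError, RuntimeError) as err:
            print(f"Spot attempt {attempt + 1} failed: {err}")
    # Fall back to the standard provisioning model so that work can proceed.
    return create_instance(project_id, zone, instance_name, disks)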

What's next?