This page explains how to create and manage Spot VMs, including the following:
- How to create, start, and identify Spot VMs
- How to detect, handle, and test preemption of Spot VMs
- Best practices for Spot VMs
Spot VMs are virtual machine (VM) instances with the spot provisioning model. Spot VMs are available at a 60-91% discount compared to the price of standard VMs. However, Compute Engine might reclaim the resources by preempting Spot VMs at any time. Spot VMs are recommended only for fault-tolerant applications that can withstand VM preemption. Make sure your application can handle preemption before you decide to create Spot VMs.
Before you begin
- Read the conceptual documentation for Spot VMs:
- Review the limitations and pricing of Spot VMs.
- To prevent Spot VMs from consuming your quotas for standard VMs' CPUs, GPUs, and disks, consider requesting preemptible quota for Spot VMs.
- If you haven't already, then set up authentication. Authentication is the process by which your identity is verified for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine by selecting one of the following options:
Console
When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.
gcloud
- Install the Google Cloud CLI, then initialize it by running the following command:
gcloud init
- Set a default region and zone.
Terraform
To use the Terraform samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:
gcloud init
- If you're using a local shell, then create local authentication credentials for your user account:
gcloud auth application-default login
You don't need to do this if you're using Cloud Shell.
For more information, see Set up authentication for a local development environment.
REST
To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.
Install the Google Cloud CLI, then initialize it by running the following command:
gcloud init
For more information, see Authenticate for using REST in the Google Cloud authentication documentation.
Create a Spot VM
Create a Spot VM using the Google Cloud console, gcloud CLI, or the Compute Engine API. A Spot VM is any VM that is configured to use the spot provisioning model:
- VM provisioning model set to Spot in the Google Cloud console
- --provisioning-model=SPOT in the gcloud CLI
- "provisioningModel": "SPOT" in the Compute Engine API
Console
In the Google Cloud console, go to the Create an instance page.
Then, do the following:
- In the Availability policies section, select Spot from the VM provisioning model list. This setting disables automatic restart and host maintenance options for the VM and enables the termination action option.
- Optional: In the On VM termination list, select what happens
when Compute Engine preempts the VM:
- To stop the VM during preemption, select Stop (default).
- To delete the VM during preemption, select Delete.
Optional: Specify other VM options. For more information, see Creating and starting a VM instance.
To create and start the VM, click Create.
gcloud
To create a VM from the gcloud CLI, use the gcloud compute instances create command. To create Spot VMs, you must include the --provisioning-model=SPOT flag. Optionally, you can also specify a termination action for Spot VMs by also including the --instance-termination-action flag.
gcloud compute instances create VM_NAME \
    --provisioning-model=SPOT \
    --instance-termination-action=TERMINATION_ACTION
Replace the following:
- VM_NAME: name of the new VM.
- TERMINATION_ACTION: Optional: specify which action to take when Compute Engine preempts the VM, either STOP (default behavior) or DELETE.
For more information about the options you can specify when creating a VM, see Creating and starting a VM instance. For example, to create Spot VMs with a specified machine type and image, use the following command:
gcloud compute instances create VM_NAME \
    --provisioning-model=SPOT \
    [--image=IMAGE | --image-family=IMAGE_FAMILY] \
    --image-project=IMAGE_PROJECT \
    --machine-type=MACHINE_TYPE \
    --instance-termination-action=TERMINATION_ACTION
Replace the following:
- VM_NAME: name of the new VM.
- IMAGE or IMAGE_FAMILY: specify one of the following:
  - IMAGE: a specific version of a public image. For example, a specific image is --image=debian-10-buster-v20200309.
  - IMAGE_FAMILY: an image family. This creates the VM from the most recent, non-deprecated OS image. For example, if you specify --image-family=debian-10, Compute Engine creates a VM from the latest version of the OS image in the Debian 10 image family.
- IMAGE_PROJECT: the project containing the image. For example, if you specify debian-10 as the image family, specify debian-cloud as the image project.
- MACHINE_TYPE: the predefined or custom machine type for the new VM.
- TERMINATION_ACTION: Optional: specify which action to take when Compute Engine preempts the VM, either STOP (default behavior) or DELETE.
To get a list of the machine types available in a zone, use the gcloud compute machine-types list command with the --zones flag.
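As a rough illustration of how the flags above fit together, the following sketch assembles and prints a create command instead of running it. The helper name spot_create_cmd and the sample values are hypothetical, and the --zone flag is an addition that the commands above leave to your default configuration.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a gcloud command that creates a Spot VM.
# Usage: spot_create_cmd VM_NAME ZONE [TERMINATION_ACTION]
spot_create_cmd() {
  vm_name="$1"
  zone="$2"
  action="${3:-STOP}"   # STOP is the default termination action

  # Only STOP and DELETE are valid termination actions for Spot VMs.
  case "$action" in
    STOP|DELETE) ;;
    *) echo "termination action must be STOP or DELETE" >&2; return 1 ;;
  esac

  printf 'gcloud compute instances create %s --zone=%s --provisioning-model=SPOT --instance-termination-action=%s\n' \
    "$vm_name" "$zone" "$action"
}

# Example with assumed values; review the printed command before running it.
spot_create_cmd example-spot-vm us-central1-a DELETE
```

Printing the command first makes it easy to review the provisioning model and termination action before any VM is created.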
Terraform
You can use a Terraform resource to create a Spot VM by setting the spot provisioning model in the resource's scheduling block.
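The Terraform sample itself is not reproduced on this page. As a hedged sketch only, the following shell script writes a minimal main.tf that uses the google_compute_instance resource with a scheduling block. The resource label, VM name, machine type, zone, image, and network are all placeholder assumptions, and the configuration is illustrative rather than the official sample.

```shell
#!/bin/sh
# Writes a minimal, illustrative Terraform configuration for a Spot VM.
# Every name and value below is a placeholder chosen for this sketch.
cat > main.tf <<'EOF'
resource "google_compute_instance" "spot_vm_example" {
  name         = "example-spot-vm"
  machine_type = "e2-small"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = "default"
  }

  scheduling {
    # Spot provisioning model. Spot VMs are preemptible and
    # cannot use automatic restart.
    preemptible                 = true
    automatic_restart           = false
    provisioning_model          = "SPOT"
    instance_termination_action = "STOP"
  }
}
EOF
echo "Wrote main.tf"
```

After reviewing the generated file, you would typically run terraform init and terraform apply in the same directory.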
REST
To create a VM from the Compute Engine API, use the instances.insert method. You must specify a machine type and name for the VM. Optionally, you can also specify an image for the boot disk.
To create Spot VMs, you must include the "provisioningModel": "SPOT" field. Optionally, you can also specify a termination action for Spot VMs by also including the "instanceTerminationAction" field.
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances
{
  "machineType": "zones/ZONE/machineTypes/MACHINE_TYPE",
  "name": "VM_NAME",
  "disks": [
    {
      "initializeParams": {
        "sourceImage": "projects/IMAGE_PROJECT/global/images/IMAGE"
      },
      "boot": true
    }
  ],
  "scheduling": {
    "provisioningModel": "SPOT",
    "instanceTerminationAction": "TERMINATION_ACTION"
  },
  ...
}
Replace the following:
- PROJECT_ID: the project ID of the project to create the VM in.
- ZONE: the zone to create the VM in. The zone must also support the machine type to use for the new VM.
- MACHINE_TYPE: the predefined or custom machine type for the new VM.
- VM_NAME: the name of the new VM.
- IMAGE_PROJECT: the project containing the image. For example, if you specify family/debian-10 as the image family, specify debian-cloud as the image project.
- IMAGE: specify one of the following:
  - A specific version of a public image. For example, a specific image is "sourceImage": "projects/debian-cloud/global/images/debian-10-buster-v20200309" where debian-cloud is the IMAGE_PROJECT.
  - An image family. This creates the VM from the most recent, non-deprecated OS image. For example, if you specify "sourceImage": "projects/debian-cloud/global/images/family/debian-10" where debian-cloud is the IMAGE_PROJECT, Compute Engine creates a VM from the latest version of the OS image in the Debian 10 image family.
- TERMINATION_ACTION: Optional: specify which action to take when Compute Engine preempts the VM, either STOP (default behavior) or DELETE.
For more information about the options you can specify when creating a VM, see Creating and starting a VM instance.
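To make the request body above more concrete, here is a minimal sketch that prints a filled-in JSON body for instances.insert. The helper name spot_insert_body, the sample values, and the networkInterfaces entry (which assumes the default network) are illustrative assumptions and not part of the request shown above.

```shell
#!/bin/sh
# Hypothetical helper: prints a JSON body for an instances.insert request
# that creates a Spot VM.
# Usage: spot_insert_body VM_NAME ZONE MACHINE_TYPE [TERMINATION_ACTION]
spot_insert_body() {
  vm_name="$1"; zone="$2"; machine_type="$3"; action="${4:-STOP}"
  cat <<EOF
{
  "name": "${vm_name}",
  "machineType": "zones/${zone}/machineTypes/${machine_type}",
  "disks": [
    {
      "boot": true,
      "initializeParams": {
        "sourceImage": "projects/debian-cloud/global/images/family/debian-10"
      }
    }
  ],
  "networkInterfaces": [ { "network": "global/networks/default" } ],
  "scheduling": {
    "provisioningModel": "SPOT",
    "instanceTerminationAction": "${action}"
  }
}
EOF
}

spot_insert_body example-spot-vm us-central1-a e2-small DELETE

# The body could then be sent with curl, for example:
#   curl -X POST \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     -H "Content-Type: application/json" \
#     -d "$(spot_insert_body example-spot-vm us-central1-a e2-small)" \
#     "https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/us-central1-a/instances"
```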
To create multiple Spot VMs with the same properties, you can create an instance template, and use the template to create a managed instance group (MIG). For more information, see best practices.
Start Spot VMs
Like other VMs, Spot VMs start upon creation. Likewise, if Spot VMs are stopped, you can restart the VMs to resume the RUNNING state. You can stop and restart preempted Spot VMs as many times as you would like, as long as there is capacity. For more information, see VM instance life cycle.
If Compute Engine stops one or more Spot VMs in an autoscaling managed instance group (MIG) or Google Kubernetes Engine (GKE) cluster, the group restarts the VMs when the resources become available again.
Identify a VM's provisioning model and termination action
Identify a VM's provisioning model to see if it is a standard VM, Spot VM, or preemptible VM. For a Spot VM, you can also identify the termination action. You can identify a VM's provisioning model and termination action using the Google Cloud console, gcloud CLI, or the Compute Engine API.
Console
Go to the VM instances page.
Click the Name of the VM you want to identify. The VM instance details page opens.
Go to the Management section at the bottom of the page. In the Availability policies subsection, check the following options:
- If the VM provisioning model is set to Spot, the VM is a Spot VM.
- On VM termination indicates which action to take when Compute Engine preempts the VM, either Stop or Delete the VM.
- Otherwise, if the VM provisioning model is set to Standard
or —:
- If the Preemptibility option is set to On, the VM is a preemptible VM.
- Otherwise, the VM is a standard VM.
gcloud
To describe a VM from the gcloud CLI, use the gcloud compute instances describe command:
gcloud compute instances describe VM_NAME
where VM_NAME is the name of the VM that you want to check.
In the output, check the scheduling field to identify the VM:
- If the output includes the provisioningModel field set to SPOT, similar to the following, the VM is a Spot VM.
  ...
  scheduling:
  ...
    provisioningModel: SPOT
    instanceTerminationAction: TERMINATION_ACTION
  ...
  where TERMINATION_ACTION indicates which action to take when Compute Engine preempts the VM, either stop (STOP) or delete (DELETE) the VM. If the instanceTerminationAction field is missing, the default value is STOP.
- Otherwise, if the output includes the provisioningModel field set to STANDARD or if the output omits the provisioningModel field:
  - If the output includes the preemptible field set to true, the VM is a preemptible VM.
  - Otherwise, the VM is a standard VM.
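The decision rules above can be sketched as a small shell function. The helper name classify_vm is hypothetical. It takes the provisioningModel and preemptible values as arguments, and one assumed way to obtain them is the --format="value(scheduling.provisioningModel,scheduling.preemptible)" option of gcloud compute instances describe.

```shell
#!/bin/sh
# Hypothetical helper: classify a VM from two values in its scheduling field.
#   $1 = provisioningModel (empty if the field is omitted)
#   $2 = preemptible (true/false in any letter case, or empty)
classify_vm() {
  model="$1"
  preemptible="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
  if [ "$model" = "SPOT" ]; then
    echo "spot"
  elif [ "$preemptible" = "true" ]; then
    echo "preemptible"
  else
    echo "standard"
  fi
}

classify_vm SPOT true       # prints "spot"
classify_vm STANDARD true   # prints "preemptible"
classify_vm "" ""           # prints "standard"
```

The provisioning model is checked first because a Spot VM is identified by that field alone. The preemptible field only matters when the model is STANDARD or missing.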
REST
To describe a VM from the Compute Engine API, use the instances.get method:
GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME
Replace the following:
- PROJECT_ID: the project ID of the project that the VM is in.
- ZONE: the zone where the VM is located.
- VM_NAME: the name of the VM that you want to check.
In the output, check the scheduling field to identify the VM:
- If the output includes the provisioningModel field set to SPOT, similar to the following, the VM is a Spot VM.
  {
    ...
    "scheduling": {
      ...
      "provisioningModel": "SPOT",
      "instanceTerminationAction": "TERMINATION_ACTION"
      ...
    },
    ...
  }
  where TERMINATION_ACTION indicates which action to take when Compute Engine preempts the VM, either stop (STOP) or delete (DELETE) the VM. If the instanceTerminationAction field is missing, the default value is STOP.
- Otherwise, if the output includes the provisioningModel field set to STANDARD or if the output omits the provisioningModel field:
  - If the output includes the preemptible field set to true, the VM is a preemptible VM.
  - Otherwise, the VM is a standard VM.
Handle preemption with a shutdown script
When Compute Engine preempts a Spot VM, you can use a shutdown script to try to perform cleanup actions before the VM is preempted. For example, you can gracefully stop a running process and copy a checkpoint file to Cloud Storage. Notably, the maximum length of the shutdown period is shorter for a preemption notice than for a user-initiated shutdown. For more information about the shutdown period for a preemption notice, see Preemption process in the conceptual documentation for Spot VMs.
The following is an example of a shutdown script that you can add to a running Spot VM or add while creating a new Spot VM. This script runs when the VM starts to shut down, before the operating system's normal kill command stops all remaining processes. After gracefully stopping the desired program, the script performs a parallel upload of a checkpoint file to a Cloud Storage bucket.
#!/bin/bash

MY_PROGRAM="PROGRAM_NAME" # For example, "apache2" or "nginx"
MY_USER="LOCAL_USER"
CHECKPOINT="/home/$MY_USER/checkpoint.out"
BUCKET_NAME="BUCKET_NAME" # For example, "my-checkpoint-files" (without gs://)

echo "Shutting down! Seeing if ${MY_PROGRAM} is running."

# Find the newest copy of $MY_PROGRAM
PID="$(pgrep -n "$MY_PROGRAM")"
if [[ "$?" -ne 0 ]]; then
  echo "${MY_PROGRAM} not running, shutting down immediately."
  exit 0
fi

echo "Sending SIGINT to $PID"
kill -2 "$PID"

# Portable waitpid equivalent
while kill -0 "$PID"; do
  sleep 1
done

echo "$PID is done, copying ${CHECKPOINT} to gs://${BUCKET_NAME} as ${MY_USER}"

su "${MY_USER}" -c "gcloud storage cp $CHECKPOINT gs://${BUCKET_NAME}/"

echo "Done uploading, shutting down."
This script assumes the following:
The VM was created with at least read/write access to Cloud Storage. For instructions about how to create a VM with the appropriate scopes, see the authentication documentation.
You have an existing Cloud Storage bucket and permission to write to it.
To add this script to a VM, configure the script to work with an application on your VM and add it to the VM's metadata.
Copy or download the shutdown script:
- Copy the preceding shutdown script after replacing the following:
  - PROGRAM_NAME is the name of the process or program you want to shut down. For example, apache2 or nginx.
  - LOCAL_USER is the username you are logged into the virtual machine as.
  - BUCKET_NAME is the name of the Cloud Storage bucket where you want to save the program's checkpoint file. Note that the bucket name does not start with gs:// in this case.
- Download the shutdown script to your local workstation and then replace the following variables in the file:
  - [PROGRAM_NAME] is the name of the process or program you want to shut down. For example, apache2 or nginx.
  - [LOCAL_USER] is the username you are logged into the virtual machine as.
  - [BUCKET_NAME] is the name of the Cloud Storage bucket where you want to save the program's checkpoint file. Note that the bucket name does not start with gs:// in this case.
Add the shutdown script to a new VM or an existing VM.
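The placeholder replacement described above can also be scripted. The following is a minimal sketch, assuming the unbracketed placeholders from the script shown earlier. The helper name fill_shutdown_script and the sample values are hypothetical. Because BUCKET_NAME is both a placeholder and a variable name in the script, the sketch replaces only the quoted value after the equals sign.

```shell
#!/bin/sh
# Hypothetical helper: fill in the three placeholders of the shutdown script.
# Usage: fill_shutdown_script TEMPLATE PROGRAM USER BUCKET > shutdown.sh
# Only the quoted placeholder values are replaced, so the BUCKET_NAME
# variable name itself is left untouched. Values must not contain "/" or "&".
fill_shutdown_script() {
  sed -e "s/=\"PROGRAM_NAME\"/=\"$2\"/" \
      -e "s/=\"LOCAL_USER\"/=\"$3\"/" \
      -e "s/=\"BUCKET_NAME\"/=\"$4\"/" \
      "$1"
}

# Demonstration on a three-line stand-in for the real script.
template="$(mktemp)"
printf '%s\n' \
  'MY_PROGRAM="PROGRAM_NAME"' \
  'MY_USER="LOCAL_USER"' \
  'BUCKET_NAME="BUCKET_NAME"' > "$template"
fill_shutdown_script "$template" nginx alice my-checkpoint-files
rm -f "$template"

# The filled-in file can then be attached to an existing VM, for example:
#   gcloud compute instances add-metadata VM_NAME \
#       --metadata-from-file=shutdown-script=shutdown.sh
```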
Detect preemption of Spot VMs
Determine if Spot VMs were preempted by Compute Engine using the Google Cloud console, gcloud CLI, or the Compute Engine API.
Console
You can check if a VM was preempted by checking the system activity logs.
In the Google Cloud console, go to the Logs page.
Select your project and click Continue.
Add compute.instances.preempted to the filter by label or text search field.
Optionally, you can also enter a VM name if you want to see preemption operations for a specific VM.
Press enter to apply the specified filters. The Google Cloud console updates the list of logs to show only the operations where a VM was preempted.
Select an operation in the list to see details about the VM that was preempted.
gcloud
Use the gcloud compute operations list command with a filter parameter to get a list of preemption events in your project.
gcloud compute operations list \
    --filter="operationType=compute.instances.preempted"
Optionally, you can use additional filter parameters to further scope the results. For example, to see preemption events only for instances within a managed instance group, use the following command:
gcloud compute operations list \
    --filter="operationType=compute.instances.preempted AND targetLink:instances/BASE_INSTANCE_NAME"
where BASE_INSTANCE_NAME is the base name specified as a prefix for the names of all the VMs in this managed instance group.
The output is similar to the following:
NAME                  TYPE                         TARGET                                        HTTP_STATUS  STATUS  TIMESTAMP
systemevent-xxxxxxxx  compute.instances.preempted  us-central1-f/instances/example-instance-xxx  200          DONE    2015-04-02T12:12:10.881-07:00
An operation type of compute.instances.preempted indicates that the VM instance was preempted. You can use the gcloud compute operations describe command to get more information about a specific preemption operation.
gcloud compute operations describe SYSTEM_EVENT \
    --zone=ZONE
Replace the following:
- SYSTEM_EVENT: the system event from the output of the gcloud compute operations list command, for example, systemevent-xxxxxxxx.
- ZONE: the zone of the system event, for example, us-central1-f.
The output is similar to the following:
...
operationType: compute.instances.preempted
progress: 100
selfLink: https://compute.googleapis.com/compute/v1/projects/my-project/zones/us-central1-f/operations/systemevent-xxxxxxxx
startTime: '2015-04-02T12:12:10.881-07:00'
status: DONE
statusMessage: Instance was preempted.
...
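To see which VMs are preempted most often, the operation list can be summarized per instance. The following is a minimal sketch. The helper name count_preemptions and the sample target links are hypothetical, and the sketch assumes one target link per line, which is what the --format="value(targetLink)" option of gcloud compute operations list should produce.

```shell
#!/bin/sh
# Hypothetical helper: reads operation target links (one per line) on stdin
# and prints "COUNT INSTANCE_NAME" per instance, most preempted first.
count_preemptions() {
  sed 's|.*/||' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Sample input standing in for real gcloud output.
printf '%s\n' \
  "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-f/instances/worker-a" \
  "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-f/instances/worker-b" \
  "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-f/instances/worker-a" \
  | count_preemptions

# Against a real project, the input could come from:
#   gcloud compute operations list \
#       --filter="operationType=compute.instances.preempted" \
#       --format="value(targetLink)" | count_preemptions
```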
REST
To get a list of recent system operations for a specific project and zone, use the zoneOperations.list method.
GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/operations
Replace the following:
- PROJECT_ID: a project ID.
- ZONE: a zone.
Optionally, to scope the response to show only preemption operations, you can add a filter to your API request:
operationType="compute.instances.preempted"
Alternatively, to see preemption operations for a specific VM, add a targetLink parameter to the filter:
operationType="compute.instances.preempted" AND targetLink="https://www.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME"
Replace the following:
- PROJECT_ID: the project ID.
- ZONE: the zone.
- VM_NAME: the name of a specific VM in this zone and project.
The response contains a list of recent operations. For example, a preemption looks similar to the following:
{
  "kind": "compute#operation",
  "id": "15041793718812375371",
  "name": "systemevent-xxxxxxxx",
  "zone": "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-f",
  "operationType": "compute.instances.preempted",
  "targetLink": "https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-f/instances/example-instance",
  "targetId": "12820389800990687210",
  "status": "DONE",
  "statusMessage": "Instance was preempted.",
  ...
}
Alternatively, you can determine if a VM was preempted from inside the VM itself. This is useful if you want to handle a shutdown due to a Compute Engine preemption differently from a normal shutdown in a shutdown script. To do this, check the metadata server for the preempted value in your VM's default metadata.
For example, use curl from within your VM to obtain the value for preempted:
curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted" -H "Metadata-Flavor: Google"
TRUE
If this value is TRUE, the VM was preempted by Compute Engine. Otherwise, the value is FALSE.
If you want to use this outside of a shutdown script, you can append ?wait_for_change=true to the URL. This performs a hanging HTTP GET request that only returns when the metadata has changed and the VM has been preempted.
curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted?wait_for_change=true" -H "Metadata-Flavor: Google"
TRUE
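As a sketch of how a shutdown script might branch on this value, the following separates fetching the metadata value from interpreting it. The function names fetch_preempted and shutdown_reason are hypothetical, and fetch_preempted only works from inside a Compute Engine VM.

```shell
#!/bin/sh
# Query the metadata server for this VM's "preempted" value.
# This only works from inside a Compute Engine VM.
fetch_preempted() {
  curl -s "http://metadata.google.internal/computeMetadata/v1/instance/preempted" \
    -H "Metadata-Flavor: Google"
}

# Hypothetical helper: map the metadata value to a shutdown reason.
shutdown_reason() {
  case "$1" in
    TRUE) echo "preemption" ;;
    *)    echo "normal" ;;
  esac
}

# In a shutdown script, the two could be combined like this:
#   if [ "$(shutdown_reason "$(fetch_preempted)")" = "preemption" ]; then
#     echo "Preempted: saving checkpoint quickly"
#   fi
shutdown_reason TRUE    # prints "preemption"
shutdown_reason FALSE   # prints "normal"
```

Treating any value other than TRUE as a normal shutdown keeps the script on the ordinary path if the metadata server cannot be reached.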
How to test preemption settings
You can run simulated maintenance events on your VMs to force them to preempt. Use this feature to test how your apps handle Spot VMs. Read Simulate a host maintenance event to learn how to test maintenance events on your instances.
You can also simulate a VM preemption by stopping the VM instance, which can be used instead of simulating a maintenance event and which avoids quota limits.
Best practices
Here are some best practices to help you get the most out of Spot VMs.
Use instance templates. Rather than creating Spot VMs one at a time, you can use instance templates to create multiple Spot VMs with the same properties. Instance templates are required for using MIGs. Alternatively, you can also create multiple Spot VMs using the bulk instance API.
Use MIGs to regionally distribute and automatically recreate Spot VMs. Use MIGs to make workloads on Spot VMs more flexible and resilient. For example, use regional MIGs to distribute VMs across multiple zones, which helps mitigate resource-availability errors. Additionally, use autohealing to automatically recreate Spot VMs after they are preempted.
Pick smaller machine types. Resources for Spot VMs come out of excess and backup Google Cloud capacity. Capacity for Spot VMs is often easier to get for smaller machine types, meaning machine types with fewer resources like vCPUs and memory. You might find more capacity for Spot VMs by selecting a smaller custom machine type, but capacity is even more likely for smaller predefined machine types. For example, compared to capacity for the n2-standard-32 predefined machine type, capacity for the n2-custom-24-96 custom machine type is more likely, but capacity for the n2-standard-16 predefined machine type is even more likely.
Run large clusters of Spot VMs during off-peak times. The load on Google Cloud data centers varies with location and time of day, but is generally lowest on nights and weekends. As such, nights and weekends are the best times to run large clusters of Spot VMs.
Design your applications to be fault and preemption tolerant. Preemption patterns change over time, so be prepared for them to vary. For example, if a zone suffers a partial outage, large numbers of Spot VMs could be preempted to make room for standard VMs that need to be moved as part of the recovery. In that small window of time, the preemption rate would look very different than on any other day. If your application assumes that preemptions are always done in small groups, you might not be prepared for such an event.
Retry creating Spot VMs that have been preempted. If your Spot VMs have been preempted, try creating new Spot VMs once or twice before falling back to standard VMs. Depending on your requirements, it might be a good idea to combine standard VMs and Spot VMs in your clusters to ensure that work proceeds at an adequate pace.
Use shutdown scripts. Manage shutdown and preemption notices with a shutdown script that can save a job's progress so that it can pick up where it left off, rather than start over from scratch.
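The retry practice above (try Spot VMs once or twice, then fall back to standard VMs) can be sketched as follows. The function name create_with_fallback and the stub commands are hypothetical. In real use, the two commands would wrap gcloud compute instances create calls with and without --provisioning-model=SPOT.

```shell
#!/bin/sh
# Hypothetical helper: try a Spot creation command a few times, then fall back.
# Usage: create_with_fallback MAX_SPOT_ATTEMPTS SPOT_COMMAND STANDARD_COMMAND
# Prints which provisioning model succeeded; returns 1 if both fail.
create_with_fallback() {
  attempts="$1"; spot_cmd="$2"; standard_cmd="$3"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$spot_cmd"; then
      echo "spot"
      return 0
    fi
    i=$((i + 1))
  done
  if "$standard_cmd"; then
    echo "standard"
    return 0
  fi
  return 1
}

# Stubs standing in for real "gcloud compute instances create ..." calls.
create_spot() { return 1; }       # pretend Spot capacity is unavailable
create_standard() { return 0; }   # pretend the standard VM is created

create_with_fallback 2 create_spot create_standard   # prints "standard"
```

A production version would likely add a delay between attempts and log each failure. Both are omitted here to keep the sketch short.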
What's next?
- Connect to your VM instance.
- Learn about shutdown scripts.
- Learn about limiting the runtime of a VM.
- Learn about instance templates.
- Learn about MIGs.