Create and use preemptible VMs

This page explains how to create and use a preemptible virtual machine (VM) instance. Preemptible VMs are available at a 60-91% discount compared to the price of standard VMs. However, Compute Engine might stop (preempt) these VMs if it needs to reclaim those resources for other tasks. Preemptible VMs always stop after 24 hours.

Preemptible VMs are recommended only for fault-tolerant applications that can withstand VM preemption. Make sure your application can handle preemptions before you decide to create a preemptible VM. To understand the risks and value of preemptible VMs, read the preemptible VM instances documentation.

Creating a preemptible VM

Create a preemptible VM using the gcloud CLI or the Compute Engine API. To use the Google Cloud console, create a Spot VM instead.

gcloud

With gcloud compute, use the same instances create command that you would use to create a normal VM, but add the --preemptible flag.

gcloud compute instances create [VM_NAME] --preemptible

where [VM_NAME] is the name of the VM.

API

In the API, construct a normal request to create a VM, but include the preemptible property under scheduling and set it to true. For example:

POST https://compute.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances

{
  "machineType": "zones/[ZONE]/machineTypes/[MACHINE_TYPE]",
  "name": "[INSTANCE_NAME]",
  "scheduling":
  {
    "preemptible": true
  },
  ...
}
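
As an illustration, the request above could be sent from a shell with curl. This is a minimal sketch, assuming the gcloud CLI is installed and authenticated so that gcloud auth print-access-token can supply a bearer token. The function name insert_instance and the body file are hypothetical, and the body file must hold the complete request, including the fields elided with ... above.

```shell
# Hypothetical helper: POST an instance-creation request body to the
# Compute Engine API. The body file is assumed to hold the complete JSON
# request, with "scheduling": {"preemptible": true} set.
insert_instance() {
  local project="$1" zone="$2" body_file="$3"
  local token
  token="$(gcloud auth print-access-token)" || return 1

  curl -sS -X POST \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/json" \
    -d "@${body_file}" \
    "https://compute.googleapis.com/compute/v1/projects/${project}/zones/${zone}/instances"
}
```

For example, insert_instance [PROJECT_ID] [ZONE] request.json sends the contents of request.json as the request body.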

Preemptible CPU quotas

Preemptible VMs require available CPU quotas like standard VMs. To avoid preemptible VMs consuming the CPU quotas for your standard VMs, you can request a special "Preemptible CPU" quota. After Compute Engine grants you preemptible CPU quota in a region, all preemptible VMs in that region count against that quota, and all standard VMs continue to count against the standard CPU quota.

In regions where you don't have preemptible CPU quota, you can use standard CPU quota to launch preemptible VMs. You also need sufficient IP and disk quota, as usual. Preemptible CPU quota is not visible in the gcloud CLI or Google Cloud console quota pages unless Compute Engine has granted the quota.

For more information about quotas, visit the Resource Quotas page.

Starting a preempted VM

Like any other VM, if a preemptible VM is stopped or preempted, you can start the VM again and bring it back to the RUNNING state. Starting a preemptible VM resets the 24-hour counter, but because it is still a preemptible VM, Compute Engine can preempt it before the 24 hours elapse. It isn't possible to convert a preemptible VM to a standard VM while it's running.
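
Restarting can be scripted. The following is a minimal sketch, assuming the gcloud CLI is authenticated and that a stopped or preempted VM reports the TERMINATED status. The function name restart_if_stopped is hypothetical.

```shell
# Hypothetical helper: start a VM again if it is currently stopped.
# Assumes a stopped or preempted VM reports the status TERMINATED.
restart_if_stopped() {
  local vm="$1" zone="$2"
  local status
  status="$(gcloud compute instances describe "$vm" \
    --zone="$zone" --format="value(status)")" || return 1

  if [ "$status" = "TERMINATED" ]; then
    echo "Starting ${vm} (status was ${status})"
    gcloud compute instances start "$vm" --zone="$zone"
  else
    echo "${vm} is ${status}, nothing to do"
  fi
}
```

For example, restart_if_stopped [VM_NAME] [ZONE] starts the VM only when it is not already running.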

If Compute Engine stops a preemptible VM in an autoscaling managed instance group (MIG) or Google Kubernetes Engine (GKE) cluster, the group restarts the VM when the resources become available again.

Handling preemption with a shutdown script

When your VM is preempted, you can use a shutdown script to perform cleanup actions before the VM stops. For example, you can gracefully stop a running process and copy a checkpoint file to Cloud Storage.

The following is a shutdown script that you can add to a running preemptible VM or add to a new preemptible VM when you create it. This script runs when the VM starts to shut down, before the operating system's normal kill command stops all remaining processes. After gracefully stopping the desired program, the script performs a parallel upload of a checkpoint file to a Cloud Storage bucket.

#!/bin/bash

MY_PROGRAM="[PROGRAM_NAME]" # For example, "apache2" or "nginx"
MY_USER="[LOCAL_USERNAME]"
CHECKPOINT="/home/$MY_USER/checkpoint.out"
GSUTIL_OPTS="-m -o GSUtil:parallel_composite_upload_threshold=32M"
BUCKET_NAME="[BUCKET_NAME]" # For example, "my-checkpoint-files" (without gs://)

echo "Shutting down!  Seeing if ${MY_PROGRAM} is running."

# Find the newest copy of $MY_PROGRAM
PID="$(pgrep -n "$MY_PROGRAM")"

if [[ "$?" -ne 0 ]]; then
  echo "${MY_PROGRAM} not running, shutting down immediately."
  exit 0
fi

echo "Sending SIGINT to $PID"
kill -2 "$PID"

# Portable waitpid equivalent
while kill -0 "$PID"; do
   sleep 1
done

echo "$PID is done, copying ${CHECKPOINT} to gs://${BUCKET_NAME} as ${MY_USER}"

su "${MY_USER}" -c "gsutil $GSUTIL_OPTS cp $CHECKPOINT gs://${BUCKET_NAME}/"

echo "Done uploading, shutting down."

To add this script to a VM, configure the script to work with an application on your VM and add it to the VM's metadata.

  1. Copy or download the shutdown script to your local workstation.
  2. Open the file for edit and change the following variables:
    • [PROGRAM_NAME] is the name of the process or program you want to shut down. For example, apache2 or nginx.
    • [LOCAL_USERNAME] is the username you use to log in to the VM.
    • [BUCKET_NAME] is the name of the Cloud Storage bucket where you want to save the program's checkpoint file. Note the bucket name does not start with gs:// in this case.
  3. Save your changes.
  4. Add the shutdown script to a new VM or an existing VM.

This script assumes the following:

  • The VM was created with at least read/write access to Cloud Storage. See the authentication documentation for instructions about how to create a VM with the appropriate scopes.

  • You have an existing Cloud Storage bucket and permission to write to it.
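
The editing and attaching steps above can also be scripted. This is a minimal sketch, not the only way to do it: the helper names are hypothetical, the placeholders are matched exactly as they appear in the script, and the values are assumed to contain no | or & characters.

```shell
# Hypothetical helper: fill in the placeholders of the shutdown script.
# Assumes the values contain no "|" or "&" characters, which sed would
# treat specially in these substitutions.
fill_shutdown_script() {
  local template="$1" program="$2" user="$3" bucket="$4"
  sed -e "s|\[PROGRAM_NAME\]|${program}|g" \
      -e "s|\[LOCAL_USERNAME\]|${user}|g" \
      -e "s|\[BUCKET_NAME\]|${bucket}|g" \
      "$template"
}

# Hypothetical helper: attach a script file to an existing VM under the
# shutdown-script metadata key.
attach_shutdown_script() {
  local vm="$1" zone="$2" script="$3"
  gcloud compute instances add-metadata "$vm" --zone="$zone" \
    --metadata-from-file "shutdown-script=${script}"
}
```

For example, run fill_shutdown_script shutdown-template.sh nginx alice my-checkpoint-files > shutdown.sh, then attach_shutdown_script [VM_NAME] [ZONE] shutdown.sh. For a new VM, the same --metadata-from-file flag can be passed to gcloud compute instances create together with --preemptible.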

Identifying preemptible VMs

To check if a VM is a preemptible VM, follow the steps to Identify a VM's provisioning model and termination action.

Detecting if a VM was preempted

Determine if a VM was preempted with the Google Cloud console, the gcloud CLI, or the API.

Console

You can check if a VM was preempted by checking the system activity logs.

  1. In the Google Cloud console, go to the Logs page.

  2. Select your project and click Continue.

  3. Add compute.instances.preempted to the filter by label or text search field.

  4. Optionally, you can also enter a VM name if you want to see preemption operations for a specific VM.

  5. Press enter to apply the specified filters. The Google Cloud console updates the list of logs to show only the operations where a VM was preempted.

  6. Select an operation in the list to see details about the VM that was preempted.

gcloud

Use the gcloud compute operations list command with a filter parameter to get a list of preemption events in your project.

gcloud compute operations list \
    --filter="operationType=compute.instances.preempted"

You can use the filter parameter to further scope the results. For example, to see preemption events only for VMs within a managed instance group:

gcloud compute operations list \
    --filter="operationType=compute.instances.preempted AND targetLink:instances/[BASE_VM_NAME]"

gcloud returns a response similar to:

NAME                  TYPE                         TARGET                                   HTTP_STATUS STATUS TIMESTAMP
systemevent-xxxxxxxx  compute.instances.preempted  us-central1-f/instances/example-vm-xxx  200         DONE   2015-04-02T12:12:10.881-07:00

An operation type of compute.instances.preempted indicates that the VM was preempted. You can use the operations describe command to get more information about a specific preemption operation.

gcloud compute operations describe \
    systemevent-xxxxxxxx

gcloud returns a response similar to:

...
operationType: compute.instances.preempted
progress: 100
selfLink: https://compute.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/us-central1-f/operations/systemevent-xxxxxxxx
startTime: '2015-04-02T12:12:10.881-07:00'
status: DONE
statusMessage: Instance was preempted.
...
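
To reduce the list output to VM names, the targetLink field can be post-processed. This is a minimal sketch, assuming the gcloud CLI is authenticated; the function name list_preempted_vms is hypothetical.

```shell
# Hypothetical helper: print the unique names of VMs that have a
# preemption operation in the current project. The sed expression keeps
# only the last path segment of each targetLink, which is the VM name.
list_preempted_vms() {
  gcloud compute operations list \
    --filter="operationType=compute.instances.preempted" \
    --format="value(targetLink)" |
    sed 's|.*/||' |
    sort -u
}
```

Each name is printed once, even if the VM was preempted several times.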

API

To get a list of recent system operations, send a GET request to the URI of zone operations.

GET https://compute.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/operations

The response contains a list of recent operations.

{
  "kind": "compute#operation",
  "id": "15041793718812375371",
  "name": "systemevent-xxxxxxxx",
  "zone": "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/us-central1-f",
  "operationType": "compute.instances.preempted",
  "targetLink": "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/us-central1-f/instances/example-vm",
  "targetId": "12820389800990687210",
  "status": "DONE",
  "statusMessage": "Instance was preempted.",
  ...
}

To scope the response to show only preemption operations, add a filter to your API request: operationType="compute.instances.preempted". To see preemption operations for a specific VM, add a targetLink parameter to the filter: operationType="compute.instances.preempted" AND targetLink="https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances/[VM_NAME]".
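
A filtered request might look like the following from a shell. This is a minimal sketch, assuming the gcloud CLI can supply an access token; the function name list_preemption_operations is hypothetical. curl -G with --data-urlencode appends the filter as a URL-encoded query parameter.

```shell
# Hypothetical helper: list preemption operations in one zone through the
# API. curl -G appends the URL-encoded filter as a query parameter.
list_preemption_operations() {
  local project="$1" zone="$2"
  local token
  token="$(gcloud auth print-access-token)" || return 1

  curl -sS -G \
    -H "Authorization: Bearer ${token}" \
    --data-urlencode 'filter=operationType="compute.instances.preempted"' \
    "https://compute.googleapis.com/compute/v1/projects/${project}/zones/${zone}/operations"
}
```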

Alternatively, you can determine if a VM was preempted from inside the VM itself. This is useful if you want to handle a shutdown due to a Compute Engine preemption differently from a normal shutdown in a shutdown script. To do this, simply check the metadata server for the preempted value in your VM's default instance metadata.

For example, use curl from within your VM to obtain the value for preempted:

curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted" -H "Metadata-Flavor: Google"
TRUE

If this value is TRUE, the VM was preempted by Compute Engine; otherwise, it is FALSE.
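
Inside a shutdown script, this check can drive a branch. The following is a minimal sketch; the function names and the echoed actions are hypothetical placeholders for your own cleanup logic.

```shell
METADATA_URL="http://metadata.google.internal/computeMetadata/v1/instance/preempted"

# Returns success (0) only if the metadata server reports TRUE.
was_preempted() {
  local value
  value="$(curl -s "$METADATA_URL" -H "Metadata-Flavor: Google")" || return 1
  [ "$value" = "TRUE" ]
}

# Hypothetical branch inside a shutdown script: replace the echo lines
# with the actions each kind of shutdown needs.
handle_shutdown() {
  if was_preempted; then
    echo "Preempted by Compute Engine: saving checkpoint"
  else
    echo "Normal shutdown: skipping checkpoint"
  fi
}
```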

If you want to use this outside of a shutdown script, you can append ?wait_for_change=true to the URL. This performs a hanging HTTP GET request that only returns when the metadata has changed and the VM has been preempted.

curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted?wait_for_change=true" -H "Metadata-Flavor: Google"
TRUE
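
A long-running watcher built on this hanging request could be sketched as follows. The function name on_preemption is hypothetical, and the retry loop is an assumption to cover the case where the request returns without TRUE, for example after a dropped connection.

```shell
PREEMPTED_URL="http://metadata.google.internal/computeMetadata/v1/instance/preempted"

# Hypothetical helper: block until the VM is preempted, then run the
# command passed as arguments. The request is retried whenever it
# returns anything other than TRUE.
on_preemption() {
  local value
  while true; do
    value="$(curl -s "${PREEMPTED_URL}?wait_for_change=true" \
      -H "Metadata-Flavor: Google")" || value=""
    if [ "$value" = "TRUE" ]; then
      "$@"
      return
    fi
    sleep 1
  done
}
```

For example, on_preemption echo "draining" & runs the watcher in the background and prints draining once the VM is preempted.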

Testing preemption settings

You can run simulated maintenance events on your VMs to force them to be preempted. Use this feature to test how your apps handle preemptible VMs. Read testing your availability policies to learn how to test maintenance events on your VMs.

You can also simulate a VM's preemption by stopping the VM. This approach can be used instead of simulating a maintenance event, and it avoids quota limits.
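
Stopping a VM for a test can be wrapped in a small helper. This is a minimal sketch, assuming the gcloud CLI is authenticated; the function name simulate_preemption is hypothetical.

```shell
# Hypothetical helper: stop a VM so that its shutdown script runs, then
# print the status the VM reports afterwards.
simulate_preemption() {
  local vm="$1" zone="$2"
  gcloud compute instances stop "$vm" --zone="$zone" || return 1
  gcloud compute instances describe "$vm" \
    --zone="$zone" --format="value(status)"
}
```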

Best practices

Here are some best practices to help you get the most out of preemptible VM instances.

Use the bulk instance API

Rather than creating VMs one at a time, you can use the bulk instance API to create multiple VMs in a single request.
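
With the gcloud CLI, a bulk request for preemptible VMs might look like the following. This is a minimal sketch, assuming the bulk create command accepts the same --preemptible flag as instances create; the function name bulk_create_preemptible is hypothetical.

```shell
# Hypothetical helper: request several preemptible VMs in one call.
# The run of "#" characters in the name pattern is replaced with
# generated sequential numbers.
bulk_create_preemptible() {
  local pattern="$1" count="$2" zone="$3"
  gcloud compute instances bulk create \
    --name-pattern="$pattern" \
    --count="$count" \
    --zone="$zone" \
    --preemptible
}
```

For example, bulk_create_preemptible "worker-###" 10 [ZONE] requests ten VMs whose names start with worker-.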

Pick smaller machine shapes

Resources for preemptible VMs come out of excess and backup Google Cloud capacity. Capacity is often easier to get for smaller machine types, meaning machine types with fewer resources like vCPUs and memory. You might find more capacity for preemptible VMs by selecting a smaller custom machine type, but capacity is even more likely for smaller predefined machine types. For example, compared to capacity for the n2-standard-32 predefined machine type, capacity for the n2-custom-24-96 custom machine type is more likely, but capacity for the n2-standard-16 predefined machine type is even more likely.

Run large preemptible VM clusters during off-peak times

The load on Google Cloud data centers varies with location and time of day, but is generally lowest on nights and weekends. As such, nights and weekends are the best times to run large preemptible VM clusters.

Design your applications to be fault and preemption tolerant

Be prepared for preemption patterns to change over time. For example, if a zone suffers a partial outage, large numbers of preemptible VMs could be preempted to make room for standard VMs that need to be moved as part of the recovery. In that small window of time, the preemption rate would look very different than on any other day. If your application assumes that preemptions are always done in small groups, you might not be prepared for such an event. You can test your application's behavior under a preemption event by stopping the VM instance.

Retry creating VMs that have been preempted

If your VM instance has been preempted, try creating new preemptible VMs once or twice before falling back to standard VMs. Depending on your requirements, it might be a good idea to combine standard and preemptible VMs in your clusters to ensure that work proceeds at an adequate pace.
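
This retry-then-fall-back approach can be sketched as a small helper. The function name create_with_fallback, the default of two attempts, and the RETRY_DELAY_SECONDS variable are all hypothetical choices, and the sketch assumes the VM name is not already in use.

```shell
# Hypothetical helper: try to create a preemptible VM a few times, then
# fall back to a standard VM. Assumes the VM name is not already in use.
create_with_fallback() {
  local vm="$1" zone="$2" attempts="${3:-2}"
  local delay="${RETRY_DELAY_SECONDS:-30}"
  local i=1

  while [ "$i" -le "$attempts" ]; do
    if gcloud compute instances create "$vm" --zone="$zone" --preemptible; then
      echo "Created preemptible VM ${vm} on attempt ${i}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done

  echo "Falling back to a standard VM for ${vm}"
  gcloud compute instances create "$vm" --zone="$zone"
}
```

For example, create_with_fallback worker-1 [ZONE] 2 makes two preemptible attempts before creating a standard VM with the same name.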

Use shutdown scripts

Manage shutdown and preemption notices with a shutdown script that can save a job's progress so that it can pick up where it left off, rather than start over from scratch.
