Queued resources user guide

Queued resources enable you to request Cloud TPU resources in a queued manner. When you request queued resources, the request is added to a queue maintained by the Cloud TPU service. When the requested resource becomes available, it's assigned to your Google Cloud project for your immediate exclusive use. It will remain assigned to your project unless you delete it or it's preempted. Only preemptible TPUs are eligible for preemption.

You can specify an optional start and end time in a queued resource request. The start time specifies the earliest time at which to fill the request. If a request has not been filled by the specified end time, the request expires. An expired request remains in the queue but is no longer eligible for allocation.

Queued resource requests can be in one of the following states:

WAITING_FOR_RESOURCES
The request has passed initial validation and has been added to the queue. It remains in this state until there are sufficient free resources to begin provisioning your request or the allocation interval elapses. When demand is high, not all requests can be immediately provisioned. If you need more reliable access to TPUs, consider purchasing a reservation.
PROVISIONING
The request has been selected from the queue and its resources are currently being allocated.
ACTIVE
The request has been allocated. When queued resource requests are in the ACTIVE state, you can manage your TPU VMs as described in Manage TPUs.
FAILED
The request couldn't be completed, either because there is a problem with the request or the requested resources were not available within the allocation interval. The request remains in the queue until it is explicitly deleted.
SUSPENDING
The resources associated with the request are currently being deleted.
SUSPENDED
The resources specified in the request have been deleted. When a request is in the SUSPENDED state, it's no longer eligible for further allocation.

Prerequisites

Before running the commands in this guide, make sure you:

Request an on-demand queued resource

You can request an on-demand queued resource using the gcloud compute tpus queued-resources create command. For more information about on-demand resources, see Quota types.

gcloud

gcloud alpha compute tpus queued-resources create your-queued-resource-id \
--node-id your-node-id \
--project your-project \
--zone us-central2-b \
--accelerator-type v4-8 \
--runtime-version tpu-vm-tf-2.16.1-pjrt

curl

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d "{
'tpu': {
  'node_spec': {
    'parent': 'projects/your-project-number/locations/us-central2-b',
    'node_id': 'your-node-id',
    'node': {
      'accelerator_type': 'v4-8',
      'runtime_version': 'tpu-vm-tf-2.16.1-pjrt',
    }
  }
}
}" \
https://tpu.googleapis.com/v2alpha1/projects/your-project-id/locations/us-central2-b/queuedResources?queued_resource_id=your-queued-resource-id

Command parameter descriptions

queued-resource-id
The user-assigned ID of the queued resource request.
node-id
The user-assigned ID of the TPU which is created when the queued resource request is allocated.
project
Your Google Cloud project.
zone
The zone where you plan to create your Cloud TPU.
accelerator-type
The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
runtime-version
The Cloud TPU software version.

Default slice sizes for on-demand queued resources

When you use on-demand quota, you must request a slice size no larger than the default limit for the accelerator type you are using. Requests that exceed the default limits are declined.

The following table shows the TPU types and their associated default limits.

Accelerator type    Default limit (number of TensorCores)
v2                  128
v3                  128
v4                  384
v5                  32

If you require larger slice sizes, contact Cloud TPU support for additional information.
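The following bash function is a minimal sketch of checking a request against the table above before you submit it with on-demand quota. It assumes that the number after the dash in v2-N, v3-N, and v4-N accelerator types is the TensorCore count, and it treats a size equal to the limit as allowed, because only requests that exceed the limit are declined. The function name within_default_limit is hypothetical, and v5 accelerator types are not handled by this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a v2/v3/v4 accelerator type is within the
# default on-demand limit from the table above.
# Assumption: the suffix of vX-N is the number of TensorCores.
within_default_limit() {
  local type=$1 version cores limit
  if [[ ! $type =~ ^(v[234])-([0-9]+)$ ]]; then
    echo "unsupported accelerator type: $type" >&2
    return 2
  fi
  version=${BASH_REMATCH[1]}
  cores=$((10#${BASH_REMATCH[2]}))   # force base 10
  case $version in
    v2|v3) limit=128 ;;
    v4)    limit=384 ;;
  esac
  # Equal to the limit is allowed; only larger requests are declined.
  (( cores <= limit ))
}
```

For example, within_default_limit v4-32 succeeds, while within_default_limit v4-512 returns a non-zero status.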

Request a queued resource using reserved quota

You can request a queued resource using reserved quota by specifying the --reserved flag in your gcloud command or guaranteed.reserved=true in your curl request. For more information about reserved quota, see Quota types.

gcloud

gcloud alpha compute tpus queued-resources create your-queued-resource-id \
--node-id your-node-id \
--project your-project \
--zone us-central2-b \
--accelerator-type v4-8 \
--runtime-version tpu-vm-tf-2.16.1-pjrt \
--reserved

curl

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d "{
'tpu': {
  'node_spec': {
    'parent': 'projects/your-project-number/locations/us-central2-b',
    'node_id': 'your-node-id',
    'node': {
      'accelerator_type': 'v4-8',
      'runtime_version': 'tpu-vm-tf-2.16.1-pjrt',
    }
  }
},
'guaranteed': {
  'reserved': true,
}
}" \
https://tpu.googleapis.com/v2alpha1/projects/your-project-id/locations/us-central2-b/queuedResources?queued_resource_id=your-queued-resource-id

Command parameter descriptions

queued-resource-id
The user-assigned ID of the queued resource request.
node-id
The user-assigned ID of the TPU which is created when the queued resource request is allocated.
project
Your Google Cloud project.
zone
The zone where you plan to create your Cloud TPU.
accelerator-type
The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
runtime-version
The Cloud TPU software version.
reserved
Use this flag when requesting queued resources as part of a Cloud TPU reservation.

Request a preemptible queued resource

You can request a preemptible queued resource. A preemptible resource can be reclaimed and assigned to other workloads when those workloads need the capacity. Preemptible resources cost less, and you might get access to resources sooner than with a non-preemptible request. For more information about preemptible quota, see Quota types.

gcloud

gcloud alpha compute tpus queued-resources create your-queued-resource-id \
--node-id your-node-id \
--project your-project-id \
--zone us-central2-b \
--accelerator-type v4-8 \
--runtime-version tpu-vm-tf-2.16.1-pjrt \
--best-effort

curl

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d "{
'tpu': {
  'node_spec': {
    'parent': 'projects/your-project-number/locations/us-central2-b',
    'node_id': 'your-node-id',
    'node': {
      'accelerator_type': 'v4-8',
      'runtime_version': 'tpu-vm-tf-2.16.1-pjrt',
    }
  }
},
'best_effort': {}
}" \
https://tpu.googleapis.com/v2alpha1/projects/your-project-id/locations/us-central2-b/queuedResources?queued_resource_id=your-queued-resource-id

Command parameter descriptions

queued-resource-request-id
The user-assigned ID of the queued resource request.
node-id
The user-defined ID of the TPU created in response to the request.
project
The ID of the project where the queued resource is allocated.
zone
The zone where you plan to create your Cloud TPU.
accelerator-type
The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
runtime-version
The Cloud TPU software version.
best-effort
A boolean flag specifying that the queued resource is preemptible.
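The on-demand, reserved, and preemptible curl requests above differ only in one optional top-level field: none for on-demand, guaranteed.reserved for reserved quota, and best_effort for preemptible requests. The bash function below is a sketch that prints a strict-JSON body (double-quoted keys) for each variant. The function name qr_request_body is hypothetical, it assumes its arguments contain no double quotes or backslashes, and writing node_spec as a JSON array is an assumption based on the list-valued nodeSpec field in the describe output shown later in this guide; the curl examples above send a single object instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a JSON request body for the queuedResources create call.
# Usage: qr_request_body MODE PARENT NODE_ID ACCELERATOR_TYPE RUNTIME_VERSION
#   MODE is one of: on-demand, reserved, best-effort
qr_request_body() {
  local mode=$1 parent=$2 node_id=$3 accel=$4 runtime=$5 extra
  case $mode in
    on-demand)   extra='' ;;                                    # no quota field
    reserved)    extra=', "guaranteed": {"reserved": true}' ;;  # reserved quota
    best-effort) extra=', "best_effort": {}' ;;                 # preemptible
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  # node_spec is written as a list (assumption, see the lead-in).
  printf '{"tpu": {"node_spec": [{"parent": "%s", "node_id": "%s", "node": {"accelerator_type": "%s", "runtime_version": "%s"}}]}%s}\n' \
    "$parent" "$node_id" "$accel" "$runtime" "$extra"
}
```

For example, you could redirect the output of qr_request_body reserved projects/your-project-number/locations/us-central2-b your-node-id v4-8 tpu-vm-tf-2.16.1-pjrt to body.json and pass -d @body.json to curl in place of the inline body.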

Request a queued resource to be allocated before or after a specified time

You can specify an optional start time, end time, start duration, or end duration in a queued resource request. The start time or start duration specifies the earliest time at which to fill the request. If a request has not been filled by the specified end time or within the specified end duration, the request expires. After the request has expired, it remains in the queue but is no longer eligible for allocation.

You can also specify an allocation interval by specifying a start time or duration and an end time or duration.

See Datetime for a list of supported timestamp and duration formats.

Request a queued resource after a specified duration

You can specify a duration after which a resource should be allocated using the --valid-after-duration flag. The following example requests a v4-32 to be allocated after six hours.

gcloud

gcloud alpha compute tpus queued-resources create your-queued-resource-id \
--node-id your-node-id \
--project your-project-id \
--zone us-central2-b \
--accelerator-type v4-32 \
--runtime-version tpu-vm-tf-2.16.1-pod-pjrt \
--valid-after-duration 6h

curl

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d "{
'tpu': {
  'node_spec': {
    'parent': 'projects/your-project-number/locations/us-central2-b',
    'node_id': 'your-node-id',
    'node': {
      'accelerator_type': 'v4-32',
      'runtime_version': 'tpu-vm-tf-2.16.1-pod-pjrt',
    }
  }
},
'queueing_policy': {
  'valid_after_duration': {
    'seconds': 21600
  }
}
}" \
https://tpu.googleapis.com/v2alpha1/projects/your-project-id/locations/us-central2-b/queuedResources?queued_resource_id=your-queued-resource-id

Command parameter descriptions

queued-resource-request-id
The user-assigned ID of the queued resource request.
node-id
The user-defined ID of the TPU created in response to the request.
project
The Google Cloud project where the queued resource is allocated.
zone
The zone where you plan to create your Cloud TPU.
accelerator-type
The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
runtime-version
The Cloud TPU software version.
valid-after-duration
The duration before which the TPU must not be provisioned. For more information about duration formats, see Google Cloud CLI topic datetime.
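The curl example expresses the duration as a number of seconds (21600 for the 6h passed to --valid-after-duration). The bash function below is a minimal sketch of that conversion for whole-number durations built from d, h, m, and s units, such as 6h or 5h30m. The name duration_to_seconds is hypothetical, and the sketch does not accept every format described in the datetime topic (for example, fractional seconds or ISO 8601 durations).

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts a duration such as 6h or 5h30m to seconds.
duration_to_seconds() {
  local rest=$1 total=0 num unit
  # Accept only one or more <integer><unit> groups.
  [[ $rest =~ ^([0-9]+[dhms])+$ ]] || { echo "unsupported duration: $1" >&2; return 1; }
  while [[ $rest =~ ^([0-9]+)([dhms])(.*)$ ]]; do
    num=$((10#${BASH_REMATCH[1]}))   # force base 10 so "08" is not read as octal
    unit=${BASH_REMATCH[2]}
    rest=${BASH_REMATCH[3]}
    case $unit in
      d) total=$((total + num * 86400)) ;;
      h) total=$((total + num * 3600)) ;;
      m) total=$((total + num * 60)) ;;
      s) total=$((total + num)) ;;
    esac
  done
  echo "$total"
}
```

For example, duration_to_seconds 6h prints 21600, the value used in the curl example above.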

Request a queued resource that expires after a specified duration

You can specify how long a queued resource request remains valid using the --valid-until-duration flag. The following example requests a v4-32 that expires if not filled in six hours.

gcloud

gcloud alpha compute tpus queued-resources create your-queued-resource-id \
--node-id your-node-id \
--project your-project-id \
--zone us-central2-b \
--accelerator-type v4-32 \
--runtime-version tpu-vm-tf-2.16.1-pod-pjrt \
--valid-until-duration 6h

curl

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d "{
'tpu': {
  'node_spec': {
    'parent': 'projects/your-project-number/locations/us-central2-b',
    'node_id': 'your-node-id',
    'node': {
      'accelerator_type': 'v4-32',
      'runtime_version': 'tpu-vm-tf-2.16.1-pod-pjrt',
    }
  }
},
'queueing_policy': {
  'valid_until_duration': {
    'seconds': 21600
  }
}
}" \
https://tpu.googleapis.com/v2alpha1/projects/your-project-id/locations/us-central2-b/queuedResources?queued_resource_id=your-queued-resource-id

Command parameter descriptions

queued-resource-request-id
The user-assigned ID of the queued resource request.
node-id
The user-defined ID of the TPU created in response to the request.
project
The Google Cloud project where the queued resource is allocated.
zone
The zone where you plan to create your Cloud TPU.
accelerator-type
The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
runtime-version
The Cloud TPU software version.
valid-until-duration
The duration for which the request is valid. For more information about duration formats, see Google Cloud CLI topic datetime.

Request a queued resource after a specified time

You can specify a time after which a resource should be allocated using the --valid-after-time flag.

The following command requests a v4-4096 TPU with runtime version tpu-vm-tf-2.16.1-pod-pjrt to be allocated after 9:00 AM UTC on December 14, 2022.

gcloud

gcloud alpha compute tpus queued-resources create your-queued-resource-id \
--node-id your-node-id \
--project your-project-id \
--zone us-central2-b \
--accelerator-type v4-4096 \
--runtime-version tpu-vm-tf-2.16.1-pod-pjrt \
--valid-after-time 2022-12-14T09:00:00Z

curl

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d "{
'tpu': {
  'node_spec': {
    'parent': 'projects/your-project-number/locations/us-central2-b',
    'node_id': 'your-node-id',
    'node': {
      'accelerator_type': 'v4-4096',
      'runtime_version': 'tpu-vm-tf-2.16.1-pod-pjrt',
    }
  }
},
'queueing_policy': {
  'valid_after_time': {
    'seconds': 1671008400
  }
}
}" \
https://tpu.googleapis.com/v2alpha1/projects/your-project-id/locations/us-central2-b/queuedResources?queued_resource_id=your-queued-resource-id

Command parameter descriptions

queued-resource-request-id
The user-assigned ID of the queued resource request.
node-id
The user-defined ID of the TPU created in response to the request.
project
The Google Cloud project where the queued resource is allocated.
zone
The zone where you plan to create your Cloud TPU.
accelerator-type
The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
runtime-version
The Cloud TPU software version.
valid-after-time
The time after which the resource should be allocated. For more information about timestamp formats, see Google Cloud CLI topic datetime.
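In the curl examples, valid_after_time and valid_until_time carry a seconds value, a count of seconds since the Unix epoch, while gcloud accepts an RFC 3339 timestamp such as 2022-12-14T09:00:00Z. The following bash function is a minimal sketch of converting one to the other with the date utility. It assumes the timestamp is in UTC with a trailing Z and that date is either GNU date or BSD date; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts a UTC RFC 3339 timestamp (YYYY-MM-DDTHH:MM:SSZ)
# to seconds since the Unix epoch.
rfc3339_to_epoch() {
  if date --version >/dev/null 2>&1; then
    # GNU date (most Linux distributions)
    date -u -d "$1" +%s
  else
    # BSD date (for example, macOS)
    date -u -j -f '%Y-%m-%dT%H:%M:%SZ' "$1" +%s
  fi
}
```

For example, rfc3339_to_epoch 2022-12-14T09:00:00Z prints 1671008400.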

Request a queued resource before a specified time

You can specify a time before which the resource should be allocated using the --valid-until-time flag.

The following command requests a v4-4096 TPU with runtime version tpu-vm-tf-2.16.1-pod-pjrt to be created no later than 9:00 AM UTC on December 14, 2022.

gcloud

gcloud alpha compute tpus queued-resources create your-queued-resource-id \
--node-id your-node-id \
--project your-project-id \
--zone us-central2-b \
--accelerator-type v4-4096 \
--runtime-version tpu-vm-tf-2.16.1-pod-pjrt \
--valid-until-time 2022-12-14T09:00:00Z

curl

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d "{
'tpu': {
  'node_spec': {
    'parent': 'projects/your-project-number/locations/us-central2-b',
    'node_id': 'your-node-id',
    'node': {
      'accelerator_type': 'v4-4096',
      'runtime_version': 'tpu-vm-tf-2.16.1-pod-pjrt',
    }
  }
},
'queueing_policy': {
  'valid_until_time': {
    'seconds': 1671008400
  }
}
}" \
https://tpu.googleapis.com/v2alpha1/projects/your-project-id/locations/us-central2-b/queuedResources?queued_resource_id=your-queued-resource-id

Command parameter descriptions

queued-resource-request-id
The user-assigned ID of the queued resource request.
node-id
The user-defined ID of the TPU created in response to the request.
project
The ID of the project where the queued resource is allocated.
zone
The zone where you plan to create your Cloud TPU.
accelerator-type
The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
runtime-version
The Cloud TPU software version.
valid-until-time
The time after which the request expires if it has not been filled. For more information about timestamp formats, see Google Cloud CLI topic datetime.

Request a queued resource to be allocated within a specified interval

You can specify an allocation interval using any pair of the --valid-after-time, --valid-after-duration, --valid-until-duration, and --valid-until-time flags, provided one flag specifies the beginning of the allocation interval and the other specifies the end of the allocation interval.

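The following command requests a v4-32 to be allocated no earlier than 5 hours and 30 minutes from the current time and no later than 9:00 AM UTC on December 14, 2022. The curl example specifies an interval with the same end time using validInterval, which takes an absolute startTime and endTime.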

gcloud

gcloud alpha compute tpus queued-resources create your-queued-resource-id \
--node-id your-node-id \
--project your-project-id \
--zone us-central2-b \
--accelerator-type v4-32 \
--runtime-version tpu-vm-tf-2.16.1-pod-pjrt \
--valid-after-duration 5h30m \
--valid-until-time 2022-12-14T09:00:00Z

curl

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d "{
'tpu': {
  'node_spec': {
    'parent': 'projects/your-project-number/locations/us-central2-b',
    'node_id': 'your-node-id',
    'node': {
      'accelerator_type': 'v4-32',
      'runtime_version': 'tpu-vm-tf-2.16.1-pod-pjrt',
    }
  }
},
'queueing_policy': {
  'validInterval': {
    'startTime': '2022-12-10T14:30:00Z',
    'endTime': '2022-12-14T09:00:00Z'
  }
}
}" \
https://tpu.googleapis.com/v2alpha1/projects/your-project-id/locations/us-central2-b/queuedResources?queued_resource_id=your-queued-resource-id

Command flag descriptions

queued-resource-request-id
The user-assigned ID of the queued resource request.
node-id
The user-defined ID of the TPU created in response to the request.
project
The ID of the project where the queued resource is allocated.
zone
The zone where you plan to create your Cloud TPU.
accelerator-type
The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
runtime-version
The Cloud TPU software version.
valid-after-duration
The duration before which the TPU must not be provisioned. For more information about duration formats, see Google Cloud CLI topic datetime.
valid-until-time
The time after which the request expires if it has not been filled. For more information about timestamp formats, see Google Cloud CLI topic datetime.
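The gcloud command gives the start of the interval as a relative duration, while validInterval in the curl example takes absolute times. The bash function below is a sketch of turning a relative start into an absolute RFC 3339 startTime. The function name start_time_after and its optional second argument (a fixed current time in epoch seconds, useful for testing) are hypothetical, and it assumes GNU or BSD date.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints NOW + SECONDS as a UTC RFC 3339 timestamp.
# Usage: start_time_after SECONDS [NOW_EPOCH_SECONDS]
start_time_after() {
  local seconds=$1 now=${2:-$(date -u +%s)} t
  t=$((now + seconds))
  if date --version >/dev/null 2>&1; then
    date -u -d "@$t" +%Y-%m-%dT%H:%M:%SZ   # GNU date
  else
    date -u -r "$t" +%Y-%m-%dT%H:%M:%SZ    # BSD date
  fi
}
```

For example, if the current time were 2022-12-10T09:00:00Z (epoch 1670662800), start_time_after 19800 1670662800 would print 2022-12-10T14:30:00Z, because 5h30m is 19800 seconds. Check that the resulting startTime is earlier than endTime before sending the request.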

Request a queued resource with a startup script

You can specify a script to be run on a queued resource after it has been provisioned. When using the gcloud command, you can use either the --metadata or --metadata-from-file flag to specify a script command or a file containing the script code, respectively. When using curl, you must include the script code in the JSON content. The following example creates a queued resource request that will run the script contained in startup-script.sh. The curl example shows an inline script in the JSON body.

gcloud

gcloud alpha compute tpus queued-resources create your-queued-resource-id \
--node-id your-node-id \
--project your-project \
--zone us-central2-b \
--accelerator-type v4-8 \
--runtime-version tpu-vm-tf-2.12.0 \
--reserved \
--metadata-from-file='startup-script=startup-script.sh'

curl

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d "{
'tpu': {
  'node_spec': {
    'parent': 'projects/your-project-number/locations/us-central2-b',
    'node_id': 'your-node-id',
    'node': {
      'accelerator_type': 'v4-8',
      'runtime_version': 'tpu-vm-tf-2.16.1-pjrt',
      'metadata': {
        'startup-script': '#! /bin/bash\npwd > /tmp/out.txt\nwhoami >> /tmp/out.txt'
      }
    }
  }
},
'queueing_policy': {
  'validInterval': {
    'startTime': '2022-12-10T14:30:00Z',
    'endTime': '2022-12-14T09:00:00Z'
  }
}
}" \
https://tpu.googleapis.com/v2alpha1/projects/your-project-id/locations/us-central2-b/queuedResources?queued_resource_id=your-queued-resource-id

Command flag descriptions

queued-resource-request-id
The user-assigned ID of the queued resource request.
node-id
The user-defined ID of the TPU created in response to the request.
project
The ID of the project where the queued resource is allocated.
zone
The zone where you plan to create your Cloud TPU.
accelerator-type
The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
runtime-version
The Cloud TPU software version.
validInterval
The interval, given as a startTime and an endTime, during which the request can be allocated. If the request has not been filled by endTime, it expires. For more information about timestamp formats, see Google Cloud CLI topic datetime.
metadata-from-file
Specifies a file that contains metadata. If you don't specify a fully qualified path to the metadata file, the command assumes it is located in the current directory. In this example the file contains a startup script that is run when the queued resource is provisioned.
metadata
Specifies metadata for the request. In this example the metadata is a startup script command run when the queued resource is provisioned.
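The gcloud example reads the script from startup-script.sh, while the curl example inlines the same commands as one JSON string with \n line breaks. The bash sketch below writes that script to startup-script.sh and shows one way to escape a file's contents for the JSON body. The function json_escape_file is hypothetical and escapes only backslashes, double quotes, tabs, carriage returns, and newlines, not every character JSON requires escaping.

```shell
#!/usr/bin/env bash
# Write the same script that the curl example inlines. The quoted 'EOF'
# delimiter stops the shell from expanding anything inside the script.
cat > startup-script.sh <<'EOF'
#! /bin/bash
pwd > /tmp/out.txt
whoami >> /tmp/out.txt
EOF

# Hypothetical helper: prints a file's contents as an escaped JSON string value
# (without the surrounding quotes). Trailing newlines are dropped by $(<file).
json_escape_file() {
  local s bs='\' dq='"' tab=$'\t' cr=$'\r' nl=$'\n'
  s=$(<"$1")
  s=${s//"$bs"/"$bs$bs"}    # escape backslashes first
  s=${s//"$dq"/"$bs$dq"}    # then double quotes
  s=${s//"$tab"/"${bs}t"}
  s=${s//"$cr"/"${bs}r"}
  s=${s//"$nl"/"${bs}n"}
  printf '%s' "$s"
}
```

Because the curl examples wrap the body in shell double quotes, a script containing double quotes, $, or backticks needs extra shell escaping; writing the body to a file and passing it with curl -d @file avoids that.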

Request a queued resource with a specified network and subnetwork

You can request a queued resource specifying the network and subnetwork that you want to connect your TPU to.

gcloud

gcloud alpha compute tpus queued-resources create your-queued-resource-id \
--node-id your-node-id \
--project your-project \
--zone us-central2-b \
--accelerator-type v4-8 \
--runtime-version tpu-vm-tf-2.16.1-pjrt \
--network network-name \
--subnetwork subnetwork-name

curl

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d "{
'tpu': {
  'node_spec': {
    'parent': 'projects/your-project-number/locations/us-central2-b',
    'node_id': 'your-node-id',
    'node': {
      'accelerator_type': 'v4-8',
      'runtime_version': 'tpu-vm-tf-2.16.1-pjrt',
      'network_config': {
        'network': 'network-name',
        'subnetwork': 'subnetwork-name',
        'enable_external_ips': true
      }
    }
  }
},
'guaranteed': {
  'reserved': true,
}
}" \
https://tpu.googleapis.com/v2alpha1/projects/your-project-id/locations/us-central2-b/queuedResources?queued_resource_id=your-queued-resource-id

Command parameter descriptions

queued-resource-id
The user-assigned ID of the queued resource request.
node-id
The user-assigned ID of the TPU which is created when the queued resource request is allocated.
project
Your Google Cloud project.
zone
The zone where you plan to create your Cloud TPU.
accelerator-type
The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
runtime-version
The Cloud TPU software version.
reserved
Use this flag when requesting queued resources as part of a Cloud TPU reservation.
network
A network that the queued resource will be a part of.
subnetwork
A subnetwork that the queued resource will be a part of.
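Subnetworks are regional, so the subnetwork you pass must be in the region that contains the TPU zone (us-central2 for us-central2-b). The bash sketch below derives the region from a zone name and lists a network's subnets in that region. The helper names are hypothetical, and the sketch assumes that gcloud compute networks subnets list accepts the --network and --regions flags; check gcloud compute networks subnets list --help for your gcloud version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derives a region from a zone by dropping the final
# "-<letter>" suffix, for example us-central2-b -> us-central2.
zone_to_region() {
  printf '%s\n' "${1%-*}"
}

# Hypothetical helper: lists the subnets of NETWORK in the region of ZONE.
# Usage: list_subnets_for_zone NETWORK ZONE [PROJECT]
list_subnets_for_zone() {
  local network=$1 zone=$2 project=${3:-}
  gcloud compute networks subnets list \
    --network "$network" \
    --regions "$(zone_to_region "$zone")" \
    ${project:+--project "$project"}
}
```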

Delete a queued resource request

You can delete a queued resource request and the TPU VM created by the request by passing the --force flag to the queued-resources delete command. Otherwise, you must delete the TPU VM before deleting the queued resource request. When you delete the TPU VM, the queued resource request transitions to the SUSPENDED state, after which you can delete the queued resource request.

The following commands delete the queued resource request named "my-queued-resource" in the "my-project" project in zone "us-central2-b". They use the --force flag to delete both the TPU VM and the queued resource request.

gcloud

gcloud compute tpus queued-resources delete my-queued-resource \
--project my-project \
--zone us-central2-b \
--force \
--async

curl

curl -X DELETE -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://tpu.googleapis.com/v2/projects/my-project/locations/us-central2-b/queuedResources/my-queued-resource?force=true

Command flag descriptions

queued-resource-request-id
The user-assigned ID of the queued resource request.
project
The Google Cloud project where the queued resource is allocated.
zone
The zone of the Cloud TPU to delete.
force
Delete both the TPU VM and the queued resource request.

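The following commands delete the queued resource request your-queued-resource-id in the your-project-id project in zone us-central2-b. Because these commands don't use the --force flag, you must first delete any TPU VM that the request created.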

gcloud

gcloud compute tpus queued-resources delete your-queued-resource-id \
--project your-project-id \
--zone us-central2-b

curl

curl -X DELETE -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://tpu.googleapis.com/v2/projects/your-project-id/locations/us-central2-b/queuedResources/your-queued-resource-id

Command flag descriptions

queued-resource-request-id
The user-assigned ID of the queued resource request.
project
The Google Cloud project where the queued resource is allocated.
zone
The zone where you plan to create your Cloud TPU.
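Without --force, deletion is a sequence: delete the TPU VM, wait for the request to reach SUSPENDED, and then delete the request. The bash function below is a sketch of that sequence. It assumes that gcloud compute tpus tpu-vm delete removes the TPU VM named by node-id and that --format='value(state.state)' selects the nested state field shown in the describe output later in this guide. The function name, the GCLOUD override, and the polling settings are hypothetical.

```shell
#!/usr/bin/env bash
# GCLOUD can point to another command (for example, a stub for a dry run).
GCLOUD=${GCLOUD:-gcloud}
QR_POLL_SECONDS=${QR_POLL_SECONDS:-30}

# Hypothetical helper: deletes the TPU VM, waits for SUSPENDED, then deletes the request.
# Usage: delete_qr_without_force QUEUED_RESOURCE_ID NODE_ID PROJECT ZONE
delete_qr_without_force() {
  local qr_id=$1 node_id=$2 project=$3 zone=$4 state='' i
  # 1. Delete the TPU VM created by the request.
  "$GCLOUD" compute tpus tpu-vm delete "$node_id" \
    --project "$project" --zone "$zone" --quiet || return 1
  # 2. Poll until the request reports SUSPENDED (up to 60 attempts).
  for ((i = 0; i < 60; i++)); do
    state=$("$GCLOUD" compute tpus queued-resources describe "$qr_id" \
      --project "$project" --zone "$zone" --format='value(state.state)') || return 1
    [[ $state == SUSPENDED ]] && break
    sleep "$QR_POLL_SECONDS"
  done
  [[ $state == SUSPENDED ]] || { echo "request $qr_id is still $state" >&2; return 1; }
  # 3. Delete the queued resource request itself.
  "$GCLOUD" compute tpus queued-resources delete "$qr_id" \
    --project "$project" --zone "$zone" --quiet
}
```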

Retrieve state and diagnostic information about a queued resource request

Retrieve the state and diagnostic information about a queued resource request:

gcloud

gcloud compute tpus queued-resources describe your-queued-resource-id \
--project your-project-id \
--zone us-central2-b

curl

curl -X GET -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://tpu.googleapis.com/v2/projects/your-project-id/locations/us-central2-b/queuedResources/your-queued-resource-id

Command flag descriptions

queued-resource-request-id
The user-assigned ID of the queued resource request.
project
The ID of the project where the queued resource is allocated.
zone
The zone where you plan to create your Cloud TPU.

If the request fails, the response will contain error information. For a request that is waiting for resources, the output will look similar to the following:

name: projects/your-project-id/locations/us-central2-b/queuedResources/your-queued-resource-id
state:
  state: WAITING_FOR_RESOURCES
tpu:
  nodeSpec:
  - node:
      acceleratorType: v4-8
      bootDisk: {}
      networkConfig:
        enableExternalIps: true
      queuedResource: projects/your-project-number/locations/us-central2-b/queuedResources/your-queued-resource-id
      runtimeVersion: tpu-vm-tf-2.10.0
      schedulingConfig: {}
      serviceAccount: {}
      shieldedInstanceConfig: {}
      useTpuVm: true
    nodeId: your-node-id
    parent: projects/your-project-number/locations/us-central2-b
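To wait for a request to leave the transitional states, you can poll the describe command. The bash sketch below maps each state from the list at the start of this guide to a next step and polls until the request is ACTIVE, FAILED, or SUSPENDED. The helper names and polling defaults are hypothetical, and selecting the nested state field with --format='value(state.state)' is an assumption based on the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggests a next step for a queued resource state.
qr_next_step() {
  case "$1" in
    WAITING_FOR_RESOURCES|PROVISIONING|SUSPENDING) echo wait ;;  # still changing
    ACTIVE) echo ready ;;                                        # manage the TPU VM
    FAILED|SUSPENDED) echo delete ;;                             # stays queued until deleted
    *) echo unknown; return 1 ;;
  esac
}

# Hypothetical helper: prints the current state of a queued resource request.
get_qr_state() {
  gcloud compute tpus queued-resources describe "$1" \
    --project "$2" --zone "$3" --format='value(state.state)'
}

# Polls until the state is no longer transitional and prints the final state.
# Usage: wait_for_qr ID PROJECT ZONE [INTERVAL_SECONDS] [MAX_ATTEMPTS]
wait_for_qr() {
  local id=$1 project=$2 zone=$3 interval=${4:-60} max=${5:-60}
  local state step i
  for ((i = 0; i < max; i++)); do
    state=$(get_qr_state "$id" "$project" "$zone") || return 1
    step=$(qr_next_step "$state") || return 1
    if [[ $step != wait ]]; then
      echo "$state"
      return 0
    fi
    sleep "$interval"
  done
  return 1
}
```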

List queued resource requests in your project

The following command lists the queued resource requests in project "your-project-id":

gcloud

gcloud compute tpus queued-resources list --project your-project-id \
--zone us-central2-b

curl

curl -X GET -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://tpu.googleapis.com/v2/projects/your-project-id/locations/us-central2-b/queuedResources

Command flag descriptions

project
The Google Cloud project where the queued resource is allocated.
zone
The zone where you plan to create your Cloud TPU.
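Because FAILED requests stay in the queue until they are explicitly deleted, a per-state summary of the list can help you find requests to clean up. The bash sketch below counts requests by state from tab-separated name and state lines. The helper name is hypothetical, and it assumes you produce its input with a format such as --format='value(name.basename(), state.state)', which prints tab-separated columns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "NAME<TAB>STATE" lines on stdin and prints
# "STATE COUNT" lines sorted by state.
count_by_state() {
  awk -F'\t' 'NF >= 2 { count[$2]++ } END { for (s in count) print s, count[s] }' | sort
}
```

For example: gcloud compute tpus queued-resources list --project your-project-id --zone us-central2-b --format='value(name.basename(), state.state)' | count_by_state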