REST Resource: projects.locations.nodes

Resource: Node

A TPU instance.

JSON representation
{
  "name": string,
  "description": string,
  "acceleratorType": string,
  "ipAddress": string,
  "port": string,
  "state": enum (State),
  "healthDescription": string,
  "tensorflowVersion": string,
  "network": string,
  "cidrBlock": string,
  "serviceAccount": string,
  "createTime": string,
  "schedulingConfig": {
    object (SchedulingConfig)
  },
  "networkEndpoints": [
    {
      object (NetworkEndpoint)
    }
  ],
  "health": enum (Health),
  "labels": {
    string: string,
    ...
  },
  "useServiceNetworking": boolean,
  "apiVersion": enum (ApiVersion),
  "symptoms": [
    {
      object (Symptom)
    }
  ]
}
Fields
name

string

Output only. Immutable. The name of the TPU.

description

string

The user-supplied description of the TPU. Maximum of 512 characters.

acceleratorType

string

Required. The type of hardware accelerators associated with this node.

ipAddress
(deprecated)

string

Output only. DEPRECATED! Use networkEndpoints instead. The network address for the TPU Node as visible to Compute Engine instances.

port
(deprecated)

string

Output only. DEPRECATED! Use networkEndpoints instead. The network port for the TPU Node as visible to Compute Engine instances.

state

enum (State)

Output only. The current state for the TPU Node.

healthDescription

string

Output only. If this field is populated, it contains a description of why the TPU Node is unhealthy.

tensorflowVersion

string

Required. The version of TensorFlow running in the node.

network

string

The name of the network to peer the TPU node to. It must be a preexisting Compute Engine network inside the project on which this API has been activated. If none is provided, "default" will be used.

cidrBlock

string

The CIDR block that the TPU node will use when selecting an IP address. This CIDR block must be a /29 block; the Compute Engine networks API forbids a smaller block, and using a larger block would be wasteful (a node can only consume one IP address). Errors will occur if the CIDR block has already been used for a currently existing TPU node, the CIDR block conflicts with any subnetworks in the user's provided network, or the provided network is peered with another network that is using that CIDR block.
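
The /29 requirement and the conflict conditions above can be checked on the client before sending a request. The following is a minimal sketch using Python's standard ipaddress module. The function names are hypothetical, IPv4 is assumed, and the list of existing blocks is something the caller would have to gather; the service performs its own validation regardless.

```python
import ipaddress


def is_valid_tpu_cidr(cidr_block: str) -> bool:
    """Return True if cidr_block is a well-formed IPv4 network with a /29 prefix."""
    try:
        # strict=True rejects values with host bits set, e.g. "10.2.0.1/29".
        network = ipaddress.IPv4Network(cidr_block, strict=True)
    except ValueError:
        return False
    return network.prefixlen == 29


def conflicts_with(cidr_block: str, existing_blocks: list[str]) -> bool:
    """Return True if cidr_block overlaps any of the caller-supplied blocks."""
    candidate = ipaddress.IPv4Network(cidr_block)
    return any(
        candidate.overlaps(ipaddress.IPv4Network(block)) for block in existing_blocks
    )
```

For example, is_valid_tpu_cidr("10.2.0.0/29") is accepted, while a /28 or a value with host bits set is rejected.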

serviceAccount

string

Output only. The service account used to run the TensorFlow services within the node. To share resources, including Google Cloud Storage data, with the TensorFlow job running in the node, this account must have permissions to that data.

createTime

string (Timestamp format)

Output only. The time when the node was created.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".
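
Python's datetime type only stores microseconds, so a timestamp with nine fractional digits cannot be represented losslessly in a single datetime value. The sketch below is one possible way to handle this: it returns the whole-second UTC datetime together with the nanosecond remainder. The function name parse_zulu_timestamp is hypothetical, and only the "Z"-suffixed form described above is accepted.

```python
import re
from datetime import datetime, timezone

# Date-time up to whole seconds, an optional fraction of 1 to 9 digits, then "Z".
_ZULU = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$")


def parse_zulu_timestamp(value: str) -> tuple[datetime, int]:
    """Parse a "Zulu" timestamp into (UTC datetime at second precision, nanoseconds)."""
    match = _ZULU.match(value)
    if match is None:
        raise ValueError(f"not a UTC 'Zulu' timestamp: {value!r}")
    seconds_part, fraction = match.groups()
    moment = datetime.strptime(seconds_part, "%Y-%m-%dT%H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    # Right-pad the fraction to nine digits so ".5" means 500,000,000 ns.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return moment, nanos
```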

schedulingConfig

object (SchedulingConfig)

The scheduling options for this node.

networkEndpoints[]

object (NetworkEndpoint)

Output only. The network endpoints where TPU workers can be accessed and sent work. It is recommended that TensorFlow clients of the node reach out to the 0th entry in this list first.
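
As an illustration of the recommendation above, a client holding a Node as a decoded JSON dictionary could pick its endpoint as sketched below. The function name preferred_endpoint is hypothetical, and the fallback to the deprecated ipAddress and port fields is a design assumption of this sketch, not documented behavior.

```python
from typing import Optional, Tuple


def preferred_endpoint(node: dict) -> Optional[Tuple[str, int]]:
    """Return (ip, port) for the endpoint a client should try first, if any."""
    endpoints = node.get("networkEndpoints") or []
    if endpoints:
        first = endpoints[0]  # the 0th entry is the recommended starting point
        return first["ipAddress"], int(first["port"])
    # Assumed fallback: the deprecated top-level fields (port is a string there).
    if node.get("ipAddress") and node.get("port"):
        return node["ipAddress"], int(node["port"])
    return None
```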

health

enum (Health)

The health status of the TPU node.

labels

map (key: string, value: string)

Resource labels to represent user-provided metadata.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

useServiceNetworking

boolean

Whether the VPC peering for the node is set up through the Service Networking API. The VPC peering should be set up before provisioning the node. If this field is set, the cidrBlock field should not be specified. If the network that you want to peer the TPU node to is a Shared VPC network, the node must be created with this field enabled.
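
The constraints between useServiceNetworking and cidrBlock can be checked before a Node body is submitted. The sketch below is illustrative only: check_network_fields and the network_is_shared_vpc argument are hypothetical, and whether a network is a Shared VPC network is something the caller must determine by other means.

```python
def check_network_fields(node: dict, network_is_shared_vpc: bool = False) -> list:
    """Return a list of human-readable problems with the node's networking fields."""
    problems = []
    uses_service_networking = bool(node.get("useServiceNetworking"))
    if uses_service_networking and node.get("cidrBlock"):
        problems.append(
            "cidrBlock should not be specified when useServiceNetworking is set"
        )
    if network_is_shared_vpc and not uses_service_networking:
        problems.append(
            "useServiceNetworking must be enabled for Shared VPC networks"
        )
    return problems
```

An empty list means that none of the conditions described above were violated; it does not guarantee that the request will be accepted.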

apiVersion

enum (ApiVersion)

Output only. The API version that created this Node.

symptoms[]

object (Symptom)

Output only. The Symptoms that have occurred to the TPU Node.

State

Represents the different states of a TPU node during its lifecycle.

Enums
STATE_UNSPECIFIED TPU node state is not known/set.
CREATING TPU node is being created.
READY TPU node has been created.
RESTARTING TPU node is restarting.
REIMAGING TPU node is undergoing reimaging.
DELETING TPU node is being deleted.
REPAIRING TPU node is being repaired and may be unusable. Details can be found in the healthDescription field.
STOPPED TPU node is stopped.
STOPPING TPU node is currently stopping.
STARTING TPU node is currently starting.
PREEMPTED TPU node has been preempted. Only applies to Preemptible TPU Nodes.
TERMINATED TPU node has been terminated due to maintenance or has reached the end of its life cycle (for preemptible nodes).
HIDING TPU node is currently hiding.
HIDDEN TPU node has been hidden.
UNHIDING TPU node is currently unhiding.
UNKNOWN TPU node has unknown state after a failed repair.
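
Client code that waits on a node often needs to tell in-progress states apart from settled ones. The sketch below mirrors the enum values listed above. The IN_PROGRESS grouping is an interpretation of the descriptions ("is being ...", "is currently ..."), not something the API defines, and the helper names are hypothetical.

```python
from enum import Enum


class State(Enum):
    STATE_UNSPECIFIED = "STATE_UNSPECIFIED"
    CREATING = "CREATING"
    READY = "READY"
    RESTARTING = "RESTARTING"
    REIMAGING = "REIMAGING"
    DELETING = "DELETING"
    REPAIRING = "REPAIRING"
    STOPPED = "STOPPED"
    STOPPING = "STOPPING"
    STARTING = "STARTING"
    PREEMPTED = "PREEMPTED"
    TERMINATED = "TERMINATED"
    HIDING = "HIDING"
    HIDDEN = "HIDDEN"
    UNHIDING = "UNHIDING"
    UNKNOWN = "UNKNOWN"


# Assumption: states whose descriptions indicate an ongoing transition.
IN_PROGRESS = frozenset({
    State.CREATING, State.RESTARTING, State.REIMAGING, State.DELETING,
    State.REPAIRING, State.STOPPING, State.STARTING, State.HIDING,
    State.UNHIDING,
})


def is_ready(node: dict) -> bool:
    """Return True if the node's state field is READY."""
    return node.get("state") == State.READY.value


def is_in_progress(node: dict) -> bool:
    """Return True if the node reports a state that this sketch treats as transitional."""
    try:
        state = State(node.get("state", State.STATE_UNSPECIFIED.value))
    except ValueError:
        return False  # a value this sketch does not know about
    return state in IN_PROGRESS
```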

SchedulingConfig

Sets the scheduling options for this node.

JSON representation
{
  "preemptible": boolean,
  "reserved": boolean
}
Fields
preemptible

boolean

Defines whether the node is preemptible.

reserved

boolean

Whether the node is created under a reservation.

NetworkEndpoint

A network endpoint over which a TPU worker can be reached.

JSON representation
{
  "ipAddress": string,
  "port": integer
}
Fields
ipAddress

string

The IP address of this network endpoint.

port

integer

The port of this network endpoint.

Health

Health defines the status of a TPU node as reported by Health Monitor.

Enums
HEALTH_UNSPECIFIED Health status is unknown: not initialized or failed to retrieve.
HEALTHY The resource is healthy.
DEPRECATED_UNHEALTHY The resource is unhealthy.
TIMEOUT The resource is unresponsive.
UNHEALTHY_TENSORFLOW The in-guest ML stack is unhealthy.
UNHEALTHY_MAINTENANCE The node is being rescheduled because of maintenance or a priority boost and will resume running once rescheduled.

ApiVersion

TPU API Version.

Enums
API_VERSION_UNSPECIFIED API version is unknown.
V1_ALPHA1 TPU API V1Alpha1 version.
V1 TPU API V1 version.
V2_ALPHA1 TPU API V2Alpha1 version.

Symptom

A Symptom instance.

JSON representation
{
  "createTime": string,
  "symptomType": enum (SymptomType),
  "details": string,
  "workerId": string
}
Fields
createTime

string (Timestamp format)

Timestamp when the Symptom is created.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".

symptomType

enum (SymptomType)

Type of the Symptom.

details

string

Detailed information of the current Symptom.

workerId

string

A string used to uniquely distinguish a worker within a TPU node.

SymptomType

SymptomType represents the different types of Symptoms that a TPU can exhibit.

Enums
SYMPTOM_TYPE_UNSPECIFIED Unspecified symptom.
LOW_MEMORY TPU VM memory is low.
OUT_OF_MEMORY TPU runtime is out of memory.
EXECUTE_TIMED_OUT TPU runtime execution has timed out.
MESH_BUILD_FAIL TPU runtime fails to construct a mesh that recognizes each TPU device's neighbors.
HBM_OUT_OF_MEMORY TPU HBM is out of memory.
PROJECT_ABUSE Abusive behaviors have been identified on the current project.
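
Because each Symptom carries a workerId, the symptoms[] list of a Node can be grouped per worker when diagnosing a multi-worker node. The following is a minimal sketch over a Node decoded from JSON; the function name symptoms_by_worker is hypothetical, and using an empty string for a missing workerId is a choice made for this example.

```python
from collections import defaultdict


def symptoms_by_worker(node: dict) -> dict:
    """Map each workerId to the list of symptomType values reported for it."""
    grouped = defaultdict(list)
    for symptom in node.get("symptoms", []):
        worker = symptom.get("workerId", "")  # assumed placeholder when absent
        grouped[worker].append(
            symptom.get("symptomType", "SYMPTOM_TYPE_UNSPECIFIED")
        )
    return dict(grouped)
```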

Methods

create

Creates a node.

delete

Deletes a node.

get

Gets the details of a node.

list

Lists nodes.

reimage

Reimages a node's OS.

start

Starts a node.

stop

Stops a node. This operation is only available with single TPU nodes.