管理 TPU 资源

本页介绍了如何使用 Create Node API 创建、列出、停止、启动、删除 Cloud TPU 以及连接到 Cloud TPU。当您使用 Google Cloud CLI 运行 gcloud compute tpus tpu-vm create 命令以及使用 Google Cloud 控制台创建 TPU 时,系统会调用 Create Node API。使用 Create Node API 时,系统会立即处理您的请求。如果没有足够的容量来满足您的请求,则请求将失败。

最佳实践是使用队列化资源(而非 Create Node API)创建 TPU。在您请求已排队的资源时,请求会被添加到由 Cloud TPU 服务维护的队列中。请求的资源可用后,系统会将其分配给您的 Google Cloud 项目,供您立即独家使用。如需了解详情,请参阅管理队列中的资源

使用多切片时,您必须使用已排队的资源。如需了解详情,请参阅多 Slice 简介

如果您想使用 Google Kubernetes Engine (GKE) 管理 TPU 资源,则必须先创建一个 GKE 集群。然后,您需要向集群添加包含 TPU slice 的节点池。如需了解详情,请参阅 GKE 中的 TPU 简介

前提条件

在运行这些步骤之前,您必须安装 Google Cloud CLI、创建 Google Cloud 项目并启用 Cloud TPU API。如需查看相关说明,请参阅设置 Cloud TPU 环境

如果您使用的是 Google Cloud CLI,则可以使用 Cloud Shell、Compute Engine 虚拟机或本地计算机运行命令。借助 Cloud Shell,您无需安装任何软件即可与 Cloud TPU 进行交互。Cloud Shell 会在一段时间不活动后断开连接。如果您运行的是长时间运行的命令,我们建议您在本地机器上安装 Google Cloud CLI。如需详细了解 Google Cloud CLI,请参阅 gcloud 参考文档

使用 Create Node API 创建 Cloud TPU

您可以使用 gcloud、Google Cloud 控制台或 Cloud TPU API 创建 Cloud TPU。

创建 Cloud TPU 时,您必须指定 TPU 虚拟机映像(也称为 TPU 软件版本)。如需确定应使用哪个虚拟机映像,请参阅 TPU 虚拟机映像

您还需要按 TensorCore 或 TPU 芯片指定 TPU 配置。如需了解详情,请参阅系统架构中与您所用的 TPU 版本对应的部分。

gcloud

如需使用 Create Node API 创建 TPU,请使用 gcloud compute tpus tpu-vm create 命令。如需配置特定的内部或外部 IP 地址,请参阅外部和内部 IP 地址中的说明。

以下命令使用 v4-8 TPU 配置:

$ gcloud compute tpus tpu-vm create tpu-name \
  --zone=us-central2-b \
  --accelerator-type=v4-8 \
  --version=tpu-software-version

命令标志说明

zone
拟在其中创建 Cloud TPU 的可用区
accelerator-type
加速器类型用于指定您要创建的 Cloud TPU 的版本和大小。 如需详细了解每个 TPU 版本支持的加速器类型,请参阅 TPU 版本
version
TPU 软件版本
shielded-secure-boot(可选)
指定创建 TPU 实例时启用了安全启动。 这会隐式地将它们转换为安全强化型虚拟机实例。 请参阅什么是安全强化型虚拟机?

以下命令会创建具有特定拓扑的 TPU:

$ gcloud compute tpus tpu-vm create tpu-name \
  --zone=us-central2-b \
  --type=v4 \
  --topology=2x2x1 \
  --version=tpu-software-version

必需的标志

tpu-name
您正在创建的 TPU 虚拟机的名称。
zone
您要在其中创建 Cloud TPU 的可用区
type
您要使用的 TPU 版本。如需了解详情,请参阅 TPU 版本
topology
TPU 芯片的物理排列方式,用于指定每个维度中的芯片数量。如需详细了解每个 TPU 版本支持的拓扑,请参阅 TPU 版本
version
您要使用的 TPU 软件版本。如需了解详情,请参阅 TPU 软件版本

控制台

  1. 在 Google Cloud 控制台中,前往 TPU 页面:

    前往 TPU

  2. 点击创建 TPU

  3. 名称字段中,为 TPU 输入名称。

  4. 在“可用区”框中,选择要在其中创建 TPU 的可用区。

  5. TPU 类型框中,选择加速器类型。加速器类型用于指定您要创建的 Cloud TPU 的版本和大小。如需详细了解每个 TPU 版本支持的加速器类型,请参阅 TPU 版本

  6. TPU 软件版本框中,选择软件版本。创建 Cloud TPU 虚拟机时,TPU 软件版本指定了要安装的 TPU 运行时的版本。如需了解详情,请参阅 TPU 虚拟机映像

  7. 点击创建以创建资源。

curl

以下命令使用 curl 创建 TPU。

$ curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" -d "{accelerator_type: 'v4-8', \
runtime_version:'tpu-vm-tf-2.18.0-pjrt', \
network_config: {enable_external_ips: true}, \
shielded_instance_config: { enable_secure_boot: true }}" \
https://tpu.googleapis.com/v2/projects/project-id/locations/us-central2-b/nodes?node_id=node_name

必填字段

runtime_version
您要使用的 Cloud TPU 运行时版本。
project
已注册的 Google Cloud 项目的名称。
zone
您要在其中创建 Cloud TPU 的可用区
node_name
您要创建的 TPU 虚拟机的名称。

Java

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

import com.google.api.gax.longrunning.OperationTimedPollAlgorithm;
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.tpu.v2.CreateNodeRequest;
import com.google.cloud.tpu.v2.Node;
import com.google.cloud.tpu.v2.TpuClient;
import com.google.cloud.tpu.v2.TpuSettings;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import org.threeten.bp.Duration;

public class CreateTpuVm {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to create a node.
    String projectId = "YOUR_PROJECT_ID";
    // The zone in which to create the TPU.
    // For more information about supported TPU types for specific zones,
    // see https://cloud.google.com/tpu/docs/regions-zones
    String zone = "europe-west4-a";
    // The name for your TPU.
    String nodeName = "YOUR_TPU_NAME";
    // The accelerator type that specifies the version and size of the Cloud TPU you want to create.
    // For more information about supported accelerator types for each TPU version,
    // see https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#versions.
    String tpuType = "v2-8";
    // Software version that specifies the version of the TPU runtime to install.
    // For more information see https://cloud.google.com/tpu/docs/runtimes
    String tpuSoftwareVersion = "tpu-vm-tf-2.14.1";

    createTpuVm(projectId, zone, nodeName, tpuType, tpuSoftwareVersion);
  }

  // Creates a TPU VM with the specified name, zone, accelerator type, and version.
  public static Node createTpuVm(
      String projectId, String zone, String nodeName, String tpuType, String tpuSoftwareVersion)
      throws IOException, ExecutionException, InterruptedException {
    // With these settings the client library handles the Operation's polling mechanism
    // and prevent CancellationException error
    TpuSettings.Builder clientSettings =
        TpuSettings.newBuilder();
    clientSettings
        .createNodeOperationSettings()
        .setPollingAlgorithm(
            OperationTimedPollAlgorithm.create(
                RetrySettings.newBuilder()
                    .setInitialRetryDelay(Duration.ofMillis(5000L))
                    .setRetryDelayMultiplier(1.5)
                    .setMaxRetryDelay(Duration.ofMillis(45000L))
                    .setInitialRpcTimeout(Duration.ZERO)
                    .setRpcTimeoutMultiplier(1.0)
                    .setMaxRpcTimeout(Duration.ZERO)
                    .setTotalTimeout(Duration.ofHours(24L))
                    .build()));

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (TpuClient tpuClient = TpuClient.create(clientSettings.build())) {
      String parent = String.format("projects/%s/locations/%s", projectId, zone);

      Node tpuVm = Node.newBuilder()
              .setName(nodeName)
              .setAcceleratorType(tpuType)
              .setRuntimeVersion(tpuSoftwareVersion)
              .build();

      CreateNodeRequest request = CreateNodeRequest.newBuilder()
              .setParent(parent)
              .setNodeId(nodeName)
              .setNode(tpuVm)
              .build();

      return tpuClient.createNodeAsync(request).get();
    }
  }
}

Node.js

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

// Import the TPUClient
// TODO(developer): Uncomment below line before running the sample.
// const {TpuClient} = require('@google-cloud/tpu').v2;
const {Node, NetworkConfig} =
  require('@google-cloud/tpu').protos.google.cloud.tpu.v2;

// Instantiate a tpuClient
// TODO(developer): Uncomment below line before running the sample.
// tpuClient = new TpuClient();

// TODO(developer): Update below line before running the sample.
// Project ID or project number of the Google Cloud project you want to create a node.
const projectId = await tpuClient.getProjectId();

// The name of the network you want the TPU node to connect to. The network should be assigned to your project.
const networkName = 'compute-tpu-network';

// The region of the network, that you want the TPU node to connect to.
const region = 'europe-west4';

// The name for your TPU.
const nodeName = 'node-name-1';

// The zone in which to create the TPU.
// For more information about supported TPU types for specific zones,
// see https://cloud.google.com/tpu/docs/regions-zones
const zone = 'europe-west4-a';

// The accelerator type that specifies the version and size of the Cloud TPU you want to create.
// For more information about supported accelerator types for each TPU version,
// see https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#versions.
const tpuType = 'v2-8';

// Software version that specifies the version of the TPU runtime to install. For more information,
// see https://cloud.google.com/tpu/docs/runtimes
const tpuSoftwareVersion = 'tpu-vm-tf-2.14.1';

async function callCreateTpuVM() {
  // Create a node
  const node = new Node({
    name: nodeName,
    zone,
    acceleratorType: tpuType,
    runtimeVersion: tpuSoftwareVersion,
    // Define network
    networkConfig: new NetworkConfig({
      enableExternalIps: true,
      network: `projects/${projectId}/global/networks/${networkName}`,
      subnetwork: `projects/${projectId}/regions/${region}/subnetworks/${networkName}`,
    }),
  });

  const parent = `projects/${projectId}/locations/${zone}`;
  const request = {parent, node, nodeId: nodeName};

  const [operation] = await tpuClient.createNode(request);

  // Wait for the create operation to complete.
  const [response] = await operation.promise();

  console.log(`TPU VM: ${nodeName} created.`);
  return response;
}
return await callCreateTpuVM();

Python

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

from google.cloud import tpu_v2

# TODO(developer): Update and un-comment below lines
# project_id = "your-project-id"
# zone = "us-central1-b"
# tpu_name = "tpu-name"
# tpu_type = "v2-8"
# runtime_version = "tpu-vm-tf-2.17.0-pjrt"

# Create a TPU node
node = tpu_v2.Node()
node.accelerator_type = tpu_type
# To see available runtime version use command:
# gcloud compute tpus versions list --zone={ZONE}
node.runtime_version = runtime_version

request = tpu_v2.CreateNodeRequest(
    parent=f"projects/{project_id}/locations/{zone}",
    node_id=tpu_name,
    node=node,
)

# Create a TPU client
client = tpu_v2.TpuClient()
operation = client.create_node(request=request)
print("Waiting for operation to complete...")

response = operation.result()
print(response)
# Example response:
# name: "projects/[project_id]/locations/[zone]/nodes/my-tpu"
# accelerator_type: "v2-8"
# state: READY
# ...

运行启动脚本

gcloud

您可以在创建 TPU 虚拟机时指定 --metadata startup-script 标志,以便在每个 TPU 虚拟机上运行启动脚本。以下命令会使用启动脚本创建 TPU 虚拟机。

$ gcloud compute tpus tpu-vm create tpu-name \
  --zone=us-central2-b \
  --accelerator-type=tpu-type \
  --version=tpu-vm-tf-2.18.0-pjrt \
  --metadata startup-script='#! /bin/bash
     pip3 install numpy
     EOF'

Java

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

import com.google.cloud.tpu.v2.CreateNodeRequest;
import com.google.cloud.tpu.v2.Node;
import com.google.cloud.tpu.v2.TpuClient;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;

public class CreateTpuVmWithStartupScript {
  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to create a node.
    String projectId = "YOUR_PROJECT_ID";
    // The zone in which to create the TPU.
    // For more information about supported TPU types for specific zones,
    // see https://cloud.google.com/tpu/docs/regions-zones
    String zone = "us-central1-f";
    // The name for your TPU.
    String nodeName = "YOUR_TPU_NAME";
    // The accelerator type that specifies the version and size of the Cloud TPU you want to create.
    // For more information about supported accelerator types for each TPU version,
    // see https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#versions.
    String acceleratorType = "v2-8";
    // Software version that specifies the version of the TPU runtime to install.
    // For more information, see https://cloud.google.com/tpu/docs/runtimes
    String tpuSoftwareVersion = "tpu-vm-tf-2.14.1";

    createTpuVmWithStartupScript(projectId, zone, nodeName, acceleratorType, tpuSoftwareVersion);
  }

  // Create a TPU VM with a startup script.
  public static Node createTpuVmWithStartupScript(String projectId, String zone,
      String nodeName, String acceleratorType, String tpuSoftwareVersion)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (TpuClient tpuClient = TpuClient.create()) {
      String parent = String.format("projects/%s/locations/%s", projectId, zone);

      String startupScriptContent = "#!/bin/bash\necho \"Hello from the startup script!\"";
      // Add startup script to metadata
      Map<String, String> metadata = new HashMap<>();
      metadata.put("startup-script", startupScriptContent);

      Node tpuVm =
          Node.newBuilder()
             .setName(nodeName)
             .setAcceleratorType(acceleratorType)
             .setRuntimeVersion(tpuSoftwareVersion)
             .putAllMetadata(metadata)
             .build();

      CreateNodeRequest request =
          CreateNodeRequest.newBuilder()
             .setParent(parent)
             .setNodeId(nodeName)
             .setNode(tpuVm)
             .build();

      return tpuClient.createNodeAsync(request).get();
    }
  }
}

Node.js

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

// Import the TPUClient
// TODO(developer): Uncomment below line before running the sample.
// const {TpuClient} = require('@google-cloud/tpu').v2;

const {Node, NetworkConfig} =
  require('@google-cloud/tpu').protos.google.cloud.tpu.v2;

// Instantiate a tpuClient
// TODO(developer): Uncomment below line before running the sample.
// tpuClient = new TpuClient();

// TODO(developer): Update these variables before running the sample.
// Project ID or project number of the Google Cloud project you want to create a node.
const projectId = await tpuClient.getProjectId();

// The name of the network you want the TPU node to connect to. The network should be assigned to your project.
const networkName = 'compute-tpu-network';

// The region of the network, that you want the TPU node to connect to.
const region = 'europe-west4';

// The name for your TPU.
const nodeName = 'node-name-1';

// The zone in which to create the TPU.
// For more information about supported TPU types for specific zones,
// see https://cloud.google.com/tpu/docs/regions-zones
const zone = 'europe-west4-a';

// The accelerator type that specifies the version and size of the Cloud TPU you want to create.
// For more information about supported accelerator types for each TPU version,
// see https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#versions.
const tpuType = 'v2-8';

// Software version that specifies the version of the TPU runtime to install. For more information,
// see https://cloud.google.com/tpu/docs/runtimes
const tpuSoftwareVersion = 'tpu-vm-tf-2.17.0-pod-pjrt';

async function callCreateTpuVMStartupScript() {
  // Create a node
  const node = new Node({
    name: nodeName,
    zone,
    acceleratorType: tpuType,
    runtimeVersion: tpuSoftwareVersion,
    // Define network
    networkConfig: new NetworkConfig({
      enableExternalIps: true,
      network: `projects/${projectId}/global/networks/${networkName}`,
      subnetwork: `projects/${projectId}/regions/${region}/subnetworks/${networkName}`,
    }),
    metadata: {
      // The script updates numpy to the latest version and logs the output to a file.
      'startup-script': `#!/bin/bash
        echo "Hello World" > /var/log/hello.log
        sudo pip3 install --upgrade numpy >> /var/log/hello.log 2>&1`,
    },
  });

  const parent = `projects/${projectId}/locations/${zone}`;
  const request = {parent, node, nodeId: nodeName};

  const [operation] = await tpuClient.createNode(request);

  // Wait for the create operation to complete.
  const [response] = await operation.promise();

  console.log(JSON.stringify(response));
  return response;
}
return await callCreateTpuVMStartupScript();

Python

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

from google.cloud import tpu_v2

# TODO(developer): Update and un-comment below lines
# project_id = "your-project-id"
# zone = "us-central1-b"
# tpu_name = "tpu-name"
# tpu_type = "v2-8"
# runtime_version = "tpu-vm-tf-2.17.0-pjrt"

node = tpu_v2.Node()
node.accelerator_type = tpu_type
node.runtime_version = runtime_version

# This startup script updates numpy to the latest version and logs the output to a file.
metadata = {
    "startup-script": """#!/bin/bash
echo "Hello World" > /var/log/hello.log
sudo pip3 install --upgrade numpy >> /var/log/hello.log 2>&1
"""
}

# Adding metadata with startup script to the TPU node.
node.metadata = metadata
# Enabling external IPs for internet access from the TPU node.
node.network_config = tpu_v2.NetworkConfig(enable_external_ips=True)

request = tpu_v2.CreateNodeRequest(
    parent=f"projects/{project_id}/locations/{zone}",
    node_id=tpu_name,
    node=node,
)

client = tpu_v2.TpuClient()
operation = client.create_node(request=request)
print("Waiting for operation to complete...")

response = operation.result()
print(response.metadata)
# Example response:
# {'startup-script': '#!/bin/bash\n    echo "Hello World" > /var/log/hello.log\n
# ...

连接到 Cloud TPU

gcloud

使用 SSH 连接到 Cloud TPU:

$ gcloud compute tpus tpu-vm ssh tpu-name --zone=zone

当您请求的切片大于单个主机时,Cloud TPU 会为每个主机创建一个 TPU 虚拟机。每个主机的 TPU 芯片数量取决于 TPU 版本

如需安装二进制文件或运行代码,请使用 tpu-vm ssh command 连接到每个 TPU 虚拟机。

$ gcloud compute tpus tpu-vm ssh tpu-name

如需使用 SSH 连接到特定 TPU 虚拟机,请使用 --worker 标志,后跟从 0 开始的索引:

$ gcloud compute tpus tpu-vm ssh tpu-name --worker=1

如需使用单个命令在所有 TPU 虚拟机上运行命令,请使用 --worker=all--command 标志:

$ gcloud compute tpus tpu-vm ssh tpu-name \
  --project=your_project_ID \
  --zone=zone \
  --worker=all \
  --command='pip install "jax[tpu]==0.4.20" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html'

对于多切片,您可以使用枚举的 TPU 名称在单个虚拟机上运行命令,其中包含每个切片前缀和附加到其后面的数字。如需在所有切片中的所有 TPU 虚拟机上运行命令,请使用 --node=all--worker=all--command 标志,以及可选的 --batch-size 标志。

$ gcloud compute tpus queued-resources ssh ${QUEUED_RESOURCE_ID} \
  --project=project_ID \
  --zone=zone \
  --node=all \
  --worker=all \
  --command='pip install "jax[tpu]==0.4.20" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html' \
  --batch-size=4

控制台

如需在 Google Cloud 控制台中连接到 TPU,请使用浏览器中的 SSH:

  1. 在 Google Cloud 控制台中,前往 TPU 页面:

    前往 TPU

  2. 在 TPU 虚拟机列表中,点击要连接的 TPU 虚拟机所在行中的 SSH

列出 Cloud TPU 资源

您可以列出指定可用区中的所有 Cloud TPU。

gcloud

$ gcloud compute tpus tpu-vm list --zone=zone

控制台

在 Google Cloud 控制台中,前往 TPU 页面:

前往 TPU

Java

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

import com.google.cloud.tpu.v2.ListNodesRequest;
import com.google.cloud.tpu.v2.TpuClient;
import java.io.IOException;

public class ListTpuVms {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to use.
    String projectId = "YOUR_PROJECT_ID";
    // The zone where the TPUs are located.
    // For more information about supported TPU types for specific zones,
    // see https://cloud.google.com/tpu/docs/regions-zones
    String zone = "us-central1-f";

    listTpuVms(projectId, zone);
  }

  // Lists TPU VMs in the specified zone.
  public static TpuClient.ListNodesPage listTpuVms(String projectId, String zone)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (TpuClient tpuClient = TpuClient.create()) {
      String parent = String.format("projects/%s/locations/%s", projectId, zone);

      ListNodesRequest request = ListNodesRequest.newBuilder().setParent(parent).build();

      return tpuClient.listNodes(request).getPage();
    }
  }
}

Node.js

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

// Import the TPUClient
// TODO(developer): Uncomment below line before running the sample.
// const {TpuClient} = require('@google-cloud/tpu').v2;

// Instantiate a tpuClient
// TODO(developer): Uncomment below line before running the sample.
// tpuClient = new TpuClient();

// TODO(developer): Update these variables before running the sample.
// Project ID or project number of the Google Cloud project you want to retrive a list of TPU nodes.
const projectId = await tpuClient.getProjectId();

// The zone from which the TPUs are retrived.
const zone = 'europe-west4-a';

async function callTpuVMList() {
  const request = {
    parent: `projects/${projectId}/locations/${zone}`,
  };

  const [response] = await tpuClient.listNodes(request);

  return response;
}

return await callTpuVMList();

Python

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

from google.cloud import tpu_v2

# TODO(developer): Update and un-comment below lines
# project_id = "your-project-id"
# zone = "us-central1-b"

client = tpu_v2.TpuClient()

nodes = client.list_nodes(parent=f"projects/{project_id}/locations/{zone}")
for node in nodes:
    print(node.name)
    print(node.state)
    print(node.accelerator_type)
# Example response:
# projects/[project_id]/locations/[zone]/nodes/node-name
# State.READY
# v2-8

检索有关 Cloud TPU 的信息

您可以检索关于指定 Cloud TPU 的信息。

gcloud

$ gcloud compute tpus tpu-vm describe tpu-name \
  --zone=zone

控制台

  1. 在 Google Cloud 控制台中,前往 TPU 页面:

    前往 TPU

  2. 点击 Cloud TPU 的名称。控制台会显示 Cloud TPU 详情页面。

Java

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

import com.google.cloud.tpu.v2.GetNodeRequest;
import com.google.cloud.tpu.v2.Node;
import com.google.cloud.tpu.v2.NodeName;
import com.google.cloud.tpu.v2.TpuClient;
import java.io.IOException;

public class GetTpuVm {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to use.
    String projectId = "YOUR_PROJECT_ID";
    // The zone in which to create the TPU.
    // For more information about supported TPU types for specific zones,
    // see https://cloud.google.com/tpu/docs/regions-zones
    String zone = "europe-west4-a";
    // The name for your TPU.
    String nodeName = "YOUR_TPU_NAME";

    getTpuVm(projectId, zone, nodeName);
  }

  // Describes a TPU VM with the specified name in the given project and zone.
  public static Node getTpuVm(String projectId, String zone, String nodeName)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (TpuClient tpuClient = TpuClient.create()) {
      String name = NodeName.of(projectId, zone, nodeName).toString();

      GetNodeRequest request = GetNodeRequest.newBuilder().setName(name).build();

      return tpuClient.getNode(request);
    }
  }
}

Node.js

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

// Import the TPUClient
// TODO(developer): Uncomment below line before running the sample.
// const {TpuClient} = require('@google-cloud/tpu').v2;

// Instantiate a tpuClient
// TODO(developer): Uncomment below line before running the sample.
// tpuClient = new TpuClient();

// TODO(developer): Update these variables before running the sample.
// Project ID or project number of the Google Cloud project you want to retrive a node.
const projectId = await tpuClient.getProjectId();

// The name of TPU to retrive.
const nodeName = 'node-name-1';

// The zone, where the TPU is created.
const zone = 'europe-west4-a';

async function callGetTpuVM() {
  const request = {
    name: `projects/${projectId}/locations/${zone}/nodes/${nodeName}`,
  };

  const [response] = await tpuClient.getNode(request);

  console.log(`Node: ${nodeName} retrived.`);
  return response;
}

return await callGetTpuVM();

Python

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

from google.cloud import tpu_v2

# TODO(developer): Update and un-comment below lines
# project_id = "your-project-id"
# zone = "us-central1-b"
# tpu_name = "tpu-name"

client = tpu_v2.TpuClient()
node = client.get_node(
    name=f"projects/{project_id}/locations/{zone}/nodes/{tpu_name}"
)

print(node)
# Example response:
# name: "projects/[project_id]/locations/[zone]/nodes/tpu-name"
# state: "READY"
# runtime_version: ...

停用 Cloud TPU 资源

您可以停止单个 Cloud TPU 以停止产生费用,而不会丢失虚拟机的配置和软件。

gcloud

$ gcloud compute tpus tpu-vm stop tpu-name \
  --zone=zone

控制台

  1. 在 Google Cloud 控制台中,前往 TPU 页面:

    前往 TPU

  2. 选中 Cloud TPU 旁边的复选框。

  3. 点击 停止

Java

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

import com.google.cloud.tpu.v2.Node;
import com.google.cloud.tpu.v2.NodeName;
import com.google.cloud.tpu.v2.StopNodeRequest;
import com.google.cloud.tpu.v2.TpuClient;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class StopTpuVm {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to use.
    String projectId = "YOUR_PROJECT_ID";
    // The zone where the TPU is located.
    // For more information about supported TPU types for specific zones,
    // see https://cloud.google.com/tpu/docs/regions-zones
    String zone = "us-central1-f";
    // The name for your TPU.
    String nodeName = "YOUR_TPU_NAME";

    stopTpuVm(projectId, zone, nodeName);
  }

  // Stops a TPU VM with the specified name in the given project and zone.
  public static Node stopTpuVm(String projectId, String zone, String nodeName)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (TpuClient tpuClient = TpuClient.create()) {
      String name = NodeName.of(projectId, zone, nodeName).toString();

      StopNodeRequest request = StopNodeRequest.newBuilder().setName(name).build();

      return tpuClient.stopNodeAsync(request).get();
    }
  }
}

Node.js

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

// Import the TPUClient
// TODO(developer): Uncomment below line before running the sample.
// const {TpuClient} = require('@google-cloud/tpu').v2;

// Instantiate a tpuClient
// TODO(developer): Uncomment below line before running the sample.
// tpuClient = new TpuClient();

// TODO(developer): Update these variables before running the sample.
// Project ID or project number of the Google Cloud project you want to stop a node.
const projectId = await tpuClient.getProjectId();

// The name of TPU to stop.
const nodeName = 'node-name-1';

// The zone, where the TPU is created.
const zone = 'europe-west4-a';

async function callStopTpuVM() {
  const request = {
    name: `projects/${projectId}/locations/${zone}/nodes/${nodeName}`,
  };

  const [operation] = await tpuClient.stopNode(request);
  // Wait for the operation to complete.
  const [response] = await operation.promise();

  console.log(`Node: ${nodeName} stopped.`);
  return response;
}

return await callStopTpuVM();

Python

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

from google.cloud import tpu_v2

# TODO(developer): Update and un-comment below lines
# project_id = "your-project-id"
# zone = "us-central1-b"
# tpu_name = "tpu-name"

client = tpu_v2.TpuClient()

request = tpu_v2.StopNodeRequest(
    name=f"projects/{project_id}/locations/{zone}/nodes/{tpu_name}",
)
try:
    operation = client.stop_node(request=request)
    print("Waiting for stop operation to complete...")
    response = operation.result()
    print(f"This TPU {tpu_name} has been stopped")
    print(response.state)
    # Example response:
    # State.STOPPED

    return response
except Exception as e:
    print(e)

启动 Cloud TPU 资源

您可以在 Cloud TPU 停止后启动它。

gcloud

$ gcloud compute tpus tpu-vm start tpu-name \
  --zone=zone

控制台

  1. 在 Google Cloud 控制台中,前往 TPU 页面:

    前往 TPU

  2. 选中 Cloud TPU 旁边的复选框。

  3. 点击 开始

Java

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

import com.google.cloud.tpu.v2.Node;
import com.google.cloud.tpu.v2.NodeName;
import com.google.cloud.tpu.v2.StartNodeRequest;
import com.google.cloud.tpu.v2.TpuClient;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class StartTpuVm {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to use.
    String projectId = "YOUR_PROJECT_ID";
    // The zone where the TPU is located.
    // For more information about supported TPU types for specific zones,
    // see https://cloud.google.com/tpu/docs/regions-zones
    String zone = "us-central1-f";
    // The name for your TPU.
    String nodeName = "YOUR_TPU_NAME";

    startTpuVm(projectId, zone, nodeName);
  }

  // Starts a TPU VM with the specified name in the given project and zone.
  public static Node startTpuVm(String projectId, String zone, String nodeName)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (TpuClient tpuClient = TpuClient.create()) {
      String name = NodeName.of(projectId, zone, nodeName).toString();

      StartNodeRequest request = StartNodeRequest.newBuilder().setName(name).build();

      return tpuClient.startNodeAsync(request).get();
    }
  }
}

Node.js

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

// Import the TPUClient
// TODO(developer): Uncomment below line before running the sample.
// const {TpuClient} = require('@google-cloud/tpu').v2;

// Instantiate a tpuClient
// TODO(developer): Uncomment below line before running the sample.
// tpuClient = new TpuClient();

// TODO(developer): Update these variables before running the sample.
// Project ID or project number of the Google Cloud project you want to start a node.
const projectId = await tpuClient.getProjectId();

// The name of TPU to start.
const nodeName = 'node-name-1';

// The zone, where the TPU is created.
const zone = 'europe-west4-a';

async function callStartTpuVM() {
  const request = {
    name: `projects/${projectId}/locations/${zone}/nodes/${nodeName}`,
  };

  const [operation] = await tpuClient.startNode(request);

  // Wait for the operation to complete.
  const [response] = await operation.promise();

  console.log(`Node: ${nodeName} started.`);
  return response;
}

return await callStartTpuVM();

Python

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

from google.cloud import tpu_v2

# TODO(developer): Update and un-comment below lines
# project_id = "your-project-id"
# zone = "us-central1-b"
# tpu_name = "tpu-name"

client = tpu_v2.TpuClient()

request = tpu_v2.StartNodeRequest(
    name=f"projects/{project_id}/locations/{zone}/nodes/{tpu_name}",
)
try:
    operation = client.start_node(request=request)
    print("Waiting for start operation to complete...")
    response = operation.result()
    print(f"TPU {tpu_name} has been started")
    print(response.state)
    # Example response:
    # State.READY

    return response
except Exception as e:
    print(e)

删除 Cloud TPU

在会话结束时删除您的 TPU VM slice。

gcloud

$ gcloud compute tpus tpu-vm delete tpu-name \
  --project=project-id \
  --zone=zone \
  --quiet

命令标志说明

zone
拟在其中删除 Cloud TPU 的可用区

控制台

  1. 在 Google Cloud 控制台中,前往 TPU 页面:

    前往 TPU

  2. 选中 Cloud TPU 旁边的复选框。

  3. 点击 删除

Java

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

import com.google.api.gax.longrunning.OperationTimedPollAlgorithm;
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.tpu.v2.DeleteNodeRequest;
import com.google.cloud.tpu.v2.NodeName;
import com.google.cloud.tpu.v2.TpuClient;
import com.google.cloud.tpu.v2.TpuSettings;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import org.threeten.bp.Duration;

public class DeleteTpuVm {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    // Project ID or project number of the Google Cloud project you want to create a node.
    String projectId = "YOUR_PROJECT_ID";
    // The zone in which to create the TPU.
    // For more information about supported TPU types for specific zones,
    // see https://cloud.google.com/tpu/docs/regions-zones
    String zone = "europe-west4-a";
    // The name for your TPU.
    String nodeName = "YOUR_TPU_NAME";

    deleteTpuVm(projectId, zone, nodeName);
  }

  // Deletes a TPU VM with the specified name in the given project and zone.
  public static void deleteTpuVm(String projectId, String zone, String nodeName)
      throws IOException, ExecutionException, InterruptedException {
    // With these settings the client library handles the Operation's polling mechanism
    // and prevent CancellationException error
    TpuSettings.Builder clientSettings =
        TpuSettings.newBuilder();
    clientSettings
        .deleteNodeOperationSettings()
        .setPollingAlgorithm(
            OperationTimedPollAlgorithm.create(
                RetrySettings.newBuilder()
                    .setInitialRetryDelay(Duration.ofMillis(5000L))
                    .setRetryDelayMultiplier(1.5)
                    .setMaxRetryDelay(Duration.ofMillis(45000L))
                    .setInitialRpcTimeout(Duration.ZERO)
                    .setRpcTimeoutMultiplier(1.0)
                    .setMaxRpcTimeout(Duration.ZERO)
                    .setTotalTimeout(Duration.ofHours(24L))
                    .build()));

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (TpuClient tpuClient = TpuClient.create(clientSettings.build())) {
      String name = NodeName.of(projectId, zone, nodeName).toString();

      DeleteNodeRequest request = DeleteNodeRequest.newBuilder().setName(name).build();

      tpuClient.deleteNodeAsync(request).get();
      System.out.println("TPU VM deleted");
    }
  }
}

Node.js

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

// Import the TPUClient
// TODO(developer): Uncomment below line before running the sample.
// const {TpuClient} = require('@google-cloud/tpu').v2;

// Instantiate a tpuClient
// TODO(developer): Uncomment below line before running the sample.
// tpuClient = new TpuClient();

// TODO(developer): Update these variables before running the sample.
// Project ID or project number of the Google Cloud project you want to delete a node.
const projectId = await tpuClient.getProjectId();

// The name of TPU to delete.
const nodeName = 'node-name-1';

// The zone, where the TPU is created.
const zone = 'europe-west4-a';

async function callDeleteTpuVM() {
  const request = {
    name: `projects/${projectId}/locations/${zone}/nodes/${nodeName}`,
  };

  const [operation] = await tpuClient.deleteNode(request);

  // Wait for the delete operation to complete.
  const [response] = await operation.promise();

  console.log(`Node: ${nodeName} deleted.`);
  return response;
}

return await callDeleteTpuVM();

Python

如需向 Cloud TPU 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

from google.cloud import tpu_v2

# TODO(developer): Update and un-comment below lines
# project_id = "your-project-id"
# zone = "us-central1-b"
# tpu_name = "tpu-name"

client = tpu_v2.TpuClient()
try:
    client.delete_node(
        name=f"projects/{project_id}/locations/{zone}/nodes/{tpu_name}"
    )
    print("The TPU node was deleted.")
except Exception as e:
    print(e)