Run TensorFlow code on TPU Pod slices

This document shows you how to perform a calculation with TensorFlow on a TPU Pod slice. You will complete the following steps:

Create a TPU Pod slice with TensorFlow software
Connect to the TPU VM using SSH
Create and run an example script

The TPU VM relies on a service account for permission to call the Cloud TPU API. By default, your TPU VM uses the default Compute Engine service account, which includes all required Cloud TPU permissions. If you use your own service account, you must add the TPU Viewer role to it. For more information about Google Cloud roles, see Understanding roles. You can specify your own service account with the --service-account flag when you create your TPU VM.

Set up your environment

In Cloud Shell, run the following command to make sure you are running the current version of gcloud:
```
$ gcloud components update
```
If you need to install gcloud, use the following command:
```
$ sudo apt install -y google-cloud-sdk
```

Create some environment variables:

$ export PROJECT_ID=project-id
$ export TPU_NAME=tpu-name
$ export ZONE=europe-west4-a
$ export RUNTIME_VERSION=tpu-vm-tf-2.18.0-pod-pjrt
$ export ACCELERATOR_TYPE=v3-32

Create a v3-32 TPU Pod slice with the TensorFlow runtime

$ gcloud compute tpus tpu-vm create ${TPU_NAME} \
  --zone=${ZONE} \
  --accelerator-type=${ACCELERATOR_TYPE} \
  --version=${RUNTIME_VERSION}

Command flag descriptions

zone: The zone where you want to create your Cloud TPU.
accelerator-type: The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about the accelerator types supported for each TPU version, see TPU versions.
version: The Cloud TPU software version.
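
The create command and its flags can also be assembled programmatically, which may help when the optional --service-account flag is only sometimes needed. The following Python sketch is illustrative only: `build_create_command` is a hypothetical helper, not part of gcloud, and it only builds the argument list without running anything.

```python
import shlex


def build_create_command(tpu_name, zone, accelerator_type, runtime_version,
                         service_account=None):
    """Return the gcloud argument list that creates a TPU VM (hypothetical helper)."""
    cmd = [
        "gcloud", "compute", "tpus", "tpu-vm", "create", tpu_name,
        f"--zone={zone}",
        f"--accelerator-type={accelerator_type}",
        f"--version={runtime_version}",
    ]
    # Only pass --service-account when a custom account is supplied;
    # otherwise the default Compute Engine service account is used.
    if service_account:
        cmd.append(f"--service-account={service_account}")
    return cmd


if __name__ == "__main__":
    # Placeholder values matching the environment variables in this document.
    cmd = build_create_command(
        "tpu-name", "europe-west4-a", "v3-32", "tpu-vm-tf-2.18.0-pod-pjrt")
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run(cmd, check=True)` without shell quoting issues.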

Connect to your Cloud TPU VM using SSH

$ gcloud compute tpus tpu-vm ssh ${TPU_NAME} \
  --zone=${ZONE}

Create and run an example script

Set the following environment variables:

(vm)$ export TPU_NAME=tpu-name
(vm)$ export TPU_LOAD_LIBRARY=0
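
If the example script is started in a shell where these variables were never exported, the failure can be hard to read. As a minimal sketch, a preflight check along the following lines could be run first. The `missing_vars` helper and the `REQUIRED_VARS` tuple are hypothetical names introduced here for illustration.

```python
import os
import sys

# The two variables exported in the step above.
REQUIRED_VARS = ("TPU_NAME", "TPU_LOAD_LIBRARY")


def missing_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        sys.exit("Set these variables first: " + ", ".join(missing))
    print("Environment looks ready.")
```

Note that the string "0" counts as set, so `TPU_LOAD_LIBRARY=0` passes the check.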

Create a file named tpu-test.py in the current directory and copy and paste the following script into it.

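import tensorflow as tf

print("TensorFlow version " + tf.__version__)

# Locate the TPU Pod slice and print the addresses of its workers.
cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
print("Running on TPU", cluster_resolver.cluster_spec().as_dict()["worker"])

# Connect to the TPU workers and initialize the TPU system.
tf.config.experimental_connect_to_cluster(cluster_resolver)
tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
# tf.distribute.TPUStrategy replaces the deprecated experimental alias.
strategy = tf.distribute.TPUStrategy(cluster_resolver)

@tf.function
def add_fn(x, y):
  # Runs once on each replica.
  z = x + y
  return z

x = tf.constant(1.)
y = tf.constant(1.)
# Execute add_fn on every replica and collect the per-replica results.
z = strategy.run(add_fn, args=(x, y))
print(z)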

Run this script with the following command:

(vm)$ python3 tpu-test.py

This script performs a calculation on each TensorCore of a TPU Pod slice. The output looks similar to the following:

PerReplica:{
  0: tf.Tensor(2.0, shape=(), dtype=float32),
  1: tf.Tensor(2.0, shape=(), dtype=float32),
  2: tf.Tensor(2.0, shape=(), dtype=float32),
  3: tf.Tensor(2.0, shape=(), dtype=float32),
  4: tf.Tensor(2.0, shape=(), dtype=float32),
  5: tf.Tensor(2.0, shape=(), dtype=float32),
  6: tf.Tensor(2.0, shape=(), dtype=float32),
  7: tf.Tensor(2.0, shape=(), dtype=float32),
  8: tf.Tensor(2.0, shape=(), dtype=float32),
  9: tf.Tensor(2.0, shape=(), dtype=float32),
  10: tf.Tensor(2.0, shape=(), dtype=float32),
  11: tf.Tensor(2.0, shape=(), dtype=float32),
  12: tf.Tensor(2.0, shape=(), dtype=float32),
  13: tf.Tensor(2.0, shape=(), dtype=float32),
  14: tf.Tensor(2.0, shape=(), dtype=float32),
  15: tf.Tensor(2.0, shape=(), dtype=float32),
  16: tf.Tensor(2.0, shape=(), dtype=float32),
  17: tf.Tensor(2.0, shape=(), dtype=float32),
  18: tf.Tensor(2.0, shape=(), dtype=float32),
  19: tf.Tensor(2.0, shape=(), dtype=float32),
  20: tf.Tensor(2.0, shape=(), dtype=float32),
  21: tf.Tensor(2.0, shape=(), dtype=float32),
  22: tf.Tensor(2.0, shape=(), dtype=float32),
  23: tf.Tensor(2.0, shape=(), dtype=float32),
  24: tf.Tensor(2.0, shape=(), dtype=float32),
  25: tf.Tensor(2.0, shape=(), dtype=float32),
  26: tf.Tensor(2.0, shape=(), dtype=float32),
  27: tf.Tensor(2.0, shape=(), dtype=float32),
  28: tf.Tensor(2.0, shape=(), dtype=float32),
  29: tf.Tensor(2.0, shape=(), dtype=float32),
  30: tf.Tensor(2.0, shape=(), dtype=float32),
  31: tf.Tensor(2.0, shape=(), dtype=float32)
}
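
With 32 replicas, it is easy to overlook a missing or incorrect entry by eye. The following Python sketch parses output of the shape shown above and checks that every replica reported 2.0. It assumes that the numeric suffix of a v3 accelerator type such as v3-32 equals the TensorCore count, and that the output was saved as text. The function names are hypothetical and introduced only for this illustration.

```python
import re

# Matches lines such as "  0: tf.Tensor(2.0, shape=(), dtype=float32),"
REPLICA_LINE = re.compile(r"^\s*(\d+): tf\.Tensor\(([-+0-9.eE]+),", re.MULTILINE)


def parse_per_replica(text):
    """Map each replica index to the scalar value printed for it."""
    return {int(i): float(v) for i, v in REPLICA_LINE.findall(text)}


def expected_replicas(accelerator_type):
    """Assumption: the suffix of a v3 type (for example 'v3-32') is the TensorCore count."""
    return int(accelerator_type.rsplit("-", 1)[1])


def check_output(text, accelerator_type, expected_value=2.0):
    """Return True if replicas 0..n-1 are all present and all report expected_value."""
    values = parse_per_replica(text)
    n = expected_replicas(accelerator_type)
    return sorted(values) == list(range(n)) and all(
        v == expected_value for v in values.values())
```

For example, after saving the script output with `python3 tpu-test.py > out.txt`, calling `check_output(open("out.txt").read(), "v3-32")` would indicate whether all 32 replicas returned the expected sum.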

Clean up

When you are done with your TPU VM, follow these steps to clean up your resources.

Disconnect from the TPU VM:
```
(vm)$ exit
```

Delete your Cloud TPU.

$ gcloud compute tpus tpu-vm delete ${TPU_NAME} \
  --zone=${ZONE}

Run the following command to verify that the resources have been deleted. Make sure your TPU no longer appears in the list. Deletion can take several minutes.
```
$ gcloud compute tpus tpu-vm list \
  --zone=${ZONE}
```
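
Because deletion can take several minutes, the list command may need to be repeated. A minimal polling sketch in Python follows. The helper names, the 600-second timeout, and the 15-second interval are arbitrary choices for illustration, and the sketch assumes that `--format=value(name)` prints one resource name per line, possibly as a full resource path.

```python
import subprocess
import time


def gcloud_list_names(zone):
    """List TPU VM names in a zone by calling gcloud (requires gcloud on PATH)."""
    out = subprocess.run(
        ["gcloud", "compute", "tpus", "tpu-vm", "list",
         f"--zone={zone}", "--format=value(name)"],
        check=True, capture_output=True, text=True).stdout
    # The name field may be a full resource path; keep only the last segment.
    return [line.rsplit("/", 1)[-1] for line in out.splitlines() if line.strip()]


def wait_until_deleted(tpu_name, list_names, timeout_s=600, interval_s=15,
                       sleep=time.sleep):
    """Poll list_names() until tpu_name disappears; return False on timeout."""
    waited = 0
    while True:
        if tpu_name not in list_names():
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

A call such as `wait_until_deleted("tpu-name", lambda: gcloud_list_names("europe-west4-a"))` would return True once the TPU no longer appears in the list. Passing the listing function as an argument keeps the polling logic testable without access to gcloud.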