Can I use the TPU for inference?
Yes, TPUs can be used for both training and inference. For example, the ResNet tutorial performs periodic evaluation during the training loop. For model serving, there are a few caveats to be aware of. In particular, the TPU software stack is currently optimized for throughput, not latency. Executing inference on a single batch of input and waiting for the result currently has an overhead of at least 10 ms, which can be problematic for low-latency serving.
This overhead will be reduced significantly in upcoming TensorFlow releases.
Are there any built-in TensorFlow ops that are not available on the TPU?
A small number of built-in TensorFlow ops are not currently available on the TPU. See the guide to available TensorFlow Ops, which details the current workarounds.
How can I write a custom op for the TPU?
TensorFlow ops that run on the TPU are implemented in XLA HLO, which is a language for defining high-level tensor ops using a small set of low-level functions. XLA is included in TensorFlow's open source release, so it is technically possible to write your op in HLO. The majority of existing implementations can be found in the tf2xla directory. However, this only allows a limited set of tensor ops to execute on the TPU, not arbitrary C++ or Python code. Most common tensor ops that can be implemented in HLO have already been written. An upcoming release of TensorFlow will add support for efficiently executing standard CPU ops during TPU training and inference.
Can I use placeholders and feed dictionaries with a TPU?
Although this usage pattern is technically available on the TPU, we
strongly recommend against using it, as it uses only a single TPU core
and results in excessive overhead. Instead, to create a
training pipeline, use the TPUEstimator API and the tf.data.Dataset API. See the ResNet tutorial for an example of how to create a simple training loop with TPUEstimator.
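The static-shape requirement behind this recommendation can be illustrated with a framework-free sketch. The helper below is hypothetical (not part of any TPU API); it shows the batching behavior a TPU input pipeline needs: every batch must have the same fixed shape, so the trailing partial batch is dropped rather than emitted with a different size.

```python
# Framework-free sketch: every batch must have the same static shape,
# so the final partial batch is intentionally discarded.
def fixed_size_batches(examples, batch_size):
    """Yield only full batches so every batch has a static shape."""
    batch = []
    for ex in examples:
        batch.append(ex)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # the trailing partial batch (if any) is dropped here

batches = list(fixed_size_batches(range(10), batch_size=4))
# two full batches of 4; the leftover examples 8 and 9 are dropped
```

In tf.data terms this corresponds to batching with a fixed batch size and dropping the remainder, so the graph compiled for the TPU only ever sees one input shape.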
Can I train a reinforcement learning (RL) model with a TPU?
Reinforcement learning covers a wide array of techniques, some of which currently are not compatible with the software abstractions for TPUs. Some reinforcement learning configurations require executing a black-box "simulation environment" using a CPU as part of the training loop. Our experience is that these cannot keep up with the TPU and result in significant inefficiencies. Future releases of TensorFlow will include abstractions to make "off-policy" reinforcement learning easier.
Can I use word embeddings with a TPU?
Yes, the TPU supports
tf.nn.embedding_lookup(), since it is just a wrapper around
tf.gather(), which has an implementation on the TPU. However,
the TPU does not support
tf.nn.embedding_lookup_sparse(). Note that the input
id tensor to
tf.nn.embedding_lookup() must have a static shape during training
(that is, the batch size and sequence length must be the same for every batch).
This is a more general restriction on all tensors when using the TPU.
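A pure-Python stand-in can make the "lookup is just a gather" point concrete. The names below are illustrative, not TensorFlow APIs: an embedding lookup selects rows of the embedding table by id (the semantics of tf.gather()), and on the TPU the id tensor must keep one fixed (batch_size, seq_len) shape, with short sequences padded.

```python
# Illustrative embedding table: one row (vector) per vocabulary id.
table = [
    [0.0, 0.1, 0.2],   # id 0 (also used here as the padding id)
    [1.0, 1.1, 1.2],   # id 1
    [2.0, 2.1, 2.2],   # id 2
    [3.0, 3.1, 3.2],   # id 3
]

# Static shape (batch_size=2, seq_len=3); shorter sequences padded with id 0.
ids = [
    [1, 3, 0],
    [2, 2, 0],
]

# The lookup is a gather: pick row i of the table for each id i,
# mirroring what tf.gather(table, ids) computes.
vectors = [[table[i] for i in row] for row in ids]
# result shape: (batch=2, positions=3, embedding_dim=3)
```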
Can I use variable-length sequences with a TPU?
There are several methods for representing variable-length sequences in
TensorFlow, including padding,
tf.while_loop(), inferred tensor dimensions,
and bucketing. Unfortunately, the current TPU execution engine only supports a
subset of these. Variable-length sequences must be implemented using
tf.nn.dynamic_rnn(), bucketing, padding, or sequence truncation.
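Bucketing can be sketched without any framework. The helper below is a hypothetical illustration, not a TensorFlow API: sequences are grouped into a small number of length buckets and padded up to each bucket's boundary, so the TPU only ever compiles a few static shapes instead of one per sequence length.

```python
# Sketch of bucketing: group sequences by length bucket and pad within
# the bucket, so the model sees only a few static shapes.
def bucket_and_pad(sequences, bucket_boundaries, pad_value=0):
    buckets = {b: [] for b in bucket_boundaries}
    for seq in sequences:
        for bound in bucket_boundaries:        # boundaries sorted ascending
            if len(seq) <= bound:
                padded = seq + [pad_value] * (bound - len(seq))
                buckets[bound].append(padded)
                break
        # sequences longer than the last boundary would need
        # truncation or dropping; omitted in this sketch
    return buckets

seqs = [[1, 2], [3, 4, 5], [6], [7, 8, 9, 10]]
buckets = bucket_and_pad(seqs, bucket_boundaries=[2, 4])
# bucket 2 holds [[1, 2], [6, 0]]; bucket 4 holds [[3, 4, 5, 0], [7, 8, 9, 10]]
```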
Can I train a Recurrent Neural Network (RNN) on a TPU?
In certain configurations, tf.nn.dynamic_rnn() is
compatible with the current TPU execution engine. More generally, the TPU
supports both tf.while_loop() and TensorArray, which are used to implement
tf.nn.dynamic_rnn(). Specialized toolkits such as CuDNN are not supported on the
TPU, as they contain GPU-specific code. Using
tf.while_loop() on the TPU does
require specifying an upper bound on the number of loop iterations so that the
TPU execution engine can statically determine the memory usage.
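The effect of a static upper bound can be sketched in plain Python. The helper below is hypothetical, not the tf.while_loop() API: the loop always runs a compile-time-known number of steps (so memory use is fixed), and once the dynamic condition becomes false, the remaining iterations leave the state unchanged.

```python
# Sketch of a bounded loop: the trip count is static (max_iterations),
# and iterations past the dynamic stopping condition are no-ops.
def bounded_loop(state, cond, body, max_iterations):
    for _ in range(max_iterations):   # trip count known at "compile" time
        if cond(state):
            state = body(state)       # real work happens only while cond holds
        # after cond fails, state is simply carried through unchanged
    return state

# Example: accumulate 1 + 2 + 3 + ... until the running total exceeds 10,
# under a static bound of 100 iterations.
result = bounded_loop(
    state=(0, 1),                     # (running total, next value to add)
    cond=lambda s: s[0] <= 10,
    body=lambda s: (s[0] + s[1], s[1] + 1),
    max_iterations=100,
)
# result == (15, 6): the loop stops doing work once the total reaches 15
```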
Can I train a generative adversarial network (GAN) with a TPU?
Training GANs typically requires frequently alternating between training the generator and training the discriminator. The current TPU execution engine only supports a single execution graph. Alternating between graphs requires a complete re-compilation, which can take 30 seconds or more. This limitation will be addressed in an upcoming TensorFlow release.
One potential workaround is to always compute the sum of the losses for both the
generator and the discriminator, but multiply these losses by two input tensors
g_w and d_w. In batches where the generator should be trained, you can pass
g_w=1.0 and d_w=0.0, and vice-versa for batches where the discriminator
should be trained.
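The weighting trick can be shown with plain numbers. The function below is an illustrative sketch, not TPU or TensorFlow API code: both losses are always computed in the same graph, and the per-batch weights zero out whichever sub-network should not learn on that batch, so the graph never changes shape.

```python
# Sketch of the single-graph GAN workaround: one combined loss, with
# per-batch weights g_w and d_w selecting which part actually trains.
def combined_loss(g_loss, d_loss, g_w, d_w):
    # Both losses are always evaluated; the weights mask one of them out,
    # so the compiled graph is identical for every batch.
    return g_w * g_loss + d_w * d_loss

generator_step = combined_loss(g_loss=2.0, d_loss=5.0, g_w=1.0, d_w=0.0)
discriminator_step = combined_loss(g_loss=2.0, d_loss=5.0, g_w=0.0, d_w=1.0)
# generator_step == 2.0 (only the generator loss contributes);
# discriminator_step == 5.0 (only the discriminator loss contributes)
```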
Can I train a multi-task learning model with a TPU?
If the tasks can be represented as one large graph with an aggregate loss function, then no special support is needed for multi-task learning. However, the TPU execution engine currently only supports a single execution graph. Therefore, it is not possible to quickly alternate between multiple execution graphs which share variables but have different structure. Changing execution graphs requires re-running the graph compilation step, which can take 30 seconds or more.
Does the TPU support eager mode?
No, eager mode uses a new dynamic execution engine, while the TPU uses XLA, which performs static compilation of the execution graph.
Does the TPU support model parallelism?
Model parallelism (executing non-identical TPU programs on the multiple cores within a single TPU device) is not currently supported on the TPU, but will be supported in an upcoming TensorFlow release.
How can I inspect the actual value of intermediate tensors on the TPU, as with tf.Print() or tfdbg?
This capability is currently not supported on the TPU. The suggested pattern for
development on the TPU is to implement the model using the TPUEstimator
framework, which allows for effortless transition between the TPU and CPU/GPU
via the use_tpu flag. You are encouraged to debug your models on the CPU/GPU
using the standard TensorFlow tools, and then switch to the TPU when your model
is ready for full-scale training.
My training scheme is too complex or specialized for the TPUEstimator API; is there a lower-level API that I can use?
TPUEstimator is the primary framework for TPU training on a Cloud TPU.
TPUEstimator wraps the
tpu API, which is part of open source
TensorFlow, so it is technically possible (but unsupported) to use the low-level
tpu API directly. If your training pipeline requires frequent communication
between the TPU and CPU, or requires frequently changing the execution graph,
your computation cannot run efficiently on the TPU. Upcoming releases of
TensorFlow will improve both capabilities.