Speed up machine learning workloads with Google’s custom-developed hardware accelerators.
Accelerate machine learning models with Google supercomputers
Quickly train and iterate on machine learning models
Cloud TPU minimizes time-to-accuracy when you train large, complex neural network models. Models that previously took weeks to train on other hardware can converge in hours on TPUs.
Handle large-scale workloads with flexibility
TPUs are purpose-built for workloads dominated by matrix computation and for large models with large effective batch sizes. TPU VMs make it easy to use popular ML frameworks.
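To make that concrete, here is a minimal JAX sketch of the kind of large batched matrix multiplication that TPU matrix units accelerate; the shapes are arbitrary, and a runtime with JAX installed is assumed:

```python
import jax
import jax.numpy as jnp

# A large effective batch of matrix multiplications: the workload
# shape that TPU matrix units (MXUs) are built for. Shapes are arbitrary.
batch, m, k, n = 1024, 256, 256, 256
key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.normal(key_a, (batch, m, k))
b = jax.random.normal(key_b, (batch, k, n))

# jax.jit compiles through XLA, which maps the batched matmul onto
# the TPU's systolic arrays when a TPU backend is available.
batched_matmul = jax.jit(lambda x, y: jnp.einsum("bij,bjk->bik", x, y))
out = batched_matmul(a, b)
print(out.shape)  # (1024, 256, 256)
```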
Cloud TPUs for every workload and budget
Cloud TPU is designed to run cutting-edge machine learning models with AI services on Google Cloud. Its custom high-speed network offers over 100 petaflops of performance in a single pod: enough computational power to transform your business or create the next research breakthrough.
Full backwards compatibility
Cloud TPU v4 Pods, the latest generation of Google’s custom ML accelerators, are now generally available. TPU v4 retains backwards compatibility with Cloud TPU v2 and v3 while delivering more than 2x the raw compute performance per chip of Cloud TPU v3. Each TPU v4 chip also contains a single logical core, so one program can use the chip’s full 32 GiB of memory, compared with 8 GiB on v2 and 16 GiB on v3. Learn which Cloud TPU product works best for your unique project needs.
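As an illustrative (not official) way to see this from a program, the sketch below lists the TPU cores JAX can address and queries their memory; it assumes a Cloud TPU VM with a recent JAX release, where Device.memory_stats() is supported on the TPU backend:

```python
import jax

# Each TPU v4 chip appears as a single logical core (one JAX device),
# so one program can address the chip's full 32 GiB of HBM.
for device in jax.local_devices():
    print(device.id, device.platform, device.device_kind)

# memory_stats() returns per-device memory counters on TPU backends;
# it may return None on platforms that don't support it.
stats = jax.local_devices()[0].memory_stats()
if stats:
    print(f"addressable memory: {stats['bytes_limit'] / 2**30:.1f} GiB")
```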
Runs popular ML frameworks
Run machine learning workloads on Cloud TPUs using machine learning frameworks such as TensorFlow, PyTorch, and JAX. Our quickstarts provide a brief introduction to working with Cloud TPU VMs and explain how to install an ML framework and run a sample application on a Cloud TPU VM.
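For a flavor of what the quickstarts cover, here is a minimal JAX sample of the kind you might run on a Cloud TPU VM after installing the framework (the install command in the comment follows JAX’s public TPU instructions; check the quickstart for the current one):

```python
# Typical install on a Cloud TPU VM (see the quickstart for the current command):
#   pip install "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
import jax
import jax.numpy as jnp

# Confirm the TPU backend is visible; on a TPU VM this lists TPU cores.
print("devices:", jax.devices())

# A trivial computation dispatched to the default (TPU) backend.
x = jnp.arange(8.0)
print("sum of squares:", jnp.sum(x * x))
```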
Save money by using preemptible Cloud TPUs
You can save money by using preemptible Cloud TPUs for fault-tolerant machine learning workloads, such as long training runs with checkpointing or batch prediction on large datasets. Preemptible Cloud TPUs are 70% cheaper than on-demand instances, making everything from your first experiments to large-scale hyperparameter searches more affordable than ever. Visit our pricing page to see how Cloud TPU can process your machine learning workloads cost-effectively.
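One common pattern for making a long training run preemption-tolerant is periodic, atomic checkpointing so a restarted worker resumes where it left off. The Python sketch below is illustrative only: the path, interval, and the init_params/train_step stand-ins are hypothetical, and in practice you would write checkpoints to durable storage such as Cloud Storage:

```python
import os
import pickle

CKPT_PATH = "checkpoint.pkl"  # illustrative; point at durable storage in practice
CKPT_EVERY = 100              # illustrative checkpoint interval (steps)

def init_params():
    # Hypothetical stand-in for real model initialization.
    return {"w": 0.0}

def train_step(params):
    # Hypothetical stand-in for one optimizer update.
    return {"w": params["w"] + 0.1}

def save_checkpoint(step, params):
    # Write to a temp file, then rename: atomic on POSIX, so a
    # preemption mid-write cannot leave a corrupt checkpoint.
    tmp = CKPT_PATH + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "params": params}, f)
    os.replace(tmp, CKPT_PATH)

def load_checkpoint():
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH, "rb") as f:
            state = pickle.load(f)
        return state["step"], state["params"]
    return 0, init_params()

def train(num_steps=1000):
    step, params = load_checkpoint()  # resume after preemption, if any
    while step < num_steps:
        params = train_step(params)
        step += 1
        if step % CKPT_EVERY == 0:
            save_checkpoint(step, params)

if __name__ == "__main__":
    train()
```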
Learn how TPU v4 has enabled our customers
“At Cohere, we build cutting-edge natural language processing (NLP) services, including APIs for language generation, classification, and search. These tools are built on top of a set of language models that Cohere trains from scratch on Cloud TPUs using JAX. We saw a 70% improvement in training time for our largest model when moving from Cloud TPU v3 Pods to Cloud TPU v4 Pods, allowing faster iterations for our researchers and higher quality results for our customers. The exceptionally low carbon footprint of Cloud TPU v4 Pods was another key factor for us.”
Aidan Gomez, CEO and co-founder, Cohere
“As a strategic research partner, LG AI Research tested TPU v4, Google’s latest machine learning supercomputer, before commercialization to train LG EXAONE, a super-giant AI at the scale of 300 billion parameters. Equipped with multimodal capabilities, LG EXAONE has been training on TPU v4 with a huge amount of data, a corpus of more than 600 billion pieces of text and 250 million images, aiming to surpass human experts in communication, productivity, creativity, and many more aspects. Not only did TPU v4 outperform other best-in-class computing architectures, but the customer-oriented support also exceeded our expectations. We were very excited to collaborate with Google and expect to solidify this strategic partnership to achieve our ultimate vision: advancing AI for a better life.”
Kyunghoon Bae, PhD, Chief of LG AI Research
“Early access to TPU v4 has enabled us to achieve breakthroughs in conversational AI programming with our CodeGen project, a 16-billion-parameter auto-regressive language model that turns simple English prompts into executable code. The large size of this model is motivated by the empirical observation that scaling the number of model parameters in proportion to the number of training samples appears to strictly improve the performance of the model. This phenomenon is known as the scaling law. TPU v4 is an outstanding platform for this kind of scale-out ML training, providing significant performance advantages over other comparable AI hardware alternatives.”
Erik Nijkamp, Research scientist, Salesforce
Take the next step
Start building on Google Cloud with $300 in free credits and 20+ always free products.