Google Cloud first to offer NVIDIA Tesla T4 GPUs
Today, we are happy to announce that Google Cloud Platform (GCP) is the first major cloud provider to offer the NVIDIA Tesla T4 GPU. Now available in alpha, the T4 is optimized for machine learning (ML) inference, distributed model training, and computer graphics.
Fast, cost-effective ML inference
ML inference in production demands high throughput and low latency, often at the same time. With Turing Tensor Core support for mixed FP16/FP32 and INT8 precision modes, the NVIDIA Tesla T4 delivers up to 130 TOPS of INT8 inference performance, with latency as low as 1.1 ms*. In addition, the T4's 16GB of high-speed GPU memory can hold large ML models or serve inference for several models at once, for greater overall inference efficiency. Finally, the T4 is the only GPU that currently offers INT4 and INT1 precision support, for even greater performance.
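INT8 mode only pays off once a model's FP32 weights have been mapped to 8-bit integers. The sketch below shows symmetric per-tensor quantization, one common scheme, using NumPy. The function names and the random "weights" are hypothetical, and this is not the calibration procedure used by NVIDIA TensorRT or any particular framework; it only illustrates the storage savings and the bounded rounding error that INT8 inference trades on.

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor INT8 quantization (hypothetical helper):
    map the range [-max|x|, max|x|] onto the integers [-127, 127]."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation of the original values."""
    return q.astype(np.float32) * scale

# Illustrative FP32 "weights"; real models would use trained parameters.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = float(np.max(np.abs(weights - restored)))

# INT8 storage is a quarter of FP32, and the per-value error stays within scale / 2.
print(f"FP32 bytes={weights.nbytes}  INT8 bytes={q.nbytes}")
print(f"scale={scale:.6f}  max abs error={max_err:.6f}  bound={scale / 2:.6f}")
```

Per-tensor scaling is the simplest choice; per-channel scales or calibrated activation ranges usually preserve more accuracy, at the cost of extra bookkeeping.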
Offering a low-cost option for training ML models
Many of you have also told us that you want a GPU that supports mixed-precision computation (both FP32 and FP16) for ML training with great price/performance. The T4's 65 TFLOPS of mixed-precision (FP16/FP32) training performance and 16GB of GPU memory address this need for many distributed training, reinforcement learning, and other ML training workloads. Pricing for the T4 will be shared at the time of the beta announcement.
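Mixed precision keeps inputs in FP16 but accumulates results in FP32, which is how Tensor Cores perform their matrix multiply-accumulate. The NumPy sketch below, using a hypothetical `accumulate` helper and made-up data, shows why the FP32 accumulator matters: a running FP16 sum stops growing once its spacing between representable values exceeds twice the increment, while an FP32 accumulator keeps the small contributions.

```python
import numpy as np

def accumulate(values: np.ndarray, acc_dtype) -> float:
    """Sum values one at a time, rounding the running total to acc_dtype after each add."""
    acc = acc_dtype(0)
    for v in values:
        acc = acc_dtype(acc + acc_dtype(v))
    return float(acc)

# 4,096 small FP16 values, standing in for e.g. per-example gradient contributions.
values = np.full(4096, 0.01, dtype=np.float16)
exact = 4096 * float(values[0])  # exact sum of the FP16 inputs

fp16_sum = accumulate(values, np.float16)  # FP16 inputs, FP16 accumulator
fp32_sum = accumulate(values, np.float32)  # FP16 inputs, FP32 accumulator (mixed precision)

print(f"exact={exact:.5f}  fp16 accumulator={fp16_sum:.5f}  fp32 accumulator={fp32_sum:.5f}")
```

The FP16-only sum stalls well short of the true total, while the FP32 accumulator matches it; frameworks add further techniques such as loss scaling on top of this idea.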
Supercharging graphics workloads
Thanks to new hardware-accelerated graphics features, the Tesla T4 is also an excellent choice for demanding graphics workloads such as real-time ray tracing, offline rendering, or any application that takes advantage of NVIDIA's RTX technology. The T4's Turing architecture fuses real-time ray tracing, AI, simulation, and rasterization into a hybrid approach to rendering computer graphics. Dedicated ray-tracing processors called RT Cores accelerate the computation of how rays of light travel through 3D environments.
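Much of the work RT Cores take on in hardware is bounding volume hierarchy traversal and ray/triangle intersection testing. For a sense of what a single intersection test involves, here is a software sketch of the Möller–Trumbore algorithm in Python. It is only an illustration of the kind of computation being accelerated, not a description of how RT Cores are implemented, and the function name is hypothetical.

```python
import numpy as np

def ray_triangle_hit(origin, direction, v0, v1, v2, eps=1e-9):
    """Möller–Trumbore ray/triangle test.
    Returns the distance t along the ray to the hit point, or None on a miss."""
    origin, direction = np.asarray(origin, float), np.asarray(direction, float)
    v0, v1, v2 = (np.asarray(v, float) for v in (v0, v1, v2))
    edge1, edge2 = v1 - v0, v2 - v0
    pvec = np.cross(direction, edge2)
    det = np.dot(edge1, pvec)
    if abs(det) < eps:                  # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    tvec = origin - v0
    u = np.dot(tvec, pvec) * inv_det    # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = np.cross(tvec, edge1)
    v = np.dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = np.dot(edge2, qvec) * inv_det   # distance along the ray
    return float(t) if t > eps else None  # ignore hits behind the ray origin

# A triangle in the z = 0 plane, and rays pointing straight down at it.
tri = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
print(ray_triangle_hit((0.25, 0.25, 1), (0, 0, -1), *tri))  # hits the triangle
print(ray_triangle_hit((2.0, 2.0, 1), (0, 0, -1), *tri))    # misses it
```

A renderer runs tests like this for millions of rays per frame, after a BVH has narrowed each ray down to a few candidate triangles, which is why dedicated hardware for both steps matters for real-time ray tracing.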
We want to make it easy for you to start using the Tesla T4. You can get started quickly on Compute Engine (GCE) with our Deep Learning VM images that come preconfigured with everything you need to run high-performance inference workloads. In addition, T4 support will be coming shortly to Google Kubernetes Engine (GKE) and other GCP services.
Real-time visualization and online inference workloads must deliver low latency to their end users. Together, GCP's industry-leading network and our T4 offering let you speed up these applications while reducing your costs. GCP's scale lets you go from development to deployment to production on up to thousands of GPUs with a simple API call. You can also optimize price and performance by attaching up to four T4 GPUs to any custom VM shape.
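As a rough sketch of what provisioning might look like, the Python helper below assembles (but does not run) a `gcloud compute instances create` command for a custom-shape VM with T4 GPUs and a Deep Learning VM image. The helper name, zone, instance name, machine shape, and the `pytorch-latest-gpu` image family are assumptions for illustration; check current T4 zone availability and the Deep Learning VM image family list before running the printed command.

```python
import shlex

MAX_T4_PER_VM = 4  # the per-VM limit stated in this post

def t4_instance_command(name, zone, gpu_count, vcpus, memory_gb,
                        image_family="pytorch-latest-gpu"):  # assumed image family
    """Hypothetical helper: build a `gcloud compute instances create` argument list
    for a custom-shape VM with T4 GPUs and a Deep Learning VM image."""
    if not 1 <= gpu_count <= MAX_T4_PER_VM:
        raise ValueError(f"gpu_count must be between 1 and {MAX_T4_PER_VM}")
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--custom-cpu={vcpus}",               # custom VM shape: vCPU count
        f"--custom-memory={memory_gb}GB",      # custom VM shape: memory
        f"--accelerator=type=nvidia-tesla-t4,count={gpu_count}",
        "--maintenance-policy=TERMINATE",      # GPU VMs stop for host maintenance
        "--image-project=deeplearning-platform-release",
        f"--image-family={image_family}",
        "--metadata=install-nvidia-driver=True",  # Deep Learning VM driver install
    ]

# Example: an 8-vCPU, 30 GB custom VM with two T4s in an assumed zone.
cmd = t4_instance_command("t4-inference-1", "us-central1-a",
                          gpu_count=2, vcpus=8, memory_gb=30)
print(shlex.join(cmd))
```

Returning an argument list rather than running it keeps the sketch side-effect free; you can pass the list to `subprocess.run` once the values match your project.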
* 1.1 ms inference latency measured with the ResNet-50 model at INT8 precision and a batch size of 1.