Deploy GPU workloads across all your clouds with Anthos and NVIDIA
Amr Abdelrazik
Product Manager, Anthos
We are excited to announce a joint solution with NVIDIA, now publicly available in beta, that lets customers run NVIDIA GPU workloads on Anthos across hybrid cloud and on-premises environments.
Running GPU workloads between clouds
Machine learning is one of the fastest-growing application segments in the market today, powering industries such as biotech, retail, and manufacturing.
With such unprecedented growth, customers are facing multiple challenges. The first is the difficult choice of where to run your ML and HPC workloads. While the cloud offers elasticity and flexibility for ML workloads, some applications have latency, data size, or even regulatory requirements that mean they need to reside within certain data centers and at edge locations.
The second challenge is high demand for on-prem GPU resources. No matter how fast organizations onboard GPU hardware, demand always outpaces supply, so you need to get the most out of every GPU you have invested in.
Organizations are also looking for a hybrid architecture that makes the most of both cloud and on-prem resources. In this architecture, bursty and transient model development and training can run in the cloud, while inference and steady-state runtime can stay on-prem, or vice versa.
Anthos and ML workloads
Anthos was built to enable customers to easily run applications both in the cloud and on-prem. Built on Kubernetes, Anthos’ advanced cluster management and multi-tenancy capabilities allow you to share your ML infrastructure across teams, increasing utilization and reducing the overhead of managing bespoke environments.
Anthos also allows you to run applications anywhere, whether on-prem, on other cloud providers, or even at the edge. The flexibility of deployment options with Anthos, combined with open-source ML frameworks such as TensorFlow and Kubeflow, lets you build truly cloud-portable ML solutions and applications.
In addition to applications developed in-house, you can use Anthos to deploy Google Cloud’s best-in-class ML services such as Vision AI, Document AI, and many others in your data center and at edge locations, turbocharging ML initiatives across your organization.
Our collaboration with NVIDIA
For this solution, we’ve built on our strong relationship with NVIDIA, a leader in AI/ML acceleration. The solution uses the NVIDIA GPU Operator to deploy GPU drivers and software components required to enable GPUs in Kubernetes. The solution works with many popular NVIDIA data center GPUs such as the V100 and T4. This broad support allows you to take advantage of your existing and future investments in NVIDIA GPUs with Anthos. For more information about supported NVIDIA platforms, please check the NVIDIA GPU Operator documentation. You can also learn more about other Google Cloud and NVIDIA collaborations.
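To make this concrete, here is a minimal sketch of how a workload might ask for a GPU once the GPU Operator has enabled GPUs in a cluster. It assumes the GPUs are exposed under the standard nvidia.com/gpu resource name used by the NVIDIA device plugin. The helper gpu_pod_manifest, the Pod name, and the container image are hypothetical placeholders for illustration and are not part of the Anthos or NVIDIA documentation.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a minimal Kubernetes Pod manifest that requests NVIDIA GPUs.

    Assumption: the cluster exposes GPUs as the extended resource
    "nvidia.com/gpu" (the name used by the NVIDIA device plugin).
    """
    if gpus < 1:
        raise ValueError("gpus must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are requested under "limits".
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical Pod name and image; substitute your own CUDA-enabled image.
    manifest = gpu_pod_manifest("gpu-smoke-test", "example.com/cuda-sample:latest")
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped into kubectl apply -f - against a GPU-enabled cluster. Treat this as a starting point and follow the official documentation for the supported setup.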
Getting started
This solution is available in beta and works with Anthos on-prem 1.4 or later. For instructions on getting started with supported NVIDIA GPUs on Google Cloud’s Anthos, please refer to the documentation here.