Google Cloud and NVIDIA bring next-generation AI infrastructure and software for large scale models and generative AI applications to enterprises

March 29, 2023
Dave Salvator

Director of Accelerated Computing Products, NVIDIA

Warren Barkley

Sr. Director, Product Management, Cloud AI

Generative AI is quickly becoming a strategic imperative for businesses and organizations across all industries. Yet many organizations face barriers to adopting generative AI because they lack access to the latest models and the AI infrastructure needed to support large workloads. These barriers keep organizations from innovating in the next era of AI.

We’re excited to continue the Google Cloud and NVIDIA collaboration to help companies accelerate generative AI and other modern AI workloads in a cost-effective, scalable, and sustainable way. We’re bringing together best-in-class GPUs from NVIDIA for large-model inference and training with the latest AI models and managed tools for generative AI from Google Cloud. We believe that for customers to innovate with AI, they also need the best supporting technology, which is why NVIDIA and Google Cloud are coming together to offer leading capabilities for data analytics and integration of the best open-source tools.

In this blog, we highlight ways Google Cloud and NVIDIA are teaming up to help the most innovative AI companies succeed.

Accelerate and scale generative AI in production

Google Cloud and NVIDIA are partnering to provide leading capabilities across the AI stack that will help customers take advantage of one of the most influential technologies of our generation: generative AI. In March, Google Cloud launched Generative AI support in Vertex AI, which makes it possible for developers to access, tune, and deploy foundation models. For companies to effectively scale generative AI in production, they need high-efficiency, performant GPUs to support these large AI workloads. 
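To make this concrete, here’s a minimal sketch of calling a foundation model through the Vertex AI SDK for Python. The project ID, region, and `text-bison@001` model name are illustrative placeholders, and the exact module path can vary by SDK version.

```python
# Sketch: querying a Vertex AI foundation model from Python.
# Assumes `pip install google-cloud-aiplatform` and application-default
# credentials; project, region, and model name are placeholders.
import vertexai
from vertexai.preview.language_models import TextGenerationModel

vertexai.init(project="my-project", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(
    "Summarize the benefits of GPU-accelerated inference in two sentences.",
    temperature=0.2,
    max_output_tokens=256,
)
print(response.text)
```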

At GTC, NVIDIA announced that Google Cloud is the first cloud provider to offer the NVIDIA L4 Tensor Core GPU, which is purpose-built for large AI inference workloads like generative AI. The L4 GPU will be integrated with Vertex AI and delivers cutting-edge performance per dollar for AI inference workloads that run on GPUs in the cloud. Compared to previous-generation instances, the new G2 VMs powered by NVIDIA L4 GPUs deliver up to 4x more performance. As a universal GPU offering, G2 VM instances also accelerate other workloads, offering significant performance improvements for HPC, graphics, and video transcoding. Currently in private preview, G2 VMs are both powerful and flexible, and scale easily from one to eight GPUs.
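For readers who want to experiment once they have preview access, the sketch below provisions a G2 VM through the Compute Engine Python client; with G2 machine types, the L4 GPU comes bundled with the machine type. The project, zone, and image values are placeholders, and GPU driver installation is omitted.

```python
# Sketch: creating a G2 VM (NVIDIA L4) with the Compute Engine client.
# Project, zone, and boot image are placeholders; a GPU driver still
# needs to be installed on the VM afterward.
from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"

instance = compute_v1.Instance(
    name="l4-inference-vm",
    machine_type=f"zones/{zone}/machineTypes/g2-standard-8",  # includes 1x L4
    disks=[
        compute_v1.AttachedDisk(
            boot=True,
            auto_delete=True,
            initialize_params=compute_v1.AttachedDiskInitializeParams(
                source_image="projects/debian-cloud/global/images/family/debian-11",
                disk_size_gb=100,
            ),
        )
    ],
    network_interfaces=[
        compute_v1.NetworkInterface(network="global/networks/default")
    ],
    # GPU VMs must terminate (rather than live-migrate) on host maintenance.
    scheduling=compute_v1.Scheduling(on_host_maintenance="TERMINATE"),
)

operation = compute_v1.InstancesClient().insert(
    project=project, zone=zone, instance_resource=instance
)
operation.result()  # block until creation completes
```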

We are also excited to work with NVIDIA to bring our customers the highest-performance GPU offering for generative AI training workloads. With optimized support on Vertex AI for both A100 and L4 GPUs, users can train and deploy generative AI models with the highest performance available on GPUs today.
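As a minimal sketch of what this looks like in practice, the following submits a Vertex AI custom training job pinned to A100 GPUs with the `google-cloud-aiplatform` SDK. The project, staging bucket, and training container image are hypothetical values you would replace with your own.

```python
# Sketch: a Vertex AI custom training job on A100 GPUs.
# Project, bucket, and container image are placeholders.
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

job = aiplatform.CustomContainerTrainingJob(
    display_name="generative-model-finetune",
    container_uri="us-docker.pkg.dev/my-project/training/trainer:latest",
)

job.run(
    machine_type="a2-highgpu-1g",          # A2 VM family hosts A100 GPUs
    accelerator_type="NVIDIA_TESLA_A100",
    accelerator_count=1,
    replica_count=1,
)
```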

We’re excited to offer NVIDIA AI Enterprise software on Google Cloud Marketplace. NVIDIA AI Enterprise is a suite of software that accelerates the data science pipeline and streamlines the development and deployment of production AI. With over 50 frameworks, pretrained models, and development tools, NVIDIA AI Enterprise is designed to bring enterprises to the leading edge of AI while making AI accessible to every organization.

The latest release supports NVIDIA L4 and H100 Tensor Core GPUs, as well as prior GPU generations, including the A100.

Access to a wide variety of open-source tools

We’ve worked with NVIDIA to make a wide range of GPUs accessible across Vertex AI’s Workbench, Training, Serving, and Pipeline services to support a variety of open-source models and frameworks. Whether an organization wants to accelerate its Spark, Dask, and XGBoost pipelines or use PyTorch, TensorFlow, Keras, or Ray for larger deep learning workloads, the Vertex AI platform offers a range of GPUs that can meet both performance and budget needs. These offerings let users take advantage of open-source frameworks and models in a managed, scalable way to accelerate the ML development and deployment lifecycle.
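For example, a GPU-backed online serving endpoint can be stood up in a few lines with the Vertex AI SDK. This is a sketch under assumed values: the model artifact path, serving container URI, and input payload are placeholders, and any supported GPU type can be substituted.

```python
# Sketch: serving an open-source model on a GPU-backed Vertex AI endpoint.
# Artifact path, container URI, and input payload are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model.upload(
    display_name="pytorch-classifier",
    artifact_uri="gs://my-bucket/model/",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/pytorch-gpu.1-13:latest"
    ),
)

endpoint = model.deploy(
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",  # swap in the GPU that fits the budget
    accelerator_count=1,
)

# Input shape depends on the model; this payload is illustrative.
print(endpoint.predict(instances=[[0.1, 0.2, 0.3]]).predictions)
```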

Improve efficiency of data preparation and model training 

Different workloads require different cluster configurations, owing to different goals, data sets, complexities, and timeframes, so keeping a one-size-fits-all Spark cluster always at the ready is neither cost-effective nor appropriate. Google has partnered with NVIDIA to bring GPU-accelerated Spark to Dataproc customers through RAPIDS, a suite of open-source software libraries and APIs for executing data science pipelines entirely on GPUs. This lets customers tailor their Spark clusters to AI/ML workloads.

NVIDIA has been working with the Spark open-source community to implement GPU acceleration in the latest version of Spark (3.x). This version lets Dataproc customers accelerate a range of Spark-based AI/ML and ETL workloads without any code changes. Running on GPUs improves latency and cost during data preparation and model training. Data science teams can tackle larger data sets, iterate faster, and tune models to maximize prediction accuracy and business value.
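As a sketch of why no code changes are needed: the RAPIDS Accelerator for Apache Spark is enabled purely through configuration, after which eligible DataFrame and SQL operators run on GPUs transparently. The resource amounts and storage paths below are illustrative; on Dataproc, the same properties can also be set at cluster-creation time.

```python
# Sketch: enabling the RAPIDS Accelerator so supported Spark operations
# run on GPUs; existing DataFrame/SQL code is unchanged.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("gpu-accelerated-etl")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")  # RAPIDS plugin
    .config("spark.rapids.sql.enabled", "true")
    .config("spark.executor.resource.gpu.amount", "1")
    .config("spark.task.resource.gpu.amount", "0.25")       # 4 tasks share a GPU
    .getOrCreate()
)

# Ordinary Spark code: eligible operators execute on the GPU.
df = spark.read.parquet("gs://my-bucket/events/")           # path is illustrative
df.groupBy("user_id").count().write.parquet("gs://my-bucket/counts/")
```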

Reduce the carbon footprint of intensive AI workloads

Google and NVIDIA are focused on helping users reduce the carbon footprint of their digital workloads. In addition to operating the cleanest cloud infrastructure in the industry, Google partners with NVIDIA to offer GPUs that can increase the energy efficiency of computationally intensive workloads like AI. Accelerated computing not only delivers the best performance, it is also the most energy-efficient form of compute, and it is essential to realizing AI’s full potential. On the Green500 list of the world’s most efficient supercomputers, for example, GPU-accelerated systems are 10x more energy-efficient than CPU-only systems. And Google researchers found that by carefully choosing the Google Cloud region and the right GPU for training large models, you can reduce the carbon emissions of AI/ML training by as much as 1,000x.

Since data center location is such an important factor in reducing the carbon emissions of a workload, Google Cloud presents users with low-carbon icons in the resource-creation workflow to help them choose the most carbon-free locations for NVIDIA GPUs on Google Cloud.
