AI Hypercomputer provides a software stack that contains common tools and libraries that are pre-configured to support your artificial intelligence (AI), machine learning (ML), and high performance computing (HPC) workloads. The software stack is arranged as follows:
[Figure: AI Hypercomputer software stack.]
As outlined in the preceding diagram, the software stack has two main components:
- Deep Learning Software Layer (DLSL) container images: these images package NVIDIA CUDA, NCCL, and ML frameworks like PyTorch, providing a ready-to-use environment for deep learning workloads. These prebuilt DLSL container images are tested and verified to work seamlessly on Google Kubernetes Engine (GKE) node pools.
- Operating system (OS) images: these images are either Ubuntu LTS or Rocky Linux images that are pre-configured with NVIDIA MOFED drivers, NVIDIA GPU drivers, and an RDMA core. These images are suitable for deploying workloads on Compute Engine instances.
DLSL container images
When working with Google Kubernetes Engine environments, DLSL container images provide the following benefits for your workloads:
- Simpler configuration: DLSL containers replicate the setup used by internal reproducibility and regression testing, which simplifies configuring your environment to run machine learning workloads.
- Version management of the pre-configured Docker images.
- Sample recipes that demonstrate how to start your workloads using the pre-configured Docker images.
You can access the DLSL container images from the DLSL artifact registry or in the sample recipe guides.
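As a rough illustration of running one of these images on GKE, the following sketch builds a minimal Kubernetes Pod manifest as a Python dictionary and prints it as JSON. The pod name and container name are hypothetical placeholders, and the image path is one of the DLSL images listed on this page. A real workload would typically also need GPU resource requests, node selectors, and a command, which are omitted here; see the sample recipes for complete configurations.

```python
import json


def build_pod_manifest(pod_name: str, image: str) -> dict:
    """Return a minimal Pod manifest that runs the given container image.

    Only core Pod fields are set. GPU requests, node selectors, and the
    workload command are intentionally left out of this sketch.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                # "workload" is a hypothetical container name.
                {"name": "workload", "image": image},
            ],
        },
    }


manifest = build_pod_manifest(
    pod_name="dlsl-example",  # hypothetical pod name
    image=(
        "us-central1-docker.pkg.dev/deeplearning-images/"
        "reproducibility/pytorch-gpu-nemo:nemo24.07-A3U"
    ),
)
print(json.dumps(manifest, indent=2))
```

Because Kubernetes accepts JSON manifests as well as YAML, the printed output could be saved to a file and adapted before applying it to a cluster.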
NVIDIA NeMo + NCCL gIB Plugin
These Docker images are based on the NVIDIA NeMo NGC image. They contain Google's NCCL gIB plugin and bundle all of the NCCL binaries that are required for running workloads on each supported accelerator machine. They also include Google Cloud tools such as gcsfuse and gcloud CLI for deploying workloads to Google Kubernetes Engine.
Model version | Dependencies version | Machine series | Release date | End of support date | Image name | Sample recipes |
---|---|---|---|---|---|---|
nemo24.07-gib1.0.2-A3U | | A3 Ultra | February 2, 2025 | February 2, 2026 | us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo-nccl:nemo24.07-gib1.0.2-A3U | |
nemo24.07-gib1.0.3-A3U | | A3 Ultra | February 2, 2025 | February 2, 2026 | us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo-nccl:nemo24.07-gib1.0.3-A3U | |
nemo24.12-gib1.0.3-A3U | | A3 Ultra | February 7, 2025 | February 7, 2026 | us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo-nccl:nemo24.12-gib1.0.3-A3U | |
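The image tags in this table appear to encode the NeMo version, the gIB plugin version, and a machine series suffix. The following sketch splits a tag into those parts. The pattern is inferred from the tags shown on this page and is an assumption, not an official naming grammar; `ImageTag` and `parse_image_tag` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional


class ImageTag(NamedTuple):
    nemo_version: str
    gib_version: Optional[str]  # None when the tag has no gIB component
    machine_suffix: str


# Pattern inferred from the tags listed on this page (an assumption).
_TAG_PATTERN = re.compile(
    r"^nemo(?P<nemo>\d+\.\d+)"
    r"(?:-gib(?P<gib>\d+(?:\.\d+)*))?"
    r"-(?P<machine>[A-Za-z0-9]+)$"
)


def parse_image_tag(image: str) -> ImageTag:
    """Parse the tag portion of an image reference such as '...:nemo24.07-gib1.0.2-A3U'."""
    tag = image.rsplit(":", 1)[-1]
    match = _TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"Tag does not follow the assumed pattern: {tag!r}")
    return ImageTag(match["nemo"], match["gib"], match["machine"])


parsed = parse_image_tag(
    "us-central1-docker.pkg.dev/deeplearning-images/"
    "reproducibility/pytorch-gpu-nemo-nccl:nemo24.12-gib1.0.3-A3U"
)
print(parsed)
```

A parser like this can help when selecting among several image versions programmatically, for example to pick the image with the newest gIB version for a given machine series.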
NVIDIA NeMo
This Docker image is based on the NVIDIA NeMo NGC image and includes Google Cloud tools such as gcsfuse and gcloud CLI for deploying workloads to Google Kubernetes Engine.
Model version | Dependencies version | Machine series | Release date | End of support date | Image name |
---|---|---|---|---|---|
nemo24.07--A3U | NeMo NGC:24.07 | A3 Ultra | December 19, 2024 | December 19, 2025 | us-central1-docker.pkg.dev/deeplearning-images/reproducibility/pytorch-gpu-nemo:nemo24.07-A3U |
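Each image lists a release date and an end of support date. The following sketch shows one way to check whether a given day falls inside that window, using the dates from the table above. It assumes that support ends at the start of the end of support date (that is, the end date itself is treated as unsupported), which this page does not state; `is_supported` and `parse_table_date` are hypothetical helper names.

```python
from datetime import date, datetime


def parse_table_date(text: str) -> date:
    """Parse a date written like 'December 19, 2024' (English month names)."""
    return datetime.strptime(text, "%B %d, %Y").date()


def is_supported(release: date, end_of_support: date, today: date) -> bool:
    """Return True if 'today' is on or after release and before end of support.

    Treating the end date as exclusive is an assumption of this sketch.
    """
    return release <= today < end_of_support


release = parse_table_date("December 19, 2024")
end = parse_table_date("December 19, 2025")

check_day = date(2025, 12, 1)  # example day chosen for illustration
print(is_supported(release, end, check_day))
print((end - check_day).days, "days until end of support")
```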
OS images
The following OS images are optimized for running artificial intelligence (AI) and machine learning (ML) workloads on Google Cloud. For more detailed information about each OS, see the Operating system details page in the Compute Engine documentation.
Rocky Linux
The following Rocky Linux OS images are available:
- Rocky Linux 9 Accelerated
  - Image family: rocky-linux-9-optimized-gcp-nvidia-latest
  - Image project: rocky-linux-accelerator-cloud
- Rocky Linux 8 Accelerated
  - Image family: rocky-linux-8-optimized-gcp-nvidia-latest
  - Image project: rocky-linux-accelerator-cloud
Ubuntu LTS
The following Ubuntu LTS OS images are available:
- Ubuntu 24.04 LTS Accelerated
  - Image family: ubuntu-accelerator-2404-amd64-with-nvidia-550
  - Image project: ubuntu-os-accelerator-images
- Ubuntu 22.04 LTS Accelerated
  - Image family: ubuntu-accelerator-2204-amd64-with-nvidia-550
  - Image project: ubuntu-os-accelerator-images
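The image family and image project values listed for the OS images map to the `--image-family` and `--image-project` flags of `gcloud compute instances create`. The following sketch assembles such a command as an argument list and prints it without running it. The instance name and zone are hypothetical placeholders, and a real accelerator VM would likely need additional flags (such as a machine type) that are omitted here.

```python
import shlex
from typing import List


def build_create_command(
    instance_name: str, zone: str, image_family: str, image_project: str
) -> List[str]:
    """Build a 'gcloud compute instances create' command as an argument list."""
    return [
        "gcloud", "compute", "instances", "create", instance_name,
        f"--zone={zone}",
        f"--image-family={image_family}",
        f"--image-project={image_project}",
    ]


cmd = build_create_command(
    instance_name="example-vm",  # hypothetical instance name
    zone="us-central1-a",        # hypothetical zone
    image_family="ubuntu-accelerator-2404-amd64-with-nvidia-550",
    image_project="ubuntu-os-accelerator-images",
)
# Print the command for review instead of executing it.
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting issues once the remaining flags have been filled in.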
What's next?
- Review consumption options.
- To get started with creating VMs and clusters, see Create VMs and clusters overview.