AI Hypercomputer is a supercomputing system that is optimized to
support your artificial intelligence (AI) and machine learning (ML) workloads.
It's an integrated system of performance-optimized hardware, open software, ML
frameworks, and flexible consumption models.
AI Hypercomputer uses best practices and systems-level designs to boost
efficiency and productivity across AI pre-training, tuning, and serving.
System architecture
AI Hypercomputer consists of the following layers:
Performance-optimized infrastructure: contains accelerators,
networking, and storage resources that provide the computing capabilities
to support your workloads.
Open software: optimized versions of popular machine learning
frameworks such as TensorFlow, PyTorch, and JAX. Google provides
operating systems (OS) that are configured with essential software for
leveraging the compute resources provisioned in your clusters.
To deploy and manage a large number of accelerators as a single unit, you
can use Cluster Director for Google Kubernetes Engine, Cluster Director
for Slurm, or the Compute Engine APIs directly.
Consumption options: multiple options to provision clusters that
optimize costs and hardware availability based on your specific needs and
workload patterns.
Benefits
AI Hypercomputer has the following benefits:
High performance and goodput: Goodput metrics measure ML productivity, that
is, how much of the provisioned accelerator time goes toward useful progress
on a workload. AI Hypercomputer optimizes the scheduling, runtime, and
orchestration layers.
Get up and running quickly: AI Hypercomputer provides tools and blueprints
that let you reliably and repeatedly deploy large numbers of
accelerator-optimized resources that are configured to support your most
demanding AI and ML workloads.
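The goodput benefit above can be made concrete with a little arithmetic. The following is a minimal sketch, not AI Hypercomputer's own tooling. It assumes that overall goodput is the product of three ratios: scheduling goodput (the share of time the requested resources are available), runtime goodput (the share of that time spent on progress that is not later lost or redone), and program goodput (achieved throughput relative to hardware peak). The class, field, and function names are hypothetical, and the sample numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobMeasurements:
    """Illustrative measurements for one training job (hypothetical fields)."""
    wall_clock_hours: float       # total time the job was meant to run
    resources_ready_hours: float  # time all requested accelerators were available
    productive_hours: float       # time spent on progress that was kept
    achieved_tflops: float        # measured per-chip throughput
    peak_tflops: float            # theoretical per-chip peak throughput


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, rejecting values outside [0, 1]."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= numerator <= denominator:
        raise ValueError("numerator must lie between 0 and the denominator")
    return numerator / denominator


def scheduling_goodput(m: JobMeasurements) -> float:
    # Assumed definition: share of wall-clock time with all resources available.
    return _ratio(m.resources_ready_hours, m.wall_clock_hours)


def runtime_goodput(m: JobMeasurements) -> float:
    # Assumed definition: share of resource-available time spent on kept progress.
    return _ratio(m.productive_hours, m.resources_ready_hours)


def program_goodput(m: JobMeasurements) -> float:
    # Assumed definition: achieved throughput as a share of hardware peak.
    return _ratio(m.achieved_tflops, m.peak_tflops)


def overall_goodput(m: JobMeasurements) -> float:
    # Assumed composition: the three ratios multiply.
    return scheduling_goodput(m) * runtime_goodput(m) * program_goodput(m)


if __name__ == "__main__":
    # Invented sample values, chosen only to exercise the functions.
    job = JobMeasurements(
        wall_clock_hours=100.0,
        resources_ready_hours=95.0,
        productive_hours=85.5,
        achieved_tflops=200.0,
        peak_tflops=400.0,
    )
    print(f"scheduling: {scheduling_goodput(job):.3f}")
    print(f"runtime:    {runtime_goodput(job):.3f}")
    print(f"program:    {program_goodput(job):.3f}")
    print(f"overall:    {overall_goodput(job):.4f}")
```

Because the ratios multiply under this assumption, a loss in any one layer caps the overall figure, which is one way to read why scheduling, runtime, and orchestration are optimized together.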
Use cases
AI Hypercomputer was designed to meet the needs of the following use cases:
Large-scale AI and ML workloads: for example, generative AI distributed
training, generative AI inference, fraud detection, and recommendation models.
High performance computing (HPC): for example, complex simulations as well as
drug discovery, protein folding, and genomic analysis.
What's next?
Review Performance-optimized infrastructure (/ai-hypercomputer/docs/gpu).
Review optimized software (/ai-hypercomputer/docs/optimized-software).
Review consumption models (/ai-hypercomputer/docs/consumption-models).
Learn about Cluster Director (/ai-hypercomputer/docs/cluster-director).
Last updated 2025-08-25 UTC.