AI Hypercomputer is a high performance computing architecture that employs an integrated system of performance-optimized hardware, open software, leading machine learning frameworks, and flexible consumption models to support your artificial intelligence (AI) and machine learning (ML) workloads.
Conventional methods often tackle demanding AI and ML workloads through piecemeal, component-level enhancements, which can lead to inefficiencies and bottlenecks. In contrast, AI Hypercomputer employs best practices and systems-level codesign to boost efficiency and productivity across AI pre-training, tuning, and serving.
System architecture
The AI Hypercomputer architecture consists of the following stacks:
- Performance-optimized infrastructure: accelerator, networking, and storage resources that provide the computing capabilities to support your workloads.
- Open software: optimized versions of popular machine learning frameworks such as TensorFlow, PyTorch, and JAX. We also provide operating systems (OS) that are configured with essential software for leveraging the compute resources provisioned in your clusters.
- Consumption options: multiple options to provision clusters that optimize costs and hardware availability based on your specific needs and workload patterns.
Benefits
With AI Hypercomputer, you can get the following benefits:
- High performance and Goodput: Goodput is a collection of metrics that measure ML productivity. The AI Hypercomputer architecture optimizes scheduling, runtime, and program Goodput across the framework, runtime, and orchestration layers. It uses performance-optimized hardware to supply the computing power needed for massive datasets and complex workloads.
- Get up and running quickly: AI Hypercomputer provides tools and cluster blueprints that let you reliably and repeatedly deploy large numbers of accelerator-optimized resources, configured to support your most demanding AI and ML workloads.
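To make the Goodput benefit above more concrete, the following is a minimal sketch of how the three Goodput components might be computed and combined for a single training job. It rests on assumptions that this page does not state: scheduling Goodput is taken as the share of requested time in which all resources were available, runtime Goodput as the share of that available time spent making forward progress, program Goodput as achieved throughput relative to hardware peak, and the overall figure as their product. The class, field, and function names are hypothetical and are not part of any AI Hypercomputer API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoodputSample:
    """Hypothetical measurements for one job; every field name is illustrative."""
    wall_clock_hours: float        # total time the job was meant to be running
    resources_ready_hours: float   # time when all required accelerators were available
    productive_hours: float        # time spent making forward progress (no restarts or reloads)
    achieved_flops: float          # sustained throughput during productive time
    peak_flops: float              # theoretical peak throughput of the hardware


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, rejecting values that cannot form a valid fraction."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if numerator < 0 or numerator > denominator:
        raise ValueError("numerator must lie between 0 and the denominator")
    return numerator / denominator


def scheduling_goodput(s: GoodputSample) -> float:
    # Assumed definition: fraction of wall-clock time with all resources available.
    return _ratio(s.resources_ready_hours, s.wall_clock_hours)


def runtime_goodput(s: GoodputSample) -> float:
    # Assumed definition: fraction of resource-available time spent on useful progress.
    return _ratio(s.productive_hours, s.resources_ready_hours)


def program_goodput(s: GoodputSample) -> float:
    # Assumed definition: achieved throughput as a fraction of hardware peak.
    return _ratio(s.achieved_flops, s.peak_flops)


def overall_goodput(s: GoodputSample) -> float:
    # Assumed combination: the three fractions multiply into one end-to-end figure.
    return scheduling_goodput(s) * runtime_goodput(s) * program_goodput(s)


if __name__ == "__main__":
    sample = GoodputSample(
        wall_clock_hours=100.0,
        resources_ready_hours=90.0,
        productive_hours=72.0,
        achieved_flops=40.0,
        peak_flops=100.0,
    )
    print(f"scheduling: {scheduling_goodput(sample):.3f}")
    print(f"runtime:    {runtime_goodput(sample):.3f}")
    print(f"program:    {program_goodput(sample):.3f}")
    print(f"overall:    {overall_goodput(sample):.3f}")
```

Under these assumed definitions, a loss in any one layer (waiting for capacity, restarting after a failure, or underusing the accelerators) lowers the overall figure multiplicatively, which is one way to see why a systems-level approach targets all three components together.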
Use cases
The AI Hypercomputer architecture was designed to meet the needs of the following use cases:
- Large-scale AI and ML workloads
- High Performance Computing (HPC)
What's next?
- Review performance-optimized infrastructure
- Review optimized software
- Review consumption models
- Learn about Hypercompute Cluster