Easily configure, deploy, and manage AI or HPC clusters. Get the automation upsides of managed infrastructure without limiting your control.
Features
You can access Cluster Director capabilities in two ways:
Cluster Director provides fault-tolerant and highly scalable job scheduling out of the box. The controller node is managed for you. You can easily configure the login nodes for your cluster, including machine type, source image, and boot-disk size.
Use the control plane to easily create, update, and delete your cluster. It also simplifies networking by allowing you to deploy clusters on a new, purpose-built VPC network or an existing one. For storage, you can create and attach a new Filestore or Google Cloud Managed Lustre instance, or connect to an existing Cloud Storage bucket.
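To picture the choices described above, here is a minimal sketch in Python. It is not the Cluster Director API: every class, field, and value name below (`ClusterSpec`, `LoginNodes`, `Storage`, `existing_network`, the image name) is a hypothetical stand-in, used only to lay out the login-node, network, and storage options in one place and check them for consistency.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical names throughout: this is NOT the Cluster Director API,
# only a way to lay out the options described in the text.
STORAGE_KINDS = {"filestore", "managed-lustre", "cloud-storage-bucket"}


@dataclass
class LoginNodes:
    machine_type: str   # placeholder value, e.g. "e2-micro"
    source_image: str   # placeholder image name
    boot_disk_gb: int   # boot-disk size in GB


@dataclass
class Storage:
    kind: str                            # one of STORAGE_KINDS
    existing_name: Optional[str] = None  # set to connect to an existing resource


@dataclass
class ClusterSpec:
    name: str
    login_nodes: LoginNodes
    existing_network: Optional[str] = None  # None means "create a new VPC network"
    storage: List[Storage] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the spec is consistent."""
        problems = []
        if self.login_nodes.boot_disk_gb <= 0:
            problems.append("boot disk size must be positive")
        for s in self.storage:
            if s.kind not in STORAGE_KINDS:
                problems.append(f"unknown storage kind: {s.kind}")
            # The text describes Cloud Storage only as connecting to an existing bucket.
            if s.kind == "cloud-storage-bucket" and not s.existing_name:
                problems.append("a Cloud Storage bucket must already exist")
        return problems


# Usage: a new Filestore instance plus a bucket that was never named.
spec = ClusterSpec(
    name="demo",
    login_nodes=LoginNodes("e2-micro", "example-image", 50),
    storage=[Storage("filestore"), Storage("cloud-storage-bucket")],
)
print(spec.validate())
```

The sketch treats Filestore and Managed Lustre as things that can be newly created and a Cloud Storage bucket as something that must already exist, which mirrors the wording above. Everything else about it is an illustrative assumption.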
To maximize performance, Cluster Director is deeply integrated with Google's network topology. VMs within a cluster are placed in close physical proximity, which reduces network latency. Low latency is critical for highly synchronized distributed training workloads.
Cluster Director's integrated observability dashboard provides a clear view of your cluster's health, utilization, and performance, so you can quickly understand your system's behavior and diagnose issues in a single place. The dashboard is designed to easily scale to tens of thousands of VMs.
Get foundational reliability by requesting a Bill of Health, plus additional features such as 3-tier checkpointing and advanced maintenance controls to help maximize training efficiency.
How it works
AI infrastructure users can spend weeks wrestling with configurations before hitting 'deploy,' but it doesn't have to be that way. Learn what to expect as a first-time Cluster Director user, from preparing an environment, to deployment, to turning interruptions into managed events.
Common uses
Before you spin up a cluster, you need assurance your accelerators will be performant and reliable from the get-go. Cluster Director provides intelligent, topology-aware placement for your TPUs and GPUs.
Every compute, networking, and storage component is validated through a rigorous, multi-stage qualification process, captured in a detailed Bill of Health that provides the ultimate proof of quality and readiness.
Remove the complexity of setting up a GKE or Slurm cluster. Start with validated reference architectures, choose your accelerator and storage resources, and let Cluster Director do the rest.
Deploy a fully optimized environment at any scale with Google’s best practices for performance and topology baked in, drastically reducing deployment time.
Bridge the gap between raw infrastructure and running a job with a single console for your Slurm cluster. Get a topology view of cluster health and utilization.
When issues arise, use job-centric observability to instantly correlate metrics across the full stack with a single job ID, turning hours of guesswork into a few clicks and quickly identifying the root cause of any slowdowns.
You can use Cluster Director to proactively detect, remediate, and recover from infrastructure issues.
For example, you get always-on health checks, straggler detection, and an AI health predictor to proactively identify issues.
Pricing
How Cluster Director pricing works: There is no extra charge for using Cluster Director. You only pay for the underlying Google Cloud resources that your clusters use, such as compute, storage, and networking.

| Services | Description | Price (USD) |
|---|---|---|
| Get started free | New users get $300 in free trial credits to use within 90 days. | Free |
| Get started free | The Compute Engine free tier gives you one e2-micro VM instance, up to 30 GB standard persistent disk storage, and up to 1 GB of outbound data transfers per month. | Free |
| VM instances, storage, and networking | Review our Compute Engine pricing for more information. Only pay for the services you use. No up-front fees. No termination charges. Pricing varies by product and usage. | Starting at $0.01 (e2-micro, pay-as-you-go) |