GPUs when you need them: Introducing Flex-start VMs
Ari Liberman
Group Product Manager
Satish Iyer
Senior Product Manager
Innovating with AI requires accelerators such as GPUs that can be hard to come by in times of extreme demand. To address this challenge, we offer Dynamic Workload Scheduler (DWS), a service that optimizes access to compute resources when and where you need them. In July, we announced Calendar mode in DWS to provide short-term ML capacity without long-term commitments, and today, we are taking the next step: the general availability (GA) of Flex-start VMs.
Available through the Compute Engine instance API, gcloud CLI, and the Google Cloud console, Flex-start VMs provide a simple and direct way to create single VM instances that can wait for in-demand GPUs. This makes it easy to integrate this flexible consumption option into your existing workflows and schedulers.
What are Flex-start VMs?
Flex-start VMs, powered by Dynamic Workload Scheduler, introduce a highly differentiated consumption model that’s a first among major cloud providers, letting you create single VM instances that provide fair and improved access to GPUs. Flex-start VMs are ideal for defined-duration tasks such as AI model fine-tuning, batch inference, HPC, and research experiments that don’t need to start immediately. In exchange for being flexible with start time, you get two major benefits:
- Dramatically improved resource obtainability: By allowing your capacity requests to persist in a queue for up to two hours, you increase the likelihood of securing resources, without needing to build your own retry logic.
- Cost-effective pricing: Flex-start VM SKUs offer significant discounts compared to standard on-demand pricing, making cutting-edge accelerators more accessible.
Flex-start VMs can run uninterrupted for a maximum of seven days and consume preemptible quota.
A new way to request capacity
To queue a request, set the request-valid-for-duration flag. Select a period between 90 seconds and 2 hours to instruct Compute Engine to hold your request in a queue. Your VM enters a PENDING state, and the system works to provision your resources as they become available within your specified timeframe. This “get-in-line” approach provides a fair and managed way to access hardware, replacing repeated manual retries with a single request.
Key features of Flex-start VMs
- Direct instance API access: Integration with instances.insert, or via a single CLI command, lets you create single Flex-start VMs simply and directly, making it easy to integrate them into custom schedulers and workflows.
- Stop and start capabilities: You have full control over your Flex-start VMs. For instance, you can stop an instance to pause billing and release the underlying resources. Then, when you're ready to resume it, simply issue a start command to place a new capacity request. Once the capacity is successfully provisioned, the seven-day maximum run duration clock resets.
- Configurable termination action: For many advanced use cases, you can set instanceTerminationAction = STOP so that when your VM's seven-day runtime expires, the instance is stopped rather than deleted. This preserves your VM's configuration, including its IP address and boot disk, saving on setup time for subsequent runs.
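If you drive Flex-start VMs from your own scheduler, it can help to check a planned request against the limits described in this post before submitting it. The following is a minimal Python sketch under stated assumptions: the function name check_flex_start_request is hypothetical, the limits are copied from this post and may change, and DELETE is assumed to be the only alternative to STOP.

```python
from datetime import timedelta

# Limits as stated in this post; treat them as assumptions that may change.
MIN_QUEUE_WAIT = timedelta(seconds=90)
MAX_QUEUE_WAIT = timedelta(hours=2)
MAX_RUN_DURATION = timedelta(days=7)
# STOP comes from this post; DELETE is assumed to be the other option.
TERMINATION_ACTIONS = {"STOP", "DELETE"}


def check_flex_start_request(queue_wait, run_duration, termination_action="STOP"):
    """Return a list of problems with a planned request (empty if none)."""
    problems = []
    if not MIN_QUEUE_WAIT <= queue_wait <= MAX_QUEUE_WAIT:
        problems.append("queue wait must be between 90 seconds and 2 hours")
    if not timedelta(0) < run_duration <= MAX_RUN_DURATION:
        problems.append("run duration must be positive and at most 7 days")
    if termination_action not in TERMINATION_ACTIONS:
        problems.append("termination action must be STOP or DELETE")
    return problems


if __name__ == "__main__":
    # A request that fits the documented limits.
    print(check_flex_start_request(timedelta(hours=1), timedelta(days=3)))
    # A request that violates both the queue window and the run limit.
    print(check_flex_start_request(timedelta(seconds=30), timedelta(days=8)))
```

Returning a list of problems rather than raising on the first one lets a scheduler report every issue with a request at once.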
Get started today
Getting started with a queued Flex-start VM is straightforward. You can create one using a gcloud command or directly through the API.
gcloud example (to wait in queue):
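As a hedged illustration of what such a command might look like, the Python sketch below assembles a gcloud command line piece by piece so each part is easy to inspect. Only request-valid-for-duration is named in this post. The other flags (--provisioning-model, --max-run-duration, --instance-termination-action), the helper name flex_start_create_command, and the VM name, zone, and machine type are assumptions for illustration; a real command may also need further flags, so check the gcloud reference before running it.

```python
import shlex


def flex_start_create_command(name, zone, machine_type,
                              wait="2h", max_run="7d", termination_action="STOP"):
    """Assemble an argv list for creating one queued Flex-start VM."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        "--provisioning-model=FLEX_START",       # assumed flag and value
        f"--request-valid-for-duration={wait}",  # flag named in this post
        f"--max-run-duration={max_run}",         # assumed flag
        f"--instance-termination-action={termination_action}",  # assumed flag
    ]


if __name__ == "__main__":
    # Placeholder name, zone, and machine type; substitute your own.
    cmd = flex_start_create_command("my-flex-vm", "us-central1-a", "a3-highgpu-1g")
    # Print a shell-quoted command for review instead of executing it.
    print(shlex.join(cmd))
```

Building the command as a list keeps quoting safe if you later pass it to subprocess.run instead of printing it.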
API Request Snippet (JSON):
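The request body for instances.insert might be shaped roughly as follows. This is a sketch, not the documented schema: instanceTerminationAction comes from this post, while the placement of requestValidForDuration under params, the scheduling field names, and the helper name flex_start_request_body are assumptions to verify against the Compute Engine API reference. Required fields such as disks and network interfaces are omitted for brevity.

```python
import json


def flex_start_request_body(name, zone, machine_type,
                            wait_seconds=7200, max_run_seconds=604800,
                            termination_action="STOP"):
    """Build a partial instances.insert body for a queued Flex-start VM."""
    return {
        "name": name,
        "machineType": f"zones/{zone}/machineTypes/{machine_type}",
        # Assumed location of the queue window (2 hours by default).
        "params": {"requestValidForDuration": {"seconds": wait_seconds}},
        "scheduling": {
            "provisioningModel": "FLEX_START",                # assumed value
            "maxRunDuration": {"seconds": max_run_seconds},   # 7 days by default
            "instanceTerminationAction": termination_action,  # named in this post
        },
    }


if __name__ == "__main__":
    # Placeholder name, zone, and machine type; substitute your own.
    body = flex_start_request_body("my-flex-vm", "us-central1-a", "a3-highgpu-1g")
    print(json.dumps(body, indent=2))
```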
Flex-start VMs in the instance API are a direct response to the need for more efficient, reliable, and fair access to high-demand AI accelerators. With the new queuing mechanism, you can integrate the Flex-start consumption model into your existing workflows easily and spend less time architecting retry loops for on-demand access. To learn more and try Flex-start VMs today, see the documentation and pricing information.