This page describes the maximum, minimum, and default resource requests that you can specify for your Google Kubernetes Engine (GKE) Autopilot workloads, and how Autopilot automatically modifies those requests to maintain workload stability.
Overview of resource requests in Autopilot
Autopilot uses the resource requests that you specify in your workload configuration to configure the nodes that run your workloads. Autopilot enforces minimum and maximum resource requests based on the compute class or the hardware configuration that your workloads use. If you do not specify requests for some containers, Autopilot assigns default values to let those containers run correctly.
When you deploy a workload in an Autopilot cluster, GKE validates the workload configuration against the allowed minimum and maximum values for the selected compute class or hardware configuration (such as GPUs). If your requests are less than the minimum, Autopilot automatically modifies your workload configuration to bring your requests within the allowed range. If your requests are greater than the maximum, Autopilot rejects your workload and displays an error message.
The following list summarizes the categories of resource requests:
- Default resource requests: Autopilot adds these if you don't specify your own requests for workloads.
- Minimum and maximum resource requests: Autopilot validates your specified requests to ensure that they're within these limits. If your requests are outside the limits, Autopilot modifies your workload requests.
- Workload separation requests: Autopilot has different default values and different minimum values for workloads that you separate from each other.
- Resource requests for DaemonSets: Autopilot has different default, minimum, and maximum values for containers in DaemonSets.
How to request resources
In Autopilot, you request resources in your Pod specification. The supported minimum and maximum resources that you can request change based on the hardware configuration of the node on which the Pods run. To learn how to request specific hardware configurations, refer to the following pages:
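For example, a minimal Pod specification with explicit resource requests might look like the following sketch (the Pod name and container image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app   # placeholder name
spec:
  containers:
  - name: app
    image: registry.example/app:latest   # placeholder image
    resources:
      requests:
        cpu: "500m"              # 0.5 vCPU
        memory: "2Gi"
        ephemeral-storage: "1Gi"
```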
Default resource requests
If you do not specify resource requests for some containers in a Pod, Autopilot applies default values. These defaults are suitable for many smaller workloads.
Additionally, Autopilot applies the following default resource requests regardless of the selected compute class or hardware configuration:
Containers in DaemonSets
- CPU: 50 mCPU
- Memory: 100 MiB
- Ephemeral storage: 100 MiB
All other containers
- Ephemeral storage: 1 GiB
For more information about Autopilot cluster limits, see Quotas and limits.
Default requests for compute classes
Autopilot applies the following default values to resources that are not defined in the Pod specification for Pods that run on compute classes:
Compute class | Resource | Default request |
---|---|---|
General-purpose | CPU | 0.5 vCPU |
General-purpose | Memory | 2 GiB |
Balanced | CPU | 0.5 vCPU |
Balanced | Memory | 2 GiB |
Scale-Out | CPU | 0.5 vCPU |
Scale-Out | Memory | 2 GiB |
Default requests for other hardware configurations
Autopilot applies the following default values to resources that are not defined in the Pod specification for Pods that run on nodes with specialized hardware, such as GPUs:
Hardware | Resource | Total default request |
---|---|---|
A100 GPUs (`nvidia-tesla-a100`) | CPU | |
A100 GPUs (`nvidia-tesla-a100`) | Memory | |
T4 GPUs (`nvidia-tesla-t4`) | CPU | 0.5 vCPU |
T4 GPUs (`nvidia-tesla-t4`) | Memory | 2 GiB |
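As a sketch of how these defaults interact with a GPU workload, the following Pod requests one T4 GPU and omits CPU and memory requests, so Autopilot would apply the default values above. The image name is a placeholder; the `cloud.google.com/gke-accelerator` node selector and `nvidia.com/gpu` resource name follow the GKE GPU documentation.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: t4-example   # placeholder name
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
  containers:
  - name: gpu-app
    image: registry.example/gpu-app:latest   # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1
      # CPU and memory omitted: Autopilot applies the T4 defaults
      # from the table above (0.5 vCPU and 2 GiB).
```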
Minimum and maximum resource requests
The total resources requested by your deployment configuration must be within the supported minimum and maximum values that Autopilot allows. The following conditions apply:
- The ephemeral storage request must be between 10 MiB and 10 GiB for all compute classes and hardware configurations.
- For DaemonSet Pods, the minimum resource requests are 10 mCPU per Pod, 10 MiB of memory per Pod, and 10 MiB of ephemeral storage per container in the Pod.
- The CPU:memory ratio must be within the allowed range for the selected compute class or hardware configuration. If your CPU:memory ratio is outside the allowed range, Autopilot automatically increases the smaller resource. For example, if you request 1 vCPU and 16 GiB of memory (a 1:16 ratio) for Pods running on the Scale-Out class, Autopilot increases the CPU request to 4 vCPU, which changes the ratio to 1:4.
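A sketch of that Scale-Out ratio adjustment, shown as the requests you submit and the requests Autopilot applies:

```yaml
# Submitted requests for a Scale-Out Pod (1:16 CPU:memory ratio):
resources:
  requests:
    cpu: "1"
    memory: "16Gi"

# After Autopilot modification (CPU increased to restore the 1:4 ratio):
resources:
  requests:
    cpu: "4"
    memory: "16Gi"
```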
Your requests must use specific value increments depending on the compute class, hardware configuration, or type of Pod that you deploy, as follows:
- Compute classes: CPU resources must use increments of 0.25 CPUs, or 250 mCPU. Autopilot automatically adjusts your requests to round up to the nearest 250 mCPU.
- GPUs: CPU resources must use increments of 0.5 CPUs, or 500 mCPU. Autopilot automatically adjusts your requests to round up to the nearest 500 mCPU.
- DaemonSets: For all DaemonSets, CPU resources must use increments of 10 mCPU. Autopilot automatically adjusts your requests to round up to the nearest 10 mCPU.
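For example, a sketch of the compute class rounding behavior:

```yaml
# Submitted request (not a multiple of 250 mCPU):
resources:
  requests:
    cpu: "300m"

# After Autopilot rounds up to the nearest 250 mCPU increment:
resources:
  requests:
    cpu: "500m"
```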
Minimums and maximums for compute classes
The following table describes the minimum, maximum, and allowed CPU:memory ratio for each compute class that Autopilot supports:
Compute class | CPU:memory ratio (vCPU:GiB) | Resource | Minimum | Maximum |
---|---|---|---|---|
General-purpose | Between 1:1 and 1:6.5 | CPU | 0.25 vCPU | 30 vCPU |
General-purpose | Between 1:1 and 1:6.5 | Memory | 0.5 GiB | 110 GiB |
Balanced | Between 1:1 and 1:8 | CPU | 0.25 vCPU | 222 vCPU |
Balanced | Between 1:1 and 1:8 | Memory | 0.5 GiB | 851 GiB |
Scale-Out | 1:4 | CPU | 0.25 vCPU | |
Scale-Out | 1:4 | Memory | 1 GiB | |

For the Balanced class, different CPU and memory maximums apply if you select a minimum CPU platform.
To learn how to request compute classes in your Autopilot Pods, refer to Choose compute classes for Autopilot Pods.
Minimums and maximums for other hardware configurations
The following table describes the minimum, maximum, and allowed CPU:memory ratio for Pods that run on nodes with specific hardware such as GPUs:
Hardware | CPU:memory ratio (vCPU:GiB) | Resource | Minimum | Maximum |
---|---|---|---|---|
A100 GPUs (`nvidia-tesla-a100`) | Not enforced | CPU | | The sum of CPU requests of all DaemonSets that run on an A100 GPU node must not exceed 2 vCPU. |
A100 GPUs (`nvidia-tesla-a100`) | Not enforced | Memory | | The sum of memory requests of all DaemonSets that run on an A100 GPU node must not exceed 14 GiB. |
T4 GPUs (`nvidia-tesla-t4`) | Between 1:1 and 1:6.25 | CPU | 0.5 vCPU | |
T4 GPUs (`nvidia-tesla-t4`) | Between 1:1 and 1:6.25 | Memory | 0.5 GiB | |
To learn how to request GPUs in your Autopilot Pods, refer to Deploy GPU workloads in Autopilot.
Resource requests for workload separation and Pod anti-affinity
Autopilot lets you use taints, tolerations, and node selectors to ensure that certain Pods are placed only on specific nodes. You can also use Pod anti-affinity to prevent Pods from being co-located on the same node. The default and minimum resource requests for workloads that use these scheduling controls are higher than for workloads that don't. For instructions, refer to Configure workload separation in GKE.
If your specified requests are less than the minimums, the behavior of Autopilot changes based on the method that you used, as follows:
- Taints, tolerations, and selectors: Autopilot modifies your Pods to increase the requests when scheduling the Pods.
- Pod anti-affinity: Autopilot rejects the Pod and displays an error message.
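A sketch of a separated workload, assuming a hypothetical `group=special` taint and matching node label (see Configure workload separation in GKE for supported keys); the Pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: separated-app   # placeholder name
spec:
  nodeSelector:
    group: special          # hypothetical label for illustration
  tolerations:
  - key: group              # tolerates the hypothetical taint
    operator: Equal
    value: special
    effect: NoSchedule
  containers:
  - name: app
    image: registry.example/app:latest   # placeholder image
    resources:
      requests:
        cpu: "500m"    # at or above the separated-workload minimum
        memory: "2Gi"
```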
The following table describes the default requests and the minimum resource requests that you can specify if you want to separate your Autopilot workloads:
Compute class | Resource | Default | Minimum |
---|---|---|---|
General-purpose | CPU | 0.5 vCPU | 0.5 vCPU |
General-purpose | Memory | 2 GiB | 0.5 GiB |
Balanced | CPU | 2 vCPU | 1 vCPU |
Balanced | Memory | 8 GiB | 4 GiB |
Scale-Out | CPU | 0.5 vCPU | 0.5 vCPU |
Scale-Out | Memory | 2 GiB | 1 GiB |
Init containers
Init containers run serially, and all init containers must complete before the application containers start. If you don't specify resource requests for your Autopilot init containers, GKE allocates the total resources available to the Pod to each init container. This behavior differs from GKE Standard, where each init container can use any unallocated resources available on the node on which the Pod is scheduled.
Unlike with application containers, GKE recommends that you don't specify resource requests for Autopilot init containers, so that each init container gets the full resources available to the Pod. If you request fewer resources than the defaults, you constrain your init container. If you request more resources than the Autopilot defaults, you might increase your bill for the lifetime of the Pod.
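A sketch of that recommendation: the init container omits its resources block so that it receives the full Pod resources. Names and images are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-example   # placeholder name
spec:
  initContainers:
  - name: setup
    image: registry.example/setup:latest   # placeholder image
    # No resources block: Autopilot allocates the total resources
    # available to the Pod to this init container.
  containers:
  - name: app
    image: registry.example/app:latest     # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: "2Gi"
```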
Setting resource limits in Autopilot
Kubernetes lets you set both `requests` and `limits` for resources in your Pod specification. Autopilot only considers `requests`. The following table describes what GKE does if you set a value for `limits` in your Autopilot Pod specification:
Values set | Autopilot behavior | Example |
---|---|---|
`requests` and `limits` set to different values | Autopilot sets `limits` to the values in `requests` | Before: `requests: {cpu: "250m"}, limits: {cpu: "400m"}` After: `requests: {cpu: "250m"}, limits: {cpu: "250m"}` |
`limits` set, `requests` not set | Autopilot sets `requests` to the values in `limits` | Before: `limits: {cpu: "250m"}` After: `requests: {cpu: "250m"}, limits: {cpu: "250m"}` |
This behavior means that Autopilot Pods can't burst to use extra resources on nodes. You should evaluate your workload requirements and set adequate resource requests.
Automatic resource management in Autopilot
If your specified resource requests for your workloads are outside of the allowed ranges, or if you don't request resources for some containers, Autopilot modifies your workload configuration to comply with the allowed limits. Autopilot calculates resource ratios, increments, and scale-up requirements after applying default values to containers with no requests specified.
- Missing requests: If you don't request resources in some containers, Autopilot applies the default requests for the compute class or hardware configuration.
- CPU:memory ratio: Autopilot scales up the smaller resource to bring the ratio within the allowed range. The extra resources are allocated to the first container.
- Ephemeral storage: Autopilot modifies your ephemeral storage requests to meet the minimum amount required by each container. The cumulative value of storage requests across all containers cannot be more than the maximum allowed value. Autopilot scales the request down if the value exceeds the maximum.
- Requests below minimums: If you request fewer resources than the allowed minimum for the selected hardware configuration, Autopilot automatically modifies the Pod to request at least the minimum resource value.
Resource modification examples
The following example scenarios show how Autopilot modifies your workload configuration to meet the requirements of your running Pods and containers.
Single container with < 0.25 vCPU
Container number | Original request | Modified request |
---|---|---|
1 | CPU: 180 mCPU, Memory: 0.5 GiB, Ephemeral storage: 10 MiB | CPU: 250 mCPU, Memory: 0.5 GiB, Ephemeral storage: 10 MiB |
Multiple containers with total CPU < 0.25 vCPU
Container number | Original requests | Modified requests |
---|---|---|
1 | CPU: 0.07 vCPU, Memory: 0.5 GiB, Ephemeral storage: 10 MiB | CPU: 0.11 vCPU, Memory: 0.5 GiB, Ephemeral storage: 10 MiB |
2 | CPU: 0.07 vCPU, Memory: 0.5 GiB, Ephemeral storage: 10 MiB | CPU: 0.07 vCPU, Memory: 0.5 GiB, Ephemeral storage: 10 MiB |
3 | CPU: 0.07 vCPU, Memory: 0.5 GiB, Ephemeral storage: 10 MiB | CPU: 0.07 vCPU, Memory: 0.5 GiB, Ephemeral storage: 10 MiB |
Total Pod resources | | CPU: 0.25 vCPU, Memory: 1.5 GiB, Ephemeral storage: 30 MiB |
Multiple containers with more than 0.25 vCPU total
For multiple containers with total resources >= 0.25 vCPU, the CPU is rounded to multiples of 0.25 vCPU and the extra CPU is added to the first container. In this example, the original cumulative CPU is 0.32 vCPU and is modified to a total of 0.5 vCPU.
Container number | Original requests | Modified requests |
---|---|---|
1 | CPU: 0.17 vCPU, Memory: 0.5 GiB, Ephemeral storage: 10 MiB | CPU: 0.35 vCPU, Memory: 0.5 GiB, Ephemeral storage: 10 MiB |
2 | CPU: 0.08 vCPU, Memory: 0.5 GiB, Ephemeral storage: 10 MiB | CPU: 0.08 vCPU, Memory: 0.5 GiB, Ephemeral storage: 10 MiB |
3 | CPU: 0.07 vCPU, Memory: 0.5 GiB, Ephemeral storage: 10 MiB | CPU: 0.07 vCPU, Memory: 0.5 GiB, Ephemeral storage: 10 MiB |
4 | Init container, resources not defined | Receives the total Pod resources |
Total Pod resources | | CPU: 0.5 vCPU, Memory: 1.5 GiB, Ephemeral storage: 30 MiB |
Single container with memory too low for requested CPU
In this example, the requested memory is too low for the requested CPU. The minimum allowed CPU:memory ratio is 1:1, and a request of 4 vCPU with 1 GiB of memory falls below that minimum, so Autopilot increases the memory request.
Container number | Original request | Modified request |
---|---|---|
1 | CPU: 4 vCPU, Memory: 1 GiB, Ephemeral storage: 10 MiB | CPU: 4 vCPU, Memory: 4 GiB, Ephemeral storage: 10 MiB |
Total Pod resources | | CPU: 4 vCPU, Memory: 4 GiB, Ephemeral storage: 10 MiB |
What's next
- Learn how to select compute classes in your Autopilot workloads.
- Learn more about the supported Autopilot compute classes.
- Learn how to select GPUs in your Autopilot Pods.