Viewing cluster autoscaler events


This page describes the decisions that the Google Kubernetes Engine (GKE) cluster autoscaler makes about autoscaling and how to view the events that record those decisions.

The GKE cluster autoscaler emits visibility events, which are available as log entries in Cloud Logging.

The events described in this guide are separate from the Kubernetes events produced by the cluster autoscaler.

Availability requirements

The ability to view logged events for cluster autoscaler is available in the following cluster versions:

Event type Cluster version
status, scaleUp, scaleDown, eventResult 1.15.4-gke.7 and later
nodePoolCreated, nodePoolDeleted 1.15.4-gke.18 and later
noScaleUp 1.16.6-gke.3 and later
noScaleDown 1.16.8-gke.2 and later

To see autoscaler events, you must enable Cloud Logging in your cluster. The events won't be produced if Logging is disabled.

Viewing events

The visibility events for the cluster autoscaler are stored in a Cloud Logging log in the same project as your GKE cluster. You can also view these events from the notifications on the Google Kubernetes Engine page in the Google Cloud console.

Viewing visibility event logs

To view the logs, perform the following:

  1. In the Google Cloud console, go to the Kubernetes Clusters page.

    Go to Kubernetes Clusters

  2. Select the name of your cluster to view its Cluster Details page.

  3. On the Cluster Details page, click the Logs tab.

  4. On the Logs tab, click the Autoscaler Logs tab to view the logs.

  5. (Optional) To apply more advanced filters to narrow the results, click the button with the arrow on the right side of the page to view the logs in Logs Explorer.
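
For example, the following Logs Explorer query lists all cluster autoscaler visibility events for a cluster. It is a sketch based on the queries shown later on this page; replace the CLUSTER_NAME and COMPUTE_REGION placeholders with your own values and adjust the filter as needed:

resource.type="k8s_cluster"
resource.labels.location=COMPUTE_REGION
resource.labels.cluster_name=CLUSTER_NAME
log_id("container.googleapis.com/cluster-autoscaler-visibility")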

Viewing visibility event notifications

To view the visibility event notifications on the Google Kubernetes Engine page, perform the following:

  1. Go to the Google Kubernetes Engine page in the Google Cloud console:

    Go to Google Kubernetes Engine

  2. Check the Notifications column for specific clusters to find notifications related to scaling.

  3. Click a notification to see detailed information, recommended actions, and the logs for the event.

Types of events

All logged events are in JSON format and can be found in the jsonPayload field of a log entry. All timestamps in the events are Unix timestamps, in seconds.

Here's a summary of the types of events emitted by the cluster autoscaler:

Event type Description
status Occurs periodically and describes the size of all autoscaled node pools and the target size of all autoscaled node pools as observed by the cluster autoscaler.
scaleUp Occurs when cluster autoscaler scales the cluster up.
scaleDown Occurs when cluster autoscaler scales the cluster down.
eventResult Occurs when a scaleUp or a scaleDown event completes successfully or unsuccessfully.
nodePoolCreated Occurs when cluster autoscaler with node auto-provisioning enabled creates a new node pool.
nodePoolDeleted Occurs when cluster autoscaler with node auto-provisioning enabled deletes a node pool.
noScaleUp Occurs when there are unschedulable Pods in the cluster, and cluster autoscaler cannot scale the cluster up to accommodate the Pods.
noScaleDown Occurs when there are nodes that are blocked from being deleted by cluster autoscaler.

Status event

A status event is emitted periodically, and describes the actual size of all autoscaled node pools and the target size of all autoscaled node pools as observed by cluster autoscaler.

Example

The following log sample shows a status event:

{
  "status": {
    "autoscaledNodesCount": 4,
    "autoscaledNodesTarget": 4,
    "measureTime": "1582898536"
  }
}
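
To list only status events in Logs Explorer, you can combine the visibility log filter with a presence check on a field that is specific to status events. This query is a sketch; the :* presence operator and the jsonPayload.status.measureTime field path are assumptions based on the event format shown above:

resource.type="k8s_cluster"
log_id("container.googleapis.com/cluster-autoscaler-visibility")
jsonPayload.status.measureTime:*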

ScaleUp event

A scaleUp event is emitted when the cluster autoscaler scales the cluster up. The autoscaler increases the size of the cluster's node pools by scaling up the underlying Managed Instance Groups (MIGs) for the node pools. To learn more about how scale up works, see How does scale up work? in the Kubernetes Cluster Autoscaler FAQ.

The event contains information on which MIGs were scaled up, by how many nodes, and which unschedulable Pods triggered the event.

The list of triggering Pods is truncated to 50 arbitrary entries. The actual number of triggering Pods can be found in the triggeringPodsTotalCount field.

Example

The following log sample shows a scaleUp event:

{
  "decision": {
    "decideTime": "1582124907",
    "eventId": "ed5cb16d-b06f-457c-a46d-f75dcca1f1ee",
    "scaleUp": {
      "increasedMigs": [
        {
          "mig": {
            "name": "test-cluster-default-pool-a0c72690-grp",
            "nodepool": "default-pool",
            "zone": "us-central1-c"
          },
          "requestedNodes": 1
        }
      ],
      "triggeringPods": [
        {
          "controller": {
            "apiVersion": "apps/v1",
            "kind": "ReplicaSet",
            "name": "test-85958b848b"
          },
          "name": "test-85958b848b-ptc7n",
          "namespace": "default"
        }
      ],
      "triggeringPodsTotalCount": 1
    }
  }
}
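
If you are investigating a specific Pod, you can narrow the results to scaleUp events triggered by that Pod. The following query is a sketch: POD_NAME is a placeholder, and matching an element of the triggeringPods list through the jsonPayload field path is an assumption about how Cloud Logging evaluates repeated JSON fields:

resource.type="k8s_cluster"
log_id("container.googleapis.com/cluster-autoscaler-visibility")
jsonPayload.decision.scaleUp.triggeringPods.name="POD_NAME"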

ScaleDown event

A scaleDown event is emitted when cluster autoscaler scales the cluster down. To learn more about how scale down works, see How does scale down work? in the Kubernetes Cluster Autoscaler FAQ.

The cpuRatio and memRatio fields describe the CPU and memory utilization of the node, as a percentage. This utilization is the sum of Pod requests divided by the node's allocatable resources, not the actual utilization.

The list of evicted Pods is truncated to 50 arbitrary entries. The actual number of evicted Pods can be found in the evictedPodsTotalCount field.

To verify whether the cluster autoscaler scaled down nodes, use the following query:

resource.type="k8s_cluster" \
resource.labels.location=COMPUTE_REGION \
resource.labels.cluster_name=CLUSTER_NAME \
log_id("container.googleapis.com/cluster-autoscaler-visibility") \
( "decision" NOT "noDecisionStatus" )

Replace the following:

  • CLUSTER_NAME: the name of the cluster.

  • COMPUTE_REGION: the cluster's Compute Engine region, such as us-central1.

Example

The following log sample shows a scaleDown event:

{
  "decision": {
    "decideTime": "1580594665",
    "eventId": "340dac18-8152-46ff-b79a-747f70854c81",
    "scaleDown": {
      "nodesToBeRemoved": [
        {
          "evictedPods": [
            {
              "controller": {
                "apiVersion": "apps/v1",
                "kind": "ReplicaSet",
                "name": "kube-dns-5c44c7b6b6"
              },
              "name": "kube-dns-5c44c7b6b6-xvpbk"
            }
          ],
          "evictedPodsTotalCount": 1,
          "node": {
            "cpuRatio": 23,
            "memRatio": 5,
            "mig": {
              "name": "test-cluster-default-pool-c47ef39f-grp",
              "nodepool": "default-pool",
              "zone": "us-central1-f"
            },
            "name": "test-cluster-default-pool-c47ef39f-p395"
          }
        }
      ]
    }
  }
}
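
Similarly, to find the scaleDown event that removed a particular node, you can filter on the node name. This query is a sketch; NODE_NAME is a placeholder and the field path is taken from the example above:

resource.type="k8s_cluster"
log_id("container.googleapis.com/cluster-autoscaler-visibility")
jsonPayload.decision.scaleDown.nodesToBeRemoved.node.name="NODE_NAME"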

You can also view scale-down events for nodes that run no workloads (typically only system Pods created by DaemonSets). Use the following query to see these event logs:

resource.type="k8s_cluster" \
resource.labels.project_id=PROJECT_ID \
resource.labels.location=COMPUTE_REGION \
resource.labels.cluster_name=CLUSTER_NAME \
severity>=DEFAULT \
logName="projects/PROJECT_ID/logs/events" \
("Scale-down: removing empty node")

Replace the following:

  • PROJECT_ID: your project ID.

  • CLUSTER_NAME: the name of the cluster.

  • COMPUTE_REGION: the cluster's Compute Engine region, such as us-central1.

EventResult event

An eventResult event is emitted when a scaleUp or a scaleDown event completes, successfully or unsuccessfully. This event contains a list of event IDs (from the eventId field in scaleUp or scaleDown events), along with error messages. An empty error message indicates that the event completed successfully. Multiple event results are aggregated in the results field.

To diagnose errors, consult the ScaleUp errors and ScaleDown errors sections.

Example

The following log sample shows an eventResult event:

{
  "resultInfo": {
    "measureTime": "1582878896",
    "results": [
      {
        "eventId": "2fca91cd-7345-47fc-9770-838e05e28b17"
      },
      {
        "errorMsg": {
          "messageId": "scale.down.error.failed.to.delete.node.min.size.reached",
          "parameters": [
            "test-cluster-default-pool-5c90f485-nk80"
          ]
        },
        "eventId": "ea2e964c-49b8-4cd7-8fa9-fefb0827f9a6"
      }
    ]
  }
}
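
To surface only the results that failed, you can filter for eventResult events that carry an error message. This query is a sketch; it assumes that the :* presence operator matches the errorMsg.messageId field inside elements of the results list:

resource.type="k8s_cluster"
log_id("container.googleapis.com/cluster-autoscaler-visibility")
jsonPayload.resultInfo.results.errorMsg.messageId:*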

NodePoolCreated event

A nodePoolCreated event is emitted when cluster autoscaler with node auto-provisioning enabled creates a new node pool. This event contains the name of the created node pool and a list of its underlying MIGs. If the node pool was created because of a scaleUp event, the eventId of the corresponding scaleUp event is included in the triggeringScaleUpId field.
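
To trace a node pool back to the scale-up that caused its creation, you can search for the nodePoolCreated event whose triggeringScaleUpId matches the eventId of the scaleUp event. The following query is a sketch; EVENT_ID is a placeholder:

resource.type="k8s_cluster"
log_id("container.googleapis.com/cluster-autoscaler-visibility")
jsonPayload.decision.nodePoolCreated.triggeringScaleUpId="EVENT_ID"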

Example

The following log sample shows a nodePoolCreated event:

{
  "decision": {
    "decideTime": "1585838544",
    "eventId": "822d272c-f4f3-44cf-9326-9cad79c58718",
    "nodePoolCreated": {
      "nodePools": [
        {
          "migs": [
            {
              "name": "test-cluster-nap-n1-standard--b4fcc348-grp",
              "nodepool": "nap-n1-standard-1-1kwag2qv",
              "zone": "us-central1-f"
            },
            {
              "name": "test-cluster-nap-n1-standard--jfla8215-grp",
              "nodepool": "nap-n1-standard-1-1kwag2qv",
              "zone": "us-central1-c"
            }
          ],
          "name": "nap-n1-standard-1-1kwag2qv"
        }
      ],
      "triggeringScaleUpId": "d25e0e6e-25e3-4755-98eb-49b38e54a728"
    }
  }
}

NodePoolDeleted event

A nodePoolDeleted event is emitted when cluster autoscaler with node auto-provisioning enabled deletes a node pool.

Example

The following log sample shows a nodePoolDeleted event:

{
  "decision": {
    "decideTime": "1585830461",
    "eventId": "68b0d1c7-b684-4542-bc19-f030922fb820",
    "nodePoolDeleted": {
      "nodePoolNames": [
        "nap-n1-highcpu-8-ydj4ewil"
      ]
    }
  }
}

NoScaleUp event

A noScaleUp event is periodically emitted when there are unschedulable Pods in the cluster and cluster autoscaler cannot scale the cluster up to accommodate the Pods.

  • noScaleUp events are best-effort; that is, they do not cover all possible reasons why cluster autoscaler cannot scale up.
  • noScaleUp events are throttled to limit the produced log volume. Each persisting reason is only emitted every couple of minutes.
  • All the reasons can be arbitrarily split across multiple events. For example, there is no guarantee that all rejected MIG reasons for a single Pod group will appear in the same event.
  • The list of unhandled Pod groups is truncated to 50 arbitrary entries. The actual number of unhandled Pod groups can be found in the unhandledPodGroupsTotalCount field.
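
To review recent noScaleUp events for a cluster, you can reuse the pattern of the scale-down query shown earlier, with a global text restriction on the noScaleUp field name. This is a sketch; replace CLUSTER_NAME with your own value:

resource.type="k8s_cluster"
resource.labels.cluster_name=CLUSTER_NAME
log_id("container.googleapis.com/cluster-autoscaler-visibility")
( "noScaleUp" )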

Reason fields

The following fields help to explain why scaling up did not occur:

  • reason: Provides a global reason for why cluster autoscaler is prevented from scaling up. Refer to the NoScaleUp top-level reasons section for details.
  • napFailureReason: Provides a global reason preventing cluster autoscaler from provisioning additional node pools (for example, node auto-provisioning is disabled). Refer to the NoScaleUp top-level node auto-provisioning reasons section for details.
  • skippedMigs[].reason: Provides information about why a particular MIG was skipped. Cluster autoscaler skips some MIGs from consideration for any Pod during a scaling up attempt (for example, because adding another node would exceed cluster-wide resource limits). Refer to the NoScaleUp MIG-level reasons section for details.
  • unhandledPodGroups: Contains information about why a particular group of unschedulable Pods does not trigger scaling up. The Pods are grouped by their immediate controller. Pods without a controller are in groups by themselves. Each Pod group contains an arbitrary example Pod and the number of Pods in the group, as well as the following reasons:
    • napFailureReasons: Reasons why cluster autoscaler cannot provision a new node pool to accommodate this Pod group (for example, Pods have affinity constraints). Refer to the NoScaleUp Pod-group-level node auto-provisioning reasons section for details.
    • rejectedMigs[].reason: Per-MIG reasons why cluster autoscaler cannot increase the size of a particular MIG to accommodate this Pod group (for example, the MIG's node is too small for the Pods). Refer to the NoScaleUp MIG-level reasons section for details.

Example

The following log sample shows a noScaleUp event:

{
  "noDecisionStatus": {
    "measureTime": "1582523362",
    "noScaleUp": {
      "skippedMigs": [
        {
          "mig": {
            "name": "test-cluster-nap-n1-highmem-4-fbdca585-grp",
            "nodepool": "nap-n1-highmem-4-1cywzhvf",
            "zone": "us-central1-f"
          },
          "reason": {
            "messageId": "no.scale.up.mig.skipped",
            "parameters": [
              "max cluster cpu limit reached"
            ]
          }
        }
      ],
      "unhandledPodGroups": [
        {
          "napFailureReasons": [
            {
              "messageId": "no.scale.up.nap.pod.zonal.resources.exceeded",
              "parameters": [
                "us-central1-f"
              ]
            }
          ],
          "podGroup": {
            "samplePod": {
              "controller": {
                "apiVersion": "v1",
                "kind": "ReplicationController",
                "name": "memory-reservation2"
              },
              "name": "memory-reservation2-6zg8m",
              "namespace": "autoscaling-1661"
            },
            "totalPodCount": 1
          },
          "rejectedMigs": [
            {
              "mig": {
                "name": "test-cluster-default-pool-b1808ff9-grp",
                "nodepool": "default-pool",
                "zone": "us-central1-f"
              },
              "reason": {
                "messageId": "no.scale.up.mig.failing.predicate",
                "parameters": [
                  "NodeResourcesFit",
                  "Insufficient memory"
                ]
              }
            }
          ]
        }
      ],
      "unhandledPodGroupsTotalCount": 1
    }
  }
}

NoScaleDown event

A noScaleDown event is periodically emitted when there are nodes that are blocked from being deleted by cluster autoscaler.

  • Nodes that cannot be removed because their utilization is high are not included in noScaleDown events.
  • NoScaleDown events are best-effort; that is, they do not cover all possible reasons why cluster autoscaler cannot scale down.
  • NoScaleDown events are throttled to limit the produced log volume. Each persisting reason is only emitted every couple of minutes.
  • The list of nodes is truncated to 50 arbitrary entries. The actual number of nodes can be found in the nodesTotalCount field.
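
To review recent noScaleDown events, you can use the same pattern with a global text restriction on the noScaleDown field name. This is a sketch; replace CLUSTER_NAME with your own value:

resource.type="k8s_cluster"
resource.labels.cluster_name=CLUSTER_NAME
log_id("container.googleapis.com/cluster-autoscaler-visibility")
( "noScaleDown" )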

Reason fields

The following fields help to explain why scaling down did not occur:

  • reason: Provides a global reason for why cluster autoscaler is prevented from scaling down (for example, a backoff period after recently scaling up). Refer to the NoScaleDown top-level reasons section for details.
  • nodes[].reason: Provides per-node reasons for why cluster autoscaler is prevented from deleting a particular node (for example, there's no place to move the node's Pods to). Refer to the NoScaleDown node-level reasons section for details.

Example

The following log sample shows a noScaleDown event:

{
  "noDecisionStatus": {
    "measureTime": "1582858723",
    "noScaleDown": {
      "nodes": [
        {
          "node": {
            "cpuRatio": 42,
            "mig": {
              "name": "test-cluster-default-pool-f74c1617-grp",
              "nodepool": "default-pool",
              "zone": "us-central1-c"
            },
            "name": "test-cluster-default-pool-f74c1617-fbhk"
          },
          "reason": {
            "messageId": "no.scale.down.node.no.place.to.move.pods"
          }
        }
      ],
      "nodesTotalCount": 1,
      "reason": {
        "messageId": "no.scale.down.in.backoff"
      }
    }
  }
}

Troubleshooting scaling issues

This section provides guidance for troubleshooting scaling issues.

Cluster not scaling up

Scenario: I created a Pod in my cluster but it's stuck in the Pending state for the past hour. Cluster autoscaler did not provision any new nodes to accommodate the Pod.

Solution:

  1. In the Logs Explorer, find the logging details for cluster autoscaler events, as described in the Viewing events section.
  2. Search for scaleUp events that contain the desired Pod in the triggeringPods field. You can filter the log entries, including filtering by a particular JSON field value, as shown in the example query after these steps. Learn more in Advanced logs queries.

    1. Find an eventResult event that contains the same eventId as the scaleUp event.
    2. Look at the errorMsg field and consult the list of possible scaleUp error messages.

    ScaleUp error example: For a scaleUp event, you discover the error is "scale.up.error.quota.exceeded", which indicates that "A scaleUp event failed because some of the MIGs could not be increased due to exceeded quota". To resolve the issue, you review your quota settings and increase the settings that are close to being exceeded. Cluster autoscaler adds a new node and the Pod is scheduled.

  3. Otherwise, search for noScaleUp events and review the following fields:

    • unhandledPodGroups: contains information about the Pod (or Pod's controller).
    • reason: provides global reasons indicating scaling up could be blocked.
    • skippedMigs: provides reasons why some MIGs might be skipped.
  4. Refer to the Reasons for a NoScaleUp event section later on this page, which contains possible reasons for noScaleUp events:

    NoScaleUp example: You found a noScaleUp event for your Pod, and all MIGs in the rejectedMigs field have the same reason message ID of "no.scale.up.mig.failing.predicate" with two parameters:"NodeAffinity" and "node(s) did not match node selector". After consulting the list of error messages, you discover that you "cannot scale up a MIG because a predicate failed for it"; the parameters are the name of the failing predicate and the reason why it failed. To resolve the issue, you review the Pod spec, and discover that it has a node selector that doesn't match any MIG in the cluster. You delete the selector from the Pod spec and recreate the Pod. Cluster autoscaler adds a new node and the Pod is scheduled.

  5. If there are no noScaleUp events, use other debugging methods to resolve the issue.
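
As referenced in step 2, the following Logs Explorer query is one way to pull both scaleUp and noScaleUp entries that mention a specific Pod. It is a sketch: POD_NAME and CLUSTER_NAME are placeholders, the field paths assume the event formats shown earlier on this page, and because noScaleUp events only include an arbitrary sample Pod per group, the second part of the filter can miss your Pod:

resource.type="k8s_cluster"
resource.labels.cluster_name=CLUSTER_NAME
log_id("container.googleapis.com/cluster-autoscaler-visibility")
( jsonPayload.decision.scaleUp.triggeringPods.name="POD_NAME" OR
  jsonPayload.noDecisionStatus.noScaleUp.unhandledPodGroups.podGroup.samplePod.name="POD_NAME" )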

Cluster not scaling down

Scenario: I have a node in my cluster that has utilized only 10% of its CPU and memory for the past couple of days. Despite the low utilization, cluster autoscaler did not delete the node as expected.

Solution:

  1. In the Logs Explorer, find the logging details for cluster autoscaler events, as described in the Viewing events section.
  2. Search for scaleDown events that contain the desired node in the nodesToBeRemoved field. You can filter the log entries, including filtering by a particular JSON field value, as shown in the example query after these steps. Learn more in Advanced logs queries.
    1. Find an eventResult event that contains the same eventId as the scaleDown event.
    2. Look at the errorMsg field and consult the list of possible scaleDown error messages.
  3. Otherwise, search for noScaleDown events that have the desired node in the nodes field. Review the reason field for any global reasons indicating that scaling down could be blocked.
  4. Refer to the Reasons for a NoScaleDown event section later on this page, which contains possible reasons for noScaleDown events:

    NoScaleDown example: You found a noScaleDown event that contains a per-node reason for your node. The message ID is "no.scale.down.node.pod.has.local.storage" and there is a single parameter: "test-single-pod". After consulting the list of error messages, you discover this means that the "Pod is blocking scale down because it requests local storage". You consult the Kubernetes Cluster Autoscaler FAQ and find out that the solution is to add a "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" annotation to the Pod. After applying the annotation, cluster autoscaler scales down the cluster correctly.

  5. If there are no noScaleDown events, use other debugging methods to resolve the issue.
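
As referenced in step 2, a simple way to pull the scaleDown, eventResult, and noScaleDown entries that mention a specific node is a global text restriction on the node name, following the style of the queries earlier on this page. This is a sketch; NODE_NAME and CLUSTER_NAME are placeholders:

resource.type="k8s_cluster"
resource.labels.cluster_name=CLUSTER_NAME
log_id("container.googleapis.com/cluster-autoscaler-visibility")
( "NODE_NAME" )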

Messages

The events emitted by the cluster autoscaler use parameterized messages to explain the event. The parameters field accompanies the messageId field, as in the earlier example log for a noScaleUp event.

This section describes the various messageId values and their corresponding parameters. However, this section does not contain all possible messages and may be extended at any time.
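
When a particular message keeps appearing, you can search for its messageId directly across the visibility events, again using a global text restriction. The following query is a sketch that uses one of the message IDs documented below:

resource.type="k8s_cluster"
log_id("container.googleapis.com/cluster-autoscaler-visibility")
( "no.scale.down.node.pod.has.local.storage" )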

ScaleUp errors

Error messages for scaleUp events are found in the corresponding eventResult event, in the resultInfo.results[].errorMsg field.

"scale.up.error.out.of.resources"
  Description: The scaleUp event failed because some of the MIGs could not be increased due to lack of resources.
  Parameters: Failing MIG IDs.
  Mitigation: Follow the resource availability troubleshooting steps.

"scale.up.error.quota.exceeded"
  Description: The scaleUp event failed because some of the MIGs could not be increased, due to exceeded Compute Engine quota.
  Parameters: Failing MIG IDs.
  Mitigation: Check the Errors tab of the MIG in Google Cloud console to see what quota is being exceeded. Follow the instructions to request a quota increase.

"scale.up.error.waiting.for.instances.timeout"
  Description: The scaleUp event failed because instances in some of the MIGs failed to appear in time.
  Parameters: Failing MIG IDs.
  Mitigation: This message is transient. If it persists, engage Google Cloud Support for further investigation.

"scale.up.error.ip.space.exhausted"
  Description: The scaleUp event failed because the cluster doesn't have enough unallocated IP address space to use to add new nodes or Pods.
  Parameters: Failing MIG IDs.
  Mitigation: Refer to the troubleshooting steps to address the lack of IP address space for the nodes or Pods.

"scale.up.error.service.account.deleted"
  Description: The scaleUp event failed because a service account used by the cluster autoscaler has been deleted.
  Parameters: Failing MIG IDs.
  Mitigation: Engage Google Cloud Support for further investigation.

ScaleDown errors

Error messages for scaleDown events are found in the corresponding eventResult event, in the resultInfo.results[].errorMsg field.

"scale.down.error.failed.to.mark.to.be.deleted"
  Description: The scaleDown event failed because a node could not be marked for deletion.
  Parameters: Failing node name.
  Mitigation: This message is transient. If it persists, engage Google Cloud Support for further investigation.

"scale.down.error.failed.to.evict.pods"
  Description: The scaleDown event failed because some of the Pods could not be evicted from a node.
  Parameters: Failing node name.
  Mitigation: Review best practices for Pod Disruption Budgets to ensure that the rules allow for eviction of application replicas when acceptable.

"scale.down.error.failed.to.delete.node.min.size.reached"
  Description: The scaleDown event failed because a node could not be deleted because the cluster is already at its minimum size.
  Parameters: Failing node name.
  Mitigation: Review the minimum value set for node pool autoscaling and adjust the settings as necessary.

Reasons for a NoScaleUp event

NoScaleUp top-level reasons

Top-level reason messages for noScaleUp events appear in the noDecisionStatus.noScaleUp.reason field. The message contains a top-level reason for why cluster autoscaler cannot scale the cluster up.

"no.scale.up.in.backoff"
  Description: A noScaleUp event occurred because scaling up is in a backoff period (temporarily blocked). This is a transient message that may occur during scale-up events with a large number of Pods.
  Mitigation: If this message persists, engage Google Cloud Support for further investigation.

NoScaleUp top-level node auto-provisioning reasons

Top-level node auto-provisioning reason messages for noScaleUp events appear in the noDecisionStatus.noScaleUp.napFailureReason field. The message contains a top-level reason for why cluster autoscaler cannot provision new node pools.

"no.scale.up.nap.disabled"
  Description: Node auto-provisioning is not enabled at the cluster level. If node auto-provisioning is disabled, new nodes will not be automatically provisioned if the pending Pod has requirements that can't be satisfied by any existing node pools.
  Mitigation: Review the cluster configuration and see Enabling node auto-provisioning.

NoScaleUp MIG-level reasons

MIG-level reason messages for noScaleUp events appear in the noDecisionStatus.noScaleUp.skippedMigs[].reason and noDecisionStatus.noScaleUp.unhandledPodGroups[].rejectedMigs[].reason fields. The message contains a reason why cluster autoscaler cannot increase the size of a particular MIG.

"no.scale.up.mig.skipped"
  Description: Cannot scale up a MIG because it was skipped during the simulation.
  Parameters: Human-readable reasons why it was skipped (for example, missing a Pod requirement).
  Mitigation: Review the parameters included in the error message and address why the MIG was skipped.

"no.scale.up.mig.failing.predicate"
  Description: Cannot scale up a MIG because it does not meet the predicate requirements for the pending Pods.
  Parameters: Name of the failing predicate, human-readable reasons why it failed.
  Mitigation: Review Pod requirements, such as affinity rules, taints or tolerations, and resource requirements.

NoScaleUp Pod-group-level node auto-provisioning reasons

Pod-group-level node auto-provisioning reason messages for noScaleUp events appear in the noDecisionStatus.noScaleUp.unhandledPodGroups[].napFailureReasons[] field. The message contains a reason why cluster autoscaler cannot provision a new node pool to accommodate a particular Pod group.

"no.scale.up.nap.pod.gpu.no.limit.defined"
  Description: Node auto-provisioning could not provision any node group because a pending Pod has a GPU request, but GPU resource limits are not defined at the cluster level.
  Parameters: Requested GPU type.
  Mitigation: Review the pending Pod's GPU request, and update the cluster-level node auto-provisioning configuration for GPU limits.

"no.scale.up.nap.pod.gpu.type.not.supported"
  Description: Node auto-provisioning did not provision any node group for the Pod because it has requests for an unknown GPU type.
  Parameters: Requested GPU type.
  Mitigation: Check the pending Pod's configuration for the GPU type to ensure that it matches a supported GPU type.

"no.scale.up.nap.pod.zonal.resources.exceeded"
  Description: Node auto-provisioning did not provision any node group for the Pod in this zone because doing so would violate cluster-wide maximum resource limits, would exceed the available resources in the zone, or because no machine type could fit the request.
  Parameters: Name of the considered zone.
  Mitigation: Review and update cluster-wide maximum resource limits, the Pod resource requests, or the available zones for node auto-provisioning.

"no.scale.up.nap.pod.zonal.failing.predicates"
  Description: Node auto-provisioning did not provision any node group for the Pod in this zone because of failing predicates.
  Parameters: Name of the considered zone, human-readable reasons why predicates failed.
  Mitigation: Review the pending Pod's requirements, such as affinity rules, taints, tolerations, or resource requirements.

Reasons for a NoScaleDown event

NoScaleDown top-level reasons

Top-level reason messages for noScaleDown events appear in the noDecisionStatus.noScaleDown.reason field. The message contains a top-level reason why cluster autoscaler cannot scale the cluster down.

"no.scale.down.in.backoff"
  Description: A noScaleDown event occurred because scaling down is in a backoff period (temporarily blocked). This event should be transient, and may occur when there has been a recent scale-up event.
  Mitigation: Follow the mitigation steps associated with the lower-level reasons for failure to scale down. When the underlying reasons are resolved, cluster autoscaler will exit backoff. If the message persists after addressing the underlying reasons, engage Google Cloud Support for further investigation.

"no.scale.down.in.progress"
  Description: A noScaleDown event occurred because scaling down is blocked until the previous node scheduled for removal is deleted. This event should be transient, as the Pod will eventually be forcibly removed.
  Mitigation: If this message occurs frequently, review the gracefulTerminationPeriod value for the Pods blocking scale down. To speed up the resolution, you can also forcibly delete the Pod if it is no longer needed.

NoScaleDown node-level reasons

Node-level reason messages for noScaleDown events appear in the noDecisionStatus.noScaleDown.nodes[].reason field. The message contains a reason why cluster autoscaler cannot remove a particular node.

"no.scale.down.node.scale.down.disabled.annotation"
  Description: Node cannot be removed because it has a scale-down-disabled annotation.
  Mitigation: Review the annotation that is preventing scale down, following the instructions in the Kubernetes Cluster Autoscaler FAQ.

"no.scale.down.node.node.group.min.size.reached"
  Description: Node cannot be removed because its node group is already at its minimum size.
  Mitigation: Review and adjust the minimum value set for node pool autoscaling.

"no.scale.down.node.minimal.resource.limits.exceeded"
  Description: Scale down of an underutilized node is blocked because it would violate cluster-wide minimum resource limits set for node auto-provisioning.
  Mitigation: Review the cluster-wide minimum resource limits.

"no.scale.down.node.no.place.to.move.pods"
  Description: Scale down of an underutilized node is blocked because it is running a Pod that can't be moved to another node in the cluster.
  Mitigation: If you expect that the Pod should be rescheduled, review the scheduling requirements of the Pods on the underutilized node to determine whether they can be moved to another node in the cluster. This message is expected if you do not expect the Pod to be rescheduled, because there are no other nodes on which it could be scheduled.

"no.scale.down.node.pod.not.backed.by.controller"
  Description: Pod is blocking scale down of an underutilized node because the Pod doesn't have a controller known to Kubernetes Cluster Autoscaler (ReplicationController, DaemonSet, Job, StatefulSet, or ReplicaSet). Learn more from the Kubernetes Cluster Autoscaler FAQ about what types of Pods can prevent cluster autoscaler from removing a node.
  Parameters: Name of the blocking Pod.
  Mitigation: Set the annotation "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" for the Pod, or define a controller (ReplicationController, DaemonSet, Job, StatefulSet, or ReplicaSet).

"no.scale.down.node.pod.has.local.storage"
  Description: Pod is blocking scale down because it requests local storage. Learn more from the Kubernetes Cluster Autoscaler FAQ about what types of Pods can prevent cluster autoscaler from removing a node.
  Parameters: Name of the blocking Pod.
  Mitigation: Set the annotation "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" for the Pod if the data in its local storage is not critical.

"no.scale.down.node.pod.not.safe.to.evict.annotation"
  Description: Pod is blocking scale down because it has a "not safe to evict" annotation. See the Kubernetes Cluster Autoscaler FAQ for more details.
  Parameters: Name of the blocking Pod.
  Mitigation: If the Pod can be safely evicted, update the annotation to "cluster-autoscaler.kubernetes.io/safe-to-evict": "true".

"no.scale.down.node.pod.kube.system.unmovable"
  Description: Pod is blocking scale down because it's a non-DaemonSet, non-mirrored Pod without a PodDisruptionBudget in the kube-system namespace.
  Parameters: Name of the blocking Pod.
  Mitigation: Follow the instructions in the Kubernetes Cluster Autoscaler FAQ to set a PodDisruptionBudget that enables cluster autoscaler to move Pods in the kube-system namespace.

"no.scale.down.node.pod.not.enough.pdb"
  Description: Pod is blocking scale down because it doesn't have enough PodDisruptionBudget left. See the Kubernetes Cluster Autoscaler FAQ for more details.
  Parameters: Name of the blocking Pod.
  Mitigation: Review the PodDisruptionBudget for the Pod and see the best practices for PodDisruptionBudget. You may be able to resolve the message by scaling the application or by changing the PodDisruptionBudget to allow more unavailable Pods.

"no.scale.down.node.pod.controller.not.found"
  Description: Pod is blocking scale down because its controller (for example, a Deployment or ReplicaSet) can't be found.
  Mitigation: Review the logs to determine what actions were taken that left a Pod running after its controller was removed. To resolve the issue, you can manually delete the Pod.

"no.scale.down.node.pod.unexpected.error"
  Description: Scale down of an underutilized node is blocked because it has a Pod in an unexpected error state.
  Mitigation: Engage Google Cloud Support for further investigation.

What's next