[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Translation issue","translationIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated (UTC): 2025-09-01."],[],[],null,["# Container instance autoscaling\n\nIn Knative serving, each [revision](/kubernetes-engine/enterprise/knative-serving/docs/resource-model#revisions)\nis automatically scaled to the number of container instances needed to handle\nall incoming requests. When a revision does not receive any traffic, by default\nit is scaled to zero container instances. However, if desired, you can\nchange this default to specify an instance to be kept idle or \"warm\" using\nthe [minimum instances](/kubernetes-engine/enterprise/knative-serving/docs/configuring/min-instances) setting.\n\nThe number of instances scheduled is affected by:\n\n- The amount of CPU needed to process a request\n- The [concurrency setting](/kubernetes-engine/enterprise/knative-serving/docs/concurrency)\n- The [maximum number of container instances setting](/kubernetes-engine/enterprise/knative-serving/docs/configuring/max-instances)\n- The [minimum number of container instances setting](/kubernetes-engine/enterprise/knative-serving/docs/configuring/min-instances)\n\nIn some cases you may want to limit the total number of container instances\nthat can be started, either for cost control or for better compatibility with\nother resources used by your service. 
For example, your Knative serving\nservice might interact with a database that can only handle a certain number of\nconcurrent open connections.\n\nAbout maximum container instances\n---------------------------------\n\nYou can use the maximum container instances setting to limit the total number of\ninstances that can be started in parallel, as documented in\n[Setting a maximum number of container instances](/kubernetes-engine/enterprise/knative-serving/docs/configuring/max-instances).\n\n### Exceeding maximum instances\n\nUnder normal circumstances, your revision scales out by creating new instances\nto handle incoming traffic load. However, when you set a maximum instances limit,\nthere may be too few instances to meet that traffic load. In\nthat case, incoming requests queue for up to 60 seconds. During this 60-second\nwindow, if an instance finishes processing requests, it becomes available to\nprocess queued requests. If no instances become available during the 60-second\nwindow, the request fails with a `429` error code on Cloud Run.\n\n### Scaling guarantees\n\nThe maximum instances limit is an upper bound, not a target. Setting a high limit does not mean\nthat your revision will scale out to the specified number of container instances.\nIt only means that the number of container instances at any point in time should\nnot exceed the limit.\n\n### Traffic spikes\n\nIn some cases, such as rapid traffic surges, Knative serving may, for a short\nperiod of time, create slightly *more* container instances than the specified\nmaximum instances value. If your service cannot tolerate this temporary behavior,\nyou may want to factor in a safety margin and set a lower maximum instances value.\n\nDeployments\n-----------\n\nWhen you deploy a new revision, Knative serving gradually migrates traffic\nfrom the old revision to the new one. 
Because maximum instances limits are set for\neach revision, the combined instance count of the old and new revisions may\ntemporarily exceed the specified limit during the period\nafter deployment.\n\nIdle instances and minimizing cold starts\n-----------------------------------------\n\nKubernetes resources are only consumed when an instance is handling a request,\nbut this does not mean that Knative serving immediately shuts down\ninstances once they have handled all requests. To minimize the impact of cold\nstarts, Knative serving may keep some instances idle. These\ninstances are ready to handle requests in case of a sudden traffic spike.\n\nFor example, when a container instance has finished handling requests, it may\nremain idle for a period of time in case another request needs to\nbe handled. An idle container instance may persist resources, such as open\ndatabase connections. However, for Cloud Run, the\n[CPU will not be available](/kubernetes-engine/enterprise/knative-serving/docs/reference/container-contract#cpu)\nwhile the instance is idle.\n\nTo keep idle instances *permanently* available, use the\n[`min-instance`](/kubernetes-engine/enterprise/knative-serving/docs/configuring/min-instances) setting.\n\nWhat's next\n-----------\n\n- To manage the maximum number of instances of your Knative serving services, see [Setting a maximum number of container instances](/kubernetes-engine/enterprise/knative-serving/docs/configuring/max-instances).\n- To manage the maximum number of simultaneous requests handled by each container instance, see [Setting concurrency](/kubernetes-engine/enterprise/knative-serving/docs/configuring/concurrency).\n- To optimize your concurrency setting, see [development tips for tuning concurrency](/kubernetes-engine/enterprise/knative-serving/docs/tips/general#tuning-concurrency).\n- To specify an idle instance to keep running to minimize latency or cold starts on first requests, see [Using `min-instance` to enable idle instances](/kubernetes-engine/enterprise/knative-serving/docs/configuring/min-instances)."]]