[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Translation issue","translationIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-09-04 UTC."],[],[],null,["# About maximum instances\n\nBy default, the maximum number of instances for a Cloud Run service is determined by the lowest of the relevant quota limits listed below. The maximum for each region also depends\non the CPU and memory configuration of your Cloud Run service. Specifically,\nthe maximum number of instances available for your service is the minimum of the\nfollowing:\n\n- regional [instance limit quota](https://console.cloud.google.com/iam-admin/quotas?pageState=(%22allQuotasTable%22:(%22f%22:%22%255B%257B_22k_22_3A_22Metric_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22run.googleapis.com%252Finstance_limit_regional_5C_22_22_2C_22s_22_3Atrue_2C_22i_22_3A_22metricName_22%257D%255D%22))) baseline divided by the requested multiple of 1 CPU.\n- regional instance limit quota baseline divided by the requested multiple of 2GB memory.\n- regional [CPU quota](https://console.cloud.google.com/iam-admin/quotas?pageState=(%22allQuotasTable%22:(%22f%22:%22%255B%257B_22k_22_3A_22Metric_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22run.googleapis.com%252Fcpu_allocation_5C_22_22_2C_22s_22_3Atrue_2C_22i_22_3A_22metricName_22%257D%255D%22))) divided by the CPU configuration for the service.\n- regional [memory quota](https://console.cloud.google.com/iam-admin/quotas?pageState=(%22allQuotasTable%22:(%22f%22:%22%255B%257B_22k_22_3A_22_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22run.googleapis.com%252Fmem_allocation_5C_22_22_2C_22s_22_3Atrue%257D%255D%22))) divided by the memory configuration for the service.\n- regional [GPU 
quota](https://console.cloud.google.com/iam-admin/quotas?pageState=(%22allQuotasTable%22:(%22f%22:%22%255B%257B_22k_22_3A_22Metric_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22run.googleapis.com%252Fnvidia_l4_gpu_allocation_5C_22_22_2C_22i_22_3A_22metricName_22%257D_2C%257B_22k_22_3A_22_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22OR_5C_22_22_2C_22o_22_3Atrue%257D_2C%257B_22k_22_3A_22_22_2C_22t_22_3A10_2C_22v_22_3A_22_5C_22run.googleapis.com%252Fnvidia_l4_gpu_allocation_no_zonal_redundancy_5C_22_22_2C_22s_22_3Atrue%257D%255D%22))), with or without zonal redundancy, divided by the GPU configuration for the service.\n\nFor example, with a baseline instance limit quota of 1000 instances, a CPU quota of 2000 vCPU, and a memory quota of 4000 GiBy,\na service configured with either 2 CPU or 4GB memory gets an effective limit of 500 instances,\nbecause the baseline is divided by 2 (2 CPU is twice 1 CPU, and 4GB is twice 2GB).\n\nYou can view the baseline per-region instance limit quota for your region on the [quotas page](https://console.cloud.google.com/iam-admin/quotas?service=run.googleapis.com&usage=ALL) in the console.\n\nHow to increase baseline regional quota\n---------------------------------------\n\nIf you need a higher maximum number of instances in the region where your Cloud Run service is\ndeployed, you can [request a quota increase](/run/quotas#increase).\n\nBest practices for setting maximum instances\n--------------------------------------------\n\nThe following sections describe best practices for configuring maximum\ninstance limits for your services.\n\n### Optimal maximum instance value for event-driven services\n\nEvent-driven services, such as functions, can experience sporadic traffic spikes\nbased on incoming events. 
To determine an optimal maximum instance value for\nthese services, consider factors such as service invocation time,\nexpected average invocation frequency, peak invocation frequency, and tolerance for\ninvocation failures.\n\nA good rule of thumb is to start with a maximum instances\nvalue of 3, then monitor for invocation failures and adjust the maximum\ninstances value upward as necessary.\n\n### Handle requests when all instances are busy\n\nUnder normal circumstances, your service scales up by creating new instances to\nhandle incoming traffic load. However, if you have set a maximum instances limit,\nthere might be too few instances to meet\nincoming traffic load.\n\nIn that scenario, Cloud Run keeps trying to serve a new inbound request for\nup to 30 seconds:\n\n- If an instance finishes processing its request during this time period, it might start to process the new inbound request.\n- If no instance becomes available, the request fails.\n\nCloud Run automatically saves events destined for event-driven services\nuntil capacity is available.\n\n### Maximum instance limits that exceed Cloud Run's scaling ability\n\nWhen you specify a maximum instances limit, you are specifying an upper limit.\nSetting a large limit does not mean that your service will scale up to the\nspecified number of instances. It only means that the number of instances that\ncoexist at any point in time shouldn't exceed the limit.\n\nFurther, setting a maximum instances limit might affect the scaling strategies\nthat Cloud Run uses to meet your traffic demand. In general,\nCloud Run prioritizes honoring your specified limit over\nscaling up and potentially exceeding it.\n\n### Handle traffic spikes\n\nIn some cases, such as rapid traffic surges, Cloud Run might, for a short\nperiod of time, create more instances than the specified maximum instances\nlimit. 
If your service can't tolerate this temporary behavior, you might want\nto factor in a safety margin and set the maximum instances value lower than the\nnumber of instances your service can tolerate.\n\n### Deployments\n\nWhen you deploy a new revision, Cloud Run migrates traffic\nfrom the earlier revision to the new one. Because maximum instance limits are set\nfor each revision independently, you might temporarily exceed\nthe specified limit during the period after deployment.\n\nFor example, a service might have a maximum instances limit of 5. Under normal\ncircumstances, the service scales up to 5 instances as it handles requests.\nWhen you deploy a new revision, the new revision has its own maximum\ninstances limit of 5.\n\nRequests that are already being handled by the previous revision\naren't interrupted when you deploy a new revision. Instead,\nthese requests continue to make progress. New inbound requests are\nhandled by the newly deployed revision of your service.\n\nThus, the service in the previous example might have up to 10 total instances\n(5 for each revision) during the period after deploying the new\nrevision. The amount of time required for instances of the previous revision to\nterminate depends on the time required for those instances to finish handling\nany active requests. This is an additional factor to take into account when\nselecting an appropriate maximum instances limit."]]