Managing cost and reliability in fully managed applications
In both good times and challenging ones, running an application on a fully managed serverless environment has lots of benefits. If you experience extremely high demand, your application scales automatically, avoiding crashes or downtime. And if you see a contraction of demand, then the application scales down and saves you money.
But big changes in customer demand can lead to unexpected system behavior—and bills. In times of uncertainty, you may want to temporarily reduce your overall spend, or simply gain a measure of predictability—all while maintaining an acceptable service level.
At Google Cloud, we have several serverless compute products in our portfolio—App Engine, Cloud Run, and Cloud Functions—all used for different use cases, and each one featuring different ways to help you control costs and plan for traffic spikes. In this blog post, we present a set of simple tasks and checks you can perform to both minimize downtime and mitigate unexpected costs for your serverless applications.
Whether you want to reduce your overall serverless bill, or simply want to put safeguards in place to prevent cost overruns, here are some approaches you can use.
Set maximum instances
Google Cloud serverless infrastructure tries to optimize both the number of instances in your application (fewer instances cost less) and the request latency (more instances can lower latency). All of our serverless offerings allow you to set a maximum number of instances for a given application, service, or function.
This is a powerful feature, but one that you should use wisely. Setting a low max-instances value may result in a lower overall bill, but it may also increase request latency or cause request timeouts, since requests that cannot be served by an instance are queued and may eventually time out.
Conversely, setting a high value or disabling max-instances will result in optimal request latency, but a higher overall cost—especially if there is a spike in traffic.
Choosing the right number of maximum instances depends on your traffic and your desired request latency. How you configure this setting varies by product:
App Engine provides a Cloud Monitoring metric (appengine.googleapis.com/system/instance_count) that you can use to estimate the number of instances your application needs under normal circumstances. You can then change the max instances value for App Engine in the automatic scaling settings of the app.yaml file.
Learn more about managing instances in App Engine.
For Cloud Run, you can use the "billable container instance time" metric to estimate how many instances are used to run your application. For example, if you see "100s/s", around 100 instances were scheduled. You may want to add a buffer of up to 30% to preserve your application’s current performance characteristics (e.g., 130 max instances for 100s/s of traffic).
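The buffer arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the buffer_percent parameter are invented for this example, and the input is whatever steady-state value you read off the "billable container instance time" metric.

```python
import math


def suggest_max_instances(observed_instances, buffer_percent=30):
    """Return a max-instances value with headroom above observed usage.

    observed_instances: steady-state metric reading, e.g. 100 for "100s/s".
    buffer_percent: headroom to add on top (hypothetical parameter; the
        text suggests a buffer of up to 30%).
    """
    if observed_instances < 0 or buffer_percent < 0:
        raise ValueError("inputs must be non-negative")
    # Multiply by a whole percentage before dividing to limit floating-point
    # drift, then round up so the cap never falls below the buffered value.
    return math.ceil(observed_instances * (100 + buffer_percent) / 100)


print(suggest_max_instances(100))  # 130, matching the example in the text
```

Treat the result as a starting point rather than a rule; the right buffer depends on how spiky your traffic is.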
You can change the max instances value for Cloud Run via the command line:
gcloud run services update SERVICE --max-instances MAX-VALUE
Another element of managing Cloud Run costs is how it automatically scales instances to handle incoming requests. By default, a Cloud Run container instance can receive several requests at the same time, and Cloud Run automatically determines how many requests to send to a given instance based on the instance’s CPU and memory utilization. The concurrency setting caps the number of requests that an instance can handle simultaneously. If you are using a value lower than the default (80), we recommend that you try increasing the concurrency setting before changing max instances, as simply increasing concurrency can reduce the number of instances required.
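To see why raising concurrency can reduce the number of instances, here is a rough back-of-the-envelope sketch. It assumes requests in flight are roughly request rate times average latency, and that every instance is filled up to its concurrency limit. This is a simplification for illustration, not a description of how Cloud Run's autoscaler actually decides; the function name and the traffic numbers are hypothetical.

```python
import math


def estimate_instances(requests_per_second, avg_latency_seconds, concurrency):
    """Rough instance estimate for a steady request rate (illustrative only)."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # Requests being processed at any moment, assuming steady traffic.
    in_flight = requests_per_second * avg_latency_seconds
    # Each instance handles up to `concurrency` requests at once.
    return math.ceil(in_flight / concurrency)


# Hypothetical workload: 400 requests/s, each taking about 0.5 s.
for concurrency in (10, 80):
    needed = estimate_instances(400, 0.5, concurrency)
    print(f"concurrency={concurrency}: about {needed} instances")
```

In this made-up workload, moving from a concurrency of 10 to the default of 80 drops the estimate from about 20 instances to about 3, provided the application can really handle that many simultaneous requests per instance.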
Learn more about Cloud Run's instance automatic scaling.
Cloud Functions provides a Cloud Monitoring metric (cloudfunctions.googleapis.com/function/active_instances) that you can use to estimate the number of instances your function needs under normal circumstances.
You can change the max instances value for Cloud Functions via the command line:
gcloud functions deploy FUNCTION_NAME --max-instances MAX-VALUE
Learn more about managing instances in Cloud Functions.
Set budget alerts
With or without changes to your application to reduce its footprint, budget alerts can provide an important early-warning signal of unexpected increases in your bill. Setting up budget alerts is straightforward, and you can configure them to notify you by email or through Cloud Pub/Sub. A Pub/Sub notification can, in turn, trigger a Cloud Function, so you can handle the alert programmatically.
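As a minimal sketch of the programmatic path, the Python function below decodes a budget notification delivered through Pub/Sub. It assumes a Pub/Sub-triggered background function that receives the message as base64-encoded JSON in event["data"], and that the payload carries costAmount, budgetAmount, and budgetDisplayName fields; verify both assumptions against the current budget notification documentation. The function names are hypothetical, and the reaction here is only a log line.

```python
import base64
import json


def summarize_budget_notification(event):
    """Decode a Pub/Sub budget notification into a small summary dict."""
    # Pub/Sub delivers the message body base64-encoded under "data".
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    cost = payload["costAmount"]      # assumed field: spend so far this period
    budget = payload["budgetAmount"]  # assumed field: configured budget
    return {
        "budget_name": payload.get("budgetDisplayName"),
        "spent_fraction": cost / budget if budget else None,
        "over_budget": cost > budget,
    }


def handle_budget_alert(event, context):
    """Hypothetical entry point: log a warning once spend exceeds the budget."""
    summary = summarize_budget_notification(event)
    if summary["over_budget"]:
        print(f"Budget {summary['budget_name']!r} exceeded: {summary}")
```

A real handler might go further, for example by notifying an on-call channel or lowering a max instances setting, but those actions are left out of this sketch.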
Use labels
Labels allow you to assign a simple text value to a particular resource, which you can then use to filter charges on your bill. For example, you may have an application that consists of several Cloud Run services and a Cloud Function. By applying a consistent label to these resources, you can see the overall impact of this multi-service application on your bill. This helps you identify the areas of your Google Cloud usage that contribute the most to your bill and take targeted action on them.
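The idea of filtering charges by label can be sketched as a simple grouping. The rows below are made-up sample data in a simplified shape (a cost plus a dictionary of labels), not an actual billing export schema, and the label key "app" is just an example.

```python
from collections import defaultdict


def cost_by_label(rows, label_key):
    """Sum costs per value of one label; unlabeled rows are grouped together."""
    totals = defaultdict(float)
    for row in rows:
        value = row.get("labels", {}).get(label_key, "(unlabeled)")
        totals[value] += row["cost"]
    return dict(totals)


# Hypothetical charges for a multi-service application.
rows = [
    {"service": "Cloud Run", "cost": 12.5, "labels": {"app": "storefront"}},
    {"service": "Cloud Run", "cost": 7.25, "labels": {"app": "storefront"}},
    {"service": "Cloud Functions", "cost": 3.0, "labels": {"app": "storefront"}},
    {"service": "Cloud Run", "cost": 4.0, "labels": {}},
]

print(cost_by_label(rows, "app"))
```

With a consistent label across the Cloud Run services and the Cloud Function, the three labeled rows roll up into a single total for the application.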
Set instance class sizing
All of our serverless compute products allow some amount of choice when it comes to how much memory or CPU is available to your application. Provisioning larger values for these resources typically results in a higher price. However, in some cases choosing more powerful instances can actually reduce your overall bill.
For workloads that consume a lot of CPU, a larger allocation of CPU (or more specifically, a greater number of CPU cycles per second) can result in shorter execution times and, therefore, fewer instances of your application being created. While there isn't a one-size-fits-all recommendation for instance class sizing, applications that use a lot of CPU generally benefit from a larger CPU allocation. Conversely, your application may be over-provisioned, with CPU that it does not fully utilize, which suggests that a smaller instance (at lower cost) would be able to serve the same traffic.
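The trade-off can be made concrete with a simplified model: if cost is proportional to price per second times execution time, a larger instance class pays off whenever the speedup exceeds the price increase. The sketch below uses that assumption only; it ignores idle time, memory, I/O waits, and per-request charges, and the ratios in the example are hypothetical rather than real prices.

```python
def relative_cost(price_ratio, speedup):
    """Cost of the larger class relative to the smaller one for the same work.

    price_ratio: larger class price per second / smaller class price per second.
    speedup: smaller class execution time / larger class execution time.
    A result below 1.0 means the larger class is cheaper overall.
    """
    if price_ratio <= 0 or speedup <= 0:
        raise ValueError("ratios must be positive")
    return price_ratio / speedup


# Hypothetical: the larger class costs 2x per second but finishes 2.5x faster.
print(relative_cost(2.0, 2.5))  # 0.8, i.e. 20% cheaper in this simplified model
```

If the same workload only ran 1.5 times faster on the larger class, the ratio would be above 1.0 and the smaller class would remain the cheaper choice.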
Let’s take a look at how to size instances across the various Google Cloud serverless platforms.
App Engine standard environment
At this time App Engine standard environment doesn’t provide a per-instance metric for CPU utilization. However, you can track an application’s overall CPU usage across all instances using the appengine.googleapis.com/system/cpu/usage metric. An application that is largely CPU-bound may benefit from larger instance classes, which would result in an overall reduction in CPU usage across an application due to requiring fewer instances and fewer instance creation events.
App Engine flexible environment
App Engine flexible environment provides a CPU Utilization metric (appengine.googleapis.com/flex/instance/cpu/utilization) that allows you to track the per-instance CPU utilization of your application.
Cloud Run provides a CPU Utilization distribution metric (run.googleapis.com/container/cpu/utilizations) that shows a percentile distribution of CPU utilization across all instances of a Cloud Run service.
At this time, Cloud Functions does not provide a metric to report CPU utilization, and the best way to determine the optimal instance class is via experimentation. You can monitor the impact of an increase in allocated CPU by monitoring the execution time of your functions (cloudfunctions.googleapis.com/function/execution_times). CPU-bound functions typically report shorter execution times if they are granted larger CPU resources.
Regardless of whether you need larger or smaller instances, we recommend using traffic management to help find the optimal configuration. First, create a new revision (or version, in the case of App Engine) of your service or application with the changes to your configuration, and direct a portion of your traffic to it. Then monitor the aforementioned metrics to see if there is an improvement.
Preparing to scale
If you're experiencing higher than anticipated demand for your service, there are a few things you should check to ensure your application is well-prepared to handle significant increases in traffic.
Check max instances
As a corollary to the cost management advice above, if you’re more concerned about application performance and reliability than cost control, then you should double-check that any max instances setting you have in place is appropriate.
Learn more about managing instances in App Engine
Learn more about managing instances in Cloud Run
Learn more about managing instances in Cloud Functions
Check resource quotas
Resource quotas help make sure you don’t consume more resources than you forecast, so that you avoid a higher than expected bill. But if your application is getting more traffic than was forecast, you may need to increase your resource quotas to avoid going down when your customers need you the most. You can change some quotas directly in the Google Cloud Console, while others require a support ticket. You can check your current usage against the quotas for your service on the Quotas page in the Cloud Console.
Learn more about quotas in App Engine
Learn more about quotas in Cloud Run
Learn more about quotas in Cloud Functions
Putting it all together
If what you want is an application that scales automatically with demand, building on a serverless platform is a great place to start. But there are lots of actions you can take to make sure it scales efficiently, without sacrificing performance or incurring unintended costs. To learn more about how to use serverless compute products for your next application, explore our other serverless offerings.