Most App Engine apps use automatic scaling to serve user traffic, so that the infrastructure can adapt to spikes and allocate resources efficiently throughout the day. You can tune the performance of the scaling infrastructure depending on your needs.
You can specify several settings in your app's configuration file, specifically in your frontend module's automatic scaling configuration. The manual for your chosen runtime environment describes these in detail: Java, Python, Go, PHP. (We'll use app.yaml examples in this article, but the same settings are available for Java.) These settings take effect when you deploy a new version of your app.
In some cases, you want to minimize cost. In other cases, you want to serve heavy request volume quickly. These settings allow you to:
- Set the Frontend Instance Class. Your application can use faster, "bigger," more expensive servers.
- Configure the Scheduler. The App Engine scheduler controls how your application responds to increased load. As more requests come in, the scheduler might start up more servers or queue incoming requests.
Setting the Frontend Instance Class
App Engine provides several different classes of frontend instances, each with different memory and CPU limits. These classes allow you to configure your frontend instance with the processing capacity you need to perform your work. Each class has a specific hourly billing rate. Please see Billable Quota Unit Costs for pricing.
Important: Currently, when you are billed for instance hours, you will not see any instance classes in your billing line items. Instead, you will see the appropriate multiple of instance hours. For example, if you use an F4 instance for one hour, you do not see "F4" listed, but you will see billing for four instance hours at the F1 rate.
The default class for frontends is F1, which gives you 128MB of memory and 600MHz of CPU capacity.
The instance class is configured for each module in its settings file. Each instance class represents a combination of memory size and processing power; larger classes provide extra performance at an increased cost. The same instance class is used for all of the instances used by all versions of your module.
Frontend instances are priced based on an hourly rate determined by the frontend class. The following table describes the cost for each class:
| Frontend class | Memory limit | CPU limit | Cost per hour per instance |
When charging in local currency, Google will convert the prices listed into applicable local currency pursuant to the conversion rates published by leading financial institutions.
For example, you can set the instance class for the default module to the frontend F2 class in the configuration file.
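A minimal sketch of that setting in an app.yaml file, assuming the key is named `instance_class` (the name used in the app.yaml reference; it does not appear elsewhere in this article):

```yaml
# Select the F2 frontend instance class for this module.
# F1 is the default class when no instance class is chosen.
instance_class: F2
```

Other classes, such as F4, would be selected the same way.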
Configuring the Scheduler
App Engine's scheduler is responsible for routing incoming requests to be served by your app's instances. Sometimes the volume of incoming requests exceeds the capacity of the instances currently available to your app. When this happens, incoming requests may have to wait in the Pending Queue until busy instances become available, or until the scheduler starts new instances.
The scheduler is responsible for deciding how to serve your app's request load. Under regular conditions, it may spin up new idle instances to absorb traffic and minimize latency in the event of a sudden load spike. Because new instances take time to create, unusually heavy surges of traffic may consume all available idle instances faster than the scheduler can create new ones. This can cause your users to experience delays (latency) in the serving of requests.
The default settings enable App Engine's scheduling algorithm to scale the number of instances based on your recent request load and latency profile. If you use manual settings instead, you may need to adjust them continually as your request volume changes.
Setting the Number of Idle Instances
You can set a minimum number of idle instances available to the app for absorbing changes in load.
Note: In order to specify the minimum number of idle instances, you must have a paid app.
- A low minimum helps keep your running costs down during idle periods, but means that fewer instances may be immediately available to respond to a sudden load spike.
- A high minimum allows you to prime the application for rapid spikes in request load. App Engine keeps that number of instances in reserve at all times, so an instance is always available to serve an incoming request, but you pay for those instances. Once you've set the minimum number of idle instances, you can see these instances marked as "Resident" in the Instances tab of the Admin Console.
Note: If you set a minimum number of idle instances, the pending latency setting will have less effect on your application's performance. Because App Engine keeps idle instances in reserve, requests are unlikely to enter the pending queue except during exceptionally high load spikes. You will need to test your application against your expected traffic volume to determine the ideal number of instances to keep in reserve.
You can also set a maximum number of idle instances.
- A high maximum reduces the number of idle instances more gradually when load levels return to normal after a spike. This helps your application maintain steady performance through fluctuations in request load, but also raises the number of idle instances (and consequent running costs) during such periods of heavy load.
- A low maximum keeps running costs lower, but can degrade performance in the face of volatile load levels.
- You can set the maximum to automatic to let App Engine decide what's best under the current traffic conditions. This is the default.
Note: When settling back to normal levels after a load spike, the number of idle instances may temporarily exceed your specified maximum. However, you will not be charged for more instances than the maximum number you've specified.
Here is what these settings look like in the configuration file:
    automatic_scaling:
      min_idle_instances: 5
      max_idle_instances: automatic
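For contrast, an app that favors low running costs over spike readiness might keep a small reserve and cap the number of idle instances. The values below are hypothetical and only illustrate the trade-offs described above; suitable numbers depend on testing against your own traffic:

```yaml
automatic_scaling:
  # Small reserve: cheaper during idle periods, but fewer instances
  # are immediately available to respond to a sudden load spike.
  min_idle_instances: 1
  # Low cap: keeps running costs lower, but can degrade performance
  # when load levels are volatile.
  max_idle_instances: 2
```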
Setting the Pending Latency
Note: In order to specify the maximum pending latency, you must have a paid app.
Pending request latency arises when all of your application's available instances are too busy to serve new requests. When this happens, incoming requests go to a pending request queue. The scheduler automatically manages creation of new instances for pending requests, but you can adjust its behavior through minimum and maximum latency settings. These settings effectively control how long a request waits in the pending queue when there are no available instances: no less than the minimum, and no more than the maximum.
The App Engine scheduler always waits at least the specified minimum latency for a free instance to become available. Once the minimum is reached, it applies heuristics to determine if and when to start a new instance. (Waiting for an existing instance to become free may be faster than starting a new one.) If the request is still pending when the specified maximum latency is reached, App Engine immediately starts a new instance to serve it.
Note: If you set a minimum number of idle instances, the Pending Latency controls will have little or no effect on your app's performance. See Minimum Idle Instances for more information.
The minimum pending latency setting is the amount of time (at least 10 milliseconds) that the scheduler will wait for a free instance to serve the request.
- A low minimum means requests must spend less time in the pending queue when all existing instances are active. This improves performance but increases the cost of running your application.
- A high minimum means requests will remain pending longer if all existing instances are active. This lowers running costs but increases the time users must wait for their requests to be served.
The maximum pending latency setting is the amount of time (at most 15 seconds) that the scheduler will wait before resolving to create a new instance for the request.
- A low maximum means App Engine will start new instances sooner for pending requests, improving performance but raising running costs.
- A high maximum means users may wait longer for their requests to be served (if there are pending requests and no idle instances to serve them), but your application will cost less to run.
Here is what these settings look like in the configuration file:
    automatic_scaling:
      min_pending_latency: 30ms
      max_pending_latency: automatic
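Putting the pieces together, a single configuration might combine the instance class with both groups of scheduler settings. This is only a sketch with hypothetical values, assuming the `instance_class` key from the app.yaml reference; the latency values stay within the limits stated above (at least 10 milliseconds for the minimum, at most 15 seconds for the maximum):

```yaml
# Hypothetical combined configuration; every value here is illustrative.
instance_class: F2

automatic_scaling:
  # Keep two instances in reserve to absorb sudden spikes (billed while idle).
  min_idle_instances: 2
  # Let App Engine decide how many idle instances to retain after a spike.
  max_idle_instances: automatic
  # Wait at least this long for a free instance before considering a new one.
  min_pending_latency: 100ms
  # Start a new instance if a request is still pending after this long.
  max_pending_latency: 500ms
```

Keep in mind that once a minimum number of idle instances is set, the pending latency values will have less effect on performance, as the notes above explain.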