Instances are the computing units that App Engine uses to automatically scale your application. At any given time, your application can be running on one instance or many instances, with requests being spread across all of them.
Your instances with manual and basic scaling should run indefinitely, but there is no uptime guarantee. Hardware or software failures that cause early termination or frequent restarts can occur without warning and can take considerable time to resolve.
All flexible instances are restarted on a weekly basis. During restarts, critical, backwards-compatible updates are automatically rolled out to the underlying operating system. Your application's image will remain the same across restarts.
App Engine sends periodic health check requests to confirm that an instance has been successfully deployed, and to check that a running instance maintains a healthy status. Each health check must be answered within a specified time interval. An instance is unhealthy when it fails to respond to a specified number of consecutive health check requests. An unhealthy instance will not receive any client requests, but health checks will still be sent. If an unhealthy instance continues to fail to respond to a predetermined number of consecutive health checks, it will be restarted.
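The health check life cycle described above can be pictured as a small state machine. The following is only an illustrative sketch, not App Engine's actual implementation: the class name `HealthTracker`, both threshold values, and the rule that a single on-time response makes the instance healthy again are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Toy model of one instance's health state (hypothetical, for illustration)."""

    unhealthy_threshold: int = 2  # assumed: consecutive failures before "unhealthy"
    restart_threshold: int = 4    # assumed: consecutive failures before a restart
    consecutive_failures: int = 0

    def record(self, responded_in_time: bool) -> str:
        """Record one health check result and return the resulting state."""
        if responded_in_time:
            # Simplification: one on-time response resets the failure streak.
            self.consecutive_failures = 0
            return "healthy"

        self.consecutive_failures += 1
        if self.consecutive_failures >= self.restart_threshold:
            # The instance kept failing while unhealthy, so it is restarted
            # and its failure streak starts over.
            self.consecutive_failures = 0
            return "restart"
        if self.consecutive_failures >= self.unhealthy_threshold:
            # Unhealthy instances get no client traffic but are still checked.
            return "unhealthy"
        return "healthy"
```

Feeding the tracker a sequence of results, such as one success followed by four missed checks, walks it through the healthy, unhealthy, and restart states in the order the paragraph describes.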
There are two types of health checks: updated and legacy.
Updated health check requests are enabled by default and have default threshold values. You can customize health checking by adding an optional health check section to your app's app.yaml file. You can also disable health checks entirely.
Whichever type of health check you decide to use, a healthy application should
respond with an HTTP status code of
Monitoring resource usage
The Instances page of the GCP Console provides visibility into how your instances are performing. You can see the memory and CPU usage of each instance, uptime, number of requests, and other statistics. You can also manually initiate the shutdown process for any instance.
Instances are automatically located by geographical region according to the project settings.
While an application is running, incoming requests are routed to an existing or new instance of the appropriate service/version. The scaling type of a service/version controls how instances are created. Scaling settings are configured in the app.yaml file. There are two scaling types:
- Manual scaling
- A service with manual scaling uses resident instances: the number of instances you specify runs continuously, irrespective of the load level. This supports tasks such as complex initialization and applications that rely on in-memory state over time.
- Automatic scaling
- Services with automatic scaling use dynamic instances that are created based on request rate, response latencies, and other application metrics. However, if you specify a minimum number of idle instances, that many instances run as resident instances, and any additional instances are dynamic.
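To make the resident versus dynamic distinction concrete, here is a minimal sketch of how a running instance count might be divided under each scaling type. The function `split_instances`, its parameters, and the `InstanceSplit` type are hypothetical names invented for this example; the sketch models only the bookkeeping described above, not how App Engine decides how many instances to create.

```python
from typing import NamedTuple


class InstanceSplit(NamedTuple):
    resident: int
    dynamic: int


def split_instances(scaling: str, instances: int,
                    min_idle_instances: int = 0) -> InstanceSplit:
    """Classify instances as resident or dynamic (illustrative model only).

    For "manual", `instances` is the configured instance count.
    For "automatic", `instances` is the number currently needed by load.
    """
    if scaling == "manual":
        # Every manually scaled instance is resident, whatever the load.
        return InstanceSplit(resident=instances, dynamic=0)
    if scaling == "automatic":
        # Assumption: the idle minimum is always kept running, so the total
        # never drops below it; anything above the minimum is dynamic.
        total = max(instances, min_idle_instances)
        return InstanceSplit(resident=min_idle_instances,
                             dynamic=total - min_idle_instances)
    raise ValueError(f"unknown scaling type: {scaling!r}")
```

For example, `split_instances("automatic", 5, min_idle_instances=2)` yields two resident and three dynamic instances, while `split_instances("manual", 3)` yields three resident instances and no dynamic ones.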