How Instances are Managed

Instances are the computing units that App Engine uses to automatically scale your application. At any given time, your application can be running on one instance or many instances, with requests being spread across all of them.

Your instances with manual and basic scaling should run indefinitely, but there is no uptime guarantee. Hardware or software failures that cause early termination or frequent restarts can occur without warning and can take considerable time to resolve.

All flexible instances are restarted on a weekly basis. During restarts, critical, backwards-compatible updates are automatically rolled out to the underlying operating system. Your application's image will remain the same across restarts.

Health checking

App Engine sends periodic health check requests to confirm that an instance has been successfully deployed, and to check that a running instance maintains a healthy status. Each health check must be answered within a specified time interval. An instance is unhealthy when it fails to respond to a specified number of consecutive health check requests. An unhealthy instance will not receive any client requests, but health checks will still be sent. If an unhealthy instance continues to fail to respond to a predetermined number of consecutive health checks, it will be restarted.
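The lifecycle described above can be sketched as a small state tracker. This is an illustration only, not App Engine's implementation: the class name, the two threshold values, and the rule that a single successful response immediately restores a healthy state are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks consecutive failed health checks for one instance (illustrative only)."""

    unhealthy_after: int = 2  # hypothetical: failures before client traffic stops
    restart_after: int = 4    # hypothetical: failures before the instance restarts
    consecutive_failures: int = 0

    def record(self, responded_in_time: bool) -> str:
        """Record one health check result and return the resulting state."""
        if responded_in_time:
            # Assumption: one timely response resets the failure count.
            self.consecutive_failures = 0
            return "healthy"
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.restart_after:
            self.consecutive_failures = 0  # a restarted instance starts fresh
            return "restart"
        if self.consecutive_failures >= self.unhealthy_after:
            # Still health-checked, but no client requests are routed to it.
            return "unhealthy"
        return "healthy"


tracker = HealthTracker()
states = [tracker.record(ok) for ok in (True, False, False, False, False, True)]
print(states)
```

With these assumed thresholds, the instance stops receiving traffic after the second consecutive failure and is restarted after the fourth.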

There are two types of health checks: liveness and readiness. Liveness checks confirm that an instance and its container are running; instances that fail the check are restarted. Readiness checks confirm that an instance is ready to accept incoming requests; requests are not forwarded to instances that fail the check. Both are customizable through your app's app.yaml file.

A healthy application should respond to a health check with an HTTP status code of 200.
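As a minimal sketch of an application answering health checks with a 200, the following uses only the Python standard library. The paths /liveness_check and /readiness_check, the port 8080, and the names HealthHandler and serve are assumptions for this example; the paths your app must answer depend on what is configured in app.yaml.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed health check paths; adjust to match your app.yaml settings.
HEALTH_PATHS = {"/liveness_check", "/readiness_check"}


class HealthHandler(BaseHTTPRequestHandler):
    """Answers health checks with 200 and serves a placeholder app response."""

    def do_GET(self):
        if self.path in HEALTH_PATHS:
            body = b"ok"
        else:
            body = b"Hello from the application"  # placeholder for real routes
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep frequent health check requests out of the default log output.
        pass


def serve(port: int = 8080) -> None:
    """Start the server and block; the port is an assumption for this sketch."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Calling serve() starts the server. A real readiness handler would typically return a non-200 status until dependencies such as database connections are available, so that requests are not forwarded too early.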

Monitoring resource usage

The Instances page of the Cloud Console provides visibility into how your instances are performing. You can see the memory and CPU usage of each instance, uptime, number of requests, and other statistics. You can also manually initiate the shutdown process for any instance.

Instance location

Instances are automatically located by geographical region according to the project settings.

Instance scaling

While an application is running, incoming requests are routed to an existing or new instance of the appropriate service/version. Each active version must have at least one instance running, and the scaling type of a service/version controls how additional instances are created. Scaling settings are configured in the app.yaml file. There are two scaling types:

Automatic scaling
Automatic scaling creates instances based on request rate, response latencies, and other application metrics. You can specify thresholds for each of these metrics, as well as a minimum number of instances to keep running at all times.
Manual scaling
Manual scaling specifies the number of instances that run continuously, regardless of the load level. This suits tasks such as complex initializations and applications that rely on the state of memory over time.