Instances are the computing units that App Engine uses to automatically scale your application. At any given time, your application can be running on one instance or many instances, with requests being spread across all of them.
Your instances with manual and basic scaling should run indefinitely, but there is no uptime guarantee. Hardware or software failures that cause early termination or frequent restarts can occur without warning and can take considerable time to resolve.
All flexible instances are restarted on a weekly basis. During restarts, critical, backwards-compatible updates are automatically rolled out to the underlying operating system. Your application's image will remain the same across restarts.
App Engine sends periodic health check requests to confirm that an instance has been successfully deployed, and to check that a running instance maintains a healthy status. Each health check must be answered within a specified time interval. An instance is unhealthy when it fails to respond to a specified number of consecutive health check requests. An unhealthy instance will not receive any client requests, but health checks will still be sent. If an unhealthy instance continues to fail to respond to a predetermined number of consecutive health checks, it will be restarted.
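The bookkeeping described above can be sketched as a small state tracker. This is an illustrative toy model, not App Engine's implementation: the class name `HealthTracker`, the parameter names, and the threshold values are all hypothetical, and the model assumes that a single timely response immediately returns the instance to a healthy state.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Toy model of consecutive-failure tracking for one instance."""

    # Hypothetical: consecutive failures before the instance is marked unhealthy.
    unhealthy_threshold: int = 2
    # Hypothetical: further consecutive failures, counted after the instance
    # is marked unhealthy, before it is restarted.
    restart_after: int = 2
    consecutive_failures: int = 0

    def record(self, responded_in_time: bool) -> str:
        """Record one health check result and return the resulting state."""
        if responded_in_time:
            # Simplification: one good response resets the failure count.
            self.consecutive_failures = 0
            return "healthy"
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.unhealthy_threshold + self.restart_after:
            # A restarted instance starts with a clean failure count.
            self.consecutive_failures = 0
            return "restart"
        if self.consecutive_failures >= self.unhealthy_threshold:
            # Unhealthy instances stop receiving client requests,
            # but health checks keep arriving.
            return "unhealthy"
        return "healthy"


tracker = HealthTracker(unhealthy_threshold=2, restart_after=2)
states = [tracker.record(ok) for ok in (True, False, False, False, False)]
```

With these assumed thresholds, the instance is marked unhealthy on the second consecutive failure and restarted on the fourth.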
Health check requests are enabled by default, with default threshold values. You can customize health checking by adding an optional health check section to your app's configuration file.
You do not have to do anything special to implement health checking. If your app does not handle health checks, an HTTP 404 response is interpreted as a successful reply to the health check.
You can write your own custom health-checking code. It should reply to /_ah/health requests with a successful HTTP status code. The response must include a message body; however, the value of the body is ignored (it can be empty).
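A custom handler for /_ah/health might look like the following minimal sketch, written with Python's standard library only. The names `HealthHandler` and `make_server`, the body text, the use of status 200 as the success code, and the default port 8080 are assumptions made for illustration; a real app would normally add this route to its existing web framework instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers /_ah/health with a success status; 404 for everything else."""

    def do_GET(self):
        if self.path == "/_ah/health":
            body = b"ok"  # the body's value is ignored by the health checker
            self.send_response(200)  # assumed success status
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Suppress per-request logging to keep health check noise out of logs.
        pass


def make_server(port: int = 8080) -> HTTPServer:
    """Build a server bound to all interfaces (port 8080 is an assumption)."""
    return HTTPServer(("", port), HealthHandler)
```

Calling `make_server().serve_forever()` starts the server. A place to add real checks (for example, verifying a database connection) would be just before `send_response`, returning an error status when a dependency is unavailable.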
Monitoring resource usage
The Instances page of the GCP Console provides visibility into how your instances are performing. You can see the memory and CPU usage of each instance, uptime, number of requests, and other statistics. You can also manually initiate the shutdown process for any instance.
Instances are automatically located by geographical region according to the project settings.
While an application is running, incoming requests are routed to an existing or new instance of the appropriate service/version. The scaling type of a service/version controls how instances are created. Scaling settings are configured in your app's configuration file. There are two scaling types:
- Manual scaling
- A service with manual scaling uses resident instances that continuously run the specified number of instances irrespective of the load level. This allows tasks such as complex initializations and applications that rely on the state of the memory over time.
- Automatic scaling
- A service with automatic scaling uses dynamic instances that are created based on request rate, response latencies, and other application metrics. However, if you specify a minimum number of idle instances, that many instances run as resident instances, and any additional instances are dynamic.
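The distinction between resident and dynamic instances in the two scaling types can be illustrated with a small helper. This is a toy model under stated assumptions, not an App Engine API: the function `split_instances` and its parameters are hypothetical and only restate the rules from the list above.

```python
def split_instances(running: int, min_idle: int = 0, manual: bool = False) -> dict:
    """Classify running instances as resident or dynamic (toy model)."""
    if running < 0 or min_idle < 0:
        raise ValueError("instance counts must be non-negative")
    if manual:
        # Manual scaling: every instance is resident.
        return {"resident": running, "dynamic": 0}
    # Automatic scaling: up to min_idle instances are resident,
    # and anything beyond that is dynamic.
    resident = min(running, min_idle)
    return {"resident": resident, "dynamic": running - resident}


auto = split_instances(5, min_idle=2)
manual = split_instances(3, manual=True)
```

Here `auto` models an automatically scaled service with five running instances and two minimum idle instances, and `manual` models a manually scaled service with three instances.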