Instances are the computing units that App Engine uses to automatically scale your application. At any given time, your application can be running on one instance or many instances, with requests being spread across all of them.
Instances that use manual or basic scaling run indefinitely, but there is no uptime guarantee. Hardware or software failures that cause early termination or frequent restarts can occur without warning and can take considerable time to resolve.
All flexible instances are restarted on a weekly basis. During restarts, critical, backwards-compatible updates are automatically rolled out to the underlying operating system. Your application's image will remain the same across restarts.
App Engine sends periodic health check requests to confirm that an instance is running, and to check that an instance is fully started and ready to accept incoming requests. By default, these health checks are enabled and are known as split health checks. An instance that receives a health check must answer the health check within a specified time interval.
If you need to extend the default behavior of split health checks to your application, you can customize the app.yaml file to configure two types of health checks:
- Liveness checks detect that a VM instance and its container are running. When a VM instance fails the liveness check, the instance is restarted automatically. Liveness checks can fail due to the configured thresholds and time intervals, or due to the container crashing.
- Readiness checks detect that a VM instance is ready to accept incoming requests. If a VM instance fails the readiness check, it means that the VM instance has not finished its startup and is not ready to receive requests. When the VM instance passes the readiness check and has completed its startup, it is added to the pool of available instances.
Learn more about split health check behaviors in the Migrating to Split Health Checks guide.
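As a sketch, a customized health check configuration in app.yaml might look like the following. The paths and thresholds shown are illustrative, not required values:

```yaml
liveness_check:
  path: "/liveness_check"     # endpoint polled to confirm the container is running
  check_interval_sec: 30      # time between liveness checks
  timeout_sec: 4              # how long to wait for a response
  failure_threshold: 2        # consecutive failures before the instance is restarted
  success_threshold: 2        # consecutive successes before the instance is considered healthy again

readiness_check:
  path: "/readiness_check"    # endpoint polled to confirm the instance can accept traffic
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2        # consecutive failures before the instance stops receiving traffic
  success_threshold: 2        # consecutive successes before the instance starts receiving traffic
  app_start_timeout_sec: 300  # maximum time allowed for the instance to finish startup
```

Your application must serve a `200 OK` response on each configured path for the check to pass.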
As the instance goes through these health checks, the App Engine logs can indicate that the instance is in any of the following states:
- Healthy. The instance received the health check requests and is processing them. A healthy application should respond to a health check with an HTTP status code of 200 OK.
- Unhealthy. The instance failed to respond to a specified number of consecutive health check requests. App Engine continues to send health check requests, and restarts the instance if it fails to respond to a predetermined number of consecutive health checks.
- Lameduck. The instance is scheduled to be shut down or restarted. During shutdown, the instance finishes ongoing requests and refuses new ones. The app returns a 503 status code to indicate that the instance is unable to handle requests. Before an instance shuts down or restarts, the shutdown script has a fixed time period to run; this period cannot be made shorter or longer.
- App Lameduck. The instance is preparing to serve traffic. The app returns a 503 status code to indicate that the instance is not yet able to handle requests. When a VM instance has completed startup and is ready to serve traffic, the instance becomes healthy and processes requests. If a VM instance does not start up in time, the instance changes to unhealthy and is removed.
Both lameduck and app lameduck behaviors are part of a normal process that the VM instance goes through.
Monitoring resource usage
The Instances page of the Cloud Console provides visibility into how your instances are performing. You can see the memory and CPU usage of each instance, uptime, number of requests, and other statistics. You can also manually initiate the shutdown process for any instance.
Instances are automatically located by geographical region according to the project settings.
While an application is running, incoming requests are routed to an existing or new instance of the appropriate service/version. Each active version must have at least one instance running, and the scaling type of a service/version controls how additional instances are created. Scaling settings are configured in the app.yaml file. There are two scaling types:
- Automatic scaling creates instances based on request rate, response latencies, and other application metrics. You can specify thresholds for each of these metrics, as well as a minimum number of instances to keep running at all times.
- Manual scaling specifies the number of instances that continuously run regardless of the load level. This allows for tasks such as complex initialization, and suits applications that rely on in-memory state over time.
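As a sketch, the scaling type is selected by including one of the following blocks in app.yaml; a service uses one or the other, and the values shown are illustrative:

```yaml
# Automatic scaling: App Engine adds and removes instances based on load.
automatic_scaling:
  min_num_instances: 2        # minimum number of instances kept running at all times
  max_num_instances: 10       # upper bound on instances created under load
  cpu_utilization:
    target_utilization: 0.6   # target average CPU utilization before scaling out

# Manual scaling: a fixed number of instances runs regardless of load.
# manual_scaling:
#   instances: 3
```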