How instances are managed

Instances are the computing units that App Engine uses to automatically scale your application. At any given time, your application can be running on one instance or many instances, with requests being spread across all of them.

Instances with manual scaling are intended to run indefinitely, but uptime is not guaranteed: instances can be terminated early because of failures or restarted to apply updates. Hardware or software failures that cause early termination or frequent restarts can occur without warning and can take considerable time to resolve.

All flexible instances may be restarted on a weekly basis if there are updates available. This schedule is not guaranteed. During restarts, critical, backwards-compatible updates are automatically rolled out to the underlying operating system. Your application's image will remain the same across restarts.

Health checking

App Engine sends periodic health check requests to confirm that an instance is running, and to check that an instance is fully started and ready to accept incoming requests. By default, these health checks are enabled and are known as split health checks. An instance that receives a health check must answer the health check within a specified time interval.

To extend the default behavior of split health checks for your application, you can customize the app.yaml file to configure two types of health checks:

  • Liveness checks detect that a VM instance and its container are running. When a VM instance fails the liveness check, the instance is restarted automatically. A liveness check can fail because the instance does not respond within the configured thresholds and time intervals, or because the container has crashed.
  • Readiness checks detect that a VM instance is ready to accept incoming requests. If a VM instance fails the readiness check, it means that the VM instance has not finished its startup and is not ready to receive requests. When the VM instance passes the readiness check and has completed its startup, it is added to the pool of available instances.

Learn more about split health check behaviors in the Migrating to Split Health Checks guide.
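
As a rough illustration of the two check types described above, the following app.yaml fragment sketches one possible configuration. The liveness_check and readiness_check elements and their fields are taken from the flexible environment app.yaml reference as best understood here; the paths and numeric values are illustrative assumptions, not recommendations, and should be verified against the reference before use.

```yaml
# Sketch only: paths and values are illustrative assumptions.
liveness_check:
  path: "/liveness_check"      # hypothetical path your app answers with HTTP 200
  check_interval_sec: 30       # seconds between checks
  timeout_sec: 4               # seconds to wait for each response
  failure_threshold: 2         # consecutive failures before the instance is restarted
  success_threshold: 2         # consecutive successes before it counts as live again
  initial_delay_sec: 300       # grace period after startup before checks count

readiness_check:
  path: "/readiness_check"     # hypothetical path your app answers with HTTP 200
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2         # consecutive failures before traffic is withheld
  success_threshold: 2         # consecutive successes before traffic is sent
  app_start_timeout_sec: 300   # maximum time allowed for the app to become ready
```

With a fragment like this, the application would need to serve both paths itself and return a 200 status once it is able to handle requests.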

As the instance goes through these health checks, the App Engine logs can indicate that the instance is in any of the following states:

  • Healthy. The instance received the health check requests and is processing the requests. A healthy application should respond to a health check with an HTTP status code of 200.
  • Unhealthy. The instance refused the health check requests and failed to respond to a specified number of consecutive health check requests. App Engine continues to send health check requests, and restarts the instance if it keeps failing to respond for a predetermined number of consecutive health checks.
  • Lameduck. The instance is scheduled to be shut down or restarted. During shutdown, the instance finishes ongoing requests and refuses new ones. The app returns a 503 code to indicate that the instance is unable to handle requests. Before an instance shuts down or restarts, the shutdown script has a limited time to run, and this time limit cannot be shortened or extended.
  • App Lameduck. The instance is preparing to serve traffic. The app returns a 503 code to indicate that the instance is unable to handle requests. When a VM instance has completed startup and is ready to serve traffic, the instance will become healthy and process requests. If a VM instance does not start up in time, the instance changes to unhealthy and gets removed.

Both lameduck and app lameduck behaviors are part of a normal process that the VM instance goes through.

Monitoring resource usage

The Instances page of the Google Cloud console provides visibility into how your instances are performing. You can see the memory and CPU usage of each instance, uptime, number of requests, and other statistics. You can also manually initiate the shutdown process for any instance.

NTP with App Engine flexible environment

The App Engine flexible environment provides network time protocol (NTP) services that use Google NTP servers. However, the NTP services in the flexible environment cannot be modified.

Instance location

Instances are automatically located by geographical region according to the project settings.

Instance scaling

While an application is running, incoming requests are routed to an existing or new instance of the appropriate service/version. Each active version must have at least one instance running, and the scaling type of a service/version controls how additional instances are created. You specify the scaling type in your app's app.yaml. By default, your app uses automatic scaling, which means App Engine will manage the number of idle instances.

Automatic scaling
Automatic scaling creates instances based on request rate, response latencies, and other application metrics. You can specify thresholds for each of these metrics, as well as a minimum number of instances to keep running at all times, by configuring the automatic_scaling element.
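
A minimal sketch of an automatic_scaling element in app.yaml might look like the following. The field names are those of the flexible environment app.yaml reference as best understood here, and the values are illustrative assumptions. This sketch uses a CPU utilization target as its example threshold; check the reference for the full set of metrics supported in your environment.

```yaml
# Sketch only: values are illustrative assumptions.
automatic_scaling:
  min_num_instances: 2         # instances kept running at all times
  max_num_instances: 10        # upper bound on instances created under load
  cool_down_period_sec: 180    # wait time before metrics from a new instance are used
  cpu_utilization:
    target_utilization: 0.6    # scale out when average CPU use exceeds this fraction
```
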
Manual scaling
Manual scaling specifies the number of instances that run continuously regardless of the load level. This supports tasks such as complex initialization and applications that rely on in-memory state over time.
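
For comparison, a manual scaling configuration in app.yaml could be as small as the sketch below. The manual_scaling element and its instances field come from the app.yaml reference; the instance count is an arbitrary example.

```yaml
# Sketch only: the instance count is an arbitrary example.
manual_scaling:
  instances: 3                 # fixed number of instances, independent of load
```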