An Overview of App Engine

App Engine and Services

At the highest level, an App Engine application is made up of one or more services. Services let developers factor large applications into logical components. These components share App Engine features, such as Memcache, and communicate securely with one another, yet each can be configured to use a different runtime and to operate with different performance settings.

A deployed service behaves like a microservice. By using multiple services you can deploy your app as a set of microservices.

An app that handles customer requests might include separate services to handle other tasks, such as:

  • API requests from mobile devices
  • Internal, admin-like requests
  • Backend processing such as billing pipelines and data analysis

Pieces of code are defined as services by an entry in an app.yaml configuration file. When you deploy your code, the related configuration file is deployed alongside it.

Versions and instances

Each service consists of source code and the configuration file. The files used by a service represent a version of the service. When you deploy a service, you always deploy a specific version of the service. Having versions for each of your services allows you to roll back with a single click in the GCP Console, or to use traffic splitting to gradually increase traffic to the newly deployed version of a service.
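Traffic splitting can be pictured as a weighted choice between versions. The sketch below is illustrative only and is not App Engine's routing logic: the function name pick_version, the version names, and the weights are all assumptions. It hashes a stable key into a bucket so that the same caller keeps landing on the same version.

```python
import hashlib


def pick_version(split, key):
    """Pick a version name for `key` according to the weights in `split`.

    `split` maps hypothetical version names to fractions that sum to 1.0.
    The same key always maps to the same version.
    """
    total = sum(split.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    # Hash the key into one of 10,000 buckets, then scale into [0, 1).
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    point = (int(digest, 16) % 10000) / 10000.0
    cumulative = 0.0
    for version in sorted(split):
        cumulative += split[version]
        if point < cumulative:
            return version
    return sorted(split)[-1]  # guard against floating-point rounding
```

For example, pick_version({"v1": 0.9, "v2": 0.1}, "203.0.113.7") returns either "v1" or "v2", and returns the same answer every time for that key. Raising the weight of "v2" step by step mimics a gradual rollout.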

Each service and each version must have a name. Choose a unique name for each service and each version. Don't reuse names between services and versions.

While running, a particular version will have one or more instances. By default, App Engine scales the number of running instances up and down to match the load, which keeps performance consistent while minimizing idle instances and reducing cost.

The diagram below illustrates the hierarchy of a running App Engine application:

[Figure: hierarchy graph of services, versions, and instances]
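The same hierarchy can be sketched as nested data. This is only an illustrative model, not an App Engine API; the class names and the sample service, version, and instance names are hypothetical.

```python
class Version(object):
    def __init__(self, name, instances):
        self.name = name
        self.instances = list(instances)  # instance identifiers


class Service(object):
    def __init__(self, name, versions):
        self.name = name
        self.versions = list(versions)


class Application(object):
    def __init__(self, services):
        self.services = list(services)

    def instance_paths(self):
        """Yield a (service, version, instance) tuple for each instance."""
        for service in self.services:
            for version in service.versions:
                for instance in version.instances:
                    yield (service.name, version.name, instance)


# An application with two services; "default" runs two versions at once.
app = Application([
    Service("default", [Version("web-v1", [0, 1]), Version("web-v2", [0])]),
    Service("billing", [Version("billing-v1", [0])]),
])
paths = list(app.instance_paths())
```

Each tuple in paths names one running instance by walking from the application down through a service and a version.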

Scaling types and instance classes

When you upload a version of a service, the configuration file specifies a scaling type and instance class that apply to every instance of that version. The scaling type controls how instances are created. The instance class determines compute resources (memory size and CPU speed) and pricing. There are three scaling types: manual, basic, and automatic. The available instance classes depend on the scaling type.

  • Automatic scaling: Scaling is based on request rate, response latencies, and other application metrics.
  • Manual scaling: The service runs continuously, allowing you to perform complex initialization and rely on the state of its memory over time.
  • Basic scaling: The service creates an instance when the application receives a request, and the instance is turned down when the app becomes idle. Basic scaling is ideal for work that is intermittent or driven by user activity.
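As a sketch of the kind of initialization that manual scaling permits, the WSGI application below loads data into memory when it receives the start request that App Engine sends to /_ah/start, then serves later requests from that in-memory state. The names load_dataset and STATE and the sample data are hypothetical, and the handler is plain WSGI rather than any particular framework.

```python
STATE = {}  # in-memory state kept for the lifetime of the instance


def load_dataset():
    """Hypothetical stand-in for slow initialization work."""
    return {"greeting": "hello"}


def application(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/plain")]
    if path == "/_ah/start":
        # Start request: do the expensive setup once.
        STATE["dataset"] = load_dataset()
        start_response("200 OK", headers)
        return [b"started"]
    if "dataset" not in STATE:
        # A request arrived before initialization finished.
        start_response("503 Service Unavailable", headers)
        return [b"not initialized"]
    start_response("200 OK", headers)
    return [STATE["dataset"]["greeting"].encode("utf-8")]
```

Keeping the dataset in a module-level dictionary only makes sense when the instance stays resident, which is why this pattern suits manual scaling better than automatic scaling.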

The following comparison covers the performance features of the three scaling types:

Deadlines
  • Automatic scaling: 60-second deadline for HTTP requests, 10-minute deadline for task queue tasks.
  • Manual scaling: Requests can run for up to 24 hours. A manually-scaled instance can choose to handle /_ah/start and execute a program or script for many hours without returning an HTTP response code. Task queue tasks can run up to 24 hours.
  • Basic scaling: Same as manual scaling.

Background threads
  • Automatic scaling: Not allowed
  • Manual scaling: Allowed
  • Basic scaling: Allowed

Residence
  • Automatic scaling: Instances are evicted from memory based on usage patterns.
  • Manual scaling: Instances remain in memory, and state is preserved across requests. When instances are restarted, an /_ah/stop request appears in the logs. If there is a registered shutdown hook, it has 30 seconds to complete before shutdown occurs.
  • Basic scaling: Instances are evicted based on the idle_timeout parameter. If an instance has been idle (for example, because it has not received a request) for longer than idle_timeout, it is evicted.

Startup and shutdown
  • Automatic scaling: Instances are created on demand to handle requests and automatically turned down when idle.
  • Manual scaling: Instances are sent a start request automatically by App Engine in the form of an empty GET request to /_ah/start. An instance that is manually stopped has 30 seconds to finish handling requests before it is forcibly terminated.
  • Basic scaling: Instances are created on demand to handle requests and automatically turned down when idle, based on the idle_timeout configuration parameter. As with manual scaling, an instance that is manually stopped has 30 seconds to finish handling requests before it is forcibly terminated.

Instance addressability
  • Automatic scaling: Instances are anonymous.
  • Manual scaling: Instance "i" of version "v" of service "s" is addressable at its own URL. If you have set up a wildcard subdomain mapping for a custom domain, you can also address a service or any of its instances through that domain. You can reliably cache state in each instance and retrieve it in subsequent requests.
  • Basic scaling: Same as manual scaling.

Scaling
  • Automatic scaling: App Engine scales the number of instances automatically in response to processing volume. This scaling factors in the automatic_scaling settings that are provided on a per-version basis in the configuration file.
  • Manual scaling: You configure the number of instances of each version in that service's configuration file. The number of instances usually corresponds to the size of a dataset being held in memory or the desired throughput for offline work. You can adjust the number of instances of a manually-scaled version very quickly, without stopping instances that are currently running, using the Modules API set_num_instances function.
  • Basic scaling: A service with basic scaling is configured by setting the maximum number of instances in the max_instances parameter of the basic_scaling setting. The number of live instances scales with the processing volume.

Free daily usage quota
  • Automatic scaling: 28 instance-hours
  • Manual scaling: 8 instance-hours
  • Basic scaling: 8 instance-hours
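The comparison above can be turned into a small lookup for choosing a scaling type. The sketch below encodes only the HTTP request deadline and the background-thread rule; the dictionary layout and the function name candidate_scaling_types are assumptions made for illustration.

```python
DAY = 24 * 60 * 60  # seconds

# Values transcribed from the comparison above.
SCALING_FEATURES = {
    "automatic": {"http_deadline_s": 60, "background_threads": False},
    "manual": {"http_deadline_s": DAY, "background_threads": True},
    "basic": {"http_deadline_s": DAY, "background_threads": True},
}


def candidate_scaling_types(request_seconds, needs_background_threads=False):
    """Return the scaling types whose limits fit the workload, sorted by name."""
    matches = []
    for name, features in SCALING_FEATURES.items():
        if request_seconds > features["http_deadline_s"]:
            continue  # request would exceed the deadline
        if needs_background_threads and not features["background_threads"]:
            continue  # background threads are not allowed
        matches.append(name)
    return sorted(matches)
```

With these values, candidate_scaling_types(30) returns all three types, while candidate_scaling_types(300) returns only "basic" and "manual" because a five-minute request exceeds the 60-second deadline of automatic scaling.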

Communication between services

Every service, version, and instance has its own unique URI. Incoming user requests are routed to an instance of a particular service/version according to URL addressing conventions and an optional customized dispatch file.

You can also pass requests between services and from services to external endpoints using the URL Fetch API.
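A minimal sketch of a service-to-service call follows. To keep it runnable outside App Engine, the fetch function is passed in as a parameter; on App Engine you would pass urlfetch.fetch from google.appengine.api. The function name call_service and the target URL in the comment are hypothetical. The sketch relies only on the status_code and content attributes of the returned response.

```python
def call_service(fetch, url, deadline=10):
    """Fetch `url` with the given fetch function and return the response body.

    Raises RuntimeError on a non-200 response.
    """
    response = fetch(url, deadline=deadline)
    if response.status_code != 200:
        raise RuntimeError(
            "request to {} failed with status {}".format(url, response.status_code))
    return response.content


# On App Engine (assumed usage, with a hypothetical URL):
#   from google.appengine.api import urlfetch
#   body = call_service(urlfetch.fetch, "http://billing.example.com/invoices")
```

Passing the fetch function in also makes the caller easy to test with a stub in place of the real API.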

All the services in an application share the state of the Datastore and Memcache services. They can also collaborate by passing work to one another through Task Queues. To access these shared services, use the corresponding App Engine APIs. Calls to these APIs are automatically mapped to the application’s namespace.


The maximum number of services and versions that you can deploy depends on your app's pricing:

Limit                       Free app    Paid app
Maximum services per app    5           105
Maximum versions per app    15          210
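As a sanity check before deploying, the limits above can be encoded in a few lines. This is only a sketch: the function name within_limits and the dictionary layout are hypothetical, and the numbers are copied from the table above.

```python
# Maximum services and versions per app, copied from the table above.
LIMITS = {
    "free": {"services": 5, "versions": 15},
    "paid": {"services": 105, "versions": 210},
}


def within_limits(services, versions, paid=False):
    """Return True if the planned counts fit the limits for the app's pricing."""
    limits = LIMITS["paid" if paid else "free"]
    return services <= limits["services"] and versions <= limits["versions"]
```

For example, within_limits(6, 15) is False for a free app because six services exceed the limit of five, while within_limits(6, 15, paid=True) is True.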

There is also a limit to the number of instances for each service with basic or manual scaling:

Maximum instances per manual/basic scaling version

Free app    Paid app US                Paid app EU
20          25 (200 for us-central)    25
