Python 2 is no longer supported by the community. We recommend that you migrate Python 2 apps to Python 3.

How Requests are Handled

Region ID

The REGION_ID is a code that Google assigns based on the region you select when you create your app. Including REGION_ID.r in App Engine URLs is optional for existing apps and will soon be required for all new apps.

To ensure a smooth transition, we are slowly updating App Engine to use region IDs. If we haven't updated your Google Cloud project yet, you won't see a region ID for your app. Since the ID is optional for existing apps, you don't need to update URLs or make other changes once the region ID is available for your existing apps.

Learn more about region IDs.

This document describes how your App Engine application receives requests and sends responses. For more details, see the Request Headers and Responses reference.

If your application uses services, you can address requests to a specific service or a specific version of that service. For more information about service addressability, see How Requests are Routed.

Handling requests

Your application is responsible for starting a webserver and handling requests. You can use any web framework that is available for your development language.

App Engine runs multiple instances of your application, and each instance has its own web server for handling requests. Any request can be routed to any instance, so consecutive requests from the same user are not necessarily sent to the same instance. An instance can handle multiple requests concurrently. The number of instances can be adjusted automatically as traffic changes. You can also change the number of concurrent requests an instance can handle by setting the max_concurrent_requests element in your app.yaml file.

When App Engine receives a web request for your application, it calls the handler script that corresponds to the URL, as described in the application's app.yaml configuration file. The Python 2.7 runtime supports the WSGI standard and the CGI standard for backwards compatibility. WSGI is preferred, and some features of Python 2.7 do not work without it. The configuration of your application's script handlers determines whether a request is handled using WSGI or CGI.

The following Python script responds to a request with an HTTP header and the message Hello, World!.

import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello, World!')

app = webapp2.WSGIApplication([
    ('/', MainPage),
], debug=True)
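For comparison, the same response can be produced without a framework, because a WSGI handler is just a callable. The following is a minimal sketch that relies only on the WSGI interface; the name `application` is an assumption chosen for illustration and is not part of the sample above.

```python
def application(environ, start_response):
    """Bare WSGI callable that returns a plain-text greeting."""
    body = b'Hello, World!'
    headers = [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ]
    # start_response receives the status line and a list of header tuples.
    start_response('200 OK', headers)
    # WSGI expects an iterable of byte strings as the response body.
    return [body]
```

A framework such as webapp2 wraps this same interface and adds routing and request/response objects on top of it.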

To dispatch multiple requests to each web server in parallel, mark your application as threadsafe by adding threadsafe: true to your app.yaml file. Concurrent requests are not available if any script handler uses CGI.

Quotas and limits

App Engine automatically allocates resources to your application as traffic increases. However, this is bound by the following restrictions:

  • App Engine reserves automatic scaling capacity for applications with low latency, where the application responds to requests in less than one second. Applications with very high latency, such as over one second per request for many requests, and high throughput require Silver, Gold, or Platinum support. Customers with this level of support can request higher throughput limits by contacting their support representative.

  • Applications that are heavily CPU-bound may also incur some additional latency in order to efficiently share resources with other applications on the same servers. Requests for static files are exempt from these latency limits.

Each incoming request to the application counts toward the Requests limit. Data sent in response to a request counts toward the Outgoing Bandwidth (billable) limit.

Both HTTP and HTTPS (secure) requests count toward the Requests, Incoming Bandwidth (billable), and Outgoing Bandwidth (billable) limits. The Cloud Console Quota Details page also reports Secure Requests, Secure Incoming Bandwidth, and Secure Outgoing Bandwidth as separate values for informational purposes. Only HTTPS requests count toward these values. For more information, see the Quotas page.

The following limits apply specifically to the use of request handlers:

Limit                                                        Amount
request size                                                 32 megabytes
response size                                                32 megabytes
request timeout                                              depends on the type of scaling your app uses
maximum total number of files (app files and static files)   10,000 total; 1,000 per directory
maximum size of an application file                          32 megabytes
maximum size of a static file                                32 megabytes
maximum total size of all application and static files       first 1 gigabyte is free; $0.026 per gigabyte per month after first 1 gigabyte
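The storage pricing in the last row can be worked through as simple arithmetic. The sketch below assumes the charge applies linearly to everything above the free gigabyte; the function name and the absence of any rounding or proration are assumptions, so treat the result as an estimate only.

```python
FREE_GIGABYTES = 1.0
PRICE_PER_GIGABYTE_MONTH = 0.026  # USD, taken from the limits table


def monthly_storage_cost(total_gigabytes):
    """Estimate the monthly charge for application and static files."""
    # Only the portion above the free allowance is billable.
    billable = max(0.0, total_gigabytes - FREE_GIGABYTES)
    return billable * PRICE_PER_GIGABYTE_MONTH
```

Under these assumptions, an app with 3 gigabytes of files is billed for 2 gigabytes, and an app at or under 1 gigabyte is billed nothing.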

Response limits

Dynamic responses are limited to 32 MB. If a script handler generates a response larger than this limit, the server sends back an empty response with a 500 Internal Server Error status code. This limitation does not apply to responses that serve data from the Blobstore or Cloud Storage.
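Because an oversized response is replaced by an empty 500, a handler may want to check the size of a large body before writing it. The following is a minimal sketch; the helper name is hypothetical, and the exact byte value used for the limit (32 * 1024 * 1024) is an assumption.

```python
# Assumed byte value of the dynamic response limit.
MAX_DYNAMIC_RESPONSE_BYTES = 32 * 1024 * 1024


def fits_dynamic_response_limit(body):
    """Return True if the encoded body is within the assumed limit."""
    if not isinstance(body, bytes):
        # Text is measured by its encoded size, not its character count.
        body = body.encode('utf-8')
    return len(body) <= MAX_DYNAMIC_RESPONSE_BYTES
```

A handler that fails this check could return a smaller summary instead, or serve the data from the Blobstore or Cloud Storage, which are exempt from the limit.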

Request headers

An incoming HTTP request includes the HTTP headers sent by the client. For security purposes, some headers are sanitized or amended by intermediate proxies before they reach the application.

For more information, see the Request headers reference.

Request responses

App Engine calls the handler script with a Request and waits for the script to return; all data written to the standard output stream is sent as the HTTP response.

There are limits that apply to the response you generate, and the response may be modified before it is returned to the client.

For more information, see the Request responses reference.

Streaming responses

App Engine does not support streaming responses where data is sent in incremental chunks to the client while a request is being processed. All data from your code is collected as described above and sent as a single HTTP response.

Response compression

If the client sends HTTP headers with the original request indicating that the client can accept compressed (gzipped) content, App Engine compresses the handler response data automatically and attaches the appropriate response headers. It uses both the Accept-Encoding and User-Agent request headers to determine if the client can reliably receive compressed responses.

Custom clients can indicate that they are able to receive compressed responses by specifying both Accept-Encoding and User-Agent headers with a value of gzip. The Content-Type of the response is also used to determine whether compression is appropriate; in general, text-based content types are compressed, whereas binary content types are not.

When responses are compressed automatically by App Engine, the Content-Encoding header is added to the response.
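From the point of view of a custom client, this means sending the two headers and decompressing the body only when Content-Encoding says so. The sketch below uses only the standard library and performs no network calls; both function names are hypothetical.

```python
import gzip
import io


def gzip_request_headers():
    """Headers a custom client could send to advertise gzip support."""
    return {
        'Accept-Encoding': 'gzip',
        'User-Agent': 'gzip',
    }


def decode_body(body, content_encoding):
    """Decompress the body if the response was gzip-encoded."""
    if content_encoding and 'gzip' in content_encoding.lower():
        return gzip.GzipFile(fileobj=io.BytesIO(body)).read()
    # No Content-Encoding header: the body was sent uncompressed.
    return body
```

Checking Content-Encoding rather than assuming compression matters because binary content types are generally left uncompressed.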

Specifying a request deadline

App Engine is optimized for applications with short-lived requests, typically those that take a few hundred milliseconds. An efficient app responds quickly for the majority of requests. An app that doesn't will not scale well with App Engine's infrastructure.

All requests to your app must return a response within the maximum request timeout. If your app doesn't respond by the timeout, App Engine interrupts the request handler. The Python runtime environment interrupts the request handler by raising a DeadlineExceededError, from the package google.appengine.runtime. If the request handler does not catch this exception, as with all uncaught exceptions, the runtime environment will return an HTTP 500 server error to the client.

The request handler can catch this error to customize the response. The runtime environment gives the request handler a little bit more time (less than a second) after raising the exception to prepare a custom response.

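class TimerHandler(webapp2.RequestHandler):
    def get(self):
        from google.appengine.runtime import DeadlineExceededError

        try:
            time.sleep(70)  # Placeholder for slow work; assumes `import time`.
            self.response.write('Completed.')
        except DeadlineExceededError:
            # Replace any partial output with a custom error response.
            self.response.clear()
            self.response.set_status(500)
            self.response.write('The request did not complete in time.')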

If the handler hasn't returned a response or raised an exception by the second deadline, the handler is terminated and a default error response is returned.
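Catching the exception leaves very little time to react, so a handler can also track its own elapsed time and stop early. The sketch below is a generic illustration using only the standard library; `process_within_budget`, its parameters, and the idea of a self-imposed budget are assumptions and not part of the App Engine API.

```python
import time


def process_within_budget(items, handle, budget_seconds, clock=time.time):
    """Handle items until the time budget runs out.

    Returns the list of items that were not processed.
    """
    started = clock()
    remaining = list(items)
    while remaining:
        if clock() - started >= budget_seconds:
            # Stop before the platform deadline interrupts the handler.
            break
        handle(remaining.pop(0))
    return remaining
```

The unprocessed items returned by the function could then be handed to a later request, so that each request stays well under the timeout.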


Logging

The App Engine web server captures everything the handler script writes to the standard output stream for the response to the web request. It also captures everything the handler script writes to the standard error stream, and stores it as log data. Each request is assigned a request_id, a globally unique identifier based on the request's start time. Log data for your application can be viewed in the Cloud Console using Cloud Logging.

The App Engine Python runtime environment includes special support for the logging module from the Python standard library, so it understands logging concepts such as log levels ("debug", "info", "warning", "error", and "critical").

import logging

import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        logging.debug('This is a debug message')
        logging.info('This is an info message')
        logging.warning('This is a warning message')
        logging.error('This is an error message')
        logging.critical('This is a critical message')

        try:
            raise ValueError('This is a sample value error.')
        except ValueError:
            # logging.exception records the message and the traceback.
            logging.exception('An example exception log.')

        self.response.out.write('Logging example.')

app = webapp2.WSGIApplication([
    ('/', MainPage)
], debug=True)
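To see how log levels behave outside App Engine, the same calls can be pointed at an in-memory handler. This is a local sketch using only the standard logging module; `ListHandler` and `log_at_each_level` are hypothetical names, and nothing here reflects how App Engine stores log data.

```python
import logging


class ListHandler(logging.Handler):
    """Collects (level name, message) pairs in memory."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.records = []

    def emit(self, record):
        self.records.append((record.levelname, record.getMessage()))


def log_at_each_level(logger):
    """Emit one message per standard log level."""
    logger.debug('This is a debug message')
    logger.info('This is an info message')
    logger.warning('This is a warning message')
    logger.error('This is an error message')
    logger.critical('This is a critical message')
```

If the logger's level is set to logging.WARNING before calling log_at_each_level, only the warning, error, and critical messages reach the handler.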

The environment

The execution environment automatically sets several environment variables; you can set more in app.yaml. Of the automatically-set variables, some are special to App Engine, while others are part of the WSGI or CGI standards. Python code can access these variables using the os.environ dictionary.

The following environment variables are specific to App Engine:

  • CURRENT_VERSION_ID: The major and minor version of the currently running application, as "X.Y". The major version number ("X") is specified in the app's app.yaml file. The minor version number ("Y") is set automatically when each version of the app is uploaded to App Engine. On the development web server, the minor version is always "1".

  • AUTH_DOMAIN: The domain used for authenticating users with the Users API. Apps hosted on appspot.com have an AUTH_DOMAIN of gmail.com, and accept any Google account. Apps hosted on a custom domain have an AUTH_DOMAIN equal to the custom domain.

  • INSTANCE_ID: Contains the instance ID of the frontend instance handling a request. The ID is a hex string (for example, 00c61b117c7f7fd0ce9e1325a04b8f0df30deaaf). A logged-in admin can use the ID in a URL so that the request is routed to that specific frontend instance. If the instance cannot handle the request, it immediately returns a 503.

The following environment variables are part of the WSGI and CGI standards, with special behavior in App Engine:

  • SERVER_SOFTWARE: In the development web server, this value is "Development/X.Y" where "X.Y" is the version of the runtime. When running on App Engine, this value is "Google App Engine/X.Y.Z".

Additional environment variables are set according to the WSGI or CGI standard. For more information on these variables, see the WSGI standard or the CGI standard, as appropriate.
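The variables described above are plain strings, so application code usually parses them itself. The following sketch shows two small helpers; the function names are hypothetical, and splitting CURRENT_VERSION_ID on the first "." assumes the major version contains no dot.

```python
import os


def split_version_id(environ=None):
    """Split CURRENT_VERSION_ID ("X.Y") into (major, minor) strings."""
    environ = os.environ if environ is None else environ
    major, _, minor = environ.get('CURRENT_VERSION_ID', '').partition('.')
    return major, minor


def is_development_server(environ=None):
    """Return True when SERVER_SOFTWARE looks like the development server."""
    environ = os.environ if environ is None else environ
    return environ.get('SERVER_SOFTWARE', '').startswith('Development/')
```

Passing a dictionary instead of relying on os.environ keeps the helpers easy to test outside the runtime.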

You can also set environment variables in the app.yaml file:

env_variables:
  DJANGO_SETTINGS_MODULE: 'myapp.settings'

The following webapp2 request handler displays, in the browser, every environment variable visible to the application:

import os

class PrintEnvironmentHandler(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        # Write one "NAME = value" line per environment variable.
        for key, value in os.environ.iteritems():
            self.response.out.write(
                "{} = {}\n".format(key, value))

Request IDs

At the time of the request, you can save the request ID, which is unique to that request. The request ID can be used later to look up the logs for that request in Cloud Logging.

The following sample code shows how to get the request ID in the context of a request:

import os

class RequestIdHandler(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        request_id = os.environ.get('REQUEST_LOG_ID')
        self.response.write(
            'REQUEST_LOG_ID={}'.format(request_id))
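One way to save the request ID is to attach it to the application's own log records, so that it appears in any custom log format. The sketch below uses a standard logging filter; `RequestIdFilter` is a hypothetical name, and the 'unknown' fallback is an assumption for code running outside a request.

```python
import logging
import os


class RequestIdFilter(logging.Filter):
    """Adds a request_id attribute to each log record."""

    def __init__(self, environ=None):
        logging.Filter.__init__(self)
        self._environ = os.environ if environ is None else environ

    def filter(self, record):
        # Read the ID at logging time so each request sees its own value.
        record.request_id = self._environ.get('REQUEST_LOG_ID', 'unknown')
        return True  # Never drop the record; only annotate it.
```

After adding the filter to a logger, a format string such as '%(request_id)s %(message)s' can include the ID in every line.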

App caching

The Python runtime environment caches imported modules between requests on a single web server, similar to how a standalone Python application loads a module only once even if the module is imported by multiple files. Since WSGI handlers are modules, they are cached between requests. CGI handler scripts are only cached if they provide a main() routine; otherwise, the CGI handler script is loaded for every request.

App caching provides a significant benefit in response time. We recommend that all CGI handler scripts use a main() routine, as described below.

Imports are cached

For efficiency, the web server keeps imported modules in memory and does not reload or re-evaluate them on subsequent requests to the same application on the same server. Most modules do not initialize any global data or have other side effects when they are imported, so caching them does not change the behavior of the application.

If your application imports a module that depends on the module being evaluated for every request, the application must accommodate this caching behavior.

Caching CGI handlers

You can tell App Engine to cache the CGI handler script itself, in addition to imported modules. If the handler script defines a function named main(), then the script and its global environment will be cached like an imported module. The first request for the script on a given web server evaluates the script normally. For subsequent requests, App Engine calls the main() function in the cached environment.

To cache a handler script, App Engine must be able to call main() with no arguments. If the handler script does not define a main() function, or the main() function requires arguments (that don't have defaults), then App Engine loads and evaluates the entire script for every request.

Keeping the parsed Python code in memory saves time and allows for faster responses. Caching the global environment has other potential uses as well:

  • Compiled regular expressions. All regular expressions are parsed and stored in a compiled form. You can store compiled regular expressions in global variables, then use app caching to re-use the compiled objects between requests.

  • GqlQuery objects. The GQL query string is parsed when the GqlQuery object is created. Re-using a GqlQuery object with parameter binding and the bind() method is faster than re-constructing the object each time. You can store a GqlQuery object with parameter binding for the values in a global variable, then re-use it by binding new parameter values for each request.

  • Configuration and data files. If your application loads and parses configuration data from a file, it can retain the parsed data in memory to avoid having to re-load the file with each request.
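Two of the uses listed above can be sketched with module-level globals. This is an illustration only: the URL pattern, the JSON configuration format, and the names `USER_PATH`, `get_config`, and `user_id_from_path` are all assumptions.

```python
import json
import re

# Compiled once when the module is first evaluated, then re-used.
USER_PATH = re.compile(r'^/users/(\d+)$')

_config = None  # Filled in on first use and kept for later requests.


def get_config(loader):
    """Return parsed configuration, calling loader() only the first time."""
    global _config
    if _config is None:
        _config = json.loads(loader())
    return _config


def user_id_from_path(path):
    """Extract the numeric user ID from a path, or return None."""
    match = USER_PATH.match(path)
    return int(match.group(1)) if match else None
```

Because the globals live as long as the cached module, each instance pays the parsing cost once rather than on every request. The same globals are not shared between instances.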

The handler script should call main() when imported. App Engine expects that importing the script calls main(), so App Engine does not call it when loading the request handler for the first time on a server.
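A cacheable CGI handler therefore has a main() that takes no arguments and a call to main() at the bottom of the script. The following is a minimal sketch, assuming the script is run as `__main__` the way standard CGI scripts are; the response text is illustrative.

```python
import sys


def main():
    """Entry point that can be called again from the cached script."""
    # CGI output: headers, a blank line, then the response body.
    sys.stdout.write('Content-Type: text/plain\n')
    sys.stdout.write('\n')
    sys.stdout.write('Hello, World!\n')


if __name__ == '__main__':
    main()
```

Anything defined at module level outside main() is evaluated only on the first request to a given web server and is then kept in the cached environment.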

App caching with main() provides a significant improvement in your CGI handler's response time. We recommend it for all applications that use CGI.