Serve a machine learning model on App Engine flexible environment

Author(s): @dizcology, Published: 2017-12-18

Yu-Han Liu | Developer Programs Engineer | Google

Contributed by Google employees.

This tutorial takes a closer look at the sample app Model serve. It helps you build your own service that serves a trained machine learning model for online prediction.

Objectives


  1. Deploy a service with Cloud Endpoints.
  2. Deploy a Python app on App Engine which loads a trained machine learning model.
  3. Send requests to the service and get responses.

Before you begin

Follow the links in the requirements section to install Google Cloud SDK and enable the APIs for App Engine, Cloud Endpoints, and Cloud Storage.


So you trained a machine learning model. Now what?

If the model's performance is good enough, consider deploying it as a service to a production system where one or more clients can use it. Some possible scenarios include:

  • The model needs to simultaneously process user requests from your web application in real time, and batch process interaction logs you have previously stored in a database.
  • The model's output is used by multiple other machine learning models in your application.

App Engine offers rolling updates, networking, and auto scaling.

Cloud Endpoints helps you monitor the service's consumers, as well as manage their permissions and quotas.

You can follow the steps of the sample app to deploy a service. Below we will look at some key pieces of the code to understand how it works.

A closer look


Our app expects POST requests to the path /predict. It will look for the value of 'X' in the JSON data, send it to the trained model, and return the result as the value of 'y':

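import json
from flask import Flask, request

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    X = request.get_json()['X']        # input features from the JSON body
    y = MODEL.predict(X).tolist()      # convert the model output to a plain list
    return json.dumps({'y': y}), 200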

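Here MODEL is a global variable that holds the trained machine learning model we are serving. To make sure the model is loaded before any prediction request is handled, we register the loading function with Flask's before_first_request decorator. App Engine's health check requests trigger it, so the model is normally loaded before client traffic arrives: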

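import pickle
from google.cloud import storage

MODEL = None

@app.before_first_request
def _load_model():
    global MODEL
    client = storage.Client()
    bucket = client.get_bucket(MODEL_BUCKET)
    blob = bucket.get_blob(MODEL_FILENAME)
    s = blob.download_as_string()  # raw bytes of the pickled model

    MODEL = pickle.loads(s)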
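
Note that before_first_request was deprecated in Flask 2.2 and removed in Flask 2.3, so the decorator only works with older Flask versions. One possible alternative, shown here as a minimal sketch and not as the sample app's code, is to load the model lazily on first use. The names get_model, load_fn, _MODEL, and _MODEL_LOCK are hypothetical; load_fn stands in for a function that downloads and unpickles the model.

```python
import threading

_MODEL = None                      # cached model, filled on first use
_MODEL_LOCK = threading.Lock()     # guards against concurrent first loads


def get_model(load_fn):
    """Return the cached model, calling load_fn() once on first use.

    load_fn is a hypothetical zero-argument callable that returns the
    loaded model (for example, by downloading and unpickling it).
    """
    global _MODEL
    if _MODEL is None:
        with _MODEL_LOCK:
            if _MODEL is None:     # re-check after acquiring the lock
                _MODEL = load_fn()
    return _MODEL
```

With a helper like this, the request handler would call get_model(...) instead of reading a global that some earlier hook is expected to have filled in.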

We store the model as a pickled file on Cloud Storage. To make sure the App Engine service knows where to find the model, we pass in the model's bucket and file name as environment variables. This is done in the app.yaml configuration file.


The app.yaml configuration file defines the App Engine service. We define environment variables pointing to the trained model:

env_variables:
    # The app will look for the model file at: gs://MODEL_BUCKET/MODEL_FILENAME
    MODEL_BUCKET: BUCKET_NAME
    MODEL_FILENAME: lr.pkl

Replace BUCKET_NAME with the name of a Cloud Storage bucket owned by your project. lr.pkl is a simple linear regression model included in the sample app.
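
Inside the app, the MODEL_BUCKET and MODEL_FILENAME values can then be read from the process environment. The following is a minimal sketch of such a lookup, assuming both variables are set; the helper name model_location is hypothetical and is not part of the sample app.

```python
import os


def model_location(env=os.environ):
    """Return (bucket, filename, gs_uri) based on the environment variables.

    env defaults to the process environment; passing a plain dict makes
    the function easy to try out locally.
    """
    bucket = env["MODEL_BUCKET"]       # raises KeyError if the variable is unset
    filename = env["MODEL_FILENAME"]
    return bucket, filename, f"gs://{bucket}/{filename}"
```

For example, model_location({"MODEL_BUCKET": "my-bucket", "MODEL_FILENAME": "lr.pkl"}) yields the URI gs://my-bucket/lr.pkl, where my-bucket is a placeholder bucket name.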

To use Cloud Endpoints to manage the service, we need to specify a config_id under the endpoints_api_service field:

endpoints_api_service:
  name: modelserve-dot-PROJECT_ID.appspot.com
  config_id: CONFIG_ID

The CONFIG_ID is the configuration ID of a Cloud Endpoints service deployment. To find existing deployments and their configuration IDs, go to the Cloud Endpoints console, click on the service's title, then click on the "Deployment history" tab.

To deploy a service to Cloud Endpoints, we configure it with the modelserve.yaml file.


The modelserve.yaml configuration file defines the service according to the OpenAPI specification. We highlight only some of the key settings below.

  • Specify the host:

    host: "modelserve-dot-PROJECT_ID.appspot.com"

    This host means we will handle the requests with an App Engine service called modelserve.

  • Enforce authentication with an API key by adding the security and securityDefinitions fields:

    security:
      - api_key: []
    securityDefinitions:
      api_key:
        type: "apiKey"
        name: "key"
        in: "query"

    This is optional, but allows you to grant service consumer permissions on the Cloud Endpoints console. The API key must be associated with a Google Cloud project. You can create API keys on the credentials page.

  • Additionally, you can configure quota for each consumer at the project level. First we specify a service level metric in order to track the number of requests:

    x-google-management:
      metrics:
        - name: "modelserve-predict"
          displayName: "modelserve predict"
          valueType: INT64
          metricKind: DELTA
      quota:
        limits:
          - name: "modelserve-predict-limit"
            metric: "modelserve-predict"
            unit: "1/min/{project}"
            values:
              STANDARD: 1000

    This configuration declares a metric modelserve-predict and sets its limit to 1000 units per minute per project.

    Only requests to specified paths are counted towards this metric. Specify these paths by adding the following in the paths field:

    paths:
      "/predict":
        post:
          x-google-quota:
            metricCosts:
              modelserve-predict: 1

    Each time a POST request is sent to modelserve-dot-PROJECT_ID.appspot.com/predict, the metric modelserve-predict increments by 1, and each project is limited to 1000 calls per minute with the configuration above.

    You can manage quotas for individual projects on the Cloud Endpoints console. For more information on configuring the quota, see the documentation.
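
Once the service is deployed, a client sends a POST request to /predict with the input under 'X' and, if authentication is enforced, the API key in the key query parameter. The sketch below uses only the Python standard library. The helper names build_predict_request and parse_predict_response are hypothetical, and the shape of X (a list of feature rows) is an assumption that depends on the model you serve.

```python
import json
import urllib.parse
import urllib.request


def build_predict_request(host, X, api_key=None):
    """Build (but do not send) a POST request for the /predict path."""
    url = f"https://{host}/predict"
    if api_key is not None:
        # The API key travels as the `key` query parameter.
        url += "?" + urllib.parse.urlencode({"key": api_key})
    body = json.dumps({"X": X}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # request.get_json() in the app expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_predict_response(raw_bytes):
    """Extract the value of 'y' from the JSON response body."""
    return json.loads(raw_bytes)["y"]


# Example usage (requires a deployed service and a valid API key):
# req = build_predict_request(
#     "modelserve-dot-PROJECT_ID.appspot.com", [[1.0, 2.0]], "API_KEY")
# with urllib.request.urlopen(req) as resp:
#     print(parse_predict_response(resp.read()))
```

Keeping request construction separate from sending makes it possible to inspect the URL and body without network access.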

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Site Policies. Java is a registered trademark of Oracle and/or its affiliates.