Regional endpoints

To use online prediction, you can interact with the AI Platform Training and Prediction API through its global endpoint (ml.googleapis.com) or through one of its regional endpoints (REGION-ml.googleapis.com). Using a regional endpoint for online prediction provides additional protection for your model against outages in other regions, because it isolates your model and version resources from other regions.

AI Platform Prediction currently supports the following regional endpoints:

  • us-central1
  • us-east1
  • us-east4
  • us-west1
  • northamerica-northeast1
  • europe-west1
  • europe-west2
  • europe-west3
  • europe-west4
  • asia-east1
  • asia-northeast1
  • asia-southeast1
  • australia-southeast1

This guide compares the benefits and limitations of using regional endpoints versus the global endpoint. The guide also walks through using a regional endpoint for online prediction.

Understanding regional endpoints

Regional endpoints have several key differences from the global endpoint:

  • Regional endpoints only support Compute Engine (N1) machine types. You cannot use legacy (MLS1) machine types on regional endpoints. This means that all the benefits and limitations of using Compute Engine (N1) machine types apply. For example, you can use GPUs on regional endpoints, but you cannot currently enable stream (console) logging.

    To use a Compute Engine (N1) machine type, you must use a regional endpoint.

  • Regional endpoints only support online prediction and AI Explanations. Models deployed to regional endpoints do not support batch prediction.

    AI Platform Prediction shares the AI Platform Training and Prediction API with AI Platform Training and AI Platform Vizier. Note that regional endpoints do not currently support AI Platform Training. Only the us-central1 endpoint supports AI Platform Vizier.

    See the API reference for more details about which API methods are available on which endpoints.

AI Platform Prediction resource names must be unique for your Google Cloud project within a given endpoint, but you can reuse the same name on different endpoints. For example, you can create a model named "hello-world" on the europe-west4 endpoint and another model named "hello-world" on the us-central1 endpoint.

When you list models on a regional endpoint, you only see models created on that endpoint. Similarly, when you list models on the global endpoint, you only see models created on the global endpoint.
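In other words, the endpoint that a client targets determines which models a list request returns. The following sketch uses the Google API Client Library for Python; the helper names (regional_endpoint, list_models) are illustrative, not part of the library:

```python
def regional_endpoint(region):
    """Return the AI Platform Training and Prediction API endpoint for a region."""
    return f'https://{region}-ml.googleapis.com'

def list_models(project_id, region=None):
    """List models visible on one endpoint.

    If region is None, the client uses the global endpoint and only
    sees models created on the global endpoint; otherwise it only
    sees models created on that regional endpoint.
    """
    from google.api_core.client_options import ClientOptions
    from googleapiclient import discovery

    options = None
    if region is not None:
        options = ClientOptions(api_endpoint=regional_endpoint(region))
    ml = discovery.build('ml', 'v1', client_options=options)
    request = ml.projects().models().list(parent=f'projects/{project_id}')
    return request.execute().get('models', [])
```

Calling `list_models('my-project', region='europe-west4')` and `list_models('my-project')` can therefore return entirely different sets of models.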

Regional endpoints versus global endpoint regions

When you create a model resource on the global endpoint, you can specify a region for your model. When you create versions within this model and serve predictions, the prediction nodes run in the specified region.

When you use a regional endpoint, AI Platform Prediction likewise runs your prediction nodes in the endpoint's region. In this case, however, AI Platform Prediction provides additional isolation by also running all AI Platform Prediction infrastructure in that region.

For example, if you use the us-east1 region on the global endpoint, your prediction nodes run in us-east1. But the AI Platform Prediction infrastructure managing your resources (routing requests; handling model and version creation, updates, and deletion; etc.) does not necessarily run in us-east1. On the other hand, if you use the europe-west4 regional endpoint, your prediction nodes and all AI Platform Prediction infrastructure run in europe-west4.

Using regional endpoints

To use a regional endpoint, you must first create a model on the regional endpoint. Then perform all actions related to that model (like creating a model version and sending prediction requests) on the same endpoint.

If you are using the Google Cloud console, make sure to select the Use regional endpoint checkbox when you create your model. Perform all other Google Cloud console actions as you would on the global endpoint.

If you are using the Google Cloud CLI, set the --region flag to the region of your endpoint on every command that interacts with your model and its child resources. This includes the gcloud ai-platform models, versions, and predict commands demonstrated later in this guide.

Alternatively, you can set the ai_platform/region property to a specific region in order to make sure the gcloud CLI always uses the corresponding regional endpoint for AI Platform Prediction commands, even when you don't specify the --region flag. (This configuration doesn't apply to commands in the gcloud ai-platform operations command group.)
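For example, you could set the property as follows (europe-west4 here is just an illustration; substitute any supported region):

```shell
# Make gcloud default to the europe-west4 regional endpoint
# for AI Platform Prediction commands.
gcloud config set ai_platform/region europe-west4

# This now targets europe-west4-ml.googleapis.com
# even though no --region flag is given.
gcloud ai-platform models list
```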

If you are interacting directly with the AI Platform Training and Prediction API (for example, by using the Google API Client Library for Python), make API requests as you would to the global endpoint, but send them to the regional endpoint instead. See the API reference for more details about which API methods are available on regional endpoints.

The following examples demonstrate how to use a regional endpoint to create a model, create a version, and send an online prediction request. To use the examples, replace REGION wherever it appears with one of the regions where regional endpoints are available:

  • us-central1
  • us-east1
  • us-east4
  • us-west1
  • northamerica-northeast1
  • europe-west1
  • europe-west2
  • europe-west3
  • europe-west4
  • asia-east1
  • asia-northeast1
  • asia-southeast1
  • australia-southeast1

Creating a model

Google Cloud console

  1. In the Google Cloud console, go to the Create model page and select your Google Cloud project:

    Go to the Create model page

  2. Name your model, select the Use regional endpoint checkbox, and select the region of the endpoint that you want to use from the Region drop-down list.

  3. Click the Create button.

gcloud

Run the following command:

gcloud ai-platform models create MODEL_NAME \
  --region=REGION

In the command, replace the following placeholders:

  • MODEL_NAME: A name that you choose for your model.
  • REGION: The region of the endpoint that you want to use.

Python

This example uses the Google API Client Library for Python. Before you can use it, you must install the Google API Client Library for Python and set up authentication in your development environment.

Run the following Python code:

from google.api_core.client_options import ClientOptions
from googleapiclient import discovery

endpoint = 'https://REGION-ml.googleapis.com'
client_options = ClientOptions(api_endpoint=endpoint)
ml = discovery.build('ml', 'v1', client_options=client_options)

request_body = { 'name': 'MODEL_NAME' }
request = ml.projects().models().create(parent='projects/PROJECT_ID',
    body=request_body)

response = request.execute()
print(response)

In the code, replace the following placeholders:

  • REGION: The region of the endpoint that you want to use.
  • MODEL_NAME: A name that you choose for your model.
  • PROJECT_ID: The ID of your Google Cloud project.

Learn more about creating a model.

Creating a model version

This example assumes that you have already uploaded compatible model artifacts to Cloud Storage.

Google Cloud console

Using the model that you created in the previous section, follow the guide to creating a model version in the Google Cloud console.

gcloud

Run the following command:

gcloud ai-platform versions create VERSION_NAME \
  --region=REGION \
  --model=MODEL_NAME \
  --framework=FRAMEWORK \
  --machine-type=MACHINE_TYPE \
  --origin=MODEL_DIRECTORY \
  --python-version=3.7 \
  --runtime-version=2.11

In the command, replace the following placeholders:

  • REGION: The region of the endpoint that you used in the previous section.
  • VERSION_NAME: A name that you choose for your version.
  • MODEL_NAME: The name of the model that you created in the previous section.
  • FRAMEWORK: The framework used to create your model artifacts.
  • MACHINE_TYPE: A Compute Engine (N1) machine type.
  • MODEL_DIRECTORY: A Cloud Storage URI to your model directory (starting with "gs://").

Python

Run the following Python code:

from google.api_core.client_options import ClientOptions
from googleapiclient import discovery

endpoint = 'https://REGION-ml.googleapis.com'
client_options = ClientOptions(api_endpoint=endpoint)
ml = discovery.build('ml', 'v1', client_options=client_options)

request_body = { 'name': 'VERSION_NAME',
    'deploymentUri': 'MODEL_DIRECTORY',
    'runtimeVersion': '2.11',
    'machineType': 'MACHINE_TYPE',
    'framework': 'FRAMEWORK',
    'pythonVersion': '3.7'}
request = ml.projects().models().versions().create(
    parent='projects/PROJECT_ID/models/MODEL_NAME',
    body=request_body)

response = request.execute()
print(response)

In the code, replace the following placeholders:

  • REGION: The region of the endpoint that you used in the previous section.
  • VERSION_NAME: A name that you choose for your version.
  • MODEL_DIRECTORY: A Cloud Storage URI to your model directory (starting with "gs://").
  • MACHINE_TYPE: A Compute Engine (N1) machine type.
  • FRAMEWORK: The framework used to create your model artifacts.
  • PROJECT_ID: The ID of your Google Cloud project.
  • MODEL_NAME: The name of the model that you created in the previous section.

Learn more about creating a model version.
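Creating a version is a long-running operation: the create call in the example above returns an operation object rather than a finished version. One way to wait for it, assuming the operation name is returned in the response's name field, is to poll projects.operations.get on the same regional endpoint. The wait_for_operation helper below is an illustrative sketch, not part of the client library:

```python
import time

def wait_for_operation(ml, operation_name, poll_seconds=30):
    """Poll a long-running AI Platform operation until it completes.

    `ml` must be a discovery client built against the same endpoint
    (regional or global) that was used to start the operation.
    Returns the operation's response on success; raises on error.
    """
    while True:
        op = ml.projects().operations().get(name=operation_name).execute()
        if op.get('done'):
            if 'error' in op:
                raise RuntimeError(op['error'])
            return op.get('response')
        time.sleep(poll_seconds)
```

For example, after `response = request.execute()` in the version-creation code above, you could call `wait_for_operation(ml, response['name'])` before sending prediction requests.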

Sending an online prediction request

Google Cloud console

  1. In the Google Cloud console, go to the Models page:

    Go to the Models page

  2. In the Region drop-down list, select the region of the endpoint that your model uses. Click the name of the model that you created in a previous section to navigate to its Model Details page.

  3. Click the name of the version that you created in a previous section to navigate to its Version Details page.

  4. Click the Test & use tab. Enter one or more instances of input data and click the Test button to send an online prediction request.

gcloud

This example assumes that you have saved prediction input in a newline-delimited JSON file in your local environment. Run the following command:

gcloud ai-platform predict \
  --region=REGION \
  --model=MODEL_NAME \
  --version=VERSION_NAME \
  --json-request=INPUT_PATH

In the command, replace the following placeholders:

  • REGION: The region of the endpoint that you used in the previous sections.
  • MODEL_NAME: The name of the model that you created in a previous section.
  • VERSION_NAME: The name of the model version that you created in the previous section.
  • INPUT_PATH: The path on your local filesystem to a JSON file with prediction input.
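As an illustration, a hypothetical INPUT_PATH file for a model that takes four numeric features might contain one JSON instance per line, following the newline-delimited format described above:

```json
[6.8, 2.8, 4.8, 1.4]
[6.0, 3.4, 4.5, 1.6]
```

The exact shape of each instance depends on the inputs that your model expects.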

Python

Run the following Python code:

from google.api_core.client_options import ClientOptions
from googleapiclient import discovery

endpoint = 'https://REGION-ml.googleapis.com'
client_options = ClientOptions(api_endpoint=endpoint)
ml = discovery.build('ml', 'v1', client_options=client_options)

request_body = { 'instances': INSTANCES }
request = ml.projects().predict(
    name='projects/PROJECT_ID/models/MODEL_NAME/versions/VERSION_NAME',
    body=request_body)

response = request.execute()
print(response)

In the code, replace the following placeholders:

  • REGION: The region of the endpoint that you used in the previous sections.
  • INSTANCES: A list of prediction input instances.
  • MODEL_NAME: The name of the model that you created in a previous section.
  • VERSION_NAME: The name of the version that you created in the previous section.

Learn more about getting online predictions.

What's next