To use online prediction, you can interact with the AI Platform Training and Prediction API through its global endpoint (ml.googleapis.com) or through one of its regional endpoints (REGION-ml.googleapis.com). Using a regional endpoint for online prediction provides additional protection for your model against outages in other regions, because it isolates your model and version resources from other regions.
AI Platform Prediction currently supports the following regional endpoints:
us-central1
us-east1
us-east4
us-west1
northamerica-northeast1
europe-west1
europe-west2
europe-west3
europe-west4
asia-east1
asia-northeast1
asia-southeast1
australia-southeast1
This guide compares the benefits and limitations of using regional endpoints versus the global endpoint. The guide also walks through using a regional endpoint for online prediction.
Understanding regional endpoints
Regional endpoints have several key differences from the global endpoint:
Regional endpoints only support Compute Engine (N1) machine types. You cannot use legacy (MLS1) machine types on regional endpoints. This means that all the benefits and limitations of using Compute Engine (N1) machine types apply. For example, you can use GPUs on regional endpoints, but you cannot currently enable stream (console) logging.
To use a Compute Engine (N1) machine type, you must use a regional endpoint.
Regional endpoints only support online prediction and AI Explanations. Models deployed to regional endpoints do not support batch prediction.
AI Platform Prediction shares the AI Platform Training and Prediction API with AI Platform Training and AI Platform Vizier. Regional endpoints do not currently support AI Platform Training, and only the us-central1 endpoint supports AI Platform Vizier. See the API reference for more details about which API methods are available on which endpoints.
AI Platform Prediction resource names are unique within your Google Cloud project on any given endpoint, but they can be duplicated across endpoints. For example, you can create a model named "hello-world" on the europe-west4 endpoint and another model named "hello-world" on the us-central1 endpoint.
When you list models on a regional endpoint, you only see models created on that endpoint. Similarly, when you list models on the global endpoint, you only see models created on the global endpoint.
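Because listings are scoped per endpoint, a client must be pointed at the endpoint it should query. The following sketch uses hypothetical helper names and assumes the Google API Client Library for Python is installed and authentication is set up:

```python
def endpoint_for(region=None):
    """Return the API endpoint URL for a regional endpoint, or the global one."""
    if region is None:
        return 'https://ml.googleapis.com'
    return 'https://{}-ml.googleapis.com'.format(region)

def build_client(region=None):
    """Build an AI Platform Training and Prediction API client for an endpoint."""
    # Imported here so endpoint_for() is usable without the client library.
    from google.api_core.client_options import ClientOptions
    from googleapiclient import discovery
    options = ClientOptions(api_endpoint=endpoint_for(region))
    return discovery.build('ml', 'v1', client_options=options)

# Example usage (requires authentication; replace PROJECT_ID):
#   ml = build_client('europe-west4')
#   # Lists only the models created on the europe-west4 endpoint:
#   print(ml.projects().models().list(parent='projects/PROJECT_ID').execute())
```

Building one client per endpoint makes the scoping explicit: the same `list` call returns different model sets depending on which endpoint the client targets.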
Regional endpoints versus global endpoint regions
When you create a model resource on the global endpoint, you can specify a region for your model. When you create versions within this model and serve predictions, the prediction nodes run in the specified region.
When you use a regional endpoint, AI Platform Prediction also runs your prediction nodes in the endpoint's region. In this case, however, AI Platform Prediction provides additional isolation by running all AI Platform Prediction infrastructure in that region.
For example, if you use the us-east1 region on the global endpoint, your prediction nodes run in us-east1. But the AI Platform Prediction infrastructure managing your resources (routing requests; handling model and version creation, updates, and deletion; and so on) does not necessarily run in us-east1. On the other hand, if you use the europe-west4 regional endpoint, your prediction nodes and all AI Platform Prediction infrastructure run in europe-west4.
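A sketch of the two configurations may help, shown here as plain projects.models.create request bodies (dictionaries only, not sent; the model names are illustrative):

```python
# Global endpoint: requests go to ml.googleapis.com, and the model body
# names the region where prediction nodes run.
global_endpoint = 'https://ml.googleapis.com'
global_model_body = {
    'name': 'hello-world',
    'regions': ['us-east1'],  # prediction nodes run here; the managing
                              # infrastructure may run elsewhere
}

# Regional endpoint: the region is part of the hostname, and prediction
# nodes plus all managing infrastructure run in that region.
regional_endpoint = 'https://europe-west4-ml.googleapis.com'
regional_model_body = {
    'name': 'hello-world',  # no 'regions' field needed
}
```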
Using regional endpoints
To use a regional endpoint, you must first create a model on the regional endpoint. Then perform all actions related to that model (like creating a model version and sending prediction requests) on the same endpoint.
If you are using the Google Cloud console, make sure to select the Use regional endpoint checkbox when you create your model. Perform all other Google Cloud console actions as you would on the global endpoint.
If you are using the Google Cloud CLI, add the --region flag, set to the region of your endpoint, on every command that interacts with your model and its child resources. This includes the following:
- Every command in the gcloud ai-platform models command group.
- Every command in the gcloud ai-platform versions command group.
- Every command in the gcloud ai-platform operations command group when interacting with long-running operations associated with a version of the model.
- The gcloud ai-platform predict command.
- The gcloud beta ai-platform explain command.
Alternatively, you can set the ai_platform/region property to a specific region to make sure the gcloud CLI always uses the corresponding regional endpoint for AI Platform Prediction commands, even when you don't specify the --region flag. (This configuration doesn't apply to commands in the gcloud ai-platform operations command group.)
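For example, to make the gcloud CLI default to the europe-west4 endpoint:

```shell
# Pin AI Platform Prediction commands to the europe-west4 regional endpoint.
gcloud config set ai_platform/region europe-west4

# Subsequent commands target that endpoint without an explicit --region flag:
gcloud ai-platform models list
```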
If you are interacting directly with the AI Platform Training and Prediction API (for example, by using the Google API Client Library for Python), make all API requests as you would to the global endpoint, but send them to the regional endpoint instead. See the API reference for more details about which API methods are available on regional endpoints.
The following examples demonstrate how to use a regional endpoint to create a model, create a version, and send an online prediction request. To use the examples, replace REGION wherever it appears with the region of one of the regional endpoints listed at the beginning of this guide.
Creating a model
Google Cloud console
1. In the Google Cloud console, go to the Create model page and select your Google Cloud project.
2. Name your model, select the Use regional endpoint checkbox, and select the region of the endpoint that you want to use from the Region drop-down list.
3. Click the Create button.
gcloud
Run the following command:
gcloud ai-platform models create MODEL_NAME \
--region=REGION
In the command, replace the following placeholders:
- MODEL_NAME: A name that you choose for your model.
- REGION: The region of the endpoint that you want to use.
Python
This example uses the Google API Client Library for Python. Before you can use it, install the library and set up authentication in your development environment.
Run the following Python code:
from google.api_core.client_options import ClientOptions
from googleapiclient import discovery
endpoint = 'https://REGION-ml.googleapis.com'
client_options = ClientOptions(api_endpoint=endpoint)
ml = discovery.build('ml', 'v1', client_options=client_options)
request_body = { 'name': 'MODEL_NAME' }
request = ml.projects().models().create(parent='projects/PROJECT_ID',
body=request_body)
response = request.execute()
print(response)
In the code, replace the following placeholders:
- REGION: The region of the endpoint that you want to use.
- MODEL_NAME: A name that you choose for your model.
- PROJECT_ID: The ID of your Google Cloud project.
Learn more about creating a model.
Creating a model version
This example assumes that you have already uploaded compatible model artifacts to Cloud Storage.
Google Cloud console
Using the model that you created in the previous section, follow the guide to creating a model version in the Google Cloud console.
gcloud
Run the following command:
gcloud ai-platform versions create VERSION_NAME \
--region=REGION \
--model=MODEL_NAME \
--framework=FRAMEWORK \
--machine-type=MACHINE_TYPE \
--origin=MODEL_DIRECTORY \
--python-version=3.7 \
--runtime-version=2.11
In the command, replace the following placeholders:
- REGION: The region of the endpoint that you used in the previous section.
- VERSION_NAME: A name that you choose for your version.
- MODEL_NAME: The name of the model that you created in the previous section.
- FRAMEWORK: The framework used to create your model artifacts.
- MACHINE_TYPE: A Compute Engine (N1) machine type.
- MODEL_DIRECTORY: A Cloud Storage URI to your model directory (starting with "gs://").
Python
Run the following Python code:
from google.api_core.client_options import ClientOptions
from googleapiclient import discovery
endpoint = 'https://REGION-ml.googleapis.com'
client_options = ClientOptions(api_endpoint=endpoint)
ml = discovery.build('ml', 'v1', client_options=client_options)
request_body = { 'name': 'VERSION_NAME',
'deploymentUri': 'MODEL_DIRECTORY',
'runtimeVersion': '2.11',
'machineType': 'MACHINE_TYPE',
'framework': 'FRAMEWORK',
'pythonVersion': '3.7'}
request = ml.projects().models().versions().create(
parent='projects/PROJECT_ID/models/MODEL_NAME',
body=request_body)
response = request.execute()
print(response)
In the code, replace the following placeholders:
- REGION: The region of the endpoint that you used in the previous section.
- VERSION_NAME: A name that you choose for your version.
- MODEL_DIRECTORY: A Cloud Storage URI to your model directory (starting with "gs://").
- MACHINE_TYPE: A Compute Engine (N1) machine type.
- FRAMEWORK: The framework used to create your model artifacts.
- PROJECT_ID: The ID of your Google Cloud project.
- MODEL_NAME: The name of the model that you created in the previous section.
Learn more about creating a model version.
Sending an online prediction request
Google Cloud console
1. In the Google Cloud console, go to the Models page.
2. In the Region drop-down list, select the region of the endpoint that your model uses.
3. Click the name of the model that you created in a previous section to navigate to its Model Details page.
4. Click the name of the version that you created in a previous section to navigate to its Version Details page.
5. Click the Test & use tab. Enter one or more instances of input data and click the Test button to send an online prediction request.
gcloud
This example assumes that you have saved prediction input as a JSON request body (an object with an "instances" list) in a file in your local environment. Run the following command:
gcloud ai-platform predict \
--region=REGION \
--model=MODEL_NAME \
--version=VERSION_NAME \
--json-request=INPUT_PATH
In the command, replace the following placeholders:
- REGION: The region of the endpoint that you used in the previous sections.
- MODEL_NAME: The name of the model that you created in a previous section.
- VERSION_NAME: The name of the model version that you created in the previous section.
- INPUT_PATH: The path on your local filesystem to a JSON file with prediction input.
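For reference, the --json-request flag expects the file to contain a full request body: a JSON object with an "instances" list, one entry per instance. The instance layout below is illustrative and depends on what your model expects:

```
{"instances": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]}
```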
Python
Run the following Python code:
from google.api_core.client_options import ClientOptions
from googleapiclient import discovery
endpoint = 'https://REGION-ml.googleapis.com'
client_options = ClientOptions(api_endpoint=endpoint)
ml = discovery.build('ml', 'v1', client_options=client_options)
request_body = { 'instances': INSTANCES }
request = ml.projects().predict(
    name='projects/PROJECT_ID/models/MODEL_NAME/versions/VERSION_NAME',
    body=request_body)
response = request.execute()
print(response)
In the code, replace the following placeholders:
- REGION: The region of the endpoint that you used in the previous sections.
- INSTANCES: A list of prediction input instances.
- MODEL_NAME: The name of the model that you created in a previous section.
- VERSION_NAME: The name of the version that you created in the previous section.
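If the request succeeds, the response body contains a predictions list with one entry per input instance; if it fails, it contains an error field instead. A minimal sketch of handling the result (the helper name is illustrative, and the per-instance shape depends on your model):

```python
def extract_predictions(response):
    """Return the per-instance predictions from a predict() response.

    Raises RuntimeError if the service returned an error payload instead.
    """
    if 'error' in response:
        raise RuntimeError('Prediction failed: {}'.format(response['error']))
    return response.get('predictions', [])

# Example with a fabricated response (real shapes depend on the model):
sample = {'predictions': [{'score': 0.92}, {'score': 0.17}]}
scores = extract_predictions(sample)  # one entry per input instance
```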
Learn more about getting online predictions.
What's next
- See differences in regional availability for regional endpoints and the global endpoint. This includes differences in GPU availability.
- Learn more about Compute Engine (N1) machine types, which are required for regional endpoints.
- Read about other additional options that you can configure when you create models and versions.