You must deploy a model to an endpoint before you can use that model to serve online predictions. Deploying a model associates physical resources with the model so that it can serve online predictions with low latency.
This page describes the steps you must follow to deploy a model to an endpoint using Online Prediction.
Before you begin
Before deploying your model to an endpoint, export your model artifacts for prediction and ensure you meet all the prerequisites from that page.
Create a resource pool
A `ResourcePool` custom resource lets you have fine-grained control over the behavior of your model. You can define settings such as the following:
- Autoscaling configurations.
- The machine type, which defines CPU and memory requirements.
- Accelerator options such as GPU resources.
The machine type is essential for the node pool specification request you send to create the prediction cluster.
For the resource pool of a deployed model, the accelerator count and type determine GPU usage, while the machine type only dictates the requested CPU and memory resources. For this reason, when you include GPU accelerators in the `ResourcePool` specification, the `machineType` field controls the CPU and memory requirements for the model, the `acceleratorType` field controls the GPU model, and the `acceleratorCount` field controls the number of GPU slices.
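As a quick reference, the division of responsibilities inside a `machineSpec` block looks like the following sketch. The values are illustrative and match the GPU-based example in the steps below:

```yaml
machineSpec:
  machineType: a2-highgpu-1g-gdc    # controls requested CPU and memory for the model
  acceleratorType: nvidia-a100-80gb # controls the GPU model to attach
  acceleratorCount: 2               # controls the number of GPU slices to allocate
```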
Follow these steps to create a `ResourcePool` custom resource:
1. Create a YAML file defining the `ResourcePool` custom resource. The following examples contain YAML files for resource pools with GPU accelerators (GPU-based models) and without GPU accelerators (CPU-based models):

    GPU-based models:

    ```yaml
    apiVersion: prediction.aiplatform.gdc.goog/v1
    kind: ResourcePool
    metadata:
      name: RESOURCE_POOL_NAME
      namespace: PROJECT_NAMESPACE
    spec:
      resourcePoolID: RESOURCE_POOL_NAME
      enableContainerLogging: false
      dedicatedResources:
        machineSpec:
          # The system adds computing overhead to the nodes for mandatory components.
          # Choose a machineType value that allocates fewer CPU and memory resources
          # than those used by the nodes in the prediction cluster.
          machineType: a2-highgpu-1g-gdc
          acceleratorType: nvidia-a100-80gb
          # The accelerator count is a slice of the requested virtualized GPUs.
          # The value corresponds to one-seventh of 80 GB of GPUs for each count.
          acceleratorCount: 2
        autoscaling:
          minReplica: 2
          maxReplica: 10
    ```

    CPU-based models:

    ```yaml
    apiVersion: prediction.aiplatform.gdc.goog/v1
    kind: ResourcePool
    metadata:
      name: RESOURCE_POOL_NAME
      namespace: PROJECT_NAMESPACE
    spec:
      resourcePoolID: RESOURCE_POOL_NAME
      enableContainerLogging: false
      dedicatedResources:
        machineSpec:
          # The system adds computing overhead to the nodes for mandatory components.
          # Choose a machineType value that allocates fewer CPU and memory resources
          # than those used by the nodes in the prediction cluster.
          machineType: n2-highcpu-8-gdc
        autoscaling:
          minReplica: 2
          maxReplica: 10
    ```

    Replace the following:

    - RESOURCE_POOL_NAME: the name you want to give to the `ResourcePool` definition file.
    - PROJECT_NAMESPACE: the name of the project namespace associated with the prediction cluster.

    Modify the values in the `dedicatedResources` fields according to your resource needs and what is available in your prediction cluster.

2. Apply the `ResourcePool` definition file to the prediction cluster:

    ```sh
    kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f RESOURCE_POOL_NAME.yaml
    ```

    Replace the following:

    - PREDICTION_CLUSTER_KUBECONFIG: the path to the prediction cluster's kubeconfig file.
    - RESOURCE_POOL_NAME: the name of the `ResourcePool` definition file.
When you create the `ResourcePool` custom resource, the Kubernetes API and the webhook service validate the YAML file and report success or failure. The prediction operator provisions and reserves your resources from the resource pool when you deploy your models to an endpoint.
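Optionally, you can confirm that the resource pool exists before deploying a model to it. The following check is a sketch; it uses the fully qualified resource name derived from the manifest's `apiVersion` rather than assuming a short name is registered:

```sh
# List ResourcePool custom resources in the project namespace.
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get resourcepools.prediction.aiplatform.gdc.goog \
    -n PROJECT_NAMESPACE
```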
Deploy your model to an endpoint
If you have a resource pool, you can deploy more than one model to an endpoint, and you can deploy a model to more than one endpoint. Deploy a prediction model that targets one of the supported containers. Depending on whether the endpoint already exists, choose one of the following two methods:
Deploy a model to a new endpoint
Follow these steps to deploy a prediction model to a new endpoint:
1. Create a YAML file defining a `DeployedModel` custom resource:

    TensorFlow

    The following YAML file shows a sample configuration for a TensorFlow model:

    ```yaml
    apiVersion: prediction.aiplatform.gdc.goog/v1
    kind: DeployedModel
    metadata:
      name: DEPLOYED_MODEL_NAME
      namespace: PROJECT_NAMESPACE
    spec:
      # The endpoint path structure is endpoints/<endpoint-id>
      endpointPath: endpoints/PREDICTION_ENDPOINT
      modelSpec:
        # The artifactLocation field must be the s3 path to the folder that
        # contains the various model versions.
        # For example, s3://my-prediction-bucket/tensorflow
        artifactLocation: s3://PATH_TO_MODEL
        # The value in the id field must be unique to each model.
        id: img-detection-model
        modelDisplayName: my_img_detection_model
        # The model resource name structure is models/<model-id>/<model-version-id>
        modelResourceName: models/img-detection-model/1
        # The model version ID must match the name of the first folder in
        # the artifactLocation bucket, inside the 'tensorflow' folder.
        # For example, if the bucket path is
        # s3://my-prediction-bucket/tensorflow/1/,
        # then the value for the model version ID is "1".
        modelVersionID: "1"
        modelContainerSpec:
          args:
          - --model_config_file=/models/models.config
          - --rest_api_port=8080
          - --port=8500
          - --file_system_poll_wait_seconds=30
          - --model_config_file_poll_wait_seconds=30
          command:
          - /bin/tensorflow_model_server
          # The image URI field must contain one of the following values:
          # For CPU-based models: gcr.io/aiml/prediction/containers/tf2-cpu.2-14:latest
          # For GPU-based models: gcr.io/aiml/prediction/containers/tf2-gpu.2-14:latest
          imageURI: gcr.io/aiml/prediction/containers/tf2-gpu.2-14:latest
          ports:
          - 8080
          grpcPorts:
          - 8500
      resourcePoolRef:
        kind: ResourcePool
        name: RESOURCE_POOL_NAME
        namespace: PROJECT_NAMESPACE
    ```

    Replace the following:

    - DEPLOYED_MODEL_NAME: the name you want to give to the `DeployedModel` definition file.
    - PROJECT_NAMESPACE: the name of the project namespace associated with the prediction cluster.
    - PREDICTION_ENDPOINT: the name you want to give to the new endpoint, such as my-img-prediction-endpoint.
    - PATH_TO_MODEL: the path to your model in the storage bucket.
    - RESOURCE_POOL_NAME: the name you gave to the `ResourcePool` definition file when you created a resource pool to host the model.

    Modify the values in the remaining fields according to your prediction model.

    PyTorch

    The following YAML file shows a sample configuration for a PyTorch model:

    ```yaml
    apiVersion: prediction.aiplatform.gdc.goog/v1
    kind: DeployedModel
    metadata:
      name: DEPLOYED_MODEL_NAME
      namespace: PROJECT_NAMESPACE
    spec:
      endpointPath: PREDICTION_ENDPOINT
      endpointInfo:
        id: PREDICTION_ENDPOINT
      modelSpec:
        # The artifactLocation field must be the s3 path to the folder that
        # contains the various model versions.
        # For example, s3://my-prediction-bucket/pytorch
        artifactLocation: s3://PATH_TO_MODEL
        # The value in the id field must be unique to each model.
        id: "pytorch"
        modelDisplayName: my-pytorch-model
        # The model resource name structure is models/<model-id>/<model-version-id>
        modelResourceName: models/pytorch/1
        modelVersionID: "1"
        modelContainerSpec:
          # The image URI field must contain one of the following values:
          # For CPU-based models: gcr.io/aiml/prediction/containers/pytorch-cpu.2-4:latest
          # For GPU-based models: gcr.io/aiml/prediction/containers/pytorch-gpu.2-4:latest
          imageURI: gcr.io/aiml/prediction/containers/pytorch-cpu.2-4:latest
          ports:
          - 8080
          grpcPorts:
          - 7070
      sharesResourcePool: false
      resourcePoolRef:
        kind: ResourcePool
        name: RESOURCE_POOL_NAME
        namespace: PROJECT_NAMESPACE
    ```

    Replace the following:

    - DEPLOYED_MODEL_NAME: the name you want to give to the `DeployedModel` definition file.
    - PROJECT_NAMESPACE: the name of the project namespace associated with the prediction cluster.
    - PREDICTION_ENDPOINT: the name you want to give to the new endpoint, such as my-img-prediction-endpoint.
    - PATH_TO_MODEL: the path to your model in the storage bucket.
    - RESOURCE_POOL_NAME: the name you gave to the `ResourcePool` definition file when you created a resource pool to host the model.

    Modify the values in the remaining fields according to your prediction model.
2. Apply the `DeployedModel` definition file to the prediction cluster:

    ```sh
    kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f DEPLOYED_MODEL_NAME.yaml
    ```

    Replace the following:

    - PREDICTION_CLUSTER_KUBECONFIG: the path to the prediction cluster's kubeconfig file.
    - DEPLOYED_MODEL_NAME: the name of the `DeployedModel` definition file.

    When you create the `DeployedModel` custom resource, the Kubernetes API and the webhook service validate the YAML file and report success or failure. The prediction operator reconciles the `DeployedModel` custom resource and serves it in the prediction cluster.
3. Create a YAML file defining an `Endpoint` custom resource.

    The following YAML file shows a sample configuration:

    ```yaml
    apiVersion: aiplatform.gdc.goog/v1
    kind: Endpoint
    metadata:
      name: ENDPOINT_NAME
      namespace: PROJECT_NAMESPACE
    spec:
      createDns: true
      id: PREDICTION_ENDPOINT
      destinations:
      - serviceRef:
          kind: DeployedModel
          name: DEPLOYED_MODEL_NAME
          namespace: PROJECT_NAMESPACE
        trafficPercentage: 50
        grpcPort: 8501
        httpPort: 8081
      - serviceRef:
          kind: DeployedModel
          name: DEPLOYED_MODEL_NAME_2
          namespace: PROJECT_NAMESPACE
        trafficPercentage: 50
        grpcPort: 8501
        httpPort: 8081
    ```

    Replace the following:

    - ENDPOINT_NAME: the name you want to give to the `Endpoint` definition file.
    - PROJECT_NAMESPACE: the name of the project namespace associated with the prediction cluster.
    - PREDICTION_ENDPOINT: the name of the new endpoint. You defined this name in the `DeployedModel` definition file.
    - DEPLOYED_MODEL_NAME: the name you gave to the `DeployedModel` definition file.

    You can have one or more `serviceRef` destinations. If you have a second `serviceRef` object, add it to the YAML file in the `destinations` field and replace DEPLOYED_MODEL_NAME_2 with the name you gave to the second `DeployedModel` definition file you created. Keep adding or removing `serviceRef` objects as you need them, depending on the number of models you are deploying.

    Set the `trafficPercentage` fields based on how you want to split traffic between the models on this endpoint. Modify the values in the remaining fields according to your endpoint configuration.
4. Apply the `Endpoint` definition file to the prediction cluster:

    ```sh
    kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f ENDPOINT_NAME.yaml
    ```

    Replace ENDPOINT_NAME with the name of the `Endpoint` definition file.
To get the endpoint URL path for the prediction model, run the following command:

```sh
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT \
    -n PROJECT_NAMESPACE -o jsonpath='{.status.endpointFQDN}'
```

Replace the following:

- PREDICTION_CLUSTER_KUBECONFIG: the path to the prediction cluster's kubeconfig file.
- PREDICTION_ENDPOINT: the name of the new endpoint.
- PROJECT_NAMESPACE: the name of the prediction project namespace.
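With the FQDN, you can send a test prediction request. The following is a minimal sketch, not a definitive request format: it assumes the TensorFlow sample above, which serves the TensorFlow Serving REST API (`POST /v1/models/<model-name>:predict`), that the model name configured in models.config matches the sample's img-detection-model ID, and that you hold a valid bearer token in a `TOKEN` variable. Adjust the path, port, payload, and authentication for your environment:

```sh
# Hypothetical prediction request. ENDPOINT_FQDN is the value returned by
# the previous command; the instances payload is a placeholder and must
# match your model's expected input shape.
curl -X POST \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${TOKEN}" \
    "https://ENDPOINT_FQDN/v1/models/img-detection-model:predict" \
    -d '{"instances": [[1.0, 2.0, 3.0]]}'
```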
Deploy a model to an existing endpoint
You can only deploy a model to an existing endpoint if you previously deployed another model to that endpoint when it was new. The system requires this earlier step to create the endpoint.
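If you're unsure whether the endpoint already exists, you can query it first; a missing endpoint returns a NotFound error. This check is a sketch that reuses the same endpoint resource the FQDN commands on this page query:

```sh
# Confirm the endpoint exists before deploying another model to it.
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT -n PROJECT_NAMESPACE
```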
Follow these steps to deploy a prediction model to an existing endpoint:
1. Create a YAML file defining a `DeployedModel` custom resource.

    The following YAML file shows a sample configuration:

    ```yaml
    apiVersion: prediction.aiplatform.gdc.goog/v1
    kind: DeployedModel
    metadata:
      name: DEPLOYED_MODEL_NAME
      namespace: PROJECT_NAMESPACE
    spec:
      # The endpoint path structure is endpoints/<endpoint-id>
      endpointPath: endpoints/PREDICTION_ENDPOINT
      modelSpec:
        # The artifactLocation field must be the s3 path to the folder that
        # contains the various model versions.
        # For example, s3://my-prediction-bucket/tensorflow
        artifactLocation: s3://PATH_TO_MODEL
        # The value in the id field must be unique to each model.
        id: img-detection-model-v2
        modelDisplayName: my_img_detection_model
        # The model resource name structure is models/<model-id>/<model-version-id>
        modelResourceName: models/img-detection-model/2
        # The model version ID must match the name of the first folder in
        # the artifactLocation bucket, inside the 'tensorflow' folder.
        # For example, if the bucket path is
        # s3://my-prediction-bucket/tensorflow/2/,
        # then the value for the model version ID is "2".
        modelVersionID: "2"
        modelContainerSpec:
          args:
          - --model_config_file=/models/models.config
          - --rest_api_port=8080
          - --port=8500
          - --file_system_poll_wait_seconds=30
          - --model_config_file_poll_wait_seconds=30
          command:
          - /bin/tensorflow_model_server
          # The image URI field must contain one of the following values:
          # For CPU-based models: gcr.io/aiml/prediction/containers/tf2-cpu.2-6:latest
          # For GPU-based models: gcr.io/aiml/prediction/containers/tf2-gpu.2-6:latest
          imageURI: gcr.io/aiml/prediction/containers/tf2-gpu.2-6:latest
          ports:
          - 8080
          grpcPorts:
          - 8500
      resourcePoolRef:
        kind: ResourcePool
        name: RESOURCE_POOL_NAME
        namespace: PROJECT_NAMESPACE
    ```

    Replace the following:

    - DEPLOYED_MODEL_NAME: the name you want to give to the `DeployedModel` definition file.
    - PROJECT_NAMESPACE: the name of the project namespace associated with the prediction cluster.
    - PREDICTION_ENDPOINT: the name of the existing endpoint, such as my-img-prediction-endpoint.
    - PATH_TO_MODEL: the path to your model in the storage bucket.
    - RESOURCE_POOL_NAME: the name you gave to the `ResourcePool` definition file when you created a resource pool to host the model.

    Modify the values in the remaining fields according to your prediction model.
2. Apply the `DeployedModel` definition file to the prediction cluster:

    ```sh
    kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f DEPLOYED_MODEL_NAME.yaml
    ```

    Replace the following:

    - PREDICTION_CLUSTER_KUBECONFIG: the path to the prediction cluster's kubeconfig file.
    - DEPLOYED_MODEL_NAME: the name of the `DeployedModel` definition file.

    When you create the `DeployedModel` custom resource, the Kubernetes API and the webhook service validate the YAML file and report success or failure. The prediction operator reconciles the `DeployedModel` custom resource and serves it in the prediction cluster.
3. Show details of the existing `Endpoint` custom resource:

    ```sh
    kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG describe -f ENDPOINT_NAME.yaml
    ```

    Replace ENDPOINT_NAME with the name of the `Endpoint` definition file.
4. Update the YAML file of the `Endpoint` custom resource definition by adding a new `serviceRef` object in the `destinations` field. In the new object, include the appropriate service name based on your newly created `DeployedModel` custom resource.

    The following YAML file shows a sample configuration:

    ```yaml
    apiVersion: aiplatform.gdc.goog/v1
    kind: Endpoint
    metadata:
      name: ENDPOINT_NAME
      namespace: PROJECT_NAMESPACE
    spec:
      createDns: true
      id: PREDICTION_ENDPOINT
      destinations:
      - serviceRef:
          kind: DeployedModel
          name: DEPLOYED_MODEL_NAME
          namespace: PROJECT_NAMESPACE
        trafficPercentage: 40
        grpcPort: 8501
        httpPort: 8081
      - serviceRef:
          kind: DeployedModel
          name: DEPLOYED_MODEL_NAME_2
          namespace: PROJECT_NAMESPACE
        trafficPercentage: 50
        grpcPort: 8501
        httpPort: 8081
      - serviceRef:
          kind: DeployedModel
          name: DEPLOYED_MODEL_NAME_3
          namespace: PROJECT_NAMESPACE
        trafficPercentage: 10
        grpcPort: 8501
        httpPort: 8081
    ```

    Replace the following:

    - ENDPOINT_NAME: the name of the existing `Endpoint` definition file.
    - PROJECT_NAMESPACE: the name of the project namespace associated with the prediction cluster.
    - PREDICTION_ENDPOINT: the name of the existing endpoint. You referenced this name in the `DeployedModel` definition file.
    - DEPLOYED_MODEL_NAME: the name of a previously created `DeployedModel` definition file.
    - DEPLOYED_MODEL_NAME_2: the name you gave to the newly created `DeployedModel` definition file.

    You can have one or more `serviceRef` destinations. If you have a third `serviceRef` object, add it to the YAML file in the `destinations` field and replace DEPLOYED_MODEL_NAME_3 with the name you gave to the third `DeployedModel` definition file you created. Keep adding or removing `serviceRef` objects as you need them, depending on the number of models you are deploying.

    Set the `trafficPercentage` fields based on how you want to split traffic between the models of this endpoint. Modify the values in the remaining fields according to your endpoint configuration.
5. Apply the `Endpoint` definition file to the prediction cluster:

    ```sh
    kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f ENDPOINT_NAME.yaml
    ```

    Replace ENDPOINT_NAME with the name of the `Endpoint` definition file.
To get the endpoint URL path for the prediction model, run the following command:

```sh
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT \
    -n PROJECT_NAMESPACE -o jsonpath='{.status.endpointFQDN}'
```

Replace the following:

- PREDICTION_CLUSTER_KUBECONFIG: the path to the prediction cluster's kubeconfig file.
- PREDICTION_ENDPOINT: the name of the endpoint.
- PROJECT_NAMESPACE: the name of the prediction project namespace.
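After updating the traffic split, you can read the `trafficPercentage` values back to confirm how traffic is divided across the deployed models. This is a sketch built from the `spec.destinations` fields shown in the sample manifest above:

```sh
# Print each destination's DeployedModel name and its share of traffic.
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT \
    -n PROJECT_NAMESPACE \
    -o jsonpath='{range .spec.destinations[*]}{.serviceRef.name}{": "}{.trafficPercentage}{"%\n"}{end}'
```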