gcloud alpha ai endpoints deploy-model

NAME
gcloud alpha ai endpoints deploy-model - deploy a model to an existing Vertex AI endpoint
SYNOPSIS
gcloud alpha ai endpoints deploy-model (ENDPOINT : --region=REGION) --display-name=DISPLAY_NAME --model=MODEL [--accelerator=[count=COUNT],[type=TYPE]] [--autoscaling-metric-specs=[METRIC-NAME=TARGET,…]] [--deployed-model-id=DEPLOYED_MODEL_ID] [--enable-access-logging] [--enable-container-logging] [--machine-type=MACHINE_TYPE] [--max-replica-count=MAX_REPLICA_COUNT] [--min-replica-count=MIN_REPLICA_COUNT] [--reservation-affinity=[key=KEY],[reservation-affinity-type=RESERVATION-AFFINITY-TYPE],[values=VALUES]] [--service-account=SERVICE_ACCOUNT] [--spot] [--tpu-topology=TPU_TOPOLOGY] [--traffic-split=[DEPLOYED_MODEL_ID=VALUE,…]] [--shared-resources=SHARED_RESOURCES : --shared-resources-region=SHARED_RESOURCES_REGION] [GCLOUD_WIDE_FLAG]
EXAMPLES
To deploy a model 456 to an endpoint 123 under project example in region us-central1, run:
gcloud alpha ai endpoints deploy-model 123 --project=example --region=us-central1 --model=456 --display-name=my_deployed_model
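
To deploy the same model onto GPU-backed machines with two replicas (the machine and accelerator types here are illustrative; availability varies by region), you might run:

gcloud alpha ai endpoints deploy-model 123 --project=example --region=us-central1 --model=456 --display-name=my_deployed_model --machine-type=n1-standard-4 --accelerator=type=nvidia-tesla-t4,count=1 --min-replica-count=2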
POSITIONAL ARGUMENTS
Endpoint resource - The endpoint to deploy a model to. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument endpoint on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project.

This must be specified.

ENDPOINT
ID of the endpoint or fully qualified identifier for the endpoint.

To set the name attribute:

  • provide the argument endpoint on the command line.

This positional argument must be specified if any of the other arguments in this group are specified.

--region=REGION
Cloud region for the endpoint.

To set the region attribute:

  • provide the argument endpoint on the command line with a fully specified name;
  • provide the argument --region on the command line;
  • set the property ai/region;
  • choose one from the prompted list of available regions.
REQUIRED FLAGS
--display-name=DISPLAY_NAME
Display name of the deployed model.
--model=MODEL
ID of the uploaded model.
OPTIONAL FLAGS
--accelerator=[count=COUNT],[type=TYPE]
Manage the accelerator config for GPU serving. When deploying a model on Compute Engine machine types, a GPU accelerator may also be selected.
type
The type of the accelerator. Choices are 'nvidia-a100-80gb', 'nvidia-h100-80gb', 'nvidia-h100-mega-80gb', 'nvidia-l4', 'nvidia-tesla-a100', 'nvidia-tesla-k80', 'nvidia-tesla-p100', 'nvidia-tesla-p4', 'nvidia-tesla-t4', 'nvidia-tesla-v100'.
count
The number of accelerators to attach to each machine running the job. This is usually 1. If not specified, the default value is 1.

For example: --accelerator=type=nvidia-tesla-k80,count=1

--autoscaling-metric-specs=[METRIC-NAME=TARGET,…]
Metric specifications that override a resource utilization metric's target value. At most one entry is allowed per metric.
METRIC-NAME
Resource metric name. Choices are 'cpu-usage', 'gpu-duty-cycle'.
TARGET
Target resource utilization in percentage (1% - 100%) for the given metric. If the value is set to 60, the target resource utilization is 60%.

For example: --autoscaling-metric-specs=cpu-usage=70

--deployed-model-id=DEPLOYED_MODEL_ID
User-specified ID of the deployed-model.
--enable-access-logging
If true, online prediction access logs are sent to Cloud Logging.

These logs are standard server access logs, containing information like timestamp and latency for each prediction request.

--enable-container-logging
If true, the container of the deployed model instances will send stderr and stdout streams to Cloud Logging.

Currently, only supported for custom-trained Models and AutoML Tabular Models.

--machine-type=MACHINE_TYPE
The machine resources to be used for each node of this deployment. For available machine types, see https://cloud.google.com/ai-platform-unified/docs/predictions/machine-types.
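
For example (n1-standard-4 is one commonly available type; availability varies by region): --machine-type=n1-standard-4
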
--max-replica-count=MAX_REPLICA_COUNT
Maximum number of machine replicas the model will be deployed on.
--min-replica-count=MIN_REPLICA_COUNT
Minimum number of machine replicas the model will be deployed on. If specified, the value must be equal to or larger than 1.

If not specified and the uploaded models use dedicated resources, the default value is 1.
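
For example, to let the deployment scale between 1 and 5 replicas: --min-replica-count=1 --max-replica-count=5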

--reservation-affinity=[key=KEY],[reservation-affinity-type=RESERVATION-AFFINITY-TYPE],[values=VALUES]
A ReservationAffinity can be used to configure a Vertex AI resource (e.g., a DeployedModel) to draw its Compute Engine resources from a Shared Reservation, or exclusively from on-demand capacity.
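
For example (an illustrative sketch; the key and reservation name are placeholders, and the accepted reservation-affinity-type values depend on your gcloud version): --reservation-affinity=reservation-affinity-type=SPECIFIC_RESERVATION,key=compute.googleapis.com/reservation-name,values=my-reservation
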
--service-account=SERVICE_ACCOUNT
Service account that the deployed model's container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn't have access to the resource project.
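
For example (the address shown is a placeholder): --service-account=my-sa@example-project.iam.gserviceaccount.com
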
--spot
If true, schedule the deployment workload on Spot VMs.

--tpu-topology=TPU_TOPOLOGY
Cloud TPU topology to use for this deployment. Required for multi-host Cloud TPU deployments: https://cloud.google.com/kubernetes-engine/docs/concepts/tpus#topology.
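
For example (illustrative; valid topologies depend on the TPU generation, per the link above): --tpu-topology=2x2x2
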
--traffic-split=[DEPLOYED_MODEL_ID=VALUE,…]
List of deployed model ID and traffic percentage pairs to set as the endpoint's traffic split.
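
For example, to send 80% of traffic to deployed model 1234 and the remaining 20% to 5678 (the IDs are placeholders; the percentages must sum to 100): --traffic-split=1234=80,5678=20
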
Deployment resource pool resource - The deployment resource pool to co-host a model on. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument --shared-resources on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project.
--shared-resources=SHARED_RESOURCES
ID of the deployment_resource_pool or fully qualified identifier for the deployment_resource_pool.

To set the name attribute:

  • provide the argument --shared-resources on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.
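
For example (a placeholder fully qualified name): --shared-resources=projects/example/locations/us-central1/deploymentResourcePools/my-pool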

--shared-resources-region=SHARED_RESOURCES_REGION
Cloud region for the deployment_resource_pool.

To set the region attribute:

  • provide the argument --shared-resources on the command line with a fully specified name;
  • provide the argument --shared-resources-region on the command line;
  • provide the argument --region on the command line;
  • set the property ai/region;
  • choose one from the prompted list of available regions.
GCLOUD WIDE FLAGS
These flags are available to all commands: --access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.

Run $ gcloud help for details.

NOTES
This command is currently in alpha and might change without notice. If this command fails with API permission errors despite specifying the correct project, you might be trying to access an API with an invitation-only early access allowlist. These variants are also available:
gcloud ai endpoints deploy-model
gcloud beta ai endpoints deploy-model