Discover, test, tune, and deploy models by using Model Garden in the
Google Cloud console. You can also deploy Model Garden models by using the
Google Cloud CLI.

Send test prompts
1. In the Google Cloud console, go to the Model Garden page.
2. Find a supported model that you want to test and click View details.
3. Click Open prompt design. You're taken to the Prompt design page.
4. In Prompt, enter the prompt that you want to test.
5. Optional: Configure the model parameters.
6. Click Submit.

Tune a model
1. In the Google Cloud console, go to the Model Garden page.
2. In Search models, enter BERT or T5-FLAN, then click the magnifying glass to search.
3. Click View details on the T5-FLAN or the BERT model card.
4. Click Open fine-tuning pipeline. You're taken to the Vertex AI pipelines page.
5. To start tuning, click Create run.

Tune in a notebook
The model cards for most open source foundation models and fine-tunable models support tuning in a notebook.
1. In the Google Cloud console, go to the Model Garden page.
2. Find a supported model that you want to tune and go to its model card.
3. Click Open notebook.

Deploy an open model
You can deploy a model by using its model card in the Google Cloud console or programmatically.

Python
For more information about setting up the Google Gen AI SDK or Google Cloud CLI, see the Google Gen AI SDK overview or Install the Google Cloud CLI. To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python.
For more information, see the
Python API reference documentation.
To deploy with the SDK:
1. List the models that you can deploy and record the model ID to deploy. You can optionally list the supported Hugging Face models in Model Garden and even filter them by model name. The output doesn't include any tuned models.
2. View the deployment specifications for a model by using the model ID from the previous step. You can view the machine type, accelerator type, and container image URI that Model Garden has verified for a particular model.
3. Deploy the model to an endpoint. Model Garden uses the default deployment configuration unless you specify additional arguments and values.

A minimal sketch that combines these steps follows.
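The exact module layout varies across SDK releases; the following sketch assumes a recent google-cloud-aiplatform release in which vertexai exposes a model_garden module, and it uses hypothetical placeholder values:

import vertexai
from vertexai import model_garden

# Hypothetical placeholder values; replace with your own project and region.
vertexai.init(project="PROJECT_ID", location="us-central1")

# Step 1: List deployable models and record a model ID.
for model_name in model_garden.list_deployable_models():
    print(model_name)

# Step 2: View the deployment specifications verified for one model.
model = model_garden.OpenModel("google/gemma3@gemma-3-1b-it")
for option in model.list_deploy_options():
    print(option)

# Step 3: Deploy with the default configuration and wait for the endpoint.
endpoint = model.deploy(accept_eula=True)
print(endpoint.resource_name)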
Before you begin, specify a quota project to run the following commands. The commands you run are counted against the quotas for that project. For more information, see Set the quota project.
1. List the models that you can deploy by running the gcloud ai model-garden models list command. In the output, find the model ID to deploy. The following example shows an abbreviated output. The output doesn't include any tuned models or Hugging Face models. To view which Hugging Face models are supported, add the --can-deploy-hugging-face-models flag.
2. View the deployment specifications for a model by running the gcloud ai model-garden models list-deployment-config command with the model ID from the previous step, such as google/gemma@gemma-2b. You can view the machine type, accelerator type, and container image URI that Model Garden has verified for a particular model.
3. Deploy a model to an endpoint by running the gcloud ai model-garden models deploy command. Model Garden uses the default deployment configuration unless you specify additional arguments and values. To run the command asynchronously, include the --asynchronous flag. The output includes the deployment configuration that Model Garden used, the endpoint ID, and the deployment operation ID, which you can use to check the deployment status.
4. To see details about your deployment, run the gcloud ai endpoints list --list-model-garden-endpoints-only command. Replace LOCATION_ID with the region where you deployed the model. The output includes all endpoints that were created from
Model Garden and includes information such as the endpoint ID,
endpoint name, and whether the endpoint is associated with a deployed model.
To find your deployment, look for the endpoint name that was returned from
the previous command.
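To run the same check from Python, a minimal sketch with the Vertex AI SDK follows. Note that aiplatform.Endpoint.list() returns every endpoint in the region, not only Model Garden endpoints, so match on the display name that the deploy command reported; the project and region values are placeholders:

from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="us-central1")

# Print each endpoint's display name and full resource name so you can
# find the endpoint that Model Garden created for your deployment.
for endpoint in aiplatform.Endpoint.list():
    print(endpoint.display_name, endpoint.resource_name)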
List all deployable models and then get the ID of the model to deploy. You can
then deploy the model with its default configuration and endpoint. Or, you can
choose to customize your deployment, such as setting a specific machine type or
using a dedicated endpoint.
Deploy a model from Model Garden or a model from Hugging Face. You can also customize the deployment by specifying additional JSON fields.
Console
1. In the Google Cloud console, go to the Model Garden page.
2. Find a supported model that you want to deploy, and click its model card.
3. Click Deploy to open the Deploy model pane.
4. In the Deploy model pane, specify details for your deployment.
5. For the Reservation type field, select a reservation type. The reservation must match your specified machine specs.
6. Click Deploy.
To learn how to apply or remove a Terraform configuration, see Basic Terraform commands. For more information, see the Terraform provider reference documentation. The following example deploys the gemma-3-1b-it model to a new Vertex AI endpoint by using default configurations; the full configuration appears in the Terraform section later on this page. To deploy a model with customization, see Vertex AI Endpoint with Model Garden Deployment for details. After you apply the configuration, Terraform provisions a new Vertex AI endpoint and deploys the specified open model. To delete the endpoint and model deployment, run the terraform destroy command shown in the Clean up section.
gcloud
List the models that you can deploy by running the gcloud ai model-garden models list command. This command lists all model IDs and which ones you can self-deploy.

gcloud ai model-garden models list
MODEL_ID CAN_DEPLOY CAN_PREDICT
google/gemma2@gemma-2-27b Yes No
google/gemma2@gemma-2-27b-it Yes No
google/gemma2@gemma-2-2b Yes No
google/gemma2@gemma-2-2b-it Yes No
google/gemma2@gemma-2-9b Yes No
google/gemma2@gemma-2-9b-it Yes No
google/gemma3@gemma-3-12b-it Yes No
google/gemma3@gemma-3-12b-pt Yes No
google/gemma3@gemma-3-1b-it Yes No
google/gemma3@gemma-3-1b-pt Yes No
google/gemma3@gemma-3-27b-it Yes No
google/gemma3@gemma-3-27b-pt Yes No
google/gemma3@gemma-3-4b-it Yes No
google/gemma3@gemma-3-4b-pt Yes No
google/gemma3n@gemma-3n-e2b Yes No
google/gemma3n@gemma-3n-e2b-it Yes No
google/gemma3n@gemma-3n-e4b Yes No
google/gemma3n@gemma-3n-e4b-it Yes No
google/gemma@gemma-1.1-2b-it Yes No
google/gemma@gemma-1.1-2b-it-gg-hf Yes No
google/gemma@gemma-1.1-7b-it Yes No
google/gemma@gemma-1.1-7b-it-gg-hf Yes No
google/gemma@gemma-2b Yes No
google/gemma@gemma-2b-gg-hf Yes No
google/gemma@gemma-2b-it Yes No
google/gemma@gemma-2b-it-gg-hf Yes No
google/gemma@gemma-7b Yes No
google/gemma@gemma-7b-gg-hf Yes No
google/gemma@gemma-7b-it Yes No
google/gemma@gemma-7b-it-gg-hf Yes No
To view which Hugging Face models are supported, add the --can-deploy-hugging-face-models flag.

View the deployment specifications for a model by running the gcloud ai model-garden models list-deployment-config command. You can view the machine type, accelerator type, and container image URI that Model Garden supports for a particular model.

gcloud ai model-garden models list-deployment-config \
  --model=MODEL_ID
Replace MODEL_ID with the model ID from the previous list command, such as google/gemma@gemma-2b or stabilityai/stable-diffusion-xl-base-1.0.

Deploy a model to an endpoint by running the gcloud ai model-garden models deploy command. Model Garden generates a display name for your endpoint and uses the default deployment configuration unless you specify additional arguments and values. To run the command asynchronously, include the --asynchronous flag.

gcloud ai model-garden models deploy \
--model=MODEL_ID \
[--machine-type=MACHINE_TYPE] \
[--accelerator-type=ACCELERATOR_TYPE] \
[--endpoint-display-name=ENDPOINT_NAME] \
[--hugging-face-access-token=HF_ACCESS_TOKEN] \
[--reservation-affinity reservation-affinity-type=any-reservation] \
[--reservation-affinity reservation-affinity-type=specific-reservation, key="compute.googleapis.com/reservation-name", values=RESERVATION_RESOURCE_NAME] \
[--asynchronous]
Replace the following placeholders:
- MODEL_ID: The model ID, such as stabilityai/stable-diffusion-xl-base-1.0.
- MACHINE_TYPE: The machine type, such as g2-standard-4.
- ACCELERATOR_TYPE: The accelerator type, such as NVIDIA_L4.
- ENDPOINT_NAME: A display name for the endpoint.
- HF_ACCESS_TOKEN: Your Hugging Face access token, for gated Hugging Face models.
- RESERVATION_RESOURCE_NAME: The name of the Compute Engine reservation to use with specific-reservation; with any-reservation, the deployment can use any matching reservation.

The output looks similar to the following example:

Using the default deployment configuration:
Machine type: g2-standard-12
Accelerator type: NVIDIA_L4
Accelerator count: 1
The project has enough quota. The current usage of quota for accelerator type NVIDIA_L4 in region us-central1 is 0 out of 28.
Deploying the model to the endpoint. To check the deployment status, you can try one of the following methods:
1) Look for endpoint `ENDPOINT_DISPLAY_NAME` at the [Vertex AI] -> [Online prediction] tab in Cloud Console
2) Use `gcloud ai operations describe OPERATION_ID --region=LOCATION` to find the status of the deployment long-running operation
To see details about your deployment, run the gcloud ai endpoints list --list-model-garden-endpoints-only command:

gcloud ai endpoints list --list-model-garden-endpoints-only \
  --region=LOCATION_ID
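After the endpoint is listed, you can send it a test request from Python. This is a hedged sketch: it assumes a vLLM-served open model that accepts the chat-completions payload shown later in the sample request metadata in the REST section, and ENDPOINT_ID is a placeholder from the previous list command:

import json

from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="us-central1")
endpoint = aiplatform.Endpoint("ENDPOINT_ID")

# The payload format mirrors the sampleRequest that Model Garden publishes
# in the model's deploy metadata (see the REST section).
payload = {
    "instances": [
        {
            "@requestFormat": "chatCompletions",
            "messages": [{"role": "user", "content": "What is machine learning?"}],
            "max_tokens": 100,
        }
    ]
}
response = endpoint.raw_predict(
    body=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(response)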
REST
1. List models that you can deploy
Set QUERY_PARAMETERS to listAllVersions=True&filter=is_deployable(true). To list Hugging Face models, set the filter to alt=json&is_hf_wildcard(true)+AND+labels.VERIFIED_DEPLOYMENT_CONFIG%3DVERIFIED_DEPLOYMENT_SUCCEED&listAllVersions=True.

HTTP method and URL:

GET https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS
curl
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
"https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS"PowerShell
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS" | Select-Object -Expand Content

You receive a JSON response similar to the following:
{
"publisherModels": [
{
"name": "publishers/google/models/gemma3",
"versionId": "gemma-3-1b-it",
"openSourceCategory": "GOOGLE_OWNED_OSS_WITH_GOOGLE_CHECKPOINT",
"supportedActions": {
"openNotebook": {
"references": {
"us-central1": {
"uri": "https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gradio_streaming_chat_completions.ipynb"
}
},
"resourceTitle": "Notebook",
"resourceUseCase": "Chat Completion Playground",
"resourceDescription": "Chat with deployed Gemma 2 endpoints via Gradio UI."
},
"deploy": {
"modelDisplayName": "gemma-3-1b-it",
"containerSpec": {
"imageUri": "us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20250312_0916_RC01",
"args": [
"python",
"-m",
"vllm.entrypoints.api_server",
"--host=0.0.0.0",
"--port=8080",
"--model=gs://vertex-model-garden-restricted-us/gemma3/gemma-3-1b-it",
"--tensor-parallel-size=1",
"--swap-space=16",
"--gpu-memory-utilization=0.95",
"--disable-log-stats"
],
"env": [
{
"name": "MODEL_ID",
"value": "google/gemma-3-1b-it"
},
{
"name": "DEPLOY_SOURCE",
"value": "UI_NATIVE_MODEL"
}
],
"ports": [
{
"containerPort": 8080
}
],
"predictRoute": "/generate",
"healthRoute": "/ping"
},
"dedicatedResources": {
"machineSpec": {
"machineType": "g2-standard-12",
"acceleratorType": "NVIDIA_L4",
"acceleratorCount": 1
}
},
"publicArtifactUri": "gs://vertex-model-garden-restricted-us/gemma3/gemma3.tar.gz",
"deployTaskName": "vLLM 128K context",
"deployMetadata": {
"sampleRequest": "{\n \"instances\": [\n {\n \"@requestFormat\": \"chatCompletions\",\n \"messages\": [\n {\n \"role\": \"user\",\n \"content\": \"What is machine learning?\"\n }\n ],\n \"max_tokens\": 100\n }\n ]\n}\n"
}
},
...
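If you prefer to issue this request from Python instead of curl or PowerShell, one option is an AuthorizedSession from the google-auth library. The following is a minimal sketch under that assumption; PROJECT_ID is a placeholder:

import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default()
session = AuthorizedSession(credentials)

url = (
    "https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models"
    "?listAllVersions=True&filter=is_deployable(true)"
)
response = session.get(url, headers={"x-goog-user-project": "PROJECT_ID"})
response.raise_for_status()

# Print each model alongside the machine type that Model Garden verified.
for model in response.json().get("publisherModels", []):
    deploy_action = model.get("supportedActions", {}).get("deploy", {})
    machine_spec = deploy_action.get("dedicatedResources", {}).get("machineSpec", {})
    print(model["name"], model.get("versionId"), machine_spec.get("machineType"))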
2. Deploy a model
Deploy a model with its default configuration.
HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy

Request JSON body:
{
"publisher_model_name": "MODEL_ID",
"model_config": {
"accept_eula": "true"
}
}
curl
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:
cat > request.json << 'EOF'
{
"publisher_model_name": "MODEL_ID",
"model_config": {
"accept_eula": "true"
}
}
EOF
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"PowerShell
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:
@'
{
"publisher_model_name": "MODEL_ID",
"model_config": {
"accept_eula": "true"
}
}
'@ | Out-File -FilePath request.json -Encoding utf8
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content

You receive a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
"metadata": {
"@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata",
"genericMetadata": {
"createTime": "2025-03-13T21:44:44.538780Z",
"updateTime": "2025-03-13T21:44:44.538780Z"
},
"publisherModel": "publishers/google/models/gemma3@gemma-3-1b-it",
"destination": "projects/PROJECT_ID/locations/LOCATION",
"projectNumber": "PROJECT_ID"
}
}
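The deploy call returns a long-running operation rather than a finished deployment. Operations follow the standard Google API pattern, so you can poll the operation name from the response with a GET request; a minimal Python sketch, with the operation name as a placeholder:

import time

import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default()
session = AuthorizedSession(credentials)

# Operation name copied from the deploy response; the region in the URL
# must match the LOCATION you deployed to.
operation_name = "projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID"
url = f"https://us-central1-aiplatform.googleapis.com/v1/{operation_name}"

while True:
    operation = session.get(url).json()
    if operation.get("done"):
        break
    time.sleep(30)  # Deployment typically takes several minutes.

# On success, "response" holds the result; on failure, "error" has details.
print(operation.get("response") or operation.get("error"))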
Deploy a Hugging Face model
HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy

Request JSON body:
{
"hugging_face_model_id": "MODEL_ID",
"hugging_face_access_token": "ACCESS_TOKEN",
"model_config": {
"accept_eula": "true"
}
}
curl
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:
cat > request.json << 'EOF'
{
"hugging_face_model_id": "MODEL_ID",
"hugging_face_access_token": "ACCESS_TOKEN",
"model_config": {
"accept_eula": "true"
}
}
EOF
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"PowerShell
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:
@'
{
"hugging_face_model_id": "MODEL_ID",
"hugging_face_access_token": "ACCESS_TOKEN",
"model_config": {
"accept_eula": "true"
}
}
'@ | Out-File -FilePath request.json -Encoding utf8
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content

You receive a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/us-central1LOCATION/operations/OPERATION_ID",
"metadata": {
"@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata",
"genericMetadata": {
"createTime": "2025-03-13T21:44:44.538780Z",
"updateTime": "2025-03-13T21:44:44.538780Z"
},
"publisherModel": "publishers/PUBLISHER_NAME/model/MODEL_NAME",
"destination": "projects/PROJECT_ID/locations/LOCATION",
"projectNumber": "PROJECT_ID"
}
}
Deploy a model with customizations
Before using any of the request data, make the following replacements:
- MODEL_ID: The model ID to deploy, such as google/gemma@gemma-2b or stabilityai/stable-diffusion-xl-base-1.0.
- MACHINE_TYPE: The machine type, such as g2-standard-4.
- ACCELERATOR_TYPE: The accelerator type, such as NVIDIA_L4.
- reservation_affinity_type: To use an existing Compute Engine reservation for your deployment, specify any reservation or a specific one. If you specify this value, don't specify spot.
- spot: Whether to use Spot VMs for your deployment.
- IMAGE_URI: The container image to use, such as us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20241016_0916_RC00_maas.
- fast_tryout_enabled: When testing a model, you can choose to use a faster deployment. This option is available only for highly used models with certain machine types. If enabled, you cannot specify model or deployment configurations.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy

Request JSON body:
{
  "publisher_model_name": "MODEL_ID",
  "deploy_config": {
    "dedicated_resources": {
      "machine_spec": {
        "machine_type": "MACHINE_TYPE",
        "accelerator_type": "ACCELERATOR_TYPE",
        "accelerator_count": ACCELERATOR_COUNT,
        "reservation_affinity": {
          "reservation_affinity_type": "ANY_RESERVATION"
        }
      },
      "spot": "false"
    },
    "fast_tryout_enabled": false
  },
  "model_config": {
    "accept_eula": "true",
    "container_spec": {
      "image_uri": "IMAGE_URI",
      "args": [CONTAINER_ARGS],
      "ports": [
        {
          "container_port": CONTAINER_PORT
        }
      ]
    }
  }
}
curl
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:
cat > request.json << 'EOF'
{
  "publisher_model_name": "MODEL_ID",
  "deploy_config": {
    "dedicated_resources": {
      "machine_spec": {
        "machine_type": "MACHINE_TYPE",
        "accelerator_type": "ACCELERATOR_TYPE",
        "accelerator_count": ACCELERATOR_COUNT,
        "reservation_affinity": {
          "reservation_affinity_type": "ANY_RESERVATION"
        }
      },
      "spot": "false"
    },
    "fast_tryout_enabled": false
  },
  "model_config": {
    "accept_eula": "true",
    "container_spec": {
      "image_uri": "IMAGE_URI",
      "args": [CONTAINER_ARGS],
      "ports": [
        {
          "container_port": CONTAINER_PORT
        }
      ]
    }
  }
}
EOF
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"PowerShell
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:
@'
{
  "publisher_model_name": "MODEL_ID",
  "deploy_config": {
    "dedicated_resources": {
      "machine_spec": {
        "machine_type": "MACHINE_TYPE",
        "accelerator_type": "ACCELERATOR_TYPE",
        "accelerator_count": ACCELERATOR_COUNT,
        "reservation_affinity": {
          "reservation_affinity_type": "ANY_RESERVATION"
        }
      },
      "spot": "false"
    },
    "fast_tryout_enabled": false
  },
  "model_config": {
    "accept_eula": "true",
    "container_spec": {
      "image_uri": "IMAGE_URI",
      "args": [CONTAINER_ARGS],
      "ports": [
        {
          "container_port": CONTAINER_PORT
        }
      ]
    }
  }
}
'@ | Out-File -FilePath request.json -Encoding utf8
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content

You receive a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
"metadata": {
"@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata",
"genericMetadata": {
"createTime": "2025-03-13T21:44:44.538780Z",
"updateTime": "2025-03-13T21:44:44.538780Z"
},
"publisherModel": "publishers/google/models/gemma3@gemma-3-1b-it",
"destination": "projects/PROJECT_ID/locations/LOCATION",
"projectNumber": "PROJECT_ID"
}
}
Terraform
Deploy a model
The following example deploys the gemma-3-1b-it model to a new Vertex AI endpoint in us-central1 by using default configurations.

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "6.45.0"
    }
  }
}
provider "google" {
  region = "us-central1"
}
resource "google_vertex_ai_endpoint_with_model_garden_deployment" "gemma_deployment" {
  publisher_model_name = "publishers/google/models/gemma3@gemma-3-1b-it"
  location             = "us-central1"
  model_config {
    accept_eula = true
  }
}
Apply the configuration
terraform init
terraform plan
terraform apply
Clean up
terraform destroy
Deploy a partner model and make prediction requests
In the Google Cloud console, go to the Model Garden page and use the Model collections filter to view the self-deploy partner models. Choose a model from the list, and purchase it by clicking Enable.
You must deploy on the partner's required machine types, as described in the "Recommended hardware configuration" section on their Model Garden model card. When deployed, the model serving resources are located in a secure Google-managed project.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
In your code, replace the following placeholders:
- LOCATION: The region where you plan to deploy the model and endpoint.
- PROJECT_ID: Your project ID.
- DISPLAY_NAME: A descriptive name for the associated resource.
- PUBLISHER_NAME: The name of the partner that provides the model to upload or deploy.
- PUBLISHER_MODEL_NAME: The name of the model to upload.
- MACHINE_TYPE: Defines the set of resources to deploy for your model, such as g2-standard-4. You must match one of the configurations provided by the partner.
- ACCELERATOR_TYPE: Specifies accelerators to add to your deployment to help improve performance when working with intensive workloads, such as NVIDIA_L4. You must match one of the configurations provided by the partner.
- ACCELERATOR_COUNT: The number of accelerators to use. You must match one of the configurations provided by the partner.
- REQUEST_PAYLOAD: The fields and values to include in your prediction request. View the partner's Model Garden model card to see the available fields.
import json

from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=LOCATION)
# Upload a model
model = aiplatform.Model.upload(
    display_name="DISPLAY_NAME_MODEL",
    model_garden_source_model_name="publishers/PUBLISHER_NAME/models/PUBLISHER_MODEL_NAME",
)
# Create endpoint
my_endpoint = aiplatform.Endpoint.create(display_name="DISPLAY_NAME_ENDPOINT")
# Deploy model
MACHINE_TYPE = "MACHINE_TYPE"
ACCELERATOR_TYPE = "ACCELERATOR_TYPE"
ACCELERATOR_COUNT = ACCELERATOR_COUNT  # Replace with a number, such as 1.
model.deploy(
    endpoint=my_endpoint,
    deployed_model_display_name="DISPLAY_NAME_DEPLOYED_MODEL",
    traffic_split={"0": 100},
    machine_type=MACHINE_TYPE,
    accelerator_type=ACCELERATOR_TYPE,
    accelerator_count=ACCELERATOR_COUNT,
    min_replica_count=1,
    max_replica_count=1,
)
# Unary call for predictions
PAYLOAD = {
REQUEST_PAYLOAD
}
request = json.dumps(PAYLOAD)
response = my_endpoint.raw_predict(
    body=request,
    headers={"Content-Type": "application/json"},
)
print(response)
# Streaming call for predictions
PAYLOAD = {
REQUEST_PAYLOAD
}
request = json.dumps(PAYLOAD)
for stream_response in my_endpoint.stream_raw_predict(
    body=request,
    headers={"Content-Type": "application/json"},
):
    print(stream_response)
REST
In the sample curl commands, replace the following placeholders:
- LOCATION: The region where you plan to deploy the model and endpoint.
- PROJECT_ID: Your project ID.
- DISPLAY_NAME: A descriptive name for the associated resource.
- PUBLISHER_NAME: The name of the partner that provides the model to upload or deploy.
- PUBLISHER_MODEL_NAME: The name of the model to upload.
- ENDPOINT_ID: The ID of the endpoint.
- MACHINE_TYPE: Defines the set of resources to deploy for your model, such as g2-standard-4. You must match one of the configurations provided by the partner.
- ACCELERATOR_TYPE: Specifies accelerators to add to your deployment to help improve performance when working with intensive workloads, such as NVIDIA_L4. You must match one of the configurations provided by the partner.
- ACCELERATOR_COUNT: The number of accelerators to use. You must match one of the configurations provided by the partner.
- REQUEST_PAYLOAD: The fields and values to include in your prediction request. View the partner's Model Garden model card to see the available fields.
Upload a model to add it to your Model Registry.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/models:upload \
  -d '{
    "model": {
      "displayName": "DISPLAY_NAME_MODEL",
      "baseModelSource": {
        "modelGardenSource": {
          "publicModelName": "publishers/PUBLISHER_NAME/models/PUBLISHER_MODEL_NAME"
        }
      }
    }
  }'
Create an endpoint.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints \
  -d '{
    "displayName": "DISPLAY_NAME_ENDPOINT"
  }'
Deploy the uploaded model to the endpoint.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:deployModel \
  -d '{
    "deployedModel": {
      "model": "projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID",
      "displayName": "DISPLAY_NAME_DEPLOYED_MODEL",
      "dedicatedResources": {
        "machineSpec": {
          "machineType": "MACHINE_TYPE",
          "acceleratorType": "ACCELERATOR_TYPE",
          "acceleratorCount": "ACCELERATOR_COUNT"
        },
        "minReplicaCount": 1,
        "maxReplicaCount": 1
      }
    },
    "trafficSplit": { "0": 100 }
  }'
After the model is deployed, you can make a unary or streaming call for predictions. View the partner's Model Garden model card to see which API methods are supported.
- Sample unary call:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:rawPredict \
  -d 'REQUEST_PAYLOAD'
- Sample streaming call:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:streamRawPredict \
  -d 'REQUEST_PAYLOAD'
Console
In the Google Cloud console, go to the Model Garden page.
To find a specific model, enter its name in the Model Garden search box.
To view all the models that you can self-deploy, in the Model collections section of the filter pane, select Self-deploy partner models. The resulting list includes all the self-deployable partner models.
Click the name of the model to deploy, which opens its model card.
Click Deploy options.
In the Deploy on Vertex AI pane, configure your deployment such as the location and machine type.
Click Deploy.
After the deployment is complete, you can request predictions by using the SDK or API. Additional instructions are available in the "Documentation" section on the model card.
View or manage an endpoint
To view and manage your endpoint, go to the Vertex AI Online prediction page.
Vertex AI lists all endpoints in your project for a particular region. Click an endpoint to view its details such as which models are deployed to the endpoint.
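You can also inspect an endpoint programmatically. A minimal Vertex AI SDK sketch follows; ENDPOINT_ID and the project values are placeholders, and gca_resource (the SDK's handle on the underlying API object) is assumed to expose the deployed_models entries, whose IDs are what you later pass when undeploying:

from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="us-central1")
endpoint = aiplatform.Endpoint("ENDPOINT_ID")

# Each deployed model entry carries the deployed model ID, display name,
# and the model resource it was created from.
for deployed in endpoint.gca_resource.deployed_models:
    print(deployed.id, deployed.display_name, deployed.model)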
Undeploy models and delete resources
To stop a deployed model from using resources in your project, undeploy your model from its endpoint. You must undeploy a model before you can delete the endpoint and the model.
Undeploy models
Undeploy a model from its endpoint.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
In your code, replace:
- PROJECT_ID with your project ID
- LOCATION with your region, for example, "us-central1"
- ENDPOINT_ID with your endpoint ID
from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=LOCATION)
# To find out which endpoints are available, uncomment the line below:
# endpoints = aiplatform.Endpoint.list()
endpoint = aiplatform.Endpoint(ENDPOINT_ID)
endpoint.undeploy_all()
gcloud
In these commands, replace:
- PROJECT_ID with your project ID
- LOCATION_ID with the region where you deployed the model and endpoint
- ENDPOINT_ID with the endpoint ID
- MODEL_ID with the model ID from the list model command
- DEPLOYED_MODEL_ID with the deployed model ID
Find the endpoint ID that is associated with your deployment by running the gcloud ai endpoints list command:

gcloud ai endpoints list \
  --project=PROJECT_ID \
  --region=LOCATION_ID
Find the model ID by running the gcloud ai models list command:

gcloud ai models list \
  --project=PROJECT_ID \
  --region=LOCATION_ID
Use the model ID from the previous command to get the deployed model ID by running the gcloud ai models describe command:

gcloud ai models describe MODEL_ID \
  --project=PROJECT_ID \
  --region=LOCATION_ID
The abbreviated output looks like the following example. In the output, the ID is called deployedModelId.

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
artifactUri: [URI removed]
baseModelSource:
  modelGardenSource:
    publicModelName: publishers/google/models/gemma2
...
deployedModels:
- deployedModelId: '1234567891234567891'
  endpoint: projects/12345678912/locations/us-central1/endpoints/12345678912345
displayName: gemma2-2b-it-12345678912345
etag: [ETag removed]
modelSourceInfo:
  sourceType: MODEL_GARDEN
name: projects/123456789123/locations/us-central1/models/gemma2-2b-it-12345678912345
...
Run the gcloud ai endpoints undeploy-model command to undeploy the model from the endpoint by using the endpoint ID and the deployed model ID from the previous commands:

gcloud ai endpoints undeploy-model ENDPOINT_ID \
  --project=PROJECT_ID \
  --region=LOCATION_ID \
  --deployed-model-id=DEPLOYED_MODEL_ID
This command produces no output.
Console
In the Google Cloud console, go to the Endpoints tab on the Online prediction page.
In the Region drop-down list, choose the region where your endpoint is located.
Click the endpoint name to open the details page.
On the row for the model, click Actions, and then select Undeploy model from endpoint.
In the Undeploy model from endpoint dialog, click Undeploy.
Delete endpoints
Delete the Vertex AI endpoint that was associated with your model deployment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
In your code, replace:
- PROJECT_ID with your project ID
- LOCATION with your region, for example, "us-central1"
- ENDPOINT_ID with your endpoint ID
from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=LOCATION)
# To find out which endpoints are available, uncomment the line below:
# endpoints = aiplatform.Endpoint.list()
endpoint = aiplatform.Endpoint(ENDPOINT_ID)
endpoint.delete()
gcloud
In these commands, replace:
- PROJECT_ID with your project ID
- LOCATION_ID with the region where you deployed the model and endpoint
- ENDPOINT_ID with the endpoint ID
Get the endpoint ID to delete by running the gcloud ai endpoints list command. This command lists the endpoint IDs for all endpoints in your project:

gcloud ai endpoints list \
  --project=PROJECT_ID \
  --region=LOCATION_ID
Run the gcloud ai endpoints delete command to delete the endpoint:

gcloud ai endpoints delete ENDPOINT_ID \
  --project=PROJECT_ID \
  --region=LOCATION_ID
When prompted, type y to confirm. This command produces no output.
Console
In the Google Cloud console, go to the Endpoints tab on the Online prediction page.
In the Region drop-down list, choose the region where your endpoint is located.
At the end of the endpoint's row, click Actions, and then select Delete endpoint.
In the confirmation prompt, click Confirm.
Delete models
Delete the model resource that was associated with your model deployment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
In your code, replace:
- PROJECT_ID with your project ID
- LOCATION with your region, for example, "us-central1"
- MODEL_ID with your model ID
from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=LOCATION)
# To find out which models are available in Model Registry, uncomment the line below:
# models = aiplatform.Model.list()
model = aiplatform.Model(MODEL_ID)
model.delete()
gcloud
In these commands, replace:
- PROJECT_ID with your project ID
- LOCATION_ID with the region where you deployed the model and endpoint
- MODEL_ID with the model ID from the list model command
Find the model ID to delete by running the gcloud ai models list command:

gcloud ai models list \
  --project=PROJECT_ID \
  --region=LOCATION_ID
Run the gcloud ai models delete command to delete the model by providing the model ID and the model's location:

gcloud ai models delete MODEL_ID \
  --project=PROJECT_ID \
  --region=LOCATION_ID
Console
Go to the Model Registry page from the Vertex AI section in the Google Cloud console.
In the Region drop-down list, choose the region where you deployed your model.
On the row for your model, click Actions, and then select Delete model. When you delete the model, all associated model versions and evaluations are deleted from your Google Cloud project.
In the confirmation prompt, click Delete.
View code samples
Most of the model cards for task-specific solutions models contain code samples that you can copy and test.
In the Google Cloud console, go to the Model Garden page.
Find a supported model that you want to view code samples for and click the Documentation tab.
The page scrolls to the documentation section with sample code embedded in place.
Create a vision app
The model cards for applicable computer vision models support creating a vision application.
In the Google Cloud console, go to the Model Garden page.
Find a vision model in the Task specific solutions section that you want to use to create a vision application and click View details.
Click Build app.
You're taken to Vertex AI Vision.
In Application name, enter a name for your application and click Continue.
Select a billing plan and click Create.
You're taken to Vertex AI Vision Studio where you can continue creating your computer vision application.