Deploy models with custom weights

This guide shows you how to deploy models with custom weights on Vertex AI.

Supported models

You can deploy custom weights for the following base models:

- Llama
- Gemma
- Qwen
- Deepseek
- Mistral and Mixtral
- Phi-4
- OpenAI OSS

Limitations

Custom weights don't support the import of quantized models.

Model files

You must provide the model files in the Hugging Face weights format. For more information on the Hugging Face weights format, see Use Hugging Face Models. If the required files aren't provided, the model deployment might fail.

This table lists the types of model files, which depend on the model's architecture:

| Model file content | File type |
| --- | --- |
| Model configuration | config.json |
| Model weights | *.safetensors, *.bin |
| Weights index | *.index.json |
| Tokenizer file(s) | tokenizer.model, tokenizer.json, tokenizer_config.json |

Locations

You can deploy custom models in all regions where Model Garden is available.

Before you begin

1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
2. Verify that billing is enabled for your Google Cloud project.
3. Enable the Vertex AI API.
4. In the Google Cloud console, activate Cloud Shell. At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

This tutorial uses Cloud Shell to interact with Google Cloud. If you use a shell other than Cloud Shell, perform the following additional configuration:

1. Install the Google Cloud CLI.
2. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
3. To initialize the gcloud CLI, run the following command:

gcloud init

Deploy the custom model

The following table describes the available methods for deploying your custom model:

| Method | Description | Use case |
| --- | --- | --- |
| Google Cloud console | A graphical user interface that guides you through the deployment process. | Best for quick deployments, visual confirmation of settings, and users who prefer a UI over command-line tools. |
| gcloud CLI | A command-line tool for managing Google Cloud resources that allows for scripted, repeatable deployments. | Ideal for developers and administrators who work in the terminal and want to automate deployment tasks. |
| Python | The Vertex AI SDK for Python lets you programmatically deploy and manage models within your applications or notebooks. | Suitable for integrating model deployment into a larger Python-based MLOps workflow or application. |
| curl (REST API) | Make direct HTTP requests to the Vertex AI API for the most control over the deployment configuration. | Useful for developers using languages other than Python or for environments where installing SDKs is not feasible. |

If you're using the gcloud CLI, Python, or curl, replace the following variables in your code samples:

- PROJECT_ID: Your Google Cloud project ID.
- REGION: Your region. For example, us-central1.
- MODEL_GCS: The Cloud Storage path to your model. For example, gs://custom-weights-fishfooding/meta-llama/Llama-3.2-1B-Instruct.
- MACHINE_TYPE: Your machine type. For example, g2-standard-12.
- ACCELERATOR_TYPE: Your accelerator type. For example, NVIDIA_L4.
- ACCELERATOR_COUNT: The number of accelerators.
- MODEL_ID: A user-provided ID for the model.
- PROMPT: Your text prompt.

Console

To deploy your model with custom weights using the Google Cloud console, follow these steps:

1. In the Google Cloud console, go to the Model Garden page.
2. Click Deploy model with custom weights. The Deploy a model with custom weights on Vertex AI pane appears.
3. In the Model source section, do the following:
   1. Click Browse, select the bucket that contains your model, and click Select.
   2. Optional: Enter a name for your model in the Model name field.
4. In the Deployment settings section, do the following:
   1. From the Region field, select your region, and click OK.
   2. In the Machine Spec field, select your machine specification.
   3. Optional: The Endpoint name field is populated with a default name. You can enter a different name.
5. Click Deploy model with custom weights.
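Before running the deployment samples below, you can optionally sanity-check that a local copy of your checkpoint contains the file types that the Hugging Face weights format expects. This is a minimal sketch, not part of the official tooling; the helper name and the exact required-file policy are assumptions:

```python
from pathlib import Path

# Files expected by the Hugging Face weights format (assumed minimal policy).
REQUIRED_FILES = ["config.json"]
TOKENIZER_FILES = ["tokenizer.model", "tokenizer.json"]


def check_checkpoint(checkpoint_dir: str) -> list[str]:
    """Return a list of problems found in a local checkpoint directory."""
    d = Path(checkpoint_dir)
    problems = []
    for name in REQUIRED_FILES:
        if not (d / name).is_file():
            problems.append(f"missing {name}")
    # Weights may be sharded *.safetensors files or legacy *.bin files.
    if not (list(d.glob("*.safetensors")) or list(d.glob("*.bin"))):
        problems.append("no *.safetensors or *.bin weight files")
    if not any((d / name).is_file() for name in TOKENIZER_FILES):
        problems.append("no tokenizer.model or tokenizer.json")
    return problems
```

An empty list means the directory at least contains the expected file types; it doesn't validate the weights or tokenizer contents themselves.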
gcloud CLI
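The gcloud, Python, and curl samples on this page read their parameters from shell variables. One way to provide them is to export example values before running the commands. The bucket path, machine type, and accelerator type are the examples from this page; PROJECT_ID, MODEL_ID, and ACCELERATOR_COUNT=1 are illustrative placeholders:

```shell
# Example values -- replace with your own.
export REGION=us-central1
export MODEL_GCS=gs://custom-weights-fishfooding/meta-llama/Llama-3.2-1B-Instruct
export MACHINE_TYPE=g2-standard-12
export ACCELERATOR_TYPE=NVIDIA_L4
export ACCELERATOR_COUNT=1          # illustrative; match your machine type
export PROJECT_ID=my-project        # hypothetical project ID
export MODEL_ID=llama-3-2-1b-custom # hypothetical model ID
```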
This command deploys the model to a specific region.

gcloud ai model-garden models deploy --model=${MODEL_GCS} --region ${REGION}
This command deploys the model to a specific region with a specified machine type, accelerator type, and accelerator count. To specify a machine configuration, you must set all three fields.

gcloud ai model-garden models deploy --model=${MODEL_GCS} --machine-type=${MACHINE_TYPE} --accelerator-type=${ACCELERATOR_TYPE} --accelerator-count=${ACCELERATOR_COUNT} --region ${REGION}
Python
This sample deploys the model with an explicit machine configuration and then sends a prediction request:

import vertexai
from vertexai.preview import model_garden

vertexai.init(project=${PROJECT_ID}, location=${REGION})

custom_model = model_garden.CustomModel(
    gcs_uri=GCS_URI,
)
endpoint = custom_model.deploy(
    machine_type="${MACHINE_TYPE}",
    accelerator_type="${ACCELERATOR_TYPE}",
    accelerator_count="${ACCELERATOR_COUNT}",
    model_display_name="custom-model",
    endpoint_display_name="custom-model-endpoint",
)
endpoint.predict(instances=[{"prompt": "${PROMPT}"}], use_dedicated_endpoint=True)
Alternatively, you don't have to pass any parameters to the custom_model.deploy() method:

import vertexai
from vertexai.preview import model_garden

vertexai.init(project=${PROJECT_ID}, location=${REGION})

custom_model = model_garden.CustomModel(
    gcs_uri=GCS_URI,
)
endpoint = custom_model.deploy()
endpoint.predict(instances=[{"prompt": "${PROMPT}"}], use_dedicated_endpoint=True)
curl
This command deploys the model to a specific region.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
  -d '{
    "custom_model": {
      "gcs_uri": "'"${MODEL_GCS}"'"
    },
    "destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
    "model_config": {
      "model_user_id": "'"${MODEL_ID}"'"
    }
  }'
Alternatively, you can use the API to explicitly set the machine type.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
  -d '{
    "custom_model": {
      "gcs_uri": "'"${MODEL_GCS}"'"
    },
    "destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
    "model_config": {
      "model_user_id": "'"${MODEL_ID}"'"
    },
    "deploy_config": {
      "dedicated_resources": {
        "machine_spec": {
          "machine_type": "'"${MACHINE_TYPE}"'",
          "accelerator_type": "'"${ACCELERATOR_TYPE}"'",
          "accelerator_count": '"${ACCELERATOR_COUNT}"'
        },
        "min_replica_count": 1
      }
    }
  }'
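Escaping inline JSON inside single quotes is a common source of errors with the curl samples. As an alternative sketch, a short script can write the request body to a file that you then pass to curl with -d @deploy_body.json. The field names mirror the curl samples above; all values below are illustrative placeholders:

```python
import json

# Placeholder values -- replace with your own (these mirror the curl samples).
project_id = "my-project"
region = "us-central1"
model_gcs = "gs://my-bucket/my-model"
model_id = "my-model-id"

body = {
    "custom_model": {"gcs_uri": model_gcs},
    "destination": f"projects/{project_id}/locations/{region}",
    "model_config": {"model_user_id": model_id},
    "deploy_config": {
        "dedicated_resources": {
            "machine_spec": {
                "machine_type": "g2-standard-12",
                "accelerator_type": "NVIDIA_L4",
                "accelerator_count": 1,
            },
            "min_replica_count": 1,
        }
    },
}

# Write the body to a file for use with: curl ... -d @deploy_body.json
with open("deploy_body.json", "w") as f:
    json.dump(body, f, indent=2)
```

Because json.dump handles quoting, this avoids hand-balancing the single- and double-quote sequences that the inline curl bodies require.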
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-23 UTC.