Deploy models with custom weights

Deploy models with custom weights is a Preview offering. You can fine-tune
models based on a predefined set of base models, and deploy your customized
models on Vertex AI Model Garden. You can deploy your custom models by using
the custom weights import: upload your model artifacts to a Cloud Storage
bucket in your project, and deploy the model with a one-click experience in
Vertex AI.

Supported models

The public preview of Deploy models with custom weights supports the following
base model families:

- Llama
- Gemma
- Qwen
- Deepseek
- Mistral and Mixtral
- Phi-4
- OpenAI OSS

Limitations

Custom weights don't support the import of quantized models.

Model files

You must supply the model files in the Hugging Face weights format. For more
information on the Hugging Face weights format, see Use Hugging Face Models.
If the required files aren't provided, the model deployment might fail.

This table lists the types of model files, which depend on the model's
architecture:

Model file content   File type
Model configuration  config.json
Model weights        *.safetensors, *.bin
Weights index        *.index.json
Tokenizer file(s)    tokenizer.model, tokenizer.json, tokenizer_config.json

Locations

You can deploy custom models in all regions where Model Garden services are
available.

Before you begin

1. In the Google Cloud console, on the project selector page, select or create
   a Google Cloud project.
2. Verify that billing is enabled for your Google Cloud project.
3. Enable the Vertex AI API.
4. In the Google Cloud console, activate Cloud Shell.

   At the bottom of the Google Cloud console, a Cloud Shell session starts and
   displays a command-line prompt. Cloud Shell is a shell environment with the
   Google Cloud CLI already installed and with values already set for your
   current project. It can take a few seconds for the session to initialize.

This tutorial assumes that you are using Cloud Shell to interact with
Google Cloud. If you want to use a different shell instead of Cloud Shell,
then perform the following additional configuration:

1. Install the Google Cloud CLI.
2. If you're using an external identity provider (IdP), you must first sign in
   to the gcloud CLI with your federated identity.
3. To initialize the gcloud CLI, run the following command:

   gcloud init

Deploy the custom model

This section demonstrates how to deploy your custom model. If you're using the
command-line interface (CLI), Python, or JavaScript, replace the following
variables with a value for your code samples to work:

- REGION: Your region. For example, us-central1.
- MODEL_GCS: Your model's Cloud Storage path. For example,
  gs://custom-weights-fishfooding/meta-llama/Llama-3.2-1B-Instruct.
- PROJECT_ID: Your Google Cloud project ID.
- MODEL_ID: Your model's ID.
- MACHINE_TYPE: Your machine type. For example, g2-standard-12.
- ACCELERATOR_TYPE: Your accelerator type. For example, NVIDIA_L4.
- ACCELERATOR_COUNT: The number of accelerators.
- PROMPT: Your text prompt.

Console

The following steps show you how to use the Google Cloud console to deploy your
model with custom weights.

1. In the Google Cloud console, go to the Model Garden page.
2. Click Deploy model with custom weights. The Deploy a model with custom
   weights on Vertex AI pane appears.
3. In the Model source section, do the following:
   a. Click Browse, choose the bucket where your model is stored, and click
      Select.
   b. Optional: Enter your model's name in the Model name field.
4. In the Deployment settings section, do the following:
   a. From the Region field, select your region, and click OK.
   b. In the Machine Spec field, select the machine specification that is used
      to deploy your model.
   c. Optional: In the Endpoint name field, your model's endpoint name appears
      by default. However, you can enter a different endpoint name in the
      field.
5. Click Deploy model with custom weights.

gcloud CLI

This command demonstrates how to deploy the model to a specific region.

gcloud ai model-garden models deploy --model=${MODEL_GCS} --region ${REGION}

This command demonstrates how to deploy the model to a specific region with its
machine type, accelerator type, and accelerator count. If you want to select a
specific machine configuration, then you must set all three fields.

gcloud ai model-garden models deploy --model=${MODEL_GCS} --machine-type=${MACHINE_TYPE} --accelerator-type=${ACCELERATOR_TYPE} --accelerator-count=${ACCELERATOR_COUNT} --region ${REGION}
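Before uploading your artifacts to Cloud Storage, it can help to confirm that
your local model directory contains the file types listed in the Model files
section. The following is a minimal sketch, not part of Vertex AI or the
gcloud CLI; the helper name and directory layout are illustrative:

```python
from pathlib import Path

# File types from the Model files table.
WEIGHT_GLOBS = ("*.safetensors", "*.bin")
TOKENIZER_FILES = ("tokenizer.model", "tokenizer.json", "tokenizer_config.json")

def check_model_dir(model_dir):
    """Return a list of problems; an empty list means the directory contains
    the required Hugging Face-format files."""
    d = Path(model_dir)
    problems = []
    # The model configuration is always required.
    if not (d / "config.json").is_file():
        problems.append("missing: config.json")
    # Weights must be present as *.safetensors or *.bin shards.
    if not any(f.is_file() for g in WEIGHT_GLOBS for f in d.glob(g)):
        problems.append("missing: *.safetensors or *.bin weights")
    # At least one tokenizer file is required.
    if not any((d / t).is_file() for t in TOKENIZER_FILES):
        problems.append("missing: tokenizer file(s)")
    return problems
```

A complete directory returns an empty list; any returned entry names a file
type from the table above that the deployment would otherwise fail without.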
Python
import vertexai
from google.cloud import aiplatform
from vertexai.preview import model_garden
vertexai.init(project=${PROJECT_ID}, location=${REGION})
custom_model = model_garden.CustomModel(
gcs_uri=GCS_URI,
)
endpoint = custom_model.deploy(
machine_type="${MACHINE_TYPE}",
accelerator_type="${ACCELERATOR_TYPE}",
accelerator_count="${ACCELERATOR_COUNT}",
model_display_name="custom-model",
endpoint_display_name="custom-model-endpoint")
endpoint.predict(instances=[{"prompt": "${PROMPT}"}], use_dedicated_endpoint=True)
Alternatively, you don't have to pass any parameters to the
custom_model.deploy() method.

import vertexai
from google.cloud import aiplatform
from vertexai.preview import model_garden
vertexai.init(project=${PROJECT_ID}, location=${REGION})
custom_model = model_garden.CustomModel(
gcs_uri=GCS_URI,
)
endpoint = custom_model.deploy()
endpoint.predict(instances=[{"prompt": "${PROMPT}"}], use_dedicated_endpoint=True)
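Both samples call endpoint.predict with instances=[{"prompt": ...}]. The
returned Prediction object exposes a predictions list with one entry per
instance; the exact element shape depends on the serving container, so the
following helper is a sketch under that assumption, shown with a plain-list
stand-in for a real response:

```python
def first_prediction_text(predictions):
    """Return the first generated string from a predict response's
    `predictions` list. Handles the two common element shapes: a plain
    string, or a dict with an output field (the "output" key is an
    assumption; inspect your container's actual response)."""
    if not predictions:
        return ""
    first = predictions[0]
    if isinstance(first, str):
        return first
    if isinstance(first, dict):
        return str(first.get("output", first))
    return str(first)

# Stand-in for `endpoint.predict(...).predictions`:
print(first_prediction_text(["Hello from the model"]))
```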
curl
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
-d '{
"custom_model": {
"gcs_uri": "'"${MODEL_GCS}"'"
},
"destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
"model_config": {
"model_user_id": "'"${MODEL_ID}"'",
},
}'
Alternatively, you can use the API to explicitly set the machine type.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
-d '{
"custom_model": {
"gcs_uri": "'"${MODEL_GCS}"'"
},
"destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
"model_config": {
"model_user_id": "'"${MODEL_ID}"'",
},
"deploy_config": {
"dedicated_resources": {
"machine_spec": {
"machine_type": "'"${MACHINE_TYPE}"'",
"accelerator_type": "'"${ACCELERATOR_TYPE}"'",
"accelerator_count": '"${ACCELERATOR_COUNT}"'
},
"min_replica_count": 1
}
}
}'
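The request body in the curl examples can also be assembled programmatically
before sending. The following sketch builds the same JSON payload in Python;
the field names come from the curl examples above, while the concrete values
(project, bucket, model ID) are hypothetical placeholders:

```python
import json

# Hypothetical values standing in for the ${...} variables in the curl
# examples; replace them with your own.
project_id = "my-project"           # ${PROJECT_ID}
region = "us-central1"              # ${REGION}
model_gcs = "gs://my-bucket/model"  # ${MODEL_GCS}
model_id = "my-custom-model"        # ${MODEL_ID}

payload = {
    "custom_model": {"gcs_uri": model_gcs},
    "destination": f"projects/{project_id}/locations/{region}",
    "model_config": {"model_user_id": model_id},
    "deploy_config": {
        "dedicated_resources": {
            "machine_spec": {
                "machine_type": "g2-standard-12",
                "accelerator_type": "NVIDIA_L4",
                "accelerator_count": 1,
            },
            "min_replica_count": 1,
        }
    },
}
print(json.dumps(payload, indent=2))
```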
Learn more about self-deployed models in Vertex AI
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-23 UTC.