This document describes the types of self-deployed models available in Model Garden. In Model Garden, you can deploy and serve open, partner, and custom models on Vertex AI. Unlike model-as-a-service (MaaS) offerings, which are serverless, self-deployed models run securely within your Google Cloud project and VPC network. For a comparison of the options, see the Choose a self-deployment option table.

Self-deploy open models

Open models provide pretrained capabilities for various AI tasks, including Gemini models that excel in multimodal processing. An open model is freely available, and you can publish its outputs and use it anywhere, as long as you adhere to its licensing terms.

Vertex AI offers both open (also known as open weight) and open source models. When you use an open model with Vertex AI, your deployment runs on Vertex AI infrastructure. You can also use open models with other infrastructure products, such as PyTorch or JAX.

Open weight models

Many open models are open weight large language models (LLMs). A model's weights are the numerical values, stored in the model's neural network architecture, that represent the patterns and relationships learned from the data the model is trained on. For open weight models, these pretrained parameters are publicly released, so you can use the model for inference and tuning. However, details such as the original dataset, model architecture, and training code aren't always provided.

Open source models

Open weight models differ from open source AI models. Although open weight models expose the weights, which are the core numerical representation of what the model has learned, they don't necessarily provide the full source code or training details. Releasing the weights still offers a level of AI model transparency: you can understand the model's capabilities without needing to build it yourself.

Self-deployed partner models

Model Garden helps you purchase and manage model licenses from partners who offer proprietary models as a self-deploy option. After you purchase access to a model from Cloud Marketplace, you can deploy on on-demand hardware, or use your Compute Engine reservations and committed use discounts to meet your budget requirements. You are charged for model usage and for the Vertex AI infrastructure that you use.

To request usage of a self-deployed partner model, find the relevant model in the Model Garden console, click Contact sales, and then complete the form. This action initiates contact with a Google Cloud sales representative. For more information about deploying and using partner models, see Deploy a partner model and make prediction requests.

Considerations

Consider the following limitations when using self-deployed partner models:

- Support for model-specific issues is provided by the partner. To contact a partner about model performance issues, use the contact details in the Support section of their Model Garden model card.

Deploy models with custom weights

You can fine-tune models that are based on a predefined set of base models and deploy your customized models in Vertex AI Model Garden. To deploy a custom model, you import its custom weights by uploading your model artifacts to a Cloud Storage bucket in your project. Deploy models with custom weights is in public preview and supports the base models listed in the Supported models section.

Limitations

- Custom weights don't support the import of quantized models.
- You must supply the model files in the Hugging Face weights format. For more information on the Hugging Face weights format, see Use Hugging Face Models. If the required files aren't provided, the model deployment might fail. The required file types depend on the model's architecture and are listed in the Model files table.

Locations

You can deploy custom models in all regions where Model Garden is available.

Before you begin

Before you deploy your custom model, complete the following setup steps:

1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
2. Verify that billing is enabled for your Google Cloud project.
3. Enable the Vertex AI API.
In the Google Cloud console, activate Cloud Shell. At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

The following instructions use Cloud Shell. If you use a local development environment instead, you must authenticate to Google Cloud:

1. Install the Google Cloud CLI.
2. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
3. To initialize the gcloud CLI, run the gcloud init command.
If you're using the gcloud CLI, Python, or curl, replace the following variables in the code samples:

- PROMPT: Your text prompt.

Console

The following steps show you how to use the Google Cloud console to deploy your model with custom weights:

1. In the Google Cloud console, go to the Model Garden page.
2. Click Deploy model with custom weights. The Deploy a model with custom weights on Vertex AI pane appears.
3. In the Model source section, do the following:
   a. Click Browse, select the bucket where your model is stored, and then click Select.
   b. Optional: In the Model name field, enter a name for your model.
4. In the Deployment settings section, do the following:
   a. From the Region list, select your region.
   b. In the Machine Spec field, select the machine specification to use for deploying your model.
   c. Optional: In the Endpoint name field, change the default endpoint name.
5. Click Deploy model with custom weights.

The gcloud CLI and API examples that follow demonstrate how to deploy the model to a specific region, either with default settings or with an explicit machine type, accelerator type, and accelerator count. To select a specific machine configuration, you must set all three fields.
Choose a self-deployment option

The following table compares the self-deployment options available on Vertex AI:

| Option | Description | Pros | Cons |
| --- | --- | --- | --- |
| Self-deploy open models | Freely available models with public weights. You manage the deployment infrastructure. | High transparency; no model licensing cost; portable. | You are responsible for all infrastructure costs and management. |
| Self-deployed partner models | Proprietary models from third-party partners, purchased and deployed through Cloud Marketplace. | Access to specialized, commercial-grade models; partner support available. | Incurs model usage costs; weights can't be exported; some platform limitations (for example, no VPC Service Controls support). |
| Deploy models with custom weights | Deploy your own fine-tuned versions of supported base models by providing custom model weights. | Maximum customization for your specific use case; deploy on your preferred infrastructure. | Requires you to prepare model files in a specific format; doesn't support quantized models during import. |
Supported models

Deploy models with custom weights supports custom weights based on the following base model families:

- Llama
- Gemma
- Qwen
- Deepseek
- Mistral and Mixtral
- Phi-4
Model files

Depending on the model's architecture, you must supply the following types of model files:

| Model file content | File type |
| --- | --- |
| Model configuration | config.json |
| Model weights | *.safetensors, *.bin |
| Weights index | *.index.json |
| Tokenizer file(s) | tokenizer.model, tokenizer.json, tokenizer_config.json |
Deploy the custom model

When you run the code samples that follow, replace the variables with your own values, for example:

- REGION: us-central1
- MODEL_GCS: gs://custom-weights-fishfooding/meta-llama/Llama-3.2-1B-Instruct
- MACHINE_TYPE: g2-standard-12
- ACCELERATOR_TYPE: NVIDIA_L4
gcloud CLI

The following command deploys the model to a specific region:

```
gcloud ai model-garden models deploy \
  --model=${MODEL_GCS} \
  --region=${REGION}
```

The following command deploys the model to a specific region with an explicit machine type, accelerator type, and accelerator count. To select a specific machine configuration, you must set all three flags:

```
gcloud ai model-garden models deploy \
  --model=${MODEL_GCS} \
  --machine-type=${MACHINE_TYPE} \
  --accelerator-type=${ACCELERATOR_TYPE} \
  --accelerator-count=${ACCELERATOR_COUNT} \
  --region=${REGION}
```
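If you generate the deployment command from a script rather than typing it, you can enforce the all-three-fields rule programmatically. The following is a minimal standard-library sketch; the helper name is illustrative and is not part of the gcloud CLI or the Vertex AI SDK:

```python
import shlex
from typing import List, Optional

def build_deploy_command(model_gcs: str, region: str,
                         machine_type: Optional[str] = None,
                         accelerator_type: Optional[str] = None,
                         accelerator_count: Optional[int] = None) -> List[str]:
    """Compose the argument list for `gcloud ai model-garden models deploy`.

    Mirrors the rule above: machine type, accelerator type, and
    accelerator count must be set together or not at all.
    """
    machine_args = (machine_type, accelerator_type, accelerator_count)
    if any(a is not None for a in machine_args) and None in machine_args:
        raise ValueError("Set machine_type, accelerator_type, and "
                         "accelerator_count together, or none of them.")
    cmd = ["gcloud", "ai", "model-garden", "models", "deploy",
           f"--model={model_gcs}", f"--region={region}"]
    if machine_type is not None:
        cmd += [f"--machine-type={machine_type}",
                f"--accelerator-type={accelerator_type}",
                f"--accelerator-count={accelerator_count}"]
    return cmd

# Print a copy-pasteable command using the example values from above.
print(shlex.join(build_deploy_command(
    "gs://custom-weights-fishfooding/meta-llama/Llama-3.2-1B-Instruct",
    "us-central1", "g2-standard-12", "NVIDIA_L4", 1)))
```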
Python

The following sample deploys the model with an explicit machine configuration and then sends a prediction request:

```
import vertexai
from vertexai.preview import model_garden

vertexai.init(project="${PROJECT_ID}", location="${REGION}")

custom_model = model_garden.CustomModel(
    gcs_uri="${GCS_URI}",
)
endpoint = custom_model.deploy(
    machine_type="${MACHINE_TYPE}",
    accelerator_type="${ACCELERATOR_TYPE}",
    accelerator_count="${ACCELERATOR_COUNT}",
    model_display_name="custom-model",
    endpoint_display_name="custom-model-endpoint",
)

endpoint.predict(instances=[{"prompt": "${PROMPT}"}], use_dedicated_endpoint=True)
```

Alternatively, you can call the custom_model.deploy() method without arguments to use the default settings:

```
import vertexai
from vertexai.preview import model_garden

vertexai.init(project="${PROJECT_ID}", location="${REGION}")

custom_model = model_garden.CustomModel(
    gcs_uri="${GCS_URI}",
)
endpoint = custom_model.deploy()

endpoint.predict(instances=[{"prompt": "${PROMPT}"}], use_dedicated_endpoint=True)
```
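The predict call returns a response whose predictions field is a list, with one entry per instance. The exact element shape depends on the serving container, so a defensive extraction helper can be useful. This sketch is illustrative and not part of the Vertex AI SDK; it assumes each prediction is either a raw string or a dict keyed by an output field:

```python
def extract_text(predictions, key="output"):
    """Flatten predictions that may be plain strings or dicts.

    The serving container determines the response shape; this helper
    handles raw strings, {key: text} dicts, and falls back to str()
    for anything else.
    """
    texts = []
    for p in predictions:
        if isinstance(p, str):
            texts.append(p)
        elif isinstance(p, dict) and key in p:
            texts.append(p[key])
        else:
            texts.append(str(p))
    return texts
```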
curl

The following request deploys the model to a specific region:

```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
  -d '{
    "custom_model": {
      "gcs_uri": "'"${MODEL_GCS}"'"
    },
    "destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
    "model_config": {
      "model_user_id": "'"${MODEL_ID}"'"
    }
  }'
```

Alternatively, you can use the API to explicitly set the machine type:

```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
  -d '{
    "custom_model": {
      "gcs_uri": "'"${MODEL_GCS}"'"
    },
    "destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
    "model_config": {
      "model_user_id": "'"${MODEL_ID}"'"
    },
    "deploy_config": {
      "dedicated_resources": {
        "machine_spec": {
          "machine_type": "'"${MACHINE_TYPE}"'",
          "accelerator_type": "'"${ACCELERATOR_TYPE}"'",
          "accelerator_count": '"${ACCELERATOR_COUNT}"'
        },
        "min_replica_count": 1
      }
    }
  }'
```
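Instead of composing the JSON body with nested shell quoting, you can generate it with a short script and pipe it to curl. The following sketch mirrors the request bodies above; the example project ID and model ID in the print call are placeholders, not defaults:

```python
import json

def deploy_request_body(model_gcs, project_id, region, model_id,
                        machine_spec=None):
    """Serialize the :deploy request body shown in the curl examples."""
    body = {
        "custom_model": {"gcs_uri": model_gcs},
        "destination": f"projects/{project_id}/locations/{region}",
        "model_config": {"model_user_id": model_id},
    }
    if machine_spec is not None:
        body["deploy_config"] = {
            "dedicated_resources": {
                "machine_spec": machine_spec,
                "min_replica_count": 1,
            }
        }
    return json.dumps(body, indent=2)

# Example with an explicit machine configuration (placeholder IDs).
print(deploy_request_body(
    "gs://custom-weights-fishfooding/meta-llama/Llama-3.2-1B-Instruct",
    "my-project", "us-central1", "my-custom-model",
    machine_spec={"machine_type": "g2-standard-12",
                  "accelerator_type": "NVIDIA_L4",
                  "accelerator_count": 1}))
```

Passing the output to curl with -d @- (reading the body from standard input) avoids the single-quote escaping used in the inline examples.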
What's next

- Overview of self-deployed models
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-21 UTC.