Deploy models with custom weights

This guide shows you how to deploy models with custom weights on Vertex AI. It covers the supported base models, the required model files, and the methods you can use to deploy.

Supported models

You can deploy custom weights for the following base models:

Model family Supported versions
Llama
  • Llama-2: 7B, 13B
  • Llama-3.1: 8B, 70B
  • Llama-3.2: 1B, 3B
  • Llama-4: Scout-17B, Maverick-17B
  • CodeLlama-13B
Gemma
  • Gemma-2: 27B
  • Gemma-3: 1B, 4B, 12B, 27B
  • MedGemma: 4B, 27B-text
Qwen
  • Qwen2: 1.5B
  • Qwen2.5: 0.5B, 1.5B, 7B, 32B
  • Qwen3: 0.6B, 1.7B, 8B, 32B, Qwen3-Coder-480B-A35B-Instruct
Deepseek
  • Deepseek-R1
  • Deepseek-V3
Mistral and Mixtral
  • Mistral-7B-v0.1
  • Mixtral-8x7B-v0.1
  • Mistral-Nemo-Base-2407
Phi-4
  • Phi-4-reasoning
OpenAI OSS
  • gpt-oss: 20B, 120B

Limitations

You can't import quantized models as custom weights.

Model files

You must provide the model files in the Hugging Face weights format. For more information on the Hugging Face weights format, see Use Hugging Face Models.

If the required files aren't provided, the model deployment might fail.

This table lists the types of model files, which depend on the model's architecture:

Model file content File name(s)
Model configuration
  • config.json
Model weights
  • *.safetensors
  • *.bin
Weights index
  • *.index.json
Tokenizer file(s)
  • tokenizer.model
  • tokenizer.json
  • tokenizer_config.json
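
Before you deploy, you can verify that these files are present in your Cloud Storage bucket. A quick check, assuming your weights are already uploaded (the bucket and path are placeholder values):

# List the model directory; the output should include config.json, the
# *.safetensors or *.bin weights, the weights index, and the tokenizer files.
gcloud storage ls gs://your-bucket/your-model/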

Locations

You can deploy custom models in all regions where Model Garden is available.

Prerequisites

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI API. You can also enable the API from the command line, as shown in the sketch after this list.

    Enable the API

  5. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
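
As an alternative to the Enable the API button in step 4, you can enable the Vertex AI API from Cloud Shell. A minimal sketch, assuming the gcloud CLI is authenticated and PROJECT_ID is your project ID:

# Enable the Vertex AI API; this is a no-op if the API is already enabled.
gcloud services enable aiplatform.googleapis.com --project=PROJECT_ID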

This tutorial uses Cloud Shell to interact with Google Cloud. If you use a shell other than Cloud Shell, perform the following additional configuration:

  1. Install the Google Cloud CLI.

  2. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

  3. To initialize the gcloud CLI, run the following command:

    gcloud init
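
After you initialize the gcloud CLI, you can optionally set a default project and region so that later commands don't require explicit flags. A short sketch; the values shown are examples:

# Set defaults that subsequent gcloud commands, including gcloud ai, pick up.
gcloud config set project PROJECT_ID
gcloud config set ai/region us-central1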

Deploy the custom model

You can deploy your custom model by using any of the following methods:

  • Google Cloud console: A graphical user interface that guides you through the deployment process. Best for quick deployments, visual confirmation of settings, and users who prefer a UI over command-line tools.
  • gcloud CLI: A command-line tool for managing Google Cloud resources that allows for scripted, repeatable deployments. Ideal for developers and administrators who work in the terminal and want to automate deployment tasks.
  • Python: The Vertex AI SDK for Python lets you programmatically deploy and manage models within your applications or notebooks. Suitable for integrating model deployment into a larger Python-based MLOps workflow or application.
  • curl (REST API): Make direct HTTP requests to the Vertex AI API for the most control over the deployment configuration. Useful for developers using languages other than Python or for environments where installing SDKs is not feasible.

If you're using the gcloud CLI, Python, or curl, replace the following variables in your code samples:

  • REGION: Your region. For example, us-central1.
  • MODEL_GCS: The Cloud Storage URI of your model weights. For example, gs://custom-weights-fishfooding/meta-llama/Llama-3.2-1B-Instruct.
  • PROJECT_ID: Your project ID.
  • MODEL_ID: Your model ID.
  • MACHINE_TYPE: Your machine type. For example, g2-standard-12.
  • ACCELERATOR_TYPE: Your accelerator type. For example, NVIDIA_L4.
  • ACCELERATOR_COUNT: Your accelerator count.
  • PROMPT: Your text prompt.
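
If you work in a shell, you can export these values once so that the gcloud and curl samples below pick them up. A sketch; every value is a placeholder to replace with your own:

# Placeholder values -- replace each one before running the samples.
export REGION="us-central1"
export MODEL_GCS="gs://your-bucket/your-model"
export PROJECT_ID="your-project-id"
export MODEL_ID="my-custom-model"
export MACHINE_TYPE="g2-standard-12"
export ACCELERATOR_TYPE="NVIDIA_L4"
export ACCELERATOR_COUNT=1
export PROMPT="What is machine learning?"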

Console

To deploy your model with custom weights using the Google Cloud console, follow these steps:

  1. In the Google Cloud console, go to the Model Garden page.

    Go to Model Garden

  2. Click Deploy model with custom weights. The Deploy a model with custom weights on Vertex AI pane appears.

  3. In the Model source section, do the following:

    1. Click Browse, select the bucket that contains your model, and click Select.

    2. Optional: Enter a name for your model in the Model name field.

  4. In the Deployment settings section, do the following:

    1. From the Region field, select your region, and click OK.

    2. In the Machine Spec field, select your machine specification.

    3. Optional: The Endpoint name field is populated with a default name. You can enter a different name.

  5. Click Deploy model with custom weights.

gcloud CLI

This command deploys the model to a specific region.

gcloud ai model-garden models deploy --model=${MODEL_GCS} --region ${REGION}

This command deploys the model to a specific region with a specified machine type, accelerator type, and accelerator count. To specify a machine configuration, you must set all three fields.

gcloud ai model-garden models deploy --model=${MODEL_GCS} --machine-type=${MACHINE_TYPE} --accelerator-type=${ACCELERATOR_TYPE} --accelerator-count=${ACCELERATOR_COUNT} --region ${REGION}
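
The deploy command starts a long-running operation. After it completes, you can confirm that an endpoint was created. A quick check, assuming you deployed to the same region:

# List Vertex AI endpoints in the region; the new deployment appears here.
gcloud ai endpoints list --region=${REGION}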

Python

import vertexai
from vertexai.preview import model_garden

# Replace the uppercase placeholders with your own values.
vertexai.init(project="PROJECT_ID", location="REGION")
custom_model = model_garden.CustomModel(
    gcs_uri="MODEL_GCS",
)
endpoint = custom_model.deploy(
    machine_type="MACHINE_TYPE",
    accelerator_type="ACCELERATOR_TYPE",
    accelerator_count=ACCELERATOR_COUNT,  # an integer, for example 1
    model_display_name="custom-model",
    endpoint_display_name="custom-model-endpoint",
)

endpoint.predict(instances=[{"prompt": "PROMPT"}], use_dedicated_endpoint=True)

Alternatively, you can call the custom_model.deploy() method without arguments to use the default machine configuration.

import vertexai
from vertexai.preview import model_garden

vertexai.init(project="PROJECT_ID", location="REGION")
custom_model = model_garden.CustomModel(
    gcs_uri="MODEL_GCS",
)
endpoint = custom_model.deploy()

endpoint.predict(instances=[{"prompt": "PROMPT"}], use_dedicated_endpoint=True)

curl


curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
  -d '{
    "custom_model": {
      "gcs_uri": "'"${MODEL_GCS}"'"
    },
    "destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
    "model_config": {
      "model_user_id": "'"${MODEL_ID}"'"
    }
  }'

Alternatively, you can use the API to explicitly set the machine configuration.


curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
  -d '{
    "custom_model": {
      "gcs_uri": "'"${MODEL_GCS}"'"
    },
    "destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
    "model_config": {
      "model_user_id": "'"${MODEL_ID}"'"
    },
    "deploy_config": {
      "dedicated_resources": {
        "machine_spec": {
          "machine_type": "'"${MACHINE_TYPE}"'",
          "accelerator_type": "'"${ACCELERATOR_TYPE}"'",
          "accelerator_count": '"${ACCELERATOR_COUNT}"'
        },
        "min_replica_count": 1
      }
    }
  }'
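
The :deploy call returns a long-running operation rather than a finished deployment. A sketch of polling it with curl, assuming OPERATION_NAME is the name field from the JSON response (of the form projects/.../locations/.../operations/...):

# Poll the operation; the deployment is complete when the response
# contains "done": true.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/OPERATION_NAME"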

What's next