Self-deployed models overview

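This document describes the self-deployed models available in Model Garden and covers the following topics:

Self-deployed open models: Learn about open-weight and open-source models that you can deploy on your own infrastructure.
Self-deployed partner models: Learn how to use proprietary partner models purchased through Cloud Marketplace.
Deploy models with custom weights: Learn how to deploy your own fine-tuned models that are based on supported open models.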

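Choose a self-deployment option

The following table compares the self-deployment options that Vertex AI offers.

Option	Description	Pros	Cons
Self-deployed open models	Freely available models with public weights. You manage the deployment infrastructure.	High transparency; no model licensing fees; portability.	You are responsible for all infrastructure costs and management.
Self-deployed partner models	Proprietary models from third-party partners, purchased and deployed through Cloud Marketplace.	Access to specialized, commercial-grade models with partner support.	Incurs model usage fees; weights can't be exported; some platform limitations (for example, no support for VPC Service Controls).
Deploy models with custom weights	Deploy a fine-tuned version of a supported base model by providing your own model weights.	Maximum customization for your use case; deployment on your preferred infrastructure.	You must prepare model files in a specific format, and quantized models can't be imported.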

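In Model Garden, you can deploy and serve open, partner, and custom models on Vertex AI. Unlike serverless model as a service (MaaS) offerings, self-deployed models are deployed securely within your Google Cloud project and VPC network.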

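Self-deployed open models

Open models provide pretrained capabilities for a range of AI tasks, including Gemini models that excel at multimodal processing. Open models are free to use, and you can publish their output and use them anywhere as long as you comply with their license terms. Vertex AI offers both open (also called open-weight) and open-source models.

When you use open models with Vertex AI, deployment runs on Vertex AI infrastructure. You can also use open models with other infrastructure products, such as PyTorch or JAX.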

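Open-weight models

Many open models are open-weight large language models (LLMs). Open-weight models offer more transparency than models whose weights aren't public. A model's weights are the numerical values stored in its neural network architecture; they represent the patterns and relationships that the model learned from its training data. Open-weight models publish these pretrained parameters, or weights, so you can use them for inference and fine-tuning. However, details such as the original dataset, the model architecture, and the training code aren't always provided.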
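To make the idea of weights concrete, the following minimal sketch reduces a "model" to a single linear unit whose behavior is fully determined by a few stored numbers. The values and the `predict` function are illustrative assumptions and don't come from any real model.

```python
# Illustrative only: a "model" reduced to one linear unit.
# The numbers below are made-up weights, not values from a real model.
weights = [0.5, -1.0, 2.0]
bias = 0.25


def predict(features):
    """Return the weighted sum of the inputs plus the bias."""
    if len(features) != len(weights):
        raise ValueError("expected one feature per weight")
    return sum(w * x for w, x in zip(weights, features)) + bias


# Inference uses the published weights exactly as they are.
print(predict([1.0, 2.0, 3.0]))  # 4.75

# Fine-tuning changes the stored numbers, which changes the output
# for the same input.
weights[2] = 1.0
print(predict([1.0, 2.0, 3.0]))  # 1.75
```

A real LLM works on the same principle, only with billions of such values arranged in many layers.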

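Open-source models

Open-weight models differ from open-source AI models. Open-weight models typically publish the weights, which are the core numerical representation of the learned patterns, but they don't necessarily provide the full source code or training details. Publishing the weights still improves transparency: you can study what the model does without having to build it yourself.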

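Self-deployed partner models

Model Garden helps you purchase and manage model licenses from partners that offer proprietary models with support for self-deployment. After you purchase access to a model from Cloud Marketplace, you can deploy it on on-demand hardware, or use Compute Engine reservations and committed use discounts to fit your budget. You pay for model usage and for the Vertex AI infrastructure that you use.

To request access to a self-deployed partner model, find the model in the Model Garden console, click Contact sales, and fill out the form. This starts the process of connecting you with a Google Cloud sales representative.

For more information about deploying and using partner models, see "Deploy a partner model and make prediction requests".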

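Considerations

When you use self-deployed partner models, note the following limitations:

Unlike open models, you can't export the weights.
If VPC Service Controls is configured for your project, you can't upload models, so you can't deploy partner models.
Only the shared public endpoint type is supported.

Partners provide support for model-specific issues. To contact a partner about a model performance issue, use the contact information in the Support section of the model card.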

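Deploy models with custom weights

You can fine-tune a model that is based on a predefined set of base models and deploy the resulting custom model in Vertex AI Model Garden. To deploy a custom model, upload the model artifacts to a Cloud Storage bucket in your project and import the custom weights from there.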
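Before you import weights, it can help to sanity-check the Cloud Storage path that you plan to use. The following sketch only parses the string and doesn't contact Cloud Storage. The helper name `split_gcs_uri` and the bucket name `my-bucket` are hypothetical.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/model' into (bucket, prefix)."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    # Everything up to the first slash is the bucket; the rest is the prefix.
    bucket, _, prefix = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in: {uri!r}")
    return bucket, prefix.strip("/")


# Hypothetical path; replace it with the location of your model artifacts.
print(split_gcs_uri("gs://my-bucket/models/my-finetuned-model/"))
```

A check like this catches a missing `gs://` scheme or an empty bucket name before a deployment request is sent.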

Supported models

The Public Preview of deploying models with custom weights supports the following base models:

Model name	Version
Llama	Llama-2: 7B, 13B; Llama-3.1: 8B, 70B; Llama-3.2: 1B, 3B; Llama-4: Scout-17B, Maverick-17B; CodeLlama-13B
Gemma	Gemma-2: 27B; Gemma-3: 1B, 4B, 3-12B, 27B; Medgemma: 4B, 27B-text
Qwen	Qwen2: 1.5B; Qwen2.5: 0.5B, 1.5B, 7B, 32B; Qwen3: 0.6B, 1.7B, 8B, 32B
Deepseek	Deepseek-R1; Deepseek-V3
Mistral and Mixtral	Mistral-7B-v0.1; Mixtral-8x7B-v0.1; Mistral-Nemo-Base-2407
Phi-4	Phi-4-reasoning

Limitations

Custom weights don't support importing quantized models.
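One way to spot a quantized checkpoint before you upload it is to look at its `config.json`. The sketch below assumes that the checkpoint was saved by a Hugging Face Transformers quantization integration, which records a `quantization_config` entry. It is a heuristic only: a missing entry doesn't prove that the weights are unquantized. The function name `looks_quantized` is hypothetical.

```python
import json


def looks_quantized(config_text):
    """Return True if a Hugging Face config.json declares quantization."""
    config = json.loads(config_text)
    # Assumption: quantized checkpoints carry a non-empty quantization_config.
    return bool(config.get("quantization_config"))


# Two made-up config.json contents for illustration.
plain = '{"model_type": "llama", "torch_dtype": "bfloat16"}'
quantized = (
    '{"model_type": "llama", '
    '"quantization_config": {"quant_method": "gptq", "bits": 4}}'
)

print(looks_quantized(plain))      # False
print(looks_quantized(quantized))  # True
```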

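Model files

You must provide the model files in the Hugging Face weights format. For more information about this format, see "Use Hugging Face models".

If the required files aren't provided, the model deployment might fail.

The following table lists the types of model files, which depend on the model's architecture:

Model file content	File type
Model configuration	`config.json`
Model weights	`.safetensors` `.bin`
Weights index	`*.index.json`
Tokenizer files	`tokenizer.model` `tokenizer.json` `tokenizer_config.json`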
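The table above can be turned into a quick local check that runs before you upload anything. The following sketch is an assumption about what a reasonable pre-upload check looks like, not the validation that Vertex AI performs. Because the required files depend on the architecture, treat its findings as hints. The function name `check_model_dir` is hypothetical.

```python
from pathlib import Path

TOKENIZER_FILES = ("tokenizer.model", "tokenizer.json", "tokenizer_config.json")


def check_model_dir(model_dir):
    """Return a list of likely problems found in a local model directory."""
    names = {p.name for p in Path(model_dir).iterdir() if p.is_file()}
    problems = []
    if "config.json" not in names:
        problems.append("missing config.json")
    weights = [n for n in names if n.endswith((".safetensors", ".bin"))]
    if not weights:
        problems.append("no .safetensors or .bin weight files")
    # Sharded checkpoints normally ship an index that maps tensors to shards.
    has_index = any(n.endswith(".index.json") for n in names)
    if len(weights) > 1 and not has_index:
        problems.append("several weight files but no *.index.json")
    if not any(n in names for n in TOKENIZER_FILES):
        problems.append("no tokenizer files")
    return problems
```

An empty list means that none of the checked file types is obviously missing. For example, `check_model_dir("./my-model")` on a hypothetical local directory returns the problems to fix before you copy the files to Cloud Storage.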

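Locations

You can deploy custom models in all regions where Model Garden is supported.

Prerequisites

Before you deploy a custom model, complete the following setup steps.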

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Verify that billing is enabled for your Google Cloud project.

Enable the Vertex AI API.

In the Google Cloud console, activate Cloud Shell.

At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

The following instructions use Cloud Shell. If you use a local development environment, you must authenticate to Google Cloud:

Install the Google Cloud CLI.
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
To initialize the gcloud CLI, run the following command:
```
gcloud init
```

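Deploy a custom model

If you use the gcloud CLI, Python, or curl, replace the following variables in the code samples:

REGION: your region (for example, us-central1).
MODEL_GCS: the Cloud Storage path of your model (for example, gs://custom-weights-fishfooding/meta-llama/Llama-3.2-1B-Instruct).
PROJECT_ID: your project ID.
MODEL_ID: your model ID.
MACHINE_TYPE: your machine type (for example, g2-standard-12).
ACCELERATOR_TYPE: the accelerator type (for example, NVIDIA_L4).
ACCELERATOR_COUNT: the number of accelerators.
PROMPT: your text prompt.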
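The `${NAME}` placeholders in the samples use the same syntax as Python's `string.Template`, so one way to fill them in programmatically is sketched below. In a shell, you would normally export the variables instead. The bucket path `gs://my-bucket/my-model` is a hypothetical value.

```python
from string import Template

# Placeholder values; replace them with your own.
values = {
    "REGION": "us-central1",
    "MODEL_GCS": "gs://my-bucket/my-model",
}

command = Template(
    "gcloud ai model-garden models deploy --model=${MODEL_GCS} --region ${REGION}"
)

# substitute() raises KeyError if a placeholder has no value,
# which catches forgotten variables early.
print(command.substitute(values))
```

Using `substitute` rather than `safe_substitute` is deliberate here: a command with an unfilled placeholder fails immediately instead of being sent to the CLI.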

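Console

The following steps show how to use the Google Cloud console to deploy a model with custom weights.

In the Google Cloud console, go to the Model Garden page.
Click Deploy model with custom weights. The Deploy a model with custom weights on Vertex AI pane appears.
In the Model source section, do the following:
1. Click Browse, select the bucket where the model is stored, and then click Select.
2. Optional: In the Model name field, enter a name for the model.
In the Deployment settings section, do the following:
1. From the Region list, select a region.
2. In the Machine Spec field, select the machine specification to use for deploying the model.
3. Optional: In the Endpoint name field, change the default endpoint name.
Click Deploy model with custom weights.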

gcloud CLI

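This command shows how to deploy the model to a specific region.

gcloud ai model-garden models deploy --model=${MODEL_GCS} --region ${REGION}

This command shows how to deploy the model to a specific region and specify the machine type, accelerator type, and accelerator count. To select a specific machine configuration, you must set all three fields.

gcloud ai model-garden models deploy --model=${MODEL_GCS} --machine-type=${MACHINE_TYPE} --accelerator-type=${ACCELERATOR_TYPE} --accelerator-count=${ACCELERATOR_COUNT} --region ${REGION}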
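If you build the gcloud invocation from a script, the all-or-nothing rule for the three machine fields can be checked before the command runs. The following sketch only assembles the argument list from the flags shown above. The helper name `build_deploy_args` is hypothetical, and the sketch doesn't call gcloud itself.

```python
def build_deploy_args(model_gcs, region, machine_type=None,
                      accelerator_type=None, accelerator_count=None):
    """Build the argument list for `gcloud ai model-garden models deploy`."""
    machine_fields = (machine_type, accelerator_type, accelerator_count)
    given = [field is not None for field in machine_fields]
    # Either all three machine fields are set, or none of them is.
    if any(given) and not all(given):
        raise ValueError(
            "set machine_type, accelerator_type and accelerator_count together"
        )
    args = ["gcloud", "ai", "model-garden", "models", "deploy",
            f"--model={model_gcs}", f"--region={region}"]
    if all(given):
        args += [f"--machine-type={machine_type}",
                 f"--accelerator-type={accelerator_type}",
                 f"--accelerator-count={accelerator_count}"]
    return args


# Hypothetical values for illustration.
print(build_deploy_args("gs://my-bucket/my-model", "us-central1",
                        "g2-standard-12", "NVIDIA_L4", 1))
```

You could pass the returned list to `subprocess.run(args, check=True)` on a machine where the gcloud CLI is installed and authenticated.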

Python

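import vertexai
from google.cloud import aiplatform
from vertexai.preview import model_garden

# Replace every ${...} placeholder with your own value before running.
vertexai.init(project="${PROJECT_ID}", location="${REGION}")

# Point the custom model at the Cloud Storage path that holds the weights.
custom_model = model_garden.CustomModel(
  gcs_uri="${MODEL_GCS}",
)

# Deploy with an explicit machine configuration (all three fields set).
endpoint = custom_model.deploy(
  machine_type="${MACHINE_TYPE}",
  accelerator_type="${ACCELERATOR_TYPE}",
  accelerator_count=${ACCELERATOR_COUNT},  # replace with an integer
  model_display_name="custom-model",
  endpoint_display_name="custom-model-endpoint")

# Send a test prompt to the deployed endpoint.
endpoint.predict(instances=[{"prompt": "${PROMPT}"}], use_dedicated_endpoint=True)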

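Alternatively, you can call custom_model.deploy() without arguments to use the default settings.

import vertexai
from google.cloud import aiplatform
from vertexai.preview import model_garden

# Replace every ${...} placeholder with your own value before running.
vertexai.init(project="${PROJECT_ID}", location="${REGION}")
custom_model = model_garden.CustomModel(
  gcs_uri="${MODEL_GCS}",
)

# No arguments: the default machine configuration is used.
endpoint = custom_model.deploy()

endpoint.predict(instances=[{"prompt": "${PROMPT}"}], use_dedicated_endpoint=True)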

curl


curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
  -d '{
    "custom_model": {
      "gcs_uri": "'"${MODEL_GCS}"'"
    },
    "destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
    "model_config": {
      "model_user_id": "'"${MODEL_ID}"'"
    }
  }'

Alternatively, you can use the API to set the machine type explicitly.


curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
  -d '{
    "custom_model": {
      "gcs_uri": "'"${MODEL_GCS}"'"
    },
    "destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
    "model_config": {
      "model_user_id": "'"${MODEL_ID}"'"
    },
    "deploy_config": {
      "dedicated_resources": {
        "machine_spec": {
          "machine_type": "'"${MACHINE_TYPE}"'",
          "accelerator_type": "'"${ACCELERATOR_TYPE}"'",
          "accelerator_count": '"${ACCELERATOR_COUNT}"'
        },
        "min_replica_count": 1
      }
    }
  }'
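Hand-written JSON inside shell quotes is easy to get wrong, for example with a stray trailing comma. As an alternative, the request URL and body can be generated with a JSON serializer. The following sketch reuses the field names from the curl examples and doesn't send anything. The helper name `build_deploy_request` and the sample values are hypothetical.

```python
import json


def build_deploy_request(project_id, region, model_gcs, model_id,
                         machine_spec=None):
    """Return (url, body) for the deploy call shown in the curl examples."""
    location = f"projects/{project_id}/locations/{region}"
    url = f"https://{region}-aiplatform.googleapis.com/v1beta1/{location}:deploy"
    payload = {
        "custom_model": {"gcs_uri": model_gcs},
        "destination": location,
        "model_config": {"model_user_id": model_id},
    }
    # Only include deploy_config when a machine spec is requested.
    if machine_spec is not None:
        payload["deploy_config"] = {
            "dedicated_resources": {
                "machine_spec": machine_spec,
                "min_replica_count": 1,
            }
        }
    return url, json.dumps(payload, indent=2)


# Hypothetical values for illustration.
url, body = build_deploy_request(
    "my-project", "us-central1", "gs://my-bucket/my-model", "my-model",
    machine_spec={
        "machine_type": "g2-standard-12",
        "accelerator_type": "NVIDIA_L4",
        "accelerator_count": 1,
    },
)
print(url)
print(body)
```

You can save the printed body to a file and pass it to curl with `-d @request.json`, which avoids the nested shell quoting entirely.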

What's next

To learn more about Model Garden, see "Overview of Model Garden".
To learn more about deploying models, see "Use models in Model Garden".
Use Gemma open models
Use Llama open models
Use Hugging Face open models