Deploy models with custom weights

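Deploying models with custom weights is a Preview offering. You can fine-tune models based on a predefined set of base models and deploy your customized models to Vertex AI Model Garden. To use the custom weights import, upload your model artifacts to a Cloud Storage bucket in your project. Deployment is then a one-click experience in Vertex AI.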

Supported models

The public preview of deploying models with custom weights supports the following base models:

Model name	Versions
Llama	Llama-2: 7B, 13B; Llama-3.1: 8B, 70B; Llama-3.2: 1B, 3B; Llama-4: Scout-17B, Maverick-17B; CodeLlama-13B
Gemma	Gemma-2: 27B; Gemma-3: 1B, 4B, 12B, 27B; Medgemma: 4B, 27B-text
Qwen	Qwen2: 1.5B; Qwen2.5: 0.5B, 1.5B, 7B, 32B; Qwen3: 0.6B, 1.7B, 8B, 32B, Qwen3-Coder-480B-A35B-Instruct
Deepseek	Deepseek-R1, Deepseek-V3
Mistral and Mixtral	Mistral-7B-v0.1, Mixtral-8x7B-v0.1, Mistral-Nemo-Base-2407
Phi-4	Phi-4-reasoning
OpenAI OSS	gpt-oss: 20B, 120B

Limitations

Custom weights don't support the import of quantized models.
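
Because quantized models can't be imported, it can be useful to check a checkpoint before you upload it. The following is a minimal sketch, not a Vertex AI API. It assumes that the checkpoint follows the Hugging Face Transformers convention of recording quantization settings under a `quantization_config` key in `config.json`. The helper name `is_quantized` is hypothetical, and a checkpoint that passes this check isn't guaranteed to be accepted.

```python
import json
from pathlib import Path


def is_quantized(model_dir: str) -> bool:
    """Return True if config.json declares a quantization_config.

    Assumption: the checkpoint follows the Hugging Face Transformers
    convention of storing quantization settings under this key.
    """
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return config.get("quantization_config") is not None
```

For example, you might call `is_quantized("./my-model")` on a local copy of the model directory (a hypothetical path) and skip the upload if it returns `True`.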

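Model files

You must supply the model files in the Hugging Face weights format. For more information about the Hugging Face weights format, see Use Hugging Face models.

If the required files aren't provided, the model deployment might fail.

The following table lists the types of model files, which depend on the model's architecture:

Model file content	File type
Model configuration	`config.json`
Model weights	`.safetensors` `.bin`
Weights index	`*.index.json`
Tokenizer files	`tokenizer.model` `tokenizer.json` `tokenizer_config.json`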
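
To catch missing files before a deployment fails, you could run a local check like the following sketch against the model directory. It isn't part of Vertex AI. The function names are hypothetical, and the rule that a configuration file, at least one weights file, and at least one tokenizer file must be present is an assumption, because the exact set depends on the model's architecture. The shard check relies on the `weight_map` field that Hugging Face index files use to map tensors to shard files.

```python
import json
from pathlib import Path

# Assumption: at least one of these tokenizer files is expected; the exact
# set depends on the model architecture.
TOKENIZER_FILES = {"tokenizer.model", "tokenizer.json", "tokenizer_config.json"}


def missing_model_files(model_dir: str) -> list[str]:
    """Return the file groups from the table that are absent in model_dir."""
    names = {p.name for p in Path(model_dir).iterdir() if p.is_file()}
    missing = []
    if "config.json" not in names:
        missing.append("model configuration")
    if not any(n.endswith((".safetensors", ".bin")) for n in names):
        missing.append("model weights")
    if not names & TOKENIZER_FILES:
        missing.append("tokenizer files")
    return missing


def missing_shards(model_dir: str) -> list[str]:
    """Return weight shards named in any *.index.json but not found on disk."""
    root = Path(model_dir)
    missing = set()
    for index_path in root.glob("*.index.json"):
        with index_path.open(encoding="utf-8") as f:
            weight_map = json.load(f).get("weight_map", {})
        missing.update(s for s in weight_map.values() if not (root / s).is_file())
    return sorted(missing)
```

Both functions return an empty list when nothing appears to be missing, so a non-empty result is a signal to fix the directory before you upload it.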

Locations

You can deploy custom models in all regions through Model Garden services.

Prerequisites

This section describes what you need to set up before you deploy your custom model.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Vertex AI API.

Enable the API

In the Google Cloud console, activate Cloud Shell.

Activate Cloud Shell

At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

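This tutorial assumes that you're using Cloud Shell to interact with Google Cloud. If you want to use a different shell instead of Cloud Shell, perform the following additional configuration: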

Install the Google Cloud CLI.
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
To initialize the gcloud CLI, run the following command:
```
gcloud init
```

Deploy the custom model

This section demonstrates how to deploy your custom model.

If you're using the command-line interface (CLI), Python, or curl, replace the following variables with your own values so that the code samples work:

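REGION: Your region. For example, us-central1.
MODEL_GCS: The Cloud Storage URI of your model artifacts. For example, gs://custom-weights-fishfooding/meta-llama/Llama-3.2-1B-Instruct.
PROJECT_ID: Your project ID.
MODEL_ID: Your model ID.
MACHINE_TYPE: Your machine type. For example, g2-standard-12.
ACCELERATOR_TYPE: Your accelerator type. For example, NVIDIA_L4.
ACCELERATOR_COUNT: Your accelerator count.
PROMPT: Your text prompt.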
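
A malformed value, such as a region without its hyphen or a model path without the `gs://` prefix, typically only surfaces as an error at deployment time. The following sketch shows one way to sanity-check the values first. The function name `check_deploy_values` is hypothetical, and the region pattern is a heuristic based on the shape of IDs such as `us-central1`, not an authoritative list of regions.

```python
import re


def check_deploy_values(region: str, model_gcs: str, accelerator_count: int) -> list[str]:
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    # Heuristic: region IDs look like "us-central1" (letters, hyphen, letters, digits).
    if not re.fullmatch(r"[a-z]+-[a-z]+\d+", region):
        problems.append(f"REGION {region!r} does not look like a region ID")
    # MODEL_GCS must be a Cloud Storage URI of the form gs://BUCKET[/PATH].
    if not re.fullmatch(r"gs://[^/\s]+(/\S*)?", model_gcs):
        problems.append(f"MODEL_GCS {model_gcs!r} is not a gs:// URI")
    # ACCELERATOR_COUNT must be a positive integer, not a string or a bool.
    if (
        isinstance(accelerator_count, bool)
        or not isinstance(accelerator_count, int)
        or accelerator_count < 1
    ):
        problems.append("ACCELERATOR_COUNT must be a positive integer")
    return problems
```

An empty list means that the values passed these basic checks. It doesn't confirm that the region, bucket, or machine configuration actually exists.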

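Console

The following steps show you how to use the Google Cloud console to deploy your model with custom weights.

In the Google Cloud console, go to the Model Garden page.

Go to Model Garden
Click Deploy model with custom weights. The Deploy a model with custom weights on Vertex AI pane appears.
In the Model source section, do the following:
1. Click Browse, choose the bucket where your model is stored, and then click Select.
2. Optional: Enter your model's name in the Model name field.
In the Deployment settings section, do the following:
1. From the Region field, select your region, and then click OK.
2. In the Machine spec field, select the machine specification that is used to deploy your model.
3. Optional: The Endpoint name field shows your model's endpoint by default. However, you can enter a different endpoint name in the field.
Click Deploy model with custom weights.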

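gcloud CLI

This command demonstrates how to deploy the model to a specific region.

gcloud ai model-garden models deploy --model=${MODEL_GCS} --region ${REGION}

This command demonstrates how to deploy the model to a specific region with its machine type, accelerator type, and accelerator count. If you want to select a specific machine configuration, you must set all three fields.

gcloud ai model-garden models deploy --model=${MODEL_GCS} --machine-type=${MACHINE_TYPE} --accelerator-type=${ACCELERATOR_TYPE} --accelerator-count=${ACCELERATOR_COUNT} --region ${REGION}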

Python

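import vertexai
from vertexai.preview import model_garden

# Replace the placeholder values below with your own.
PROJECT_ID = "PROJECT_ID"
REGION = "REGION"
MODEL_GCS = "MODEL_GCS"  # Cloud Storage URI of the model artifacts
MACHINE_TYPE = "MACHINE_TYPE"
ACCELERATOR_TYPE = "ACCELERATOR_TYPE"
ACCELERATOR_COUNT = 1  # must be an integer, not a string
PROMPT = "PROMPT"

vertexai.init(project=PROJECT_ID, location=REGION)
custom_model = model_garden.CustomModel(
  gcs_uri=MODEL_GCS,
)
# Deploy with an explicit machine configuration.
endpoint = custom_model.deploy(
  machine_type=MACHINE_TYPE,
  accelerator_type=ACCELERATOR_TYPE,
  accelerator_count=ACCELERATOR_COUNT,
  model_display_name="custom-model",
  endpoint_display_name="custom-model-endpoint")

# Send a test prompt to the deployed endpoint.
endpoint.predict(instances=[{"prompt": PROMPT}], use_dedicated_endpoint=True)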

Alternatively, you can call the custom_model.deploy() method without passing any parameters.

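import vertexai
from vertexai.preview import model_garden

# Replace the placeholder values below with your own.
PROJECT_ID = "PROJECT_ID"
REGION = "REGION"
MODEL_GCS = "MODEL_GCS"  # Cloud Storage URI of the model artifacts
PROMPT = "PROMPT"

vertexai.init(project=PROJECT_ID, location=REGION)
custom_model = model_garden.CustomModel(
  gcs_uri=MODEL_GCS,
)
# Deploy without arguments to use the default deployment settings.
endpoint = custom_model.deploy()

# Send a test prompt to the deployed endpoint.
endpoint.predict(instances=[{"prompt": PROMPT}], use_dedicated_endpoint=True)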

curl


curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
  -d '{
    "custom_model": {
      "gcs_uri": "'"${MODEL_GCS}"'"
    },
    "destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
    "model_config": {
      "model_user_id": "'"${MODEL_ID}"'"
    }
  }'

Alternatively, you can use the API to explicitly set the machine type.


curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
  -d '{
    "custom_model": {
      "gcs_uri": "'"${MODEL_GCS}"'"
    },
    "destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
    "model_config": {
      "model_user_id": "'"${MODEL_ID}"'"
    },
    "deploy_config": {
      "dedicated_resources": {
        "machine_spec": {
          "machine_type": "'"${MACHINE_TYPE}"'",
          "accelerator_type": "'"${ACCELERATOR_TYPE}"'",
          "accelerator_count": '"${ACCELERATOR_COUNT}"'
        },
        "min_replica_count": 1
      }
    }
  }'

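Learn more about self-deployed models in Vertex AI

For more information about self-deployed models, see Overview of self-deployed models.
For more information about Model Garden, see Overview of Model Garden.
For more information about deploying models, see Use models in Model Garden.
Use Gemma open models
Use Llama open models
Use Hugging Face open models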