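Tune open models

This page describes how to perform supervised fine-tuning on open models such as Llama 3.1.

Supported tuning modes

Full fine-tuning
Low-Rank Adaptation (LoRA): LoRA is a parameter-efficient tuning mode that adjusts only a subset of parameters. Compared with full fine-tuning, it is more cost-effective and requires less training data. Full fine-tuning, on the other hand, adjusts all parameters and can therefore potentially reach higher quality.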
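As a rough illustration of why LoRA is cheaper, the sketch below compares trainable parameter counts for a single weight matrix. It assumes the common LoRA formulation, in which a frozen weight of shape d_out x d_in is augmented with two trainable low-rank factors of shapes d_out x r and r x d_in. The layer size and rank are illustrative values, not the configuration of any model or service described on this page.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


# Illustrative sizes only (assumption, not taken from a supported model).
d_out, d_in, rank = 4096, 4096, 16

full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)

print(f"full fine-tuning: {full} trainable parameters")
print(f"LoRA (rank {rank}): {lora} trainable parameters")
print(f"ratio: {lora / full:.4%}")
```

The smaller the rank relative to the layer dimensions, the fewer parameters are trained, which is the source of the cost savings described above.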

Supported models

Gemma 3 27B IT^** (google/gemma-3-27b-it)
Llama 3.1 8B (meta/llama3_1@llama-3.1-8b)
Llama 3.1 8B Instruct (meta/llama3_1@llama-3.1-8b-instruct)
Llama 3.2 1B Instruct^* (meta/llama3-2@llama-3.2-1b-instruct)
Llama 3.2 3B Instruct^* (meta/llama3-2@llama-3.2-3b-instruct)
Llama 3.3 70B Instruct (meta/llama3-3@llama-3.3-70b-instruct)
Qwen 3 32B^** (qwen/qwen3@qwen3-32b)

^* Supports full fine-tuning only

^** Supports parameter-efficient fine-tuning only
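The footnotes above can be captured in a small lookup table so that an unsupported combination is caught before a job is submitted. This is a minimal sketch: the helper name `check_tuning_mode` is hypothetical, the mode strings `FULL` and `PEFT_ADAPTER` are the ones used in the tuning-job example later on this page, and the table assumes that models without a footnote support both modes.

```python
# Tuning modes per model, transcribed from the list and footnotes above.
# Assumption: models without a footnote support both modes.
SUPPORTED_TUNING_MODES = {
    "google/gemma-3-27b-it": {"PEFT_ADAPTER"},
    "meta/llama3_1@llama-3.1-8b": {"FULL", "PEFT_ADAPTER"},
    "meta/llama3_1@llama-3.1-8b-instruct": {"FULL", "PEFT_ADAPTER"},
    "meta/llama3-2@llama-3.2-1b-instruct": {"FULL"},
    "meta/llama3-2@llama-3.2-3b-instruct": {"FULL"},
    "meta/llama3-3@llama-3.3-70b-instruct": {"FULL", "PEFT_ADAPTER"},
    "qwen/qwen3@qwen3-32b": {"PEFT_ADAPTER"},
}


def check_tuning_mode(base_model: str, tuning_mode: str) -> None:
    """Raise ValueError if the model or the mode is not in the table above."""
    modes = SUPPORTED_TUNING_MODES.get(base_model)
    if modes is None:
        raise ValueError(f"Unknown base model: {base_model}")
    if tuning_mode not in modes:
        raise ValueError(
            f"{base_model} does not support {tuning_mode}; "
            f"supported modes: {sorted(modes)}"
        )


check_tuning_mode("meta/llama3_1@llama-3.1-8b", "FULL")  # passes silently
```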

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Vertex AI and Cloud Storage APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

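Install and initialize the Vertex AI SDK for Python

Import the following libraries and initialize the SDK:

import os
import time
import uuid

import vertexai
from google.cloud import aiplatform
from vertexai.preview.tuning import sft, SourceModel

# Placeholders: replace with your own project ID and region.
PROJECT_ID = "your-project-id"
REGION = "your-region"

vertexai.init(project=PROJECT_ID, location=REGION)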

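Prepare a dataset for tuning

Tuning requires a training dataset. If you want to evaluate the performance of the tuned model, we recommend that you also prepare an optional validation dataset.

Your dataset must be in one of the following supported JSON Lines (JSONL) formats, where each line contains a single tuning example.

Prompt completion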

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

Turn-based chat format

{"messages": [
  {"content": "You are a chatbot that helps with scientific literature and generates state-of-the-art abstracts from articles.",
    "role": "system"},
  {"content": "Summarize the paper in one paragraph.",
    "role": "user"},
  {"content": " Here is a one paragraph summary of the paper:\n\nThe paper describes PaLM, ...",
    "role": "assistant"}
]}
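A dataset in either format can be produced with the standard library alone. The sketch below writes one JSON object per line and performs a light structural check first. The helper names and the validation rules are assumptions for illustration; the service may enforce additional requirements that are not modeled here.

```python
import json
import os
import tempfile


def validate_example(example: dict) -> None:
    """Raise ValueError unless the example matches one of the two formats."""
    if "messages" in example:
        messages = example["messages"]
        if not isinstance(messages, list) or not messages:
            raise ValueError("'messages' must be a non-empty list")
        for message in messages:
            if not {"role", "content"} <= set(message):
                raise ValueError("each message needs 'role' and 'content'")
    elif not {"prompt", "completion"} <= set(example):
        raise ValueError("expected 'prompt'/'completion' or 'messages'")


def write_jsonl(path: str, examples: list) -> None:
    """Validate the examples and write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for example in examples:
            validate_example(example)
            f.write(json.dumps(example, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> list:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Illustrative prompt-completion examples (made-up content).
examples = [
    {"prompt": "Translate to French: cat", "completion": "chat"},
    {"prompt": "Translate to French: dog", "completion": "chien"},
]

path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
write_jsonl(path, examples)
print(len(read_jsonl(path)), "examples written to", path)
```

Note that `json.dumps` emits each example on a single line, which is what the JSONL format requires even though the chat example above is wrapped for readability.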

Upload your JSONL files to Cloud Storage.
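One way to upload from Python is the google-cloud-storage client library, sketched below. This assumes that the library is installed and that application default credentials are configured; the helper names and the example bucket and object names are hypothetical. The import is deferred so that the URI helper can be used without the library.

```python
def split_gcs_uri(uri: str) -> tuple:
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a Cloud Storage URI: {uri}")
    bucket, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket or not blob_name:
        raise ValueError(f"URI must include a bucket and an object name: {uri}")
    return bucket, blob_name


def upload_jsonl(local_path: str, gcs_uri: str) -> None:
    """Upload a local JSONL file to the given gs:// URI."""
    # Deferred import: requires the google-cloud-storage package.
    from google.cloud import storage

    bucket_name, blob_name = split_gcs_uri(gcs_uri)
    client = storage.Client()
    client.bucket(bucket_name).blob(blob_name).upload_from_filename(local_path)


# Example call (hypothetical bucket and object names):
# upload_jsonl("train.jsonl", "gs://my-bucket/tuning/train.jsonl")
```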

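Create a tuning job

You can tune from the following sources:

A supported base model, such as Llama 3.1
A model that has the same architecture as one of the supported base models. This can be a custom model checkpoint from a repository such as Hugging Face, or a previously tuned model from a Vertex AI tuning job. This lets you continue tuning a model that has already been tuned.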

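Cloud console

You can start fine-tuning in either of the following ways:
- Go to the model card, click Fine-tune, and then select Managed tuning.

  Go to the Llama 3.1 model card

  or
- Go to the Tuning page, and then click Create tuned model.

  Go to Tuning
Fill in the parameters, and then click Start tuning.

This starts a tuning job, which you can see on the Tuning page under the Managed tuning tab.

After the tuning job finishes, you can view information about the tuned model on the Details tab.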

Vertex AI SDK for Python

Replace the parameter values with your own, and then run the following code to create a tuning job:

sft_tuning_job = sft.preview_train(
    source_model=SourceModel(
      base_model="meta/llama3_1@llama-3.1-8b",
      # Optional: folder that contains either a custom model checkpoint or a previously tuned model
      custom_base_model="gs://{STORAGE-URI}",
    ),
    tuning_mode="FULL", # FULL or PEFT_ADAPTER
    epochs=3,
    train_dataset="gs://{STORAGE-URI}", # JSONL file
    validation_dataset="gs://{STORAGE-URI}", # JSONL file
    output_uri="gs://{STORAGE-URI}",
)

After the job finishes, the model artifacts for the tuned model are stored in the <output_uri>/postprocess/node-0/checkpoints/final folder.
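The artifact location can be derived from the `output_uri` passed to the tuning job, which avoids retyping the path at deployment time. The helper name below is hypothetical, and the bucket in the example is a placeholder; the path suffix is the one stated above.

```python
def final_checkpoint_uri(output_uri: str) -> str:
    """Return the folder that holds the tuned model artifacts for a job's output_uri."""
    # Strip any trailing slash so the result never contains '//' in the path.
    return output_uri.rstrip("/") + "/postprocess/node-0/checkpoints/final"


# Placeholder bucket and folder names.
print(final_checkpoint_uri("gs://my-bucket/tuning-output/"))
```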

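Deploy the tuned model

You can deploy the tuned model to a Vertex AI endpoint. You can also export the tuned model from Cloud Storage and deploy it elsewhere.

To deploy the tuned model to a Vertex AI endpoint, do the following:

Cloud console

Go to the Model Garden page, and then click Deploy model with custom weights.

Go to Model Garden
Fill in the parameters, and then click Deploy.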

Vertex AI SDK for Python

Deploy to a G2 machine by using a prebuilt container:

from vertexai.preview import model_garden

MODEL_ARTIFACTS_STORAGE_URI = "gs://{STORAGE-URI}/postprocess/node-0/checkpoints/final"

model = model_garden.CustomModel(
    gcs_uri=MODEL_ARTIFACTS_STORAGE_URI,
)

# Deploy the model to an endpoint that uses GPUs. The deployment incurs costs.
endpoint = model.deploy(
    machine_type="g2-standard-12",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)

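Get an inference

After the deployment succeeds, you can send requests that contain text prompts to the endpoint. Note that the first few prompts take longer to run.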

# Loads the deployed endpoint
endpoint = aiplatform.Endpoint("projects/{PROJECT_ID}/locations/{REGION}/endpoints/{endpoint_name}")

prompt = "Summarize the following article. Article: Preparing a perfect risotto requires patience and attention to detail. Begin by heating butter in a large, heavy-bottomed pot over medium heat. Add finely chopped onions and minced garlic to the pot, and cook until they're soft and translucent, about 5 minutes. Next, add Arborio rice to the pot and cook, stirring constantly, until the grains are coated with the butter and begin to toast slightly. Pour in a splash of white wine and cook until it's absorbed. From there, gradually add hot chicken or vegetable broth to the rice, stirring frequently, until the risotto is creamy and the rice is tender with a slight bite.. Summary:"

# Define input to the prediction call
instances = [
    {
        "prompt": prompt,
        "max_tokens": 200,
        "temperature": 1.0,
        "top_p": 1.0,
        "top_k": 1,
        "raw_response": True,
    },
]

# Request the prediction
response = endpoint.predict(
    instances=instances
)

for prediction in response.predictions:
    print(prediction)
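When sending several prompts, the request payload can be built with a small helper that mirrors the fields used in the request above. This is a sketch only: the helper name is hypothetical, and the defaults simply repeat the values from the example rather than recommended settings.

```python
def build_instances(prompts, max_tokens=200, temperature=1.0, top_p=1.0, top_k=1):
    """Build one prediction instance per prompt, using the fields from the example above."""
    return [
        {
            "prompt": prompt,
            "max_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
            "top_k": top_k,
            "raw_response": True,
        }
        for prompt in prompts
    ]


instances = build_instances(["What is a car?", "What is a bicycle?"], max_tokens=100)
print(len(instances), "instances prepared")
```

The resulting list can be passed as the `instances` argument of `endpoint.predict`, as in the request above.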

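To learn more about getting inferences from a deployed model, see Get online inferences.

Note that managed open models use the chat.completions method rather than the predict method used by deployed models. To learn more about getting inferences from managed models, see Make a call to a Llama model.

Limits and quotas

Quota is enforced on the number of concurrent tuning jobs. Every project comes with a default quota to run at least one tuning job. This is a global quota, shared across all available regions and supported models. To run more jobs concurrently, request additional quota for Global concurrent managed OSS model fine-tuning jobs per project.

Pricing

You are billed for tuning based on model tuning pricing.

You are also billed for related services, such as Cloud Storage and Vertex AI Prediction.

Learn about Vertex AI pricing and Cloud Storage pricing, and use the Pricing Calculator to generate a cost estimate based on your projected usage.

What's next

Evaluate the tuned model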