
Tune an open model

This page describes how to perform supervised fine-tuning on open models such as Llama 3.1.

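Supported tuning modes

Full fine-tuning
Low-Rank Adaptation (LoRA): LoRA is a parameter-efficient tuning mode that adjusts only a subset of parameters. It is more cost-effective than full fine-tuning and requires less training data. Full fine-tuning, by contrast, adjusts all parameters and can therefore reach higher quality.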
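To make the cost difference concrete, here is a minimal sketch that counts trainable parameters for one weight matrix under each mode. It assumes the standard LoRA formulation, in which the update to a d_out x d_in weight is factored into two low-rank matrices. The function names and the matrix sizes are illustrative only and are not taken from any supported model.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a low-rank update B @ A,
    where B is d_out x rank and A is rank x d_in."""
    return rank * (d_out + d_in)


# Illustrative sizes only (assumption, not a real model configuration).
d_out, d_in, rank = 4096, 4096, 16
full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4f}")
```

With these sizes the low-rank update trains under 1% of the parameters that a full update of the same matrix would train, which is where the lower cost comes from.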

Supported models

Gemma 3 27B IT** (google/gemma-3-27b-it)
Llama 3.1 8B (meta/llama3_1@llama-3.1-8b)
Llama 3.1 8B Instruct (meta/llama3_1@llama-3.1-8b-instruct)
Llama 3.2 1B Instruct* (meta/llama3-2@llama-3.2-1b-instruct)
Llama 3.2 3B Instruct* (meta/llama3-2@llama-3.2-3b-instruct)
Llama 3.3 70B Instruct (meta/llama3-3@llama-3.3-70b-instruct)
Qwen 3 32B** (qwen/qwen3@qwen3-32b)

* Supports full fine-tuning only.

** Supports parameter-efficient fine-tuning only.
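The model list and its footnotes can be captured as a lookup table so that an unsupported combination is caught before a job is submitted. This is a sketch: the mode strings "FULL" and "PEFT_ADAPTER" are the ones used in the SDK example on this page, while the `check_tuning_mode` helper is hypothetical and not part of the Vertex AI SDK.

```python
# Model ID -> tuning modes allowed for it, transcribed from the list above.
SUPPORTED_TUNING_MODES = {
    "google/gemma-3-27b-it": {"PEFT_ADAPTER"},
    "meta/llama3_1@llama-3.1-8b": {"FULL", "PEFT_ADAPTER"},
    "meta/llama3_1@llama-3.1-8b-instruct": {"FULL", "PEFT_ADAPTER"},
    "meta/llama3-2@llama-3.2-1b-instruct": {"FULL"},
    "meta/llama3-2@llama-3.2-3b-instruct": {"FULL"},
    "meta/llama3-3@llama-3.3-70b-instruct": {"FULL", "PEFT_ADAPTER"},
    "qwen/qwen3@qwen3-32b": {"PEFT_ADAPTER"},
}


def check_tuning_mode(base_model: str, tuning_mode: str) -> None:
    """Raise ValueError if the model is unknown or does not allow the mode."""
    modes = SUPPORTED_TUNING_MODES.get(base_model)
    if modes is None:
        raise ValueError(f"Unsupported base model: {base_model}")
    if tuning_mode not in modes:
        allowed = ", ".join(sorted(modes))
        raise ValueError(
            f"{base_model} does not support {tuning_mode}; allowed: {allowed}"
        )


check_tuning_mode("meta/llama3_1@llama-3.1-8b", "FULL")  # passes silently
```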

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Vertex AI and Cloud Storage APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

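Install and initialize the Vertex AI SDK for Python

Install the SDK (for example, with pip install --upgrade google-cloud-aiplatform), then import the following libraries and initialize the SDK with your project ID and region.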

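import os
import time
import uuid

import vertexai
from google.cloud import aiplatform
from vertexai.preview.tuning import sft, SourceModel

# Placeholders: replace with your own project ID and region.
PROJECT_ID = "your-project-id"
REGION = "your-region"

vertexai.init(project=PROJECT_ID, location=REGION)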

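Prepare a dataset for tuning

Tuning requires a training dataset. If you want to evaluate the performance of the tuned model, we recommend that you also prepare an optional validation dataset.

The dataset must be in one of the following supported JSON Lines (JSONL) formats, in which each line contains one tuning example.

Prompt completion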

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
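As a minimal sketch of producing this format, the helper below serializes (prompt, completion) pairs into JSONL text, one record per line. The function name and the sample pairs are hypothetical. `json.dumps` escapes any newlines inside the strings, so each record stays on a single line.

```python
import json
from typing import Iterable, Tuple


def to_prompt_completion_jsonl(pairs: Iterable[Tuple[str, str]]) -> str:
    """Serialize (prompt, completion) pairs as JSONL text, one record per line."""
    return "".join(
        json.dumps({"prompt": prompt, "completion": completion}, ensure_ascii=False)
        + "\n"
        for prompt, completion in pairs
    )


# Hypothetical examples for illustration.
examples = [
    ("Translate to French: cat", "chat"),
    ("Summarize:\nThe sky is blue because of Rayleigh scattering.", "Scattering makes the sky blue."),
]
jsonl_text = to_prompt_completion_jsonl(examples)
print(jsonl_text, end="")
```

The resulting text can be written to a local file (for example with `pathlib.Path("train.jsonl").write_text(jsonl_text, encoding="utf-8")`) before it is uploaded.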

Turn-based chat format

{"messages": [
  {"content": "You are a chatbot that helps with scientific literature and generates state-of-the-art abstracts from articles.",
    "role": "system"},
  {"content": "Summarize the paper in one paragraph.",
    "role": "user"},
  {"content": " Here is a one paragraph summary of the paper:\n\nThe paper describes PaLM, ...",
    "role": "assistant"}
]}
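The example above is shown across several lines for readability; in the dataset file each record must occupy a single line. The sketch below checks one line of a chat-format file before upload. It is a local sanity check only, not the service's own validation, and the set of allowed roles is an assumption inferred from the example above.

```python
import json
from typing import List

# Assumption: roles inferred from the example record, not an official list.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_line(line: str) -> List[str]:
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i}: not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i}: 'content' must be a string")
    return problems


good = json.dumps({"messages": [
    {"role": "user", "content": "Summarize the paper in one paragraph."},
    {"role": "assistant", "content": "The paper describes ..."},
]})
print(validate_chat_line(good))  # → []
```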

Upload the JSONL file to Cloud Storage.
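One way to do the upload from Python is with the google-cloud-storage client library, sketched below. The helper names are hypothetical, the library must be installed separately, and the call relies on Application Default Credentials being configured. The import sits inside the function so that the URI parsing can be used without the library present.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/file' into (bucket, object path)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a Cloud Storage URI: {uri}")
    bucket, _, blob = uri[len(prefix):].partition("/")
    if not bucket or not blob:
        raise ValueError(f"URI must include a bucket and an object path: {uri}")
    return bucket, blob


def upload_jsonl(local_path: str, gcs_uri: str) -> None:
    """Upload a local JSONL file to the given gs:// URI."""
    # Imported lazily; requires the google-cloud-storage package.
    from google.cloud import storage

    bucket_name, blob_name = split_gcs_uri(gcs_uri)
    client = storage.Client()
    client.bucket(bucket_name).blob(blob_name).upload_from_filename(local_path)


# Example call (bucket name is a placeholder):
# upload_jsonl("train.jsonl", "gs://your-bucket/tuning/train.jsonl")
```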

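Create a tuning job

You can tune either of the following:

A supported base model, such as Llama 3.1
A model that has the same architecture as one of the supported base models. This can be a custom model checkpoint from a repository such as Hugging Face, or a previously tuned model from a Vertex AI tuning job, which lets you continue tuning a model that has already been tuned.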

Cloud Console

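You can start fine-tuning in either of the following ways:
- Go to the model card, click [Fine-tune], and then select [Managed tuning].
  
  Go to the Llama 3.1 model card
  
  or
- Go to the [Tuning] page and click [Create tuned model].
  
  Go to [Tuning]
Enter the parameters and click [Start tuning].

This starts a tuning job, which you can view on the [Tuning] page under the [Managed tuning] tab.

After the tuning job finishes, you can view information about the tuned model on the [Details] tab.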

Vertex AI SDK for Python

Replace the parameter values with your own, then run the following code to create a tuning job.

sft_tuning_job = sft.preview_train(
    source_model=SourceModel(
        base_model="meta/llama3_1@llama-3.1-8b",
        # Optional: folder containing either a custom model checkpoint
        # or a previously tuned model.
        custom_base_model="gs://{STORAGE-URI}",
    ),
    tuning_mode="FULL",  # "FULL" or "PEFT_ADAPTER"
    epochs=3,
    train_dataset="gs://{STORAGE-URI}",  # JSONL file
    validation_dataset="gs://{STORAGE-URI}",  # Optional JSONL file
    output_uri="gs://{STORAGE-URI}",
)
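Tuning runs asynchronously, so a script usually has to wait for the job before using its output. The sketch below polls a job object until it reports completion. It assumes the object returned by the call above exposes a `refresh()` method and a `has_ended` property, as tuning jobs in the Vertex AI SDK generally do; the helper itself, its name, and its parameters are hypothetical. The `sleep` argument exists only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Optional


def wait_for_tuning_job(
    job,
    poll_seconds: float = 60.0,
    timeout_seconds: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
):
    """Poll `job` until job.has_ended is true, refreshing its state each round.

    Assumes `job` has a `has_ended` attribute and a `refresh()` method.
    Raises TimeoutError if `timeout_seconds` elapses first.
    """
    waited = 0.0
    while not job.has_ended:
        if timeout_seconds is not None and waited >= timeout_seconds:
            raise TimeoutError(f"Tuning job still running after {waited:.0f} s")
        sleep(poll_seconds)
        waited += poll_seconds
        job.refresh()
    return job


# Example call, using the job created above:
# wait_for_tuning_job(sft_tuning_job, poll_seconds=60)
```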

When the job finishes, the model artifacts for the tuned model are stored in the <output_uri>/postprocess/node-0/checkpoints/final folder.
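Because the artifact location is derived from the output URI by a fixed suffix, it can be computed rather than typed by hand. A small sketch follows; the helper name is hypothetical and the bucket in the example is a placeholder.

```python
# Suffix stated in the sentence above.
FINAL_CHECKPOINT_SUFFIX = "postprocess/node-0/checkpoints/final"


def final_checkpoint_uri(output_uri: str) -> str:
    """Return the folder that holds the tuned model artifacts for a job
    that was created with the given output_uri."""
    return output_uri.rstrip("/") + "/" + FINAL_CHECKPOINT_SUFFIX


print(final_checkpoint_uri("gs://your-bucket/tuning-output/"))
```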

Deploy the tuned model

You can deploy the tuned model to a Vertex AI endpoint. You can also export the tuned model from Cloud Storage and deploy it elsewhere.

To deploy the tuned model to a Vertex AI endpoint:

Cloud Console

Go to the [Model Garden] page and click [Deploy model with custom weights].

Go to Model Garden
Enter the parameters and click [Deploy].

Vertex AI SDK for Python

Deploy the model to a G2 machine using a prebuilt container.

from vertexai.preview import model_garden

MODEL_ARTIFACTS_STORAGE_URI = "gs://{STORAGE-URI}/postprocess/node-0/checkpoints/final"

model = model_garden.CustomModel(
    gcs_uri=MODEL_ARTIFACTS_STORAGE_URI,
)

# Deploy the model to an endpoint that uses GPUs. The deployment incurs charges.
endpoint = model.deploy(
    machine_type="g2-standard-12",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)

Get inferences

After the deployment succeeds, you can send requests with text prompts to the endpoint. The first few prompts take longer to run.

# Loads the deployed endpoint
endpoint = aiplatform.Endpoint("projects/{PROJECT_ID}/locations/{REGION}/endpoints/{endpoint_name}")

prompt = "Summarize the following article. Article: Preparing a perfect risotto requires patience and attention to detail. Begin by heating butter in a large, heavy-bottomed pot over medium heat. Add finely chopped onions and minced garlic to the pot, and cook until they're soft and translucent, about 5 minutes. Next, add Arborio rice to the pot and cook, stirring constantly, until the grains are coated with the butter and begin to toast slightly. Pour in a splash of white wine and cook until it's absorbed. From there, gradually add hot chicken or vegetable broth to the rice, stirring frequently, until the risotto is creamy and the rice is tender with a slight bite.. Summary:"

# Define input to the prediction call
instances = [
    {
        "prompt": prompt,
        "max_tokens": 200,
        "temperature": 1.0,
        "top_p": 1.0,
        "top_k": 1,
        "raw_response": True,
    },
]

# Request the prediction
response = endpoint.predict(
    instances=instances
)

for prediction in response.predictions:
    print(prediction)

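For more information about getting inferences from a deployed model, see Get online inferences.

Managed open models use the chat.completions method rather than the predict method used by deployed models. For more information about getting inferences from managed models, see Call a Llama model.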

Limits and quotas

A quota applies to the number of concurrent tuning jobs. Every project has a default quota to run at least one tuning job. This is a global quota, shared across all available regions and supported models. To run more jobs concurrently, request additional quota for Global concurrent managed OSS model fine-tuning jobs per project.

Pricing

Tuning is billed according to model tuning pricing.

You are also billed for related services, such as Cloud Storage and Vertex AI Prediction.

See Vertex AI pricing and Cloud Storage pricing. You can also use the pricing calculator to estimate costs based on your projected usage.

What's next

Evaluate a tuned model
