本頁面由 Cloud Translation API 翻譯而成。

評估模型效能

這段範例程式碼示範如何評估 GenAI 模型的效能。這個範例會展示如何定義評估規格、評估模型，以及擷取評估指標。

深入探索

如需包含這個程式碼範例的詳細說明文件，請參閱下列內容：

執行以運算式為準的評估管道

程式碼範例

Python

在試用這個範例之前，請先按照Python使用用戶端程式庫的 Vertex AI 快速入門中的操作說明進行設定。詳情請參閱 Vertex AI Python API 參考說明文件。

如要向 Vertex AI 進行驗證，請設定應用程式預設憑證。詳情請參閱「為本機開發環境設定驗證」。

import os

from google.auth import default

import vertexai
from vertexai.preview.language_models import (
    EvaluationTextClassificationSpec,
    TextGenerationModel,
)

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def evaluate_model() -> object:
    """Evaluate the performance of a generative AI model."""

    # Set credentials for the pipeline components used in the evaluation task
    credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])

    vertexai.init(project=PROJECT_ID, location="us-central1", credentials=credentials)

    # Create a reference to a generative AI model
    model = TextGenerationModel.from_pretrained("text-bison@002")

    # Define the evaluation specification for a text classification task
    task_spec = EvaluationTextClassificationSpec(
        ground_truth_data=[
            "gs://cloud-samples-data/ai-platform/generative_ai/llm_classification_bp_input_prompts_with_ground_truth.jsonl"
        ],
        class_names=["nature", "news", "sports", "health", "startups"],
        target_column_name="ground_truth",
    )

    # Evaluate the model
    eval_metrics = model.evaluate(task_spec=task_spec)
    print(eval_metrics)
    # Example response:
    # ...
    # PipelineJob run completed.
    # Resource name: projects/123456789/locations/us-central1/pipelineJobs/evaluation-llm-classification-...
    # EvaluationClassificationMetric(label_name=None, auPrc=0.53833705, auRoc=0.8...

    return eval_metrics

後續步驟

如要搜尋及篩選其他 Google Cloud 產品的程式碼範例，請參閱Google Cloud 範例瀏覽器。