Starting April 29, 2025, the Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have not previously used these models, including new projects. For details, see Model versions and lifecycle.

Tutorial: Perform evaluation using the Python SDK

This page shows you how to perform a model-based evaluation with the Gen AI evaluation service using the Vertex AI SDK for Python.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Verify that billing is enabled for your Google Cloud project.

Install the Vertex AI SDK for Python with the Gen AI evaluation service dependency:
```
!pip install "google-cloud-aiplatform[evaluation]"
```

Set up your credentials. If you are running this quickstart in Colaboratory, run the following:

from google.colab import auth
auth.authenticate_user()

For other environments, see Authenticate to Vertex AI.

Import libraries

Import the libraries and set up your project and location.

import pandas as pd

import vertexai
from vertexai.evaluation import EvalTask, PointwiseMetric, PointwiseMetricPromptTemplate
from google.cloud import aiplatform

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"
EXPERIMENT_NAME = "EXPERIMENT_NAME"

vertexai.init(
    project=PROJECT_ID,
    location=LOCATION,
)

Note that EXPERIMENT_NAME can contain only lowercase alphanumeric characters and hyphens, up to a maximum of 127 characters.
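
The naming rule above can be checked locally before the name is passed to the SDK. The following is a minimal sketch, not part of the SDK: the helper name `is_valid_experiment_name` is hypothetical, and the pattern encodes only the rule as stated on this page (the service may enforce additional constraints).

```python
import re

# Assumption: this pattern mirrors only the rule stated on this page
# (lowercase letters, digits, and hyphens; 1 to 127 characters).
_EXPERIMENT_NAME_RE = re.compile(r"[a-z0-9-]{1,127}")


def is_valid_experiment_name(name: str) -> bool:
    """Return True if the whole name satisfies the stated naming rule."""
    return _EXPERIMENT_NAME_RE.fullmatch(name) is not None


print(is_valid_experiment_name("text-quality-eval-01"))  # True
print(is_valid_experiment_name("Text_Quality_Eval"))     # False
```

Running a check like this first can surface a naming problem before any evaluation request is sent.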

Set up evaluation metrics based on your criteria

The following metric definition evaluates the quality of text generated by a large language model against two criteria: Fluency and Entertaining. The code defines a metric named custom_text_quality that uses both criteria:

custom_text_quality = PointwiseMetric(
    metric="custom_text_quality",
    metric_prompt_template=PointwiseMetricPromptTemplate(
        criteria={
            "fluency": (
                "Sentences flow smoothly and are easy to read, avoiding awkward"
                " phrasing or run-on sentences. Ideas and sentences connect"
                " logically, using transitions effectively where needed."
            ),
            "entertaining": (
                "Short, amusing text that incorporates emojis, exclamations and"
                " questions to convey quick and spontaneous communication and"
                " diversion."
            ),
        },
        rating_rubric={
            "1": "The response performs well on both criteria.",
            "0": "The response is somewhat aligned with both criteria",
            "-1": "The response falls short on both criteria",
        },
    ),
)
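
Because the rating rubric is a plain dictionary that maps score strings to descriptions, it can be sanity-checked before it is placed in the prompt template. The sketch below is illustrative only: `check_rating_rubric` is a hypothetical helper, and treating rubric keys as integer strings is a convention taken from the example above, not a documented SDK requirement.

```python
def check_rating_rubric(rating_rubric: dict[str, str]) -> list[int]:
    """Return the rubric scores in ascending order, or raise ValueError."""
    scores = []
    for key, description in rating_rubric.items():
        try:
            scores.append(int(key))
        except ValueError:
            raise ValueError(f"Rubric key {key!r} is not an integer string") from None
        if not description.strip():
            raise ValueError(f"Rubric key {key!r} has an empty description")
    return sorted(scores)


# Same rubric as in the metric definition above.
rubric = {
    "1": "The response performs well on both criteria.",
    "0": "The response is somewhat aligned with both criteria",
    "-1": "The response falls short on both criteria",
}
print(check_rating_rubric(rubric))  # [-1, 0, 1]
```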

Prepare your dataset

Add the following code to prepare your dataset:

responses = [
    # An example of good custom_text_quality
    "Life is a rollercoaster, full of ups and downs, but it's the thrill that keeps us coming back for more!",
    # An example of medium custom_text_quality
    "The weather is nice today, not too hot, not too cold.",
    # An example of poor custom_text_quality
    "The weather is, you know, whatever.",
]

eval_dataset = pd.DataFrame({
    "response": responses,
})
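
Since every row in the dataset is sent to the judge model, it may help to catch blank or non-string responses before building the DataFrame. This is a minimal sketch under that assumption; `find_blank_responses` is a hypothetical helper and is not part of the Vertex AI SDK.

```python
def find_blank_responses(responses: list) -> list[int]:
    """Return the positions of entries that are not non-empty strings."""
    return [
        index
        for index, text in enumerate(responses)
        if not isinstance(text, str) or not text.strip()
    ]


# Made-up sample with two unusable entries.
sample = [
    "The weather is nice today, not too hot, not too cold.",
    "   ",
    None,
]
print(find_blank_responses(sample))  # [1, 2]
```

An empty result means every entry is a non-empty string and the list can be used as the "response" column.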

Run the evaluation with your dataset

Run the evaluation:

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[custom_text_quality],
    experiment=EXPERIMENT_NAME
)

pointwise_result = eval_task.evaluate()

View the evaluation results for each response in the metrics_table Pandas DataFrame:

pointwise_result.metrics_table
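
To get a quick overview of the per-response results, the scores can be tallied. The sketch below works on plain row dictionaries so that it runs without a live evaluation. It assumes the score column is named "custom_text_quality/score" (metric name followed by "/score"); verify the actual column names in your own metrics_table. The `count_scores` helper and the sample rows are hypothetical.

```python
from collections import Counter

# Assumption: the score column is named "<metric name>/score".
SCORE_COLUMN = "custom_text_quality/score"


def _is_missing(value) -> bool:
    """True for None or NaN (NaN is the only value not equal to itself)."""
    return value is None or value != value


def count_scores(rows: list[dict], score_column: str = SCORE_COLUMN) -> dict:
    """Count how many rows received each score, skipping missing scores."""
    counts = Counter(
        row[score_column]
        for row in rows
        if not _is_missing(row.get(score_column))
    )
    return dict(sorted(counts.items()))


# Hand-written stand-in rows, not real evaluation output.
rows = [
    {"response": "a", SCORE_COLUMN: 1},
    {"response": "b", SCORE_COLUMN: 0},
    {"response": "c", SCORE_COLUMN: -1},
    {"response": "d", SCORE_COLUMN: 1},
]
print(count_scores(rows))  # {-1: 1, 0: 1, 1: 2}
```

With a real result, the rows could be obtained with `pointwise_result.metrics_table.to_dict("records")`.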

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Delete the ExperimentRun created by the evaluation:

aiplatform.ExperimentRun(
    run_name=pointwise_result.metadata["experiment_run"],
    experiment=pointwise_result.metadata["experiment"],
).delete()