Tutorial: Run an evaluation using the Python SDK
This page shows how to use the Vertex AI SDK for Python to run a model evaluation with the Gen AI Evaluation Service.
Before you begin
Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Verify that billing is enabled for your Google Cloud project.
Install the Vertex AI SDK for Python with the Gen AI Evaluation Service dependency:
    !pip install google-cloud-aiplatform[evaluation]
Set up your credentials. If you are running this quickstart in Colaboratory, run the following:
    from google.colab import auth
    auth.authenticate_user()
For other environments, see "Authenticate to Vertex AI".
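For a notebook that may run both inside and outside Colaboratory, the authentication call can be guarded so that the import does not fail elsewhere. The following is a minimal sketch, assuming that the Colab runtime has already loaded the google.colab package when the cell runs; running_in_colab is a hypothetical helper, not part of any SDK.

```python
import sys


def running_in_colab() -> bool:
    """Return True when the google.colab package is already loaded.

    Assumption: the Colab runtime imports google.colab at kernel start,
    so its presence in sys.modules is a workable heuristic.
    """
    return "google.colab" in sys.modules


if running_in_colab():
    # Same two calls as in the Colaboratory step above.
    from google.colab import auth

    auth.authenticate_user()
else:
    # Outside Colab, rely on the credentials configured for the environment.
    print("Not running in Colab; skipping auth.authenticate_user().")
```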
Import libraries
Import your libraries and set up your project and location.
    import pandas as pd

    import vertexai
    from vertexai.evaluation import EvalTask, PointwiseMetric, PointwiseMetricPromptTemplate
    from google.cloud import aiplatform

    # Replace the placeholders with your own values.
    PROJECT_ID = "PROJECT_ID"
    LOCATION = "LOCATION"
    EXPERIMENT_NAME = "EXPERIMENT_NAME"

    vertexai.init(
        project=PROJECT_ID,
        location=LOCATION,
    )
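The EXPERIMENT_NAME placeholder has to be replaced with a value that satisfies the naming rule given in the note that follows. The sketch below is a local pre-check, assuming the rule is exactly as stated there; is_valid_experiment_name is a hypothetical helper, not part of the SDK, and the service may apply further restrictions.

```python
import re

# Rule as stated in this tutorial: lowercase letters, digits, and hyphens,
# at most 127 characters. Assumption: the service may enforce more than this.
_EXPERIMENT_NAME_PATTERN = re.compile(r"[a-z0-9-]{1,127}")


def is_valid_experiment_name(name: str) -> bool:
    """Check a candidate name against the stated rule only."""
    return _EXPERIMENT_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["text-quality-eval-01", "EXPERIMENT_NAME", "a" * 128]:
    # Long candidates are truncated for display only.
    print(f"{candidate[:24]!r}: {is_valid_experiment_name(candidate)}")
```

Under this check the literal placeholder string EXPERIMENT_NAME is rejected, because it contains uppercase letters and an underscore.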
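Note that EXPERIMENT_NAME can contain only lowercase alphanumeric characters and hyphens, up to a maximum of 127 characters.
Set up evaluation metrics based on your criteria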
The following metric definition evaluates the quality of text generated by a large language model against two criteria: Fluency and Entertaining. The code defines a metric named custom_text_quality that uses both criteria:
    custom_text_quality = PointwiseMetric(
        metric="custom_text_quality",
        metric_prompt_template=PointwiseMetricPromptTemplate(
            criteria={
                "fluency": (
                    "Sentences flow smoothly and are easy to read, avoiding awkward"
                    " phrasing or run-on sentences. Ideas and sentences connect"
                    " logically, using transitions effectively where needed."
                ),
                "entertaining": (
                    "Short, amusing text that incorporates emojis, exclamations and"
                    " questions to convey quick and spontaneous communication and"
                    " diversion."
                ),
            },
            rating_rubric={
                "1": "The response performs well on both criteria.",
                "0": "The response is somewhat aligned with both criteria",
                "-1": "The response falls short on both criteria",
            },
        ),
    )
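To make the roles of criteria and rating_rubric more concrete, the following sketch flattens the two dictionaries into one instruction text for a single response. It is an illustration only: render_pointwise_prompt is a hypothetical helper, the criteria descriptions are shortened, and the layout is an assumption, not the prompt that PointwiseMetricPromptTemplate actually generates.

```python
def render_pointwise_prompt(criteria, rating_rubric, response):
    """Flatten criteria and rubric into one prompt string (illustrative layout)."""
    lines = ["Evaluate the response against these criteria:"]
    for name, description in criteria.items():
        lines.append(f"- {name}: {description}")
    lines.append("Choose one rating:")
    # Rubric keys are strings such as "1", "0", "-1"; sort them numerically,
    # highest score first.
    for score in sorted(rating_rubric, key=int, reverse=True):
        lines.append(f"- {score}: {rating_rubric[score]}")
    lines.append(f"Response: {response}")
    return "\n".join(lines)


# Shortened stand-ins for the criteria and rubric defined above.
criteria = {
    "fluency": "Sentences flow smoothly and are easy to read.",
    "entertaining": "Short, amusing text with emojis, exclamations and questions.",
}
rating_rubric = {
    "-1": "The response falls short on both criteria",
    "1": "The response performs well on both criteria.",
    "0": "The response is somewhat aligned with both criteria",
}

prompt = render_pointwise_prompt(
    criteria, rating_rubric, "The weather is, you know, whatever."
)
print(prompt)
```

Each rubric key is a score and each value describes when that score applies, so every response is rated on the same three-level scale.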
Prepare your dataset
Add the following code to prepare your dataset:
    responses = [
        # An example of good custom_text_quality
        "Life is a rollercoaster, full of ups and downs, but it's the thrill that keeps us coming back for more!",
        # An example of medium custom_text_quality
        "The weather is nice today, not too hot, not too cold.",
        # An example of poor custom_text_quality
        "The weather is, you know, whatever.",
    ]

    eval_dataset = pd.DataFrame({
        "response": responses,
    })
Run evaluation with your dataset
Run the evaluation:
    eval_task = EvalTask(
        dataset=eval_dataset,
        metrics=[custom_text_quality],
        experiment=EXPERIMENT_NAME,
    )

    pointwise_result = eval_task.evaluate()
View the evaluation result for each response in the metrics_table Pandas DataFrame:
    pointwise_result.metrics_table
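Once the table is available, the per-row scores can be aggregated or filtered. The sketch below works on plain records of the kind that DataFrame.to_dict("records") returns, so it runs without a Vertex AI call. The column name custom_text_quality/score and the sample scores are assumptions for illustration; inspect pointwise_result.metrics_table.columns for the actual names. summarize_scores is a hypothetical helper.

```python
from statistics import mean

SCORE_KEY = "custom_text_quality/score"  # assumed column name

# Stand-in for pointwise_result.metrics_table.to_dict("records");
# the scores are made up for illustration, not real evaluation output.
rows = [
    {"response": "Life is a rollercoaster...", SCORE_KEY: 1.0},
    {"response": "The weather is nice today...", SCORE_KEY: 0.0},
    {"response": "The weather is, you know, whatever.", SCORE_KEY: -1.0},
]


def summarize_scores(rows, score_key):
    """Return the mean score and the rows rated below zero."""
    scores = [row[score_key] for row in rows]
    low_rated = [row for row in rows if row[score_key] < 0]
    return mean(scores), low_rated


average, low_rated = summarize_scores(rows, SCORE_KEY)
print(f"mean score: {average}")
for row in low_rated:
    print(f"needs attention: {row['response']}")
```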
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
Delete the ExperimentRun created by the evaluation:
    aiplatform.ExperimentRun(
        run_name=pointwise_result.metadata["experiment_run"],
        experiment=pointwise_result.metadata["experiment"],
    ).delete()
What's next