Tutorial: Perform evaluation using the GenAI client in the Vertex AI SDK

This page shows how to use the GenAI client in the Vertex AI SDK to evaluate generative AI models and applications across a range of use cases.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

    In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Verify that billing is enabled for your Google Cloud project.

  2. Install the Vertex AI SDK for Python with the evaluation extras:

    !pip install "google-cloud-aiplatform[evaluation]"
    
  3. Set up your credentials. If you're running this tutorial in Colaboratory, run the following:

    from google.colab import auth
    auth.authenticate_user()
    

    For other environments, see Authenticate to Vertex AI.
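The later steps call methods on a `client` object that this page never creates. The following is a minimal sketch of one way to create it, assuming the `vertexai.Client(project=..., location=...)` entry point exposed by recent SDK versions. The `resolve_settings` and `make_client` helpers, the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` variable names, and the `us-central1` default are assumptions for illustration; check the SDK reference for your installed version.

```python
import os


def resolve_settings(project=None, location=None):
    """Return (project, location), falling back to environment variables.

    The variable names and the "us-central1" default are assumptions for
    this sketch; substitute your own project ID and region.
    """
    project = project or os.environ.get("GOOGLE_CLOUD_PROJECT")
    location = location or os.environ.get("GOOGLE_CLOUD_LOCATION", "us-central1")
    if not project:
        raise ValueError("Pass a project ID or set GOOGLE_CLOUD_PROJECT.")
    return project, location


def make_client(project=None, location=None):
    """Create the GenAI client that later steps refer to as `client`."""
    # Imported lazily so resolve_settings() works without the SDK installed.
    import vertexai  # provided by google-cloud-aiplatform

    project, location = resolve_settings(project, location)
    return vertexai.Client(project=project, location=location)


# In the notebook, after authenticating (assumed usage):
# client = make_client(project="your-project-id")
```

Keeping the SDK import inside `make_client()` lets the settings logic be checked on its own; in a notebook you can equally call `vertexai.Client` directly with your project ID and region.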

Generate responses

    Use run_inference() to generate model responses for your dataset:

    1. Prepare your dataset as a pandas DataFrame:

      import pandas as pd
      
      eval_df = pd.DataFrame({
        "prompt": [
            "Explain software 'technical debt' using a concise analogy of planting a garden.",
            "Write a Python function to find the nth Fibonacci number using recursion with memoization, but without using any imports.",
            "Write a four-line poem about a lonely robot, where every line must be a question and the word 'and' cannot be used.",
            "A drawer has 10 red socks and 10 blue socks. In complete darkness, what is the minimum number of socks you must pull out to guarantee you have a matching pair?",
            "An AI discovers a cure for a major disease, but the cure is based on private data it analyzed without consent. Should the cure be released? Justify your answer."
        ]
      })
      
    2. Generate model responses with run_inference():

      eval_dataset = client.evals.run_inference(
        model="gemini-2.5-flash",
        src=eval_df,
      )
      
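    3. Visualize the inference results by calling .show() on the EvaluationDataset object, so you can inspect the model outputs alongside the original prompts and any references: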

      eval_dataset.show()
      

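    The following figure shows the evaluation dataset with the prompts and their generated responses:

    [Figure: a table of the evaluation dataset with prompt and response columns.]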
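Conceptually, run_inference() pairs every prompt with a model response, turning a prompt-only table into a prompt/response table. The sketch below illustrates only that data shape with plain Python lists; it is not the SDK's implementation. `run_inference_sketch` and the offline `echo_model` stand-in are hypothetical names, whereas the real call sends each prompt to the named Gemini model and returns an EvaluationDataset.

```python
from typing import Callable


def run_inference_sketch(
    rows: list[dict], generate: Callable[[str], str]
) -> list[dict]:
    """Return new rows that keep each prompt and add a "response" field."""
    results = []
    for row in rows:
        out = dict(row)  # copy so the input rows are left unchanged
        out["response"] = generate(row["prompt"])
        results.append(out)
    return results


def echo_model(prompt: str) -> str:
    """Hypothetical offline stand-in for a real model call."""
    return f"(response to: {prompt})"


rows = [{"prompt": "Write a four-line poem about a lonely robot."}]
for row in run_inference_sketch(rows, echo_model):
    print(row["prompt"], "->", row["response"])
```

Passing the model in as a plain function keeps the sketch runnable without network access or credentials.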

Run evaluation

    Run evaluate() to evaluate the model responses:

    1. Evaluate the model responses with the default GENERAL_QUALITY metric, which is based on adaptive rubrics:

      eval_result = client.evals.evaluate(dataset=eval_dataset)
      
    2. Visualize the evaluation results by calling .show() on the EvaluationResult object to display the summary metrics and detailed results:

      eval_result.show()
      

    The following figure shows an evaluation report with summary metrics and detailed results for each prompt-response pair.
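The report combines per-pair detail with summary metrics. As a rough illustration of how the two can relate, the sketch below assumes (this page does not say so) that each adaptive rubric for a prompt yields a pass/fail verdict, that a pair's score is the fraction of its rubrics that pass, and that the summary is the mean of those scores. The verdicts are made-up inputs, and `row_score` and `summarize` are hypothetical helpers, not SDK functions.

```python
from statistics import mean


def row_score(verdicts: list[bool]) -> float:
    """Fraction of rubric checks one response passed.

    A pair with no rubrics scores 0.0; that is a choice made for this sketch.
    """
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def summarize(per_row_verdicts: list[list[bool]]) -> dict:
    """Per-pair scores plus their mean, mimicking a summary/detail report."""
    scores = [row_score(v) for v in per_row_verdicts]
    return {"row_scores": scores, "mean_score": mean(scores) if scores else 0.0}


# Hypothetical verdicts for two prompt-response pairs (not real SDK output).
report = summarize([[True, True, False, True], [True, False]])
print(report)  # {'row_scores': [0.75, 0.5], 'mean_score': 0.625}
```

Compare this interpretation against what eval_result.show() reports for your own run before relying on it.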

Clean up

    This tutorial doesn't create any Vertex AI resources, so there is nothing to clean up.

What's next