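Code Interpreter extension

This document shows you how to register and use the Google-provided Code Interpreter extension from the Google Cloud console and the Vertex AI API. With this extension, you can generate and run Python code to:

  • Analyze, clean, transform, and reshape datasets
  • Visualize data in charts and graphs
  • Run calculations

The Code Interpreter extension uses code_interpreter_tool to generate and run Python code from a natural language description. code_interpreter_tool is defined in the OpenAPI specification file code_interpreter.yaml.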

openapi: "3.0.0"
info:
  version: 1.0.0
  title: code_interpreter_tool
  description: >
    This tool supports the following operations based on user input:

    1. **Generates and Executes Code:** Accepts a user query in natural language, generates corresponding code, and executes it to produce results for the user query.


    Supported AuthTypes:

    - `GOOGLE_SERVICE_ACCOUNT_AUTH`: (Vertex AI Extension Service Agent is supported).
paths:
  /generate_and_execute:
    post:
      operationId: generate_and_execute
      description: >
        Get the results of a natural language query by generating and executing a code snippet.
        Example queries: "Find the max in [1, 2, 5]" or "Plot average sales by year (from data.csv)".
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
              - query
              properties:
                query:
                  type: string
                  description: >
                    Required. The Natural language query to get the results for.
                    The query string can optionally contain data to use for the code generated.
                    For example: "I have a list of numbers: [1, 2, 3, 4]. Find the largest number in the provided data."
                timeout:
                  type: number
                  description: >
                    Optional. Timeout in milliseconds for the code execution. Default value: 30000.
                files:
                  type: array
                  description: >
                    Optional. Input files to use when executing the generated code.
                    If specified, the file contents are expected to be base64-encoded.
                    For example: [{"name": "data.csv", "contents": "aXRlbTEsaXRlbTI="}]

                    Only one of `file_gcs_uris` and `files` field should be provided.
                  items:
                    $ref: "#/components/schemas/File"
                file_gcs_uris:
                  type: array
                  description: >
                    Optional. GCS URIs of input files to use when executing the generated code.
                    For example: ["gs://input-bucket/data.csv"]

                    Only one of `file_gcs_uris` and `files` field should be provided.
                    This option is only applicable when `file_input_gcs_bucket` is specified in `Extension.CodeInterpreterRuntimeConfig`.
                  items:
                    type: string
      responses:
        '200':
          description: >
            The results of generating and executing code based on the natural language query.
            The result contains the generated code, and the STDOUT, STDERR, and output files from code execution.
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/GenerationAndExecutionResult"
components:
  schemas:
    File:
      description: >
        File used as inputs and outputs of code execution. The `contents` string should be base64-encoded bytes.
        For example: [{"name": "data.csv", "contents": "aXRlbTEsaXRlbTI="}]
      type: object
      properties:
        name:
          type: string
        contents:
          type: string
          format: byte
    GenerationAndExecutionResult:
      description: >
        The results of generating and executing code based on the natural language query.
      properties:
        generated_code:
          type: string
          description: >
            The generated code in markdown format.
            For example: "```python\nprint(\"Hello World\")\n```"
        execution_result:
          type: string
          description: >
            The code execution result string from STDOUT.
        execution_error:
          type: string
          description: >
            The code execution error string from STDERR.
        output_files:
          type: array
          description: >
            The output files generated from code execution.
            If present, the file contents are required to be base64-encoded.
            For example: [{"name": "data.csv", "contents": "aXRlbTEsaXRlbTI="}]
          items:
            $ref: "#/components/schemas/File"
        output_gcs_uris:
          type: array
          description: >
            The output GCS URIs of files generated from code execution.
            For example: ["gs://output-bucket/subfolder/output.csv"]

            This field is only applicable when `file_output_gcs_bucket` is specified in `Extension.CodeInterpreterRuntimeConfig`.
          items:
            type: string
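
The GenerationAndExecutionResult schema above returns the generated code wrapped in a markdown fence and any output files as base64 strings. The following is a minimal Python sketch of how a client might unpack such a response. The helper names extract_code and decode_output_files and the sample dictionary are hypothetical, made up for illustration; only the field names come from the specification.

```python
import base64
import re

# Build the fence marker programmatically so this snippet can itself be
# embedded in markdown without confusion.
FENCE = "`" * 3

# Matches a fenced block such as <fence>python\n...\n<fence> and captures the body.
_FENCE_RE = re.compile(FENCE + r"[\w+-]*\n(.*?)\n?" + FENCE, re.DOTALL)


def extract_code(generated_code: str) -> str:
    """Return the code inside the first markdown fence, or the input unchanged."""
    match = _FENCE_RE.search(generated_code)
    return match.group(1) if match else generated_code


def decode_output_files(result: dict) -> dict:
    """Map each output file name to its base64-decoded bytes."""
    return {
        f["name"]: base64.b64decode(f["contents"])
        for f in result.get("output_files", [])
    }


# Hypothetical response, shaped like GenerationAndExecutionResult and
# reusing the example values given in the specification.
sample = {
    "generated_code": FENCE + 'python\nprint("Hello World")\n' + FENCE,
    "execution_result": "Hello World\n",
    "execution_error": "",
    "output_files": [{"name": "data.csv", "contents": "aXRlbTEsaXRlbTI="}],
}

print(extract_code(sample["generated_code"]))
print(decode_output_files(sample))
```

Checking execution_error before trusting execution_result is a sensible addition in real use, since the schema reports STDOUT and STDERR in separate fields.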

    

For end-to-end tutorials on Google extensions, see the following Jupyter notebooks:

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI API.

    Enable the API

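Register, query, and run the Code Interpreter extension

The following sections show you how to register the Code Interpreter extension using the Google Cloud console and the Vertex AI API. After you register the extension, you can query it using the Google Cloud console or execute it using the Vertex AI API.

Console

Register the extension

Perform the following steps to register the Code Interpreter extension using the Google Cloud console.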

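  1. In the Google Cloud console, go to the Vertex AI Extensions page.

    Go to Vertex AI Extensions

  2. Click Create extension.

  3. In the Create a new extension dialog, do the following:

    • Extension name: Enter a name for your extension, such as "code_Interpreter_extension".
    • Description: (Optional) Enter an extension description, such as "Code Interpreter extension".
    • Extension type: Select Code interpreter.

  4. In the OpenAPI Spec file section that now appears, confirm that the following fields are set correctly:

    • API name: code_interpreter_tool
    • API description: Tool to generate and run valid Python code from a natural language description, or to run custom Python code...
    • Source: Cloud Storage
    • OpenAPI Spec: vertex-extension-public/code_interpreter.yaml
    • Authentication: Google service account

  5. (Optional) In the Runtime configurations section, provide the input bucket and the output bucket.

  6. Click Create extension.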

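(Optional) Query the extension

You can use the Google Cloud console to experiment with your Code Interpreter extension. Perform the following steps to invoke the extension with natural language prompts.

  1. In the Google Cloud console, go to the Vertex AI Extensions page.

    Go to Vertex AI Extensions

  2. Click the Code Interpreter extension name to open the extension details page.

    (Image: Code Interpreter name.)

  3. In the Enter a message box, enter a query, then view the response. Expand the Extension Response section to view the code that the extension generated and ran to produce the result.

    The following example shows the result of a query that calculates the mean of a list of numbers entered by the user.

    (Image: Mean value query.)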

REST

Register the extension

Submit a Vertex AI API extensions.import request to register the Code Interpreter extension.

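Before using any of the request data, make the following replacements:

  • PROJECT_ID: The ID of your Google Cloud project.
  • REGION: A Compute Engine region.
  • DISPLAY_NAME: The name of the extension, such as "code_Interpreter_extension".
  • DESCRIPTION: (Optional) The extension description, such as "Code Interpreter extension".
  • SERVICE_ACCOUNT: The service account used for GOOGLE_SERVICE_ACCOUNT_AUTH.
  • INPUT_BUCKET: (Optional) The Cloud Storage bucket that the extension reads input files from.
  • OUTPUT_BUCKET: (Optional) The Cloud Storage bucket that the extension writes output files to.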

HTTP method and URL:

POST https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions:import

Request JSON body:

{
  "displayName":"DISPLAY_NAME",
  "description":"DESCRIPTION",
  "manifest":{
    "name":"code_interpreter_tool",
    "description":"A Google Code Interpreter tool",
    "apiSpec":{
      "openApiGcsUri":"gs://vertex-extension-public/code_interpreter.yaml"
    },
    "authConfig":{
      "authType":"GOOGLE_SERVICE_ACCOUNT_AUTH",
      "googleServiceAccountConfig":{
        "serviceAccount":"SERVICE_ACCOUNT"
      }
    }
  },
  "runtimeConfig": {
    "codeInterpreterRuntimeConfig": {
      "fileInputGcsBucket": "INPUT_BUCKET",
      "fileOutputGcsBucket": "OUTPUT_BUCKET"
    }
  }
}
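
The import request body can also be generated with a JSON serializer, which guarantees syntactically valid output. The sketch below is one way to do that in Python. The function build_import_request and the sample argument values are hypothetical, and leaving out runtimeConfig when no bucket is given is an assumption based on the runtime configuration being optional in the console flow.

```python
import json


def build_import_request(display_name, description, service_account,
                         input_bucket=None, output_bucket=None):
    """Build the extensions:import request body as a Python dict."""
    body = {
        "displayName": display_name,
        "description": description,
        "manifest": {
            "name": "code_interpreter_tool",
            "description": "A Google Code Interpreter tool",
            "apiSpec": {
                "openApiGcsUri": "gs://vertex-extension-public/code_interpreter.yaml"
            },
            "authConfig": {
                "authType": "GOOGLE_SERVICE_ACCOUNT_AUTH",
                "googleServiceAccountConfig": {"serviceAccount": service_account},
            },
        },
    }
    # Assumption: the runtime configuration is only included when at least
    # one bucket is supplied.
    runtime = {}
    if input_bucket:
        runtime["fileInputGcsBucket"] = input_bucket
    if output_bucket:
        runtime["fileOutputGcsBucket"] = output_bucket
    if runtime:
        body["runtimeConfig"] = {"codeInterpreterRuntimeConfig": runtime}
    return body


# Hypothetical values; replace them with your own.
request_body = build_import_request(
    "code_interpreter_extension",
    "Code Interpreter extension",
    "SERVICE_ACCOUNT",
    input_bucket="INPUT_BUCKET",
    output_bucket="OUTPUT_BUCKET",
)
print(json.dumps(request_body, indent=2))
```

Writing the printed output to request.json yields a file that the curl and PowerShell commands in this section can send as-is.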

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions:import"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions:import" | Select-Object -Expand Content

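Run the extension

You can submit an execute operation to the Vertex AI API to generate and run Python code based on a natural language query.

Example queries:

  • Simple query: Find the maximum value in a list of numbers.
  • Inline data query: The data to query is provided in the request body.
  • File data query: Print file data.
  • Cloud Storage data query: Read Cloud Storage data.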

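Simple query

Before using any of the request data, make the following replacements:

  • PROJECT_ID: The ID of your Google Cloud project.
  • REGION: A Compute Engine region.
  • EXTENSION_ID: The ID of your Code Interpreter extension, as listed in the extension details in the Google Cloud console.

HTTP method and URL:

POST https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions/EXTENSION_ID:execute

Request JSON body: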

{
  "operation_id":"generate_and_execute",
  "operation_params":{
    "query":"find the max value in the list: [1,2,3,4,-5]"
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions/EXTENSION_ID:execute"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions/EXTENSION_ID:execute" | Select-Object -Expand Content
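
The same execute request can be assembled in Python with only the standard library. This is a rough sketch, not an official client: the function names and the sample project, region, and extension ID values are hypothetical, and the access token is assumed to come from `gcloud auth print-access-token`, as in the curl example. The request object is built but deliberately not sent.

```python
import json
import urllib.request


def execute_url(project_id, region, extension_id):
    """Return the :execute endpoint for an extension, following the URL pattern above."""
    return (
        f"https://{region}-aiplatform.googleapis.com/v1beta1/projects/{project_id}"
        f"/locations/{region}/extensions/{extension_id}:execute"
    )


def build_execute_request(project_id, region, extension_id, query, access_token):
    """Build (but do not send) a POST request for the generate_and_execute operation."""
    body = {
        "operation_id": "generate_and_execute",
        "operation_params": {"query": query},
    }
    return urllib.request.Request(
        execute_url(project_id, region, extension_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Hypothetical identifiers and a placeholder token, for illustration only.
req = build_execute_request(
    "my-project",
    "us-central1",
    "1234567890",
    "find the max value in the list: [1,2,3,4,-5]",
    "ACCESS_TOKEN",
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would perform the call; that step is left out here so the sketch runs without credentials or network access.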

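Inline data

Before using any of the request data, make the following replacements:

  • PROJECT_ID: The ID of your Google Cloud project.
  • REGION: A Compute Engine region.
  • EXTENSION_ID: The ID of your Code Interpreter extension, as listed in the extension details in the Google Cloud console.

HTTP method and URL:

POST https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions/EXTENSION_ID:execute

Request JSON body: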

{
  "operation_id":"generate_and_execute",
  "operation_params":{
    "query":"Calculate the total values of each column(mobile_subscribers, percent_internet_users, total_internet_users, fixed_broadband_subscribers) from the below dataset.\n\n\ncountry_name        country_code        year        mobile_subscribers        percent_internet_users        total_internet_users        fixed_broadband_subscribers\nUnited States        US        2023        333.4        90.5        303.1        200.3\nChina        CN        2023        1.613        70.2        1131.4        512.2\nIndia        IN        2023        1.165        50.7        688.5        557.2\nJapan        JP        2023        124.3        88.2        109.5        114.8\nGermany        DE        2023        102.1        90.5        92.1        100\nUnited Kingdom        UK        2023        67.1        92.7        62.2        65\nFrance        FR        2023        66.7        89        63        69.7\nBrazil        BR        2023        213.5        68        144.1        69.4\nRussia        RU        2023        203.8        74.9        152.7        51.1"
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions/EXTENSION_ID:execute"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions/EXTENSION_ID:execute" | Select-Object -Expand Content
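
Typing a long inline dataset into the query string by hand is error-prone. The sketch below shows one possible way to build such a query from tabular rows in Python. The helper table_to_query is hypothetical, and using a tab as the column separator is an assumption; the request above separates columns with runs of spaces instead. The two sample rows reuse values from the dataset above.

```python
import json


def table_to_query(instruction, header, rows, sep="\t"):
    """Append a header line and one line per row to a natural language instruction."""
    lines = [sep.join(header)]
    lines += [sep.join(str(value) for value in row) for row in rows]
    return instruction + "\n\n" + "\n".join(lines)


header = ["country_name", "country_code", "year", "mobile_subscribers"]
rows = [
    ["Japan", "JP", 2023, 124.3],
    ["Germany", "DE", 2023, 102.1],
]

query = table_to_query(
    "Calculate the total of the mobile_subscribers column from the below dataset.",
    header,
    rows,
)

body = {
    "operation_id": "generate_and_execute",
    "operation_params": {"query": query},
}
# json.dumps escapes the newlines and tabs so the body is valid JSON.
print(json.dumps(body, indent=2))
```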

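File print

Before using any of the request data, make the following replacements:

  • PROJECT_ID: The ID of your Google Cloud project.
  • REGION: A Compute Engine region.
  • EXTENSION_ID: The ID of your Code Interpreter extension, as listed in the extension details in the Google Cloud console.
  • FILE_NAME: The CSV file data in the request body is written to this file in the working directory.
  • BASE64_ENCODED_FILE_BYTES: The file bytes in the request body must be base64-encoded.

HTTP method and URL:

POST https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions/EXTENSION_ID:execute

Request JSON body: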

{
  "operation_id":"generate_and_execute",
  "operation_params":{
    "query":"print the csv file",
    "files":[
      {
        "name":"FILE_NAME",
        "contents":"BASE64_ENCODED_FILE_BYTES"
      }
    ]
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions/EXTENSION_ID:execute"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions/EXTENSION_ID:execute" | Select-Object -Expand Content
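
The BASE64_ENCODED_FILE_BYTES value can be produced with Python's standard base64 module. The following is a small sketch under stated assumptions: the helper file_param is hypothetical, and the file name and bytes mirror the example given in the OpenAPI specification.

```python
import base64
import json


def file_param(name: str, data: bytes) -> dict:
    """Return one entry for the `files` array, with base64-encoded contents."""
    return {"name": name, "contents": base64.b64encode(data).decode("ascii")}


# In practice the bytes would come from a file, for example:
#   data = pathlib.Path("data.csv").read_bytes()
data = b"item1,item2"

body = {
    "operation_id": "generate_and_execute",
    "operation_params": {
        "query": "print the csv file",
        "files": [file_param("data.csv", data)],
    },
}
print(json.dumps(body, indent=2))
```

Decoding the base64 result to an ASCII string matters because JSON cannot carry raw bytes.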

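Cloud Storage read

Before using any of the request data, make the following replacements:

  • PROJECT_ID: The ID of your Google Cloud project.
  • REGION: A Compute Engine region.
  • EXTENSION_ID: The ID of your Code Interpreter extension, as listed in the extension details in the Google Cloud console.
  • BUCKET_NAME: The Cloud Storage bucket that contains the CSV file to print. You must have specified this input bucket when you registered the Code Interpreter extension.
  • FILE_NAME: The CSV file in BUCKET_NAME whose data you want to print.

HTTP method and URL:

POST https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions/EXTENSION_ID:execute

Request JSON body: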

{
  "operation_id":"generate_and_execute",
  "operation_params":{
    "query":"print the csv file",
    "file_gcs_uris": ["gs://BUCKET_NAME/FILE_NAME"]
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions/EXTENSION_ID:execute"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/extensions/EXTENSION_ID:execute" | Select-Object -Expand Content
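
The OpenAPI specification states that only one of `files` and `file_gcs_uris` should be provided. A client can enforce that rule before sending a request. The sketch below is one possible approach; the function build_operation_params, its timeout_ms parameter name, and the sample bucket are hypothetical, and the "gs://" prefix check is an assumption based on the URI examples in the specification.

```python
def build_operation_params(query, files=None, file_gcs_uris=None, timeout_ms=None):
    """Build operation_params, rejecting requests that set both file inputs."""
    if files and file_gcs_uris:
        raise ValueError("Provide only one of `files` and `file_gcs_uris`.")

    params = {"query": query}
    if files:
        params["files"] = list(files)
    if file_gcs_uris:
        # Assumption: every Cloud Storage URI starts with "gs://".
        bad = [uri for uri in file_gcs_uris if not uri.startswith("gs://")]
        if bad:
            raise ValueError(f"Not a Cloud Storage URI: {bad[0]}")
        params["file_gcs_uris"] = list(file_gcs_uris)
    if timeout_ms is not None:
        # The specification documents a default of 30000 milliseconds.
        params["timeout"] = timeout_ms
    return params


# Hypothetical bucket and file names.
params = build_operation_params(
    "print the csv file",
    file_gcs_uris=["gs://my-bucket/data.csv"],
)
print({"operation_id": "generate_and_execute", "operation_params": params})
```

Failing early on the client side gives a clearer error than waiting for the service to reject an ambiguous request.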