获取批量文本生成内容

批量预测是一种高效发送大量非延迟时间敏感类文本提示请求的方法。与在线预测（一次只能有一个输入请求）不同，您可以在单个批量请求中发送大量 LLM 请求。与对 Vertex AI 上的表格数据执行批量预测的方式类似，您可以确定输出位置、添加输入提示，然后响应会异步填充到输出位置。

提交针对文本模型的批量请求并查看其结果后，您可以通过模型调优来调整模型。调优后，您可以照常提交更新后的模型以进行批量生成。如需详细了解如何调优模型，请参阅调优基础模型。

支持批量预测的文本模型

text-bison

准备输入

批量请求的输入指定要发送到模型以进行批量生成的内容。在模型上使用文本分类时，您可以使用 JSON 行文件或 BigQuery 表来指定输入列表。您需要将 BigQuery 表存储在 BigQuery 中，并将 JSON 行文件存储在 Cloud Storage 中。

文本模型的批量请求仅接受 BigQuery 存储源和 Cloud Storage。请求最多可包含 30,000 项提示。

如需详细了解格式设置，请参阅：

JSONL 示例

JSONL 输入格式

{"prompt":"Give a short description of a machine learning model:"}
{"prompt":"Best recipe for banana bread:"}

JSONL 输出

{"instance":{"prompt":"Give..."},"predictions": [{"content":"A machine","safetyAttributes":{...}}],"status":""}
{"instance":{"prompt":"Best..."},"predictions": [{"content":"Sure", "safetyAttributes":{...}}],"status":""}

BigQuery 示例

BigQuery 输入格式

此示例展示了一个单列 BigQuery 表。

提示
“提供机器学习模型的简短说明：”
“最适合香蕉面包的食谱：”

BigQuery 输出

提示	预测	status
“提供机器学习模型的简短说明：”	'[{ "content": "A machine learning model is a statistical method", "safetyAttributes": { "blocked": false, "scores": [ 0.10000000149011612 ], "categories": [ "Violent" ] } }]'
“最适合香蕉面包的食谱：”	'[{"content": "Sure, here is a recipe for banana bread:\n\nIngredients:\n\n*", "safetyAttributes": { "scores": [ 0.10000000149011612 ], "blocked": false, "categories": [ "Violent" ] } }]'

提示

预测

status

“提供机器学习模型的简短说明：”

'[{
   "content": "A machine learning model is a
               statistical method",
   "safetyAttributes": {
     "blocked": false,
     "scores": [
       0.10000000149011612
     ],
     "categories": [
       "Violent"
     ]
   }
 }]'

“最适合香蕉面包的食谱：”

'[{"content": "Sure, here is a recipe for banana
               bread:\n\nIngredients:\n\n*",
   "safetyAttributes": {
     "scores": [
       0.10000000149011612
     ],
     "blocked": false,
     "categories": [
       "Violent"
     ]
   }
}]'

请求批量响应

批量生成任务可能需要一些时间才能完成，具体取决于提交的输入数据项数量。

REST

如需使用 Vertex AI API 测试文本提示，请向发布方模型端点发送 POST 请求。

在使用任何请求数据之前，请先进行以下替换：

PROJECT_ID：您的 Google Cloud 项目的名称。
BP_JOB_NAME：作业名称。
MODEL_PARAM：模型参数的映射。一些可接受的参数包括：maxOutputTokens、topK、topP 和 temperature。
INPUT_URI：输入源 URI。这是 BigQuery 表 URI 或 Cloud Storage 中的 JSONL 文件 URI。
OUTPUT_URI：输出目标 URI。

HTTP 方法和网址：

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs

请求 JSON 正文：

{
    "name": "BP_JOB_NAME",
    "displayName": "BP_JOB_NAME",
    "model": "publishers/google/models/text-bison",
    "model_parameters": "MODEL_PARAM"
    "inputConfig": {
      "instancesFormat":"bigquery",
      "bigquerySource":{
        "inputUri" : "INPUT_URI"
      }
    },
    "outputConfig": {
      "predictionsFormat":"bigquery",
      "bigqueryDestination":{
        "outputUri": "OUTPUT_URI"
        }
    }
}

如需发送请求，请选择以下方式之一：

curl

注意：以下命令假定您已使用您的用户账号通过运行 gcloud init 或 gcloud auth login 登录 gcloud CLI，或者使用了 Cloud Shell，这会使您自动登录 gcloud CLI。您可以运行 gcloud auth list 来检查当前活跃的账号。

将请求正文保存在名为 request.json 的文件中，然后执行以下命令：

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"

PowerShell

注意：以下命令假定您已使用您的用户账号通过运行 gcloud init 或 gcloud auth login 登录 gcloud CLI。您可以运行 gcloud auth list 来检查当前活跃的账号。

将请求正文保存在名为 request.json 的文件中，然后执行以下命令：

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应：

{
  "name": "projects/{PROJECT_ID}/locations/us-central1/batchPredictionJobs/{BATCH_JOB_ID}",
  "displayName": "BP_sample_publisher_BQ_20230712_134650",
  "model": "projects/{PROJECT_ID}/locations/us-central1/models/text-bison",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "bq://sample.text_input"
    }
  },
  "modelParameters": {},
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "bq://sample.llm_dataset.embedding_out_BP_sample_publisher_BQ_20230712_134650"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2023-07-12T20:46:52.148717Z",
  "updateTime": "2023-07-12T20:46:52.148717Z",
  "labels": {
    "owner": "sample_owner",
    "product": "llm"
  },
  "modelVersionId": "1",
  "modelMonitoringStatus": {}
}

响应包含批量作业的唯一标识符。您可以使用 BATCH_JOB_ID 轮询批量作业的状态，直到作业 state 为 JOB_STATE_SUCCEEDED。例如：

curl \
  -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID

Python

如需了解如何安装或更新 Vertex AI SDK for Python，请参阅安装 Vertex AI SDK for Python。如需了解详情，请参阅 Python API 参考文档。

from vertexai.preview.language_models import TextGenerationModel
text_model = TextGenerationModel.from_pretrained("text-bison")
batch_prediction_job = text_model.batch_predict(
  source_uri=["gs://BUCKET_NAME/test_table.jsonl"],
  destination_uri_prefix="gs://BUCKET_NAME/tmp/2023-05-25-vertex-LLM-Batch-Prediction/result3",
  # Optional:
  model_parameters={
      "maxOutputTokens": "200",
      "temperature": "0.2",
      "topP": "0.95",
      "topK": "40",
  },
)
print(batch_prediction_job.display_name)
print(batch_prediction_job.resource_name)
print(batch_prediction_job.state)

检索批量输出

批量预测任务完成后，输出存储在您在请求中指定的 Cloud Storage 存储桶或 BigQuery 表中。

后续步骤

了解如何测试文本提示。
了解其他文本特定任务提示设计策略：
- 摘要提示
- 提取提示
了解如何调整模型。