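Text

The PaLM 2 for Text (text-bison, text-unicorn) foundation models are optimized for a variety of natural language tasks such as sentiment analysis, entity extraction, and content creation. The types of content that the PaLM 2 for Text models can create include document summaries, answers to questions, and labels that classify content.

The PaLM 2 for Text models are a good fit for tasks that can be completed with one API response, without the need for continuous conversation. For text tasks that require back-and-forth interactions, use the Generative AI on Vertex AI API for chat.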

To explore this model in the console, select the PaLM 2 for Text model card in the Model Garden.

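Use cases

Summarization: Create a shorter version of a document that incorporates pertinent information from the original text. For example, you might want to summarize a chapter from a textbook, or create a succinct product description from a long paragraph that describes the product in detail.
Question answering: Provide answers to questions in text. For example, you might automate the creation of a Frequently Asked Questions (FAQ) document from knowledge base content.
Classification: Assign a label to provided text. For example, a label might be applied to text that describes how grammatically correct it is.
Sentiment analysis: A form of classification that identifies the sentiment of text. The sentiment is turned into a label that's applied to the text. For example, the sentiment of text might be a polarity such as positive or negative, or an emotion such as anger or happiness.
Entity extraction: Extract a piece of information from text. For example, you can extract the name of a movie from the text of an article.

To learn more about designing text prompts, see Design text prompts.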
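Each use case above comes down to a differently worded prompt. The following is a minimal sketch of how such prompts might be assembled into a request instance. The template wording, the `PROMPT_TEMPLATES` name, and the `build_instance` helper are hypothetical and are not part of the API; only the `prompt` field name comes from the request body described in this document.

```python
# Hypothetical prompt templates, one per use case listed above.
PROMPT_TEMPLATES = {
    "summarization": "Summarize the following text in two sentences:\n\n{text}",
    "question_answering": (
        "Answer the question using only the context.\n\n"
        "Context: {text}\n\nQuestion: {question}"
    ),
    "classification": (
        "Classify the grammar of the following text as 'correct' or 'incorrect':\n\n{text}"
    ),
    "sentiment_analysis": (
        "Classify the sentiment of the following text as 'positive' or 'negative':\n\n{text}"
    ),
    "entity_extraction": "List the movie titles mentioned in the following text:\n\n{text}",
}


def build_instance(use_case: str, **fields: str) -> dict:
    """Return one entry for the request's "instances" list."""
    template = PROMPT_TEMPLATES[use_case]
    return {"prompt": template.format(**fields)}


instance = build_instance("sentiment_analysis", text="The battery life is fantastic.")
print(instance["prompt"])
```

Keeping the templates in one dictionary makes it easy to compare prompt wording across use cases while the request structure stays the same.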

HTTP request

POST https://us-central1-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/us-central1/publishers/google/models/text-bison:predict

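For more information, see the predict method.

Model versions

To use the latest model version, specify the model name without a version number, for example text-bison.

To use a stable model version, specify the model version number, for example text-bison@002. Each stable version is available for six months after the release date of the subsequent stable version.

The following table lists the available stable model versions:

text-bison model	Release date	Discontinuation date
text-bison@002	December 6, 2023	April 9, 2025

text-unicorn model	Release date	Discontinuation date
text-unicorn@001	November 30, 2023	April 9, 2025

For more information, see Model versions and lifecycle.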
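The endpoint URL and the versioning rule combine naturally: the model segment of the URL is either the bare model name (latest version) or the name followed by `@` and a version number (stable version). The sketch below illustrates this under the assumption that the host and location are fixed to us-central1, as in the HTTP request shown in this document. The `predict_url` helper and the `my-project` value are hypothetical.

```python
from typing import Optional

API_HOST = "us-central1-aiplatform.googleapis.com"
LOCATION = "us-central1"


def predict_url(project_id: str, model: str, version: Optional[str] = None) -> str:
    """Build the predict URL; omit `version` to target the latest model version."""
    model_id = f"{model}@{version}" if version else model
    return (
        f"https://{API_HOST}/v1/projects/{project_id}/locations/{LOCATION}"
        f"/publishers/google/models/{model_id}:predict"
    )


print(predict_url("my-project", "text-bison"))         # latest version
print(predict_url("my-project", "text-bison", "002"))  # stable version
```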

Request body

{
  "instances": [
    {
      "prompt": string
    }
  ],
  "parameters": {
    "temperature": number,
    "maxOutputTokens": integer,
    "topK": integer,
    "topP": number,
    "groundingConfig": string,
    "stopSequences": [ string ],
    "candidateCount": integer,
    "logprobs": integer,
    "presencePenalty": float,
    "frequencyPenalty": float,
    "echo": boolean,
    "seed": integer
  }
}
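The schema above can be assembled programmatically. The following is a minimal sketch that builds a request body and rejects numeric parameters outside the ranges listed in the parameter table of this document. The `build_request` helper and the `RANGES` table are hypothetical, and the service performs its own validation, so treat this as a local convenience check only.

```python
import json

# Acceptable ranges copied from the parameter table in this document.
RANGES = {
    "temperature": (0.0, 1.0),
    "maxOutputTokens": (1, 2048),  # 1-1024 for text-bison@002
    "topK": (1, 40),
    "topP": (0.0, 1.0),
    "candidateCount": (1, 4),
    "logprobs": (0, 5),
    "presencePenalty": (-2.0, 2.0),
    "frequencyPenalty": (-2.0, 2.0),
}


def build_request(prompt: str, **parameters) -> dict:
    """Return a request body, raising ValueError for out-of-range parameters."""
    for name, value in parameters.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")
    return {"instances": [{"prompt": prompt}], "parameters": parameters}


body = build_request(
    "Give me ten interview questions for the role of program manager.",
    temperature=0.2,
    maxOutputTokens=256,
    topK=40,
    topP=0.95,
)
print(json.dumps(body, indent=2))
```

Parameters that are not in `RANGES`, such as `stopSequences` or `echo`, pass through unchanged.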

Use the following parameters for the text model text-bison. For more information, see Design text prompts.

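Parameter	Description	Acceptable values
`prompt`	Text input used to generate the model response. Prompts can include a preamble, questions, suggestions, instructions, or examples.	Text
`temperature`	The temperature is used for sampling during response generation, which occurs when `topP` and `topK` are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of `0` means that the highest-probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible. If the model returns a response that's too generic or too short, or gives a fallback response, try increasing the temperature.	`0.0–1.0` `Default: 0.0`
`maxOutputTokens`	Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses.	`1–2048` for text-bison (latest) `1–1024` for text-bison@002 `Default: 1024`
`topK`	Top-K changes how the model selects tokens for output. A top-K of `1` means the selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of `3` means that the next token is selected from among the three most probable tokens by using temperature. For each token selection step, the top-K tokens with the highest probabilities are sampled. The tokens are then further filtered based on top-P, and the final token is selected by using temperature sampling. Specify a lower value for less random responses and a higher value for more random responses.	`1–40` `Default: 40`
`topP`	Top-P changes how the model selects tokens for output. Tokens are selected from the most probable (see top-K) to the least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is `0.5`, then the model selects either A or B as the next token by using temperature and excludes C as a candidate. Specify a lower value for less random responses and a higher value for more random responses.	`0.0–1.0` `Default: 0.95`
`stopSequences`	Specifies a list of strings that tells the model to stop generating text if one of the strings is encountered in the response. If a string appears multiple times in the response, the response is truncated where it's first encountered. The strings are case-sensitive. For example, if the following is the returned response when `stopSequences` isn't specified: `public static string reverse(string myString)` then the returned response with `stopSequences` set to `["Str", "reverse"]` is: `public static string`	`default: []`
`groundingConfig`	Grounding lets you reference specific data when using language models. When you ground a model, the model can reference internal, confidential, and otherwise specific data from your repository and include the data in the response. Only data stores from Vertex AI Search are supported.	The path should follow the format: `projects/{project_number_or_id}/locations/global/collections/{collection_name}/dataStores/{DATA_STORE_ID}`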
`candidateCount`	The number of response variations to return. For each request, you're charged for the output tokens of all candidates, but you're charged only once for the input tokens.	`1–4` `Default: 1`
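`logprobs`	Returns the log probabilities of the top candidate tokens at each generation step. The model's chosen token and its log probability are always returned at each step, and the chosen token might not appear in the list of top candidates. Specify the number of candidates to return by using an integer value in the range of `1` to `5`.	`0-5`
`frequencyPenalty`	Positive values penalize tokens that repeatedly appear in the generated text, decreasing the probability of repeating content. Acceptable values are `-2.0`-`2.0`.	`Minimum value: -2.0` `Maximum value: 2.0`
`presencePenalty`	Positive values penalize tokens that already appear in the generated text, increasing the probability of generating more diverse content. Acceptable values are `-2.0`-`2.0`.	`Minimum value: -2.0` `Maximum value: 2.0`
`echo`	If true, the prompt is echoed in the generated text.	`Optional`
`seed`	When the seed is fixed to a specific value, the model makes a best effort to provide the same response for repeated requests. Deterministic output isn't guaranteed. Also, changing the model or parameter settings, such as the temperature, can cause variations in the response even when you use the same seed value. By default, a random seed value is used. This is a preview feature.	`Optional`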
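The interaction of `topK`, `topP`, and `temperature` described in the table can be illustrated with a toy sampler. This is a simplified sketch of the selection order the table describes (top-K filter, then top-P filter, then temperature sampling), not the service's actual implementation. The function names are hypothetical, and applying temperature as a power on the probabilities (equivalent to dividing the logits by the temperature) is an assumption.

```python
import random


def filter_candidates(probs: dict, top_k: int, top_p: float) -> list:
    """Keep the top_k most probable tokens, then cut off once their sum reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, probability in ranked:
        kept.append((token, probability))
        cumulative += probability
        if cumulative >= top_p - 1e-9:  # small tolerance for float rounding
            break
    return kept


def sample_next_token(probs: dict, temperature: float, top_k: int, top_p: float) -> str:
    """Pick the next token from the filtered candidates."""
    candidates = filter_candidates(probs, top_k, top_p)
    if temperature == 0:
        return candidates[0][0]  # always the most probable token
    tokens = [token for token, _ in candidates]
    # Lower temperatures sharpen the distribution; higher ones flatten it.
    weights = [probability ** (1.0 / temperature) for _, probability in candidates]
    return random.choices(tokens, weights=weights, k=1)[0]


# The example from the table: A, B, C with probabilities 0.3, 0.2, 0.1 and top-P 0.5.
token_probs = {"A": 0.3, "B": 0.2, "C": 0.1}
print(filter_candidates(token_probs, top_k=40, top_p=0.5))
print(sample_next_token(token_probs, temperature=0, top_k=40, top_p=0.5))
```

With a top-P of 0.5, C never becomes a candidate, which matches the example in the `topP` row. Setting `top_k=1` leaves a single candidate, so the choice is the same at any temperature.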

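Sample request

REST

To test a text prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

PROJECT_ID: Your project ID.

For other fields, see the Request body table.

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-bison:predict

Request JSON body: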

{
  "instances": [
    { "prompt": "Give me ten interview questions for the role of program manager."}
  ],
  "parameters": {
    "temperature": 0.2,
    "maxOutputTokens": 256,
    "topK": 40,
    "topP": 0.95,
    "logprobs": 2
  }
}

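To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: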

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-bison:predict"

PowerShell

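Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: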

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-bison:predict" | Select-Object -Expand Content

You should receive a JSON response similar to the sample response.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

import vertexai

from vertexai.language_models import TextGenerationModel

# TODO(developer): Replace the placeholder project ID and update the location if needed.
PROJECT_ID = "your-project-id"
vertexai.init(project=PROJECT_ID, location="us-central1")
parameters = {
    "temperature": 0.2,  # Temperature controls the degree of randomness in token selection.
    "max_output_tokens": 256,  # Token limit determines the maximum amount of text output.
    "top_p": 0.8,  # Tokens are selected from most probable to least until the sum of their probabilities equals the top_p value.
    "top_k": 40,  # A top_k of 1 means the selected token is the most probable among all tokens.
}

model = TextGenerationModel.from_pretrained("text-bison@002")
response = model.predict(
    "Give me ten interview questions for the role of program manager.",
    **parameters,
)
print(f"Response from Model: {response.text}")

Node.js

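Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.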

/**
 * TODO(developer): Update these variables before running the sample.
 */
const PROJECT_ID = process.env.CAIP_PROJECT_ID;
const LOCATION = 'us-central1';
const PUBLISHER = 'google';
const MODEL = 'text-bison@002';
const aiplatform = require('@google-cloud/aiplatform');

// Imports the Google Cloud Prediction service client
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects.
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function callPredict() {
  // Configure the parent resource
  const endpoint = `projects/${PROJECT_ID}/locations/${LOCATION}/publishers/${PUBLISHER}/models/${MODEL}`;

  const prompt = {
    prompt:
      'Give me ten interview questions for the role of program manager.',
  };
  const instanceValue = helpers.toValue(prompt);
  const instances = [instanceValue];

  const parameter = {
    temperature: 0.2,
    maxOutputTokens: 256,
    topP: 0.95,
    topK: 40,
  };
  const parameters = helpers.toValue(parameter);

  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const response = await predictionServiceClient.predict(request);
  console.log('Get text prompt response');
  console.log(response);
}

callPredict();

Java

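Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.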


import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PredictTextPromptSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    // Details of designing text prompts for supported large language models:
    // https://cloud.google.com/vertex-ai/docs/generative-ai/text/text-overview
    String instance =
        "{ \"prompt\": " + "\"Give me ten interview questions for the role of program manager.\"}";
    String parameters =
        "{\n"
            + "  \"temperature\": 0.2,\n"
            + "  \"maxOutputTokens\": 256,\n"
            + "  \"topP\": 0.95,\n"
            + "  \"topK\": 40\n"
            + "}";
    String project = "YOUR_PROJECT_ID";
    String location = "us-central1";
    String publisher = "google";
    String model = "text-bison@002";

    predictTextPrompt(instance, parameters, project, location, publisher, model);
  }

  // Get a text prompt from a supported text model
  public static void predictTextPrompt(
      String instance,
      String parameters,
      String project,
      String location,
      String publisher,
      String model)
      throws IOException {
    String endpoint = String.format("%s-aiplatform.googleapis.com:443", location);
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder().setEndpoint(endpoint).build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      final EndpointName endpointName =
          EndpointName.ofProjectLocationPublisherModelName(project, location, publisher, model);

      // Use Value.Builder to convert the instance JSON to a dynamically typed value that can be
      // processed by the service.
      Value.Builder instanceValue = Value.newBuilder();
      JsonFormat.parser().merge(instance, instanceValue);
      List<Value> instances = new ArrayList<>();
      instances.add(instanceValue.build());

      // Convert the parameters JSON to a dynamically typed value that can be processed by the
      // service.
      Value.Builder parameterValueBuilder = Value.newBuilder();
      JsonFormat.parser().merge(parameters, parameterValueBuilder);
      Value parameterValue = parameterValueBuilder.build();

      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instances, parameterValue);
      System.out.println("Predict Response");
      System.out.println(predictResponse);
    }
  }
}

Response body

{
  "predictions":[
    {
      "content": string,
      "citationMetadata": {
        "citations": [
          {
            "startIndex": integer,
            "endIndex": integer,
            "url": string,
            "title": string,
            "license": string,
            "publicationDate": string
          }
        ]
      },
      "logprobs": {
        "tokenLogProbs": [ float ],
        "tokens": [ string ],
        "topLogProbs": [ { map<string, float> } ]
      },
      "safetyAttributes": {
        "categories": [ string ],
        "blocked": boolean,
        "scores": [ float ],
        "errors": [ int ]
      }
    }
  ],
  "metadata": {
    "tokenMetadata": {
      "input_token_count": {
        "total_tokens": integer,
        "total_billable_characters": integer
      },
      "output_token_count": {
        "total_tokens": integer,
        "total_billable_characters": integer
      }
    }
  }
}

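Response element	Description
`content`	The result generated from the input text.
`categories`	The display names of the safety attribute categories associated with the generated content. The order matches the scores.
`scores`	The confidence score for each category. A higher value means higher confidence.
`blocked`	A flag that indicates whether the model's input or output was blocked.
`errors`	An error code that identifies why the input or output was blocked. For a list of error codes, see Safety filters and attributes.
`startIndex`	Index in the prediction output where the citation starts (inclusive). Must be >= 0 and < end_index.
`endIndex`	Index in the prediction output where the citation ends (exclusive). Must be > start_index and < len(output).
`url`	URL associated with this citation. If present, this URL links to the web page of the source of this citation. Possible URLs include news websites, GitHub repositories, and so on.
`title`	Title associated with this citation. If present, it refers to the title of the source of this citation. Possible titles include news titles, book titles, and so on.
`license`	License associated with this citation. If present, it refers to the license of the source of this citation. Possible licenses include code licenses, for example the mit license.
`publicationDate`	Publication date associated with this citation. If present, it refers to the date that the source of this citation was published. Possible formats are YYYY, YYYY-MM, and YYYY-MM-DD.
`input_token_count`	Number of input tokens. This is the total number of tokens across all prompts, prefixes, and suffixes.
`output_token_count`	Number of output tokens. This is the total number of tokens in `content` across all predictions.
`tokens`	The sampled tokens.
`tokenLogProbs`	The log probabilities of the sampled tokens.
`topLogProbs`	The most likely candidate tokens and their log probabilities at each step.
`logprobs`	Results of the `logprobs` parameter. 1-1 mapping to the candidates.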
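The response elements above can be read with ordinary dictionary access once the JSON is parsed. The following is a minimal sketch that flattens each prediction into its text, block status, safety scores, and citation URLs. The `summarize_predictions` helper and the abbreviated `example_response` are hypothetical, and every field is treated as optional because the source does not state which fields are always present.

```python
def summarize_predictions(response: dict) -> list:
    """Flatten each prediction into text, block status, safety scores, and citation URLs."""
    summaries = []
    for prediction in response.get("predictions", []):
        safety = prediction.get("safetyAttributes", {})
        citations = prediction.get("citationMetadata", {}).get("citations", [])
        summaries.append(
            {
                "text": prediction.get("content", ""),
                "blocked": safety.get("blocked", False),
                # categories and scores are parallel lists, so zip pairs them up.
                "safety_scores": dict(
                    zip(safety.get("categories", []), safety.get("scores", []))
                ),
                "citation_urls": [c["url"] for c in citations if "url" in c],
            }
        )
    return summaries


# Abbreviated, made-up response used only to exercise the helper.
example_response = {
    "predictions": [
        {
            "content": "1. What is your experience with project management?",
            "citationMetadata": {"citations": []},
            "safetyAttributes": {"categories": ["Finance"], "scores": [0.1], "blocked": False},
        }
    ]
}

for summary in summarize_predictions(example_response):
    print(summary)
```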

Sample response

{
  "predictions": [
    {
      "citationMetadata":{
        "citations": [ ]
      },
      "safetyAttributes":{
        "scores": [
          0.1
        ],
        "categories": [
          "Finance"
        ],
        "blocked": false
      },
      "content":"1. What is your experience with project management?\n2. What are your strengths and weaknesses as a project manager?\n3. How do you handle conflict and difficult situations?\n4. How do you communicate with stakeholders?\n5. How do you stay organized and on track?\n6. How do you manage your time effectively?\n7. What are your goals for your career?\n8. Why are you interested in this position?\n9. What are your salary expectations?\n10. What are your availability and start date?",
      "logprobs": {
        "tokenLogProbs": [
          -0.1,
          -0.2
        ],
        "tokens": [
          "vertex",
          " rocks!"
        ],
        "topLogProbs": [
          {
            "vertex": -0.1,
            "hello": -0.2
          },
          {
            " rocks!": -0.2,
            " world!": -0.3
          }
        ]
      }
    }
  ],
  "metadata": {
    "tokenMetadata": {
      "outputTokenCount": {
        "totalTokens": 153,
        "totalBillableCharacters": 537
      },
      "inputTokenCount": {
        "totalBillableCharacters": 54,
        "totalTokens": 12
      }
    }
  }
}
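The `logprobs` values in the sample response can be converted back into probabilities. The sketch below assumes the values are natural logarithms, which this document does not state explicitly. The helper names are hypothetical, and the data is copied from the placeholder `logprobs` object in the sample response.

```python
import math

# Copied from the "logprobs" object in the sample response.
sample_logprobs = {
    "tokenLogProbs": [-0.1, -0.2],
    "tokens": ["vertex", " rocks!"],
    "topLogProbs": [
        {"vertex": -0.1, "hello": -0.2},
        {" rocks!": -0.2, " world!": -0.3},
    ],
}


def token_probabilities(logprobs: dict) -> list:
    """Pair each sampled token with exp(logprob), assuming natural logarithms."""
    return [
        (token, math.exp(logprob))
        for token, logprob in zip(logprobs["tokens"], logprobs["tokenLogProbs"])
    ]


def sequence_logprob(logprobs: dict) -> float:
    """Sum the per-token log probabilities to score the whole sampled sequence."""
    return sum(logprobs["tokenLogProbs"])


for token, probability in token_probabilities(sample_logprobs):
    print(f"{token!r}: {probability:.3f}")
print(f"sequence log probability: {sequence_logprob(sample_logprobs):.1f}")
```

Summing log probabilities is numerically safer than multiplying the probabilities themselves, which is why the sequence score stays in log space.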

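Stream responses from generative AI models

The parameters are the same for streaming and non-streaming requests to the APIs.

To view sample code requests and responses that use the REST API, see Examples using the REST API.

To view sample code requests and responses that use the Vertex AI SDK for Python, see Examples using the Vertex AI SDK for Python.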