
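Text chat

The PaLM 2 for Chat (chat-bison) foundation model is a large language model (LLM) that excels at language understanding, language generation, and conversation. This chat model is fine-tuned to conduct natural multi-turn conversations and is well suited to text tasks about code that require back-and-forth interaction.

For text tasks that can be completed with a single API response (with no need for an ongoing conversation), use the text model.

To explore this model in the console, see the PaLM 2 for Chat model card in Model Garden.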

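Use cases

Customer service: Instruct the model to respond as a customer service agent that talks only about your company's products.
Technical support: Instruct the model to interact with customers as a call center agent, with specific parameters about how to respond and what not to say.
Personas and characters: Instruct the model to respond in the style of a specific person ("...in the style of Shakespeare").
Website companion: Create a conversational assistant for shopping, travel, and other use cases.

For more information, see Design chat prompts.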

HTTP request

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/chat-bison:predict

For more information, see the predict method.
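The endpoint URL above can be assembled from its parts. The following is a minimal sketch, not an official helper: the function name `predict_url` and its parameters are hypothetical, and it assumes that other locations and model names follow the same URL pattern as the `us-central1` example.

```python
def predict_url(project_id: str, location: str = "us-central1",
                model: str = "chat-bison") -> str:
    """Build the :predict URL for a Google publisher model.

    Assumption: the host and path follow the pattern of the documented
    us-central1 URL; only that URL is confirmed by this page.
    """
    host = f"{location}-aiplatform.googleapis.com"
    return (
        f"https://{host}/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/{model}:predict"
    )


# Reproduces the documented URL when called with the placeholder project ID.
print(predict_url("PROJECT_ID"))
```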

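Model versions

To use the latest model version, specify the model name without a version number, for example chat-bison.

To use a stable model version, specify the model version number, for example chat-bison@002. Each stable version is available for six months after the release date of the subsequent stable version.

The following table lists the available stable model versions:

chat-bison model	Release date	Discontinuation date	Recommended upgrade
chat-bison@002	December 6, 2023	April 9, 2025	gemini-2.0-flash

For more information, see Model versions and lifecycle.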
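The naming convention above (a bare name for the latest version, `name@version` for a pinned stable version) can be checked in client code. This is an illustrative sketch only: `split_model_name` and `is_discontinued` are hypothetical helpers, the date is copied from the table above, and treating the discontinuation date itself as already unavailable is an assumption.

```python
from datetime import date

# Discontinuation date copied from the stable-version table above.
DISCONTINUATION_DATES = {"chat-bison@002": date(2025, 4, 9)}


def split_model_name(name):
    """Split 'chat-bison@002' into ('chat-bison', '002').

    A name without '@' means the latest version, so the version is None.
    """
    base, _, version = name.partition("@")
    return base, version or None


def is_discontinued(name, today):
    """Return True if a pinned version is at or past its discontinuation date.

    Assumption: the version is treated as unavailable from that date onward.
    Names missing from the table (including unpinned names) return False.
    """
    end = DISCONTINUATION_DATES.get(name)
    return end is not None and today >= end


print(split_model_name("chat-bison@002"))
print(is_discontinued("chat-bison@002", date.today()))
```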

Request body

{
  "instances": [
    {
      "context":  string,
      "examples": [
        {
          "input": { "content": string },
          "output": { "content": string }
        }
      ],
      "messages": [
        {
          "author": string,
          "content": string
        }
      ]
    }
  ],
  "parameters": {
    "temperature": number,
    "maxOutputTokens": integer,
    "topP": number,
    "topK": integer,
    "groundingConfig": string,
    "stopSequences": [ string ],
    "candidateCount": integer,
    "logprobs": integer,
    "presencePenalty": float,
    "frequencyPenalty": float,
    "seed": integer
  }
}

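For chat API calls, the context, examples, and messages combine to form the prompt. The following table shows the parameters that you need to configure for text chat with the Vertex AI PaLM API: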

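Parameter	Description	Acceptable values
`context` (optional)	Context shapes how the model responds throughout the conversation. For example, you can use context to specify words the model can or can't use, topics to focus on or avoid, or the response format or style.	Text
`examples` (optional)	Examples for the model to learn how to respond to the conversation.	[{ "input": {"content": "provide content"}, "output": {"content": "provide content"} }]
`messages` (required)	Conversation history provided to the model in a structured alternate-author form. Messages appear in chronological order: oldest first, newest last. When the history of messages causes the input to exceed the maximum length, the oldest messages are removed until the entire prompt is within the allowed limit.	[{ "author": "user", "content": "user message" }]
`temperature`	The temperature is used for sampling during response generation, which occurs when `topP` and `topK` are applied. Temperature controls the degree of randomness in token selection. Lower temperatures suit prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of `0` means that the highest-probability token is always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible. If the model returns a response that's too generic or too short, or the model gives a fallback response, try increasing the temperature.	`0.0–1.0` `Default: 0.0`
`maxOutputTokens`	Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses.	`1–2048` `Default: 1024`
`topK`	Top-K changes how the model selects tokens for output. A top-K of `1` means the selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding). A top-K of `3` means that the next token is selected from among the three most probable tokens by using temperature. At each token selection step, the top-K tokens with the highest probabilities are sampled. Tokens are then further filtered based on top-P, and the final token is selected by using temperature sampling. Specify a lower value for less random responses and a higher value for more random responses.	`1–40` `Default: 40`
`topP`	Top-P changes how the model selects tokens for output. Tokens are selected from the most probable (see top-K) to the least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is `0.5`, then the model selects either A or B as the next token by using temperature and excludes C as a candidate. Specify a lower value for less random responses and a higher value for more random responses.	`0.0–1.0` `Default: 0.95`
`stopSequences`	Specifies a list of strings that tells the model to stop generating text if one of the strings is encountered in the response. If a string appears multiple times in the response, then the response is truncated where the string is first encountered. The strings are case-sensitive. For example, if the following is the returned response when `stopSequences` isn't specified: `public static string reverse(string myString)` then the returned response with `stopSequences` set to `["Str", "reverse"]` is: `public static string`	`default: []`
`groundingConfig`	Grounding lets you reference specific data when using language models. When you ground a model, the model can reference internal, confidential, or otherwise specific data from your repository and include the data in the response. Only data stores from Vertex AI Search are supported.	The path should follow the format: `projects/{project_id}/locations/global/collections/{collection_name}/dataStores/{DATA_STORE_ID}`
`candidateCount`	The number of response variations to return. For each request, you're charged for the output tokens of all candidates, but you're charged only once for the input tokens. Specifying multiple candidates is a Preview feature that works with `generateContent` (`streamGenerateContent` is not supported). The following models are supported: Gemini 1.5 Flash: `1`-`8`, default: `1`; Gemini 1.5 Pro: `1`-`8`, default: `1`; Gemini 1.0 Pro: `1`-`8`, default: `1`	`1–4` `Default: 1`
`logprobs`	Returns the log probabilities of the top candidate tokens at each generation step. The model's chosen token might not be the same as the top candidate token at each step. Specify the number of candidates to return by using an integer value in the range of `1` to `5`.	`0-5`
`frequencyPenalty`	Positive values penalize tokens that repeatedly appear in the generated text, decreasing the probability of repeated content. The minimum value is `-2.0`. The maximum value is up to, but not including, `2.0`.	`Minimum value: -2.0` `Maximum value: 2.0`
`presencePenalty`	Positive values penalize tokens that already appear in the generated text, increasing the probability of generating more diverse content. The minimum value is `-2.0`. The maximum value is up to, but not including, `2.0`.	`Minimum value: -2.0` `Maximum value: 2.0`
`seed`	When the seed is fixed to a specific value, the model makes a best effort to provide the same response for repeated requests. Deterministic output isn't guaranteed. Also, changing the model or parameter settings, such as the temperature, can cause variations in the response even when you use the same seed value. By default, a random seed value is used. This is a preview feature.	`Optional`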
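Two of the behaviors described in the table can be illustrated with a few lines of Python: the candidate filtering performed by top-K followed by top-P, and truncation at the first stop sequence. This is a client-side sketch of the descriptions above, not the service's implementation. The function names `filter_candidates` and `apply_stop_sequences` are hypothetical, and the final temperature-based sampling step is left out.

```python
def filter_candidates(probs, top_k, top_p):
    """Return the tokens that remain after top-K, then top-P filtering.

    probs maps each token to its probability.
    """
    # Rank tokens from most to least probable and keep only the top_k best.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        # Stop once the kept probabilities reach top_p (small float tolerance).
        if cumulative >= top_p - 1e-9:
            break
    return kept


def apply_stop_sequences(text, stop_sequences):
    """Truncate text at the earliest case-sensitive match of any stop string."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # Ignore empty strings, which would match at index 0.
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# The top-P example from the table: A and B are kept, C is excluded.
print(filter_candidates({"A": 0.3, "B": 0.2, "C": 0.1}, top_k=3, top_p=0.5))

# The stopSequences example from the table: the text is cut before "reverse".
print(apply_stop_sequences(
    "public static string reverse(string myString)", ["Str", "reverse"]))
```

With `top_k=1` the first function always returns a single token, which corresponds to the greedy decoding case described for top-K.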

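Sample request

REST

To test a text chat by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

PROJECT_ID: Your project ID.

For the other fields, see the request body table above.

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/chat-bison:predict

Request JSON body: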

{
  "instances": [{
      "context":  "CONTEXT",
      "examples": [
       { 
          "input": {"content": "EXAMPLE_INPUT"},
          "output": {"content": "EXAMPLE_OUTPUT"}
       }],
      "messages": [
       { 
          "author": "AUTHOR",
          "content": "CONTENT"
       }]
   }],
  "parameters": {
    "temperature": TEMPERATURE,
    "maxOutputTokens": MAX_OUTPUT_TOKENS,
    "topP": TOP_P,
    "topK": TOP_K
  }
}
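The placeholders in the JSON body above can also be filled in programmatically. The following sketch is one possible way to do that, not part of any SDK: `build_chat_request` and `RANGES` are hypothetical names, and the range checks simply mirror the acceptable values listed in the request body table.

```python
import json

# Acceptable ranges copied from the request body table.
RANGES = {
    "temperature": (0.0, 1.0),
    "maxOutputTokens": (1, 2048),
    "topK": (1, 40),
    "topP": (0.0, 1.0),
}


def build_chat_request(messages, context=None, examples=None, **parameters):
    """Assemble a chat-bison request body as a plain dict.

    messages: list of {"author": ..., "content": ...}, oldest first (required).
    examples: optional list of (input_text, output_text) pairs.
    """
    if not messages:
        raise ValueError("messages is required")
    for name, value in parameters.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
    instance = {"messages": messages}
    if context is not None:
        instance["context"] = context
    if examples:
        instance["examples"] = [
            {"input": {"content": question}, "output": {"content": answer}}
            for question, answer in examples
        ]
    return {"instances": [instance], "parameters": parameters}


body = build_chat_request(
    [{"author": "user", "content": "How many planets are there in the solar system?"}],
    context="You are an astronomer, knowledgeable about the solar system.",
    examples=[("How many moons does Mars have?", "Mars has two moons, Phobos and Deimos.")],
    temperature=0.2,
    maxOutputTokens=256,
    topP=0.95,
    topK=40,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be saved as request.json and sent with either of the commands that follow.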

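To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and then run the following command: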

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/chat-bison:predict"

PowerShell

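Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and then run the following command: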

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/chat-bison:predict" | Select-Object -Expand Content

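You should receive a JSON response similar to the sample response.

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.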

from vertexai.language_models import ChatModel, InputOutputTextPair

chat_model = ChatModel.from_pretrained("chat-bison@002")

parameters = {
    "temperature": 0.2,
    "max_output_tokens": 256,
    "top_p": 0.95,
    "top_k": 40,
}

chat_session = chat_model.start_chat(
    context="My name is Miles. You are an astronomer, knowledgeable about the solar system.",
    examples=[
        InputOutputTextPair(
            input_text="How many moons does Mars have?",
            output_text="The planet Mars has two moons, Phobos and Deimos.",
        ),
    ],
)

response = chat_session.send_message(
    "How many planets are there in the solar system?", **parameters
)
print(response.text)
# Example response:
# There are eight planets in the solar system:
# Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune.

Node.js

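Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.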

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';
const aiplatform = require('@google-cloud/aiplatform');

// Imports the Google Cloud Prediction service client
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects.
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};
const publisher = 'google';
const model = 'chat-bison@002';

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function callPredict() {
  // Configure the parent resource
  const endpoint = `projects/${project}/locations/${location}/publishers/${publisher}/models/${model}`;

  const prompt = {
    context:
      'My name is Miles. You are an astronomer, knowledgeable about the solar system.',
    examples: [
      {
        input: {content: 'How many moons does Mars have?'},
        output: {
          content: 'The planet Mars has two moons, Phobos and Deimos.',
        },
      },
    ],
    messages: [
      {
        author: 'user',
        content: 'How many planets are there in the solar system?',
      },
    ],
  };
  const instanceValue = helpers.toValue(prompt);
  const instances = [instanceValue];

  const parameter = {
    temperature: 0.2,
    maxOutputTokens: 256,
    topP: 0.95,
    topK: 40,
  };
  const parameters = helpers.toValue(parameter);

  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);
  console.log('Get chat prompt response');
  const predictions = response.predictions;
  console.log('\tPredictions :');
  for (const prediction of predictions) {
    console.log(`\t\tPrediction : ${JSON.stringify(prediction)}`);
  }
}

callPredict();

Java

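Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.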


import com.google.cloud.aiplatform.v1beta1.EndpointName;
import com.google.cloud.aiplatform.v1beta1.PredictResponse;
import com.google.cloud.aiplatform.v1beta1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1beta1.PredictionServiceSettings;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Send a Predict request to a large language model to test a chat prompt
public class PredictChatPromptSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String instance =
        "{\n"
            + "   \"context\":  \"My name is Ned. You are my personal assistant. My favorite movies"
            + " are Lord of the Rings and Hobbit.\",\n"
            + "   \"examples\": [ { \n"
            + "       \"input\": {\"content\": \"Who do you work for?\"},\n"
            + "       \"output\": {\"content\": \"I work for Ned.\"}\n"
            + "    },\n"
            + "    { \n"
            + "       \"input\": {\"content\": \"What do I like?\"},\n"
            + "       \"output\": {\"content\": \"Ned likes watching movies.\"}\n"
            + "    }],\n"
            + "   \"messages\": [\n"
            + "    { \n"
            + "       \"author\": \"user\",\n"
            + "       \"content\": \"Are my favorite movies based on a book series?\"\n"
            + "    }]\n"
            + "}";
    String parameters =
        "{\n"
            + "  \"temperature\": 0.3,\n"
            + "  \"maxOutputTokens\": 200,\n"
            + "  \"topP\": 0.8,\n"
            + "  \"topK\": 40\n"
            + "}";
    String project = "YOUR_PROJECT_ID";
    String publisher = "google";
    String model = "chat-bison@002";

    predictChatPrompt(instance, parameters, project, publisher, model);
  }

  static void predictChatPrompt(
      String instance, String parameters, String project, String publisher, String model)
      throws IOException {
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      String location = "us-central1";
      final EndpointName endpointName =
          EndpointName.ofProjectLocationPublisherModelName(project, location, publisher, model);

      Value.Builder instanceValue = Value.newBuilder();
      JsonFormat.parser().merge(instance, instanceValue);
      List<Value> instances = new ArrayList<>();
      instances.add(instanceValue.build());

      Value.Builder parameterValueBuilder = Value.newBuilder();
      JsonFormat.parser().merge(parameters, parameterValueBuilder);
      Value parameterValue = parameterValueBuilder.build();

      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instances, parameterValue);
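      System.out.println("Predict Response");
      // Print the full response so that the predictions are visible.
      System.out.println(predictResponse);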
    }
  }
}

Response body

{
  "predictions": [
    {
      "candidates": [
        {
          "author": string,
          "content": string
        }
      ],
      "citationMetadata": {
        "citations": [
          {
            "startIndex": integer,
            "endIndex": integer,
            "url": string,
            "title": string,
            "license": string,
            "publicationDate": string
          }
        ]
      },
      "logprobs": {
        "tokenLogProbs": [ float ],
        "tokens": [ string ],
        "topLogProbs": [ { map<string, float> } ]
      },
      "safetyAttributes": {
        "categories": [ string ],
        "blocked": false,
        "scores": [ float ],
        "errors": [ int ]
      }
    }
  ],
  "metadata": {
    "tokenMetadata": {
      "input_token_count": {
        "total_tokens": integer,
        "total_billable_characters": integer
      },
      "output_token_count": {
        "total_tokens": integer,
        "total_billable_characters": integer
      }
    }
  }
}

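Response element	Description
`content`	Text content of a chat message.
`candidates`	The chat result generated from the given message.
`categories`	The display names of the safety attribute categories associated with the generated content. The order matches the scores.
`author`	Author tag for the turn.
`scores`	The confidence scores for each category. A higher value means higher confidence.
`blocked`	A flag that indicates whether the model's input or output was blocked.
`startIndex`	Index in the prediction output where the citation starts (inclusive). Must be >= 0 and < end_index.
`endIndex`	Index in the prediction output where the citation ends (exclusive). Must be > start_index and < len(output).
`url`	URL associated with this citation. If present, this URL links to the webpage of the source of this citation. Possible URLs include news websites, GitHub repositories, and so on.
`title`	Title associated with this citation. If present, it refers to the title of the source of this citation. Possible titles include news titles, book titles, and so on.
`license`	License associated with this citation. If present, it refers to the license of the source of this citation. Possible licenses include code licenses, for example the MIT license.
`publicationDate`	Publication date associated with this citation. If present, it refers to the date when the source of this citation was published. Possible formats are YYYY, YYYY-MM, and YYYY-MM-DD.
`safetyAttributes`	A collection of categories and their associated confidence scores. 1-1 mapping to `candidates`.
`input_token_count`	Number of input tokens. This is the total number of tokens across all messages, examples, and context.
`output_token_count`	Number of output tokens. This is the total number of tokens in `content` across all candidates in the response.
`tokens`	The sampled tokens.
`tokenLogProbs`	The log probabilities of the sampled tokens.
`topLogProbs`	The most likely candidate tokens and their log probabilities at each step.
`logprobs`	Results of the `logprobs` parameter. 1-1 mapping to `candidates`.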
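The index and ordering rules in the table translate directly into Python. Because `startIndex` is inclusive and `endIndex` is exclusive, a citation maps onto an ordinary slice, and because `categories` and `scores` share the same order, they can be zipped together. The helper names below are hypothetical. The sketch follows the response shape shown above (objects rather than lists for `citationMetadata` and `safetyAttributes`) and assumes that citation indexes refer to the content of the first candidate, which this page does not state explicitly.

```python
def safety_scores(prediction):
    """Pair each safety category with its confidence score."""
    attrs = prediction.get("safetyAttributes", {})
    return dict(zip(attrs.get("categories", []), attrs.get("scores", [])))


def cited_spans(prediction, candidate_index=0):
    """Return (cited_text, url) for each citation.

    Assumption: citation indexes refer to the chosen candidate's content.
    """
    text = prediction["candidates"][candidate_index]["content"]
    citations = prediction.get("citationMetadata", {}).get("citations", [])
    # startIndex is inclusive and endIndex is exclusive, like a Python slice.
    return [
        (text[c["startIndex"]:c["endIndex"]], c.get("url"))
        for c in citations
    ]


# Made-up prediction for illustration only; the URL is a placeholder.
prediction = {
    "candidates": [{"author": "bot", "content": "Hello world, quoted text here."}],
    "citationMetadata": {
        "citations": [{"startIndex": 13, "endIndex": 24, "url": "https://example.com"}]
    },
    "safetyAttributes": {"categories": ["Finance"], "scores": [0.1], "blocked": False},
}
print(safety_scores(prediction))
print(cited_spans(prediction))
```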

Sample response

{
  "predictions": [
    {
      "citationMetadata": {
        "citations": []
      },
      "safetyAttributes": {
        "scores": [
          0.1
        ],
        "categories": [
          "Finance"
        ],
        "blocked": false
      },
      "candidates": [
        {
          "author": "AUTHOR",
          "content": "RESPONSE"
        }
      ]
    }
  ]
}
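In a multi-turn conversation, the reply from a response like the one above is typically appended to `messages` before the next request, so that the history stays in chronological order. A minimal sketch, assuming the response shape shown above; `append_reply` is a hypothetical helper, not an SDK function.

```python
def append_reply(messages, response, candidate_index=0):
    """Return a new history with the model's reply added as the newest message."""
    candidate = response["predictions"][0]["candidates"][candidate_index]
    reply = {"author": candidate["author"], "content": candidate["content"]}
    # Build a new list so that the caller's original history is not mutated.
    return messages + [reply]


# The sample response from above, reduced to the fields used here.
sample_response = {
    "predictions": [
        {
            "safetyAttributes": {"scores": [0.1], "categories": ["Finance"], "blocked": False},
            "candidates": [{"author": "AUTHOR", "content": "RESPONSE"}],
        }
    ]
}

history = [{"author": "user", "content": "How many planets are there in the solar system?"}]
history = append_reply(history, sample_response)
# The follow-up question becomes the newest message in the next request.
history.append({"author": "user", "content": "Which one is the largest?"})
print(history)
```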

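Stream responses from generative AI models

The parameters are the same for streaming and non-streaming requests to the APIs.

To view sample code requests and responses using the REST API, see Examples using the streaming REST API.

To view sample code requests and responses using the Vertex AI SDK for Python, see Examples using the Vertex AI SDK for Python for streaming.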
