Text

The PaLM 2 for Text (text-bison, text-unicorn) foundation models are optimized for a variety of natural language tasks such as sentiment analysis, entity extraction, and content creation. The types of content that the PaLM 2 for Text models can create include document summaries, answers to questions, and labels that classify content.

The PaLM 2 for Text models are ideal for tasks that can be completed with one API response, without the need for continuous conversation. For text tasks that require back-and-forth interactions, use the Generative AI on Vertex AI API for chat.

To explore the models in the console, select the PaLM 2 for Text model card in the Model Garden.
Go to the Model Garden

Use cases

Summarization: Create a shorter version of a document that incorporates pertinent information from the original text. For example, you might want to summarize a chapter from a textbook. Or, you could create a succinct product description from a long paragraph that describes the product in detail.
Question answering: Provide answers to questions in text. For example, you might automate the creation of a Frequently Asked Questions (FAQ) document from knowledge base content.
Classification: Assign a label to provided text. For example, a label might be applied to text that describes how grammatically correct it is.
Sentiment analysis: This is a form of classification that identifies the sentiment of text. The sentiment is turned into a label that's applied to the text. For example, the sentiment of text might be polarities like positive or negative, or sentiments like anger or happiness.
Entity extraction: Extract a piece of information from text. For example, you can extract the name of a movie from the text of an article.

For more information on designing text prompts, see Design text prompts.

HTTP request

POST https://us-central1-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/us-central1/publishers/google/models/text-bison:predict

See the predict method for more information.

Model versions

To use the latest model version, specify the model name without a version number, for example text-bison.

To use a stable model version, specify the model version number, for example text-bison@002. Each stable version is available for six months after the release date of the subsequent stable version.

The following table contains the available stable model versions:

text-bison model	Release date	Discontinuation date
text-bison@002	December 6, 2023	April 9, 2025

text-unicorn model	Release date	Discontinuation date
text-unicorn@001	November 30, 2023	April 9, 2025

For more information, see Model versions and lifecycle.

Request body

{
  "instances": [
    {
      "prompt": string
    }
  ],
  "parameters": {
    "temperature": number,
    "maxOutputTokens": integer,
    "topK": integer,
    "topP": number,
    "groundingConfig": string,
    "stopSequences": [ string ],
    "candidateCount": integer,
    "logprobs": integer,
    "presencePenalty": float,
    "frequencyPenalty": float,
    "echo": boolean,
    "seed": integer
  }
}

Use the following parameters for the text model text-bison. For more information, see Design text prompts.

Parameter	Description	Acceptable values
`prompt`	Text input to generate model response. Prompts can include preamble, questions, suggestions, instructions, or examples.	Text
`temperature`	The temperature is used for sampling during response generation, which occurs when `topP` and `topK` are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of `0` means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible. If the model returns a response that's too generic, too short, or the model gives a fallback response, try increasing the temperature.	`0.0–1.0` `Default: 0.0`
`maxOutputTokens`	Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses.	`1–2048` for text-bison (latest) `1–1024` for text-bison@002 `Default: 1024`
`topK`	Top-K changes how the model selects tokens for output. A top-K of `1` means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of `3` means that the next token is selected from among the three most probable tokens by using temperature. For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling. Specify a lower value for less random responses and a higher value for more random responses.	`1–40` `Default: 40`
`topP`	Top-P changes how the model selects tokens for output. Tokens are selected from the most (see top-K) to least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1 and the top-P value is `0.5`, then the model will select either A or B as the next token by using temperature and excludes C as a candidate. Specify a lower value for less random responses and a higher value for more random responses.	`0.0–1.0` `Default: 0.95`
`stopSequence`	Specifies a list of strings that tells the model to stop generating text if one of the strings is encountered in the response. If a string appears multiple times in the response, then the response truncates where it's first encountered. The strings are case-sensitive. For example, if the following is the returned response when `stopSequences` isn't specified: `public static string reverse(string myString)` Then the returned response with `stopSequences` set to `["Str", "reverse"]` is: `public static string`	`default: []`
`groundingConfig`	Grounding lets you reference specific data when using language models. When you ground a model, the model can reference internal, confidential, and otherwise specific data from your repository and include the data in the response. Only data stores from Vertex AI Search are supported.	Path should follow format: `projects/{project_number_or_id}/locations/global/collections/{collection_name}/dataStores/{DATA_STORE_ID}`
`candidateCount`	The number of response variations to return. For each request, you're charged for the output tokens of all candidates, but are only charged once for the input tokens. Specifying multiple candidates is a Preview feature that works with `generateContent` (`streamGenerateContent` is not supported). The following models are supported: Gemini 1.5 Flash: `1`-`8`, default: `1` Gemini 1.5 Pro: `1`-`8`, default: `1` Gemini 1.0 Pro: `1`-`8`, default: `1`	`1–4` `Default: 1`
`logprobs`	Returns the log probabilities of the top candidate tokens at each generation step. The model's chosen token might not be the same as the top candidate token at each step. Specify the number of candidates to return by using an integer value in the range of `1`-`5`.	`0-5`
`frequencyPenalty`	Positive values penalize tokens that repeatedly appear in the generated text, decreasing the probability of repeating content. The minimum value is `-2.0`. The maximum value is up to, but not including, `2.0`.	`Minimum value: -2.0` `Maximum value: 2.0`
`presencePenalty`	Positive values penalize tokens that already appear in the generated text, increasing the probability of generating more diverse content. The minimum value is `-2.0`. The maximum value is up to, but not including, `2.0`.	`Minimum value: -2.0` `Maximum value: 2.0`
`echo`	If true, the prompt is echoed in the generated text.	`Optional`
`seed`	When seed is fixed to a specific value, the model makes a best effort to provide the same response for repeated requests. Deterministic output isn't guaranteed. Also, changing the model or parameter settings, such as the temperature, can cause variations in the response even when you use the same seed value. By default, a random seed value is used. This is a preview feature.	`Optional`

Sample request

REST

To test a text prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

PROJECT_ID: Your project ID.

For other fields, see the Request body table.

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-bison:predict

Request JSON body:

{
  "instances": [
    { "prompt": "Give me ten interview questions for the role of program manager."}
  ],
  "parameters": {
    "temperature": 0.2,
    "maxOutputTokens": 256,
    "topK": 40,
    "topP": 0.95,
    "logprobs": 2
  }
}

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login , or by using Cloud Shell, which automatically logs you into the gcloud CLI . You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-bison:predict"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login . You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-bison:predict" | Select-Object -Expand Content

You should receive a JSON response similar to the sample response.

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

import vertexai

from vertexai.language_models import TextGenerationModel

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"
vertexai.init(project=PROJECT_ID, location="us-central1")
parameters = {
    "temperature": 0.2,  # Temperature controls the degree of randomness in token selection.
    "max_output_tokens": 256,  # Token limit determines the maximum amount of text output.
    "top_p": 0.8,  # Tokens are selected from most probable to least until the sum of their probabilities equals the top_p value.
    "top_k": 40,  # A top_k of 1 means the selected token is the most probable among all tokens.
}

model = TextGenerationModel.from_pretrained("text-bison@002")
response = model.predict(
    "Give me ten interview questions for the role of program manager.",
    **parameters,
)
print(f"Response from Model: {response.text}")
# Example response:
# Response from Model:  1. **Tell me about your experience managing programs.**
# 2. **What are your strengths and weaknesses as a program manager?**
# 3. **What do you think are the most important qualities for a successful program manager?**
# ...

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Update these variables before running the sample.
 */
const PROJECT_ID = process.env.CAIP_PROJECT_ID;
const LOCATION = 'us-central1';
const PUBLISHER = 'google';
const MODEL = 'text-bison@001';
const aiplatform = require('@google-cloud/aiplatform');

// Imports the Google Cloud Prediction service client
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects.
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function callPredict() {
  // Configure the parent resource
  const endpoint = `projects/${PROJECT_ID}/locations/${LOCATION}/publishers/${PUBLISHER}/models/${MODEL}`;

  const prompt = {
    prompt:
      'Give me ten interview questions for the role of program manager.',
  };
  const instanceValue = helpers.toValue(prompt);
  const instances = [instanceValue];

  const parameter = {
    temperature: 0.2,
    maxOutputTokens: 256,
    topP: 0.95,
    topK: 40,
  };
  const parameters = helpers.toValue(parameter);

  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const response = await predictionServiceClient.predict(request);
  console.log('Get text prompt response');
  console.log(response);
}

callPredict();

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PredictTextPromptSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    // Details of designing text prompts for supported large language models:
    // https://cloud.google.com/vertex-ai/docs/generative-ai/text/text-overview
    String instance =
        "{ \"prompt\": " + "\"Give me ten interview questions for the role of program manager.\"}";
    String parameters =
        "{\n"
            + "  \"temperature\": 0.2,\n"
            + "  \"maxOutputTokens\": 256,\n"
            + "  \"topP\": 0.95,\n"
            + "  \"topK\": 40\n"
            + "}";
    String project = "YOUR_PROJECT_ID";
    String location = "us-central1";
    String publisher = "google";
    String model = "text-bison@001";

    predictTextPrompt(instance, parameters, project, location, publisher, model);
  }

  // Get a text prompt from a supported text model
  public static void predictTextPrompt(
      String instance,
      String parameters,
      String project,
      String location,
      String publisher,
      String model)
      throws IOException {
    String endpoint = String.format("%s-aiplatform.googleapis.com:443", location);
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder().setEndpoint(endpoint).build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      final EndpointName endpointName =
          EndpointName.ofProjectLocationPublisherModelName(project, location, publisher, model);

      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      Value.Builder instanceValue = Value.newBuilder();
      JsonFormat.parser().merge(instance, instanceValue);
      List<Value> instances = new ArrayList<>();
      instances.add(instanceValue.build());

      // Use Value.Builder to convert instance to a dynamically typed value that can be
      // processed by the service.
      Value.Builder parameterValueBuilder = Value.newBuilder();
      JsonFormat.parser().merge(parameters, parameterValueBuilder);
      Value parameterValue = parameterValueBuilder.build();

      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instances, parameterValue);
      System.out.println("Predict Response");
      System.out.println(predictResponse);
    }
  }
}

Response body

{
  "predictions":[
    {
      "content": string,
      "citationMetadata": {
        "citations": [
          {
            "startIndex": integer,
            "endIndex": integer,
            "url": string,
            "title": string,
            "license": string,
            "publicationDate": string
          }
        ]
      },
      "logprobs": {
        "tokenLogProbs": [ float ],
        "tokens": [ string ],
        "topLogProbs": [ { map<string, float> } ]
      },
      "safetyAttributes": {
        "categories": [ string ],
        "blocked": boolean,
        "scores": [ float ],
        "errors": [ int ]
      }
    }
  ],
  "metadata": {
    "tokenMetadata": {
      "input_token_count": {
        "total_tokens": integer,
        "total_billable_characters": integer
      },
      "output_token_count": {
        "total_tokens": integer,
        "total_billable_characters": integer
      }
    }
  }
}

Response element	Description
`content`	The result generated from input text.
`categories`	The display names of Safety Attribute categories associated with the generated content. Order matches the Scores.
`scores`	The confidence scores of the each category, higher value means higher confidence.
`blocked`	A flag indicating if the model's input or output was blocked.
`errors`	An error code that identifies why the input or output was blocked. For a list of error codes, see Safety filters and attributes.
`startIndex`	Index in the prediction output where the citation starts (inclusive). Must be >= 0 and < end_index.
`endIndex`	Index in the prediction output where the citation ends (exclusive). Must be > start_index and < len(output).
`url`	URL associated with this citation. If present, this URL links to the webpage of the source of this citation. Possible URLs include news websites, GitHub repos, etc.
`title`	Title associated with this citation. If present, it refers to the title of the source of this citation. Possible titles include news titles, book titles, etc.
`license`	License associated with this recitation. If present, it refers to the license of the source of this citation. Possible licenses include code licenses, e.g., mit license.
`publicationDate`	Publication date associated with this citation. If present, it refers to the date at which the source of this citation was published. Possible formats are YYYY, YYYY-MM, YYYY-MM-DD.
`input_token_count`	Number of input tokens. This is the total number of tokens across all prompts, prefixes, and suffixes.
`output_token_count`	Number of output tokens. This is the total number of tokens in `content` across all predictions.
`tokens`	The sampled tokens.
`tokenLogProbs`	The sampled tokens' log probabilities.
`topLogProb`	The most likely candidate tokens and their log probabilities at each step.
`logprobs`	Results of the `logprobs` parameter. 1-1 mapping to `candidates`.

Sample response

{
  "predictions": [
    {
      "citationMetadata":{
        "citations": [ ]
      },
      "safetyAttributes":{
        "scores": [
          0.1
        ],
        "categories": [
          "Finance"
        ],
        "blocked": false
      },
      "content":"1. What is your experience with project management?\n2. What are your strengths and weaknesses as a project manager?\n3. How do you handle conflict and difficult situations?\n4. How do you communicate with stakeholders?\n5. How do you stay organized and on track?\n6. How do you manage your time effectively?\n7. What are your goals for your career?\n8. Why are you interested in this position?\n9. What are your salary expectations?\n10. What are your availability and start date?",
      "logprobs": {
        "tokenLogProbs": [
          -0.1,
          -0.2
        ],
        "tokens": [
          "vertex",
          " rocks!"
        ],
        "topLogProbs": [
          {
            "vertex": -0.1,
            "hello": -0.2
          },
          {
            " rocks!": -0.2,
            " world!": -0.3
          }
        ]
      }
    },
    "metadata": {
      "tokenMetadata": {
        "outputTokenCount": {
          "totalTokens": 153,
          "totalBillableCharacters": 537
        },
        "inputTokenCount": {
          "totalBillableCharacters": 54,
          "totalTokens": 12
        }
      }
    }
  ]
}

Stream response from Generative AI models

The parameters are the same for streaming and non-streaming requests to the APIs.

To view sample code requests and responses using the REST API, see Examples using the REST API.

To view sample code requests and responses using the Vertex AI SDK for Python, see Examples using Vertex AI SDK for Python.