Stream answers

This page introduces the streaming answer method.

The streaming answer method has many of the same features as the answer method plus one additional feature: streaming. When you stream an answer, the generated answer is broken into multiple parts that are sent in sequence.

Streaming is particularly useful when generated answers are long and sending the entire answer at once would cause a noticeable delay. Because the first parts can be shown before the full answer has arrived, streaming reduces the perceived latency.

Limitations

The streaming answer method has the same features as the answer method with the following exceptions:

  • Streaming answer is supported only for English.

  • Only Gemini models can be used with the streaming answer API. For example, you can't use a text-bison model. For a list of models, see Available models.

  • The number of rephrase steps is fixed at one. You can't disable rephrasing, nor can you change the maximum number of steps.

  • You can't get grounding scores for streaming answers, nor can you choose to return only well-grounded answers.
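The limitations above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API. It assumes the request body uses the same field names as the answer method (`queryUnderstandingSpec.queryRephraserSpec`, `groundingSpec`, and `answerGenerationSpec.modelSpec.modelVersion`), which this page doesn't show. The function name `find_unsupported_options` is hypothetical, and the English-only limitation isn't checked.

```python
def find_unsupported_options(body):
    """Return a list of messages for options that streaming doesn't support.

    Assumption: `body` follows the answer method's request format.
    """
    problems = []

    # Rephrasing is fixed at one step and can't be disabled.
    rephraser = body.get("queryUnderstandingSpec", {}).get("queryRephraserSpec", {})
    if rephraser.get("disable"):
        problems.append("rephrasing can't be disabled")
    if "maxRephraseSteps" in rephraser:
        problems.append("the maximum number of rephrase steps can't be changed")

    # Grounding scores and well-grounded-only filtering aren't available.
    if "groundingSpec" in body:
        problems.append("grounding options aren't available")

    # Only Gemini models are supported (simple substring check, an assumption
    # about how model version names are spelled).
    model = (
        body.get("answerGenerationSpec", {})
        .get("modelSpec", {})
        .get("modelVersion", "")
    )
    if model and "gemini" not in model.lower():
        problems.append(f"model {model!r} is not a Gemini model")

    return problems
```

An empty list means that none of the checked options conflicts with the limitations. The API itself remains the authority on what is accepted.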

Stream an answer

The following command shows how to call the streaming answer method and return a generated answer in the form of a series of JSON responses. Typically, each response contains one sentence of the answer.

This basic command shows the required input only. The options are left at their defaults.

For examples of other options, see Get answers and follow-ups. Some answer options aren't available for answer streaming; see the limitations on this page.
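As a rough illustration of handling the series of JSON responses described above, the following sketch joins the streamed parts into one answer string. It assumes that each response carries its part of the text in `answer.answerText`, as the answer method's response does, and that parts can be concatenated as-is. The sample data is invented for the example and is not real API output.

```python
import json


def join_streamed_answer(responses):
    """Concatenate the text parts of a sequence of streamed responses.

    Assumption: each response is a dict whose text, if any, is in
    response["answer"]["answerText"]. Responses without text are skipped.
    """
    parts = []
    for response in responses:
        text = response.get("answer", {}).get("answerText", "")
        if text:
            parts.append(text)
    return "".join(parts)


# Hypothetical sample shaped like a series of streamed responses.
sample = json.loads("""
[
  {"answer": {"answerText": "First sentence. "}},
  {"answer": {"answerText": "Second sentence."}},
  {"answer": {}}
]
""")

answer = join_streamed_answer(sample)
```

In a real client, each part would typically be displayed as soon as it arrives rather than collected first; joining is shown here only to make the relationship between the parts and the full answer concrete.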

REST

To search and get results with a streamed generated answer, do the following:

  1. Run the following curl command:

    curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search:streamAnswer" \
      -d '{
            "query": { "text": "QUERY"}
          }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • APP_ID: the ID of the Vertex AI Search app that you want to query.
    • QUERY: a free-text string that contains the question or search query. For example, "Which database is faster, bigquery or spanner?".
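The same request can be assembled in code. The following sketch builds the URL and JSON body used by the curl command above with the Python standard library only; it doesn't send anything. The function name `build_stream_answer_request` and the sample IDs are hypothetical.

```python
import json
from urllib.parse import quote

API_ROOT = "https://discoveryengine.googleapis.com/v1"


def build_stream_answer_request(project_id, app_id, query):
    """Return (url, body_bytes) for a basic streaming answer request."""
    url = (
        f"{API_ROOT}/projects/{quote(project_id, safe='')}"
        "/locations/global/collections/default_collection"
        f"/engines/{quote(app_id, safe='')}"
        "/servingConfigs/default_search:streamAnswer"
    )
    # json.dumps escapes quotes and other special characters in the query.
    body = json.dumps({"query": {"text": query}}).encode("utf-8")
    return url, body


# Hypothetical values for illustration only.
url, body = build_stream_answer_request(
    "my-project", "my-app", "Which database is faster, bigquery or spanner?"
)
```

Building the body with `json.dumps` avoids the shell-quoting problems that can occur when a query containing an apostrophe is pasted into the single-quoted `-d` argument of the curl command. The request still needs the `Authorization` and `Content-Type` headers shown in the curl command.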

Other examples

The basic command shown in Stream an answer is the simplest form, with no options specified. However, you can apply the same options that are available with the answer method, except for those excluded by the limitations listed on this page.

Streaming answers can also be used with follow-up sessions.
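As a hedged illustration of a follow-up request, the sketch below adds a session reference to the basic request body. It assumes that the streaming method accepts the same `session` field and session resource name format as the answer method; neither is shown on this page, so check Get answers and follow-ups for the authoritative format. The function name and sample values are hypothetical.

```python
def build_follow_up_body(query, project_id, app_id, session_id):
    """Return a request body dict that ties the query to an existing session.

    Assumption: the `session` field and its resource name format match
    those of the answer method.
    """
    session = (
        f"projects/{project_id}/locations/global"
        "/collections/default_collection"
        f"/engines/{app_id}/sessions/{session_id}"
    )
    return {"query": {"text": query}, "session": session}


# Hypothetical values for illustration only.
follow_up = build_follow_up_body(
    "Which one scales better?", "my-project", "my-app", "my-session"
)
```

The resulting dictionary can be serialized to JSON and sent as the body of the same streaming request shown in Stream an answer.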