The ML.GENERATE_TEXT function

This document describes the ML.GENERATE_TEXT function, which lets you perform generative natural language tasks on text that's stored in BigQuery tables. Example language tasks include the following:

  • Classification
  • Sentiment analysis
  • Entity extraction
  • Extractive question answering
  • Summarization
  • Rewriting text in a different style
  • Ad copy generation
  • Concept ideation

The function works by sending requests to a BigQuery ML remote model that represents one of the Vertex AI text-bison* natural language foundation models (LLMs), and then returning the LLM's response. Several of the function's arguments provide the parameters that shape the LLM's response.
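
For reference, a remote model of this kind is created with a CREATE MODEL statement that points at a Vertex AI endpoint through a Cloud resource connection. The following is a minimal sketch only; the dataset, model name, and connection are placeholder assumptions, and the endpoint value depends on which text-bison* model you want the remote model to represent (see ENDPOINT):

-- Placeholder names: replace the dataset, model name, and connection with your own.
-- The ENDPOINT option selects the Vertex AI text model that the remote model represents.
CREATE OR REPLACE MODEL `mydataset.llm_model`
  REMOTE WITH CONNECTION `us.my_connection`
  OPTIONS (ENDPOINT = 'text-bison');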

The prompt parameter provides the text for the model to analyze. Prompt design can strongly affect the responses returned by the LLM. For more information, see Design text prompts.
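
As a small illustration of how prompt wording shapes the task (the prompt text here is an assumption for illustration, not a recommendation), the same input can be framed loosely or with an explicit output format:

-- A terse prompt leaves the response format up to the model.
SELECT 'Is this review positive? "Great food, slow service."' AS prompt;

-- A more structured prompt constrains the task and the expected output.
SELECT
  'Classify the sentiment of the following review as Positive or Negative. Review: "Great food, slow service." Sentiment:'
    AS prompt;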

Syntax

ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [number_of_output_tokens AS max_output_tokens]
    [, top_k_value AS top_k]
    [, top_p_value AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output])
)

Arguments

ML.GENERATE_TEXT takes the following arguments:

  • project_id: a STRING value that specifies your project ID.

  • dataset: a STRING value that specifies the BigQuery dataset that contains the model.

  • model: a STRING value that specifies the name of a remote model that uses one of the text-bison* Vertex AI LLMs. For more information about how to create this type of remote model, see ENDPOINT.

  • table: a STRING value that specifies the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.

  • query_statement: the GoogleSQL query that is used to generate the prompt data.

  • max_output_tokens: an INT64 value in the range [1,1024] that sets the maximum number of tokens that the model outputs. Specify a lower value for shorter responses and a higher value for longer responses. The default is 50.

    A token might be smaller than a word and is approximately four characters. 100 tokens correspond to approximately 60-80 words.

  • temperature: a FLOAT64 value in the range [0.0,1.0] that is used for sampling during the response generation, which occurs when top_k and top_p are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 1.0.

  • top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40.

    A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.

    For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.

  • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 1.0.

    Tokens are selected from the most (based on the top_k value) to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1 and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

  • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

Output

ML.GENERATE_TEXT returns the input table plus the following columns:

  • ml_generate_text_result: the JSON response from the projects.locations.endpoints.predict call to the model. The generated text is in the content element. The safety attributes are in the safetyAttributes element. This column is returned when flatten_json_output is FALSE. For a sketch of extracting the generated text from this column, see the example after this list.
  • ml_generate_text_llm_result: a STRING value that contains the generated text returned by the projects.locations.endpoints.predict call to the model. This column is returned when flatten_json_output is TRUE.
  • ml_generate_text_rai_result: a STRING value that contains the safety attributes returned by the projects.locations.endpoints.predict call to the model. This column is returned when flatten_json_output is TRUE.
  • ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
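
As a rough sketch of how these columns can be consumed, the following query extracts the generated text and the response status while leaving flatten_json_output at its default of FALSE. The model and prompt are placeholders, and the JSON path is an assumption based on the content element described above; inspect ml_generate_text_result in your own results to confirm the exact structure.

SELECT
  -- Assumed path to the content element of the JSON response; verify against your results.
  JSON_VALUE(ml_generate_text_result, '$.predictions[0].content') AS generated_text,
  ml_generate_text_status
FROM
  ML.GENERATE_TEXT(
    MODEL `mydataset.llm_model`,
    (SELECT 'Suggest three names for a bakery.' AS prompt),
    STRUCT(128 AS max_output_tokens));

If you set flatten_json_output to TRUE instead, you can read ml_generate_text_llm_result directly without parsing the JSON.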

Locations

ML.GENERATE_TEXT must run in the same region as the remote model that the function references. You can create remote models based on text-bison* models only in the following locations:

  • asia-northeast3
  • asia-southeast1
  • eu
  • europe-west1
  • europe-west2
  • europe-west3
  • europe-west4
  • europe-west9
  • us
  • us-central1
  • us-west4

Quotas

See Cloud AI service table-valued functions quotas and limits.

Examples

Example 1

This example shows a request with the following characteristics:

  • Provides a single prompt.
  • Returns a longer generated text response.
  • Returns a less probable generated text response.
SELECT *
FROM
  ML.GENERATE_TEXT(
    MODEL `mydataset.llm_model`,
    (SELECT 'What is the purpose of dreams?' AS prompt),
    STRUCT(
      0.8 AS temperature,
      1024 AS max_output_tokens,
      0.95 AS top_p,
      40 AS top_k));

Example 2

This example shows a request with the following characteristics:

  • Provides prompt data from a table column that's named prompt.
  • Returns a shorter generated text response.
  • Returns a more probable generated text response.
  • Flattens the JSON response into separate columns.
SELECT *
FROM
  ML.GENERATE_TEXT(
    MODEL `mydataset.llm_model`,
    TABLE `mydataset.prompt_table`,
    STRUCT(
      0.2 AS temperature, 75 AS max_output_tokens, 0.3 AS top_p, 15 AS top_k,
      TRUE AS flatten_json_output));

Example 3

This example shows a request with the following characteristics:

  • Provides prompt data from a table column named question that is aliased as prompt.
  • Returns a moderately long generated text response.
  • Returns a moderately probable generated text response.
SELECT *
FROM
  ML.GENERATE_TEXT(
    MODEL `mydataset.llm_model`,
    (SELECT question AS prompt FROM `mydataset.prompt_table`),
    STRUCT(
      0.4 AS temperature, 750 AS max_output_tokens, 0.5 AS top_p, 30 AS top_k));

Example 4

This example shows a request with the following characteristics:

  • Concatenates strings and a table column to provide the prompt data.
  • Returns a longer generated text response.
  • Returns a more probable generated text response.
SELECT *
FROM
  ML.GENERATE_TEXT(
    MODEL `mydataset.llm_model`,
    (
      SELECT
          CONCAT(
            'Classify the sentiment of the following text as positive or negative. Text: ',
            input_column,
            ' Sentiment:') AS prompt
      FROM `mydataset.input_table`
    ),
    STRUCT(
      0.1 AS temperature,
      1000 AS max_output_tokens,
      0.1 AS top_p,
      10 AS top_k));

What's next