The ML.GENERATE_TEXT function

This document describes the ML.GENERATE_TEXT function, which lets you perform generative natural language tasks by using text from BigQuery standard tables, or visual content from BigQuery object tables.

The function works by sending requests to a BigQuery ML remote model that represents one of the Vertex AI text or multimodal foundation models (LLMs), and then returning the model's response. Several of the function's arguments provide the parameters that shape the Vertex AI model's response.

The ML.GENERATE_TEXT function works with the Vertex AI models to perform text processing tasks such as classification, sentiment analysis, and code generation.

Prompt design can strongly affect the responses returned by the Vertex AI model. For more information, see Design multimodal prompts or Design text prompts.

Syntax

ML.GENERATE_TEXT syntax differs depending on the Vertex AI model that your remote model targets. Choose the option appropriate for your use case.

gemini-pro

ML.GENERATE_TEXT(
MODEL project_id.dataset.model,
{ TABLE project_id.dataset.table | (query_statement) },
STRUCT(
  [max_output_tokens AS max_output_tokens]
  [, top_k AS top_k]
  [, top_p AS top_p]
  [, temperature AS temperature]
  [, flatten_json_output AS flatten_json_output]
  [, stop_sequences AS stop_sequences])
)

Arguments

ML.GENERATE_TEXT takes the following arguments:

  • project_id: your project ID.

  • dataset: the BigQuery dataset that contains the model.

  • model: the name of the remote model over the Vertex AI LLM. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs. A minimal CREATE MODEL sketch also follows this argument list.

    You can confirm what LLM is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

  • table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.

  • query_statement: the GoogleSQL query that generates the prompt data.

  • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.

  • top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40.

    A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.

    For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.

  • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.

    Tokens are selected from the most (based on the top_k value) to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

  • temperature: a FLOAT64 value in the range [0.0,1.0] that is used for sampling during the response generation, which occurs when the top_k and top_p values are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.

  • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

  • stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
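
For reference, a minimal sketch of the CREATE MODEL statement that the model argument refers to is shown below. The connection name us.example_connection and the model name mydataset.llm_model are hypothetical placeholders; see The CREATE MODEL statement for remote models over LLMs for the authoritative syntax and options.

CREATE OR REPLACE MODEL `mydataset.llm_model`
  REMOTE WITH CONNECTION `us.example_connection`
  OPTIONS (ENDPOINT = 'gemini-pro');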

Details

The model and input table must be in the same region.

gemini-pro-vision

ML.GENERATE_TEXT(
MODEL project_id.dataset.model,
TABLE project_id.dataset.table,
STRUCT(
  prompt AS prompt
  [, max_output_tokens AS max_output_tokens]
  [, top_k AS top_k]
  [, top_p AS top_p]
  [, temperature AS temperature]
  [, flatten_json_output AS flatten_json_output]
  [, stop_sequences AS stop_sequences])
)

Arguments

ML.GENERATE_TEXT takes the following arguments:

  • project_id: your project ID.

  • dataset: the BigQuery dataset that contains the model.

  • model: the name of the remote model over the Vertex AI LLM. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.

    You can confirm what LLM is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

  • table: the name of the object table that contains the visual content to analyze. For more information on what types of visual content you can analyze, see Supported visual content.

    The Cloud Storage bucket used by the input object table must be in the project where you created the model and where you call the ML.GENERATE_TEXT function.

  • prompt: a STRING value that contains the prompt to use to analyze the visual content. The prompt value must contain fewer than 16,000 tokens. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words.

  • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. This value must be in the range [1,2048]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 2048.

  • top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 32.

    A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.

    For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.

  • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.

    Tokens are selected from the most (based on the top_k value) to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

  • temperature: a FLOAT64 value in the range [0.0,1.0] that is used for sampling during the response generation, which occurs when the top_k and top_p values are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.4.

  • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

  • stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.

Details

The model and input table must be in the same region.

text-bison

ML.GENERATE_TEXT(
MODEL project_id.dataset.model,
{ TABLE project_id.dataset.table | (query_statement) },
STRUCT(
  [max_output_tokens AS max_output_tokens]
  [, top_k AS top_k]
  [, top_p AS top_p]
  [, temperature AS temperature]
  [, flatten_json_output AS flatten_json_output]
  [, stop_sequences AS stop_sequences])
)

Arguments

ML.GENERATE_TEXT takes the following arguments:

  • project_id: your project ID.

  • dataset: the BigQuery dataset that contains the model.

  • model: the name of the remote model over the Vertex AI LLM. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.

    You can confirm what LLM is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

  • table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.

  • query_statement: the GoogleSQL query that generates the prompt data.

  • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,1024]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.

  • top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40.

    A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.

    For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.

  • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.

    Tokens are selected from the most (based on the top_k value) to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

  • temperature: a FLOAT64 value in the range [0.0,1.0] that is used for sampling during the response generation, which occurs when the top_k and top_p values are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.

  • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

  • stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.

Details

The model and input table must be in the same region.

text-bison-32k

ML.GENERATE_TEXT(
MODEL project_id.dataset.model,
{ TABLE project_id.dataset.table | (query_statement) },
STRUCT(
  [max_output_tokens AS max_output_tokens]
  [, top_k AS top_k]
  [, top_p AS top_p]
  [, temperature AS temperature]
  [, flatten_json_output AS flatten_json_output]
  [, stop_sequences AS stop_sequences])
)

Arguments

ML.GENERATE_TEXT takes the following arguments:

  • project_id: your project ID.

  • dataset: the BigQuery dataset that contains the model.

  • model: the name of the remote model over the Vertex AI LLM. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.

    You can confirm what LLM is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

  • table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.

  • query_statement: the GoogleSQL query that generates the prompt data.

  • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.

  • top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40.

    A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.

    For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.

  • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.

    Tokens are selected from the most (based on the top_k value) to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

  • temperature: a FLOAT64 value in the range [0.0,1.0] that is used for sampling during the response generation, which occurs when the top_k and top_p values are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.

  • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

  • stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.

Details

The model and input table must be in the same region.

text-unicorn

ML.GENERATE_TEXT(
MODEL project_id.dataset.model,
{ TABLE project_id.dataset.table | (query_statement) },
STRUCT(
  [max_output_tokens AS max_output_tokens]
  [, top_k AS top_k]
  [, top_p AS top_p]
  [, temperature AS temperature]
  [, flatten_json_output AS flatten_json_output]
  [, stop_sequences AS stop_sequences])
)

Arguments

ML.GENERATE_TEXT takes the following arguments:

  • project_id: your project ID.

  • dataset: the BigQuery dataset that contains the model.

  • model: the name of the remote model over the Vertex AI LLM. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.

    You can confirm what LLM is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

  • table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use a SELECT statement for this argument to provide an alias for an existing table column. An error occurs if no prompt column is available.

  • query_statement: the GoogleSQL query that generates the prompt data.

  • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,1024]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.

  • top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 40.

    A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.

    For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.

  • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.

    Tokens are selected from the most (based on the top_k value) to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

  • temperature: a FLOAT64 value in the range [0.0,1.0] that is used for sampling during the response generation, which occurs when the top_k and top_p values are applied. It controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.

  • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

  • stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.

Details

The model and input table must be in the same region.

Output

ML.GENERATE_TEXT returns the input table plus the following columns:

Gemini API models

  • ml_generate_text_result: the JSON response from the projects.locations.endpoints.generateContent call to the model. The generated text is in the text element. The safety attributes are in the safety_ratings element. This column is returned when flatten_json_output is FALSE. For a sketch of parsing this value, see the example after this list.
  • ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
  • ml_generate_text_rai_result: a STRING value that contains the safety attributes. This column is returned when flatten_json_output is TRUE.
  • ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
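
When flatten_json_output is FALSE, you can extract the generated text from ml_generate_text_result yourself. The following is a minimal sketch; the JSON paths are assumptions based on the generateContent element names described above, and mydataset.llm_model is a placeholder remote model over a Gemini model.

SELECT
  -- Assumed path to the generated text in the generateContent response.
  JSON_VALUE(ml_generate_text_result, '$.candidates[0].content.parts[0].text') AS generated_text,
  -- Assumed path to the safety attributes.
  JSON_QUERY(ml_generate_text_result, '$.candidates[0].safety_ratings') AS safety_ratings,
  ml_generate_text_status
FROM
  ML.GENERATE_TEXT(
    MODEL `mydataset.llm_model`,
    (SELECT 'What is the purpose of dreams?' AS prompt),
    STRUCT(FALSE AS flatten_json_output));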

PaLM API models

  • ml_generate_text_result: the JSON response from the projects.locations.endpoints.predict call to the model. The generated text is in the content element. The safety attributes are in the safetyAttributes element. This column is returned when flatten_json_output is FALSE. For a sketch of parsing this value, see the example after this list.
  • ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
  • ml_generate_text_rai_result: a STRING value that contains the safety attributes. This column is returned when flatten_json_output is TRUE.
  • ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
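
For PaLM models, a similar sketch applies; the predictions[0] path is an assumption based on the content element described above, and mydataset.text_bison_model is a placeholder remote model.

SELECT
  -- Assumed path to the generated text in the predict response.
  JSON_VALUE(ml_generate_text_result, '$.predictions[0].content') AS generated_text,
  ml_generate_text_status
FROM
  ML.GENERATE_TEXT(
    MODEL `mydataset.text_bison_model`,
    (SELECT 'What is the purpose of dreams?' AS prompt),
    STRUCT(FALSE AS flatten_json_output));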

Examples

Text analysis

Example 1

This example shows a request with the following characteristics:

  • Provides a single prompt.
  • Returns a longer generated text response.
  • Returns a less probable generated text response.
SELECT *
FROM
  ML.GENERATE_TEXT(
    MODEL `mydataset.llm_model`,
    (SELECT 'What is the purpose of dreams?' AS prompt),
    STRUCT(
      0.8 AS temperature,
      1024 AS max_output_tokens,
      0.95 AS top_p,
      40 AS top_k));

Example 2

This example shows a request with the following characteristics:

  • Provides prompt data from a table column that's named prompt.
  • Returns a shorter generated text response.
  • Returns a more probable generated text response.
  • Flattens the JSON response into separate columns.
SELECT *
FROM
  ML.GENERATE_TEXT(
    MODEL
      `mydataset.llm_model`,
    TABLE `mydataset.prompt_table`,
    STRUCT(
      0.2 AS temperature,
      75 AS max_output_tokens,
      0.3 AS top_p,
      15 AS top_k,
      TRUE AS flatten_json_output));

Example 3

This example shows a request with the following characteristics:

  • Provides prompt data from a table column named question that is aliased as prompt.
  • Returns a moderately long generated text response.
  • Returns a moderately probable generated text response.
SELECT *
FROM
  ML.GENERATE_TEXT(
    MODEL `mydataset.llm_model`,
    (SELECT question AS prompt FROM `mydataset.prompt_table`),
    STRUCT(
      0.4 AS temperature, 750 AS max_output_tokens, 0.5 AS top_p, 30 AS top_k));

Example 4

This example shows a request with the following characteristics:

  • Concatenates strings and a table column to provide the prompt data.
  • Returns a longer generated text response.
  • Returns a more probable generated text response.
SELECT *
FROM
  ML.GENERATE_TEXT(
    MODEL `mydataset.llm_model`,
    (
      SELECT
        CONCAT(
          'Classify the sentiment of the following text as positive or negative. Text:',
          input_column,
          'Sentiment:') AS prompt
      FROM `mydataset.input_table`
    ),
    STRUCT(
      0.1 AS temperature,
      1000 AS max_output_tokens,
      0.1 AS top_p,
      10 AS top_k));

Example 5

This example shows a request with the following characteristics:

  • Provides prompt data from a table column that's named prompt.
  • Returns a shorter generated text response.
  • Returns a more probable generated text response.
  • Flattens the JSON response into separate columns.
  • Excludes model responses that contain the strings Golf or football.
SELECT *
FROM
  ML.GENERATE_TEXT(
    MODEL
      `mydataset.llm_model`,
    TABLE `mydataset.prompt_table`,
    STRUCT(
      0.2 AS temperature,
      75 AS max_output_tokens,
      0.3 AS top_p,
      15 AS top_k,
      TRUE AS flatten_json_output,
      ['Golf', 'football'] AS stop_sequences));

Visual content analysis

This example analyzes visual content from an object table that's named dogs and identifies the breed of dog contained in the content:

SELECT
  uri,
  ml_generate_text_llm_result
FROM
  ML.GENERATE_TEXT(
    MODEL
      `mydataset.gemini_pro_vision_model`,
    TABLE `mydataset.dogs`,
    STRUCT(
      'What is the breed of the dog?' AS prompt,
      TRUE AS flatten_json_output));
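
The dogs object table referenced in this example can be created over a Cloud Storage bucket with DDL along the following lines. This is a minimal sketch; the connection name us.example_connection and the gs://mybucket/dogs/* URI are hypothetical placeholders.

CREATE EXTERNAL TABLE `mydataset.dogs`
WITH CONNECTION `us.example_connection`
OPTIONS (
  -- Object tables require SIMPLE object metadata.
  object_metadata = 'SIMPLE',
  uris = ['gs://mybucket/dogs/*']);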

Supported visual content

When you are using a remote model over a gemini-pro-vision multimodal model, you can use ML.GENERATE_TEXT to analyze visual content that meets the following requirements:

  • Content must be in the supported image and video formats that are described in the mimeType field of the Gemini API request body.
  • Each piece of content must be no greater than 20 MB.
  • The supported maximum video length is 2 minutes. If the video is longer than 2 minutes, ML.GENERATE_TEXT only returns results for the first 2 minutes.

Locations

ML.GENERATE_TEXT must run in the same region or multi-region as the remote model that the function references. You can create remote models over Vertex AI models in all of the regions that support Generative AI APIs, and also in the US and EU multi-regions.

Quotas

See Vertex AI and Cloud AI service functions quotas and limits.

Known issues

Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:

A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>

This issue occurs because BigQuery query jobs finish successfully even if the function fails for some of the rows. The function fails when the volume of API calls to the remote endpoint exceeds the quota limits for that service. This issue occurs most often when you are running multiple parallel batch queries. BigQuery retries these calls, but if the retries fail, the resource exhausted error message is returned.
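
One way to work around this issue is to store the function output in a table, then check the ml_generate_text_status column and re-run ML.GENERATE_TEXT for only the failed rows. A minimal sketch, assuming the output was saved to a hypothetical table named mydataset.results:

-- Find the rows whose remote call failed so that they can be retried.
SELECT *
FROM `mydataset.results`
WHERE ml_generate_text_status != '';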
