Batch text generation

Batch predictions let you efficiently send a large number of multimodal prompts that are not latency sensitive. Unlike online prediction, where each request carries a single input prompt, one batch request can contain many prompts. The responses are then written asynchronously to your BigQuery or Cloud Storage output location.

Batch requests for Gemini models are discounted 50% from standard requests. To learn more, see the Pricing page.

Multimodal models that support batch predictions

The following multimodal models support batch predictions.

  • gemini-1.5-flash-002
  • gemini-1.5-flash-001
  • gemini-1.5-pro-002
  • gemini-1.5-pro-001
  • gemini-1.0-pro-002
  • gemini-1.0-pro-001

Prepare your inputs

Batch requests for multimodal models accept BigQuery storage sources and Cloud Storage sources.

BigQuery storage input

  • The content in the request column must be valid JSON. This JSON data represents your input for the model.
  • The JSON in the request column must match the structure of a GenerateContentRequest.
  • Your input table can have columns other than request. They are ignored for content generation but included in the output table. The system reserves two column names for output: response and status. These are used to provide information about the outcome of the batch prediction job.
  • Batch prediction only supports public YouTube and Cloud Storage bucket URIs in the fileData field for Gemini.
  • The fileData support is limited to certain Gemini models.
Example input (JSON)
        
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Give me a recipe for banana bread."
        }
      ]
    }
  ],
  "system_instruction": {
    "parts": [
      {
        "text": "You are a chef."
      }
    ]
  }
}
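
The BigQuery input rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the Vertex AI SDK: build_request_row and the recipe_id column are hypothetical names, and loading the resulting rows into a table is left to whichever BigQuery client you use.

```python
import json
from typing import Optional

# Column names that the system reserves for output, per the list above.
# "request" is also rejected here so an extra column cannot overwrite the prompt.
RESERVED_COLUMNS = {"request", "response", "status"}


def build_request_row(prompt: str, system_text: Optional[str] = None, **extra_columns) -> dict:
    """Return one input-table row whose request column holds valid JSON.

    Hypothetical helper: the name and signature are illustrative only.
    """
    clash = RESERVED_COLUMNS.intersection(extra_columns)
    if clash:
        raise ValueError(f"column name(s) not allowed: {sorted(clash)}")

    # Mirror the GenerateContentRequest structure shown in the example input.
    request = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    if system_text:
        request["system_instruction"] = {"parts": [{"text": system_text}]}

    # Extra columns are passed through unchanged next to the request column.
    return {"request": json.dumps(request), **extra_columns}


row = build_request_row(
    "Give me a recipe for banana bread.", "You are a chef.", recipe_id=1
)
print(row["request"])
```

Serializing with json.dumps guarantees that the request column contains valid JSON, and the reserved-name check catches clashes before the table is created.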
        
        

Cloud Storage input

  • File format: JSON Lines (JSONL)
  • Bucket location: us-central1
  • Permissions: the service account must have read access to the input file
  • Batch prediction only supports public YouTube and Cloud Storage bucket URIs in the fileData field for Gemini.
  • The fileData support is limited to certain Gemini models.
    Example input (JSONL)
    
    {"request":{"contents": [{"role": "user", "parts": [{"text": "What is the relation between the following video and image samples?"}, {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/video/animals.mp4", "mime_type": "video/mp4"}}, {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/image/cricket.jpeg", "mime_type": "image/jpeg"}}]}]}}
    {"request":{"contents": [{"role": "user", "parts": [{"text": "Describe what is happening in this video."}, {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/video/another_video.mov", "mime_type": "video/mov"}}]}]}}
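
A JSONL input file like the one above can be generated programmatically. The following is a minimal sketch, assuming each line wraps one request in a top-level "request" key as in the example; make_jsonl_line and write_jsonl are hypothetical helper names, and uploading the file to your bucket is not shown.

```python
import json
from typing import Iterable, Sequence, Tuple


def make_jsonl_line(prompt: str, files: Sequence[Tuple[str, str]] = ()) -> str:
    """Serialize one batch request as a single JSONL line.

    files is a sequence of (file_uri, mime_type) pairs that become
    file_data parts after the text part.
    """
    parts = [{"text": prompt}]
    for file_uri, mime_type in files:
        parts.append({"file_data": {"file_uri": file_uri, "mime_type": mime_type}})
    record = {"request": {"contents": [{"role": "user", "parts": parts}]}}
    # json.dumps emits no raw newlines by default, so one record is one line.
    return json.dumps(record)


def write_jsonl(path: str, lines: Iterable[str]) -> None:
    """Write the serialized requests to a local file, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


line = make_jsonl_line(
    "Describe what is happening in this video.",
    [("gs://cloud-samples-data/generative-ai/video/animals.mp4", "video/mp4")],
)
print(line)
```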
        

Request a batch response

Depending on the number of input items that you submitted, a batch generation task can take some time to complete.

REST

To create a batch prediction job by using the Vertex AI API, send a POST request to the batchPredictionJobs endpoint.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your Google Cloud project ID.
  • BP_JOB_NAME: A name you choose for your job.
  • INPUT_URI: The input source URI. This is either a BigQuery table URI in the form bq://PROJECT_ID.DATASET.TABLE or a Cloud Storage URI.
  • INPUT_SOURCE: The input source type. Options are bigquerySource and gcsSource.
  • INSTANCES_FORMAT: The format of the input instances. Options are jsonl and bigquery.
  • OUTPUT_URI: The URI of the output or target output table, in the form bq://PROJECT_ID.DATASET.TABLE. If the table doesn't already exist, then it is created for you.

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs

Request JSON body:

{
    "displayName": "BP_JOB_NAME",
    "model": "publishers/google/models/gemini-1.0-pro-002",
    "inputConfig": {
      "instancesFormat":"INSTANCES_FORMAT",
      "INPUT_SOURCE":{
        "inputUri" : "INPUT_URI"
      }
    },
    "outputConfig": {
      "predictionsFormat":"bigquery",
      "bigqueryDestination":{
        "outputUri": "OUTPUT_URI"
        }
    }
}
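
The placeholder substitutions can also be done in code. The sketch below builds the request body from the input and output URIs, choosing bigquerySource or gcsSource from the URI scheme. build_batch_job_body is a hypothetical helper, and passing the Cloud Storage URI as a one-element "uris" list is an assumption based on the gcsSource field used in the Cloud Storage request on this page.

```python
import json


def build_batch_job_body(display_name: str, model: str, input_uri: str, output_uri: str) -> dict:
    """Build a batchPredictionJobs request body (hypothetical helper)."""
    if input_uri.startswith("bq://"):
        input_config = {
            "instancesFormat": "bigquery",
            "bigquerySource": {"inputUri": input_uri},
        }
    elif input_uri.startswith("gs://"):
        # Assumption: gcsSource takes its URIs as a list under "uris".
        input_config = {"instancesFormat": "jsonl", "gcsSource": {"uris": [input_uri]}}
    else:
        raise ValueError(f"unsupported input URI: {input_uri}")

    if output_uri.startswith("bq://"):
        output_config = {
            "predictionsFormat": "bigquery",
            "bigqueryDestination": {"outputUri": output_uri},
        }
    elif output_uri.startswith("gs://"):
        output_config = {
            "predictionsFormat": "jsonl",
            "gcsDestination": {"outputUriPrefix": output_uri},
        }
    else:
        raise ValueError(f"unsupported output URI: {output_uri}")

    return {
        "displayName": display_name,
        "model": model,
        "inputConfig": input_config,
        "outputConfig": output_config,
    }


body = build_batch_job_body(
    "my-batch-job",
    "publishers/google/models/gemini-1.0-pro-002",
    "bq://my-project.mydataset.batch_predictions_input",
    "bq://my-project.mydataset.batch_predictions_output",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be saved as request.json and sent with the commands that follow.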

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/{PROJECT_ID}/locations/us-central1/batchPredictionJobs/{BATCH_JOB_ID}",
  "displayName": "My first batch prediction",
  "model": "projects/{PROJECT_ID}/locations/us-central1/models/gemini-1.0-pro-002",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "bq://{PROJECT_ID}.mydataset.batch_predictions_input"
    }
  },
  "modelParameters": {},
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "bq://{PROJECT_ID}.mydataset.batch_predictions_output"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2023-07-12T20:46:52.148717Z",
  "updateTime": "2023-07-12T20:46:52.148717Z",
  "modelVersionId": "1"
}

The response includes a unique identifier for the batch job. You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:

curl \
  -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID
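
The polling step can be automated with a small loop. The following is a sketch rather than an official client: wait_for_job is a hypothetical helper, fetch_state stands for whatever function you write to issue the GET request above and return the "state" field, and treating JOB_STATE_FAILED and JOB_STATE_CANCELLED as the other terminal states is an assumption to check against the API reference.

```python
import time
from typing import Callable

# Assumption: states after which the job no longer changes.
TERMINAL_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def wait_for_job(
    fetch_state: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_state until the job reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in {state} after {timeout_s} seconds")
        sleep(interval_s)


# Demonstration with canned states instead of real HTTP calls.
fake_states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
final_state = wait_for_job(lambda: next(fake_states), interval_s=0, sleep=lambda s: None)
print(final_state)
```

Injecting the fetch and sleep functions keeps the loop independent of any HTTP library and makes it easy to exercise without waiting on a real job.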

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os

from vertexai.preview.batch_prediction import BatchPredictionJob

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def batch_predict_gemini_createjob(
    input_uri: str, output_uri: str
) -> BatchPredictionJob:
    """Perform batch text prediction using a Gemini AI model.
    Args:
        input_uri (str): URI of the input file in BigQuery table or Google Cloud Storage.
            Example: "gs://[BUCKET]/[DATASET].jsonl" OR "bq://[PROJECT].[DATASET].[TABLE]"

        output_uri (str): URI of the output location, either a BigQuery table or a Cloud Storage folder.
            Example: "gs://[BUCKET]/[OUTPUT].jsonl" OR "bq://[PROJECT].[DATASET].[TABLE]"
    Returns:
        batch_prediction_job: The batch prediction job object containing details of the job.
    """

    import time
    import vertexai

    # TODO(developer): Update and un-comment below lines
    # input_uri = "gs://[BUCKET]/[DATASET].jsonl"  # Example
    # output_uri = "gs://[BUCKET]"

    # Initialize vertexai
    vertexai.init(project=PROJECT_ID, location="us-central1")

    # Submit a batch prediction job with Gemini model
    batch_prediction_job = BatchPredictionJob.submit(
        source_model="gemini-1.5-flash-002",
        input_dataset=input_uri,
        output_uri_prefix=output_uri,
    )

    # Check job status
    print(f"Job resource name: {batch_prediction_job.resource_name}")
    print(f"Model resource name with the job: {batch_prediction_job.model_name}")
    print(f"Job state: {batch_prediction_job.state.name}")

    # Refresh the job until complete
    while not batch_prediction_job.has_ended:
        time.sleep(5)
        batch_prediction_job.refresh()

    # Check if the job succeeds
    if batch_prediction_job.has_succeeded:
        print("Job succeeded!")
    else:
        print(f"Job failed: {batch_prediction_job.error}")

    # Check the location of the output
    print(f"Job output location: {batch_prediction_job.output_location}")

    # Example response:
    #  Job output location: gs://your-bucket/gen-ai-batch-prediction/prediction-model-year-month-day-hour:minute:second.12345

    # Sample input file:
    # https://storage.googleapis.com/cloud-samples-data/batch/prompt_for_batch_gemini_predict.jsonl

    return batch_prediction_job



if __name__ == "__main__":
    # TODO(developer): Update your Cloud Storage bucket name and file paths
    GCS_BUCKET = "your-bucket"  # Bucket name only; the gs:// scheme is added below
    batch_predict_gemini_createjob(
        input_uri=f"gs://{GCS_BUCKET}/batch_data/sample_input_file.jsonl",
        output_uri=f"gs://{GCS_BUCKET}/batch_predictions/sample_output/",
    )

Retrieve batch output

When a batch prediction task completes, the output is stored in the BigQuery table or Cloud Storage location that you specified in your request.

BigQuery output example

The output table contains the columns request, response, and status.

Example request value:

'{"content":[{...}]}'

Example response value:

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "In a medium bowl, whisk together the flour, baking soda, baking powder."
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.14057204,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.14270912
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 8,
    "candidatesTokenCount": 396,
    "totalTokenCount": 404
  }
}
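
Once the job has finished, each value in the response column can be parsed like any other JSON document. The sketch below pulls the generated text and token usage out of a response shaped like the example above. summarize_response is a hypothetical helper, and reading rows from BigQuery is left to your client of choice.

```python
import json


def summarize_response(response_json: str) -> dict:
    """Extract the first candidate's text and the token usage from one response."""
    response = json.loads(response_json)
    candidate = response["candidates"][0]
    # A candidate can hold several parts; join the text of each one.
    text = "".join(part.get("text", "") for part in candidate["content"]["parts"])
    usage = response.get("usageMetadata", {})
    return {
        "text": text,
        "finish_reason": candidate.get("finishReason"),
        "total_tokens": usage.get("totalTokenCount"),
    }


sample = json.dumps(
    {
        "candidates": [
            {
                "content": {
                    "role": "model",
                    "parts": [{"text": "In a medium bowl, whisk together the flour."}],
                },
                "finishReason": "STOP",
            }
        ],
        "usageMetadata": {
            "promptTokenCount": 8,
            "candidatesTokenCount": 396,
            "totalTokenCount": 404,
        },
    }
)
summary = summarize_response(sample)
print(summary["finish_reason"], summary["total_tokens"])
```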

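Cloud Storage output example

The following request creates a batch prediction job that reads JSONL input from Cloud Storage and writes its JSONL output to Cloud Storage.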

PROJECT_ID=[PROJECT ID]
REGION="us-central1"
MODEL_URI="publishers/google/models/gemini-1.0-pro-001@default"
INPUT_URI="[GCS INPUT URI]"
OUTPUT_URI="[OUTPUT URI]"

# Setting variables based on parameters
ENDPOINT="${REGION}-aiplatform.googleapis.com"
API_VERSION=v1
ENV=autopush
BP_JOB_NAME="BP_testing_$(date +%Y%m%d_%H%M%S)"

curl \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${ENDPOINT}/${API_VERSION}/projects/${PROJECT_ID}/locations/${REGION}/batchPredictionJobs \
-d '{
    "name": "'${BP_JOB_NAME}'",
    "displayName": "'${BP_JOB_NAME}'",
    "model": "'${MODEL_URI}'",
    "inputConfig": {
      "instancesFormat":"jsonl",
      "gcsSource":{
        "uris" : ["'${INPUT_URI}'"]
      }
    },
    "outputConfig": {
      "predictionsFormat":"jsonl",
      "gcsDestination":{
        "outputUriPrefix": "'${OUTPUT_URI}'"
      }
    },
    "labels": {"stage": "'${ENV}'"}
}'

What's next