Get batch predictions for Gemini

Batch predictions are a way to efficiently send multiple multimodal prompts that are not latency sensitive. Unlike online prediction, where you are limited to one input prompt at a time, you can send a large number of multimodal prompts in a single batch request. Then, your responses asynchronously populate in your BigQuery storage output location.

Multimodal models that support batch predictions

The following multimodal models support batch predictions.

  • gemini-1.5-flash-001
  • gemini-1.5-pro-001
  • gemini-1.0-pro-002
  • gemini-1.0-pro-001

Prepare your inputs

Batch requests for multimodal models only accept BigQuery storage sources. To learn more, see Overview of BigQuery storage. Store your input in a BigQuery table with a JSON column called request.

  • The content in the request column must be valid JSON.
  • The content in the JSON instructions must match the structure of a GenerateContentRequest.
  • Your input table can have columns other than Request. They are ignored for content generation but included in the output table. The system reserves two column names for output: Response and Status. These are used to provide information about the outcome of the batch prediction job.
  • Batch prediction doesn't support the fileData field for Gemini.

BigQuery input example

request
{
  "contents": [
    {
      "role": "user",
      "parts": {
        "text": "Give me a recipe for banana bread."
      }
    }
  ],
  "system_instruction": {
    "parts": [
      {
        "text": "You are a chef."
      }
    ]
  }
}

Request a batch response

Depending on the number of input items that you submitted, a batch generation task can take some time to complete.

REST

To test a multimodal prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: The name of your Google Cloud project.
  • BP_JOB_NAME: A name you choose for your job.
  • INPUT_URI: The input source URI. This is a BigQuery table URI in the form bq://PROJECT_ID.DATASET.TABLE.
  • OUTPUT_URI: The BigQuery URI of the target output table, in the form bq://PROJECT_ID.DATASET.TABLE. If the table doesn't already exist, then it is created for you.

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs

Request JSON body:

{
    "displayName": "BP_JOB_NAME",
    "model": "publishers/google/models/gemini-1.0-pro-002",
    "inputConfig": {
      "instancesFormat":"bigquery",
      "bigquerySource":{
        "inputUri" : "INPUT_URI"
      }
    },
    "outputConfig": {
      "predictionsFormat":"bigquery",
      "bigqueryDestination":{
        "outputUri": "OUTPUT_URI"
        }
    }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/{PROJECT_ID}/locations/us-central1/batchPredictionJobs/{BATCH_JOB_ID}",
  "displayName": "My first batch prediction",
  "model": "projects/{PROJECT_ID}/locations/us-central1/models/gemini-1.0-pro-002",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "bq://{PROJECT_ID}.mydataset.batch_predictions_input"
    }
  },
  "modelParameters": {},
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "bq://{PROJECT_ID}.mydataset.batch_predictions_output"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2023-07-12T20:46:52.148717Z",
  "updateTime": "2023-07-12T20:46:52.148717Z",
  "modelVersionId": "1"
}

The response includes a unique identifier for the batch job. You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:

curl \
  -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID

Retrieve batch output

When a batch prediction task completes, the output is stored in the BigQuery table that you specified in your request.

BigQuery output example

request response status
'{"content":[{...}]}'
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "In a medium bowl, whisk together the flour, baking soda, baking powder."
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.14057204,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.14270912
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 8,
    "candidatesTokenCount": 396,
    "totalTokenCount": 404
  }
}

What's next