Get batch predictions for Gemini

Batch predictions let you efficiently send a large number of multimodal prompt requests that aren't latency sensitive. Unlike online prediction, where you are limited to one input request at a time, a single batch request can carry many multimodal requests. The workflow has three steps: choose your output location, add your input requests (in JSON), and submit the job. The responses are then written asynchronously to your BigQuery output location.

After you submit a batch request to a model and review its results, you can fine-tune the model to return more precise results. You can then submit the fine-tuned model for batch generation in the same way. To learn more about tuning models, see Overview of model tuning for Gemini.

Multimodal models that support batch predictions

The following multimodal models support batch predictions.

  • gemini-1.5-flash-001
  • gemini-1.5-pro-001
  • gemini-1.0-pro-002
  • gemini-1.0-pro-001

Prepare your inputs

Batch requests for multimodal models only accept BigQuery storage sources. To learn more, see Overview of BigQuery storage.

BigQuery input format details

  • The content in the Request column must be valid JSON.
  • The JSON content must match the structure of a GenerateContentRequest.
  • Information about models or endpoints included in the request is ignored.
  • You can add more columns to the table. Added columns are ignored for content generation. After the job completes, the extra columns are attached to the results.
  • The system reserves two column names: Response and Status. These are used to provide information about the outcome of the model request job.
  • Batch prediction doesn't support the fileData field for Gemini.
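
The rules above can be checked before a table is loaded. The following Python function is a minimal sketch, not an official validator. It assumes each row is a dict of column name to value, that the request column is named request (lowercase), and that reserved names are compared case-insensitively. The helper names check_input_row and _contains_key are hypothetical, and the check for a contents field is only a rough stand-in for full GenerateContentRequest validation.

```python
import json

# Column names that the system reserves (compared case-insensitively here,
# which is an assumption of this sketch).
RESERVED_COLUMNS = {"response", "status"}


def _contains_key(value, key):
    """Return True if `key` appears anywhere in a nested JSON value."""
    if isinstance(value, dict):
        return key in value or any(_contains_key(v, key) for v in value.values())
    if isinstance(value, list):
        return any(_contains_key(v, key) for v in value)
    return False


def check_input_row(row):
    """Return a list of problems found in one input row (column name -> value)."""
    try:
        request = json.loads(row.get("request", ""))
    except (TypeError, ValueError):
        return ["request column is not valid JSON"]

    problems = []
    # Rough structural check only: a GenerateContentRequest carries "contents".
    if not isinstance(request, dict) or "contents" not in request:
        problems.append("request has no 'contents' field")
    # fileData is not supported for Gemini batch prediction.
    if _contains_key(request, "fileData") or _contains_key(request, "file_data"):
        problems.append("fileData is not supported in batch requests")
    # Extra columns are allowed, but not under reserved names.
    for column in row:
        if column.lower() in RESERVED_COLUMNS:
            problems.append(f"column name '{column}' is reserved")
    return problems
```

An empty list means the row passed these checks. It does not guarantee that the service accepts the request.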

BigQuery input example

{
  "contents": [
    {
      "role": "user",
      "parts": {
        "text": "Give me a recipe for banana bread."
      }
    }
  ],
  "system_instruction": {
    "parts": [
      {
        "text": "You are a chef."
      }
    ]
  },
  "generation_config": {
    "top_k": 5
  }
}

BigQuery output example

The output table includes the request, response, and status columns. The following is an example value in the response column:

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "In a medium bowl, whisk together the flour, baking soda, baking powder."
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.14057204,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.14270912
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 8,
    "candidatesTokenCount": 396,
    "totalTokenCount": 404
  }
}

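A response value like the one in the output example can be reduced to the generated text and the token usage. The following Python function is a minimal sketch that assumes the response arrives as a JSON string with the candidates and usageMetadata fields shown in the example. The function name summarize_response and the keys of the returned dict are hypothetical.

```python
import json


def summarize_response(response_json):
    """Extract generated text and token usage from one response value."""
    response = json.loads(response_json)
    candidates = response.get("candidates", [])

    # Concatenate every text part across all candidates.
    texts = []
    for candidate in candidates:
        for part in candidate.get("content", {}).get("parts", []):
            if "text" in part:
                texts.append(part["text"])

    usage = response.get("usageMetadata", {})
    return {
        "text": "".join(texts),
        "finish_reasons": [c.get("finishReason") for c in candidates],
        "total_tokens": usage.get("totalTokenCount"),
    }
```

Missing fields produce empty or None values instead of raising an error, so rows whose request failed can still be summarized.
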
Request a batch response

Depending on the number of input items that you submitted, a batch generation task can take some time to complete.


To create a batch prediction job by using the Vertex AI API, send a POST request to the batchPredictionJobs endpoint.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: The name of your Google Cloud project.
  • BP_JOB_NAME: The job name.
  • INPUT_URI: The input source URI. For Gemini batch requests, this is a BigQuery table URI (see Prepare your inputs).
  • OUTPUT_URI: The URI of the BigQuery output target.

HTTP method and URL:


Request JSON body:

{
    "name": "BP_JOB_NAME",
    "displayName": "BP_JOB_NAME",
    "model": "publishers/google/models/gemini-1.0-pro-001",
    "inputConfig": {
        "inputUri": "INPUT_URI"
    },
    "outputConfig": {
        "outputUri": "OUTPUT_URI"
    }
}

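If you generate the request body from a script instead of editing it by hand, a sketch like the following Python code may help. It mirrors only the fields shown in the request body above. The function names build_batch_request and write_request_file are hypothetical, and the default model string is copied from the example, not a recommendation.

```python
import json


def build_batch_request(job_name, input_uri, output_uri,
                        model="publishers/google/models/gemini-1.0-pro-001"):
    """Return a request body with the same fields as the example above."""
    for label, value in (("job_name", job_name),
                         ("input_uri", input_uri),
                         ("output_uri", output_uri)):
        if not value:
            raise ValueError(f"{label} must not be empty")
    return {
        "name": job_name,
        "displayName": job_name,
        "model": model,
        "inputConfig": {"inputUri": input_uri},
        "outputConfig": {"outputUri": output_uri},
    }


def write_request_file(body, path="request.json"):
    """Write the body as UTF-8 JSON so it can be sent with `-d @request.json`."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(body, handle, indent=4)
```

Writing the body to request.json keeps it compatible with the commands in the next step.
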
To send your request, choose one of these options:


curl: Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \


PowerShell: Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/{PROJECT_ID}/locations/us-central1/batchPredictionJobs/{BATCH_JOB_ID}",
  "displayName": "BP_sample_publisher_BQ_20230712_134650",
  "model": "projects/{PROJECT_ID}/locations/us-central1/models/gemini-1.0-pro-001",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "bq://sample.text_input"
    }
  },
  "modelParameters": {},
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "bq://sample.llm_dataset.embedding_out_BP_sample_publisher_BQ_20230712_134650"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2023-07-12T20:46:52.148717Z",
  "updateTime": "2023-07-12T20:46:52.148717Z",
  "labels": {
    "owner": "sample_owner",
    "product": "llm"
  },
  "modelVersionId": "1",
  "modelMonitoringStatus": {}
}

The response includes a unique identifier for the batch job. You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:

curl \
  -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
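
The polling step can also be scripted. The following Python function is a minimal sketch of a polling loop and leaves out the HTTP call itself. It assumes you supply fetch_job, a hypothetical callable that performs the GET request and returns the parsed job JSON as a dict. It also assumes that JOB_STATE_FAILED and JOB_STATE_CANCELLED should end the loop in addition to JOB_STATE_SUCCEEDED. The name wait_for_job and its default intervals are illustrative.

```python
import time

# States after which polling stops. Treating FAILED and CANCELLED as final
# is an assumption of this sketch; the text above only names SUCCEEDED.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
}


def wait_for_job(fetch_job, poll_seconds=30.0, timeout_seconds=3600.0,
                 sleep=time.sleep):
    """Call fetch_job() until the job reaches a terminal state or time runs out.

    fetch_job: caller-supplied callable returning the job resource as a dict.
    sleep: injectable so the loop can be tested without real delays.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = fetch_job()
        state = job.get("state")
        if state in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job still in state {state!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)
```

The caller should still inspect the returned state, because the loop also returns for failed or cancelled jobs.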

Retrieve batch output

When a batch prediction task completes, the output is stored in the BigQuery table that you specified in your request.
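
To inspect the results, you can query that table. The following Python functions are a minimal sketch that turns an output URI of the form bq://... into a SQL query string. They assume that the table identifier is everything after the bq:// prefix and that the output columns are named request, response, and status. The names bq_uri_to_table_id and build_results_query are hypothetical, and the sketch only builds the query text. Running it (for example, with the BigQuery console or a client library) is left to you.

```python
def bq_uri_to_table_id(output_uri):
    """Strip the bq:// prefix from a BigQuery URI and return the table ID."""
    prefix = "bq://"
    if not output_uri.startswith(prefix):
        raise ValueError(f"expected a bq:// URI, got {output_uri!r}")
    table_id = output_uri[len(prefix):]
    # Reject empty IDs and backticks, which would break the quoted identifier.
    if not table_id or "`" in table_id:
        raise ValueError(f"invalid table ID in {output_uri!r}")
    return table_id


def build_results_query(output_uri, limit=10):
    """Build a query that previews the assumed request/response/status columns."""
    table_id = bq_uri_to_table_id(output_uri)
    return (
        f"SELECT request, response, status "
        f"FROM `{table_id}` LIMIT {int(limit)}"
    )
```

The status column is a reasonable first place to look for rows whose request did not succeed.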

