Get batch predictions for Gemini

Batch predictions let you send a large number of multimodal prompts in a single batch request.

For more information about the batch workflow, see Get batch predictions for Gemini.

Supported Models:

Model             Version
Gemini 1.5 Flash  gemini-1.5-flash-001
Gemini 1.5 Pro    gemini-1.5-pro-001
Gemini 1.0 Pro    gemini-1.0-pro-001
                  gemini-1.0-pro-002

Example syntax

Syntax to send a batch prediction API request.

A batch request uses the same data structure as a Generate content request from the Inference API.

curl

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent" \
  -d '{
  "contents": [
    {
      "role": "user",
      "parts": {
        "text": "..."
      }
    }
  ],
  "system_instruction": {
    "parts": [
      {
        ...
      }
    ]
  },
  "generation_config": {
    ...
  }
}'
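
Before running the example above, the placeholder variables need values. A minimal sketch with hypothetical values (substitute your own project and any supported model from the table above):

export LOCATION="us-central1"
export PROJECT_ID="my-project"          # hypothetical project ID
export MODEL_ID="gemini-1.5-flash-001"  # any supported model version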

Parameters

See examples for implementation details.

Request body

Parameters

name

The name of the batch prediction job.

displayName

The job name.

model

The model used for batch prediction.

inputConfig

The data format. For Gemini batch prediction, BigQuery input is supported.

outputConfig

The output configuration which determines model output location.

inputConfig

Parameters

instancesFormat

The prompt input format. Use bigquery.

bigquerySource.inputUri

The input source URI. This is a BigQuery table URI.

outputConfig

Parameters

predictionsFormat

The output format of the prediction. It must match the input format. Use bigquery.

bigqueryDestination.outputUri

Output target URI.
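
Both URIs use the BigQuery URI scheme shown in the sample response later on this page. A minimal sketch with hypothetical project, dataset, and table names:

bq://my-project.my_dataset.prompt_table       # bigquerySource.inputUri
bq://my-project.my_dataset.prediction_table   # bigqueryDestination.outputUri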

Examples

Request a batch response

Batch requests for multimodal models only accept BigQuery storage sources.

Depending on the number of input items that you submitted, a batch generation task can take some time to complete.

REST

To create a batch prediction job by using the Vertex AI API, send a POST request to the batchPredictionJobs endpoint.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: The name of your Google Cloud project.
  • BP_JOB_NAME: The job name.
  • INPUT_URI: The input source URI. This is a BigQuery table URI.
  • OUTPUT_URI: Output target URI.
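
One way to keep the replacements consistent across the commands that follow is to set them as shell variables first. A minimal sketch with hypothetical values:

PROJECT_ID="my-project"
BP_JOB_NAME="gemini-batch-demo"
INPUT_URI="bq://my-project.my_dataset.prompts"       # hypothetical input table
OUTPUT_URI="bq://my-project.my_dataset.predictions"  # hypothetical output table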

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs

Request JSON body:

{
    "name": "BP_JOB_NAME",
    "displayName": "BP_JOB_NAME",
    "model": "publishers/google/models/gemini-1.0-pro-001",
    "inputConfig": {
      "instancesFormat":"bigquery",
      "bigquerySource":{
        "inputUri" : "INPUT_URI"
      }
    },
    "outputConfig": {
      "predictionsFormat":"bigquery",
      "bigqueryDestination":{
        "outputUri": "OUTPUT_URI"
      }
    }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/{PROJECT_ID}/locations/us-central1/batchPredictionJobs/{BATCH_JOB_ID}",
  "displayName": "BP_sample_publisher_BQ_20230712_134650",
  "model": "projects/{PROJECT_ID}/locations/us-central1/models/gemini-1.0-pro-001",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "bq://sample.text_input"
    }
  },
  "modelParameters": {},
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "bq://sample.llm_dataset.embedding_out_BP_sample_publisher_BQ_20230712_134650"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2023-07-12T20:46:52.148717Z",
  "updateTime": "2023-07-12T20:46:52.148717Z",
  "labels": {
    "owner": "sample_owner",
    "product": "llm"
  },
  "modelVersionId": "1",
  "modelMonitoringStatus": {}
}

The response includes a unique identifier for the batch job. You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:

curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID"
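
To wait for completion unattended, the status check can be wrapped in a polling loop. A minimal sketch, assuming jq is installed and that PROJECT_ID and BATCH_JOB_ID are set as shell variables:

while true; do
  STATE=$(curl -s \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/batchPredictionJobs/${BATCH_JOB_ID}" \
    | jq -r '.state')
  echo "Job state: ${STATE}"
  # Stop polling once the job reaches a terminal state.
  case "${STATE}" in
    JOB_STATE_SUCCEEDED|JOB_STATE_FAILED|JOB_STATE_CANCELLED) break ;;
  esac
  sleep 60
done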

Retrieve batch output

When a batch prediction task completes, the output is stored in the BigQuery table that you specified in your request.
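
From there, the predictions can be inspected with any BigQuery client. A minimal sketch using the bq command-line tool, with a hypothetical table name (use the outputUri from your job, without the bq:// prefix):

bq query --nouse_legacy_sql \
  'SELECT * FROM `my-project.my_dataset.predictions` LIMIT 10'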

What's next

Learn how to tune a Gemini model in Overview of model tuning for Gemini.