Batch predictions are a way to efficiently send multiple multimodal prompts that are not latency sensitive. Unlike online prediction, where you are limited to one input prompt at a time, you can send a large number of multimodal prompts in a single batch request. Your responses then populate asynchronously in your specified output location, such as a BigQuery table.
Batch requests for Gemini models are discounted 50% from standard requests. To learn more, see the Pricing page.
Multimodal models that support batch predictions
The following multimodal models support batch predictions.
gemini-1.5-flash-002
gemini-1.5-flash-001
gemini-1.5-pro-002
gemini-1.5-pro-001
gemini-1.0-pro-002
gemini-1.0-pro-001
Prepare your inputs
Batch requests for multimodal models accept BigQuery storage sources and Cloud Storage sources.
BigQuery storage input
- The content in the `request` column must be valid JSON. This JSON data represents your input for the model.
- The content in the JSON instructions must match the structure of a `GenerateContentRequest`.
- Your input table can have columns other than `request`. They are ignored for content generation but included in the output table. The system reserves two column names for output: `response` and `status`. These are used to provide information about the outcome of the batch prediction job.
- Batch prediction only supports public YouTube and Cloud Storage bucket URIs in the `fileData` field for Gemini.
- `fileData` support is limited to certain Gemini models.
Example input (JSON)
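As a sketch of what a `request` cell can hold, the following builds a minimal `GenerateContentRequest` as a JSON string (the prompt text is illustrative, not from the original example):

```python
import json

# A minimal sketch of one `request` column value. The cell must hold
# valid JSON matching the GenerateContentRequest structure.
request = {
    "contents": [
        {"role": "user", "parts": [{"text": "Give me a recipe for banana bread."}]}
    ]
}

# Serialize to the JSON string stored in the BigQuery cell.
row_value = json.dumps(request)
print(row_value)
```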
Cloud Storage input
- File format: JSON Lines (JSONL)
- Located in `us-central1`
- Appropriate read permissions for the service account
- Batch prediction only supports public YouTube and Cloud Storage bucket URIs in the `fileData` field for Gemini.
- `fileData` support is limited to certain Gemini models.

Example input (JSONL)
{"request":{"contents": [{"role": "user", "parts": [{"text": "What is the relation between the following video and image samples?"}, {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/video/animals.mp4", "mime_type": "video/mp4"}}, {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/image/cricket.jpeg", "mime_type": "image/jpeg"}}]}]}}
{"request":{"contents": [{"role": "user", "parts": [{"text": "Describe what is happening in this video."}, {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/video/another_video.mov", "mime_type": "video/mov"}}]}]}}
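A JSONL input file like the one above can be generated programmatically. A minimal sketch that writes one request per line (the file name and prompts are hypothetical):

```python
import json

# Each element becomes one line of the JSONL input file.
requests = [
    {"request": {"contents": [{"role": "user", "parts": [
        {"text": "Describe this video."},
        {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/video/animals.mp4",
                       "mime_type": "video/mp4"}}]}]}},
    {"request": {"contents": [{"role": "user", "parts": [
        {"text": "Summarize the clip in one sentence."}]}]}},
]

# JSONL means one compact JSON object per line, no pretty-printing.
with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```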
Request a batch response
Depending on the number of input items that you submitted, a batch generation task can take some time to complete.
REST
To test a multimodal prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.
Before using any of the request data, make the following replacements:
- PROJECT_ID: The name of your Google Cloud project.
- BP_JOB_NAME: A name you choose for your job.
- INPUT_URI: The input source URI: either a BigQuery table URI in the form `bq://PROJECT_ID.DATASET.TABLE`, or your Cloud Storage bucket URI.
- INPUT_SOURCE: The input source type. Options are `bigquerySource` and `gcsSource`.
- INSTANCES_FORMAT: The input instances format: `jsonl` or `bigquery`.
- OUTPUT_URI: The URI of the output or target output table, in the form `bq://PROJECT_ID.DATASET.TABLE`. If the table doesn't already exist, then it is created for you.
HTTP method and URL:
POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs
Request JSON body:
{
  "displayName": "BP_JOB_NAME",
  "model": "publishers/google/models/gemini-1.0-pro-002",
  "inputConfig": {
    "instancesFormat": "INSTANCES_FORMAT",
    "INPUT_SOURCE": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "OUTPUT_URI"
    }
  }
}
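The request body can also be assembled programmatically before sending it. A sketch that fills in the placeholders for a BigQuery-to-BigQuery job (the project, dataset, and job names are hypothetical):

```python
import json

def make_body(display_name, model, input_uri, output_uri):
    """Build a batchPredictionJobs request body for a BigQuery input source.

    A BigQuery input pairs with the "bigquery" instances format; a Cloud
    Storage input would use "gcsSource" and "jsonl" instead.
    """
    return {
        "displayName": display_name,
        "model": model,
        "inputConfig": {
            "instancesFormat": "bigquery",
            "bigquerySource": {"inputUri": input_uri},
        },
        "outputConfig": {
            "predictionsFormat": "bigquery",
            "bigqueryDestination": {"outputUri": output_uri},
        },
    }

body = make_body(
    "my-batch-job",
    "publishers/google/models/gemini-1.0-pro-002",
    "bq://my-project.my_dataset.input_table",
    "bq://my-project.my_dataset.output_table",
)
print(json.dumps(body, indent=2))
```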
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "name": "projects/{PROJECT_ID}/locations/us-central1/batchPredictionJobs/{BATCH_JOB_ID}",
  "displayName": "My first batch prediction",
  "model": "projects/{PROJECT_ID}/locations/us-central1/models/gemini-1.0-pro-002",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "bq://{PROJECT_ID}.mydataset.batch_predictions_input"
    }
  },
  "modelParameters": {},
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "bq://{PROJECT_ID}.mydataset.batch_predictions_output"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2023-07-12T20:46:52.148717Z",
  "updateTime": "2023-07-12T20:46:52.148717Z",
  "modelVersionId": "1"
}
The response includes a unique identifier for the batch job.
You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is `JOB_STATE_SUCCEEDED`. For example:
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID"
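The job's resource `name` in the creation response carries the BATCH_JOB_ID, and polling stops once the state is terminal. A small sketch of that bookkeeping (the sample response is abridged and illustrative):

```python
import json

# Abridged sample of the creation response; `name` is the job's resource path.
create_response = json.loads(
    '{"name": "projects/123/locations/us-central1/batchPredictionJobs/456",'
    ' "state": "JOB_STATE_PENDING"}'
)

# The last path segment is the BATCH_JOB_ID to poll with.
batch_job_id = create_response["name"].rsplit("/", 1)[-1]

# Keep polling while the state is not terminal.
TERMINAL_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}
done = create_response["state"] in TERMINAL_STATES
print(batch_job_id, done)
```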
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Retrieve batch output
When a batch prediction task completes, the output is stored in the BigQuery table that you specified in your request.
BigQuery output example
request | response | status
---|---|---
'{"content":[{...}]}' | See the sample value below. | (empty)

Sample `response` column value:

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "In a medium bowl, whisk together the flour, baking soda, baking powder."
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.14057204,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.14270912
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 8,
    "candidatesTokenCount": 396,
    "totalTokenCount": 404
  }
}
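When reading rows back from the output table, the `response` cell parses as ordinary JSON. A sketch that pulls out the generated text (the sample value is abridged from the example row):

```python
import json

# Abridged `response` column value from one output row.
response_cell = json.dumps({
    "candidates": [
        {"content": {"role": "model",
                     "parts": [{"text": "In a medium bowl, whisk together "
                                        "the flour, baking soda, baking powder."}]},
         "finishReason": "STOP"}
    ],
    "usageMetadata": {"promptTokenCount": 8, "candidatesTokenCount": 396,
                      "totalTokenCount": 404},
})

response = json.loads(response_cell)
# Concatenate the text parts of the first candidate.
text = "".join(part["text"]
               for part in response["candidates"][0]["content"]["parts"])
print(text)
```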
Cloud Storage output example
PROJECT_ID=[PROJECT ID]
REGION="us-central1"
MODEL_URI="publishers/google/models/gemini-1.0-pro-001"
INPUT_URI="[GCS INPUT URI]"
OUTPUT_URI="[OUTPUT URI]"

# Derive the endpoint from the region
ENDPOINT="${REGION}-aiplatform.googleapis.com"
API_VERSION=v1
BP_JOB_NAME="BP_testing_`date +%Y%m%d_%H%M%S`"

curl \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${ENDPOINT}/${API_VERSION}/projects/${PROJECT_ID}/locations/${REGION}/batchPredictionJobs \
  -d '{
    "name": "'${BP_JOB_NAME}'",
    "displayName": "'${BP_JOB_NAME}'",
    "model": "'${MODEL_URI}'",
    "inputConfig": {
      "instancesFormat": "jsonl",
      "gcsSource": {
        "uris": ["'${INPUT_URI}'"]
      }
    },
    "outputConfig": {
      "predictionsFormat": "jsonl",
      "gcsDestination": {
        "outputUriPrefix": "'${OUTPUT_URI}'"
      }
    }
  }'
What's next
- Learn how to tune a Gemini model in Overview of model tuning for Gemini
- Learn more about the Batch prediction API.