Batch predictions let you send a large number of multimodal prompts in a single batch request.
For more information about the batch workflow, see Get batch predictions for Gemini.
Supported Models:
Model | Version |
---|---|
Gemini 1.5 Flash | gemini-1.5-flash-001 |
Gemini 1.5 Pro | gemini-1.5-pro-001 |
Gemini 1.0 Pro | gemini-1.0-pro-001, gemini-1.0-pro-002 |
Example syntax
Syntax to send a batch prediction API request.
A batch request uses the same data structure as a Generate content request from the Inference API.
curl
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
-d '{
  "contents": [
    {
      "role": "user",
      "parts": { "text": "..." }
    }
  ],
  "system_instruction": {
    "parts": [
      { ... }
    ]
  },
  "generation_config": { ... }
}'
Parameters
See examples for implementation details.
Body request
Parameters | |
---|---|
PROJECT_ID | The name of your Google Cloud project. |
BP_JOB_NAME | The job name. |
model | The model used for batch prediction. |
inputConfig | The data format. For Gemini batch prediction, BigQuery input is supported. |
outputConfig | The output configuration which determines model output location. |
inputConfig
Parameters | |
---|---|
instancesFormat | The prompt input format. Use bigquery. |
inputUri | The input source URI. This is a BigQuery table URI. |
outputConfig
Parameters | |
---|---|
predictionsFormat | The output format of the prediction. It must match the input format. Use bigquery. |
outputUri | Output target URI. |
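Both inputUri and outputUri are BigQuery table URIs of the form bq://PROJECT_ID.DATASET.TABLE, as shown in the sample response later on this page. As a minimal sketch, you might set them as shell variables before building the request body; the project, dataset, and table names below are placeholders:

INPUT_URI="bq://my-project.my_dataset.prompt_table"       # placeholder table containing the batch prompts
OUTPUT_URI="bq://my-project.my_dataset.prediction_table"  # placeholder table where predictions are written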
Examples
Request a batch response
Batch requests for multimodal models accept only BigQuery storage sources. For more information, see Get batch predictions for Gemini.
Depending on the number of input items that you submitted, a batch generation task can take some time to complete.
REST
To create a batch prediction job by using the Vertex AI API, send a POST request to the batchPredictionJobs endpoint.
Before using any of the request data, make the following replacements:
- PROJECT_ID: The name of your Google Cloud project.
- BP_JOB_NAME: The job name.
- INPUT_URI: The input source URI. This is either a BigQuery table URI or a JSONL file URI in Cloud Storage.
- OUTPUT_URI: Output target URI.
HTTP method and URL:
POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs
Request JSON body:
{ "name": "BP_JOB_NAME", "displayName": "BP_JOB_NAME", "model": "publishers/google/models/gemini-1.0-pro-001", "inputConfig": { "instancesFormat":"bigquery", "bigquerySource":{ "inputUri" : "INPUT_URI" } }, "outputConfig": { "predictionsFormat":"bigquery", "bigqueryDestination":{ "outputUri": "OUTPUT_URI" } } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/{PROJECT_ID}/locations/us-central1/batchPredictionJobs/{BATCH_JOB_ID}", "displayName": "BP_sample_publisher_BQ_20230712_134650", "model": "projects/{PROJECT_ID}/locations/us-central1/models/gemini-1.0-pro-001", "inputConfig": { "instancesFormat": "bigquery", "bigquerySource": { "inputUri": "bq://sample.text_input" } }, "modelParameters": {}, "outputConfig": { "predictionsFormat": "bigquery", "bigqueryDestination": { "outputUri": "bq://sample.llm_dataset.embedding_out_BP_sample_publisher_BQ_20230712_134650" } }, "state": "JOB_STATE_PENDING", "createTime": "2023-07-12T20:46:52.148717Z", "updateTime": "2023-07-12T20:46:52.148717Z", "labels": { "owner": "sample_owner", "product": "llm" }, "modelVersionId": "1", "modelMonitoringStatus": {} }
The response includes a unique identifier for the batch job. You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID
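If you want to script the check, the call above can be wrapped in a simple loop that extracts the state field from the response. The following is a minimal sketch, assuming the jq command-line tool is installed and that PROJECT_ID and BATCH_JOB_ID are set as shell variables; it also treats JOB_STATE_FAILED and JOB_STATE_CANCELLED as terminal states:

# Poll the batch prediction job every 60 seconds until it reaches a terminal state.
while true; do
  STATE=$(curl -s -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/batchPredictionJobs/${BATCH_JOB_ID}" \
    | jq -r '.state')
  echo "Current state: ${STATE}"
  case "${STATE}" in
    JOB_STATE_SUCCEEDED|JOB_STATE_FAILED|JOB_STATE_CANCELLED) break ;;
  esac
  sleep 60
done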
Retrieve batch output
When a batch prediction task completes, the output is stored in the BigQuery table that you specified in your request.
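For a quick check of the results, you can query that table directly. The following is a minimal sketch using the bq command-line tool; my_dataset.my_output_table is a placeholder for the table you passed as OUTPUT_URI:

# Preview the first few rows of the batch prediction output table.
bq query --use_legacy_sql=false \
  'SELECT * FROM `my_dataset.my_output_table` LIMIT 10'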
What's next
Learn how to tune a Gemini model in Overview of model tuning for Gemini.