This page describes how to get batch predictions using Cloud Storage.

You can create a batch job through the Google Cloud console, the Google Gen AI SDK, or the REST API. After the job is submitted, you can check its status using the REST API, the SDK, or the Google Cloud console.

When a batch prediction job completes, the output is stored in the Cloud Storage bucket that you specified when you created the job. During long-running jobs, completed predictions are continuously exported to the specified output destination. If the batch prediction job is terminated, all completed rows are exported. You are only charged for completed predictions.

1. Prepare your inputs
Batch prediction for Gemini models accepts one JSON Lines (JSONL) file stored in Cloud Storage as input data. Each line in the batch input data is a request to the model, in the same format as a Gemini API request. After you've prepared your input data and uploaded it to Cloud Storage, make sure that the AI Platform Service Agent has permission to access the Cloud Storage file. For example, the following line is a single request:
{"request":{"contents": [{"role": "user", "parts": [{"text": "What is the relation between the following video and image samples?"}, {"fileData": {"fileUri": "gs://cloud-samples-data/generative-ai/video/animals.mp4", "mimeType": "video/mp4"}}, {"fileData": {"fileUri": "gs://cloud-samples-data/generative-ai/image/cricket.jpeg", "mimeType": "image/jpeg"}}]}], "generationConfig": {"temperature": 0.9, "topP": 1, "maxOutputTokens": 256}}}
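As a minimal sketch of preparing such a file, the following Python snippet serializes one request per line using only the standard library. The helper names `build_request` and `to_jsonl` are hypothetical, and the generation settings are copied from the example above; adjust them for your use case.

```python
import json


def build_request(prompt, temperature=0.9, max_output_tokens=256):
    """Return one batch input row: a Gemini-style request under a "request" key."""
    return {
        "request": {
            "contents": [{"role": "user", "parts": [{"text": prompt}]}],
            "generationConfig": {
                "temperature": temperature,
                "topP": 1,
                "maxOutputTokens": max_output_tokens,
            },
        }
    }


def to_jsonl(prompts):
    """Serialize prompts to JSONL text: one compact JSON object per line."""
    # json.dumps escapes embedded newlines, so each request stays on one line.
    return "\n".join(json.dumps(build_request(p)) for p in prompts) + "\n"


# Example usage (the file name is a placeholder):
#   with open("batch_input.jsonl", "w", encoding="utf-8") as f:
#       f.write(to_jsonl(["Explain how AI works in a few words"]))
```

After writing the file, upload it to your Cloud Storage bucket before you submit the job.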
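2. Submit a batch job
REST
To create a batch prediction job, use the projects.locations.batchPredictionJobs.create method.
Before using any of the request data, make the following replacements:
LOCATION: the region of your batch prediction job.
PROJECT_ID: your project ID.
MODEL_PATH: the publisher model name, for example, publishers/google/models/gemini-2.5-flash; or the tuned endpoint name, for example, projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID, where MODEL_ID is the model ID of the tuned model.
INPUT_URI: the Cloud Storage location of your JSONL batch prediction input, such as gs://bucketname/path/to/file.jsonl.
OUTPUT_FORMAT: to output to a Cloud Storage bucket, specify jsonl.
DESTINATION: for BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
OUTPUT_URI_FIELD_NAME: for BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
OUTPUT_URI: for BigQuery, specify the table location, such as bq://myproject.mydataset.output_result. The region of the output BigQuery dataset must be the same as that of the Vertex AI batch prediction job. For Cloud Storage, specify the bucket and directory location, such as gs://mybucket/path/to/output.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs
Request JSON body: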
{
  "displayName": "my-cloud-storage-batch-prediction-job",
  "model": "MODEL_PATH",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris" : "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  }
}
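To make the placeholder substitution concrete, here is a small Python sketch that fills in the request body from an output URI. The function name `build_batch_request_body` is hypothetical, and the `bigquery` value used for `predictionsFormat` with a BigQuery destination is an assumption that this page does not state; the Cloud Storage values follow the replacements described above.

```python
def build_batch_request_body(model_path, input_uri, output_uri,
                             display_name="my-cloud-storage-batch-prediction-job"):
    """Build the batchPredictionJobs.create body, mirroring the template above."""
    if output_uri.startswith("gs://"):
        # Cloud Storage output: jsonl format, gcsDestination.outputUriPrefix.
        output_format, destination, uri_field = "jsonl", "gcsDestination", "outputUriPrefix"
    elif output_uri.startswith("bq://"):
        # BigQuery output. The "bigquery" format value is an assumption (see lead-in).
        output_format, destination, uri_field = "bigquery", "bigqueryDestination", "outputUri"
    else:
        raise ValueError("output_uri must start with gs:// or bq://")
    return {
        "displayName": display_name,
        "model": model_path,
        "inputConfig": {
            "instancesFormat": "jsonl",
            # Kept as a single string to match the request body shown above.
            "gcsSource": {"uris": input_uri},
        },
        "outputConfig": {
            "predictionsFormat": output_format,
            destination: {uri_field: output_uri},
        },
    }


# Example usage: write the body to request.json for the commands that follow.
#   import json
#   body = build_batch_request_body(
#       "publishers/google/models/gemini-2.5-flash",
#       "gs://bucketname/path/to/file.jsonl",
#       "gs://mybucket/path/to/output",
#   )
#   with open("request.json", "w", encoding="utf-8") as f:
#       json.dump(body, f, indent=2)
```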
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
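-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content
Python
Install
pip install --upgrade google-genai
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI: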
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True
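With those variables set, submitting a job through the Gen AI SDK might look like the following sketch. The helper name `submit_batch_job` is hypothetical, and the model name and Cloud Storage URIs in the usage comment are the placeholder values from the REST instructions. The call relies on `client.batches.create` with a `dest` entry in its config; check the SDK reference documentation for the options available in your SDK version.

```python
def submit_batch_job(client, model, input_uri, output_uri):
    """Submit a batch job that reads JSONL from input_uri and writes to output_uri.

    `client` is expected to be a google.genai Client configured for Vertex AI.
    """
    return client.batches.create(
        model=model,
        src=input_uri,
        # The config is passed as a plain dict; "dest" is the output location.
        config={"dest": output_uri},
    )


# Example usage (requires the google-genai package and the environment variables above):
#   from google import genai
#   client = genai.Client()
#   job = submit_batch_job(
#       client,
#       "gemini-2.5-flash",
#       "gs://bucketname/path/to/file.jsonl",
#       "gs://mybucket/path/to/output",
#   )
#   print(job.name, job.state)
```

Passing the client as an argument keeps the helper easy to exercise with a stand-in object before you run it against a real project.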
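3. Monitor the job status and progress
REST
To monitor a batch prediction job, use the projects.locations.batchPredictionJobs.get method and view the CompletionStats field in the response.
Before using any of the request data, make the following replacements:
LOCATION: the region of your batch prediction job.
PROJECT_ID: your project ID.
BATCH_JOB_ID: your batch job ID.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID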
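To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID"
PowerShell
Execute the following command: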
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
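-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID" | Select-Object -Expand Content
Python
Install
pip install --upgrade google-genai
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI: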
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True
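One way to monitor a job from Python is to poll its state until it stops changing. The sketch below is illustrative only: `wait_for_job` and `TERMINAL_STATES` are hypothetical names, the 30-second interval is an arbitrary choice, and the set of terminal states is an assumption limited to JOB_STATE_SUCCEEDED, JOB_STATE_FAILED, and JOB_STATE_CANCELLED. The state lookup is passed in as a function so that the same loop works with either the SDK or the REST API.

```python
import time

# Assumed terminal states; any other state is treated as still in progress.
TERMINAL_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def wait_for_job(get_state, poll_seconds=30.0, sleep=time.sleep):
    """Call get_state() repeatedly until it returns a terminal state name."""
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        # Still pending, running, or cancelling: wait before asking again.
        sleep(poll_seconds)


# Example usage with the Gen AI SDK (assumes `client` and `job` from job submission):
#   final_state = wait_for_job(lambda: client.batches.get(name=job.name).state.name)
#   print(final_state)
```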
The status of a given batch job can be any of the following:
JOB_STATE_PENDING: The job is queued for capacity. A job can remain in the queue for up to 72 hours before entering the running state.
JOB_STATE_RUNNING: The input file was successfully validated and the batch is currently being run.
JOB_STATE_SUCCEEDED: The batch has been completed and the results are ready.
JOB_STATE_FAILED: The input file failed the validation process, or the batch could not be completed within the 24-hour time window after entering the RUNNING state.
JOB_STATE_CANCELLING: The batch is being cancelled.
JOB_STATE_CANCELLED: The batch was cancelled.
4. Retrieve batch output
For succeeded rows, model responses are stored in the response field. Otherwise, error details are stored in the status field for further inspection.
Output examples
Successful example
{
"status": "",
"processed_time": "2024-11-01T18:13:16.826+00:00",
"request": {
"contents": [
{
"parts": [
{
"fileData": null,
"text": "What is the relation between the following video and image samples?"
},
{
"fileData": {
"fileUri": "gs://cloud-samples-data/generative-ai/video/animals.mp4",
"mimeType": "video/mp4"
},
"text": null
},
{
"fileData": {
"fileUri": "gs://cloud-samples-data/generative-ai/image/cricket.jpeg",
"mimeType": "image/jpeg"
},
"text": null
}
],
"role": "user"
}
]
},
"response": {
"candidates": [
{
"avgLogprobs": -0.5782725546095107,
"content": {
"parts": [
{
"text": "This video shows a Google Photos marketing campaign where animals at the Los Angeles Zoo take self-portraits using a modified Google phone housed in a protective case. The image is unrelated."
}
],
"role": "model"
},
"finishReason": "STOP"
}
],
"modelVersion": "gemini-2.0-flash-001@default",
"usageMetadata": {
"candidatesTokenCount": 36,
"promptTokenCount": 29180,
"totalTokenCount": 29216
}
}
}
Failed example
{
"status": "Bad Request: {\"error\": {\"code\": 400, \"message\": \"Please use a valid role: user, model.\", \"status\": \"INVALID_ARGUMENT\"}}",
"processed_time": "2025-07-09T19:57:43.558+00:00",
"request": {
"contents": [
{
"parts": [
{
"text": "Explain how AI works in a few words"
}
],
"role": "tester"
}
]
},
"response": {}
}
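To separate succeeded rows from failed ones after downloading an output file, a sketch along these lines may help. It assumes that the output file holds one JSON object per line (the examples above are pretty-printed for readability) and that a non-empty `status` marks a failed row, as in the two examples. The helper names `split_results` and `first_text` are hypothetical.

```python
import json


def split_results(jsonl_text):
    """Split output rows into (succeeded_rows, failed) based on the status field."""
    succeeded, failed = [], []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        if row.get("status"):
            # A non-empty status carries the error details for this request.
            failed.append((row["request"], row["status"]))
        else:
            succeeded.append(row)
    return succeeded, failed


def first_text(row):
    """Return the text of the first part of the first candidate in a succeeded row."""
    return row["response"]["candidates"][0]["content"]["parts"][0]["text"]


# Example usage (the file name is a placeholder for a downloaded output file):
#   with open("predictions.jsonl", encoding="utf-8") as f:
#       ok_rows, errors = split_results(f.read())
#   for request, status in errors:
#       print("failed:", status)
```

Keeping the original request next to each error makes it easier to correct the failed rows and resubmit them in a new batch.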
Batch prediction for Cloud Storage
Last updated 2025-08-18 UTC.