Getting responses in a batch is a way to efficiently send large numbers of non-latency-sensitive embeddings requests. Unlike online responses, where you are limited to one input request at a time, you can send a large number of LLM requests in a single batch request. Similar to how batch prediction is done for tabular data in Vertex AI, you determine your output location, add your input, and your responses asynchronously populate into your output location.
Text embeddings models that support batch predictions
All stable versions of text embedding models support batch predictions. Stable versions are versions that are no longer in preview and are fully supported for production environments. To see the full list of supported embedding models, see Embedding model and versions.
Prepare your inputs
The input for batch requests is a list of prompts that can either be stored in a BigQuery table or as a JSON Lines (JSONL) file in Cloud Storage. Each request can include up to 30,000 prompts.
JSONL example
This section shows examples of how to format JSONL input and output.
JSONL input example
{"content":"Give a short description of a machine learning model:"}
{"content":"Best recipe for banana bread:"}
JSONL output example
{"instance":{"content":"Give..."},"predictions": [{"embeddings":{"statistics":{"token_count":8,"truncated":false},"values":[0.2,....]}}],"status":""}
{"instance":{"content":"Best..."},"predictions": [{"embeddings":{"statistics":{"token_count":3,"truncated":false},"values":[0.1,....]}}],"status":""}
BigQuery example
This section shows examples of how to format BigQuery input and output.
BigQuery input example
This example shows a single-column BigQuery table.
| content |
| --- |
| "Give a short description of a machine learning model:" |
| "Best recipe for banana bread:" |
BigQuery output example
| content | predictions | status |
| --- | --- | --- |
| "Give a short description of a machine learning model:" | '[{"embeddings": {"statistics": {"token_count": 8, "truncated": false}, "values": [0.1, ....]}}]' |  |
| "Best recipe for banana bread:" | '[{"embeddings": {"statistics": {"token_count": 3, "truncated": false}, "values": [0.2, ....]}}]' |  |
Request a batch response
Depending on the number of input items that you've submitted, a batch generation task can take some time to complete.
REST
To create a batch prediction job by using the Vertex AI API, send a POST request to the batchPredictionJobs endpoint.

Before using any of the request data, make the following replacements:

- PROJECT_ID: the ID of your Google Cloud project.
- BP_JOB_NAME: a display name for the batch prediction job.
- INPUT_URI: the BigQuery URI of your input table, for example bq://project_name.dataset_name.table_name.
- OUTPUT_URI: the BigQuery URI of the destination table where the predictions are written.

HTTP method and URL:
POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs
Request JSON body:
{
"name": "BP_JOB_NAME",
"displayName": "BP_JOB_NAME",
"model": "publishers/google/models/textembedding-gecko",
"inputConfig": {
"instancesFormat":"bigquery",
"bigquerySource":{
"inputUri" : "INPUT_URI"
}
},
"outputConfig": {
"predictionsFormat":"bigquery",
"bigqueryDestination":{
"outputUri": "OUTPUT_URI"
}
}
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" | Select-Object -Expand Content
{
"name": "projects/123456789012/locations/us-central1/batchPredictionJobs/1234567890123456789",
"displayName": "BP_sample_publisher_BQ_20230712_134650",
"model": "projects/{PROJECT_ID}/locations/us-central1/models/textembedding-gecko",
"inputConfig": {
"instancesFormat": "bigquery",
"bigquerySource": {
"inputUri": "bq://project_name.dataset_name.text_input"
}
},
"modelParameters": {},
"outputConfig": {
"predictionsFormat": "bigquery",
"bigqueryDestination": {
"outputUri": "bq://project_name.llm_dataset.embedding_out_BP_sample_publisher_BQ_20230712_134650"
}
},
"state": "JOB_STATE_PENDING",
"createTime": "2023-07-12T20:46:52.148717Z",
"updateTime": "2023-07-12T20:46:52.148717Z",
"labels": {
"owner": "sample_owner",
"product": "llm"
},
"modelVersionId": "1",
"modelMonitoringStatus": {}
}
The response includes a unique identifier for the batch job. You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:

curl \
-X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID
Python
Install the Google Gen AI SDK:
pip install --upgrade google-genai
To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
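With the environment configured, here is a minimal sketch that creates an embeddings batch job with the Gen AI SDK and polls it until it finishes; the model name, input file, and output bucket below are placeholder values:

import time

from google import genai
from google.genai.types import CreateBatchJobConfig, JobState

client = genai.Client()  # Reads the environment variables set above.

# Create the batch job. The source JSONL file and output bucket are
# placeholders; replace them with your own Cloud Storage paths.
job = client.batches.create(
    model="text-embedding-005",  # example embedding model name
    src="gs://your-bucket/embeddings_input.jsonl",
    config=CreateBatchJobConfig(dest="gs://your-bucket/embeddings_output"),
)
print(f"Job name: {job.name}, state: {job.state}")

# Poll until the job reaches a terminal state.
completed_states = {
    JobState.JOB_STATE_SUCCEEDED,
    JobState.JOB_STATE_FAILED,
    JobState.JOB_STATE_CANCELLED,
    JobState.JOB_STATE_PAUSED,
}
while job.state not in completed_states:
    time.sleep(30)
    job = client.batches.get(name=job.name)
    print(f"Job state: {job.state}")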
Retrieve batch output
When a batch prediction task is complete, the output is stored in the Cloud Storage bucket or BigQuery table that you specified in your request.
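If you chose BigQuery output, one way to read the embeddings back is with the google-cloud-bigquery client. A minimal sketch, using a placeholder table name and parsing the predictions column shown in the output example above:

import json

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder table name; use the outputUri that you specified in your request.
query = """
SELECT content, predictions
FROM `project_name.llm_dataset.embedding_output`
"""

for row in client.query(query).result():
    # Each predictions cell holds a JSON array like the output example above.
    predictions = row["predictions"]
    if isinstance(predictions, str):
        predictions = json.loads(predictions)
    values = predictions[0]["embeddings"]["values"]
    print(row["content"], len(values))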
What's next
- Learn how to get text embeddings.