Batch predictions with Anthropic Claude models

Batch prediction lets you send a large number of prompts that aren't latency-sensitive to an Anthropic Claude model. With online prediction, each request carries one input prompt; with batch prediction, a single request can carry many input prompts.

Supported Anthropic Claude models

Vertex AI supports batch predictions for the following Anthropic Claude models:

Claude 3.7 Sonnet (claude-3-7-sonnet@20250219), in Preview
Claude 3.5 Sonnet v2 (claude-3-5-sonnet-v2@20241022)
Claude 3.5 Haiku (claude-3-5-haiku@20241022)

Quotas

By default, you can run up to 4 concurrent batch requests in a single project.
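To stay under this limit when you submit several jobs, you can count how many of your existing jobs are still active before starting new ones. The following is a minimal sketch, not part of any Google library: remaining_batch_slots is a hypothetical helper, and it assumes that every job not yet in a terminal state counts toward the default limit of 4.

```python
# Job states after which a Vertex AI job is no longer running.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
    "JOB_STATE_PARTIALLY_SUCCEEDED",
}

# Default per-project limit on concurrent batch requests.
DEFAULT_CONCURRENT_LIMIT = 4


def remaining_batch_slots(job_states, limit=DEFAULT_CONCURRENT_LIMIT):
    """Return how many more batch jobs can start, given the states of existing jobs.

    Assumption: every job that is not in a terminal state counts toward the limit.
    """
    active = sum(1 for state in job_states if state not in TERMINAL_STATES)
    return max(limit - active, 0)


# Two active jobs and one finished job leave room for two more.
print(remaining_batch_slots(
    ["JOB_STATE_RUNNING", "JOB_STATE_PENDING", "JOB_STATE_SUCCEEDED"]
))
```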

Prepare input

Before you begin, prepare your input dataset in a BigQuery table or as a JSONL file in Cloud Storage. The input for both sources must follow the Anthropic Claude API Schema JSON format, as shown in the following example:

{
  "custom_id": "request-1",
  "request": {
    "messages": [{"role": "user", "content": "Hello!"}],
    "anthropic_version": "vertex-2023-10-16",
    "max_tokens": 50
  }
}
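If you generate many prompts programmatically, a few lines of Python can produce records in this format. This is a minimal sketch: make_request_line is a hypothetical helper and the prompts are placeholders. Each record is serialized with json.dumps without indentation, because JSONL expects exactly one record per line.

```python
import json


def make_request_line(custom_id, prompt, max_tokens=50):
    """Build one batch input record in the format shown above, as a single line."""
    record = {
        "custom_id": custom_id,
        "request": {
            "messages": [{"role": "user", "content": prompt}],
            "anthropic_version": "vertex-2023-10-16",
            "max_tokens": max_tokens,
        },
    }
    # No indent argument: JSONL needs each record on exactly one line.
    return json.dumps(record)


prompts = ["Hello!", "Write a haiku about autumn."]
lines = [make_request_line(f"request-{i}", p) for i, p in enumerate(prompts, start=1)]
jsonl_text = "\n".join(lines) + "\n"
print(jsonl_text)
```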

BigQuery

Your BigQuery input table must adhere to the following schema:

Column name	Description
custom_id	An ID for each request, used to match the input with the output.
request	The request body, which contains your input prompt and must follow the Anthropic Claude API schema.

Your input table can have other columns. The batch job ignores them and passes them through unchanged to the output table.
Batch prediction jobs reserve two column names for their output: response (JSON) and status. Don't use these column names in your input table.
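Before you start a job, you can check the input table's column names against these rules. This sketch assumes you already have the column names as a list (for example, read from the table schema); check_input_columns is a hypothetical helper. It compares names in lowercase because BigQuery column names are case-insensitive.

```python
# Columns the batch job requires in the input table.
REQUIRED_COLUMNS = {"custom_id", "request"}
# Column names the batch job reserves for its output.
RESERVED_COLUMNS = {"response", "status"}


def check_input_columns(columns):
    """Return a list of problems with a BigQuery input table's column names."""
    # BigQuery column names are case-insensitive, so compare in lowercase.
    names = {name.lower() for name in columns}
    problems = []
    for missing in sorted(REQUIRED_COLUMNS - names):
        problems.append(f"missing required column: {missing}")
    for clash in sorted(RESERVED_COLUMNS & names):
        problems.append(f"reserved output column in input: {clash}")
    return problems


print(check_input_columns(["custom_id", "request", "source"]))  # []
print(check_input_columns(["custom_id", "Status"]))
```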

Cloud Storage

For Cloud Storage, the input file must be a JSONL file in a Cloud Storage bucket, with one request record per line.
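A quick local check of the JSONL file can catch malformed lines before you upload it. The following sketch (validate_jsonl is a hypothetical helper) checks only the fields that appear in the earlier example record and flags duplicate custom_id values, since duplicates would make it impossible to match outputs back to inputs unambiguously. It does not validate the full Anthropic Claude API schema.

```python
import json

# Fields that the example request record contains.
REQUEST_FIELDS = ("messages", "anthropic_version", "max_tokens")


def validate_jsonl(text):
    """Return a list of problems found in JSONL batch input text."""
    errors = []
    seen_ids = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            errors.append(f"line {lineno}: empty line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {lineno}: not a JSON object")
            continue
        custom_id = record.get("custom_id")
        if not isinstance(custom_id, str):
            errors.append(f"line {lineno}: missing or non-string custom_id")
        elif custom_id in seen_ids:
            errors.append(f"line {lineno}: duplicate custom_id {custom_id!r}")
        else:
            seen_ids.add(custom_id)
        request = record.get("request")
        if not isinstance(request, dict):
            errors.append(f"line {lineno}: missing request object")
            continue
        for field in REQUEST_FIELDS:
            if field not in request:
                errors.append(f"line {lineno}: request is missing {field}")
    return errors


sample = (
    '{"custom_id": "request-1", "request": {"messages": [{"role": "user", '
    '"content": "Hello!"}], "anthropic_version": "vertex-2023-10-16", "max_tokens": 50}}\n'
)
print(validate_jsonl(sample))  # []
```

To check a real file, pass its contents, for example the result of reading input.jsonl, to validate_jsonl.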

Request a batch prediction

Make a batch prediction against a Claude model by using input from BigQuery or Cloud Storage. You can independently choose to output predictions to either a BigQuery table or a JSONL file in a Cloud Storage bucket.

BigQuery

Specify your BigQuery input table, model, and output location. The batch prediction job and your table must be in the same region.

REST

Before using any of the request data, make the following replacements:

LOCATION: A region that supports the selected Anthropic Claude model (see Claude Regions).
PROJECT_ID: Your project ID.
JOB_NAME: A display name for your batch prediction job.
MODEL: The name of the model.
INPUT_URI: The BigQuery table where your batch prediction input is located, such as bq://myproject.mydataset.input_table.
OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
OUTPUT_URI: For BigQuery, specify the table location, such as bq://myproject.mydataset.output_result. For Cloud Storage, specify the bucket and folder location, such as gs://mybucket/path/to/outputfile.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs
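All requests on this page use this URL, with the job ID appended when you check a job's status, so it can help to build it in one place. A minimal sketch: batch_jobs_url is a hypothetical helper, and my-project and us-east5 in the usage line are example values only. Substitute your project and a region that supports your model.

```python
def batch_jobs_url(project_id, location, job_id=None):
    """Build the batchPredictionJobs URL used by the requests on this page."""
    base = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/batchPredictionJobs"
    )
    # Appending a job ID addresses a single job, as in the status request.
    return f"{base}/{job_id}" if job_id else base


# Example values only; use your own project and a supported region.
print(batch_jobs_url("my-project", "us-east5"))
```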

Request JSON body:

{
  "displayName": "JOB_NAME",
  "model": "publishers/anthropic/models/MODEL",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  }
}
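Instead of editing placeholders by hand, you can generate request.json from a script. This is a convenience sketch under stated assumptions, not an official client: the helper names are hypothetical, and output_config infers OUTPUT_FORMAT, DESTINATION, and OUTPUT_URI_FIELD_NAME from the bq:// or gs:// prefix of OUTPUT_URI, following the replacement rules listed earlier. Replace MODEL and the URIs with your own values.

```python
import json


def output_config(output_uri):
    """Choose OUTPUT_FORMAT, DESTINATION, and OUTPUT_URI_FIELD_NAME from the URI prefix."""
    if output_uri.startswith("bq://"):
        return {
            "predictionsFormat": "bigquery",
            "bigqueryDestination": {"outputUri": output_uri},
        }
    if output_uri.startswith("gs://"):
        return {
            "predictionsFormat": "jsonl",
            "gcsDestination": {"outputUriPrefix": output_uri},
        }
    raise ValueError(f"expected a bq:// or gs:// URI, got {output_uri!r}")


def bigquery_job_body(job_name, model, input_uri, output_uri):
    """Build the request body for a batch job that reads from a BigQuery table."""
    return {
        "displayName": job_name,
        "model": f"publishers/anthropic/models/{model}",
        "inputConfig": {
            "instancesFormat": "bigquery",
            "bigquerySource": {"inputUri": input_uri},
        },
        "outputConfig": output_config(output_uri),
    }


body = bigquery_job_body(
    job_name="my-claude-batch-job",  # hypothetical display name
    model="MODEL",  # replace with the model name
    input_uri="bq://myproject.mydataset.input_table",
    output_uri="bq://myproject.mydataset.output_result",
)

# Write the body to request.json for the curl or PowerShell commands.
with open("request.json", "w", encoding="utf-8") as f:
    json.dump(body, f, indent=2)
```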

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Response

{
  "name": "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "JOB_NAME",
  "model": "publishers/anthropic/models/MODEL",
  "inputConfig": {
    "instancesFormat":"bigquery",
    "bigquerySource":{
      "inputUri" : "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat":"OUTPUT_FORMAT",
    "DESTINATION":{
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2024-10-16T19:33:59.153782Z",
  "updateTime": "2024-10-16T19:33:59.153782Z",
  "modelVersionId": "1"
}
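If you'd rather submit the job from Python than from curl or PowerShell, the standard library is enough. This minimal sketch mirrors the curl command; it assumes the gcloud CLI is installed and logged in, and all function names are hypothetical. job_id_from_response returns the last segment of the name field in the response, which is the ID you need when you check the job's status.

```python
import json
import subprocess
import urllib.request


def access_token():
    """Get an OAuth access token from the gcloud CLI, as the curl example does."""
    return subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()


def create_job_request(url, body, token):
    """Build the same POST request that the curl command sends."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def submit(url, body):
    """Send the create request and return the parsed JSON response."""
    req = create_job_request(url, body, access_token())
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def job_id_from_response(response):
    """Return the last segment of the job's resource name (BATCH_JOB_ID)."""
    return response["name"].rsplit("/", 1)[-1]
```

For example, job_id_from_response(submit(url, body)) sends the request and returns the new job's ID.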

Cloud Storage

Specify your JSONL file's Cloud Storage location, model, and output location.

REST

Before using any of the request data, make the following replacements:

LOCATION: A region that supports the selected Anthropic Claude model (see Claude Regions).
PROJECT_ID: Your project ID.
JOB_NAME: A display name for your batch prediction job.
MODEL: The name of the model.
INPUT_URIS: The Cloud Storage location of your JSONL batch prediction input, such as gs://bucketname/path/to/jsonl. To read from multiple files, add each location as a separate string in the uris array.
OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
OUTPUT_URI: For BigQuery, specify the table location, such as bq://myproject.mydataset.output_result. For Cloud Storage, specify the bucket and folder location, such as gs://mybucket/path/to/outputfile.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs

Request JSON body:

{
  "displayName": "JOB_NAME",
  "model": "publishers/anthropic/models/MODEL",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris": ["INPUT_URIS"]
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  }
}
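The uris field holds a list of locations, and the job response echoes it as a JSON array. This sketch (gcs_input_config is a hypothetical helper) always sends a list, accepts a single location for convenience, and rejects anything that isn't a gs:// URI before the request reaches the API.

```python
def gcs_input_config(uris):
    """Build inputConfig for one or more JSONL files in Cloud Storage."""
    if isinstance(uris, str):
        uris = [uris]  # accept a single location for convenience
    uris = list(uris)
    if not uris:
        raise ValueError("at least one input URI is required")
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    return {"instancesFormat": "jsonl", "gcsSource": {"uris": uris}}


print(gcs_input_config(["gs://bucketname/part-1.jsonl", "gs://bucketname/part-2.jsonl"]))
```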

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Response

{
  "name": "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "JOB_NAME",
  "model": "publishers/anthropic/models/MODEL",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris": [
        "INPUT_URIS"
      ]
    }
  },
  "outputConfig": {
    "predictionsFormat":"OUTPUT_FORMAT",
    "DESTINATION":{
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2024-10-16T19:33:59.153782Z", 
  "updateTime": "2024-10-16T19:33:59.153782Z", 
  "modelVersionId": "1"
}

Get the status of a batch prediction job

Get the status of your batch prediction job to check whether it has completed successfully.

REST

Before using any of the request data, make the following replacements:

PROJECT_ID: Your project ID.
LOCATION: The region where your batch job is located.
JOB_ID: The batch job ID that was returned when you created the job. It's the last segment of the name field in the creation response, shown there as BATCH_JOB_ID.

HTTP method and URL:

GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID"

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Response

{
  "name": "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "JOB_NAME",
  "model": "publishers/anthropic/models/MODEL",
  "inputConfig": {
    "instancesFormat":"bigquery",
    "bigquerySource":{
      "inputUri" : "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat":"OUTPUT_FORMAT",
    "DESTINATION":{
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "state": "JOB_STATE_SUCCEEDED",
  "createTime": "2024-10-16T19:33:59.153782Z", 
  "updateTime": "2024-10-16T19:33:59.153782Z", 
  "modelVersionId": "1"
}
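Batch jobs can take a long time, so you typically poll the status endpoint until the job reaches a terminal state. The following is one way to do that with the standard library. The function names and the 60-second interval and 24-hour timeout are hypothetical choices, and the set of terminal states is taken from the Vertex AI JobState enum, which you should confirm against the current API reference. fetch_job is any callable that returns the parsed job resource.

```python
import json
import time
import urllib.request

# Job states after which a Vertex AI job is no longer running.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
    "JOB_STATE_PARTIALLY_SUCCEEDED",
}


def get_job(url, token):
    """Send the GET request from the curl example and return the parsed job."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(fetch_job, poll_seconds=60, timeout_seconds=24 * 3600,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch_job until the job reaches a terminal state, then return the job."""
    deadline = clock() + timeout_seconds
    while True:
        job = fetch_job()
        state = job.get("state")
        if state in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job still in {state} after {timeout_seconds} seconds")
        sleep(poll_seconds)
```

For example, wait_for_job(lambda: get_job(job_url, token)) blocks until the job finishes. Access tokens from gcloud auth print-access-token are short-lived, so for a long wait, obtain a fresh token inside fetch_job rather than reusing one.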

Retrieve batch prediction output

When a batch prediction job completes, retrieve the output from the location that you specified. For BigQuery, the output is in the response (JSON) column of your destination table. For Cloud Storage, the output is saved as a JSONL file in the output Cloud Storage location.
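For Cloud Storage output, you can pair each result with its request through custom_id. This page doesn't describe the full layout of an output record, so this sketch assumes only that each output line is a JSON object that carries the custom_id from the input; index_output_by_custom_id and missing_ids are hypothetical helpers. missing_ids helps you spot requests that produced no output.

```python
import json


def index_output_by_custom_id(jsonl_text):
    """Map each output record's custom_id to the whole parsed record.

    Assumption: every non-empty output line is a JSON object with a custom_id key.
    """
    results = {}
    for line in jsonl_text.splitlines():
        if line.strip():
            record = json.loads(line)
            results[record["custom_id"]] = record
    return results


def missing_ids(input_ids, results):
    """Return the input custom_ids that have no matching output record."""
    return sorted(set(input_ids) - set(results))
```

To use it, download the output file, read its text, and pass it to index_output_by_custom_id, then compare against the custom_id values from your input.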

Known issues

This feature has the following known issues.

Internal errors for the first batch job in a region

The first time you run a batch prediction job in a region, the job might fail with the following error:

"state": "JOB_STATE_FAILED", "error": { "code": 13, "message": "INTERNAL" }

This error occurs because the internal service account that runs batch prediction jobs is newly created and hasn't yet propagated through the provisioning pipeline, so the job fails with internal permission errors. If you encounter this error, wait about 10 minutes and then resubmit the batch request.
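If you script job submission, you can automate this workaround. The following sketch is hypothetical: run_job stands for your own function that submits a job and waits for it to finish, and the helper waits 600 seconds (the roughly 10 minutes suggested here) before resubmitting once. It retries only on the exact failure shown: state JOB_STATE_FAILED with error code 13.

```python
import time

# Error code from the failure shown above ("code": 13, "message": "INTERNAL").
INTERNAL_ERROR_CODE = 13


def is_first_job_internal_error(job):
    """Return True if the finished job failed with the INTERNAL error."""
    error = job.get("error") or {}
    return job.get("state") == "JOB_STATE_FAILED" and error.get("code") == INTERNAL_ERROR_CODE


def run_with_first_job_retry(run_job, retry_delay_seconds=600, max_retries=1, sleep=time.sleep):
    """Run a job; if it fails with the INTERNAL error, wait and resubmit."""
    job = run_job()
    for _ in range(max_retries):
        if not is_first_job_internal_error(job):
            break
        sleep(retry_delay_seconds)
        job = run_job()
    return job
```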

Canceled jobs won't return any results

Due to a known bug, if you cancel a job before it finishes or times out, no results appear in the output location, not even for requests that were already processed.