You can stream your Claude responses to reduce perceived end-user latency. A streamed response uses server-sent events (SSE) to return the response incrementally.
You pay for Claude models as you use them (pay as you go), or you pay a fixed fee when using provisioned throughput. For pay-as-you-go pricing, see Anthropic's Claude models on the Vertex AI pricing page.
Available Claude models
The following models are available from Anthropic to use in Vertex AI. To access a Claude model, go to its Model Garden model card.
Claude 3.5 Sonnet
Anthropic's Claude 3.5 Sonnet is its most powerful AI model, and it maintains the speed and cost of the mid-tier Claude 3 Sonnet. Claude 3.5 Sonnet is optimized for the following use cases:
Coding, such as writing, editing, and running code with sophisticated reasoning and troubleshooting capabilities.
Customer support, such as handling complex queries by understanding user context and orchestrating multi-step workflows.
Data science and analysis by navigating unstructured data and leveraging multiple tools to generate insights.
Visual processing, such as interpreting charts and graphs that require visual understanding.
Writing content with a more natural, human-like tone.
Go to the Claude 3.5 Sonnet model card
Claude 3 Opus
Anthropic's Claude 3 Opus is Anthropic's second-most powerful AI model, with strong performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. Claude 3 Opus is optimized for the following use cases:
Task automation, such as interactive coding and planning, or running complex actions across APIs and databases.
Research and development tasks, such as research review, brainstorming and hypothesis generation, and product testing.
Strategy tasks, such as advanced analysis of charts and graphs, financials and market trends, and forecasting.
Vision tasks, such as processing images to return text output. Also, analysis of charts, graphs, technical diagrams, reports, and other visual content.
Go to the Claude 3 Opus model card
Claude 3 Haiku
Anthropic's Claude 3 Haiku is Anthropic's fastest, most compact vision and text model for near-instant responses to simple queries, meant for seamless AI experiences mimicking human interactions. Claude 3 Haiku is optimized for the following use cases:
Live customer interactions and translations.
Content moderation to catch suspicious behavior or customer requests.
Cost-saving tasks, such as inventory management and knowledge extraction from unstructured data.
Vision tasks, such as processing images to return text output, analysis of charts, graphs, technical diagrams, reports, and other visual content.
Go to the Claude 3 Haiku model card
Claude 3 Sonnet
Anthropic's Claude 3 Sonnet is a dependable combination of skills and speed, engineered for scaled AI deployments across a variety of use cases. Claude 3 Sonnet is optimized for the following use cases:
Data processing, including retrieval-augmented generation (RAG) and search retrieval.
Sales tasks, such as product recommendations, forecasting, and targeted marketing.
Time-saving tasks, such as code generation, quality control, and optical character recognition (OCR) in images.
Vision tasks, such as processing images to return text output. Also, analysis of charts, graphs, technical diagrams, reports, and other visual content.
Go to the Claude 3 Sonnet model card
Use Claude models
You can use Anthropic's SDK or curl commands to send requests to the Vertex AI endpoint using the following model names:
- For Claude 3.5 Sonnet, use claude-3-5-sonnet@20240620.
- For Claude 3 Opus, use claude-3-opus@20240229.
- For Claude 3 Haiku, use claude-3-haiku@20240307.
- For Claude 3 Sonnet, use claude-3-sonnet@20240229.
We recommend using the Claude model versions that include a suffix starting with an @ symbol (such as claude-3-5-sonnet@20240620 or claude-3-haiku@20240307) because of possible differences between model versions. If you don't specify a model version, the latest version is always used, which can inadvertently affect your workflows when a model version changes.
Before you begin
To use Anthropic's Claude models with Vertex AI, you must perform the following steps. The Vertex AI API (aiplatform.googleapis.com) must be enabled to use Vertex AI. If you already have an existing project with the Vertex AI API enabled, you can use that project instead of creating a new project.
Make sure you have the required permissions to enable and use partner models. For more information, see Grant the required permissions.
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the Vertex AI API.
- Go to one of the following Model Garden model cards, then click Enable:
Use Anthropic's SDK
You can make API requests to Anthropic's Claude models using the Anthropic Claude SDK. To learn more, see the following:
- Claude messages API reference
- Anthropic's Python API library
- Anthropic's Vertex AI TypeScript API Library
Make a streaming call to a Claude model using Anthropic's Vertex SDK
The following code sample uses Anthropic's Vertex SDK to perform a streaming call to a Claude model.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Make a unary call to a Claude model using Anthropic's Vertex SDK
The following code sample uses Anthropic's Vertex SDK to perform a unary call to a Claude model.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Use a curl command
You can use a curl command to make a request to the Vertex AI endpoint. The curl command specifies which supported Claude model you want to use.
We recommend using the Claude model versions that include a suffix starting with an @ symbol (such as claude-3-5-sonnet@20240620 or claude-3-haiku@20240307) because of possible differences between model versions. If you don't specify a model version, the latest version is always used, which can inadvertently affect your workflows when a model version changes.
The following topic shows you how to create a curl command and includes a sample curl command.
REST
To test a text prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.
Before using any of the request data, make the following replacements:
- LOCATION: A region that supports Anthropic Claude models. Claude 3.5 Sonnet is available in the following regions:
  - us-east5 (Ohio)
  - asia-southeast1 (Singapore)
  - europe-west1 (Belgium)
- MODEL: The model name you want to use.
- ROLE: The role associated with a message. You can specify user or assistant. The first message must use the user role. Claude models operate with alternating user and assistant turns. If the final message uses the assistant role, then the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.
- STREAM: A boolean that specifies whether the response is streamed. Streaming your response reduces the end-user perception of latency. Set to true to stream the response and false to return the response all at once.
- CONTENT: The content, such as text, of the user or assistant message.
- MAX_OUTPUT_TOKENS: Maximum number of tokens that can be generated in the response. A token is approximately 3.5 characters. 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses.
- TOP_P (Optional): Top-P changes how the model selects tokens for output. Tokens are selected from the most (see top-K) to least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is 0.5, then the model selects either A or B as the next token by using temperature and excludes C as a candidate. Specify a lower value for less random responses and a higher value for more random responses.
- TOP_K (Optional): Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature. For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P, with the final token selected using temperature sampling. Specify a lower value for less random responses and a higher value for more random responses.
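To make the filtering order described above concrete, here is an illustrative pure-Python sketch (not the model's actual implementation): keep the top-K tokens, then keep the smallest prefix whose cumulative probability reaches top-P, then sample from the survivors using temperature.

```python
import random

def sample_next_token(probs: dict[str, float], top_k: int, top_p: float,
                      temperature: float = 1.0, seed: int = 0) -> str:
    """Illustrative decoding sketch of top-K -> top-P -> temperature sampling."""
    # 1) Top-K: keep only the K most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # 2) Top-P: keep tokens until their cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # 3) Temperature sampling over the surviving tokens.
    weights = [p ** (1.0 / temperature) for _, p in kept]
    rng = random.Random(seed)
    return rng.choices([token for token, _ in kept], weights=weights)[0]

# With the example above: A=0.3, B=0.2, C=0.1 and top-P of 0.5,
# only A and B survive the filter; C is excluded as a candidate.
```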
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict
Request JSON body:
{ "anthropic_version": "vertex-2023-10-16", "messages": [ { "role": "ROLE", "content": "CONTENT" }], "max_tokens": MAX_TOKENS, "stream": STREAM }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Example curl command
MODEL_ID="MODEL"
LOCATION="us-central1"
PROJECT_ID="PROJECT_ID"
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/anthropic/models/${MODEL_ID}:streamRawPredict -d \
'{
  "anthropic_version": "vertex-2023-10-16",
  "messages": [{
    "role": "user",
    "content": "Hello!"
  }],
  "max_tokens": 50,
  "stream": true
}'
Tool use (function calling)
Anthropic's Claude models support tools and function calling to enhance a model's capabilities. For more information, see the Tool use overview in the Anthropic documentation.
The following samples demonstrate how to use tools by using Anthropic's SDK or curl command. The samples search for nearby restaurants in San Francisco that are currently open.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
REST
Before using any of the request data, make the following replacements:
- LOCATION: A region that supports Anthropic Claude models. Claude 3.5 Sonnet is available in the following regions:
  - us-east5 (Ohio)
  - asia-southeast1 (Singapore)
  - europe-west1 (Belgium)
- MODEL: The model name you want to use.
  - For Claude 3 Opus, use claude-3-opus@20240229.
  - For Claude 3 Sonnet, use claude-3-sonnet@20240229.
  - For Claude 3 Haiku, use claude-3-haiku@20240307.
- ROLE: The role associated with a message. You can specify user or assistant. The first message must use the user role. Claude models operate with alternating user and assistant turns. If the final message uses the assistant role, then the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.
- STREAM: A boolean that specifies whether the response is streamed. Streaming your response reduces the end-user perception of latency. Set to true to stream the response and false to return the response all at once.
- CONTENT: The content, such as text, of the user or assistant message.
- MAX_OUTPUT_TOKENS: Maximum number of tokens that can be generated in the response. A token is approximately 3.5 characters. 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict
Request JSON body:
{ "anthropic_version": "vertex-2023-10-16", "max_tokens": MAX_TOKENS, "stream": STREAM, "tools": [ { "name": "text_search_places_api", "description": "Returns information about a set of places based on a string", "input_schema": { "type": "object", "properties": { "textQuery": { "type": "string", "description": "The text string on which to search" }, "priceLevels": { "type": "array", "description": "Price levels to query places, value can be one of [PRICE_LEVEL_INEXPENSIVE, PRICE_LEVEL_MODERATE, PRICE_LEVEL_EXPENSIVE, PRICE_LEVEL_VERY_EXPENSIVE]", }, "openNow": { "type": "boolean", "description": "Describes whether a place is open for business at the time of the query." }, }, "required": ["textQuery"] } } ], "messages": [ { "role": "user", "content": "What are some affordable and good Italian restaurants that are open now in San Francisco??" } ] }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Anthropic's Claude region availability
Claude 3.5 Sonnet is available in the following regions:
us-east5 (Ohio)
asia-southeast1 (Singapore)
europe-west1 (Belgium)
Anthropic's Claude quotas and supported context length
For Claude models, a quota applies for each region where the model is available. The quota is specified in queries per minute (QPM) and tokens per minute (TPM). TPM includes both input and output tokens.
The default quota limit and supported context length for Claude 3.5 Sonnet are:
| Region | Quota system | Supported context length |
|---|---|---|
| us-east5 (Ohio) | Supports dynamic shared quota | 200,000 tokens |
| asia-southeast1 (Singapore) | Supports dynamic shared quota | 200,000 tokens |
| europe-west1 (Belgium) | Supports dynamic shared quota | 200,000 tokens |
The default quota limit and supported context length for Claude 3 Opus are:
| Region | Default quota limit | Supported context length |
|---|---|---|
| us-east5 (Ohio) | Supports dynamic shared quota | 200,000 tokens |
The default quota limit and supported context length for Claude 3 Haiku are:
| Region | Default quota limit | Supported context length |
|---|---|---|
| us-east5 (Ohio) | Supports dynamic shared quota | 200,000 tokens |
| asia-southeast1 (Singapore) | Supports dynamic shared quota | 200,000 tokens |
| europe-west1 (Belgium) | Supports dynamic shared quota | 200,000 tokens |
The default quota limit and supported context length for Claude 3 Sonnet are:
| Region | Default quota limit | Supported context length |
|---|---|---|
| us-east5 (Ohio) | Supports dynamic shared quota | 200,000 tokens |
If you want to increase any of your quotas for Generative AI on Vertex AI, you can use the Google Cloud console to request a quota increase. To learn more about quotas, see Work with quotas.