Use the Claude 3 models from Anthropic

Anthropic Claude 3 models on Vertex AI are offered as fully managed, serverless APIs. To use a Claude model on Vertex AI, send a request directly to the Vertex AI API endpoint. Because Anthropic Claude 3 models use a managed API, you don't need to provision or manage infrastructure.

You can stream your Claude responses to reduce perceived end-user latency. A streamed response uses server-sent events (SSE) to deliver the response incrementally.
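To illustrate the SSE wire format that streamed responses use, the following hypothetical Python sketch splits a raw SSE stream into events. The event names and payloads shown are examples; the SDKs covered later in this document handle this parsing for you.

```python
# Illustrative sketch only: a minimal parser for the server-sent events (SSE)
# wire format. Real clients should use an SDK or a dedicated SSE library.

def parse_sse(raw: str) -> list[dict]:
    """Split a raw SSE stream into events with 'event' and 'data' fields."""
    events = []
    for block in raw.strip().split("\n\n"):
        event = {"event": "message", "data": ""}
        for line in block.splitlines():
            if line.startswith("event:"):
                event["event"] = line[len("event:"):].strip()
            elif line.startswith("data:"):
                event["data"] += line[len("data:"):].strip()
        events.append(event)
    return events


raw = 'event: content_block_delta\ndata: {"text": "Hello"}\n\n'
print(parse_sse(raw))
```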

You pay for Claude models as you use them. For more information, see the section about Anthropic Claude models on the Vertex AI pricing page.

Available Anthropic Claude models

The following models are available from Anthropic to use in Vertex AI. To access an Anthropic Claude 3 model, go to its Model Garden model card.

Claude 3 Opus (Preview)

Anthropic Claude 3 Opus (Preview) is the most capable Anthropic model and can perform complex tasks quickly. It's built to navigate open-ended prompts and new scenarios.

Claude 3 Opus (Preview) is optimized for the following use cases:

  • Task automation, such as interactive coding and planning, or running complex actions across APIs and databases.

  • Research and development tasks, such as research review, brainstorming and hypothesis generation, and product testing.

  • Strategy tasks, such as advanced analysis of charts and graphs, financials and market trends, and forecasting.

  • Vision tasks, such as processing images to return text output. Also, analysis of charts, graphs, technical diagrams, reports, and other visual content.

Go to the Claude 3 Opus model card

Claude 3 Sonnet

Anthropic Claude 3 Sonnet provides a balance between intelligence and speed for enterprise workloads. It's a high-endurance model for scaled AI that's available at a competitive price. Claude 3 Sonnet is optimized for the following use cases:

  • Data processing, including retrieval-augmented generation (RAG) and search retrieval.

  • Sales tasks, such as product recommendations, forecasting, and targeted marketing.

  • Time-saving tasks, such as code generation, quality control, and optical character recognition (OCR) in images.

  • Vision tasks, such as processing images to return text output. Also, analysis of charts, graphs, technical diagrams, reports, and other visual content.

Go to the Claude 3 Sonnet model card

Claude 3 Haiku

Anthropic Claude 3 Haiku is the fastest, most compact model available from Anthropic. It's designed to answer queries and requests quickly, and you can use it to build AI experiences that mimic human interactions. Claude 3 Haiku is optimized for the following use cases:

  • Live customer interactions and translations.

  • Content moderation to catch suspicious behavior or customer requests.

  • Cost-saving tasks, such as inventory management and knowledge extraction from unstructured data.

  • Vision tasks, such as processing images to return text output, analysis of charts, graphs, technical diagrams, reports, and other visual content.

Go to the Claude 3 Haiku model card

Use Claude models

You can use an Anthropic SDK or curl commands to send requests to the Vertex AI endpoint using the following model names:

  • For Claude 3 Opus (Preview), use claude-3-opus@20240229.
  • For Claude 3 Sonnet, use claude-3-sonnet@20240229.
  • For Claude 3 Haiku, use claude-3-haiku@20240307.

We don't recommend using the Anthropic Claude 3 model names that lack an @ version suffix (claude-3-opus, claude-3-sonnet, or claude-3-haiku).
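As a guard against accidentally using an unpinned model name, you could add a check like the following hypothetical Python sketch. The pattern reflects the versioned names listed above; the helper name is an illustration, not part of any SDK.

```python
import re

# Illustrative sketch: verify that a Claude 3 model ID is pinned to a specific
# version (an @ suffix such as @20240229), as recommended in this guide.
VERSIONED = re.compile(r"^claude-3-(opus|sonnet|haiku)@\d{8}$")


def is_pinned(model_id: str) -> bool:
    """Return True only for versioned Claude 3 model names."""
    return VERSIONED.fullmatch(model_id) is not None


print(is_pinned("claude-3-sonnet@20240229"))  # True
print(is_pinned("claude-3-sonnet"))           # False: version suffix missing
```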

Before you begin

To use Anthropic Claude 3 models with Vertex AI, perform the following steps. The Vertex AI API (aiplatform.googleapis.com) must be enabled. If you already have a project with the Vertex AI API enabled, you can use that project instead of creating a new one.

Make sure you have the required permissions to enable Anthropic Claude 3 models. For more information, see Set the required permissions to enable Claude models and send prompts.

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI API.

    Enable the API

  5. Make sure you have permissions to enable Anthropic Claude models and to send a prompt. For more information, see Set the required permissions to enable Claude models and send prompts.
  6. Go to the Model Garden model card for the Claude model you want to use, and then click Enable.

Use the Anthropic SDK

You can make API requests to Anthropic Claude models using the Anthropic Claude SDK. To learn more, see the following:

Make a streaming call to a Claude 3 model using the Anthropic Vertex SDK

The following code sample uses the Anthropic Vertex SDK to perform a streaming call to an Anthropic Claude 3 model.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

# TODO(developer): Vertex AI SDK - uncomment below & run
# pip3 install --upgrade --user google-cloud-aiplatform
# gcloud auth application-default login
# pip3 install -U 'anthropic[vertex]'

from anthropic import AnthropicVertex


def generate_text_streaming(project_id: str, region: str) -> str:
    client = AnthropicVertex(region=region, project_id=project_id)
    result = []

    with client.messages.stream(
        model="claude-3-sonnet@20240229",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": "Send me a recipe for banana bread.",
            }
        ],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            result.append(text)

    return "".join(result)

Make a unary call to a Claude 3 model using the Anthropic Vertex SDK

The following code sample uses the Anthropic Vertex SDK to perform a unary call to an Anthropic Claude 3 model.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

# TODO(developer): Vertex AI SDK - uncomment below & run
# pip3 install --upgrade --user google-cloud-aiplatform
# gcloud auth application-default login
# pip3 install -U 'anthropic[vertex]'

from anthropic import AnthropicVertex


def generate_text(project_id: str, region: str) -> object:
    client = AnthropicVertex(region=region, project_id=project_id)
    message = client.messages.create(
        model="claude-3-sonnet@20240229",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": "Send me a recipe for banana bread.",
            }
        ],
    )
    print(message.model_dump_json(indent=2))
    return message

Use a curl command

You can use a curl command to make a request to the Vertex AI endpoint. The curl command specifies which supported Anthropic Claude model you want to use:

  • For Claude 3 Opus (Preview), use claude-3-opus@20240229.
  • For Claude 3 Sonnet, use claude-3-sonnet@20240229.
  • For Claude 3 Haiku, use claude-3-haiku@20240307.

We don't recommend using the Anthropic Claude 3 model names that lack an @ version suffix (claude-3-opus, claude-3-sonnet, or claude-3-haiku).

This section shows you how to create a curl command and includes a sample curl command that uses the Claude 3 Sonnet model.

REST

To test a text prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

  • LOCATION: A region that supports Anthropic Claude models.
    Claude 3 Opus is available in the following region:
    • us-east5 (Ohio)
    Claude 3 Sonnet is available in the following regions:
    • us-central1 (Iowa)
    • asia-southeast1 (Singapore)
    Claude 3 Haiku is available in the following regions:
    • us-central1 (Iowa)
    • europe-west4 (Netherlands)
  • MODEL: The model name you want to use.
    • For Claude 3 Opus, use claude-3-opus@20240229.
    • For Claude 3 Sonnet, use claude-3-sonnet@20240229.
    • For Claude 3 Haiku, use claude-3-haiku@20240307.
  • ROLE: The role associated with a message. You can specify a user or an assistant. The first message must use the user role. Claude models operate with alternating user and assistant turns. If the final message uses the assistant role, then the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.
  • STREAM: A boolean that specifies whether the response is streamed. Stream your response to reduce perceived end-user latency. Set to true to stream the response and false to return the response all at once.
  • CONTENT: The content, such as text, of the user or assistant message.
  • MAX_TOKENS: The maximum number of tokens that can be generated in the response. A token is approximately 3.5 characters. 100 tokens correspond to roughly 60-80 words.

    Specify a lower value for shorter responses and a higher value for potentially longer responses.

  • TOP_P (Optional): Top-P changes how the model selects tokens for output. Tokens are selected from the most (see top-K) to least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is 0.5, then the model selects either A or B as the next token by using temperature and excludes C as a candidate.

    Specify a lower value for less random responses and a higher value for more random responses.

  • TOP_K (Optional): Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.

    For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling.

    Specify a lower value for less random responses and a higher value for more random responses.
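To illustrate how top-K, top-P, and temperature interact, here is a hypothetical standalone Python sketch of the selection steps described above. It is an illustration of the general technique, not Claude's actual decoding implementation; the probability values mirror the top-P example in the list above.

```python
import random

# Illustrative sketch of the decoding steps described above: keep the top-K
# most probable tokens, then keep the smallest prefix whose cumulative
# probability reaches top-P, then sample with temperature. Hypothetical
# example only, not Claude's actual implementation.

def filter_candidates(probs: dict[str, float], top_k: int, top_p: float) -> dict[str, float]:
    """Apply top-K, then top-P, to a token -> probability mapping."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    return dict(kept)


def sample(probs: dict[str, float], temperature: float = 1.0) -> str:
    """Sample one token; p ** (1/T) is equivalent to scaling logits by 1/T."""
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(list(probs), weights=weights, k=1)[0]


probs = {"A": 0.3, "B": 0.2, "C": 0.1, "D": 0.05}
candidates = filter_candidates(probs, top_k=3, top_p=0.5)
print(candidates)  # {'A': 0.3, 'B': 0.2}: C is excluded, matching the top-P example
```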

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict

Request JSON body:

{
  "anthropic_version": "vertex-2023-10-16",
  "messages": [
   {
    "role": "ROLE",
    "content": "CONTENT"
   }],
  "max_tokens": MAX_TOKENS,
  "stream": STREAM
}
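If you build the request body programmatically rather than by hand, a sketch like the following assembles the same JSON structure shown above. The prompt and token values are examples; the helper name is illustrative.

```python
import json

# Illustrative sketch: build the streamRawPredict request body shown above.
# The prompt, max_tokens, and stream values here are examples.

def build_request_body(content: str, max_tokens: int, stream: bool) -> str:
    body = {
        "anthropic_version": "vertex-2023-10-16",
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        "stream": stream,
    }
    return json.dumps(body, indent=2)


print(build_request_body("Hello!", 50, True))
```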

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict" | Select-Object -Expand Content

If the request is successful, you receive a streamed JSON response.

Example curl command

MODEL_ID="claude-3-sonnet@20240229"
LOCATION="us-central1"
PROJECT_ID="PROJECT_ID"

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/anthropic/models/${MODEL_ID}:streamRawPredict -d \
'{
  "anthropic_version": "vertex-2023-10-16",
  "messages": [{
    "role": "user",
    "content": "Hello!"
  }],
  "max_tokens": 50,
  "stream": true}'

Anthropic Claude region availability

Claude 3 Opus is available in the following region:

  • us-east5 (Ohio)

Claude 3 Sonnet is available in the following regions:

  • us-central1 (Iowa)
  • asia-southeast1 (Singapore)

Claude 3 Haiku is available in the following regions:

  • us-central1 (Iowa)
  • europe-west4 (Netherlands)

Anthropic Claude quotas and supported context length

For Claude 3 models, a quota applies for each region where the model is available. The quota is specified in queries per minute (QPM) and tokens per minute (TPM). TPM includes both input and output tokens.

The default quota limit and supported context length for Claude 3 Opus (Preview) are:

Region | Default quota limit | Supported context length
us-east5 (Ohio) | 15 QPM, 50,000 TPM | 200,000 tokens

The default quota limit and supported context length for Claude 3 Sonnet are:

Region | Default quota limit | Supported context length
us-central1 (Iowa) | 60 QPM, 50,000 TPM | 200,000 tokens
asia-southeast1 (Singapore) | 60 QPM, 50,000 TPM | 200,000 tokens

The default quota limit and supported context length for Claude 3 Haiku are:

Region | Default quota limit | Supported context length
us-central1 (Iowa) | 60 QPM, 50,000 TPM | 200,000 tokens
europe-west4 (Netherlands) | 60 QPM, 50,000 TPM | 200,000 tokens

If you want to increase any of your quotas for Generative AI on Vertex AI, you can use the Google Cloud console to request a quota increase. To learn more about quotas, see Work with quotas.
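If you want to stay under a QPM and TPM quota from the client side, a simple sliding-window throttle can help. The following Python sketch is an illustration only; the class and method names are hypothetical, the quota values match the defaults listed above, and production code should still handle quota errors returned by the API.

```python
import time
from collections import deque
from typing import Optional

# Illustrative sketch only: a client-side sliding-window throttle that keeps
# requests under a QPM (queries per minute) and TPM (tokens per minute)
# budget. Not an official client; quota enforcement happens server-side.

class QuotaThrottle:
    def __init__(self, qpm: int, tpm: int):
        self.qpm = qpm
        self.tpm = tpm
        self.events = deque()  # (timestamp, token_count) within the last 60 s

    def _prune(self, now: float) -> None:
        # Drop events older than the one-minute quota window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def would_allow(self, tokens: int, now: Optional[float] = None) -> bool:
        """True if a request using `tokens` (input + output) fits the quota."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        used = sum(t for _, t in self.events)
        return len(self.events) < self.qpm and used + tokens <= self.tpm

    def record(self, tokens: int, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.events.append((now, tokens))


throttle = QuotaThrottle(qpm=60, tpm=50_000)
print(throttle.would_allow(2_000, now=0.0))  # True: nothing used yet
```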

Set the required permissions to enable Claude models and send prompts

For a user to enable Anthropic Claude models, an administrator must grant that user the Consumer Procurement Entitlement Manager Identity and Access Management (IAM) role. Any user who's been granted this role can enable an Anthropic Claude model in Model Garden.

For a user to make prompt requests from Vertex AI, an administrator must grant that user the aiplatform.endpoints.predict permission. This permission is included in the Vertex AI User IAM role. For more information, see Vertex AI User and Access control.

Console

  1. To grant the Consumer Procurement Entitlement Manager and Vertex AI User IAM roles to a user, go to the IAM page.

    Go to IAM

  2. In the Principal column, find the user principal for which you want to enable access to Anthropic Claude models, and then click Edit principal in that row.

  3. In the Edit access pane, click Add another role.

  4. In Select a role, select Consumer Procurement Entitlement Manager.

  5. In the Edit access pane, click Add another role.

  6. In Select a role, select Vertex AI User.

  7. Click Save.

gcloud

  1. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

  2. Grant the Consumer Procurement Entitlement Manager role, which is required to enable Anthropic Claude models in Model Garden:

    gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=PRINCIPAL --role=roles/consumerprocurement.entitlementManager
    
  3. Grant the Vertex AI User role, which includes the aiplatform.endpoints.predict permission that's required to make prompt requests:

    gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=PRINCIPAL --role=roles/aiplatform.user
    

    Replace PRINCIPAL with the identifier for the principal. The identifier takes the form user|group|serviceAccount:email or domain:domain. For example: user:cloudysanfrancisco@gmail.com, group:admins@example.com, serviceAccount:test123@example.domain.com, or domain:example.domain.com.

    The output is a list of policy bindings that includes the following:

    - members:
      - user:PRINCIPAL
      role: roles/consumerprocurement.entitlementManager
    

    For more information, see Grant a single role and gcloud projects add-iam-policy-binding.
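As a quick sanity check before running the gcloud commands above, a sketch like the following validates that a --member value matches one of the principal identifier forms described earlier. This is illustrative only; gcloud performs its own validation.

```python
import re

# Illustrative sketch: check that a --member value matches one of the
# principal identifier forms described above (user:, group:,
# serviceAccount:, or domain:).
PRINCIPAL = re.compile(
    r"^(user|group|serviceAccount):[^\s:]+@[^\s:]+$|^domain:[^\s:]+$"
)


def is_valid_principal(member: str) -> bool:
    return PRINCIPAL.fullmatch(member) is not None


print(is_valid_principal("user:cloudysanfrancisco@gmail.com"))  # True
print(is_valid_principal("cloudysanfrancisco@gmail.com"))       # False: prefix missing
```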