Count tokens for Claude models

The count-tokens endpoint lets you determine the number of tokens in a message before sending it to Claude, helping you make informed decisions about your prompts and usage.

There is no cost for using the count-tokens endpoint.

Supported Claude models

The following models support token counting:

  • Claude 3.5 Sonnet v2: claude-3-5-sonnet-v2@20241022
  • Claude 3.5 Haiku: claude-3-5-haiku@20241022
  • Claude 3 Opus: claude-3-opus@20240229
  • Claude 3.5 Sonnet: claude-3-5-sonnet@20240620
  • Claude 3 Haiku: claude-3-haiku@20240307

Supported regions

The following regions support token counting:

  • us-east5
  • europe-west1
  • asia-southeast1
  • us-central1
  • europe-west4

Count tokens in basic messages

To count tokens, send a rawPredict request to the count-tokens endpoint. The body of the request must contain the model ID of the model you want to count tokens against.

REST

Before using any of the request data, make the following replacements:

  • LOCATION: A supported region.
  • MODEL: The model to count tokens against.
  • ROLE: The role associated with a message. You can specify user or assistant. The first message must use the user role, and Claude models operate with alternating user and assistant turns. If the final message uses the assistant role, the response content continues immediately from the content in that message, which lets you constrain part of the model's response (see the example after this list).
  • CONTENT: The content, such as text, of the user or assistant message.
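
For example, a messages value like the following (the content is illustrative) alternates user and assistant turns and ends with an assistant message that prefills the start of Claude's response:

"messages": [
  {"role": "user", "content": "List three primary colors."},
  {"role": "assistant", "content": "Red, yellow, and blue."},
  {"role": "user", "content": "Now list three secondary colors."},
  {"role": "assistant", "content": "The three secondary colors are"}
]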

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/count-tokens:rawPredict

Request JSON body:

{
  "model": "claude-3-haiku@20240307",
  "messages": [
    {
      "role": "user",
      "content": "how many tokens are in this request?"
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/count-tokens:rawPredict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/count-tokens:rawPredict" | Select-Object -Expand Content

You should receive a JSON response similar to the following.
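
The token count is reported in the input_tokens field. The value shown here is illustrative; the actual count depends on the model and the message content.

{
  "input_tokens": 10
}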

For information on how to count tokens in messages with tools, images, and PDFs, see Anthropic's documentation.

Quotas

By default, the quota for the count-tokens endpoint is 2000 requests per minute.