List and count tokens

The Vertex AI SDK for Python (1.60.0 and later) includes an integrated tokenizer, which lets you list and count the tokens of a prompt locally without making API calls. This page shows you how to list a prompt's tokens and their token IDs, and how to get a prompt's total token count, by using the Vertex AI SDK for Python.

Tokens and the importance of token listing and counting

Generative AI models break down text and other data in a prompt into units called tokens for processing. The way that data is converted into tokens depends on the tokenizer used. A token can be characters, words, or phrases.

Each model has a maximum number of tokens that it can handle in a prompt and response. Knowing the token count of your prompt tells you whether you've exceeded this limit. Counting tokens also returns the billable characters for the prompt, which helps you estimate cost.
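As a minimal sketch of the limit check described above, the following helper compares a prompt's token count against a model's maximum. The function name `fits_in_context`, its parameters, and the `ASSUMED_LIMIT` value are hypothetical placeholders for illustration, not part of any SDK or model specification. Take the real limit from the documentation for your model and the real count from a token counting call.

```python
def fits_in_context(
    prompt_tokens: int,
    max_tokens: int,
    reserved_output_tokens: int = 0,
) -> bool:
    """Return True if the prompt plus the reserved response budget fits the limit."""
    if min(prompt_tokens, max_tokens, reserved_output_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + reserved_output_tokens <= max_tokens


# Hypothetical limit, for illustration only. Look up the real value for your model.
ASSUMED_LIMIT = 8192

# A 6-token prompt with no reserved response budget fits.
print(fits_in_context(6, ASSUMED_LIMIT))

# 8000 prompt tokens plus 500 reserved response tokens exceed the assumed limit.
print(fits_in_context(8000, ASSUMED_LIMIT, reserved_output_tokens=500))
```

Reserving part of the budget for the response reflects that the limit covers both the prompt and the response.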

Listing tokens returns a list of the tokens that your prompt is broken down into. Each listed token is associated with a token ID, which helps you perform troubleshooting and analyze model behavior.

Supported models

The following table shows you the models that support token listing and token counting:

List tokens	Count tokens
`gemini-1.5-flash-002`	`gemini-1.5-flash-002`
`gemini-1.5-pro-002`	`gemini-1.5-pro-002`
	`gemini-1.0-pro-002`
	`gemini-1.0-pro-vision-001`

Get a list of tokens and token IDs for a prompt

The following code sample shows you how to get a list of tokens and token IDs for a prompt. The prompt must contain only text. Multimodal prompts are not supported.

Gen AI SDK for Python

Learn how to install or update the Google Gen AI SDK for Python.
For more information, see the Gen AI SDK for Python API reference documentation or the python-genai GitHub repository.
Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))
response = client.models.compute_tokens(
    model="gemini-2.0-flash-001",
    contents="What's the longest word in the English language?",
)
print(response)

# Example output:
# tokens_info=[TokensInfo(
#    role='user',
#    token_ids=[1841, 235303, 235256, 573, 32514, 2204, 575, 573, 4645, 5255, 235336],
#    tokens=[b'What', b"'", b's', b' the', b' longest', b' word', b' in', b' the', b' English', b' language', b'?']
#  )]
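To make the relationship between tokens and token IDs easier to inspect, the following sketch pairs each ID with its token by using the values copied from the example output above. It does not call any SDK; the variable names are illustrative. Tokens are returned as bytes, so the sketch joins them before decoding the full prompt, on the assumption that a single token might not always be valid UTF-8 on its own.

```python
# Values copied from the example output above.
token_ids = [1841, 235303, 235256, 573, 32514, 2204, 575, 573, 4645, 5255, 235336]
tokens = [
    b"What", b"'", b"s", b" the", b" longest", b" word",
    b" in", b" the", b" English", b" language", b"?",
]

# Print one row per token: the ID, then the token text.
for token_id, token in zip(token_ids, tokens):
    # errors="replace" keeps the loop from failing on a partial byte sequence.
    text = token.decode("utf-8", errors="replace")
    print(f"{token_id:>6}  {text!r}")

# Joining the raw bytes first, then decoding once, recovers the original prompt.
reconstructed = b"".join(tokens).decode("utf-8")
print(reconstructed)
```

The leading spaces in tokens such as `' the'` show that whitespace is part of the token, which is why the same word can map to different IDs depending on its position.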

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

from vertexai.preview.tokenization import get_tokenizer_for_model

# Initialize the local tokenizer
tokenizer = get_tokenizer_for_model("gemini-1.5-flash-001")

# Count Tokens
prompt = "why is the sky blue?"
response = tokenizer.count_tokens(prompt)
print(f"Tokens count: {response.total_tokens}")
# Example response:
#       Tokens count: 6

# Compute Tokens
response = tokenizer.compute_tokens(prompt)
print(f"Tokens list: {response.tokens_info}")
# Example response:
#     Tokens list: [TokensInfo(token_ids=[18177, 603, 573, 8203, 3868, 235336],
#          tokens=[b'why', b' is', b' the', b' sky', b' blue', b'?'], role='user')]

Get the token count and billable characters of a prompt

The following code sample shows you how to get the token count and the number of billable characters of a prompt. Both text-only and multimodal prompts are supported.

Gen AI SDK for Python

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

prompt = "Why is the sky blue?"

# Send text to Gemini
response = client.models.generate_content(
    model="gemini-2.0-flash-001", contents=prompt
)

# Prompt and response tokens count
print(response.usage_metadata)

# Example output:
#  cached_content_token_count=None
#  candidates_token_count=311
#  prompt_token_count=6
#  total_token_count=317
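As a small worked check of the example output above, the following sketch adds the prompt and response token counts and compares the sum with the reported total. The dictionary is a stand-in built from the printed values, not an SDK object, and the helper `as_int` is hypothetical. The equality holds for this example; treat it as an observation about these numbers rather than a guarantee for every response.

```python
# Values copied from the example output above. None marks a field that was not set.
usage = {
    "cached_content_token_count": None,
    "candidates_token_count": 311,
    "prompt_token_count": 6,
    "total_token_count": 317,
}


def as_int(value):
    """Treat an unset (None) count as zero."""
    return 0 if value is None else value


prompt_tokens = as_int(usage["prompt_token_count"])
response_tokens = as_int(usage["candidates_token_count"])

# In this example, prompt tokens plus response tokens equal the reported total.
print(prompt_tokens + response_tokens == usage["total_token_count"])
```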

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

from vertexai.preview.tokenization import get_tokenizer_for_model

# Use the local tokenizer
tokenizer = get_tokenizer_for_model("gemini-1.5-flash-002")

prompt = "hello world"
response = tokenizer.count_tokens(prompt)
print(f"Prompt Token Count: {response.total_tokens}")
# Example response:
# Prompt Token Count: 2

prompt = ["hello world", "what's the weather today"]
response = tokenizer.count_tokens(prompt)
print(f"Prompt Token Count: {response.total_tokens}")
# Example response:
# Prompt Token Count: 8