- Limits and requirements: Learn about the constraints for caching content, including size and duration.
- Create a context cache: Follow examples using the REST API, Python SDK, or Go SDK to create a standard context cache.
- Create a context cache with CMEK: Use your own encryption keys for added security when creating a cache.
Before you can use a context cache, you must create one. Key characteristics of a context cache include the following:
- Content type: Cached content can be any of the MIME types supported by Gemini multimodal models, such as text, audio, or video. You can specify more than one file to cache. For more information, see the media requirements for Gemini models.
- Content source: You can specify content to cache using a blob, text, or a path to a file in a Cloud Storage bucket. For content larger than 10 MB, you must use a file URI from a Cloud Storage bucket.
- Location: The cached content is stored in the same region where you make the request to create the cache.
- Expiration: A context cache has a finite lifespan. The default expiration time is 60 minutes. You can specify a different expiration time using the `ttl` or `expire_time` property when you create the cache. You can also update the expiration time for an unexpired cache. After a cache expires, you must recreate it to use its content again.
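The two expiration properties express the same thing in different forms: `ttl` is a duration measured from now, while `expire_time` is an absolute timestamp. The following sketch relates them using only the standard library; the `"3600s"` duration-string form matches the `"86400s"` style used in the SDK examples below, and the RFC 3339 timestamp format for `expire_time` is an assumption based on standard API timestamp conventions.

```python
from datetime import datetime, timedelta, timezone

# A cache can expire either a fixed duration from now (`ttl`) or at an
# absolute timestamp (`expire_time`). The two are equivalent:
# expire_time = now + ttl.
ttl_seconds = 3600  # 1 hour instead of the 60-minute default

now = datetime.now(timezone.utc)
expire_time = now + timedelta(seconds=ttl_seconds)

# `ttl` is a duration string such as "3600s"; `expire_time` is assumed to be
# an RFC 3339 timestamp such as "2025-01-01T12:00:00Z".
ttl_value = f"{ttl_seconds}s"
expire_time_value = expire_time.strftime("%Y-%m-%dT%H:%M:%SZ")

print(ttl_value, expire_time_value)
```

Specify one property or the other when creating the cache, not both.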
Limits
The content that you cache must adhere to the limits shown in the following table:
| Context caching limits | |
|---|---|
| Minimum cache token count | |
| Maximum size of content you can cache using a blob or text | 10 MB |
| Minimum time before a cache expires after it's created | 1 minute |
| Maximum time before a cache expires after it's created | There isn't a maximum cache duration |
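These limits can be checked client-side before you send a request. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the 10 MB limit means binary megabytes (the exact definition isn't specified here).

```python
BLOB_LIMIT_BYTES = 10 * 1024 * 1024  # 10 MB inline (blob/text) limit, assumed binary
MIN_TTL_SECONDS = 60                 # a cache must live at least 1 minute

def requires_cloud_storage(content_size_bytes: int) -> bool:
    """Return True if the content exceeds the inline limit and must be
    referenced by a Cloud Storage file URI instead of a blob or text."""
    return content_size_bytes > BLOB_LIMIT_BYTES

def is_valid_ttl(ttl_seconds: int) -> bool:
    """Return True if the requested TTL meets the 1-minute minimum.
    There is no maximum cache duration to check against."""
    return ttl_seconds >= MIN_TTL_SECONDS
```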
Before you begin
Location support
Context caching isn't supported in the Sydney, Australia (`australia-southeast1`) region.
Encryption key support
Context caching supports Customer-Managed Encryption Keys (CMEKs), allowing you to control the encryption of your cached data and protect your sensitive information with encryption keys that you manage and own. This provides an additional layer of security and compliance.
Refer to the example for more details.
Access Transparency support
Context caching supports Access Transparency.
Create a context cache
The following examples show how to create a context cache using the REST API or the Google Gen AI SDKs for Python and Go. Use this table to help you decide which option is best for you.
Method | Description | Use Case |
---|---|---|
REST API | Send requests directly to the Vertex AI API endpoint using tools like curl or a custom HTTP client. |
Best for quick testing, environments without a supported SDK, or integrating with non-Python/Go applications. |
Gen AI SDK for Python | Use an idiomatic Python library to interact with the API. | Best for Python developers who want a streamlined development experience. |
Gen AI SDK for Go | Use an idiomatic Go library to interact with the API. | Best for Go developers who want to integrate context caching into their Go applications. |
Python
Install
pip install --upgrade google-genai
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
Go
Learn how to install or update the Gen AI SDK for Go.
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
REST
To create a context cache with the REST API, send a POST request to the publisher model endpoint. The following example shows how to create a context cache by using a file stored in a Cloud Storage bucket.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request and where the cached content is stored. For a list of supported regions, see Available regions.
- CACHE_DISPLAY_NAME: A meaningful display name to describe and to help you identify each context cache.
- MIME_TYPE: The MIME type of the content to cache.
- CONTENT_TO_CACHE_URI: The Cloud Storage URI of the content to cache.
- MODEL_ID: The model to use for caching.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents
Request JSON body:
{
  "model": "projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID",
  "displayName": "CACHE_DISPLAY_NAME",
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "mimeType": "MIME_TYPE",
            "fileUri": "CONTENT_TO_CACHE_URI"
          }
        }
      ]
    },
    {
      "role": "model",
      "parts": [
        {
          "text": "This is sample text to demonstrate explicit caching."
        }
      ]
    }
  ]
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents" | Select-Object -Expand Content
If the request is successful, you receive a JSON response that describes the new context cache, including its resource name.
Example curl command
LOCATION="us-central1"
MODEL_ID="gemini-2.0-flash-001"
PROJECT_ID="test-project"
MIME_TYPE="video/mp4"
CACHED_CONTENT_URI="gs://path-to-bucket/video-file-name.mp4"
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/cachedContents" \
-d @- <<EOF
{
  "model": "projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}",
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "mimeType": "${MIME_TYPE}",
            "fileUri": "${CACHED_CONTENT_URI}"
          }
        }
      ]
    }
  ]
}
EOF
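If you prefer not to hand-edit request.json, the same request body can be assembled programmatically. The following is a minimal sketch using only the Python standard library; all uppercase values are placeholders to replace with your own.

```python
import json

# Placeholder values -- substitute your own project, location, and model.
project_id = "PROJECT_ID"
location = "LOCATION"
model_id = "MODEL_ID"

body = {
    "model": (
        f"projects/{project_id}/locations/{location}"
        f"/publishers/google/models/{model_id}"
    ),
    "displayName": "CACHE_DISPLAY_NAME",
    "contents": [
        {
            "role": "user",
            "parts": [
                {
                    "fileData": {
                        "mimeType": "MIME_TYPE",
                        "fileUri": "CONTENT_TO_CACHE_URI",
                    }
                }
            ],
        },
        {
            "role": "model",
            "parts": [
                {"text": "This is sample text to demonstrate explicit caching."}
            ],
        },
    ],
}

# Serialize for use as the POST payload (for example, written to request.json).
payload = json.dumps(body, indent=2)
print(payload)
```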
Create a context cache with CMEK
To implement context caching with CMEKs, create a CMEK by following the instructions, and make sure the Vertex AI per-product, per-project service account (P4SA) has the necessary Cloud KMS CryptoKey Encrypter/Decrypter permissions on the key. This lets you securely create and manage cached content, as well as make other calls such as `List`, `Update`, `Delete`, and `Get` on `CachedContent` resources, without repeatedly specifying a KMS key.
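The value you pass as KMS_KEY_NAME below uses the standard Cloud KMS CryptoKey resource-name format. A small helper (hypothetical, standard library only) shows how that name is assembled from its components:

```python
def kms_key_name(project: str, location: str, key_ring: str, key: str) -> str:
    """Build a Cloud KMS CryptoKey resource name in the format expected
    by the cache's encryptionSpec.kmsKeyName field."""
    return (
        f"projects/{project}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )

name = kms_key_name("test-project", "us-central1", "your-key-ring", "your-key")
print(name)
```

Note that the key's location must match the region where you create the cache.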
REST
You can use REST to create a context cache by using the Vertex AI API to send a POST request to the publisher model endpoint. The following example shows how to create a context cache using a file stored in a Cloud Storage bucket.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request and where the cached content is stored. For a list of supported regions, see Available regions.
- MODEL_ID: gemini-2.0-flash-001.
- CACHE_DISPLAY_NAME: A meaningful display name to describe and to help you identify each context cache.
- MIME_TYPE: The MIME type of the content to cache.
- CACHED_CONTENT_URI: The Cloud Storage URI of the content to cache.
- KMS_KEY_NAME: The Cloud KMS key name.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents
Request JSON body:
{
  "model": "projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001",
  "displayName": "CACHE_DISPLAY_NAME",
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "mimeType": "MIME_TYPE",
            "fileUri": "CACHED_CONTENT_URI"
          }
        }
      ]
    }
  ],
  "encryptionSpec": {
    "kmsKeyName": "KMS_KEY_NAME"
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents" | Select-Object -Expand Content
If the request is successful, you receive a JSON response that describes the new context cache, including its resource name.
Example curl command
LOCATION="us-central1"
MODEL_ID="gemini-2.0-flash-001"
PROJECT_ID="test-project"
MIME_TYPE="video/mp4"
CACHED_CONTENT_URI="gs://path-to-bucket/video-file-name.mp4"
KMS_KEY_NAME="projects/${PROJECT_ID}/locations/${LOCATION}/keyRings/your-key-ring/cryptoKeys/your-key"
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/cachedContents" \
-d @- <<EOF
{
  "model": "projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}",
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "mimeType": "${MIME_TYPE}",
            "fileUri": "${CACHED_CONTENT_URI}"
          }
        }
      ]
    }
  ],
  "encryptionSpec": {
    "kmsKeyName": "${KMS_KEY_NAME}"
  }
}
EOF
GenAI SDK for Python
Install
pip install --upgrade google-genai
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
import os

from google import genai
from google.genai.types import Content, CreateCachedContentConfig, HttpOptions, Part

# Replace these values with your own project ID and location.
os.environ['GOOGLE_CLOUD_PROJECT'] = 'your-project-id'
os.environ['GOOGLE_CLOUD_LOCATION'] = 'us-central1'
os.environ['GOOGLE_GENAI_USE_VERTEXAI'] = 'True'

client = genai.Client(http_options=HttpOptions(api_version="v1"))

system_instruction = """
You are an expert researcher. You always stick to the facts in the sources provided, and never make up new facts.
Now look at these research papers, and answer the following questions.
"""

contents = [
    Content(
        role="user",
        parts=[
            Part.from_uri(
                file_uri="gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
                mime_type="application/pdf",
            ),
            Part.from_uri(
                file_uri="gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
                mime_type="application/pdf",
            ),
        ],
    )
]

content_cache = client.caches.create(
    model="gemini-2.0-flash-001",
    config=CreateCachedContentConfig(
        contents=contents,
        system_instruction=system_instruction,
        display_name="example-cache",
        kms_key_name="projects/your-project-id/locations/us-central1/keyRings/your-key-ring/cryptoKeys/your-key",
        ttl="86400s",
    ),
)

print(content_cache.name)
print(content_cache.usage_metadata)
GenAI SDK for Go
Learn how to install or update the Gen AI SDK for Go.
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
import (
	"context"
	"encoding/json"
	"fmt"
	"io"

	genai "google.golang.org/genai"
)

// createContentCache shows how to create a content cache with an expiration parameter.
func createContentCache(w io.Writer) (string, error) {
	ctx := context.Background()

	client, err := genai.NewClient(ctx, &genai.ClientConfig{
		HTTPOptions: genai.HTTPOptions{APIVersion: "v1beta1"},
	})
	if err != nil {
		return "", fmt.Errorf("failed to create genai client: %w", err)
	}

	modelName := "gemini-2.0-flash-001"

	systemInstruction := "You are an expert researcher. You always stick to the facts " +
		"in the sources provided, and never make up new facts. " +
		"Now look at these research papers, and answer the following questions."

	cacheContents := []*genai.Content{
		{
			Parts: []*genai.Part{
				{FileData: &genai.FileData{
					FileURI:  "gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
					MIMEType: "application/pdf",
				}},
				{FileData: &genai.FileData{
					FileURI:  "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
					MIMEType: "application/pdf",
				}},
			},
			Role: "user",
		},
	}

	config := &genai.CreateCachedContentConfig{
		Contents: cacheContents,
		SystemInstruction: &genai.Content{
			Parts: []*genai.Part{
				{Text: systemInstruction},
			},
		},
		DisplayName: "example-cache",
		KmsKeyName:  "projects/your-project-id/locations/us-central1/keyRings/your-key-ring/cryptoKeys/your-key",
		TTL:         "86400s",
	}

	res, err := client.Caches.Create(ctx, modelName, config)
	if err != nil {
		return "", fmt.Errorf("failed to create content cache: %w", err)
	}

	cachedContent, err := json.MarshalIndent(res, "", "  ")
	if err != nil {
		return "", fmt.Errorf("failed to marshal cache info: %w", err)
	}
	// See the documentation: https://pkg.go.dev/google.golang.org/genai#CachedContent
	fmt.Fprintln(w, string(cachedContent))

	return res.Name, nil
}
What's next
- Learn how to use a context cache.
- Learn how to update the expiration time of a context cache.