You must create a context cache before you can use it. The context cache you
create contains a large amount of data that you can use in multiple requests to
a Gemini model. The cached content is stored in the region where you make the
request to create the cache. Cached content can be any of the MIME types supported by Gemini multimodal
models. For example, you can cache a large amount of text, audio, or video. You
can specify more than one file to cache. For more information, see the following
media requirements: You specify the content to cache using a blob, text, or a path to a file that's
stored in a Cloud Storage bucket. If the size of the content you're caching
is greater than 10 MB, then you must specify it using the URI of a file that's
stored in a Cloud Storage bucket. Cached content has a finite lifespan. The default expiration time of a context
cache is 60 minutes after it's created. If you want a different expiration time,
you can specify a different expiration time using the After a context cache expires, it's no longer available. If you want to
reference the content in an expired context cache in future prompt requests,
then you need to recreate the context cache. The content that you cache must adhere to the limits shown in the following
table: Minimum cache token count Maximum size of content you can cache using a blob or text Minimum time before a cache expires after it's created Maximum time before a cache expires after it's created Context caching isn't supported in the Sydney, Australia
( Context caching supports the
global endpoint. Context caching supports Customer-Managed Encryption Keys (CMEKs), allowing you
to control the encryption of your cached data and protect your sensitive
information with encryption keys that you manage and own. This provides an
additional layer of security and compliance. Refer to
the example for more details. CMEK is not supported when using the global endpoint. Context caching supports Access Transparency. The following examples show how to create a context cache.
To learn more, see the
SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
Learn how to install or update the Go.
To learn more, see the
SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
You can use REST to create a context cache by using the Vertex AI API to send a
POST request to the publisher model endpoint. The following example shows
how to create a context cache using a file stored in a Cloud Storage
bucket.
Before using any of the request data,
make the following replacements:
HTTP method and URL:
Request JSON body:
To send your request, choose one of these options:
Save the request body in a file named
Save the request body in a file named You should receive a JSON response similar to the following:
ttl
or the expire_time
property when you create a context cache. You can also update the expiration
time for an unexpired context cache. For information about how to specify
ttl
and expire_time
, see
Update the expiration time.Limits
Context caching limits
2,048
(Gemini 2.5 Pro)1,024
(Gemini 2.5 Flash)1,024
(Gemini 2.0 Flash)1,024
(Gemini 2.0 Flash-Lite)
10 MB
1 minute
There isn't a maximum cache duration
Location support
australia-southeast1
) region.Encryption key support
Access Transparency support
Create context cache example
Python
Install
pip install --upgrade google-genai
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
Go
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
REST
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents
{
"model": "projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID",
"displayName": "CACHE_DISPLAY_NAME",
"contents": [{
"role": "user",
"parts": [{
"fileData": {
"mimeType": "MIME_TYPE",
"fileUri": "CONTENT_TO_CACHE_URI"
}
}]
},
{
"role": "model",
"parts": [{
"text": "This is sample text to demonstrate explicit caching."
}]
}]
}
curl
request.json
,
and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents"PowerShell
request.json
,
and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents" | Select-Object -Expand ContentExample curl command
LOCATION="us-central1"
MODEL_ID="gemini-2.0-flash-001"
PROJECT_ID="test-project"
MIME_TYPE="video/mp4"
CACHED_CONTENT_URI="gs://path-to-bucket/video-file-name.mp4"
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/cachedContents -d \
'{
"model":"projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}",
"contents": [
{
"role": "user",
"parts": [
{
"fileData": {
"mimeType": "${MIME_TYPE}",
"fileUri": "${CACHED_CONTENT_URI}"
}
}
]
}
]
}'
Create a context cache with CMEK
To implement context caching with CMEKs, create a CMEK by following
the instructions and make sure the
Vertex AI per-product, per-project service account (P4SA) has the
necessary Cloud KMS CryptoKey Encrypter/Decrypter permissions on the key.
This lets you securely create and manage cached content as well make
other calls like {List
, Update
, Delete
, Get
} CachedContent
(s) without
repeatedly specifying a KMS key.
REST
You can use REST to create a context cache by using the Vertex AI API to send a POST request to the publisher model endpoint. The following example shows how to create a context cache using a file stored in a Cloud Storage bucket.
Before using any of the request data, make the following replacements:
- PROJECT_ID: .
- LOCATION: The region to process the request and where the cached content is stored. For a list of supported regions, see Available regions.
- MODEL_ID: gemini-2.0-flash-001.
- CACHE_DISPLAY_NAME: A meaningful display name to describe and to help you identify each context cache.
- MIME_TYPE: The MIME type of the content to cache.
- CACHED_CONTENT_URI: The Cloud Storage URI of the content to cache.
- KMS_KEY_NAME: The Cloud KMS key name.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents
Request JSON body:
{ "model": "projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001", "displayName": "CACHE_DISPLAY_NAME", "contents": [{ "role": "user", "parts": [{ "fileData": { "mimeType": "MIME_TYPE", "fileUri": "CONTENT_TO_CACHE_URI" } }]}], "encryptionSpec": { "kmsKeyName": "KMS_KEY_NAME" } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json
,
and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents"
PowerShell
Save the request body in a file named request.json
,
and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/cachedContents" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
Example curl command
LOCATION="us-central1"
MODEL_ID="gemini-2.0-flash-001"
PROJECT_ID="test-project"
MIME_TYPE="video/mp4"
CACHED_CONTENT_URI="gs://path-to-bucket/video-file-name.mp4"
KMS_KEY_NAME="projects/${PROJECT_ID}/locations/{LOCATION}/keyRings/your-key-ring/cryptoKeys/your-key"
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/cachedContents -d \
'{
"model": "projects/{PROJECT_ID}}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}",
"contents" : [
{
"role": "user",
"parts": [
{
"file_data": {
"mime_type":"{MIME_TYPE}",
"file_uri":"{CACHED_CONTENT_URI}"
}
}
]
}
],
"encryption_spec" :
{
"kms_key_name":"{KMS_KEY_NAME}"
}
}'
GenAI SDK for Python
Install
pip install --upgrade google-genai
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
import os
from google import genai
from google.genai.types import Content, CreateCachedContentConfig, HttpOptions, Part
os.environ['GOOGLE_CLOUD_PROJECT'] = 'vertexsdk'
os.environ['GOOGLE_CLOUD_LOCATION'] = 'us-central1'
os.environ['GOOGLE_GENAI_USE_VERTEXAI'] = 'True'
client = genai.Client(http_options=HttpOptions(api_version="v1"))
system_instruction = """
You are an expert researcher. You always stick to the facts in the sources provided, and never make up new facts.
Now look at these research papers, and answer the following questions.
"""
contents = [
Content(
role="user",
parts=[
Part.from_uri(
file_uri="gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
mime_type="application/pdf",
),
Part.from_uri(
file_uri="gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
mime_type="application/pdf",
),
],
)
]
content_cache = client.caches.create(
model="gemini-2.0-flash-001",
config=CreateCachedContentConfig(
contents=contents,
system_instruction=system_instruction,
display_name="example-cache",
kms_key_name="projects/vertexsdk/locations/us-central1/keyRings/your-project/cryptoKeys/your-key",
ttl="86400s",
),
)
print(content_cache.name)
print(content_cache.usage_metadata)
GenAI SDK for Go
Learn how to install or update the Gen AI SDK for Go.
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
import (
"context"
"encoding/json"
"fmt"
"io"
genai "google.golang.org/genai"
)
// createContentCache shows how to create a content cache with an expiration parameter.
func createContentCache(w io.Writer) (string, error) {
ctx := context.Background()
client, err := genai.NewClient(ctx, &genai.ClientConfig{
HTTPOptions: genai.HTTPOptions{APIVersion: "v1beta1"},
})
if err != nil {
return "", fmt.Errorf("failed to create genai client: %w", err)
}
modelName := "gemini-2.0-flash-001"
systemInstruction := "You are an expert researcher. You always stick to the facts " +
"in the sources provided, and never make up new facts. " +
"Now look at these research papers, and answer the following questions."
cacheContents := []*genai.Content{
{
Parts: []*genai.Part{
{FileData: &genai.FileData{
FileURI: "gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
MIMEType: "application/pdf",
}},
{FileData: &genai.FileData{
FileURI: "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
MIMEType: "application/pdf",
}},
},
Role: "user",
},
}
config := &genai.CreateCachedContentConfig{
Contents: cacheContents,
SystemInstruction: &genai.Content{
Parts: []*genai.Part{
{Text: systemInstruction},
},
},
DisplayName: "example-cache",
KmsKeyName: "projects/vertexsdk/locations/us-central1/keyRings/your-project/cryptoKeys/your-key",
TTL: "86400s",
}
res, err := client.Caches.Create(ctx, modelName, config)
if err != nil {
return "", fmt.Errorf("failed to create content cache: %w", err)
}
cachedContent, err := json.MarshalIndent(res, "", " ")
if err != nil {
return "", fmt.Errorf("failed to marshal cache info: %w", err)
}
// See the documentation: https://pkg.go.dev/google.golang.org/genai#CachedContent
fmt.Fprintln(w, string(cachedContent))
return res.Name, nil
}
What's next
- Learn how to use a context cache.
- Learn how to update the expiration time of a context cache.