You must create a context cache before you can use it. The context cache you create contains a large amount of data that you can use in multiple requests to a Gemini model. The cached content is stored in the region where you make the request to create the cache.
Cached content can be any of the MIME types supported by Gemini multimodal models. For example, you can cache a large amount of text, audio, or video. You can specify more than one file to cache. For more information, see the following media requirements:
You specify the content to cache using a blob, text, or a path to a file that's stored in a Cloud Storage bucket. If the size of the content you're caching is greater than 10 MB, then you must specify it using the URI of a file that's stored in a Cloud Storage bucket.
Cached content has a finite lifespan. The default expiration time of a context
cache is 60 minutes after it's created. If you want a different expiration time,
you can specify a different expiration time using the ttl
or the expire_time
property when you create a context cache. You can also update the expiration
time for an unexpired context cache. For information about how to specify
ttl
and expire_time
, see
Update the expiration time.
After a context cache expires, it's no longer available. If you want to reference the content in an expired context cache in future prompt requests, then you need to recreate the context cache.
Context cache limits
The content that you cache must adhere to the following limits:
Context caching limits | |
---|---|
Minimum size of a cache |
32,769 tokens |
Maximum size of content you can cache using a blob or text |
10 MB |
Minimum time before a cache expires after it's created |
1 minute |
Maximum time before a cache expires after it's created |
There isn't a maximum cache duration |
Create context cache example
The following examples show how to create a context cache.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.
Streaming and non-streaming responses
You can choose whether the model generates streaming responses or non-streaming responses. For streaming responses, you receive each response as soon as its output token is generated. For non-streaming responses, you receive all responses after all of the output tokens are generated.
For a streaming response, use the stream
parameter in
generate_content
.
response = model.generate_content(contents=[...], stream = True)
For a non-streaming response, remove the parameter, or set the parameter to
False
.
Sample code
Go
Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart. For more information, see the Vertex AI Go SDK for Gemini reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Streaming and non-streaming responses
You can choose whether the model generates streaming responses or non-streaming responses. For streaming responses, you receive each response as soon as its output token is generated. For non-streaming responses, you receive all responses after all of the output tokens are generated.
For a streaming response, use the
GenerateContentStream
method.
iter := model.GenerateContentStream(ctx, genai.Text("Tell me a story about a lumberjack and his giant ox. Keep it very short."))
For a non-streaming response, use the GenerateContent
method.
resp, err := model.GenerateContent(ctx, genai.Text("What is the average size of a swallow?"))
Sample code
REST
You can use REST to create a context cache by using the Vertex AI API to send a POST request to the publisher model endpoint. The following example shows how to create a context cache using a file stored in a Cloud Storage bucket.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request and where the cached content is stored. For a list of supported regions, see Available regions.
- CACHE_DISPLAY_NAME: A meaningful display name to describe and to help you identify each context cache.
- MIME_TYPE: The MIME type of the content to cache.
- CONTENT_TO_CACHE_URI: The Cloud Storage URI of the content to cache.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/cachedContents
Request JSON body:
{ "model": "projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-1.5-pro-002", "displayName": "CACHE_DISPLAY_NAME", "contents": [{ "role": "user", "parts": [{ "fileData": { "mimeType": "MIME_TYPE", "fileUri": "CONTENT_TO_CACHE_URI" } }] }, { "role": "model", "parts": [{ "text": "This is sample text to demonstrate explicit caching." }] }] }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json
,
and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/cachedContents"
PowerShell
Save the request body in a file named request.json
,
and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/cachedContents" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
Example curl command
LOCATION="us-central1"
MODEL_ID="gemini-1.5-pro-002"
PROJECT_ID="test-project"
MIME_TYPE="video/mp4"
CACHED_CONTENT_URI="gs://path-to-bucket/video-file-name.mp4"
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}/cachedContents -d \
'{
"model":"projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}",
"contents": [
{
"role": "user",
"parts": [
{
"fileData": {
"mimeType": "${MIME_TYPE}",
"fileUri": "${CACHED_CONTENT_URI}"
}
}
]
}
]
}'
What's next
- Learn how to use a context cache.
- Learn how to update the expiration time of a context cache.