You can use the REST API or the Python SDK to reference content stored in a context
cache in a generative AI application. Before you can use a context cache, you must
create it.
The context cache object you use in your code includes the following properties: name, model, createTime, updateTime, and expireTime. Each is described later on this page.
The following features can be specified when you create a context cache, so you
shouldn't specify them again in your request: system instructions, tool configuration, and tools.
The sample later on this page shows how to use a context cache. When you use a context cache,
you can't specify the GenerativeModel.system_instructions, GenerativeModel.tool_config, or GenerativeModel.tools properties.
For the Python SDK, see the SDK reference documentation to learn more, and set the
environment variables that the Gen AI SDK needs to work with Vertex AI.
For Go, learn how to install or update the Go SDK. To learn more, see the
SDK reference documentation, and set the same environment variables.
With REST, you use a context cache with a prompt by sending a POST request to the
publisher model endpoint of the Vertex AI API.
Before using any of the request data, replace its placeholders, such as
PROJECT_ID, LOCATION, PROJECT_NUMBER, CACHE_ID, and PROMPT_TEXT.
The HTTP method and URL, and the request JSON body, appear in the REST section later on this page.
To send your request, save the request body in a file named request.json, and then
run either the curl command or the PowerShell command. You should receive a JSON response.
name
- The context cache resource name. Its format is
projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID.
When you create a context cache, you can find its resource name in the
response. The project number is a unique identifier for your project. The
cache ID is an ID for your cache. When you specify a context cache in your
code, you must use the full context cache resource name. The following
example shows how to specify a cached content resource name in a request
body:
"cached_content": "projects/123456789012/locations/us-central1/cachedContents/123456789012345678"
model
- The resource name for the model used to create the cache.
Its format is
projects/PROJECT_NUMBER/locations/LOCATION/publishers/PUBLISHER_NAME/models/MODEL_ID.
createTime
- A Timestamp that specifies the creation time of the context cache.
updateTime
- A Timestamp that specifies the most recent update time of a
context cache. After a context cache is created, and before it's updated, its
createTime and updateTime are the same.
expireTime
- A Timestamp that specifies when a context cache expires. The
default expireTime is 60 minutes after the createTime. You can update the
cache with a new expiration time. For more information, see
Update the context cache.
After a cache expires, it's marked for deletion and you shouldn't assume that
it can be used or updated. If you need to use a context cache that expired,
you need to recreate it with an appropriate expiration time.
Context cache use restrictions
- The GenerativeModel.system_instructions property. This property is used to
specify instructions to the model before the model receives instructions from
a user. For more information, see System instructions.
- The GenerativeModel.tool_config property. The tool_config property is
used to specify tools used by the Gemini model, such as a tool used by the
function calling feature.
- The GenerativeModel.tools property. The GenerativeModel.tools property is
used to specify functions to create a function calling application. For more
information, see Function calling.
Use a context cache sample
GenerativeModel.system_instructions
GenerativeModel.tool_config
GenerativeModel.tools
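The restriction on these three properties can be checked before a request is sent. The following is a minimal sketch, not part of any SDK: `conflicting_fields` and `FIELDS_FIXED_BY_CACHE` are hypothetical names, and the sketch assumes a REST-style request body in which the three properties appear as `systemInstruction`, `tools`, and `toolConfig`.

```python
from typing import Any, Dict, List

# REST-style names of the fields that are fixed when the cache is created.
# Assumption: these correspond to the system_instructions, tools, and
# tool_config properties listed above.
FIELDS_FIXED_BY_CACHE = ("systemInstruction", "tools", "toolConfig")


def conflicting_fields(request_body: Dict[str, Any]) -> List[str]:
    """List the fields that must not accompany `cachedContent` in a request."""
    if "cachedContent" not in request_body:
        return []  # No cache is in use, so nothing conflicts.
    return [field for field in FIELDS_FIXED_BY_CACHE if field in request_body]
```

An empty list means the request body is safe to send with the cache; any returned names should be removed from the request, because the cache already carries those settings.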
Python
Install
pip install --upgrade google-genai
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
Go
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
REST
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001:generateContent
{
  "cachedContent": "projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID",
  "contents": [
    {"role":"user","parts":[{"text":"PROMPT_TEXT"}]}
  ],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "temperature": 1,
    "topP": 0.95
  },
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  ]
}
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001:generateContent"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001:generateContent" | Select-Object -Expand Content
Example curl command
LOCATION="us-central1"
MODEL_ID="gemini-2.0-flash-001"
PROJECT_ID="test-project"
# Replace these placeholder values with your project number and cache ID.
PROJECT_NUMBER="PROJECT_NUMBER"
CACHE_ID="CACHE_ID"
# The request body is read from an unquoted here-document so that the shell
# expands the variables inside it. Inside single quotes they are not expanded.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent" \
-d @- <<EOF
{
  "cachedContent": "projects/${PROJECT_NUMBER}/locations/${LOCATION}/cachedContents/${CACHE_ID}",
  "contents": [
    {"role":"user","parts":[{"text":"What are the benefits of exercise?"}]}
  ],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "temperature": 1,
    "topP": 0.95
  },
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  ]
}
EOF
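The same REST request can also be assembled in Python, where `json.dumps` produces the body and avoids shell quoting issues. The following is a minimal sketch that uses only the standard library. The function names `build_url`, `build_body`, and `send` are hypothetical, the safety settings are omitted for brevity, and the access token is assumed to come from a command such as `gcloud auth print-access-token`.

```python
import json
import urllib.request


def build_url(project_id: str, location: str, model_id: str) -> str:
    """Return the publisher model endpoint for generateContent."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/{model_id}:generateContent"
    )


def build_body(cache_name: str, prompt: str) -> dict:
    """Return a request body that references the cache by its full resource name."""
    return {
        "cachedContent": cache_name,
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {"maxOutputTokens": 8192, "temperature": 1, "topP": 0.95},
    }


def send(url: str, body: dict, access_token: str) -> dict:
    """POST the body to the endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Only `send` touches the network, so the URL and body can be inspected or logged before any request is made.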
- Learn how to update the expiration time of a context cache.
- Learn how to create a new context cache.
- Learn how to get information about all context caches associated with a Google Cloud project.
- Learn how to delete a context cache.