# Context caching overview

| To see an example of context caching, run the "Intro to context caching"
| notebook in one of the following environments:
|
| [Open in Colab](https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/context-caching/intro_context_caching.ipynb) \|
| [Open in Colab Enterprise](https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fcontext-caching%2Fintro_context_caching.ipynb) \|
| [Open in Vertex AI Workbench](https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fcontext-caching%2Fintro_context_caching.ipynb) \|
| [View on GitHub](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/context-caching/intro_context_caching.ipynb)

Context caching helps reduce the cost and latency of requests to Gemini that contain repeated content. Vertex AI offers two types of caching:

- **Implicit caching:** Automatic caching, enabled by default, that provides cost savings when cache hits occur.
- **Explicit caching:** Manual caching enabled using the Vertex AI API, where you explicitly declare the content you want to cache and whether your prompts should refer to the cached content.

For both implicit and explicit caching, the [`cachedContentTokenCount`](/vertex-ai/docs/reference/rest/v1/GenerateContentResponse#UsageMetadata) field in your response's metadata indicates the number of tokens in the cached part of your input. Caching requests must contain a minimum of 2,048 tokens.

Both implicit and explicit caching are supported when using the following models:

- [Gemini 2.5 Pro](/vertex-ai/generative-ai/docs/models/gemini/2-5-pro)
- [Gemini 2.5 Flash](/vertex-ai/generative-ai/docs/models/gemini/2-5-flash)
- [Gemini 2.5 Flash-Lite](/vertex-ai/generative-ai/docs/models/gemini/2-5-flash-lite)
- [Gemini 2.0 Flash](/vertex-ai/generative-ai/docs/models/gemini/2-0-flash)
- [Gemini 2.0 Flash-Lite](/vertex-ai/generative-ai/docs/models/gemini/2-0-flash-lite)

For both implicit and explicit caching, there is no additional charge to write to the cache beyond the standard input token costs. For explicit caching, there are storage costs based on how long caches are stored; there are no storage costs for implicit caching. For more information, see [Vertex AI pricing](/vertex-ai/generative-ai/pricing#context-caching).

Implicit caching
----------------

All Google Cloud projects have implicit caching enabled by default. Implicit caching provides a 75% discount on cached tokens compared to standard input tokens, and cache hit cost savings are automatically passed on to you.

To increase the chances of an implicit cache hit:

- Place large and common content at the beginning of your prompt.
- Send requests with a similar prefix within a short amount of time.
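The following is a minimal sketch of both practices using the Google Gen AI SDK (one possible client; the project ID, location, and document file are placeholders). It places a large shared document first, asks two questions against the same prefix, and reads the cached token count from the usage metadata:

```python
from google import genai

# Placeholder project and location; replace with your own values.
client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")

# Hypothetical large shared context (at least 2,048 tokens), placed at the
# beginning of the prompt so repeated requests share a common prefix.
large_context = open("contract.txt").read()

for question in ["What is the termination clause?", "Who are the parties?"]:
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[large_context, question],  # shared prefix first, unique text last
    )
    # cachedContentTokenCount (cached_content_token_count in Python) reports
    # how many input tokens were served from cache for this request.
    print(question, "->", response.usage_metadata.cached_content_token_count)
```

On an implicit cache hit, later requests typically report a nonzero cached token count, and those tokens are billed at the discounted rate.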
Explicit caching
----------------

Explicit caching offers more control and ensures a 75% discount when explicit caches are referenced.

Using the Vertex AI API, you can:

- [Create context caches](/vertex-ai/generative-ai/docs/context-cache/context-cache-create) and control them more effectively.
- [Use a context cache](/vertex-ai/generative-ai/docs/context-cache/context-cache-use) by referencing its contents in a prompt request with its resource name.
- [Update a context cache's expiration time (Time to Live, or TTL)](/vertex-ai/generative-ai/docs/context-cache/context-cache-update) past the default 60 minutes.
- [Delete a context cache](/vertex-ai/generative-ai/docs/context-cache/context-cache-delete) when no longer needed.

You can also use the Vertex AI API to [retrieve information about a context cache](/vertex-ai/generative-ai/docs/context-cache/context-cache-getinfo).
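A minimal sketch of that lifecycle using the Google Gen AI SDK (one possible client; the display name, system instruction, and document file are hypothetical):

```python
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")

# Create a cache from a large shared context (minimum 2,048 tokens).
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        display_name="contract-cache",                   # hypothetical name
        system_instruction="You are a contract analyst.",
        contents=[open("contract.txt").read()],          # hypothetical document
        ttl="3600s",                                     # override the 60-minute default
    ),
)

# Reference the cache by its resource name in a prompt request.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize the indemnification terms.",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)

# Extend the TTL, inspect the cache, then delete it when no longer needed.
client.caches.update(name=cache.name, config=types.UpdateCachedContentConfig(ttl="7200s"))
print(client.caches.get(name=cache.name).expire_time)
client.caches.delete(name=cache.name)
```

Note that the model passed to the generation request must match the model the cache was created for, and cached tokens referenced this way are billed at the discounted rate.

Explicit caches interact with implicit caching, potentially leading to additional caching beyond the specified contents when [creating a cache](/vertex-ai/generative-ai/docs/context-cache/context-cache-create). To prevent cache data retention, disable implicit caching and avoid creating explicit caches. For more information, see [Enable and disable caching](/vertex-ai/generative-ai/docs/data-governance#enabling-disabling-caching).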
When to use context caching
---------------------------

Context caching is particularly well suited to scenarios where a substantial initial context is referenced repeatedly by subsequent requests.

Cached context items, such as a large amount of text, an audio file, or a video file, can be used in prompt requests to the Gemini API to generate output. Requests that use the same cache in the prompt also include text unique to each prompt. For example, each prompt request that composes a chat conversation might include the same context cache that references a video along with unique text that comprises each turn in the chat.

Consider using context caching for use cases such as:

- Chatbots with extensive system instructions
- Repetitive analysis of lengthy video files
- Recurring queries against large document sets
- Frequent code repository analysis or bug fixing

Context caching support for Provisioned Throughput is in [Preview](https://cloud.google.com/products#product-launch-stages) for implicit caching. Explicit caching is not supported for Provisioned Throughput. Refer to the [Provisioned Throughput guide](/vertex-ai/generative-ai/docs/provisioned-throughput/measure-provisioned-throughput#context_caching) for more details.

Availability
------------

Context caching is available in regions where Generative AI on Vertex AI is available. For more information, see [Generative AI on Vertex AI locations](/vertex-ai/generative-ai/docs/learn/locations).

VPC Service Controls support
----------------------------

Context caching supports VPC Service Controls, meaning your cache cannot be exfiltrated beyond your service perimeter. If you use Cloud Storage to build your cache, include your bucket in your service perimeter as well to protect your cache content.

For more information, see [VPC Service Controls with Vertex AI](/vertex-ai/docs/general/vpc-service-controls) in the Vertex AI documentation.

What's next
-----------

- Learn about [the Gemini API](/vertex-ai/generative-ai/docs/overview).
- Learn how to [use multimodal prompts](/vertex-ai/generative-ai/docs/multimodal/send-multimodal-prompts).