# Provisioned Throughput for Live API

| **Request access:** For information about access to this release, see the [access request page](https://docs.google.com/forms/d/e/1FAIpQLScxBeD4UJ8GbUfX4SXjj5a1XJ1K7Urwvb0iSGdGccNcFRBrpQ/viewform).

This section explains how Provisioned Throughput works with the Live API for token counting and quota enforcement.
The Live API supports low-latency multimodal interactions through sessions. It uses a session memory to retain and recall information from interactions within a session, which lets the model recall previously provided or discussed information. Provisioned Throughput supports the Gemini 2.5 Flash with Live API model. For more information about the Live API, including session limits and capabilities, see the [Live API reference](/vertex-ai/generative-ai/docs/model-reference/multimodal-live).
Calculate throughput for Live API
---------------------------------
While using the Live API, the tokens stored in the session memory can be used in subsequent requests to the model. As a result, Provisioned Throughput counts both the incoming tokens and the session memory tokens in the same request. This means the number of tokens processed per request can be greater than the number of tokens the user sent in the ongoing request.
The Live API has a limit on the total number of tokens that can be stored in the session memory, and it also has a metadata field containing the total token count. When calculating how much throughput is needed to serve your requests, you must account for the tokens in the session memory. If you've used the Live API with pay-as-you-go (PayGo), you can use those traffic patterns and session token counts to help estimate your Provisioned Throughput needs.
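To make this accounting concrete, here is a minimal sketch (the function is illustrative, not part of any official SDK) of how session memory tokens add to the input tokens counted for a request, given the 1:1 session-memory burndown rate described below:

```python
# Illustrative sketch: input tokens counted against Provisioned Throughput
# for one Live API request include the tokens already held in session
# memory, not just the tokens newly sent in this turn.
# Session memory tokens burn down 1:1 with standard input tokens.

def tokens_processed(incoming_tokens: int, session_memory_tokens: int) -> int:
    """Input tokens counted for a single request (hypothetical helper)."""
    return incoming_tokens + session_memory_tokens

# A request that sends 1,000 tokens while 2,830 tokens sit in session
# memory is counted as 3,830 input tokens, even though the user only
# sent 1,000 tokens in this turn.
print(tokens_processed(1_000, 2_830))  # 3830
```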
### Example of how to estimate your Provisioned Throughput requirements for Live API

During a session, all traffic is processed either as Provisioned Throughput or pay-as-you-go. If you reach your Provisioned Throughput quota during a session, you'll receive an error message asking you to try again later. Once you're back within your quota, you can resume sending requests. The session state, including the session memory, is available as long as the session is live.

This example illustrates how two consecutive requests are processed by including the tokens from the session memory.

#### Request #1 details

**Duration**: 10 seconds

**Tokens sent (audio)**: 10 seconds x 25 tokens/second = 250 tokens

**Tokens sent (video)**: 10 seconds x 258 tokens/frame at 1 frame per second = 2,580 tokens

**Total tokens processed for Request #1**:

- **Tokens sent**: Sum of audio and video tokens sent = 2,580 + 250 = 2,830 tokens
- **Tokens received**: 100 (audio)

#### Request #2 details

**Duration**: 40 seconds

**Tokens sent (audio)**: 40 seconds x 25 tokens/second = 1,000 tokens

**Total tokens processed for Request #2**:

- **Tokens sent**: Tokens sent in Request #2 + session memory tokens from Request #1 = 1,000 tokens + 2,830 tokens = 3,830 tokens
- **Tokens received**: 200 (audio)

#### Calculate the number of tokens processed in the requests

The number of tokens processed during these requests is calculated as follows:

- Request #1 processes only the input and output tokens from the ongoing request, because there are no additional tokens in the session memory.

- Request #2 processes the input and output tokens from the ongoing request, and also includes the input tokens from the session memory, which consist of the input tokens from the preceding request (Request #1). The burndown rate for tokens in the session memory is the same as that for standard input tokens (1 input session memory token = 1 input token).

  If Request #2 took exactly 1 second to process after you sent it, your tokens are processed and applied to your Provisioned Throughput quota as follows:

  - Multiply your inputs by the burndown rates to get the total input tokens:

    2,830 x (1 token per session memory token) + 1,000 x (1 token per input text token) = 3,830 burndown-adjusted input tokens per query

  - Multiply your outputs by the burndown rates to get the total output tokens:

    200 x (6 tokens per audio output token) = 1,200 tokens

  - Add these two totals to get the total number of tokens processed:

    3,830 tokens + 1,200 tokens = 5,030 tokens

If your Provisioned Throughput quota is greater than 5,030 tokens per second, then this request can be processed immediately. If it's less, the tokens are processed over time at the rate that you've set for your quota.

What's next
-----------

- [Purchase Provisioned Throughput](/vertex-ai/generative-ai/docs/purchase-provisioned-throughput)

Last updated 2025-09-04 UTC.
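As a cross-check, the arithmetic in the worked example above can be reproduced in a few lines. The per-modality rates used here (25 audio input tokens/second, 258 tokens per video frame at 1 frame/second, and 6 tokens per audio output token) are taken from the example itself, not from an official rate table:

```python
# Reproduces the worked example: burndown-adjusted tokens for Request #2.
AUDIO_IN_TOKENS_PER_SEC = 25   # rate assumed from the example
VIDEO_TOKENS_PER_FRAME = 258   # at 1 frame per second (from the example)
AUDIO_OUT_BURNDOWN = 6         # 6 tokens per audio output token
SESSION_MEMORY_BURNDOWN = 1    # 1 input token per session memory token

# Request #1: 10 s of audio and video; its sent tokens land in session memory.
req1_sent = 10 * AUDIO_IN_TOKENS_PER_SEC + 10 * VIDEO_TOKENS_PER_FRAME  # 2830

# Request #2: 40 s of audio, plus Request #1's input tokens from session memory.
req2_sent = 40 * AUDIO_IN_TOKENS_PER_SEC                                # 1000
req2_received = 200                                                     # audio out

input_tokens = req1_sent * SESSION_MEMORY_BURNDOWN + req2_sent          # 3830
output_tokens = req2_received * AUDIO_OUT_BURNDOWN                      # 1200

print(input_tokens + output_tokens)  # 5030
```

If your quota is at least 5,030 tokens per second, this request is processed immediately; otherwise the tokens are processed over time at your quota's rate.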