Gemini Enterprise Agent Platform Online Inference Service Level Agreement (SLA)

During the Term of the agreement under which Google has agreed to provide Google Cloud Platform to Customer (as applicable, the "Agreement"), the Covered Service will provide a Monthly Uptime Percentage or Monthly Latency Target Attainment Percentage to Customer as follows (each, a "Service Level Objective" or "SLO"):

Covered Service | Monthly Uptime Percentage
generateContent and streamGenerateContent methods of Gemini Enterprise Agent Platform Online Inference | 99.5%

Additionally, Gemini Enterprise Agent Platform Online Inference under Provisioned Throughput will provide the following Service Level Objective:

Covered Service | Monthly Latency Target Attainment Percentage
streamGenerateContent methods of Gemini Online Inference on Gemini Enterprise Agent Platform Online Inference under Provisioned Throughput consumption model for a Covered Model from a Covered Endpoint | 99%

If Google does not meet the SLO, and if Customer meets its obligations under this SLA, Customer will be eligible to receive the Financial Credits described below. Monthly Uptime Percentage, Monthly Latency Target Attainment Percentage, and Financial Credits are determined on a calendar month basis per Project. This SLA states Customer's sole and exclusive remedy for any failure by Google to meet the SLO. Capitalized terms used in this SLA, but not defined in this SLA, have the meaning set forth in the Agreement. If the Agreement authorizes the resale or supply of Google Cloud Platform under a Google Cloud partner or reseller program, then all references to Customer in this SLA mean Partner or Reseller (as applicable), and any Financial Credit(s) will only apply for impacted Partner or Reseller order(s) under the Agreement.

Definitions

The following definitions apply to the Monthly Uptime Percentage SLO:

  • "Covered Service" means the generateContent and streamGenerateContent methods of Gemini Enterprise Agent Platform Online Inference.
  • "Downtime" means an Error Rate of more than five percent, measured as the server-side Error Rate.
  • "Downtime Period" means a period of five or more consecutive minutes of Downtime. Partial minutes or intermittent Downtime for a period of less than five minutes will not count towards any Downtime Periods.
  • "Error Rate" means the number of Valid Requests that result in an error code response with HTTP Status 5XX divided by the total number of Valid Requests during the Downtime Period.
  • "Financial Credit" means the following:

Monthly Uptime Percentage | Percentage of monthly bill for the Covered Service that does not meet SLO that will be credited to Customer's future monthly bills
99.0% - < 99.5% | 10%
95.0% - < 99.0% | 25%
< 95.0% | 50%

  • "Monthly Uptime Percentage" means the total number of minutes in a month, minus the number of minutes of Downtime suffered from all Downtime Periods in that month, divided by the total number of minutes in the month.
  • "Valid Requests" are requests that conform to the Documentation, and that would normally result in a non-error response.
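
The Monthly Uptime Percentage arithmetic above can be sketched as follows. This is an illustration only, not Google's measurement method: it assumes a hypothetical list holding one server-side Error Rate per full minute of the month, and it assumes the Error Rate is evaluated minute by minute. The function names are invented for this example.

```python
from typing import List

ERROR_RATE_THRESHOLD = 0.05  # Downtime: Error Rate of more than five percent
MIN_PERIOD_MINUTES = 5       # Downtime Period: five or more consecutive minutes


def downtime_minutes(per_minute_error_rates: List[float]) -> int:
    """Count the minutes that fall inside Downtime Periods.

    Assumption: each element is the Error Rate (0.0 to 1.0) for one full
    minute, in chronological order. Streaks shorter than five minutes are
    ignored, as the definition of Downtime Period requires.
    """
    total = 0
    run = 0  # length of the current streak of Downtime minutes
    for rate in per_minute_error_rates:
        if rate > ERROR_RATE_THRESHOLD:
            run += 1
        else:
            if run >= MIN_PERIOD_MINUTES:
                total += run
            run = 0
    if run >= MIN_PERIOD_MINUTES:  # streak that runs to the end of the month
        total += run
    return total


def monthly_uptime_percentage(per_minute_error_rates: List[float]) -> float:
    """(minutes in month - Downtime minutes) / minutes in month, as a percent."""
    minutes_in_month = len(per_minute_error_rates)
    down = downtime_minutes(per_minute_error_rates)
    return 100.0 * (minutes_in_month - down) / minutes_in_month


def uptime_credit_percent(uptime_pct: float) -> int:
    """Map a Monthly Uptime Percentage to the Financial Credit tier."""
    if uptime_pct >= 99.5:
        return 0   # SLO met, no credit
    if uptime_pct >= 99.0:
        return 10
    if uptime_pct >= 95.0:
        return 25
    return 50
```

For example, in a 30-day month (43,200 minutes), a single 300-minute Downtime Period gives 42,900 / 43,200, or roughly 99.31%, which falls in the 10% tier. A 4-minute burst of errors in the same month would not change the result, because it is shorter than a Downtime Period.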

The following definitions apply to the Monthly Latency Target Attainment Percentage SLO:

  • "Covered Service" means streamGenerateContent methods of Gemini Online Inference on Gemini Enterprise Agent Platform Online Inference under Provisioned Throughput consumption model for a Covered Model from a Covered Endpoint.
  • "Monthly Latency Target Attainment Percentage" means, for a calendar month, the percentage of 5-minute intervals during which a Covered Model's performance was better than the Latency Target of the specific Covered Model from a Covered Endpoint. The calculation is: 100% minus the total number of 5-minute Intervals with Latency Target Breach divided by the total number of 5-minute intervals in the month, with that quotient expressed as a percentage.
  • "Covered Models" means Gemini 2.5 Pro, Gemini 2.5 Flash, and Gemini 2.5 Flash-lite.
  • "Covered Endpoint" means the Global Endpoint.
  • "Latency Target" varies across Covered Models as follows:

Covered Models | Latency Target (Tokens Per Second)
Gemini 2.5 Pro (gemini-2.5-pro) | 60 TPS (Excluding Long Context)
Gemini 2.5 Flash (gemini-2.5-flash) | 80 TPS (Excluding Long Context)
Gemini 2.5 Flash-lite (gemini-2.5-flash-lite) | 110 TPS (Excluding Long Context)

  • "5-minute Intervals with Latency Target Breach" means the number of 5-minute intervals during which the p50 TPS of Valid Requests is below the Latency Target.
  • "Tokens Per Second (TPS)" means a measurement of generation speed. It is calculated by dividing the total number of generated output tokens in a response by the time taken to generate them (measured from the first returned non-thinking token to the last).
  • "Valid Requests" are requests that are eligible for SLA coverage. The request must be a streaming request (streamGenerateContent), generate more than 75 output tokens, and return a successful 200 HTTP status code.
  • "Long Context" means any context that is above 200k. 
  • "Financial Credit" means the following:

Monthly Latency Target Attainment Percentage | Percentage of monthly bill for the Covered Service that does not meet SLO that will be credited to Customer's future monthly bills
95.0% – < 99.0% | 10%
90.0% – < 95.0% | 25%
< 90.0% | 50%
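
The latency definitions above can be sketched end to end as follows. This is an illustration under stated assumptions, not Google's measurement method: RequestRecord is a hypothetical log record, "200k" is assumed to count tokens of context, Long Context requests are assumed to be excluded from the measurement, p50 is computed with a plain median, and intervals without any Valid Request are assumed not to count as a breach.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Dict, Iterable, List

# Latency Targets in tokens per second, taken from the Latency Target table.
LATENCY_TARGET_TPS = {
    "gemini-2.5-pro": 60.0,
    "gemini-2.5-flash": 80.0,
    "gemini-2.5-flash-lite": 110.0,
}
LONG_CONTEXT_TOKENS = 200_000  # assumption: "200k" is a token count


@dataclass
class RequestRecord:           # hypothetical record, not a Google log schema
    interval: int              # index of the 5-minute interval in the month
    streaming: bool            # True for streamGenerateContent
    http_status: int
    context_tokens: int
    output_tokens: int
    generation_seconds: float  # first non-thinking token to last token


def is_valid_request(r: RequestRecord) -> bool:
    """Streaming, HTTP 200, more than 75 output tokens, not Long Context."""
    return (r.streaming and r.http_status == 200 and r.output_tokens > 75
            and r.context_tokens <= LONG_CONTEXT_TOKENS
            and r.generation_seconds > 0)


def tokens_per_second(r: RequestRecord) -> float:
    """Generated output tokens divided by the time taken to generate them."""
    return r.output_tokens / r.generation_seconds


def count_breached_intervals(records: Iterable[RequestRecord], model: str) -> int:
    """Count intervals whose p50 TPS of Valid Requests is below the target."""
    target = LATENCY_TARGET_TPS[model]
    tps_by_interval: Dict[int, List[float]] = defaultdict(list)
    for r in records:
        if is_valid_request(r):
            tps_by_interval[r.interval].append(tokens_per_second(r))
    return sum(1 for v in tps_by_interval.values() if median(v) < target)


def attainment_percentage(breached: int, total_intervals: int) -> float:
    """100% minus breached intervals as a percentage of all intervals."""
    return 100.0 * (1.0 - breached / total_intervals)


def latency_credit_percent(attainment_pct: float) -> int:
    """Map the attainment percentage to the Financial Credit tier."""
    if attainment_pct >= 99.0:
        return 0   # SLO met, no credit
    if attainment_pct >= 95.0:
        return 10
    if attainment_pct >= 90.0:
        return 25
    return 50
```

A 30-day month contains 8,640 five-minute intervals, so under these assumptions 500 breached intervals would give about 94.21% attainment, which falls in the 25% tier.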

Customer Must Request Financial Credit

In order to receive any of the Financial Credits described above, Customer must notify Google technical support within 30 days from the time Customer becomes eligible to receive a Financial Credit. Customer must also provide Google with log files showing Downtime Periods or 5-minute Intervals with Latency Target Breach, and the date and time they occurred. If Customer does not comply with these requirements, Customer will forfeit its right to receive a Financial Credit. Further, Customer must be in good standing under the Agreement in order to receive a Financial Credit.

Maximum Financial Credit

The maximum aggregate number of Financial Credits issued by Google to Customer for all missed SLOs in a single billing month will not exceed 50% of the amount due from Customer for the Covered Service that did not meet the SLO for the applicable month. Financial Credits will be in the form of a monetary credit applied to future use of the Covered Service and will be applied within 60 days after the Financial Credit was agreed to by Google.
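
The 50% cap can be sketched as follows. This is an illustration only: it assumes, purely for the example, that the credit percentages for every SLO missed in the month are applied to the same amount due and then summed before the cap is applied. How Google aggregates credits across SLOs is not specified here, and the function name is invented.

```python
from typing import List

MAX_CREDIT_FRACTION = 0.50  # cap: 50% of the amount due for the Covered Service


def capped_financial_credit(amount_due: float,
                            credit_percents: List[float]) -> float:
    """Sum the credits for all missed SLOs in a month, capped at 50%.

    Assumption: every percentage in credit_percents applies to the same
    amount_due for the Covered Service that missed its SLO.
    """
    uncapped = sum(amount_due * pct / 100.0 for pct in credit_percents)
    return min(uncapped, amount_due * MAX_CREDIT_FRACTION)
```

With an amount due of 1,000, credit tiers of 25% and 50% would sum to 750 before the cap and be limited to 500. A single 10% tier would yield 100, which is below the cap.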

SLA Exclusions

The SLA does not apply to any (a) features or services designated pre-general availability (unless otherwise set forth in the associated Documentation); (b) features or services excluded from the SLA (in the associated Documentation), including requests made using Grounding with Google Search; or (c) errors (i) caused by factors outside of Google's reasonable control; (ii) that resulted from Customer's software or hardware or third party software or hardware; (iii) that resulted from Customer’s setting a deadline shorter than the current server default (i.e., "deadline_exceeded" errors); (iv) that resulted from abuses or other behaviors that violate the Agreement; or (v) that resulted from quotas applied by the system or listed in the Documentation or Admin Console.

Last modified May 19, 2026