Operational guidelines for SQL Server instances

The Cloud SQL SLA excludes outages "caused by factors outside of Google’s reasonable control". This page describes some of the user-controlled configurations that can cause an outage of a Cloud SQL instance to be excluded from SLA coverage.

Introduction

Cloud SQL strives to give you as much control as possible over how your instance is configured. This includes some configurations that increase the risk of instance downtime, depending on the load and other configuration parameters. If your instance goes down, and Cloud SQL determines that it was out of compliance with the operational limits described on this page, then the downtime period is not covered by (or does not count against) the Cloud SQL SLA.

This list of operational limits is presented to inform you which configurations present these risks, ways to avoid inadvertently moving into one of these configurations, and ways to mitigate the risks when the configuration is required for your business environment.

Excluded configurations

The excluded configurations fall into the following categories:

  • General configuration requirements
  • Database flag values
  • Resource constraints

General configuration requirements

Only Cloud SQL instances configured for high availability with at least one dedicated CPU are covered by the SLA. Shared-core instances and single-zone instances are not covered by the SLA.
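The two requirements above can be expressed as a simple check. The following is a minimal sketch, not an official tool: it assumes the instance settings are available as a dictionary shaped like the `settings` object of the Cloud SQL Admin API (`availabilityType`, `tier`), and the helper name and the list of shared-core tier names are assumptions to verify against your own instance description.

```python
# Hypothetical helper. The field names mirror the Cloud SQL Admin API
# `settings` object, and the tier names below are assumed to be the
# shared-core tiers; verify both against your own instance.
SHARED_CORE_TIERS = {"db-f1-micro", "db-g1-small"}


def meets_general_sla_requirements(settings: dict) -> bool:
    """Return True if the instance is HA and uses a dedicated-CPU tier."""
    # High availability is assumed to be reported as availabilityType REGIONAL.
    is_high_availability = settings.get("availabilityType") == "REGIONAL"
    tier = settings.get("tier")
    # A missing tier is treated as "unknown", which fails the check.
    is_dedicated_cpu = tier is not None and tier not in SHARED_CORE_TIERS
    return is_high_availability and is_dedicated_cpu


if __name__ == "__main__":
    example = {"availabilityType": "ZONAL", "tier": "db-custom-2-7680"}
    print(meets_general_sla_requirements(example))
```

A single-zone instance fails the check regardless of its tier, which matches the requirement that both conditions hold at the same time.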

If the instance is configured and used in a way that the workload overloads the instance, then the SLA does not apply. Some examples of this are cases where:

  • A combination of work_mem, specific workload queries, and the number of parallel active connections causes the system to run out of memory, so that PostgreSQL worker backends crash and PostgreSQL runs recovery operations.
  • A combination of checkpoint_timeout, max_wal_size, and a high workload, possibly together with an underpowered VM size, results in a situation where recovery (WAL replay) takes a long time.
  • Very long transactions running together with workloads that create a large number of temporary files make it very hard for autovacuum to keep up, which can result in table bloat and degraded performance.

These examples are not a complete list, because there are many ways to overload a PostgreSQL database. We strongly advise you to set up alerts and monitoring in Cloud Monitoring.
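To make the first example concrete, the following rough sketch estimates how much memory sort and hash operations could claim in the worst case. It is an illustration under stated assumptions rather than a sizing formula: the function names, the `nodes_per_query` multiplier, and the reserved-memory figure are hypothetical. It relies on the fact that work_mem applies to each sort or hash operation, so one query can use it several times.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def worst_case_work_mem_bytes(work_mem_bytes, active_connections, nodes_per_query=1):
    """Upper bound on work_mem usage if every connection runs a query
    with `nodes_per_query` sort/hash operations at the same time."""
    return work_mem_bytes * active_connections * nodes_per_query


def fits_in_memory(total_ram_bytes, reserved_bytes, work_mem_bytes,
                   active_connections, nodes_per_query=1):
    """Compare the worst case against RAM left after other consumers.

    `reserved_bytes` is an assumed allowance for shared buffers, the
    operating system, and other overhead.
    """
    budget = total_ram_bytes - reserved_bytes
    needed = worst_case_work_mem_bytes(
        work_mem_bytes, active_connections, nodes_per_query)
    return needed <= budget


if __name__ == "__main__":
    # 64 MiB work_mem, 200 connections, 2 sort/hash nodes per query.
    needed = worst_case_work_mem_bytes(64 * MIB, 200, nodes_per_query=2)
    print(needed / GIB)  # 25.0
    print(fits_in_memory(16 * GIB, 4 * GIB, 64 * MIB, 200, nodes_per_query=2))  # False
```

In this example the worst case is 25 GiB against a 12 GiB budget, so the combination risks running out of memory even though each individual setting looks moderate.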

Database flag values

Cloud SQL lets you configure your instance using database flags. Some of these flags can be set in ways that might compromise the stability of the instance or the durability of its data.

Resource constraints

The following resource constraints must be avoided to retain SLA coverage:

Constraint: Storage full

  • Description: If your instance runs out of storage, and the automatic storage increase capability is not enabled, your instance goes offline; this outage is not covered by the SLA.
  • Detection: You can view the amount of storage your instance is using on the Instance details page in the Cloud Console. To monitor your storage usage and receive alerts at a specified threshold, set up a Stackdriver alert.
  • Remedy: Increase the storage size for the instance. Although storage size can be increased, it cannot be decreased.
  • Prevention: Enable automatic storage increase for the instance.

Constraint: CPU overloaded

  • Description: If CPU utilization is over 98% for 6 hours, your instance is not properly sized for your workload, and it is not covered by the SLA.
  • Detection: You can view the percentage of available CPU your instance is using on the Instance details page in the Cloud Console. To monitor your CPU usage and receive alerts at a specified threshold, set up a Stackdriver alert.
  • Remedy: Increase the number of CPUs for your instance. Note that changing CPUs requires an instance restart. If your instance is already at the maximum number of CPUs, shard your database to multiple instances.
  • Prevention: Monitor CPU usage and increase when necessary. Note that changing your instance CPUs requires a restart.
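The two constraints can also be checked against exported metric samples. The sketch below is a local illustration, not a replacement for a monitoring alert. It assumes that "over 98% for 6 hours" means a continuous run of samples above the limit, and that samples arrive as time-ordered `(timestamp, utilization)` pairs; the function names and the 80% storage threshold are hypothetical.

```python
from datetime import datetime, timedelta

CPU_LIMIT = 0.98                      # from the CPU overloaded constraint
OVERLOAD_WINDOW = timedelta(hours=6)  # from the CPU overloaded constraint


def longest_overload(samples, limit=CPU_LIMIT):
    """Return the longest continuous span with utilization above `limit`.

    `samples` is a list of (timestamp, utilization) pairs sorted by time.
    """
    longest = timedelta(0)
    run_start = None
    for timestamp, utilization in samples:
        if utilization > limit:
            if run_start is None:
                run_start = timestamp  # a new overloaded run begins here
            longest = max(longest, timestamp - run_start)
        else:
            run_start = None           # any dip below the limit ends the run
    return longest


def cpu_overloaded(samples):
    """True if utilization stayed above the limit for the whole window."""
    return longest_overload(samples) >= OVERLOAD_WINDOW


def storage_alert(used_bytes, capacity_bytes, threshold=0.8):
    """True if storage usage has reached the (assumed) alert threshold."""
    return used_bytes / capacity_bytes >= threshold


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    # One sample every 10 minutes, covering exactly 6 hours at 99% CPU.
    samples = [(start + timedelta(minutes=10 * i), 0.99) for i in range(37)]
    print(cpu_overloaded(samples))  # True
    print(storage_alert(85, 100))   # True
```

Alerting well below the limits, as the storage threshold does here, leaves time to increase storage or CPUs before the instance falls out of SLA coverage.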