Operational Guidelines

The Cloud SQL SLA excludes outages "caused by factors outside of Google’s reasonable control". This page describes some of the user-controlled configurations that can cause an outage of a Second Generation Cloud SQL for MySQL instance to be excluded from the SLA.

Operational guidelines are not yet available for Cloud SQL for PostgreSQL instances, but the same principles apply.

Introduction

Cloud SQL strives to give you as much control as possible over how your instance is configured. This includes some configurations that increase the risk of instance downtime, depending on the load and other configuration parameters. If your instance goes down, and Cloud SQL determines that it was out of compliance with the operational limits described on this page, then the downtime period is not covered by (that is, does not count against) the Cloud SQL SLA.

This list of operational limits tells you which configurations present these risks, how to avoid inadvertently moving into one of these configurations, and how to mitigate the risks when a configuration is required for your business environment.

Excluded configurations

The excluded configurations fall into the following categories:

  • General configuration requirements
  • Database flag values
  • Resource constraints

General configuration requirements

Only Cloud SQL instances configured for high availability with at least one dedicated CPU are covered by the SLA. Shared-core instances and single-zone instances are not covered by the SLA.
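The two general requirements can be written as a single check. The following is a minimal sketch only: InstanceConfig and its field names are hypothetical and are not part of any Cloud SQL API; you would fill them in from your own record of how the instance is configured.

```python
from dataclasses import dataclass


@dataclass
class InstanceConfig:
    """Hypothetical summary of an instance (not a Cloud SQL API type)."""
    high_availability: bool  # True if the instance is configured for high availability
    dedicated_cpus: int      # number of dedicated CPUs; 0 for a shared-core instance


def meets_general_requirements(cfg: InstanceConfig) -> bool:
    """Return True if the instance is highly available and has at least one dedicated CPU."""
    return cfg.high_availability and cfg.dedicated_cpus >= 1
```

For example, meets_general_requirements(InstanceConfig(high_availability=True, dedicated_cpus=0)) evaluates to False, because a shared-core instance has no dedicated CPU.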

Database flag values

Cloud SQL lets you configure your instance using database flags. A few of these flags can be set in ways that might compromise the stability of the instance or the durability of its data.

The following table lists the flags that have values that result in an SLA exclusion:

| Flag | Description | Excluded setting | Potential impact | Mitigation |
|---|---|---|---|---|
| general_log | Enables the MySQL general log. | On, with the log_output flag set to TABLE. | Slow restarts. | Set the log_output flag to FILE. |
| slow_query_log | Enables the MySQL slow query log. | On, with the log_output flag set to TABLE. | Slow restarts. | Set the log_output flag to FILE. |
| max_heap_table_size | Determines the maximum size of a memory table. | Greater than the default value. | Instance outage due to an out of memory (OOM) error. | Retain the default setting. |
| tmp_table_size | Determines the maximum size of an in-memory temporary table. | Greater than the default value. | Instance outage due to an out of memory (OOM) error. | Retain the default setting, or carefully plan your workload to avoid exceeding instance capacity. |
| query_cache_size and query_cache_type | Together, these flags determine the size of the query cache. | Greater than the default value. | Instance outage due to an out of memory (OOM) error. | Retain the default setting, or carefully plan your workload to avoid exceeding instance capacity. |
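
The excluded settings above can be checked mechanically once you have the current flag values. The sketch below is illustrative only and rests on several assumptions: the function name is hypothetical, the caller supplies both the current values and the instance's default values (no defaults are hard-coded here), the temp table flag is looked up under its MySQL variable name tmp_table_size, and query_cache_type is not checked because it is a mode rather than a size.

```python
def excluded_flag_settings(flags, defaults):
    """Return the names of flags whose values match an excluded setting.

    flags:    mapping of flag name -> current value, as read by the caller
    defaults: mapping of flag name -> default value for this instance
              (supplied by the caller; no default values are assumed here)
    """
    excluded = []

    # Both log flags are excluded only when they are on AND logs go to a table.
    destinations = str(flags.get("log_output", "")).upper().split(",")
    log_to_table = "TABLE" in destinations
    for log_flag in ("general_log", "slow_query_log"):
        enabled = str(flags.get(log_flag, "off")).lower() in ("on", "1")
        if enabled and log_to_table:
            excluded.append(log_flag)

    # Memory-related flags are excluded when raised above their default values.
    for mem_flag in ("max_heap_table_size", "tmp_table_size", "query_cache_size"):
        if mem_flag in flags and mem_flag in defaults:
            if int(flags[mem_flag]) > int(defaults[mem_flag]):
                excluded.append(mem_flag)

    return excluded
```

A flag that is missing from either mapping is skipped rather than guessed at, so an empty result means only that none of the supplied values matched an excluded setting.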

Resource constraints

The following resource constraints must be avoided to retain SLA coverage:

| Constraint | Description | Detection | Remedy | Prevention |
|---|---|---|---|---|
| Storage full | If your instance runs out of storage, and the automatic storage increase capability is not enabled, your instance goes offline; this outage is not covered by the SLA. | You can view the amount of storage your instance is using on the Instance details page in the GCP Console. To monitor your storage usage and receive alerts at a specified threshold, set up a Stackdriver alert. | Increase the storage size for the instance. Note that storage size can be increased, but it cannot be decreased. | Enable automatic storage increase for the instance. |
| CPU overloaded | If CPU utilization is over 98% for 6 hours, your instance is not properly sized for your workload, and it is not covered by the SLA. | You can view the percentage of available CPU your instance is using on the Instance details page in the GCP Console. To monitor your CPU usage and receive alerts at a specified threshold, set up a Stackdriver alert. | Increase the number of CPUs for your instance. Note that changing your tier requires an instance restart. If your instance is already at the maximum number of CPUs, you must shard your database to multiple instances. | Monitor CPU usage and increase when necessary. Note that changing your instance tier requires a restart. |
| Replication lag too large | Failover downtime caused by replication lag greater than 1200 seconds is not counted against the SLA for the instance. | You can monitor replication lag using the Seconds Behind Master metric on the failover replica. | Throttle the incoming load on the master, or shard the database. | Create an alert for replication lag, and take corrective action as needed. |
| Too many database tables | If you have 10,000 or more database tables on a single instance, it could result in the instance becoming unresponsive or unable to perform maintenance operations, and the instance is not covered by the SLA. | To see how many tables there are on your instance: SELECT COUNT(*) FROM information_schema.tables; To see how many tables there are in each database: SELECT TABLE_SCHEMA, COUNT(*) FROM information_schema.tables GROUP BY TABLE_SCHEMA; | Reduce the number of tables to less than 10,000. If you cannot immediately reduce the number of tables, you can reduce the likelihood of your instance being impacted by the high table count by setting the innodb_file_per_table flag to OFF; however, this setting does not bring the instance back into SLA compliance. | If your data architecture requires a large number of tables, split the data across several instances. |
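
The thresholds in the constraints above can be combined into a simple self-check. This is a sketch under stated assumptions, not how Cloud SQL itself evaluates compliance: the function names and inputs are hypothetical, the caller gathers the measurements (for example from monitoring data and the table-count query), and "over 98% for 6 hours" is interpreted here as one unbroken run of samples above 98% that spans at least 6 hours.

```python
# Thresholds taken from the resource constraints described above.
CPU_LIMIT_PERCENT = 98.0
CPU_WINDOW_SECONDS = 6 * 60 * 60
MAX_REPLICATION_LAG_SECONDS = 1200
MAX_TABLE_COUNT = 10_000


def cpu_overloaded(samples):
    """Return True if utilization stayed above the limit for at least 6 hours.

    samples: list of (timestamp_seconds, cpu_percent) pairs sorted by time.
    Assumes samples are dense enough that gaps between them can be ignored.
    """
    run_start = None  # timestamp at which the current over-limit run began
    for timestamp, percent in samples:
        if percent > CPU_LIMIT_PERCENT:
            if run_start is None:
                run_start = timestamp
            if timestamp - run_start >= CPU_WINDOW_SECONDS:
                return True
        else:
            run_start = None  # any sample at or below the limit breaks the run
    return False


def violated_constraints(storage_full, auto_storage_increase, cpu_samples,
                         replication_lag_seconds, table_count):
    """Return the names of the resource constraints the measurements violate."""
    violations = []
    if storage_full and not auto_storage_increase:
        violations.append("storage full")
    if cpu_overloaded(cpu_samples):
        violations.append("CPU overloaded")
    if replication_lag_seconds > MAX_REPLICATION_LAG_SECONDS:
        violations.append("replication lag too large")
    if table_count >= MAX_TABLE_COUNT:
        violations.append("too many database tables")
    return violations
```

Note the different comparisons: replication lag is a violation only when it is greater than 1200 seconds, while the table limit is reached at exactly 10,000 tables.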