The Cloud SQL SLA excludes outages "caused by factors outside of Google’s reasonable control". This page describes some of the user-controlled configurations that can cause an outage of a Second Generation Cloud SQL for MySQL instance to be excluded from SLA coverage.
Operational guidelines are not yet available for Cloud SQL for PostgreSQL instances, but the same principles apply for PostgreSQL instances.
Cloud SQL strives to give you as much control as possible over how your instance is configured. This includes some configurations that increase the risk of instance downtime, depending on the load and other configuration parameters. If your instance goes down, and Cloud SQL determines that it was outside the operational limits described on this page, then the downtime period is not covered by (or does not count against) the Cloud SQL SLA.
This list of operational limits is intended to show you which configurations present these risks, how to avoid inadvertently moving into one of these configurations, and how to mitigate the risks when a configuration is required for your business environment.
The excluded configurations fall into the following categories:
- General configuration requirements
- Database flag values
- Resource constraints
General configuration requirements
Only Cloud SQL instances configured for high availability with at least one dedicated CPU are covered by the SLA. Shared-core instances and single-zone instances are not covered by the SLA.
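The eligibility rule above can be expressed as a small check. This is a minimal sketch, not an official tool: the function name `is_sla_eligible` is hypothetical, and it assumes that high availability is reported as the availability type `REGIONAL` and that the shared-core tiers are `db-f1-micro` and `db-g1-small`. Verify both assumptions against your own instance settings before relying on it.

```python
# Assumption: these are the shared-core tier names; adjust if your project differs.
SHARED_CORE_TIERS = {"db-f1-micro", "db-g1-small"}


def is_sla_eligible(availability_type: str, tier: str) -> bool:
    """Return True when the instance meets the general configuration requirements.

    availability_type: assumed to be "REGIONAL" for high availability,
                       "ZONAL" for a single-zone instance.
    tier:              the machine tier name of the instance.
    """
    highly_available = availability_type.upper() == "REGIONAL"
    dedicated_cpu = tier not in SHARED_CORE_TIERS
    return highly_available and dedicated_cpu
```

For example, `is_sla_eligible("ZONAL", "db-n1-standard-2")` is false because the instance is single-zone, even though it has dedicated CPUs.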
Database flag values
Cloud SQL provides the ability to configure your instance using database flags. A few of these flags can be set in ways that might compromise the stability of the instance or the durability of its data.
The following table lists the flags that have values that result in an SLA exclusion:
|Flag||Description||Excluded Setting||Potential Impact||Mitigation|
||Enables the MySQL general log.||On, with the log_output flag set to
||Slow restarts.||Set |
||Enables the MySQL slow query log.||On, with the log_output flag set to
||Determines the size of the memory table.||Greater than the default value.||Instance outage due to an out-of-memory (OOM) error.||Retain the default setting.|
||Determines the size of the temp table.||Greater than the default value.||Instance outage due to an out-of-memory (OOM) error.||Retain the default setting, or carefully plan your workload to avoid exceeding instance capacity.|
||Together, these flags determine the size of the query cache.||Greater than the default value.||Instance outage due to an out-of-memory (OOM) error.||Retain the default setting, or carefully plan your workload to avoid exceeding instance capacity.|
Resource constraints
The following resource constraints must be avoided to retain SLA coverage:
|Storage full||If your instance runs out of storage, and the automatic storage increase capability is not enabled, your instance goes offline; this outage is not covered by the SLA.||You can view the amount of storage your instance is using
on the Instance details page in the GCP Console.
To monitor your storage usage and receive alerts at a specified threshold, set up a Stackdriver alert.
|Increase the storage size for the instance. Note that storage size can be increased, but it cannot be decreased.||Enable automatic storage increase for the instance. Learn more.|
|CPU overloaded||If CPU utilization is over 98% for 6 hours, your instance is not properly sized for your workload, and it is not covered by the SLA.||You can view the percentage of available CPU your instance is using
on the Instance details page in the GCP Console.
To monitor your CPU usage and receive alerts at a specified threshold, set up a Stackdriver alert.
Increase the number of CPUs for your instance. Note that changing
your tier requires an instance restart.
If your instance is already at the maximum number of CPUs, you must shard your database to multiple instances.
|Monitor CPU usage and increase when necessary. Note that changing your instance tier requires a restart.|
|Replication lag too large||Failover downtime caused by replication lag greater than 1200 seconds is not counted against the SLA for the instance.||You can monitor replication lag using the
||Throttle the incoming load on the master, or shard the database.||Create an alert for replication lag, and take corrective action as needed. Learn more.|
|Too many database tables||If you have 10,000 or more database tables on a single instance, it could result in the instance becoming unresponsive or unable to perform maintenance operations, and the instance is not covered by the SLA.||To see how many tables there are on your instance:
Reduce the number of tables to fewer than 10,000.
If you cannot immediately reduce the number of tables, you can
reduce the likelihood of your instance being impacted by the
high table count by setting the
|If your data architecture requires a large number of tables, split the data across several instances.|
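The thresholds in the table above can be combined into one check over metrics you have already collected. The following is a minimal sketch under stated assumptions: the `InstanceSnapshot` type, its field names, and the functions `cpu_overloaded` and `sla_risks` are all hypothetical, and the metrics must come from your own monitoring. The sketch reads "over 98% for 6 hours" as a continuous run of samples above 98% spanning at least 6 hours, which is an interpretation rather than a documented definition.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Thresholds taken from the table above.
CPU_LIMIT = 0.98                      # utilization as a fraction of available CPU
CPU_WINDOW_SECONDS = 6 * 60 * 60      # 6 hours
MAX_REPLICATION_LAG_SECONDS = 1200
MAX_TABLES = 10_000


@dataclass
class InstanceSnapshot:
    """Hypothetical container for metrics gathered by your own monitoring."""
    storage_used_bytes: int
    storage_capacity_bytes: int
    auto_storage_increase: bool
    # (unix timestamp in seconds, utilization 0.0-1.0), oldest sample first
    cpu_samples: Sequence[Tuple[int, float]]
    replication_lag_seconds: float
    table_count: int


def cpu_overloaded(samples: Sequence[Tuple[int, float]]) -> bool:
    """True if utilization stays above CPU_LIMIT for a run of at least 6 hours."""
    run_start = None
    for timestamp, utilization in samples:
        if utilization > CPU_LIMIT:
            if run_start is None:
                run_start = timestamp
            if timestamp - run_start >= CPU_WINDOW_SECONDS:
                return True
        else:
            run_start = None  # the run is broken by a sample at or below the limit
    return False


def sla_risks(snapshot: InstanceSnapshot) -> List[str]:
    """Return the names of the resource constraints the snapshot violates."""
    risks = []
    storage_full = snapshot.storage_used_bytes >= snapshot.storage_capacity_bytes
    if storage_full and not snapshot.auto_storage_increase:
        risks.append("Storage full")
    if cpu_overloaded(snapshot.cpu_samples):
        risks.append("CPU overloaded")
    if snapshot.replication_lag_seconds > MAX_REPLICATION_LAG_SECONDS:
        risks.append("Replication lag too large")
    if snapshot.table_count >= MAX_TABLES:
        risks.append("Too many database tables")
    return risks
```

An empty list means none of the four constraints is violated by the snapshot; it does not by itself establish SLA coverage, which also depends on the general configuration requirements and the database flag values.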