Topics in this page include:
- Backup and recovery
- Creating instances
- High availability
- Import and export
- Managing instances
If you're looking for information about specific error messages, check the error messages page.
Backup and recovery
|You can't see the current operation's status.||The Google Cloud Console reports only success or failure when the operation
is done. It isn't designed to show warnings or other updates.
|You want to find out who issued an on-demand backup operation.||The user interface doesn't show the user who started an operation.
Look in the logs and filter by text to find the user. You may need to use audit logs for private information. Relevant log files include:
|You can't do a backup after an instance was deleted.||The grace period for a Cloud SQL instance purge is four days, with
the exception of read replicas, which are purged immediately.
During this time,
customer support can recreate the instance. After instances are
purged, no data recovery is possible.
If you have done an export operation, you can create a new instance and then do an import operation to recreate the database. Exports are written to Cloud Storage and imports are read from there.
|An automated backup is stuck for many hours and can't be canceled.||Backups can take a long time depending on the database size.
If you really need to cancel the operation, you can ask
customer support to cancel it for you.
|A restore operation can fail when one or more users referenced in the SQL dump file don't exist.||Before restoring a SQL dump, all the database users who own objects or
were granted permissions on objects in the dumped database must exist in the
target database. If they don't, the restore operation fails to recreate the
objects with the original ownership or permissions.
Create the database users before restoring the SQL dump.
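For example, here's a minimal sketch, where app_owner is a placeholder for a role that owns objects or holds grants in the dumped database:
CREATE ROLE app_owner WITH LOGIN PASSWORD 'change-me';  -- placeholder name and password
-- Repeat for every role referenced in the dump, then run the restore.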
|You want to increase the number of days that you can keep automatic backups from seven to 30 days, or longer.||Only seven backups are retained. Backups get pruned regularly due to the
cost and size of retaining backups. Unfortunately, this means that the
currently visible backups are the only automated backups you can restore from.
To keep backups indefinitely, you can create an on-demand backup; on-demand backups aren't deleted in the same way as automated backups. They remain until they're deleted or until the instance they belong to is deleted. Because on-demand backups aren't deleted automatically, they can affect billing.
|A backup failed and you see an error.||The backup operation might have timed out.
There are two flags that influence backup creation; if their values are too high, a timeout occurs. Decreasing the value of the relevant flag can help.
|An automated backup failed and you didn't receive an email notification.||Notifications aren't supported for backup failures.
When an automated backup fails, the failure is recorded as an operation.
You can find the status of a backup with either of the following gcloud commands:
gcloud sql backups list \
--project=PROJECT_ID \
--instance=INSTANCE_ID
gcloud sql backups describe BACKUP_ID \
--instance=INSTANCE_ID
|An instance is repeatedly failing because it is cycling between the failure and backup restore states. Attempts to connect to and use the database following restore fail.||
Things to try:
|You find you are missing data when performing a backup/restore operation.||Tables were created as unlogged. For example:
These tables are not included in a restore from a backup:
The solution is to avoid using unlogged tables if you want to restore those
tables through a backup. If you're restoring from a database that already
has unlogged tables, then you can dump the database to a file, and reload the
data after modifying the dumped file to make those tables logged.
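As a sketch of what that change amounts to, you can also convert the tables in the source database before dumping (my_unlogged_table is a placeholder name):
-- List unlogged tables ('u' = unlogged persistence)
SELECT relname FROM pg_class WHERE relkind = 'r' AND relpersistence = 'u';
-- Convert a table to logged so that it survives backup and restore
ALTER TABLE my_unlogged_table SET LOGGED;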
|Cloning fails with an error.||The cloning operation is blocked by a configuration constraint.
||The issue might be:
Applications must tolerate network failures and follow best practices such as connection pooling and retrying. Most connection poolers catch these errors where possible. Otherwise the application must either retry or fail gracefully.
For connection retry, we recommend the following methods:
Combining these methods helps reduce throttling.
Connect with the default user, then change users.
|You want to find out who is connected.||Log into the database and run this command:
SELECT datname,
  usename,
  application_name AS appname,
  client_addr,
  state,
  now() - backend_start AS conn_age,
  now() - state_change AS last_activity_age
FROM pg_stat_activity
WHERE backend_type = 'client backend'
ORDER BY 6 DESC
LIMIT 20
|You get the error message:
||There are no more available addresses in the allocated IP range. There
can be several possible scenarios:
For each of the above scenarios you can elect to either expand the existing allocated IP range or allocate an additional IP range to the private service connection.
If you used the
If you're allocating a new range, take care that the allocation doesn't overlap with any existing allocations.
After creating a new IP range, update the VPC peering with the following command:
gcloud services vpc-peerings update \
--service=servicenetworking.googleapis.com \
--ranges=OLD_RESERVED_RANGE_NAME,NEW_RESERVED_RANGE_NAME \
--network=VPC_NETWORK \
--project=PROJECT_ID \
--force
If you're expanding an existing allocation, take care to increase only the allocation range and not decrease it. For example, if the original allocation was 10.0.10.0/24, then make the new allocation at least 10.0.10.0/23.
In general, if starting from a /24 allocation, decrementing the /mask by 1 for each condition (additional instance type group, additional region) is a good rule of thumb. For example, if trying to create both instance type groups on the same allocation, going from /24 to /23 is enough.
After expanding an existing IP range, update the VPC peering with the following command:
gcloud services vpc-peerings update \
--service=servicenetworking.googleapis.com \
--ranges=RESERVED_RANGE_NAME \
--network=VPC_NETWORK \
--project=PROJECT_ID
|CSV export worked but SQL export failed.||CSV and SQL formats do export differently. The SQL format exports the
entire database, and likely takes longer to complete. The CSV format lets
you define which elements of the database to include in the export.
Use CSV exports to export only what you need.
|Export is taking too long.||Cloud SQL does not support concurrent synchronous operations.
Use export offloading. At a high level, in export offloading, instead of issuing an export on the source instance, Cloud SQL spins up an offload instance to perform the export. Export offloading has several advantages, including increased performance on the source instance and the unblocking of administrative operations while the export is running. With export offloading, total latency can increase by the amount of time it takes to bring up the offload instance. Generally, for reasonably sized exports, latency is not significant. However, if your export is small enough, then you may notice the increase in latency.
|Create Extension error.||The dump file contains references to an unsupported extension.|
||The tool requires the superuser role, which is not supported in Cloud SQL for PostgreSQL.|
|The export operation times out before exporting anything, and you see
the error message
||If Cloud Storage does not receive any data within a certain
time frame, typically around seven minutes, the connection resets. It's
possible the initial export query is taking too long to run.
Do a manual export using the
|You want exports to be automated.||Cloud SQL does not provide a way to automate exports.
You could build your own automated export system using Google Cloud products such as Cloud Scheduler, Pub/Sub, and Cloud Functions, similar to this article on automating backups.
|There's no flag to set the time zone.||Cloud SQL for PostgreSQL and SQL Server don't support a time zone
flag to adjust the time zone based on user needs.
There are some workarounds.
You can set the time zone per session, but that setting expires when you log off. A better solution is to connect to the database and set the database time zone to the desired one, either per user or per database:
ALTER DATABASE dbname SET TIMEZONE TO 'timezone';
ALTER USER username SET TIMEZONE TO 'timezone';
These settings remain even after the session is closed, mimicking a time zone flag.
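To confirm the setting took effect, open a new session as that user or against that database and check the effective time zone, for example:
SHOW timezone;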
|You can't find the metrics for a manual failover.||Only automatic failovers go into the metrics.|
|Cloud SQL instance resources (CPU and RAM) are near 100% usage, causing the high availability instance to go down.||The instance machine size is too small for the load.
Edit the instance to upgrade to a larger machine size to get more CPUs and memory.
|The import operation is taking too long.||Too many active connections can interfere with import operations.
Close unused connections. Check the CPU and memory usage of your Cloud SQL instance to make sure there are plenty of resources available. The best way to ensure maximum resources for the import is to restart the instance before beginning the operation.
|An import operation can fail when one or more users referenced in the dump file don't exist.||Before importing a dump file, all the database users who own objects or
were granted permissions on objects in the dumped database must exist in the
target database. If they don't, the import operation fails to recreate the
objects with the original ownership or permissions.
Create the database users before importing.
|Logging uses a lot of CPU and memory on your Cloud SQL instance.||Logging needs to be tuned.
|Audit logs are not found.||Data Access audit logs are only written if the operation is an authenticated user-driven API call that creates, modifies, or reads user-created data, or if the operation accesses configuration files or metadata of resources.|
|Operations information is not found in logs.||You want to find more information about an operation.
For example, a user was deleted but you can't find out who did it. The logs show the operation started but don't provide any more information. You must enable audit logging for detailed and personal identifying information (PII) like this to be logged.
|Some logs are filtered from the instance's logs.||Filtered logs include AD logs without timestamps, among others.
|Log files are hard to read.||You'd rather view the logs as JSON or text. You can use the gcloud logging read command to download the logs.
To download the logs as JSON:
gcloud logging read \
"resource.type=cloudsql_database \
AND logName=projects/PROJECT_ID \
/logs/cloudsql.googleapis.com%2FLOG_NAME" \
--format json \
--project=PROJECT_ID \
--freshness="1d" \
> downloaded-log.json
To download the logs as TEXT:
gcloud logging read \
"resource.type=cloudsql_database \
AND logName=projects/PROJECT_ID \
/logs/cloudsql.googleapis.com%2FLOG_NAME" \
--format text \
--project=PROJECT_ID \
--freshness="1d" \
| jq -rnc --stream 'fromstream(1|truncate_stream(inputs)) \
| .textPayload' \
> downloaded-log.txt
|Query logs are not found in PostgreSQL logs.||You need to enable the pgaudit flags.
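As a sketch, assuming the cloudsql.enable_pgaudit database flag is already set on the instance (and pgaudit.log is configured as a flag to choose which statement classes are logged), you still need to create the extension in the database you want audited:
-- Run in the database whose queries you want logged
CREATE EXTENSION IF NOT EXISTS pgaudit;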
|You want to find out what queries are running now.||Connect to the database and run the following query:
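For example, a query against pg_stat_activity such as the following (a sketch; adjust the columns and filters as needed):
SELECT pid, usename, state, now() - query_start AS query_age, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_age DESC;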
|You want to find out what units are being used for a specific field.||Connect to the database and run the following query
(using your own field name):
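For example, a sketch using pg_settings, with shared_buffers as a placeholder field name:
SELECT name, setting, unit
FROM pg_settings
WHERE name = 'shared_buffers';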
|You want to find the current value of a database setting.||Connect to the database and run the following query
(using your own setting name):
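For example, with statement_timeout as a placeholder setting name:
SHOW statement_timeout;
-- or, equivalently:
SELECT setting FROM pg_settings WHERE name = 'statement_timeout';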
|You want to stop a blocked background process.||The user needs to have the pg_signal_backend role.
Run the following commands:
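A sketch, assuming you've already found the offending backend's pid in pg_stat_activity (12345 is a placeholder):
-- Ask the backend to cancel its current query
SELECT pg_cancel_backend(12345);
-- If cancelling isn't enough, terminate the backend's connection
SELECT pg_terminate_backend(12345);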
|Instance is nearing 100% consumption of transaction IDs.||Your internal monitoring warns that the instance is nearing 100%
consumption of transaction IDs. You want to avoid transaction wraparound,
which can block writes.
The autovacuum job might be blocked, or might not be reclaiming the transaction IDs fast enough to keep up with the workload.
To avoid any outages due to transaction ID wraparound, you can review these self-service tips for dealing with TXID wraparound.
For general tuning advice, see Optimizing, monitoring, and troubleshooting vacuum operations in PostgreSQL.
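One way to see how close each database is to wraparound is to check the age of its oldest transaction ID; this is a sketch, and the values should be compared against autovacuum_freeze_max_age and the roughly two-billion hard limit:
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;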
|Temporary storage increased automatic storage.||Automatic storage increase is enabled.
Restarting the instance deletes the temporary files but doesn't reduce the storage. Only customer support can reset the instance size.
|Data is being automatically deleted.||Most likely a script is running somewhere in your environment.
Look in the logs around the time of the deletion and see if there's a rogue script running from a dashboard or another automated process.
|The instance cannot be deleted.||You might see the error message
Some possible explanations include:
|The instance is stuck due to large temporary data size.||The system can create many temporary tables at one time, depending on
the queries and the load.
Unfortunately, you can't shrink the
One mitigation option is to create the temporary table with
|Fatal error during upgrade.||Logs may reveal more, but in any case customer support may be needed to force re-create the instance.|
|Instance is stuck on restart after running out of disk space.||Automatic storage increase capability isn't enabled.
If your instance runs out of storage, and the automatic storage increase capability isn't enabled, your instance goes offline. To avoid this issue, you can edit the instance to enable automatic storage increase.
|Your on-premises primary instance is stuck.||Google Cloud can't help with instances that are not in Cloud SQL.|
|Slow shutdown on restart.||When an instance shuts down, any outstanding connections that don't
end within 60 seconds make the shutdown unclean.
Most unclean shutdowns can be avoided by keeping connections shorter than 60 seconds, including connections from the database command prompt. If you keep these connections open for hours or days, shutdowns can be unclean.
|A user cannot be deleted.||The user probably has objects in the database that depend on it. You
need to drop those objects or reassign them to another user.
Find out which objects are dependent on the user, then drop or reassign those objects to a different user. This thread on Stack Exchange discusses how to find the objects owned by the user.
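A minimal sketch, where old_user and new_owner are placeholders, run in each database where old_user owns objects or holds grants:
REASSIGN OWNED BY old_user TO new_owner;  -- transfer ownership of the user's objects
DROP OWNED BY old_user;                   -- revoke any remaining privileges granted to the user
DROP USER old_user;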
|Particular queries are running slow.||Queries can be slow for many reasons, mostly due to specific database
aspects. One reason that can involve Cloud SQL is network latency,
when the source (writer or reader) resource and the destination
(Cloud SQL) resource are in different regions.
In particular, refer to the general performance tips.
For slow database inserts, updates, or deletes, consider the following actions:
To reduce latency, locate both the source and destination resources in the same region.
|Out of memory is indicated but monitoring charts don't show that.||An instance can fail and report out of memory even though the monitoring charts seem to show that memory remains available.
There are other factors besides your workload that can impact memory usage, such as the number of active connections and internal overhead processes. These aren't always reflected in the monitoring charts.
Ensure that the instance has enough memory to account for your workload, plus some additional overhead.
|Recovering a deleted instance.||All data on an instance, including backups, is permanently lost when
that instance is deleted.
The Cloud SQL Admin role includes the permission to delete the instance. To prevent accidental deletion, grant this role only as needed.
|You want to rename an existing Cloud SQL instance.||Renaming an existing instance is not supported.
There are other ways to accomplish the goal by creating a new instance.
In both cases, you can delete your old instance after the operation is done. We recommend going with the cloning route since it has no impact on performance and doesn't require you to redo any instance configuration settings such as flags, machine type, storage size and memory.
|Read replica did not start replicating on creation.||There's probably a more specific error in the log files. Inspect the logs in Cloud Logging to find the actual error.|
|Unable to create read replica - invalidFlagValue error.||One of the flags in the request is invalid. It could be a flag you
provided explicitly or one that was set to a default value.
First, check that the value of the
|Unable to create read replica - unknown error.||There's probably a more specific error in the log files.
Inspect the logs in
Cloud Logging to find the actual error.
If the error is:
|Disk is full.||The primary instance's disk can become full during replica creation. Edit the primary instance to upgrade it to a larger disk size.|
|The replica instance is using too much memory.||The replica uses temporary memory to cache often-requested read
operations, which can lead it to use more memory than the primary instance.
Restart the replica instance to reclaim the temporary memory space.
|Replication stopped.||The maximum storage limit was reached and automatic storage
increase isn't enabled.
Edit the instance to enable automatic storage increase.
|Replication lag is consistently high.||The write load is too high for the replica to handle. Replication lag
takes place when the SQL thread on a replica is unable to keep up with the
IO thread. Some kinds of queries or workloads can cause temporary or
permanent high replication lag for a given schema. Some of the typical
causes of replication lag are:
Some possible solutions include:
|Errors when rebuilding indexes in PostgreSQL 9.6.||You get an error from PostgreSQL informing you that you need to rebuild a
particular index. This can be done only on the primary instance. If you
create a new replica instance, you soon get the same error again.
Hash indexes are not propagated to replicas in PostgreSQL versions below 10.
If you must use hash indexes, upgrade to PostgreSQL 10+. Otherwise, if you also want to use replicas, don't use hash indexes in PostgreSQL 9.6.
|Replica creation fails with timeout.||Long-running uncommitted transactions on the primary instance can cause
read replica creation to fail.
Recreate the replica after stopping all running queries.