This page includes tips for troubleshooting Cloud SQL issues for supported database engines. Some of these tips apply only to specific database engines, while others are common to all.
For troubleshooting tips specific to a database engine, see that engine's individual page, and check whether your question or problem has already been addressed in one of the sections listed below.
Topics in this page include:
- Backup and recovery
- Clone
- Connect
- Create instances
- Export
- External primary
- External replica
- Flags
- High availability
- Import
- Integrate with Vertex AI
- Linked servers
- Logging
- Manage instances
- Private Service Connect
- Replication
Backup and recovery
Issue | Troubleshooting |
---|---|
You can't see the current operation's status. | The Google Cloud console reports only success or failure when the operation
is done. It isn't designed to show warnings or other updates.
Run the
|
You want to find out who issued an on-demand backup operation. | The user interface doesn't show the user who started an operation.
Look in the logs and filter by text to find the user. You may need to use audit logs for private information. Relevant log files include:
|
After an instance is deleted, you can't take a backup of the instance. | After an instance is purged, no data recovery is possible. However, if the instance is restored, then its backups are also restored. For more information on recovering a deleted instance, see Recovery backups. If you have done an export operation, create a new instance and then do an import operation to recreate the database. Exports are written to Cloud Storage and imports are read from there. |
An automated backup is stuck for many hours and can't be canceled. | Backups can take a long time depending on the database size.
If you really need to cancel the operation, you can ask
customer support to |
A restore operation can fail when one or more users referenced in the SQL dump file don't exist. | Before restoring a SQL dump, all the database users who own objects or
were granted permissions on objects in the dumped database must exist in the
target database. If they don't, the restore operation fails to recreate the
objects with the original ownership or permissions.
Create the database users before restoring the SQL dump. |
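As a sketch, assuming the dump references roles named app_owner and report_reader (hypothetical names — search the dump file for OWNER TO and GRANT statements to find the real ones), you could create them before running the restore:

```sql
-- Hypothetical role names; create every role the dump file references
CREATE ROLE app_owner LOGIN PASSWORD 'change-me';
CREATE ROLE report_reader LOGIN PASSWORD 'change-me';
```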
You want to increase the number of days that you can keep automatic backups from seven to 30 days, or longer. | You can
configure the number of automated backups to retain. Automated backups get pruned
regularly based on the retention value configured. Unfortunately, this means that the
currently visible backups are the only automated backups you can restore from.
To keep backups indefinitely, create an on-demand backup. On-demand backups aren't pruned the way automated backups are: they remain until you delete them or until the instance they belong to is deleted. Because this type of backup isn't deleted automatically, it can affect billing. |
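As a hedged example, an on-demand backup can be created with the gcloud sql backups create command (the instance name and description are placeholders):

```shell
# Create an on-demand backup; it remains until you delete it
# or delete the instance it belongs to
gcloud sql backups create \
    --instance=INSTANCE_NAME \
    --description="pre-migration snapshot"
```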
An automated backup failed and you didn't receive an email notification. | To have Cloud SQL notify you of the backup's status, configure a log-based alert. |
An instance is repeatedly failing because it is cycling between the failure and backup restore states. Attempts to connect to and use the database following restore fail. |
Things to try:
|
You find you are missing data when performing a backup/restore operation. | Tables were created as unlogged. For example:
These tables are not included in a restore from a backup:
The solution is to avoid using unlogged tables if you want to restore those
tables through a backup. If you're restoring from a database that already
has unlogged tables, then you can dump the database to a file, and reload the
data after modifying the dumped file to |
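For illustration, a minimal sketch of the unlogged-table pitfall and its fix in PostgreSQL (the table name is hypothetical):

```sql
-- Unlogged tables skip write-ahead logging, so they are
-- excluded when restoring from a backup
CREATE UNLOGGED TABLE scratch_data (id int, payload text);

-- Convert it to a logged table (PostgreSQL 9.5+) so that
-- future backups can restore it
ALTER TABLE scratch_data SET LOGGED;
```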
Clone
Issue | Troubleshooting |
---|---|
Cloning fails with constraints/sql.restrictAuthorizedNetworks error. |
The cloning operation is blocked by the Authorized Networks configuration.
Authorized Networks are configured for public IP addresses in the Connectivity section
of the Google Cloud console, and cloning is not permitted due to
security considerations.
Remove all |
Error message: Failed to create subnetwork. Couldn't find free
blocks in allocated IP ranges. Please allocate new ranges for this service
provider. Help Token: [help-token-id]. |
You're trying to use the Google Cloud console to clone an instance with a private IP address, but you didn't specify the allocated IP range to use, and the source instance wasn't created with the specified range. As a result, the cloned instance is created in a random range. Use |
Connect
Issue | Troubleshooting |
---|---|
Aborted connection . |
The issue might be:
Applications must tolerate network failures and follow best practices such as connection pooling and retrying. Most connection poolers catch these errors where possible. Otherwise the application must either retry or fail gracefully. For connection retry, we recommend the following methods:
Combining these methods helps reduce throttling. |
FATAL: database 'user' does not exist . |
gcloud sql connect --user only works with the default
postgres user.
Connect with the default user, then change users. |
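A minimal sketch, assuming the psql client: connect as the default postgres user, then switch users from inside the session:

```shell
# Connect with the default user (the only one gcloud sql connect accepts here)
gcloud sql connect INSTANCE_NAME --user=postgres

# Then, inside psql, reconnect to the target database as another user:
#   \c DATABASE_NAME TARGET_USER
```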
You want to find out who is connected. | Log into the database and run this command:
SELECT datname, usename, application_name AS appname, client_addr, state,
       now() - backend_start AS conn_age,
       now() - state_change AS last_activity_age
FROM pg_stat_activity
WHERE backend_type = 'client backend'
ORDER BY 6 DESC LIMIT 20; |
Create instances
Issue | Troubleshooting |
---|---|
Error message: Failed to create subnetwork. Router status is
temporarily unavailable. Please try again later. Help Token:
[token-ID] . |
Try to create the Cloud SQL instance again. |
Error message: Failed to create subnetwork. Required
'compute.projects.get' permission for PROJECT_ID . |
When you create an instance with a private IP address, a service account is created just-in-time using the Service Networking API. If you have only recently enabled the Service Networking API, then the service account might not be created and instance creation fails. In this case, you must wait for the service account to propagate throughout the system or manually add it with the required permissions. |
Export
Issue | Troubleshooting |
---|---|
HTTP Error 409: Operation failed because another operation was
already in progress. |
There is already a pending operation for your instance. Only one operation is allowed at a time. Try your request after the current operation is complete. |
HTTP Error 403: The service account does not have the required
permissions for the bucket. |
Ensure that the bucket exists and the service account for the Cloud SQL
instance (which is performing the export) has the
Storage Object Creator role
(roles/storage.objectCreator ) to allow export to the bucket. See
IAM roles for Cloud Storage. |
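As a sketch, assuming you first look up the instance's service account email from the instance description, the role can be granted like this (bucket and instance names are placeholders):

```shell
# Look up the instance's service account email
gcloud sql instances describe INSTANCE_NAME \
    --format="value(serviceAccountEmailAddress)"

# Grant it permission to write objects into the export bucket
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/storage.objectCreator"
```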
CSV export worked but SQL export failed. | CSV and SQL formats export differently. The SQL format exports the
entire database, and likely takes longer to complete. The CSV format lets
you define which elements of the database to include in the export.
Use CSV exports to export only what you need. |
Export is taking too long. | Cloud SQL does not support concurrent synchronous operations.
Use export offloading. At a high level, in export offloading, instead of issuing an export on the source instance, Cloud SQL spins up an offload instance to perform the export. Export offloading has several advantages, including increased performance on the source instance and the unblocking of administrative operations while the export is running. With export offloading, total latency can increase by the amount of time it takes to bring up the offload instance. Generally, for reasonably sized exports, latency is not significant. However, if your export is small enough, then you may notice the increase in latency. |
Create Extension error. | The dump file contains references to an unsupported extension. |
Error using pg_dumpall . |
Using the pg_dumpall utility with the --global flag
requires the
superuser role, but
this role isn't supported in Cloud SQL. To prevent errors from
occurring while performing export operations that include user names, also use the
--no-role-passwords flag.
|
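A hedged sketch of the suggested invocation, with the host and user as placeholders:

```shell
# Dump global objects (roles, tablespaces) without password hashes,
# which would otherwise require the unsupported superuser role
pg_dumpall -h INSTANCE_IP_ADDRESS -U postgres \
    --globals-only --no-role-passwords > globals.sql
```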
The export operation times out before exporting anything, and you see
the error message Could not receive data from client: Connection reset
by peer. |
If Cloud Storage does not receive any data within a certain
time frame, typically around seven minutes, the connection resets. It's
possible the initial export query is taking too long to run.
Do a manual export using the
|
You want exports to be automated. | Cloud SQL does not provide a way to automate exports.
You could build your own automated export system using Google Cloud products such as Cloud Scheduler, Pub/Sub, and Cloud Run functions, similar to this article on automating backups. |
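As one piece of such a pipeline, a single export that a scheduled job could trigger might look like this (bucket, instance, and database names are placeholders):

```shell
# Export one database to Cloud Storage; a trailing .gz
# extension makes Cloud SQL compress the output
gcloud sql export sql INSTANCE_NAME \
    gs://BUCKET_NAME/exports/DATABASE_NAME-$(date +%F).sql.gz \
    --database=DATABASE_NAME
```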
External primary
Issue | Troubleshooting |
---|---|
Lost connection to MySQL server during query when dumping table . |
The source may have become unavailable, or the dump contained packets that were too large.
Make sure the external primary is available to connect. You can also modify the values of the net_read_timeout and net_write_timeout flags on the source instance to resolve the error. For more information on the allowable values for these flags, see Configure database flags. To learn more about using |
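For example, on the external MySQL source (not the Cloud SQL replica), the timeouts could be raised for the duration of the dump; the values shown are illustrative:

```sql
-- Run on the external primary; values are in seconds
SET GLOBAL net_read_timeout = 120;
SET GLOBAL net_write_timeout = 120;
```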
The initial data migration was successful, but no data is being replicated. | One possible root cause could be your source database has defined
replication flags which result in some or all database changes not being
replicated over.
Make sure the replication flags such as Run the command |
The initial data migration was successful but data replication stops working after a while. | Things to try:
|
mysqld check failed: data disk is full . |
The data disk of the replica instance is full.
Increase the disk size of the replica instance. You can either manually increase the disk size or enable auto storage increase. |
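A hedged example of growing the replica's disk with gcloud (the size is illustrative; disk size can be increased but never decreased):

```shell
gcloud sql instances patch REPLICA_INSTANCE_NAME --storage-size=100GB
```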
External replica
Issue | Troubleshooting |
---|---|
Error message: The slave is connecting ... master has purged
binary logs containing GTIDs that the slave requires . |
The primary Cloud SQL instance has automatic backups and binary logs, and point-in-time recovery is enabled, so it should have enough logs for the replica to catch up. However, in this case, although the binary logs exist, the replica doesn't know which row to start reading from.
Create a new dump file using the correct flag settings, and configure the external replica using that file
|
Flags
Issue | Troubleshooting |
---|---|
High availability
Issue | Troubleshooting |
---|---|
You can't find the metrics for a manual failover. | Only automatic failovers are recorded in the metrics. |
Cloud SQL instance resources (CPU and RAM) are near 100% usage, causing the high availability instance to go down. | The instance machine size is too small for the load.
Edit the instance to upgrade to a larger machine size to get more CPUs and memory. |
Import
Issue | Troubleshooting |
---|---|
HTTP Error 409: Operation failed because another operation was already in progress . |
There is already a pending operation for your instance. Only one operation is allowed at a time. Try your request after the current operation is complete. |
The import operation is taking too long. | Too many active connections can interfere with import operations.
Close unused operations. Check the CPU and memory usage of your Cloud SQL instance to make sure there are plenty of resources available. The best way to ensure maximum resources for the import is to restart the instance before beginning the operation. A restart:
|
An import operation can fail when one or more users referenced in the dump file don't exist. | Before importing a dump file, all the database users who own objects or
were granted permissions on objects in the dumped database must exist in the
target database. If they don't, the import operation fails to recreate the
objects with the original ownership or permissions.
Create the database users before importing. |
An import operation fails with an error that a table doesn't exist. | Tables can have foreign key dependencies on other tables, and depending on
the order of operations, one or more of those tables might not yet exist
during the import operation.
Things to try: Add the following line at the start of the dump file: SET FOREIGN_KEY_CHECKS=0; Additionally, add this line at the end of the dump file: SET FOREIGN_KEY_CHECKS=1; These settings deactivate data integrity checks while the import operation is in progress, and reactivate them after the data is loaded. This doesn't affect the integrity of the data on the database, because the data was already validated during the creation of the dump file. |
Integrate with Vertex AI
Issue | Troubleshooting |
---|---|
Error message: Google ML integration API is supported only on Postgres version 12 or above. |
To enable the Vertex AI integration in Cloud SQL, you must have a Cloud SQL for PostgreSQL database, version 12 or later. To upgrade your database to this version, see Upgrade the database major version in-place. |
Error message: Google ML Integration API is not supported on shared core instance. Please upsize your machine type. |
If you selected a shared core for the machine type of your instance, then you can't enable the Vertex AI integration in Cloud SQL. Upgrade your machine type to dedicated core. For more information, see Machine Type. |
Error message: Google ML Integration is unsupported for this maintenance version. Please follow https://cloud.google.com/sql/docs/postgres/self-service-maintenance to update the maintenance version of the instance. |
To enable the Vertex AI integration in Cloud SQL, the maintenance version of your instance must be R20240130 or later. To upgrade your instance to this version, see Self-service maintenance. |
Error message: Cannot invoke ml_predict_row if 'cloudsql.enable_google_ml_integration' is off. |
The cloudsql.enable_google_ml_integration database flag is turned off, so Cloud SQL can't integrate with Vertex AI. To turn this flag on, use the gcloud sql instances patch command:
gcloud sql instances patch INSTANCE_NAME --database-flags cloudsql.enable_google_ml_integration=on
Replace INSTANCE_NAME with the name of the primary Cloud SQL instance. |
Error message: Failed to connect to remote host: Connection refused. |
The integration between Cloud SQL and Vertex AI isn't enabled. To enable this integration, use the gcloud sql instances patch command:
gcloud sql instances patch INSTANCE_NAME
Replace INSTANCE_NAME with the name of the primary Cloud SQL instance. |
Error message: Vertex AI API has not been used in project PROJECT_ID before or it is disabled. Enable it by visiting /apis/api/aiplatform.googleapis.com/overview?project=PROJECT_ID then retry. |
The Vertex AI API isn't enabled. For more information on enabling this API, see Enable database integration with Vertex AI. |
Error message: Permission 'aiplatform.endpoints.predict' denied on resource. |
Vertex AI permissions aren't added to the Cloud SQL service account for the project where the Cloud SQL instance is located. For more information on adding these permissions to the service account, see Enable database integration with Vertex AI. |
Error message: Publisher Model `projects/PROJECT_ID/locations/REGION_NAME/publishers/google/models/MODEL_NAME` not found. |
The machine learning model or the LLM doesn't exist in Vertex AI. |
Error message: Resource exhausted: grpc: received message larger than max. |
The size of the request that Cloud SQL passes to Vertex AI exceeds the gRPC limit of 4 MB per request. |
Error message: Cloud SQL attempts to send a request to Vertex AI. However, the instance is in the %s region, but the Vertex AI endpoint is in the %s region. Make sure the instance and endpoint are in the same region. |
Cloud SQL attempts to send a request to Vertex AI. However, the instance is in one region, but the Vertex AI endpoint is in a different region. To resolve this issue, both the instance and endpoint must be in the same region. |
Error message: The Vertex AI endpoint isn't formatted properly. |
The Vertex AI endpoint isn't formatted properly. For more information, see Use private endpoints for online prediction. |
Error message: Quota exceeded for aiplatform.googleapis.com/online_prediction_requests_per_base_model with base model: textembedding-gecko. |
The number of requests that Cloud SQL passes to Vertex AI exceeds the limit of 1,500 requests per minute per region per model per project. |
Linked servers
Error message | Troubleshooting |
---|---|
Msg 7411, Level 16, State 1, Line 25
|
The DataAccess option is disabled. Run the following command to enable data access:
EXEC sp_serveroption @server='LINKED_SERVER_NAME', @optname='data access', @optvalue='TRUE'
Replace LINKED_SERVER_NAME with the name of the linked server. |
Access to the remote server is denied because no
login-mapping exists. (Microsoft SQL Server, Error: 7416)
|
If you have this issue while establishing an encrypted
connection, you need to try another way to provide the user ID when you
access the linked server. To do this, run the following command:
EXEC master.dbo.sp_addlinkedserver @server = N'LINKED_SERVER_NAME', @srvproduct= N'', @provider= N'SQLNCLI', @datasrc= N'TARGET_SERVER_ID', @provstr= N'Encrypt=yes;TrustServerCertificate=yes;User ID=USER_ID' Replace the following:
|
Logging
Issue | Troubleshooting |
---|---|
Audit logs are not found. | Data-Access logs are only written if the operation is an authenticated user-driven API call that creates, modifies, or reads user-created data, or if the operation accesses configuration files or metadata of resources. |
Operations information is not found in logs. | You want to find more information about an operation.
For example, a user was deleted but you can't find out who did it. The logs show the operation started but don't provide any more information. You must enable audit logging for detailed and personal identifying information (PII) like this to be logged. |
Some logs are filtered from the error.log log of a
Cloud SQL for SQL Server instance.
|
Filtered logs include AD logs without timestamps, such as: Login failed for user 'x'. Reason: Token-based server access validation failed with an infrastructure error. Login lacks connect endpoint permission. [CLIENT: 127.0.0.1]. These logs are filtered because they can potentially cause confusion.
|
Logging is using a lot of disk space. | There are three kinds of log files that use disk space: redo logs, general logs, and binary logs.
Connect to the database and run these commands for details on each type:
SHOW VARIABLES LIKE 'innodb_log_file%';
SELECT ROUND(SUM(LENGTH(argument)/POW(1024,2)),2) AS MB FROM mysql.general_log;
SHOW BINARY LOGS; |
Log files are hard to read. | You'd rather view the logs as JSON or text. You can use the gcloud logging read command along with Linux post-processing commands to download the logs.
To download the logs as JSON:
gcloud logging read "resource.type=cloudsql_database AND logName=projects/PROJECT_ID/logs/cloudsql.googleapis.com%2FLOG_NAME" --format json --project=PROJECT_ID --freshness="1d" > downloaded-log.json
To download the logs as TEXT:
gcloud logging read "resource.type=cloudsql_database AND logName=projects/PROJECT_ID/logs/cloudsql.googleapis.com%2FLOG_NAME" --format json --project=PROJECT_ID --freshness="1d" --order=asc | jq -rnc --stream 'fromstream(1|truncate_stream(inputs)) | .textPayload' > downloaded-log.txt |
Query logs are not found in PostgreSQL logs. | You need to enable the pgaudit flags.
|
Manage instances
Issue | Troubleshooting |
---|---|
Slow performance after restarting MySQL. | Cloud SQL allows caching of data in the InnoDB buffer pool. However, after a restart, this cache is always empty, and all reads require a round trip to the backend to get data. As a result, queries can be slower than expected until the cache is filled. |
Slow crash recovery. | A large general_log may have accumulated.
You can reduce crash recovery time by preventing a large
general_log from accumulating. If you have general_log
on, truncate the table and only enable general_log for short
periods of time.
You can find out the size of the general logs by connecting to the database and running this query: SELECT ROUND(SUM(LENGTH(argument)/POW(1024,2)),2) from mysql.general_log;
|
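As a sketch, assuming the general log is written to a table (log_output=TABLE), it can be disabled and truncated like this:

```sql
-- Stop collecting the general log, then reclaim its space
SET GLOBAL general_log = 'OFF';
TRUNCATE TABLE mysql.general_log;
```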
You want to find out what is using up storage. | For example, you notice that your database is using only three GB, but
storage says that 14 GB is being used. Most of the space not used by tables
is used by binary logs and/or temporary files.
Things to try:
|
Queries are blocked. | It's possible for queries to lock the MySQL database causing all
subsequent queries to block/timeout.
Connect to the database and execute this query:
The first item in the list may be the one holding the lock, which the subsequent items are waiting on. The |
You are unable to manually delete binary logs. | Binary logs cannot be manually deleted. Binary logs are automatically deleted with their associated automatic backup, which generally happens after about seven days. |
You want to find information about temporary files. | A file named ibtmp1 is used for storing temporary
data. This file is reset upon database restart. To find information about
temporary file usage, connect to the database and
execute the following query:
|
You want to find out about table sizes. | This information is available in the database.
Connect to the database and execute the following query:
|
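One common query for this (a sketch, not necessarily the query the original page showed) sums data and index sizes from information_schema:

```sql
SELECT table_schema, table_name,
       ROUND((data_length + index_length) / POW(1024, 2), 2) AS size_mb
FROM information_schema.tables
ORDER BY size_mb DESC
LIMIT 20;
```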
mysqld got a signal 11. | Signal 11 usually indicates a MySQL software issue.
Try refactoring queries so that they don't create too many connections.
If that doesn't resolve the issue, contact customer support.
|
InnoDB: page_cleaner: 1000ms intended loop took 5215ms. The
settings might not be optimal. |
The page cleaner can't keep up with the rate of change on the instance.
Once per second, the page cleaner scans the buffer pool for dirty pages to
flush from the buffer pool to disk. The warning you see shows it has lots
of dirty pages to flush, and it's taking more than one second to flush a
batch of them to disk.
Shard the instance if possible. Using many smaller Cloud SQL instances is better than one large instance. |
You want to find out what queries are running now. | Connect to the database and run the following query:
|
You want to find out what units are being used for a specific field. | Connect to the database and run the following query
(using your own FIELD_NAME ):
|
You want to find the current value of a database setting. | Connect to the database and run the following query
(using your own SETTING_NAME ):
Run |
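For PostgreSQL, two standard ways to read a setting's current value (work_mem is an illustrative setting name, not necessarily the original example):

```sql
SHOW work_mem;
SELECT current_setting('work_mem');
```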
You want to stop a blocked background process. | The user needs to have the pg_signal_backend role.
Run the following commands:
|
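A hedged sketch of the typical sequence (the role name and PID are placeholders):

```sql
-- Run as a user with sufficient privileges to grant roles
GRANT pg_signal_backend TO DB_USER;

-- Then, as DB_USER, cancel the query or terminate the backend
SELECT pg_cancel_backend(PID);     -- cancels the running query only
SELECT pg_terminate_backend(PID);  -- ends the whole backend process
```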
Instance is nearing 100% consumption of transaction IDs. | Your internal monitoring warns that the instance is nearing 100%
consumption of transaction IDs. You want to avoid transaction wraparound,
which can block writes.
The autovacuum job might be blocked, or might not be reclaiming the transaction IDs fast enough to keep up with the workload. In order to avoid any outages due to transaction wraparound problem, you can review these self-servicing tips for dealing with TXID wraparound. For general tuning advice, see Optimizing, monitoring, and troubleshooting vacuum operations in PostgreSQL. |
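To gauge how close each database is to wraparound, a standard check (a sketch, not necessarily from the original page) is:

```sql
-- Databases with the highest transaction ID age are closest to wraparound
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```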
Temporary storage increased automatic storage. | Automatic storage increase is enabled.
A restart deletes the temporary files but doesn't reduce the storage size. Only customer support can reset the instance size. |
Data is being automatically deleted. | Most likely a script is running somewhere in your environment.
Look in the logs around the time of the deletion and see if there's a rogue script running from a dashboard or another automated process. |
The instance cannot be deleted. | You might see the error message ERROR: (gcloud.sql.instances.delete) HTTP Error
409: The instance or operation is not in an appropriate state to handle the
request , or the instance may have a INSTANCE_RISKY_FLAG_CONFIG
flag status.
Some possible explanations include:
|
The instance is stuck due to large temporary data size. | The system can create many temporary tables at one time, depending on
the queries and the load.
Unfortunately, you can't shrink the
One mitigation option is to create the temporary table with
|
Fatal error during upgrade. | Logs may reveal more, but in any case customer support may be needed to force re-create the instance. |
Instance is stuck on restart after running out of disk space. | Automatic storage increase capability isn't enabled.
If your instance runs out of storage, and the automatic storage increase capability isn't enabled, your instance goes offline. To avoid this issue, you can edit the instance to enable automatic storage increase. |
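A hedged example of enabling the capability with gcloud (the instance name is a placeholder):

```shell
gcloud sql instances patch INSTANCE_NAME --storage-auto-increase
```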
Your on-premises primary instance is stuck. | Google Cloud can't help with instances that are not in Cloud SQL. |
Slow shutdown on restart. | When an instance shuts down, any outstanding connections that don't end within 60 seconds make the shutdown unclean.
Most unclean shutdowns can be avoided by keeping connections shorter than 60 seconds, including connections from the database command prompt. Connections held open for hours or days can make shutdowns unclean. |
A user cannot be deleted. | The user probably has objects in the database that depend on it. You
need to drop those objects or reassign them to another user.
Find out which objects are dependent on the user, then drop or reassign those objects to a different user. |
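A minimal PostgreSQL sketch (user names are placeholders); run it in every database that contains the user's objects:

```sql
-- Hand the user's objects to another owner, remove what's left, then drop
REASSIGN OWNED BY old_user TO new_user;
DROP OWNED BY old_user;   -- drops remaining objects and revokes privileges
DROP USER old_user;
```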
Particular queries are running slow. | Queries can be slow for many reasons, mostly due to specific database
aspects. One reason that can involve Cloud SQL is network latency,
when the source (writer or reader) resource and the destination
(Cloud SQL) resource are in different regions.
In particular, refer to the general performance tips. For slow database inserts, updates, or deletes, consider the following actions:
To reduce latency, locate both the source and destination resources in the same region. |
Out of memory is indicated but monitoring charts don't show that. | An instance can fail and report Out of memory but the
Google Cloud console or Cloud Monitoring charts seem to show there's still
memory remaining.
There are other factors beside your workload that can impact memory usage, such as the number of active connections and internal overhead processes. These aren't always reflected in the monitoring charts. Ensure that the instance has enough overhead to account for your workload plus some additional overhead. |
Recovering a deleted instance. | All data on an instance, including backups, is permanently lost when
that instance is deleted.
To preserve your data, export it to Cloud Storage before you delete an instance. The Cloud SQL Admin role includes the permission to delete the instance. To prevent accidental deletion, grant this role only as needed. |
You want to rename an existing Cloud SQL instance. | Renaming an existing instance is not supported.
You can achieve the same result by creating a new instance, either by cloning the existing instance or by exporting its data and importing it into a new instance.
In both cases, you can delete your old instance after the operation is done. We recommend the cloning route because it has no impact on performance and doesn't require you to redo any instance configuration settings such as flags, machine type, storage size, and memory. |
Error when deleting an instance. | If deletion protection is enabled for an instance, confirm your plans to delete the instance. Then disable deletion protection before deleting the instance. |
Private Service Connect
Issue | Troubleshooting |
---|---|
The service attachment of the instance doesn't accept the Private Service Connect endpoint. |
|
Replication
Issue | Troubleshooting |
---|---|
Read replica didn't start replicating on creation. | There's probably a more specific error in the log files. Inspect the logs in Cloud Logging to find the actual error. |
Unable to create read replica - invalidFlagValue error. | One of the flags in the request is invalid. It could be a flag you
provided explicitly or one that was set to a default value.
First, check that the value of the If the |
Unable to create read replica - unknown error. | There's probably a more specific error in the log files.
Inspect the logs in
Cloud Logging to find the actual error.
If the error is: |
Disk is full. | The primary instance's disk can become full during replica creation. Edit the primary instance to upgrade it to a larger disk size. |
The replica instance is using too much memory. | The replica uses temporary memory to cache often-requested read
operations, which can lead it to use more memory than the primary instance.
Restart the replica instance to reclaim the temporary memory space. |
Replication stopped. | The maximum storage limit was reached and automatic storage
increase isn't enabled.
Edit the instance to enable |
Replication lag is consistently high. | The write load is too high for the replica to handle. Replication lag
takes place when the SQL thread on a replica is unable to keep up with the
IO thread. Some kinds of queries or workloads can cause temporary or
permanent high replication lag for a given schema. Some of the typical
causes of replication lag are:
Some possible solutions include:
|
Replica creation fails with timeout. | Long-running uncommitted transactions on the primary instance can cause
read replica creation to fail.
Recreate the replica after stopping all running queries. |