Migration from MySQL to Cloud SQL

This document describes how to plan and execute the process of migrating an unpartitioned MySQL 5.7 database to Cloud SQL, a fully managed database service on Google Cloud. This document assumes that you have a MySQL database and are generally familiar with MySQL and Google Cloud concepts.

The concepts discussed in this document apply to other versions of MySQL and to MariaDB, a popular open-source fork of MySQL. However, for versions other than MySQL 5.7, the processes might need minor modification. There are many third-party tools and Cloud SQL Partners that you can use for planning a database migration. This document discusses only native MySQL and Google Cloud functionality, and does not discuss any additional tools or partners.

A companion tutorial walks you through one of the migration processes described in this document.

Overview

Migrating a MySQL database to another environment requires you to move two elements:

  • The data in the database. This is the storage layer of the database.
  • The management system that handles consistent reads and writes to the storage layer. This is the management layer.

The challenge is that these elements have to be constantly in sync for the database to work. Even a small database might get thousands of requests per second. Any delay in transactions or any network problems between the storage and management layers will have negative consequences for the database, for dependent applications, and for user satisfaction.

Migration methods

MySQL has two built-in capabilities that can help you migrate to Cloud SQL: logical exports with the mysqldump command, and external replication. These capabilities are the basis for two strategies for migrating MySQL databases: export/import migration and external replica promotion migration.

Export/import migration

In the export/import strategy, you export all of the data in the source database storage layer by using the MySQL mysqldump command. You then import this data directly to a new database management layer. This normally requires downtime for the database during the entire migration in order to keep all data in sync.

External replica promotion migration

In the migration strategy of external replica promotion, you create an external database replica and synchronize the existing data to that replica. This can happen with minimal downtime to the existing database.

When you have a replica database, the two databases have different roles that are referred to in this document as primary and replica.

After the data is synchronized, you promote the replica to be the primary in order to move the management layer with minimal impact to database uptime.

In Cloud SQL, an easy way to accomplish the external replica promotion is to use the automated migration workflow. This process automates many of the steps that are needed for this type of migration.

Preparing any MySQL database for migration

Before you perform the migration steps, you need to ensure that your database is prepared for either option. MySQL is installed with some default configurations, and can be tuned and customized according to the requirements of the application. Before you begin a migration, make sure MySQL is configured as described in this section. This helps lower the risk of migration failure and helps limit downtime.

Perform a logical backup

With either migration method, you'll be making changes to your MySQL database. To protect the data in your database, before you perform any migration actions, you should use the MySQL mysqldump command to create a data export that you keep as a backup. This backup is used as the starting point for both migration methods.

The mysqldump command takes a snapshot of all data in the database. Depending on the size of your database, this process can take from less than a second to several hours. If any data is changed after the snapshot is taken, those changes are not captured in the final snapshot. If you're using the external replica promotion migration process, this will not be a concern. However, for the export/import migration process, this can cause data inconsistencies and lost data.

Store the complete backup in a safe location. Options include Cloud Storage, a local object storage system, and offsite backups.
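For example, a minimal logical backup and upload to Cloud Storage might look like the following sketch. The flags, database name, host, and bucket name are illustrative; adjust them for your environment.

  # Create a consistent logical backup without blocking reads for the whole dump
  # (--single-transaction works for InnoDB tables):
  mysqldump --user=root --password --host=SOURCE_DB_HOST \
    --single-transaction --hex-blob --databases example_db > example_db_backup.sql

  # Copy the backup to Cloud Storage for safekeeping (bucket name is hypothetical):
  gsutil cp example_db_backup.sql gs://example-migration-backups/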

Benchmark database performance metrics

Before you begin any migration, you should have a complete collection of the performance benchmarks of the existing application and database. This allows you to understand the expected behavior and current performance metrics.

Make sure that the benchmarks you have represent the use cases for your apps and your database. Later, when you use the benchmarks to compare performance of the migrated database, you want to be sure that you can make meaningful comparisons between the old and new database deployments.
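As a simple starting point, you can capture throughput counters from the source database and record them alongside your application-level benchmarks. This is only a sketch; the counters that matter depend on your workload.

  # Snapshot basic workload counters on the source database for later comparison:
  mysql --user=root --password --host=SOURCE_DB_HOST -e \
    "SHOW GLOBAL STATUS WHERE Variable_name IN
     ('Questions','Com_select','Com_insert','Com_update','Com_delete','Threads_connected');"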

Determine your connectivity options

Network connectivity requirements are different for each migration strategy. The most important question to help you decide what method you can use is whether your database can be accessed using a public IPv4 address.

When you perform an external replica promotion migration using the Cloud SQL automated migration workflow, your existing database must be publicly accessible using an IPv4 address. The destination database in Cloud SQL requires a persistent network connection throughout the promotion or migration process. Depending on the size of your database, this could be an extended amount of time.

You should also consider connectivity options on your destination database. Cloud SQL can use private IP addresses. You can also configure it to use public IP connectivity.

Determine your root password

During migration, you run many privileged MySQL commands and tasks. Therefore, you must have access to the MySQL root user, or to an account that has the SUPER privilege and the GRANT OPTION privilege. If you don't know the username and password for one of these privileged accounts, you need to consult your database administrators or follow the MySQL password reset procedures.
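One way to confirm that the account you plan to use has the necessary privileges is to inspect its grants. This is a sketch; the account name is illustrative.

  # List the privileges granted to the account you plan to use for the migration.
  # Look for SUPER and GRANT OPTION (or ALL PRIVILEGES ... WITH GRANT OPTION):
  mysql --user=admin_user --password --host=SOURCE_DB_HOST -e "SHOW GRANTS FOR CURRENT_USER();"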

Perform a test migration and create a runbook

Before you perform a production migration, you should perform test migrations. Doing so lets you verify your steps, gain confidence in your methods, and predict the time required for the migration. You can use the lessons learned from this experience to generate a runbook that supports your migration. Use the runbook as a working document to support coordination, troubleshooting, and execution of your migration.

This document can serve as a primer for the creation of your database migration runbook.

Preparing the application for a database migration

The last preparatory step is to prepare your application for the migration. A discussion of all of the application-to-database connection methods is beyond the scope of this document. However, it's important to understand how IP addresses and domain name system (DNS) connectivity impact your application.

Using public or internal DNS is a common method to access a database; using DNS allows the underlying IP address of the destination database to change without an update to application code. If your application relies on DNS records for database connectivity, you usually want to lower the DNS time-to-live (TTL) values for those DNS records for the migration.

DNS TTL values are measured in seconds, and they represent the validity of a DNS record stored on a client or host. This time is also known as a DNS cache window. Default DNS TTL values are different for every DNS provider. For example, Cloud DNS sets the TTL by default at 5 minutes (300 seconds). Because database DNS entries rarely change, your DNS administrator might have set this much higher.

When you migrate your database, high DNS TTL values can prolong database downtime, because clients keep using a cached DNS value that points to the old database. By lowering the TTL value, you force clients to check more often for updated records, which helps keep database downtime to a minimum.

In anticipation of making changes to the DNS records for the database, you should set this number to the lowest possible configurable value. If your DNS provider is SaaS-based, this can lead to increased costs. Make sure that you check with your provider or administrator before making this change.
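If your database DNS record is managed in Cloud DNS, lowering the TTL might look like the following sketch. The zone name, record name, and IP address are placeholders; other DNS providers expose equivalent settings.

  # Replace the existing A record with one that has a 60-second TTL:
  gcloud dns record-sets transaction start --zone=example-zone
  gcloud dns record-sets transaction remove --zone=example-zone \
    --name=db.example.com. --type=A --ttl=3600 "203.0.113.10"
  gcloud dns record-sets transaction add --zone=example-zone \
    --name=db.example.com. --type=A --ttl=60 "203.0.113.10"
  gcloud dns record-sets transaction execute --zone=example-zone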

Export/import migration

The export/import migration strategy relies on the MySQL mysqldump command to export a logical snapshot from the source environment database. You then import the snapshot into the target environment database.

The simplicity of this method is offset by the fact that the database experiences downtime during migration. Because transactions are continuously occurring in large databases, administrators commonly do the following:

  1. Disable the source database's ability to write new data.
  2. Create the snapshot from the source database.
  3. Import the snapshot to the destination database.
  4. Update the application configuration to point to the new database.
  5. Verify performance on the new database.

The following diagram illustrates this sequence:

Sequence of migration from a MySQL database on-premises, to a file export to Cloud Storage, to MySQL running on Cloud SQL

This sequence ensures that no changes are made to the data between the time you stop writes on the source database and the time you start writes on the new database. Depending on the size of your data, this process can result in prolonged downtime for your application. This is often unacceptable for many businesses and applications.

When you use the export/import migration strategy, your database does not need external connectivity to Cloud SQL. However, you need to be able to export the mysqldump files to a location that's accessible by Cloud SQL. Depending on the size of your mysqldump file, this data import/export process can take a long time, and it can require complex file-copy and verification techniques to ensure that all data rows and all transaction information are copied correctly.

In addition, your organization's data classification rules can have an impact on your ability to store the mysqldump files in other locations. Make sure that you consult these rules before you choose any method of storing data.

Export/import migration step by step

The following sections provide details about each of the steps of the overall sequence.

Disable writes

The administrator configures the database to prevent write operations. This is also referred to as locking or freezing the database. After this is accomplished, any write transaction that's sent to the database fails. However, any read transactions against the database, or against any replicas, continue to function normally. You should run a few test queries to verify this behavior before you move to subsequent steps.
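On a self-managed MySQL 5.7 source, one common way to freeze writes is to put the server into read-only mode and then confirm that a test write fails. This is a sketch; the database, table, and account names are hypothetical, and accounts with the SUPER privilege can still write unless super_read_only is also enabled.

  # Block writes from clients (super_read_only also blocks accounts with SUPER):
  mysql --user=root --password -e "SET GLOBAL read_only = ON; SET GLOBAL super_read_only = ON;"

  # Verify the lock: this INSERT should fail with a read-only error,
  # while SELECT statements continue to work.
  mysql --user=app_user --password -e "INSERT INTO example_db.health_check (note) VALUES ('write test');"
  mysql --user=app_user --password -e "SELECT COUNT(*) FROM example_db.health_check;"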

Make sure that you understand which applications might be affected by locking the database. You might need to notify users, customers, administrators, and other departments that depend on the database in order to prevent a larger system failure.

Export data

After the database is locked, you execute the mysqldump command on the old database to produce a full export of the data. When the export process is complete, move the resulting file to a location that's accessible to the new database.

Make sure that you send backup files to a physical drive separate from the one that contains the production database. This separation reduces disk I/O contention and can increase the speed of the backup. Taking the write load away from the main drive decreases the effect on your production database operations. This separation also reduces the chance of running out of space on the disk containing your production database.

Compressing the output of the mysqldump command can also help reduce the total number of writes needed for the backup. Compression can require increased CPU resources, so be sure to check how this might affect the performance of the database management layer.
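For example, you can stream the compressed export directly onto a different disk than the one that holds the production data files. Paths and names are illustrative.

  # Stream the dump through gzip onto a separate backup volume to reduce
  # I/O contention with the production data disk:
  mysqldump --user=root --password --single-transaction --databases example_db \
    | gzip > /mnt/backup-volume/example_db_backup.sql.gz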

For more information, see Exporting Data for Import into Cloud SQL.

Import data

The new database must import the dump file from the old database. The import process re-creates the database tables and all of the rows from the old database. The time needed for an import is comparable to the time it took for the export, given similar system resources. If you're upgrading system resources as part of your migration, the import process could be faster than the export.
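If the dump file is in a Cloud Storage bucket, you can run the import into Cloud SQL with gcloud, for example as follows. The instance, bucket, and database names are placeholders.

  # Import the SQL dump file from Cloud Storage into the Cloud SQL instance:
  gcloud sql import sql target-instance gs://example-migration-backups/example_db_backup.sql \
    --database=example_db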

For more information, see Importing Data into Cloud SQL.

After the data is imported, you should review the benchmark performance information that you got from testing the source database. You can then run read-based production queries against the new database to ensure that the new database meets the performance needs of your applications.

Update applications

When you're ready to start using the new database, you need to update your applications to support the new database. There are a few methods to accomplish this task:

  • If your application has the database endpoint or IP address defined directly in the source code, update the code and deploy this update to all applications that need to access the database.
  • If your application points to a DNS record to access the database, update the DNS record through your DNS provider to reflect the new database endpoint address.
  • If your application points to a load balancer to access the database, register the new database endpoint with the load balancer.

When you cut over your applications to use Cloud SQL, you might want to warm up the buffer pool. Warming up helps minimize the query latency that your applications might otherwise experience before the new database has loaded frequently accessed data into memory. You can use the MySQL slow-query log from your source production database to create a script that executes the SELECT statements that populate the cache on your new instance. You should do this just before you enable writes.
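A crude sketch of this approach follows. It assumes single-line SELECT statements in the slow-query log and a hypothetical log path; in practice, review and deduplicate the statements before running them.

  # Pull SELECT statements out of the source database's slow-query log...
  grep -i "^SELECT" /var/log/mysql/mysql-slow.log > warmup.sql

  # ...and replay them against the new Cloud SQL instance to warm the buffer pool:
  mysql --host=CLOUD_SQL_IP --user=app_user --password example_db < warmup.sql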

There are many third-party tools and partners that can help with warming up your buffer pool.

Enable writes

After you verify that the updated application code can communicate with the new database, you need to remove the read-only lock on the new database. After that step, you can verify the proper function of the database through your application. You should also perform another round of benchmark testing to ensure that your database performance still meets the needs of your application.
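If you placed the new Cloud SQL instance in read-only mode during the cutover, re-enabling writes might look like the following sketch. On Cloud SQL, read_only is managed as a database flag rather than with SET GLOBAL; the instance name is a placeholder.

  # Re-enable writes by clearing the read_only flag. Note that --database-flags
  # replaces the instance's entire flag list, so include any other flags you rely on:
  gcloud sql instances patch target-instance --database-flags=read_only=off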

When you've finished verification and determined that all of the data is consistent between both databases, you can complete a few clean-up tasks:

  • Notify users that downtime is over.
  • Update status pages and remove any status messages from the application.
  • Revert any modified DNS TTLs to their original values.

Suggested practices for optimizing import time for export/import migrations

There are a number of things that you can do to help optimize the import speed of a mysqldump file. This in turn decreases the total amount of time that a server needs to be in a read-only state.

  • Optimize your schema. Structure your data to avoid foreign keys. If possible, export the data in primary key order (see the sketch after this list).
  • Optimize your resources. If possible, launch your new database with enough RAM to support the entire dataset. In addition, using SSDs for your database storage can greatly increase your data throughput. SSDs allow more input/output operations per second (IOPS) than mechanical disks. Google Cloud SSDs also increase IOPS as disk size increases. This means that it might be worth the extra cost and over-provisioned space of a larger disk in order to gain additional IOPS.
  • Optimize monitoring. Having a robust monitoring solution in place to detect any problems with the old or new database can be extremely useful to a migration. For example, you can use Cloud Monitoring and the MySQL plugin. Monitoring allows you to detect problems early, identify what normal database behavior looks like, and compare the new database against your previous database baseline.
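As an illustration of the first point, the following sketch exports the data in primary key order and disables foreign key checks for the duration of the import. The database, table, and host names are placeholders.

  # Export data in primary key order, which allows faster sequential inserts:
  mysqldump --user=root --password --single-transaction --order-by-primary \
    --no-create-info example_db > example_db_data.sql

  # Wrap the import so that foreign key checks are skipped while rows are loaded:
  ( echo "SET FOREIGN_KEY_CHECKS=0;"; cat example_db_data.sql; echo "SET FOREIGN_KEY_CHECKS=1;" ) \
    | mysql --host=TARGET_DB_HOST --user=root --password example_db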

Exporting and importing large databases

The mysqldump process works quickly on databases that are smaller than 10 GB. Importing databases larger than 10 GB can require you to use advanced strategies to minimize the amount of database downtime you experience during the final cutover.

Export the schema and data in subsections

One strategy is to export your database schema and data in sections, without secondary indexes. This lets you add the secondary indexes by using ALTER TABLE statements at the end of the migration. Using this approach, you export different sets of tables to different files based on their key relationships. You can then run the MySQL import processes in parallel from a Compute Engine instance. One way to export data for parallel import is to manually edit the export files themselves. However, that can be time consuming and error prone.
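A sketch of this approach follows: the schema is exported separately from the data, and the data is split into groups of tables that can be imported in parallel. The table names are hypothetical.

  # Export the schema only, so that tables can be created up front
  # (secondary indexes can be added later with ALTER TABLE):
  mysqldump --user=root --password --no-data example_db > schema.sql

  # Export the data only, split into groups of related tables for parallel import:
  mysqldump --user=root --password --no-create-info example_db orders order_items > data_group1.sql
  mysqldump --user=root --password --no-create-info example_db customers addresses > data_group2.sql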

Take advantage of timestamps

If any of your tables use timestamp columns, you can use a feature of the mysqldump command that allows you to specify a WHERE clause. This allows you to load time-stamped data into Cloud SQL over multiple imports.

When you're ready for the final migration cutover, you can use this technique to export only rows in the source database that have changed since your last export. If you don't have timestamped tables, and you are comfortable introducing a timestamp column to your tables in the source database, you can add a MySQL trigger on each source table to set the timestamp whenever a row changes.
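For example, assuming a hypothetical updated_at timestamp column, an incremental export of only the rows changed since the previous export might look like this:

  # Export only rows modified after the last export (data only, no CREATE TABLE statements):
  mysqldump --user=root --password --no-create-info \
    --where="updated_at >= '2024-05-01 00:00:00'" \
    example_db orders > orders_delta.sql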

A concern with this method is tracking deleted rows. The timestamp column tracks only rows that have changed through INSERT or UPDATE statements; rows that are deleted are removed from the table. Therefore, using a timestamp to find rows that have been deleted after the initial migration works only if your application and database were built to use a soft delete mechanism. In this technique, instead of issuing a DELETE statement, you set an is_deleted column to true to indicate that a row is deleted. Alternatively, you can create a table to track deleted rows for each of your real tables, and a matching on_delete trigger that inserts deleted rows into the deletion-tracking table. During the final phase of your migration, you can create the necessary DELETE statements to remove those rows from your Cloud SQL instance.
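A minimal sketch of the deletion-tracking approach follows. It assumes a hypothetical orders table with a BIGINT primary key; the tracking table and trigger names are illustrative.

  # Create a table that records the primary keys of deleted rows:
  mysql --user=root --password example_db -e \
    "CREATE TABLE orders_deleted (id BIGINT NOT NULL PRIMARY KEY, deleted_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP);"

  # Record every deleted row so that matching DELETE statements can be replayed
  # on the Cloud SQL instance during the final cutover:
  mysql --user=root --password example_db -e \
    "CREATE TRIGGER orders_track_delete AFTER DELETE ON orders FOR EACH ROW INSERT INTO orders_deleted (id) VALUES (OLD.id);"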

Validating your export files

Whenever you modify your export files manually, or take exports of different tables, you should verify that your data was accurately exported.

Comparing export files

One way to validate the exported data is to use idle test databases. Create the first idle test database by restoring a recent production backup, such as the mysqldump file you created earlier, to a new, stand-alone database instance. Next, test your multi-phased mysqldump procedure and restore those exports in parallel to a second idle test database. This should give you two identical copies of the database. You can then verify the data by running the mysqldump command on each idle test database and comparing the resulting dump files, along with your original export file, using a diff command or file comparison tool.
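A sketch of the comparison step follows. The --skip-dump-date and --skip-comments flags keep timestamps and comments out of the output so that a clean diff indicates identical data; the host names are placeholders.

  # Dump both idle test databases and compare the output files:
  mysqldump --host=test-db-1 --user=root --password --skip-dump-date --skip-comments example_db > copy1.sql
  mysqldump --host=test-db-2 --user=root --password --skip-dump-date --skip-comments example_db > copy2.sql
  diff copy1.sql copy2.sql && echo "Exports match"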

The following diagram illustrates this sequence and shows the artifacts created during the process.

Sequence and artifacts for validating export files

External replica promotion

The second option for migrating your MySQL database is to use an external replica promotion. In this strategy, you create a replica database and set your existing database as the primary. You wait until the two databases are in sync, and you then promote your MySQL replica database to be the primary. This process minimizes database downtime related to the database migration.

Prerequisites

Using the external replica promotion migration method requires two additional prerequisites: creating a replication user and enabling GTID replication.

Establish a replication user

Certain versions of MySQL, including version 5.7 (which is used for the examples in this document), store the username and password of the account that's used for replication in plain text in the replication metadata (the master info repository). If you use your root user account to perform replication, you're therefore storing your root password in plain text. Using the root user for replication poses a security risk.

To help limit the possibility of compromising other aspects of your database, you should create a completely separate account for performing the replication. The new account should have only the privileges that are needed for the replication process. The account should also be limited so that it can connect only from the IP address of the replica, which in this case is the Cloud SQL instance.
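For example, a sketch of creating such an account on the source database looks like the following. The account name, IP address, and password are placeholders.

  # Create a dedicated replication account that can connect only from the
  # replica's IP address, and grant it only the replication privilege:
  mysql --user=root --password -e \
    "CREATE USER 'replication_user'@'203.0.113.20' IDENTIFIED BY 'REPLACE_WITH_STRONG_PASSWORD';
     GRANT REPLICATION SLAVE ON *.* TO 'replication_user'@'203.0.113.20';
     FLUSH PRIVILEGES;"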

Configure GTID

Your source database must have GTID replication enabled in order to perform an external replica promotion migration. If you haven't configured replication in your source database, GTID replication might not be enabled.

A global transaction identifier (GTID) is a unique identifier that's associated with each transaction committed on the primary database server. These identifiers are used to build the replication process between a replica and primary database.

New instances of Cloud SQL for MySQL use GTID replication by default. You cannot disable this feature, and therefore certain SQL statements and operations are not allowed. For more information, read the Differences between Cloud SQL and standard MySQL functionality and the MySQL restrictions on replication with GTIDs.
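You can verify the GTID configuration of the source database before you start the migration, for example as follows. Both variables must be ON; they are typically set through the gtid_mode and enforce_gtid_consistency options in the server's option file.

  # Check whether GTID-based replication is enabled on the source:
  mysql --user=root --password --host=SOURCE_DB_HOST -e \
    "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('gtid_mode','enforce_gtid_consistency');"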

External replica promotion migration overview

Compared to an export/import migration, external replica promotion migration can be significantly easier. Cloud SQL can help automate this process if you have completed all of the prerequisites described earlier. The automated migration workflow still requires an export/import process, but it abstracts away some of the manual steps as part of the automation.

The automated migration workflow supports the following scenarios:

  • Migrating from on-premises to Cloud SQL.
  • Migrating from another cloud provider to Cloud SQL.
  • Migrating from one Google Cloud project to another Google Cloud project.

During the automated migration workflow, you do the following:

  1. Provide details about your data source.
  2. Create a Cloud SQL read replica.
  3. Synchronize the read replica with the source.
  4. Promote the Cloud SQL read replica to be the primary instance.

Provide details on your data source

When you run the automated migration workflow with Cloud SQL, you provide a reference name for the data source. This name is used to refer to the data in the migration, and does not need to match the actual name of your source.

You also provide the public IP address and port of the source database. You need to ensure that your Cloud SQL database has direct connectivity to the source database using an IPv4 address. You also need to ensure that any firewalls or security appliances protecting the source database are configured to allow connectivity from the new Cloud SQL database IP address.

You must also provide the automated migration workflow with your MySQL replication user credentials and with the version of MySQL that you want to migrate. This is needed in order to authenticate with the server and to support a secure transfer in the next step. You can also configure SSL/TLS options as needed.

Create a Cloud SQL read replica

When you run the automated migration workflow, you set options for creating the new Cloud SQL instance in Google Cloud. This requires a few steps:

  • Setting a database instance ID.
  • Choosing a Google Cloud region and zone.
  • Selecting an instance type.
  • Selecting the disk storage type.
  • Choosing total storage capacity. We recommend enabling automatic storage increases. This option allows your database to grow as needed, and reduces the risk that your database will run out of disk space.

Finally, you provide a link to the mysqldump export file that you stored in Cloud Storage. Make sure that the latest export files are in a Cloud Storage bucket that's in the same project as the Cloud SQL instance.
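The automated migration workflow collects these settings for you in the console. For reference only, the same settings expressed as gcloud flags for a stand-alone instance might look like the following sketch; the instance name, machine type, and sizes are illustrative, and this does not replace the workflow itself.

  gcloud sql instances create target-replica \
    --database-version=MYSQL_5_7 \
    --region=us-central1 \
    --tier=db-n1-standard-2 \
    --storage-type=SSD \
    --storage-size=100GB \
    --storage-auto-increase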

Synchronize the read replica with the source

After you begin the migration process, the new Cloud SQL instance launches and begins to replicate from the source. When the synchronization is complete, you can compare the data in the two databases to confirm that they match. You can also begin benchmarking to test the performance of the Cloud SQL database against the requirements of your application and against the previously measured baseline performance of your source database.
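One way to confirm that the replica has caught up is to check its replication status, for example:

  # On the Cloud SQL replica, Seconds_Behind_Master should be at or near 0,
  # and both replication threads should be running:
  mysql --host=CLOUD_SQL_IP --user=root --password -e "SHOW SLAVE STATUS\G" \
    | grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"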

Promote the read replica to be the primary instance

When the data is replicated, you place the old database into read-only mode. This prevents any writes to the database, and ensures that no data is changed during the promotion process. Then you update the application code or DNS entries to support the new endpoint in Cloud SQL, as described earlier.

When the source database is in read-only mode and your application has been updated with the new Cloud SQL endpoint address, you can go to your new database instance in Cloud SQL and promote the replica to be the primary server. This promotion process automatically re-enables writes on the newly promoted Cloud SQL primary.
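A sketch of this cutover, with placeholder names, looks like the following:

  # Put the source primary into read-only mode so that no new writes are lost:
  mysql --host=SOURCE_DB_HOST --user=root --password -e "SET GLOBAL read_only = ON;"

  # Promote the Cloud SQL replica to a stand-alone primary instance:
  gcloud sql instances promote-replica target-replica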

After you promote the replica to be the primary server, your database might experience several minutes of downtime, regardless of the database size. This is the amount of time that Cloud SQL needs in order to perform all of the tasks and configuration work involved in promoting the database.

When the promotion is finished, your primary database is running in Cloud SQL and your migration is complete. This is a good time to run another set of performance tests against the application requirements and against the source database baseline performance. When you have the new set of Cloud SQL performance benchmarks, you can plan for other database optimizations.

Optimizing after the migration

When migration is complete, regardless of your migration method, you can optimize your Cloud SQL database in these ways:

  • Right-size your VM. By analyzing your monitoring data, you can see if you need to choose a differently sized VM for the database. You might be able to reduce costs by using a smaller instance. Conversely, your database might perform better on a larger VM with more resources.
  • Add additional indexes. If your database could benefit from additional indexes or other optimizations, you can add them after you've finished the migration.
  • Turn on additional transactional logging. Review your server's logging configuration and investigate whether enabling binary logging or other enhanced logging options would benefit you.
  • For the external replica promotion migration method, re-enable binary logging and take advantage of automated backups, as shown in the sketch after this list. (After a replica promotion migration, Cloud SQL automated backups are disabled because binary logging is off.)
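For example, re-enabling automated backups and binary logging on the promoted instance might look like this; the instance name and backup window are placeholders.

  gcloud sql instances patch target-primary \
    --backup-start-time=03:00 \
    --enable-bin-log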

What's next