This document describes the process of migrating your database to Spanner. We describe the stages of migration and the tools that we recommend for each stage, depending on your source database and other factors. The recommended tools include Google Cloud products as well as third-party commercial and open source tools. Together, these tools help you accelerate migrations and reduce risk.
Any Spanner migration involves the following core stages:
- Assess the complexity of your migration.
- Migrate your schema.
- Load sample data.
- Migrate your application.
- Test and tune your performance.
- Migrate the data.
- Validate the migration.
- Configure cutover and failover mechanisms.
Within these stages, your migration plan can vary widely depending on factors such as your database source and size, downtime requirements, application code complexity, sharding schema, custom functions or transformations, and failover and replication strategy.
Migration tools
We recommend using the following tools to assist you in various stages of your migration, depending on your source database and other factors. Some tools only support certain source databases. For some steps of the process, no tool is available, so you complete those steps manually.
Spanner migration tool (SMT) is an open source tool that can perform basic assessments, schema conversion, and data migrations.
Database Migration Assessment (DMA) offers a basic assessment to migrate PostgreSQL to Spanner.
Datastream is a Google Cloud service that lets you read change data capture (CDC) events and bulk data from a source database and write to a specified destination.
Dataflow is a Google Cloud service that helps you to write a large amount of data to Spanner more efficiently using templates.
Bulk data migration is a Dataflow template that lets you migrate large MySQL data sets directly to Spanner.
Minimal downtime migration uses Datastream and Dataflow to migrate:
- Existing data in your source database.
- Stream of changes that are made to your source database during the migration.
Data Validation Tool (DVT) is a standardized data validation method built by Google and supported by the open source community. You can integrate DVT into existing Google Cloud products.
Migration tools for MySQL source databases
If your source database is MySQL, then you can perform some of the initial migration stages using MySQL dump files. You must directly connect to your running source MySQL database to complete a production migration.
The following table recommends migration tools based on the migration stage and whether you're working with a dump file or directly connecting to your source database:
| Migration stage | Dump file | Direct connection to source database |
|---|---|---|
| Assessment | Use SMT with `mysqldump`. | Use SMT with `mysqldump`. |
| Schema conversion | Use SMT with `mysqldump`. | Use SMT to configure and convert schema. |
| Sample data load | | Perform a bulk migration. |
| Data migration | Not applicable | Perform a bulk migration, then perform a minimal downtime migration. |
| Data validation | Not applicable | Use DVT. |
| Failover configuration | Not applicable | Use SMT for reverse replication. |
Migration tools for PostgreSQL source databases
If your source database uses PostgreSQL, then you can perform some of the migration stages using a PostgreSQL dump file. You must directly connect to your running source PostgreSQL database to complete the migration.
The following table recommends migration tools based on the migration stage and whether you're working with a dump file or directly connecting to your source database:
| Migration stage | Dump file | Direct connection to source database |
|---|---|---|
| Assessment | Use SMT with `pg_dump`. | Use DMA. |
| Schema conversion | Use SMT with `pg_dump`. | Use SMT to configure and convert schema. |
| Sample data load | | Perform a minimal downtime migration. |
| Data migration | Not applicable | Perform a minimal downtime migration. |
| Data validation | Not applicable | Use DVT. |
| Failover | Not applicable | Not applicable |
Assess the complexity of your migration
To assess the scope and complexity of your migration and plan your approach, you need to gather data about your source database, including the following:
- Query patterns
- Amount of application logic that's dependent on database features such as stored procedures and triggers
- Hardware requirements
- Total cost of ownership (TCO)
Migrate your schema
Before you migrate your schema to Spanner, assess the compatibility between your source schema and a Spanner schema, and optimize your schema for Spanner. For example, you might want to change keys, drop or add indexes, or add or remove columns of existing tables. To optimize your schema for Spanner, see Schema design best practices and Recommended primary key migration strategies.
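For example, the following minimal sketch uses the Python client library to apply a Spanner-optimized schema: a UUID string primary key to avoid write hotspots, and an interleaved child table in place of a foreign key plus join. The instance, database, table, and column names are illustrative placeholders, not part of any particular migration.

```python
# Minimal sketch (Python client library): applying a Spanner-optimized schema.
# Instance, database, table, and column names are illustrative.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

ddl = [
    # A UUID string key avoids the monotonically increasing keys that cause
    # write hotspots in Spanner.
    """CREATE TABLE Customers (
         CustomerId STRING(36) NOT NULL,
         Name STRING(MAX),
       ) PRIMARY KEY (CustomerId)""",
    # Interleaving co-locates child rows with their parent row, which can
    # replace a foreign key plus join from the source schema.
    """CREATE TABLE Orders (
         CustomerId STRING(36) NOT NULL,
         OrderId STRING(36) NOT NULL,
         Total NUMERIC,
       ) PRIMARY KEY (CustomerId, OrderId),
       INTERLEAVE IN PARENT Customers ON DELETE CASCADE""",
]

operation = database.update_ddl(ddl)
operation.result(timeout=300)  # block until the schema change completes
print("Schema applied")
```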
Spanner migration tool, an open source, community-maintained tool created by Google developers, automatically builds a Spanner schema from your source database schema. You can customize the schema using the Spanner migration tool schema assistant.
Spanner migration tool ingests schema and data from one of the following locations:
- A dump file from a local location or Cloud Storage (MySQL, PostgreSQL, CSV)
- Directly from the source database (MySQL, PostgreSQL)
Spanner migration tool performs the following functions for schema assessments, recommendations, and migrations:
- Data type compatibility assessment and recommendations
- Primary key editing and recommendations
- Secondary index editing and recommendations
- Interleaving table editing and recommendations
- General Spanner schema design recommendations
- Schema versioning
- Collaborative schema modification
For more information about schema migrations with the Spanner migration tool, see the Spanner migration tool README.md file.
You also use the Spanner migration tool for data migration.
Load sample data
After you create a Spanner-compatible schema, you can prepare your database for testing using sample data. You can use the BigQuery reverse ETL workflow to load the sample data. For more information, see Load sample data.
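If you only need a small sample and don't want to set up the reverse ETL pipeline yet, a direct batched insert through the client library also works. The following minimal sketch assumes the hypothetical Customers table from the schema example earlier; the instance and database names are placeholders.

```python
# Minimal sketch: loading a small number of sample rows with a single batch of
# mutations. Instance, database, table, and column names are illustrative.
import uuid
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

rows = [(str(uuid.uuid4()), f"Sample customer {i}") for i in range(100)]

with database.batch() as batch:
    batch.insert(
        table="Customers",
        columns=("CustomerId", "Name"),
        values=rows,
    )
print(f"Inserted {len(rows)} sample rows")
```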
Migrate your application
A database migration requires different drivers and libraries, as well as compensation for features that Spanner doesn't support. To take advantage of Spanner's strengths, you might need to change your code, application flows, and architecture.
Here are some of the changes required to migrate your application to Spanner:
- Spanner doesn't support running user code at the database level, so you need to move any procedures and triggers stored at the database level into the application.
- Use Spanner client libraries and object-relational mappers (ORMs). For more information, see Overview of APIs, client libraries, and ORM drivers.
- If you need to translate queries, translate them manually or use other third-party tools.
- Take note of partitioned DML, read-only transactions, commit timestamps, and read timestamps, and how they can optimize application performance; see the sketch after this list.
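The following minimal sketch illustrates the last point with the Python client library: a read-only transaction, a stale read, and a commit-timestamp write. The table and column names are illustrative, and EventLog.CreatedAt is assumed to be a TIMESTAMP column defined with the allow_commit_timestamp option.

```python
# Minimal sketch: read-only transactions, stale reads, and commit timestamps
# with the Python client library. Table and column names are illustrative;
# EventLog.CreatedAt is assumed to be a TIMESTAMP column defined with
# OPTIONS (allow_commit_timestamp = true).
import datetime
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

# Read-only transaction: consistent reads across queries without taking locks.
with database.snapshot(multi_use=True) as snapshot:
    customers = list(snapshot.execute_sql("SELECT CustomerId, Name FROM Customers"))
    orders = list(snapshot.execute_sql("SELECT OrderId FROM Orders"))

# Stale read: trade freshness for lower latency on read-heavy paths.
with database.snapshot(exact_staleness=datetime.timedelta(seconds=15)) as stale:
    order_count = list(stale.execute_sql("SELECT COUNT(*) FROM Orders"))[0][0]

# Commit timestamp: let Spanner record the commit time server side.
with database.batch() as batch:
    batch.insert(
        table="EventLog",
        columns=("EventId", "CreatedAt"),
        values=[("event-1", spanner.COMMIT_TIMESTAMP)],
    )
```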
You also might need to make changes to transaction handling. There's no tooling available to help with this, so you need to complete this step manually. Keep the following in mind:
- The limit of mutations per commit is 40,000. Each secondary index on a table adds an additional mutation per modified row. To modify data using mutations, see Insert, update, and delete data using mutations. To modify a large amount of data, use partitioned DML. The sketch after this list shows both approaches.
- For transaction isolation, no special handling is required because Spanner transactions provide a stricter isolation level than most source databases.
- Because Spanner reads and writes are externally consistent (linearizable), Spanner handles consistency and locking for you by default.
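The following minimal sketch shows the two patterns from the first point using the Python client library: chunking batched mutations so that each commit stays well under the mutation limit, and using partitioned DML for a large backfill. The table names and chunk size are illustrative; size your chunks based on column count and the number of secondary indexes.

```python
# Minimal sketch: chunked mutations to stay under the per-commit mutation
# limit, and partitioned DML for a large backfill. Names and the chunk size
# are illustrative; size chunks based on column count and secondary indexes.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

def insert_in_chunks(rows, chunk_size=500):
    """Commit rows in several small batches instead of one oversized commit."""
    for start in range(0, len(rows), chunk_size):
        with database.batch() as batch:
            batch.insert(
                table="Orders",
                columns=("CustomerId", "OrderId", "Total"),
                values=rows[start:start + chunk_size],
            )

# Partitioned DML executes the statement in independent partitions, so it
# isn't constrained by the single-transaction mutation limit.
row_count = database.execute_partitioned_dml(
    "UPDATE Orders SET Total = 0 WHERE Total IS NULL"
)
print(f"Backfilled {row_count} rows")
```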
Test and tune your schema and application performance
Performance tuning is an iterative process in which you evaluate metrics like CPU utilization and latency based on a subset of your data, adjust your schema and application to improve performance, and test again.
For example, in your schema, you might add or change an index, or change a primary key. In your application, you might batch writes, or you might merge or modify your queries.
For production traffic in particular, performance tuning is important to help avoid surprises. Performance tuning is more effective the closer the setup is to live production traffic throughput and data sizes.
To test and tune your schema and application performance, follow these steps:
- Upload a subset of your data into a Spanner database. For more information, see Migrate your data.
- Point the application to Spanner.
- Verify the correctness by checking for basic flows.
- Verify that performance meets your expectations by performing load tests on your application; a minimal latency-measurement sketch follows this list. For help identifying and optimizing your most expensive queries, see Detect query performance issues with query insights. In particular, the following factors can contribute to suboptimal query performance:
- Inefficient queries: For information about writing efficient queries, see SQL best practices.
- High CPU utilization: For more information, see Investigate high CPU utilization.
- Locking: To reduce bottlenecks caused by transaction locking, see Identify transactions that might cause high latencies.
- Inefficient schema design: If the schema isn't designed well, query optimization isn't very useful.
- Hotspotting: Hotspots in Spanner limit write throughput, especially for high-QPS applications. To identify hotspots or antipatterns, check the Key Visualizer statistics from the Google Cloud console. For more information about avoiding hotspots, see Choose a primary key to prevent hotspots.
- If you modify schema or indexes, repeat correctness and performance testing until you achieve satisfactory results.
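As a starting point before a full load test, the following minimal sketch measures the latency of one representative query against Spanner from the Python client. The query, parameter values, and database names are placeholders; use a proper load-testing framework for production-scale traffic.

```python
# Minimal sketch: a latency smoke test for one representative query. This is
# a starting point, not a substitute for a full load-testing framework. The
# query, parameter values, and database names are illustrative.
import statistics
import time
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

QUERY = "SELECT OrderId, Total FROM Orders WHERE CustomerId = @id"
latencies_ms = []

for i in range(200):
    started = time.perf_counter()
    with database.snapshot() as snapshot:
        list(snapshot.execute_sql(
            QUERY,
            params={"id": f"customer-{i % 20}"},
            param_types={"id": spanner.param_types.STRING},
        ))
    latencies_ms.append((time.perf_counter() - started) * 1000)

latencies_ms.sort()
print(f"p50: {statistics.median(latencies_ms):.1f} ms")
print(f"p99: {latencies_ms[int(len(latencies_ms) * 0.99) - 1]:.1f} ms")
```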
For more information about fine-tuning your database performance, contact Spanner support.
Migrate your data
After optimizing your Spanner schema and migrating your application, you move your data into an empty production-sized Spanner database, and then switch over to the Spanner database.
Depending on your source database, you might be able to migrate your database with minimal downtime, or you might require prolonged downtime.
For both minimal-downtime migrations and migrations with prolonged downtime, we recommend using Dataflow and the Spanner migration tool.
The following table shows the differences between minimal-downtime migrations and migrations with more downtime, including supported sources, formats, size, and throughput.
| | Minimal-downtime migration | Migration with downtime |
|---|---|---|
| Supported sources | MySQL, PostgreSQL | Any database that can export to CSV or Avro. |
| Supported data formats | Connect directly. See Directly connecting to a MySQL database. | MySQL, PostgreSQL, CSV, Avro |
| Supported database sizes | No limit | No limit |
| Max throughput | 45 GB per hour | 200 GB per hour |
Minimal-downtime migration
Spanner supports minimal-downtime migrations from MySQL, PostgreSQL, and Oracle Database. A minimal-downtime migration consists of two components:
- A consistent snapshot of all the data in the database
- The stream of changes (inserts and updates) since that snapshot, referred to as change data capture (CDC)
While minimal-downtime migrations help protect your data, the process involves challenges, including the following:
- Storing CDC data while the snapshot is migrated.
- Writing the CDC data to Spanner while capturing the incoming CDC stream.
- Ensuring that the migration of CDC data to Spanner is faster than the incoming CDC stream.
To manage a minimal-downtime migration, the Spanner migration tool orchestrates the following processes for you:
- Sets up a Cloud Storage bucket to store CDC events from the source database while the snapshot migration progresses.
- Sets up a Datastream job that moves the bulk load of CDC data and continuously streams incremental CDC data to the Cloud Storage bucket. You set up the source connection profile within Spanner migration tool.
- Sets up the Dataflow job to migrate the CDC events into Spanner.
When Dataflow has copied most of the data, stop writing to the source database and wait for the remaining data to finish migrating. This results in a short downtime while Spanner catches up to the source database. Afterward, your application can be cut over to Spanner.
Migration with downtime
For databases other than MySQL, PostgreSQL, or Oracle Database, if the database can export to CSV or Avro, then you can migrate to Spanner with downtime. We recommend using Dataflow or the Spanner migration tool.
Migrations with downtime are recommended only for test environments or applications that can handle a few hours of downtime. On a live database, a migration with downtime can result in data loss.
To perform a downtime migration, follow these high-level steps:
- Generate a dump file of the data from the source database.
- Upload the dump file to Cloud Storage in a MySQL, PostgreSQL, Avro, or CSV dump format.
- Load the dump file into Spanner using Dataflow or the Spanner migration tool.
Generating multiple small dump files makes it quicker to write to Spanner, as Spanner can read multiple dump files in parallel.
When you generate a dump file from the source database, use one of the following approaches to ensure a consistent snapshot of the data:
- Apply a read lock on the source database before you perform the dump, so the data doesn't change while the dump file is generated.
- Generate the dump file using a read replica of the source database with replication disabled.
Recommended formats for bulk migration
Avro is the preferred format for a bulk migration to Spanner. If you're using Avro, consider the following:
- To generate an Avro dump of the data, use a tool like DBeam. For more information about exporting to Avro, see Export data from a non-Spanner database to Avro files.
- To import Avro data, use a Dataflow import job. For more information, see Import Avro files from non-Spanner databases to Spanner.
If you're using CSV, consider the following:
- To generate a CSV dump of the data, use the CSV export functionality supported by the source database. If the data contains newlines, use a custom line separator, as in the sketch after this list.
- To import CSV data, use a Dataflow import job. You can create your own Dataflow import template or use a Google Cloud import template. For more information, see Dataflow data pipeline templates.
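The following minimal sketch shows one way to write such a CSV file in Python, using an unlikely control character as the row separator. The chosen separator, file name, and rows are assumptions for illustration; whatever separator you pick must also be configured on the Dataflow import job that reads the file.

```python
# Minimal sketch: writing a CSV dump with a custom row separator so that
# newlines inside field values aren't mistaken for row boundaries. The chosen
# separator, file name, and rows are illustrative; the Dataflow import job
# must be configured with the same separator.
import csv

ROW_SEPARATOR = "\u0003"  # an unlikely control character as the row delimiter

rows = [
    ("customer-1", "Note with an\nembedded newline"),
    ("customer-2", "Plain note"),
]

with open("customers.csv", "w", newline="") as f:
    writer = csv.writer(f, lineterminator=ROW_SEPARATOR)
    writer.writerows(rows)
```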
If you're using MySQL or PostgreSQL, you can use the Spanner migration tool.
For information about using custom scripts to load data into Spanner, see Performance guidelines for bulk loading.
Validate your data migration
Data validation is the process of comparing data from both the source and the destination tables to ensure they match.
Data Validation Tool (DVT) is an open source tool that can connect to data stores and perform checks between the source and Spanner. We recommend using it to perform basic validations as a part of your migration, such as the following:
- Check that all tables were created and that all schema mappings are correct.
- Match the count of rows for each table.
- Extract random rows to verify correctness.
- Validate your columns (`count`, `sum`, `avg`, `min`, `max`, `group by`).
- Compare any cyclic redundancy checks or hash functions at the row level.
To perform more specific validations, build custom checks during the migration, such as the row-count comparison sketched below.
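For example, the following minimal sketch compares per-table row counts between a PostgreSQL source and Spanner. The connection string, table names, and the use of psycopg2 are assumptions for illustration; DVT already covers the standard checks listed above.

```python
# Minimal sketch of a custom check: comparing per-table row counts between a
# PostgreSQL source and Spanner. Connection details, table names, and the use
# of psycopg2 are assumptions for illustration.
import psycopg2
from google.cloud import spanner

TABLES = ["customers", "orders"]

source = psycopg2.connect("dbname=shop host=source-db user=validator")
spanner_db = spanner.Client().instance("my-instance").database("my-database")

for table in TABLES:
    with source.cursor() as cur:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        source_count = cur.fetchone()[0]

    with spanner_db.snapshot() as snapshot:
        spanner_count = list(snapshot.execute_sql(f"SELECT COUNT(*) FROM {table}"))[0][0]

    status = "OK" if source_count == spanner_count else "MISMATCH"
    print(f"{table}: source={source_count} spanner={spanner_count} [{status}]")
```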
Configure cutover and failover mechanisms
Migrations are often time consuming and complex. Build in fallback mechanisms so that, if an error occurs during the migration, you can switch back to the source database with minimal downtime and avoid significant impact.
The current recommendation is to consume Spanner change streams to perform reverse replication, and to write the changes back to the source database through a stream such as Pub/Sub or Cloud Storage; a sketch of the publishing step follows the list below.
The reverse replication needs to do the following:
- Handle changes in data types or content.
- Reverse any transformations performed during the migration.
- Push the data to the appropriate destination, taking into account sharding schemes on source.
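The following minimal sketch illustrates only the publishing step of such a pipeline: change events are pushed to a Pub/Sub topic with a shard attribute so that a downstream consumer (not shown) can apply them to the correct source shard. The topic name, event format, and sharding function are assumptions for illustration; the Spanner migration tool can orchestrate reverse replication for you.

```python
# Minimal sketch of the publishing step of reverse replication: change events
# are pushed to a Pub/Sub topic with a shard attribute so that a downstream
# consumer (not shown) can apply them to the correct source shard. The topic,
# event format, and sharding function are assumptions for illustration.
import json
import zlib
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "reverse-replication")

def shard_for(customer_id: str, num_shards: int = 4) -> str:
    """Mirror the source's sharding scheme so events route to the right shard."""
    return f"shard-{zlib.crc32(customer_id.encode()) % num_shards}"

def publish_change(event: dict) -> None:
    # Reverse any transformation applied during the forward migration here,
    # for example mapping a UUID key back to the source's integer key.
    future = publisher.publish(
        topic_path,
        json.dumps(event).encode("utf-8"),
        shard=shard_for(event["customer_id"]),
    )
    future.result()  # block until Pub/Sub acknowledges the message

publish_change({"customer_id": "customer-1", "total": "19.99", "op": "UPDATE"})
```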