This document describes the process of migrating your database to Cloud Spanner. We describe the stages of migration and the tools that we recommend for each stage, depending on your source database and other factors. The recommended tools include Google Cloud products as well as third-party commercial and open source tools. Together, these tools help you accelerate migrations and reduce risk.
Any Spanner migration involves the following core stages:
- Assess the complexity of your migration.
- Migrate your schema.
- Migrate your application.
- Test and tune your performance.
- Migrate the data.
- Validate the migration.
- Configure cutover and failover mechanisms.
The following diagram shows this process:
Within these stages, your migration plan can vary widely depending on factors such as your database source and size, downtime requirements, application code complexity, sharding schema, custom functions or transformations, and failover and replication strategy.
We provide migration guides for Amazon DynamoDB, MySQL, Oracle Database, and PostgreSQL. If you're migrating from one of these databases, also follow the relevant guide:
- Migrate from MySQL
- Migrate from PostgreSQL to Cloud Spanner (GoogleSQL dialect)
- Migrate from PostgreSQL to Cloud Spanner (PostgreSQL dialect)
- Migrate from Oracle Database
- Migrate from DynamoDB
If you're migrating from another database, you might need further customizations and tools that this guide doesn't cover.
We recommend using the following tools to assist you in various stages of your migration, depending on your source database and other factors. Some tools only support certain source databases. For some steps of the process, no tool is available, so you complete those steps manually.
- migVisor is a database migration assessment tool that analyzes database workloads and identifies challenges and optimization opportunities.
- HarbourBridge is an open source tool that performs basic assessments as well as schema and data migrations.
- Data Validation Tool (DVT) is a standardized data validation method built by Google and supported by the open source community. You can integrate DVT into existing Google Cloud products.
- Datastream is a Google Cloud service that lets you read change data capture (CDC) events and bulk data from a source database.
- Dataflow is a Google Cloud service that helps you write large amounts of data to Spanner more efficiently using templates. These templates don't generate a dump file; you must generate it using source database tools or third-party tools.
- Dataproc provides Spark-based templates for bulk-loading data into Spanner.
The following table summarizes the primary tools that we recommend for each stage of your migration for some common source databases. You can migrate from other databases with customizations.
| Source database | Assess the scope | Migrate your schema | Migrate your app | Migrate your data | Validate data migration | Configure cutover and failover |
*Migrations require significant downtime.
Assess the complexity of your migration
To assess the scope and complexity of your migration and plan your approach, you need to gather data about your source database, including the following:
- Query patterns
- Amount of application logic that's dependent on database features such as stored procedures and triggers
- Hardware requirements
- Total cost of ownership (TCO)
If your source database is MySQL, PostgreSQL, or Oracle Database, you can conduct these assessments using migVisor. migVisor is a database migration assessment tool that analyzes database workloads and generates a custom visual migration roadmap to identify the best migration path, including challenges and improvement opportunities.
A typical migration assessment with migVisor comprises the following three areas of assessment:
- Feature compatibility. migVisor compares the features of the source database to Spanner features and provides recommendations.
- Application code complexity. migVisor analyzes the source code of the application that you want to migrate to Spanner and determines the code's modernization complexity and migration complexity.
- Total cost of ownership (TCO). migVisor computes the TCO, including compute capacity (nodes or processing units), database storage, backup storage, and network egress. To estimate the number of nodes required, see guidelines for regional configuration or guidelines for multi-region configuration.
migVisor also scans the existing database and workload, requests information about the target database, and provides an estimate of future TCO (broken down by operating expenses, residual capital expenditures, and estimated migration duration). For more information, see TCO Dashboard Overview.
Assessments are read-only operations: required changes are evaluated but not implemented.
Migrate your schema
Before migrating a schema to a Spanner schema, assess the compatibility between the schemas, and optimize your schema for Spanner. For example, you might want to change keys, drop or add indexes, or add or remove columns of existing tables. To optimize your schema for Spanner, see Schema design best practices and Recommended primary key migration strategies.
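As a sketch of one common key change, the following Python snippet replaces a sequential integer primary key with a random UUIDv4 string key, the pattern recommended to avoid write hotspots. The function and column names are illustrative, not part of any migration tool:

```python
import uuid

def remap_primary_keys(rows):
    """Replace a monotonically increasing integer id with a UUIDv4 string key.

    Sequentially increasing keys concentrate writes at the end of the key
    space; random UUIDs spread writes across splits. Returns the remapped
    rows and an old-id -> new-id mapping that the application can use to
    rewrite foreign-key references in child tables.
    """
    id_map = {}
    remapped = []
    for row in rows:
        new_id = str(uuid.uuid4())
        id_map[row["id"]] = new_id
        remapped.append({**row, "id": new_id})
    return remapped, id_map

rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
new_rows, id_map = remap_primary_keys(rows)
```

Keeping the old-to-new mapping matters because every table that referenced the original integer key must be rewritten with the new key during data migration.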
HarbourBridge, an open source, community-maintained tool created by Google developers, automatically builds a Spanner schema from your source database schema. You can customize the schema using the HarbourBridge schema assistant.
HarbourBridge ingests schema and data from one of the following locations:
- A dump file from a local location or Cloud Storage (MySQL, PostgreSQL, CSV)
- Directly from the source database (MySQL, PostgreSQL, Oracle Database, SQL Server, DynamoDB)
HarbourBridge performs the following functions for schema assessments, recommendations, and migrations:
- Data type compatibility assessment and recommendations
- Primary key editing and recommendations
- Secondary index editing and recommendations
- Interleaving table editing and recommendations
- General Spanner schema design recommendations
- Schema versioning
- Collaborative schema modification
For more information about schema migrations with HarbourBridge, see the HarbourBridge documentation.
You also use HarbourBridge for data migration.
Migrate your application
A database migration requires different drivers and libraries, as well as compensation for features that Spanner doesn't support. To achieve similar functionality and optimize to Spanner strengths, you might need to change your code, application flows, and architecture.
Here are some of the changes required to migrate your application to Spanner:
- Spanner doesn't support running user code at the database level, so you need to move any procedures and triggers stored at the database level into the application.
- Use Spanner client libraries and object-relational mappers (ORMs). For more information, see Overview of APIs, client libraries, and ORM drivers.
- If you need to translate queries, translate them manually or use other third-party tools.
- Take note of partitioned DML, read-only transactions, commit timestamps, and read timestamps and how they can optimize application performance.
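As an example of a manual query translation, the following Python sketch rewrites MySQL's two-argument `LIMIT offset, count` syntax into the `LIMIT count OFFSET offset` form that GoogleSQL expects. It is a simplistic regex, not a full SQL parser, and the function name is illustrative:

```python
import re

# MySQL accepts `LIMIT 10, 5` (offset 10, return 5 rows);
# GoogleSQL requires the equivalent `LIMIT 5 OFFSET 10`.
_LIMIT_PATTERN = re.compile(r"LIMIT\s+(\d+)\s*,\s*(\d+)", re.IGNORECASE)

def translate_limit_clause(sql):
    """Rewrite MySQL's two-argument LIMIT clause into GoogleSQL form."""
    return _LIMIT_PATTERN.sub(
        lambda m: f"LIMIT {m.group(2)} OFFSET {m.group(1)}", sql
    )

print(translate_limit_clause("SELECT * FROM t ORDER BY id LIMIT 10, 5"))
```

A real translation layer needs to handle many more dialect differences (functions, backtick identifiers, type casts); third-party translation tools cover these more completely.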
You also might need to make changes to transaction handling. There's no tooling available to help with this, so you need to complete this step manually. Keep the following in mind:
- The limit of mutations per commit is 40,000. Each secondary index on a table is an additional mutation per row. To modify data using mutations, see Insert, update, and delete data using mutations. To modify a large amount of data, use partitioned DML.
- For transaction isolation level, no special handling is required, because Spanner transactions provide serializable isolation, which is typically stronger than the isolation level of the source database.
- Because Spanner is linearizable, it handles consistency and locking by default.
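The mutation limit above can be turned into a simple batching rule. The following Python sketch assumes a simplified cost model, namely that each written column counts as one mutation and each secondary index adds one more mutation per row; the helper names are illustrative:

```python
def max_rows_per_commit(columns_written, secondary_indexes, limit=40_000):
    """Estimate how many rows fit in one Spanner commit.

    Simplified cost model (an assumption): each written column is one
    mutation, and each secondary index adds one mutation per row.
    """
    per_row_cost = columns_written + secondary_indexes
    return limit // per_row_cost

def batch_rows(rows, batch_size):
    """Split rows into commit-sized batches."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

# Example: writing 5 columns per row on a table with 3 secondary indexes.
batch_size = max_rows_per_commit(columns_written=5, secondary_indexes=3)
```

For very large rewrites, partitioned DML avoids this batching entirely by letting Spanner partition the work.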
Test and tune your schema and application performance
Performance tuning is an iterative process in which you evaluate metrics like CPU utilization and latency based on a subset of your data, adjust your schema and application to improve performance, and test again.
For example, in your schema, you might add or change an index, or change a primary key. In your application, you might batch writes, or you might merge or modify your queries.
Performance tuning is especially important before you serve production traffic, to help avoid surprises. It is most effective when the test setup's throughput and data sizes are close to those of live production traffic.
To test and tune your schema and application performance, follow these steps:
- Upload a subset of your data into a Spanner database. For more information, see Migrate your data.
- Point the application to Spanner.
- Verify correctness by checking basic flows.
- Verify that performance meets your expectations by performing load tests on your application. For help identifying and optimizing your most expensive queries, see Detect query performance issues with query insights. In particular, the following factors can contribute to suboptimal query performance:
- Inefficient queries: For information about writing efficient queries, see SQL best practices.
- High CPU utilization: For more information, see Investigate high CPU utilization.
- Locking: To reduce bottlenecks caused by transaction locking, see Identify transactions that might cause high latencies.
- Inefficient schema design: If the schema isn't designed well, query optimization isn't very useful.
- Hotspotting: Hotspots in Spanner limit write throughput, especially for high-QPS applications. To identify hotspots or antipatterns, check the Key Visualizer statistics from the Google Cloud console. For more information about avoiding hotspots, see Choose a primary key to prevent hotspots.
- If you modify schema or indexes, repeat correctness and performance testing until you achieve satisfactory results.
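As one way to turn load-test results into a repeatable pass/fail check, the following Python sketch computes a nearest-rank latency percentile and compares it against a target. The function names and the SLO threshold are illustrative, not prescribed by Spanner:

```python
import math

def percentile(samples, p):
    """Return the p-th percentile of latency samples (nearest-rank method)."""
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

def meets_slo(latencies_ms, p99_target_ms):
    """Pass the load test only if tail (p99) latency stays within target."""
    return percentile(latencies_ms, 99) <= p99_target_ms
```

Tail percentiles (p99) catch hotspotting and lock contention that averages hide, which is why they are the usual gate in this kind of check.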
For more information about fine-tuning your database performance, contact Spanner support.
Migrate your data
After optimizing your Spanner schema and migrating your application, you move your data into an empty production-sized Spanner database, and then switch over to the Spanner database.
Depending on your source database, you might be able to migrate your database with minimal downtime, or you might require prolonged downtime.
The following table shows the differences between minimal-downtime migrations and migrations with more downtime, including supported sources, formats, size, and throughput.
| | Minimal-downtime migration | Migration with downtime |
|---|---|---|
| Supported sources | MySQL, PostgreSQL, or Oracle Database | SQL Server, DynamoDB, or any database that can export to CSV or Avro |
| Supported data formats | Connect directly. See Directly connecting to a MySQL database. | MySQL, PostgreSQL, CSV, Avro |
| Supported database sizes | No limit | No limit |
| Max throughput | 45 GB per hour | 200 GB per hour |
Spanner supports minimal-downtime migrations from MySQL, PostgreSQL, and Oracle Database. A minimal-downtime migration consists of two components:
- A consistent snapshot of all the data in the database
- The stream of changes (inserts and updates) since that snapshot, referred to as change data capture (CDC)
While minimal-downtime migrations help protect your data, the process involves challenges, including the following:
- Storing CDC data while the snapshot is migrated.
- Writing the CDC data to Spanner while capturing the incoming CDC stream.
- Ensuring that the migration of CDC data to Spanner is faster than the incoming CDC stream.
To manage a minimal-downtime migration, HarbourBridge orchestrates the following processes for you:
- HarbourBridge sets up a Cloud Storage bucket to store CDC events on the source database while the snapshot migration progresses.
- HarbourBridge sets up a Datastream job that moves the bulk load of CDC data and continuously streams incremental CDC data to the Cloud Storage bucket. You set up the source connection profile within HarbourBridge.
- HarbourBridge sets up the Dataflow job to migrate the CDC events into Spanner.
When Dataflow has copied most of the data, you stop writing to the source database and wait for the remaining data to finish migrating. This results in a short downtime while Spanner catches up to the source database. Afterward, you can cut the application over to Spanner.
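Conceptually, the catch-up step applies the ordered CDC stream on top of the bulk snapshot. The following Python sketch models this with in-memory dictionaries; this is a simplification, since the real pipeline runs in Datastream and Dataflow:

```python
def apply_cdc_events(snapshot, events):
    """Apply ordered change events on top of a snapshot.

    `events` is a sequence of (operation, key, value) tuples, where
    operation is "insert", "update", or "delete". The result is the
    state the target should hold once CDC catch-up completes.
    """
    target = dict(snapshot)  # start from the bulk-loaded snapshot
    for op, key, value in events:
        if op in ("insert", "update"):
            target[key] = value
        elif op == "delete":
            target.pop(key, None)
    return target
```

Applying events strictly in order is what makes the final state consistent with the source; out-of-order application is one of the challenges the managed pipeline handles for you.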
The following diagram shows this process:
Migration with downtime
Migrations with downtime are recommended only for test environments or applications that can handle a few hours of downtime. On a live database, a migration with downtime can result in data loss.
To perform a downtime migration, follow these high-level steps:
- Generate a dump file of the data from the source database.
- Upload the dump file to Cloud Storage in a MySQL, PostgreSQL, Avro, or CSV dump format.
- Load the dump file into Spanner using Dataflow or HarbourBridge.
Generating multiple small dump files makes it quicker to write to Spanner, as Spanner can read multiple dump files in parallel.
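The following Python sketch illustrates the sharding idea by splitting rows into several small CSV shards. It writes to in-memory buffers for simplicity; a real migration would write the shards to Cloud Storage so that the loader can read them in parallel:

```python
import csv
import io

def write_csv_shards(rows, shard_size):
    """Split rows into multiple small CSV shards for parallel loading.

    Returns each shard's CSV content as a string. In practice you would
    write each shard to its own object in Cloud Storage.
    """
    shards = []
    for i in range(0, len(rows), shard_size):
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerows(rows[i:i + shard_size])
        shards.append(buf.getvalue())
    return shards
```

Shard size is a tuning knob: many small shards increase parallelism, but very small shards add per-file overhead.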
To generate a consistent snapshot of data when you create the dump file, use one of the following approaches:
- Before you perform the dump, apply a read lock on the source database to prevent the data from changing while the dump file is generated.
- Generate the dump file from a read replica of the source database with replication disabled.
Recommended formats for bulk migration
Avro is the preferred format for a bulk migration to Spanner. If you're using Avro, consider the following:
- To generate an Avro dump of the data, use a tool like DBeam. For more information about exporting to Avro, see Export data from a non-Spanner database to Avro files.
- To import Avro data, use a Dataflow import job. For more information, see Import Avro files from non-Spanner databases to Spanner.
If you're using CSV, consider the following:
- To generate a CSV dump of the data, use CSV generation supported by the source. If data contains new lines, use a custom line separator.
- To import CSV data, use a Dataflow import job. You can create your own Dataflow import template or use a Google Cloud import template. For more information, see Dataflow data pipeline templates.
If you're using MySQL or PostgreSQL, you can use HarbourBridge.
For information about using custom scripts to load data into Spanner, see Performance guidelines for bulk loading.
Validate your data migration
Data validation is the process of comparing data from both the source and the destination tables to ensure they match.
Data Validation Tool (DVT) is an open source tool that can connect to data stores and perform checks between the source and Spanner. We recommend using it to perform basic validations as a part of your migration, such as the following:
- Check that all tables were created and that all schema mappings are correct.
- Match the count of rows for each table.
- Extract random rows to verify correctness.
- Validate your columns (for example, using count, sum, avg, min, and max aggregations).
- Compare any cyclic redundancy checks or hash functions at the row level.
To perform more specific validations, build custom checks during migration.
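As a sketch of row-level hash comparison, the following Python snippet canonicalizes each row, hashes it, and compares row counts and hash multisets between source and target. The canonical form used here is an assumption for illustration; DVT implements its own canonicalization:

```python
import hashlib

def row_hash(row):
    """Canonicalize a row into a stable string and hash it."""
    canonical = "|".join(str(v) for v in row)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def validate_tables(source_rows, target_rows):
    """Basic checks: matching row counts and matching per-row hashes.

    Sorting the hashes makes the comparison order-independent, which
    matters because the two databases may return rows in different orders.
    """
    if len(source_rows) != len(target_rows):
        return False
    return sorted(map(row_hash, source_rows)) == sorted(map(row_hash, target_rows))
```

Hash comparison catches per-row corruption that a plain row count misses, at the cost of reading every row on both sides.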
Configure cutover and failover mechanisms
Migrations are often time consuming and complex. Build in fallbacks so that, if an error occurs during migration, you can switch back to the source database with minimal downtime. A common fallback mechanism is reverse replication, which replicates writes made to Spanner back to the source database so that the source stays current. The reverse replication needs to do the following:
- Handle changes in data types or content.
- Reverse any transformations performed during the migration.
- Push the data to the appropriate destination, taking into account sharding schemes on source.
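As a toy illustration of reversing a migration-time transformation, the following Python sketch defines a forward transformation and its inverse; the transformation itself (uppercasing a `code` column while preserving the original) is hypothetical:

```python
def forward_transform(row):
    """Example forward transformation applied during migration:
    uppercase a code column, keeping the original value so the
    change can be undone (illustrative only)."""
    return {**row, "code": row["code"].upper(), "orig_code": row["code"]}

def reverse_transform(row):
    """Invert the forward transformation so the row can be written
    back to the source database during reverse replication."""
    restored = {**row, "code": row["orig_code"]}
    restored.pop("orig_code")
    return restored
```

The key property is that `reverse_transform(forward_transform(row))` returns the original row; any transformation in the forward path that is not invertible (for example, a lossy type narrowing) needs extra bookkeeping before reverse replication can work.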