Migration overview

This document describes the process of migrating your database to Cloud Spanner. We describe the stages of migration and the tools that we recommend for each stage, depending on your source database and other factors. The recommended tools include Google Cloud products as well as third-party commercial and open source tools. Together, these tools help you accelerate migrations and reduce risk.

Any Spanner migration involves the following core stages:

  1. Assess the complexity of your migration.
  2. Migrate your schema.
  3. Migrate your application.
  4. Test and tune your performance.
  5. Migrate the data.
  6. Validate the migration.
  7. Configure cutover and failover mechanisms.

The following diagram shows this process:

Diagram of migration process showing assessment, schema and application
migration, testing and tuning, data migration, validation, and
cutover.

Within these stages, your migration plan can vary widely depending on factors such as your database source and size, downtime requirements, application code complexity, sharding schema, custom functions or transformations, and failover and replication strategy.

We provide migration guides for Amazon DynamoDB, MySQL, Oracle Database, and PostgreSQL. If you're migrating from one of these databases, also follow the relevant guide:

If you're migrating from another database, you might need further customizations and tools that this guide doesn't cover.

Migration tools

We recommend using the following tools to assist you in various stages of your migration, depending on your source database and other factors. Some tools only support certain source databases. For some steps of the process, no tool is available, so you complete those steps manually.

  • migVisor is a database migration assessment tool that analyzes database workloads and identifies challenges and optimization opportunities.
  • HarbourBridge is an open source tool that performs basic assessments as well as schema and data migrations.
  • Data Validation Tool (DVT) is a standardized data validation method built by Google and supported by the open source community. You can integrate DVT into existing Google Cloud products.
  • Datastream is a Google Cloud service that lets you read change data capture (CDC) events and bulk data from a source database.
  • Dataflow is a Google Cloud service that helps you to write large amounts of data to Spanner more efficiently using templates. These templates don't generate a dump file; the dump file needs to be generated by the source database tools or third-party tools.
  • Dataproc provides Spark-based templates for bulk-loading data into Spanner.

The following table summarizes the primary tools that we recommend for each stage of your migration for some common source databases. You can migrate from other databases with customizations.

Source database Assess the scope Migrate your schema Migrate your app Migrate your data Validate data migration Configure cutover and failover
MySQL migVisor HarbourBridge Manual HarbourBridge DVT Manual
PostgreSQL migVisor HarbourBridge Manual HarbourBridge DVT Manual
Oracle Database migVisor HarbourBridge Manual HarbourBridge DVT Manual
SQL Server Manual HarbourBridge Manual HarbourBridge* DVT Manual
DynamoDB Manual HarbourBridge Manual HarbourBridge* DVT Manual

*Migrations require significant downtime.

Assess the complexity of your migration

To assess the scope and complexity of your migration and plan your approach, you need to gather data about your source database, including the following:

  • Query patterns
  • Amount of application logic that's dependent on database features such as stored procedures and triggers
  • Hardware requirements
  • Total cost of ownership (TCO)

If your source database is MySQL, PostgreSQL, or Oracle Database, you can conduct these assessments using migVisor. migVisor is a database migration assessment tool that analyzes database workloads and generates a custom visual migration roadmap to identify the best migration path, including challenges and improvement opportunities.

A typical migration assessment with migVisor is comprised of the following three areas of assessment:

  • Feature compatibility. migVisor compares the features of the source database to Spanner features and provides recommendations.
  • Application code complexity. migVisor analyzes the source code of the application that you want to migrate to Spanner and determines the code's modernization complexity and migration complexity.
  • Total cost of ownership (TCO). migVisor computes the TCO, including compute capacity (nodes or processing units), database storage, backup storage, and network egress. To estimate the number of nodes required, see guidelines for regional configuration or guidelines for multi-region configuration.

migVisor also scans the existing database and workload, requests information about the target database, and provides an estimate of future TCO (broken down by operating expenses, residual capital expenditures, and estimated migration duration). For more information, see TCO Dashboard Overview.

Assessments are read-only operations, meaning changes required are evaluated but not implemented.

Migrate your schema

Before migrating a schema to a Spanner schema, assess the compatibility between the schemas, and optimize your schema for Spanner. For example, you might want to change keys, drop or add indexes, or add or remove columns of existing tables. To optimize your schema for Spanner, see Schema design best practices and Recommended primary key migration strategies.

HarbourBridge, an open source, community-maintained tool created by Google developers, automatically builds a Spanner schema from your source database schema. You can customize the schema using the HarbourBridge schema assistant.

HarbourBridge ingests schema and data from one of the following locations:

  • A dump file from a local location or Cloud Storage (MySQL, PostgreSQL, CSV)
  • Directly from the source database (MySQL, PostgreSQL, Oracle Database, SQL Server, DynamoDB)

HarbourBridge performs the following functions for schema assessments, recommendations, and migrations:

  • Data type compatibility assessment and recommendations
  • Primary key editing and recommendations
  • Secondary index editing and recommendations
  • Interleaving table editing and recommendations
  • General Spanner schema design recommendations
  • Schema versioning
  • Collaborative schema modification

For more information about schema migrations with HarbourBridge, see the HarbourBridge README.md file.

You also use HarbourBridge for data migration.

Migrate your application

A database migration requires different drivers and libraries, as well as compensation for features that Spanner doesn't support. To achieve similar functionality and optimize to Spanner strengths, you might need to change your code, application flows, and architecture.

Here are some of the changes required to migrate your application to Spanner:

  • ​Spanner doesn't support running user code at the database level, so you need to move any procedures and triggers stored at the database level into the application.
  • Use Spanner client libraries and object-relational mappers (ORMs). For more information, see Overview of APIs, client libraries, and ORM drivers.
  • If you need to translate queries, translate them manually or use other third-party tools.
  • Take note of partitioned DML, read-only transactions, commit timestamps, and read timestamps and how they can optimize application performance.

You also might need to make changes to transaction handling. There's no tooling available to help with this, so you need to complete this step manually. Keep the following in mind:

  • The limit of mutations per commit is 40,000. Each secondary index on a table is an additional mutation per row. To modify data using mutations, see Insert, update, and delete data using mutations. To modify a large amount of data, use partitioned DML.
  • For transaction isolation level, no handling is required because Spanner transactions are more isolated.
  • Because Spanner is linearizable, it handles consistency and locking by default.

Test and tune your schema and application performance

Performance tuning is an iterative process in which you evaluate metrics like CPU utilization and latency based on a subset of your data, adjust your schema and application to improve performance, and test again.

For example, in your schema, you might add or change an index, or change a primary key. In your application, you might batch writes, or you might merge or modify your queries.

For production traffic in particular, performance tuning is important to help avoid surprises. Performance tuning is more effective the closer the setup is to live production traffic throughput and data sizes.

To test and tune your schema and application performance, follow these steps:

  1. Upload a subset of your data into a Spanner database. For more information, see Migrate your data.
  2. Point the application to Spanner.
  3. Verify the correctness by checking for basic flows.
  4. Verify that performance meets your expectations by performing load tests on your application. For help identifying and optimizing your most expensive queries, see Detect query performance issues with query insights. In particular, the following factors can contribute to suboptimal query performance:
    1. Inefficient queries: For information about writing efficient queries, see SQL best practices.
    2. High CPU utilization: For more information, see Investigate high CPU utilization.
    3. Locking: To reduce bottlenecks caused by transaction locking, see Identify transactions that might cause high latencies.
    4. Inefficient schema design: If the schema isn't designed well, query optimization isn't very useful.
    5. Hotspotting: Hotspots in Spanner limit write throughput, especially for high-QPS applications. To identify hotspots or antipatterns, check the Key Visualizer statistics from the Google Cloud console. For more information about avoiding hotspots, see Choose a primary key to prevent hotspots.
  5. If you modify schema or indexes, repeat correctness and performance testing until you achieve satisfactory results.

For more information about fine-tuning your database performance, contact Spanner support.

Migrate your data

After optimizing your Spanner schema and migrating your application, you move your data into an empty production-sized Spanner database, and then switch over to the Spanner database.

Depending on your source database, you might be able to migrate your database with minimal downtime, or you might require prolonged downtime.

For both minimal-downtime migrations and migrations with prolonged downtime, we recommend using Dataflow and HarbourBridge.

The following table shows the differences between minimal-downtime migrations and migrations with more downtime, including supported sources, formats, size, and throughput.

Minimal-downtime migration Migration with downtime
Supported sources MySQL, PostgreSQL, or Oracle Database SQL Server, DynamoDB, or any database that can export to CSV or Avro
Supported data formats Connect directly. See Directly connecting to a MySQL database. MySQL, PostgreSQL, CSV, Avro
Supported database sizes No limit No limit
Max throughput 45 GB per hour 200 GB per hour

Minimal-downtime migration

Spanner supports minimal-downtime migrations from MySQL, PostgreSQL, and Oracle Database. A minimal-downtime migration consists of two components:

  • A consistent snapshot of all the data in the database
  • The stream of changes (inserts and updates) since that snapshot, referred to as change data capture (CDC)

While minimal-downtime migrations help protect your data, the process involves challenges, including the following:

  • Storing CDC data while the snapshot is migrated.
  • Writing the CDC data to Spanner while capturing the incoming CDC stream.
  • Ensuring that the migration of CDC data to Spanner is faster than the incoming CDC stream.

To manage a minimal-downtime migration, HarbourBridge orchestrates the following processes for you:

  1. HarbourBridge sets up a Cloud Storage bucket to store CDC events on the source database while the snapshot migration progresses.
  2. HarbourBridge sets up a Datastream job that moves the bulk load of CDC data and continuously streams incremental CDC data to the Cloud Storage bucket. You set up the source connection profile within HarbourBridge.
  3. HarbourBridge sets up the Dataflow job to migrate the CDC events into Spanner.

When Dataflow has copied most of the data, it stops writing to the source database and waits for the data to finish migrating. This results in a short downtime while Spanner catches up to the source database. Afterward, the application can be cut over to Spanner.

The following diagram shows this process:

Diagram shows the process of a minimal-downtime migration.

Migration with downtime

For databases other than MySQL, PostgreSQL, or Oracle Database, if the database can export to CSV or Avro, then you can migrate to Spanner with downtime. We recommend using Dataflow or HarbourBridge.

Migrations with downtime are recommended only for test environments or applications that can handle a few hours of downtime. On a live database, a migration with downtime can result in data loss.

To perform a downtime migration, follow these high-level steps:

  1. Generate a dump file of the data from the source database.
  2. Upload the dump file to Cloud Storage in a MySQL, PostgreSQL, Avro, or CSV dump format.
  3. Load the dump file into Spanner using Dataflow or HarbourBridge.

Generating multiple small dump files makes it quicker to write to Spanner, as Spanner can read multiple dump files in parallel.

When generating a dump file from the source database, to generate a consistent snapshot of data, keep the following in mind:

  • To prevent the data from changing during the generation of the dump file, before you perform the dump, apply a read lock on the source database.
  • Generate the dump file using a read replica from the source database with replication disabled.

Avro is the preferred format for a bulk migration to Spanner. If you're using Avro, consider the following:

If you're using CSV, consider the following:

  • To generate a CSV dump of the data, use CSV generation supported by the source. If data contains new lines, use a custom line separator.
  • To import CSV data, use a Dataflow import job. You can create your own Dataflow import template or use a Google Cloud import template. For more information, see Dataflow data pipeline templates.

If you're using MySQL or PostgreSQL, you can use HarbourBridge.

For information about using custom scripts to load data into Spanner, see Performance guidelines for bulk loading.

Validate your data migration

Data validation is the process of comparing data from both the source and the destination tables to ensure they match.

Data Validation Tool (DVT) is an open source tool that can connect to data stores and perform checks between the source and Spanner. We recommend using it to perform basic validations as a part of your migration, such as the following:

  • Check that all tables were created and that all schema mappings are correct .
  • Match the count of rows for each table.
  • Extract random rows to verify correctness.
  • Validate your columns (count, sum, avg, min, max, group by).
  • Compare any cyclic redundancy checks or hash functions at the row level.

To perform more specific validations, build custom checks during migration.

Configure cutover and failover mechanisms

Migrations are often time consuming and complex. Build in fallbacks to avoid significant impact in case of an error during migration, allowing you to switch back to the source database with minimal downtime.

The current recommendation is to consume change streams to perform reverse replication, and write back to the source database through a stream like Pub/Sub or Cloud Storage.

Diagram shows cutover process.

The reverse replication needs to do the following:

  • Handle changes in data types or content.
  • Reverse any transformations performed during the migration.
  • Push the data to the appropriate destination, taking into account sharding schemes on source.