Migrate from MySQL to Spanner

This article explains how to migrate your Online Transactional Processing (OLTP) database from MySQL to Spanner.

Migration constraints

Spanner uses certain concepts differently from other enterprise database management tools, so you might need to adjust your application's architecture to take full advantage of its capabilities. You might also need to supplement Spanner with other services from Google Cloud to meet your needs.

Stored procedures and triggers

Spanner does not support running user code in the database level, so as part of the migration, business logic implemented by database-level stored procedures and triggers must be moved into the application.

Sequences

Spanner recommends using UUID Version 4 as the default method to generate primary key values. The GENERATE_UUID() function (GoogleSQL, PostgreSQL) returns UUID Version 4 values represented as STRING type.

If you need to generate integer values, Spanner supports bit-reversed positive sequences (GoogleSQL, PostgreSQL), which produce values that distribute evenly across the positive 64-bit number space. You can use these numbers to avoid hotspotting issues.

For more information, see primary key default value strategies.

Access controls

Spanner supports fine-grained access control at the table and column level. Fine-grained access control for views is not supported. For more information, see About fine-grained access control.

Data validation constraints

Spanner supports a limited set of data validation constraints in the database layer. If you need more complex data constraints, you must implement them in the application layer.

The following table discusses the types of constraints commonly found in MySQL databases, and how to implement them with Spanner.

Constraint Implementation with Spanner
Not null NOT NULL column constraint
Unique Secondary index with UNIQUE constraint
Foreign key (for normal tables) See Create and manage foreign key relationships.
Foreign key ON DELETE/ON UPDATE actions Only possible for interleaved tables; otherwise, implemented in the application layer
Value checks and validation through CHECK constraints See Creating and managing check constraints.
Value checks and validation through triggers Implemented in the application layer

Generated columns

Spanner supports generated columns whereby the column value will always be generated by a function supplied as part of the table definition. As in MySQL, generated columns cannot be expressly set to a supplied value in a DML statement.

Generated columns are defined as part of the column definition during a CREATE TABLE or ALTER TABLE Data Definition Language (DDL) statement. The AS keyword is followed by a valid SQL function and the required suffix keyword STORED. The STORED keyword is part of the ANSI SQL specification, and indicates that the function results will be stored along with other columns of the table.

The SQL function, the generation expression, can include any deterministic expression, function, and operator and can be used in secondary indexes or used as a foreign key.

Learn more about how to manage this column type by reviewing Creating and managing generated columns.

Supported data types

MySQL and Spanner support different sets of data types. The following table lists the MySQL data types and their equivalent in Spanner. For detailed definitions of each Spanner data type, see Data types.

You might have to further transform your data as described in the Notes column to make MySQL data fit in your Spanner database. For example, you can store a large BLOB as an object in a Cloud Storage bucket rather than in the database, and then store the URI reference to the Cloud Storage object in the database as a STRING.

MySQL data type Spanner equivalent Notes
INTEGER, INT, BIGINT MEDIUMINT, SMALLINT INT64
TINYINT, BOOL, BOOLEAN BOOL, INT64 TINYINT(1) values are used to represent boolean values of 'true' (nonzero) or 'false' (0).
FLOAT, DOUBLE FLOAT64
DECIMAL, NUMERIC NUMERIC, STRING In MySQL,the NUMERIC and DECIMAL data types support up to a total 65 digits of precision and scale, as defined in the column declaration. The Spanner NUMERIC data type supports up to 38 digits of precision and 9 decimal digits of scale.
If you require greater precision, see Storing arbitrary precision numeric data for alternative mechanisms.
BIT BYTES
DATE DATE Both Spanner and MySQL use the 'yyyy-mm-dd' format for dates, so no transformation is necessary. SQL functions are provided to convert dates to a formatted string.
DATETIME, TIMESTAMP TIMESTAMP Spanner stores time independent of time zone. If you need to store a time zone, you must use a separate STRING column. SQL functions are provided to convert timestamps to a formatted string using time zones.
CHAR, VARCHAR STRING Note: Spanner uses Unicode strings throughout.
VARCHAR supports a maximum length of 65,535 bytes, while Spanner supports up to 2,621,440 characters.
BINARY, VARBINARY, BLOB, TINYBLOB BYTES Small objects (less than 10 MiB) can be stored as BYTES. Consider using alternative Google Cloud offerings such as Cloud Storage to store larger objects
TEXT, TINYTEXT, ENUM STRING Small TEXT values (less than 10 MiB) can be stored as STRING. Consider using alternative Google Cloud offerings such as Cloud Storage to support larger TEXT values.
ENUM STRING Validation of ENUM values must be performed in the application
SET ARRAY<STRING> Validation of SET element values must be performed in the application
LONGBLOB, MEDIUMBLOB BYTES or STRING containing URI to object. Small objects (less than 10 MiB) can be stored as BYTES. Consider using alternative Google Cloud offerings such as Cloud Storage to store larger objects.
LONGTEXT, MEDIUMTEXT STRING (either containing data or URI to external object) Small objects (less than 2,621,440 characters) can be stored as STRING. Consider using alternative Google Cloud offerings such as Cloud Storage to store larger objects
JSON JSON Small JSON strings (less than 2,621,440 characters) can be stored as JSON. Consider using alternative Google Cloud offerings such as Cloud Storage to store larger objects.
GEOMETRY, POINT, LINESTRING, POLYGON, MULTIPOINT, MULTIPOLYGON, GEOMETRYCOLLECTION Spanner does not support Geospatial data types. You must store this data using standard data types, and implement any searching/filtering logic in the application layer.

Migration process

An overall timeline of your migration process would be:

  1. Convert your schema and data model.
  2. Translate any SQL queries.
  3. Migrate your application to use Spanner in addition to MySQL.
  4. Bulk export your data from MySQL and import your data into Spanner using Dataflow.
  5. Maintain consistency between both databases during your migration.
  6. Migrate your application away from MySQL.

Step 1: Convert your database and schema

You convert your existing schema to a Spanner schema to store your data. In order to make application modifications simpler, make sure the converted schema matches the existing MySQL schema as closely as possible. However, because of the differences in features, some changes might be necessary.

Using best practices in schema design can help you increase throughput and reduce hot spots in your Spanner database.

Primary keys

In Spanner, every table that must store more than one row must have a primary key consisting of one or more columns of the table. Your table's primary key uniquely identifies each row in a table, and Spanner uses the primary key to sort table rows. Because Spanner is highly distributed, it's important that you choose a primary key generation technique that scales well with your data growth. For more information, see the primary key migration strategies that we recommend.

Note that after you designate your primary key, you can't add or remove a primary key column, or change a primary key value later without deleting and recreating the table. For more information on how to designate your primary key, see Schema and data model - primary keys.

Interleave your tables

Spanner has a feature where you can define two tables as having a 1-many, parent-child relationship. This feature interleaves the child data rows next to their parent row in storage, effectively pre-joining the table and improving data retrieval efficiency when the parent and children are queried together.

The child table's primary key must start with the primary key column(s) of the parent table. From the child row's perspective, the parent row primary key is referred to as a foreign key. You can define up to 6 levels of parent-child relationships.

You can define on-delete actions for child tables to determine what happens when the parent row is deleted: either all child rows are deleted, or the parent row deletion is blocked while child rows exist.

Here is an example of creating an Albums table interleaved in the parent Singers table defined earlier:

CREATE TABLE Albums (
  SingerId     INT64 NOT NULL,
  AlbumId      INT64 NOT NULL,
  AlbumTitle   STRING(MAX),
) PRIMARY KEY (SingerId, AlbumId)
INTERLEAVE IN PARENT (Singers)
ON DELETE CASCADE;

Create secondary indexes

You can also create secondary indexes to index data within the table outside of the primary key. Spanner implements secondary indexes in the same way as tables, so the column values to be used as index keys will have the same constraints as the primary keys of tables. This also means that indexes have the same consistency guarantees as Spanner tables.

Value lookups using secondary indexes are effectively the same as a query with a table join. You can improve the performance of queries using indexes by storing copies of the original table's column values in the secondary index using the STORING clause, making it a covering index.

Spanner's query optimizer only automatically uses a secondary index when the index itself stores all the columns being queried (a covered query). To force the use of an index when querying columns in the original table, you must use a FORCE INDEX directive in the SQL statement, for example:

SELECT *
FROM MyTable@{FORCE_INDEX=MyTableIndex}
WHERE IndexedColumn=@value

Indexes can be used to enforce unique values within a table column, by defining a UNIQUE index on that column. Adding duplicate values will be prevented by the index.

Here is an example DDL statement creating a secondary index for the Albums table:

CREATE INDEX AlbumsByAlbumTitle ON Albums(AlbumTitle);

If you create additional indexes after your data is loaded, populating the index might take some time. We recommend that you limit the rate at which you add them to an average of three per day. For more guidance on creating secondary indexes, see Secondary indexes. For more information on the limitations on index creation, see Schema updates.

Step 2: Translate any SQL queries

Spanner uses the ANSI 2011 dialect of SQL with extensions, and has many functions and operators to help translate and aggregate your data. Any SQL queries using MySQL-specific dialect, functions, and types will need to be converted to be compatible with Spanner.

Although Spanner doesn't support structured data as column definitions, you can use structured data in SQL queries using ARRAY<> and STRUCT<> types. For example, you could write a query that returns all Albums for an artist using an ARRAY of STRUCTs (taking advantage of the pre-joined data). For more information see the Subqueries section of the documentation.

SQL queries can be profiled using the Spanner Studio page in the Google Cloud console to execute the query. In general, queries that perform full table scans on large tables are very expensive, and should be used sparingly. For more information on optimizing SQL queries, see the SQL best practices documentation.

Step 3: Migrate your application to use Spanner

Spanner provides a set of Client libraries for various languages, and the ability to read and write data using Spanner-specific API calls, as well as by using SQL queries and Data Modification Language (DML) statements. Using API calls might be faster for some queries, such as direct row reads by key, because the SQL statement doesn't have to be translated.

You can also use the Java Database Connectivity (JDBC) driver to connect to Spanner, leveraging existing tooling and infrastructure that does not have native integration.

As part of the migration process, features not available in Spanner as mentioned above must be implemented in the application. For example, a trigger to verify data values and update a related table would need to be implemented in the application using a read/write transaction to read the existing row, verify the constraint, then write the updated rows to both tables.

Spanner offers read/write and read-only transactions, which ensure external consistency of your data. Additionally, read transactions can have Timestamp bounds applied, where you are reading a consistent version of the data either:

  • at an exact time in the past (up to 1 hour ago).
  • in the future (where the read will block until that time arrives).
  • with an acceptable amount of bounded staleness, which will return a consistent view up to some time in the past without needing to check that later data is available on another replica. This can give performance benefits at the expense of possibly stale data.

Step 4: Transfer your data from MySQL to Spanner

To transfer your data from MySQL to Spanner, you must export your MySQL database to a portable file format—for example, XML—and then import that data into Spanner using Dataflow.

transferring data from MySQL to Spanner

Bulk export from MySQL

The mysqldump tool included with MySQL is able to export the entire database into well-formed XML files. Alternatively, you can use the SELECT ... INTO OUTFILE SQL statement to create CSV files for each table. However, this approach has the disadvantage that only one table can be exported at a time, which means that you must either pause your application or quiesce your database so that the database remains in a consistent state for export.

After exporting these data files, we recommend that you upload them to a Cloud Storage bucket so that they are accessible for import.

Bulk import into Spanner

Because database schemas likely differ between MySQL and Spanner, you might need to make some data conversions part of the import process. The easiest way to perform these data conversions and import the data into Spanner is by using Dataflow. Dataflow is the Google Cloud distributed extract, transform, and load (ETL) service. It provides a platform for running data pipelines written using the Apache Beam SDK in order to read and process large amounts of data in parallel over multiple machines.

The Apache Beam SDK requires you to write a simple Java program to set read, transform and write the data. Beam connectors exist for Cloud Storage and Spanner, so the only code that you need to write is the data transform itself.

For an example of a simple pipeline that reads from CSV files and writes to Spanner, see the sample code repository.

If you use parent-child interleaved tables in your Spanner schema, take care in the import process that the parent row is created before the child row. The Spanner import pipeline code handles this by importing all data for root-level tables first, then all the level 1 child tables, then all the level 2 child tables, and so on.

You can use the Spanner import pipeline directly to bulk import your data, but this approach requires your data to exist in Avro files using the correct schema.

Step 5: Maintain consistency between both databases

Many applications have availability requirements that make it impossible to keep the application offline for the time required to export and import your data. Therefore, while you are transferring your data to Spanner, your application continues to modify the existing database. So it is necessary to duplicate updates to the Spanner database while the application is running.

There are various methods of keeping your two databases in sync, including change data capture, and implementing simultaneous updates in the application.

Change data capture

MySQL doesn't have a native change data capture (CDC) utility. However, there are various open source projects that can receive MySQL binlogs and convert them into a CDC stream. For example Maxwell's daemon can provide a CDC stream for your database.

You can write an application that subscribes to this stream and applies the same modifications (after data conversion, of course) to your Spanner database.

Simultaneous updates to both databases from the application

An alternative method is to modify your application to perform writes to both databases. One database (initially MySQL) would be considered the source of truth, and after each database write, the entire row is read, converted and written to the Spanner database. In this way, the application constantly overwrites the Spanner rows with the latest data.

When you're confident that all your data has been transferred correctly, you can switch the source of truth to the Spanner database. This mechanism provides a rollback path if issues are found when switching to Spanner.

Verify data consistency

As data streams into your Spanner database, you can periodically run a comparison between your Spanner data and your MySQL data to make sure that the data is consistent. You can validate consistency by querying both data sources and comparing the results.

You can use Dataflow to perform a detailed comparison over large data sets by using the Join transform. This transform takes 2 keyed data sets, and matches the values by key. The matched values can then be compared for equality. You can regularly run this verification until the level of consistency matches your business requirements.

Step 6: Switch to Spanner as your application's source of truth

When you are confident in the data migration, you can switch your application to using Spanner as the source of truth. If you continue writing back changes to the MySQL database, this keeps the MySQL database up to date, giving a rollback path should issues arise.

Finally, you can disable and remove the MySQL database update code and shut down the now-obsolete MySQL database.

Export and import Spanner databases

You can optionally export your tables from Spanner to a Cloud Storage bucket using a Dataflow template to perform the export. The resulting folder contains a set of Avro files and JSON manifest files containing your exported tables. These files can serve various purposes, including:

  • Backing up your database for data-retention policy compliance or disaster recovery.
  • Importing the Avro file into other Google Cloud offerings such as BigQuery.

For more information on the export and import process, see Exporting databases and Importing databases.

What's next