Tips for migrating from MySQL to Cloud Spanner
Stefan Serban
Software Engineering Manager
Since its initial release in 1995, MySQL has not only been the de-facto database for many data storage needs, but has gathered much attention over the years as one of the most well-recognized databases around when it comes to Relational Database Management Systems and transactional data processing.
Any application in retail, e-commerce, or banking has likely had its fair share of business depending on a relational database for its transactional needs. Many of these applications are built on MySQL due to its flexibility, open source nature, and strong community support.
Whether in the context of a migration from a relational database (like MySQL or PostgreSQL) , migration from a NoSQL database (like Cassandra), or a green grass workload, many companies turn to Cloud Spanner seeking a high availability SLA (99.99% for regional instances and 99.999% for multi region instances), unlimited scale, and low operational overhead - no patching required, no maintenance or other planned downtimes, just to name a few benefits.
The tooling and open source ecosystem around Spanner has evolved and grown ever since the service was introduced. HarbourBridge is a part of this ecosystem, and it’s meant to help customers port their existing MySQL or PostgreSQL schema to a Cloud Spanner schema.
Application Migration Tips
Despite helpful tools like HarbourBridge, database migrations are never trivial. Here are a few things to pay attention to when migrating from MySQL to Spanner, and how to update your application logic to address them.
Note - The following snippets use the PHP client for Cloud Spanner. A couple of the snippets reference a partial Magento port that our friends over at Searce have been working on. Once you understand the operations, you should be able to implement the same in any of the other languages that Cloud Spanner supports.
Cloud spanner enforces strict data types
In a MySQL query, the value of an attribute can be referenced as either a string or an integer. Example:
select * from catalog_eav_attribute where attribute_id = 46;
select * from catalog_eav_attribute where attribute_id = "46";
Both are valid and equivalent.
In Cloud Spanner, the query will return an error if you try to reference an integer type by using a string representation.
Here is an example of a working query from Cloud Spanner:
select * from catalog_eav_attribute where attribute_id = 46; -- this will work
Select * from catalog_eav_attribute where attribute_id = “46” -- will fail, since we are supplying a string with “46”
You may use a function like the following to assist you with such transformations.
Using sequences in primary keys is not a Spanner best practice
Cloud Spanner does not implement a sequence generator, and it is not a Spanner best practice to use sequential IDs because doing so can cause hotspotting.
An alternative mechanism of generating a unique primary key is to use a UUID or any other similar mechanisms that result in non-sequential values. For more information, please refer to this article.
In order to convert all existing primary keys to a UUID pattern, you can change the schema using the snippet here
You may use this code snippet to modify the application code to generate the UUID for the auto increment. There are other ways to do this as well, such as using PHP built-in uniqid function.
Implicit casting and the need for explicit casting of field data types
Both MySQL and Cloud Spanner follow the SQL standard, hence much of the query syntax is the same. One notable difference is that the Cloud Spanner field types are implicitly cast to an appropriate data type out of the ones mentioned here. When implicit casting of the field type fails, Spanner returns a read error, so it would be safer to perform the casting to the appropriate type when issuing the SELECT statement.
Modify application code to cast to respective data type before execution of query.
Modify the application code to cast the column to its respective type
Using interleaved tables to improve read performance
Cloud Spanner's table interleaving is a great choice for many parent-child relationships where the child table's primary key includes the parent table's primary key columns. Interleaving ensures that child rows are collocated with their parent rows, which can significantly improve query performance.
Refer to the statements here for a few samples of creating interleaved tables. To learn more about table interleaving, visit the documentation.
Conclusion
Database migrations are complicated. Hopefully, using HarbourBridge and the list of tips in this article can make that task easier. For additional tips, please take a look at this migration guide and read more about HarbourBridge.