How FFN accelerated their migration to a fully managed database in the cloud
Principal Engineer, Freedom Financial Network
Editor’s note: We’re hearing today from Freedom Financial Network, provider of technology-based solutions that help consumers establish financial security and overcome debt. To speed their migration to Google Cloud SQL, they turned to our Database Migration Service.
Freedom Financial Network’s products and services have helped hundreds of thousands of consumers reduce and consolidate their debt. During a period of significant growth, we realized that to keep expanding our suite of consumer products, we needed to transition from a monolithic to a microservices architecture on Google Cloud. As part of that transition, Google’s Database Migration Service (DMS) accelerated our move to fully managed Cloud SQL, with each migration taking hours, not days. We had initially planned for two to three hours of downtime, but DMS let us migrate a total of 1 TB of data with no more than ten minutes of downtime per application we moved.
Starting from an on-premises infrastructure
Before moving to Google’s data cloud, our system was hosted on Rackspace, with an architecture divided by three business units. Each unit had one highly available MySQL cluster with about 600 GB of disk space, backed by a SAN whose 1.8 TB of shared disk space was split evenly across the three clusters. Each of the three clusters consisted of two servers running MySQL, though only one was active at a time. They were configured for auto-failover, so if the active server failed, the cluster would switch to the standby. The intention behind dividing by business unit was that, if a unit needed a database, they’d set it up in their own cluster. So each unit was essentially a monolith, containing three to four systems.
These were mostly InnoDB databases on MySQL, varying in size and in the applications they served, including many internal systems that support our call center agents. The largest of these systems used about 500 GB in a single schema. We also had some supplementary systems and public-facing websites.
We’ve had those clusters since 2013, and the clusters for all three business units have always been managed by a small team of two of us. As noted, each business unit was essentially a monolith, even though each was technically split into three to four services. Part of our drive to transition to a microservices architecture was to help us manage communications between our various business units.
With Rackspace, even something small like changing the size of a disk could take a while. We needed to submit a support ticket for them to allocate and configure the disk. It was a manual intervention, after all, and it would take two to three weeks to complete.
Considering a cloud migration
When it came to determining whether to migrate to the cloud, and which provider and tools to use, our clusters were a big early consideration. They were vastly over-provisioned, with far more resources than we needed, because resizing them was as slow as described above.
With Cloud SQL, we really liked the ability to split the clusters and size them appropriately. We’d also have more flexibility, since we wanted to migrate to a higher MySQL version. Because of the structure of the clusters, upgrading in place would have required significant time and effort. We would have had to update one cluster at a time, with each cluster probably affecting 60-70 percent of our engineering teams. Just trying to coordinate that would have been a nightmare. So Cloud SQL gave us the ability to do it more slowly, team by team or app by app, once we moved them to different database instances.
Choosing Database Migration Service
At first, we didn’t consider an auto-migrating solution because there wasn’t a lot of complexity in the migration, especially for our small databases. We’d just have to bring the application down, export the database, import it to Cloud SQL, and bring it back up. That worked for most of our applications.
But toward the end of the process, two applications remained. The previous year, we’d tried the migration process with one of them, and it had stalled at the database, because we couldn’t make the dump and load process quick enough. It would have required 12–15 hours of downtime, and that just wasn’t an option, as the application was the backbone of our business. That much downtime would have made a tangible impact on our business, not just financially. We needed a new solution.
Through conversations with our product team and our Google contact, we learned about Google Cloud’s Database Migration Service, which was in private preview at the time. It provides a serverless migration experience from MySQL to Cloud SQL for MySQL with continuous data replication, at no additional cost. Before DMS, we’d been looking at other options, including offline migration and external master (the pre-DMS solution on Google Cloud), none of which would have worked for us.
Testing, then migrating
Our first step with DMS was a test run with staging databases, just to validate that it would work. Once we corrected some problems with our usage and got it all configured correctly, it worked exactly as it was supposed to. Then we started the real migrations. The team behind that one backbone application relaunched their migration, while in the background I set up the replicas of their production instance so that they could manage staging.
In Rackspace, our applications were running on virtual machines (VMs) and connecting to our MySQL instances. Part of the move to Google Cloud was also to migrate applications to containers on Google Kubernetes Engine (GKE).
We performed three migrations with DMS, one of which involved four applications. Each time, the application teams deployed the applications to the future production environment on GKE, but they were not marked as live. We would test the applications against a Cloud SQL instance holding staging data brought in with DMS, to confirm that they were running correctly. Then we would initiate the migration with DMS. Once the environments were in sync and we had scheduled the cutover date, we would bring down the applications on Rackspace and update the DNS records to point to Google Cloud.
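The cutover sequence above can be sketched roughly as follows. This is an illustration under stated assumptions rather than actual migration tooling: the `ReplicaStatus` type, the `ready_for_cutover` and `cutover_plan` helpers, the 5-second lag threshold, and the explicit "promote" step are all hypothetical and do not come from DMS or from our runbooks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaStatus:
    """Hypothetical snapshot of a Cloud SQL instance fed by a migration job."""
    initial_load_done: bool   # the dump-and-load phase has finished
    lag_seconds: float        # how far the destination trails the source


def ready_for_cutover(status: ReplicaStatus, max_lag_seconds: float = 5.0) -> bool:
    """Allow cutover only once the full load is done and the lag is small."""
    return status.initial_load_done and status.lag_seconds <= max_lag_seconds


def cutover_plan(status: ReplicaStatus) -> List[str]:
    """Return the ordered cutover steps, or a single wait step if not in sync."""
    if not ready_for_cutover(status):
        return ["wait for replication to catch up"]
    return [
        "bring down the applications on the old hosts",  # source writes stop here
        "confirm the destination has applied all changes",
        "promote the Cloud SQL instance",                # assumed step, see lead-in
        "update DNS records to point to the new environment",
    ]


if __name__ == "__main__":
    for step in cutover_plan(ReplicaStatus(initial_load_done=True, lag_seconds=0.4)):
        print(step)
```

The point of the gate is that application downtime starts only at the first step of the plan, after the bulk of the data has already been copied and kept in sync in the background.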
With all three clusters, we migrated five logical databases of varying sizes, between 240–500 GB of data, for a total of around 1 TB of data.
Downtimes of minutes, not hours
Migrations were much faster with DMS than we expected. From the time that we told DMS to dump and load to the Cloud SQL instance, to completion, they were all done and fully synchronized within 12–13 hours. We’d kick one off in the afternoon, and by the time we got back the next morning, it was done. We’d actually been setting aside a few days for this task, so this was a great improvement.
Initially, when planning the migration, we figured that a downtime of 2–3 hours might have been workable: not ideal, but workable. But once we were up to speed with DMS, the actual downtime for each application from the database side was a maximum of ten minutes. This was a great improvement for every team in our organization.
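To put those figures side by side, here is a small worked calculation. It uses only numbers quoted in this post (12–15 hours for the earlier dump-and-load attempt, 2–3 hours planned, ten minutes at most with DMS); the `downtime_reduction` helper is a hypothetical name introduced for the example.

```python
def downtime_reduction(before_hours: float, after_minutes: float) -> float:
    """Return how many times shorter the new downtime is than the old one."""
    return (before_hours * 60) / after_minutes


if __name__ == "__main__":
    scenarios = [
        ("dump and load, low estimate", 12),
        ("dump and load, high estimate", 15),
        ("planned downtime, low estimate", 2),
        ("planned downtime, high estimate", 3),
    ]
    for label, hours in scenarios:
        # Ten minutes is the stated maximum, so each ratio is a lower bound.
        print(f"{label}: at least {downtime_reduction(hours, 10):.0f}x shorter")
```

Because ten minutes was the maximum per application, these ratios understate the improvement for any application that cut over faster.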
DMS had step-by-step instructions that helped us perform the migrations successfully, without any loss of data. With DMS and Google Cloud’s help, we transformed our monolithic architecture into a microservices architecture, deployed on GKE and using either the Cloud SQL Proxy in a sidecar container pattern or the Go proxy library to connect to Cloud SQL.
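With the sidecar pattern, the proxy runs in the same pod as the application and listens locally, so the application connects to localhost instead of to the database’s own address. A minimal sketch of what that means for application configuration follows. The function name and the environment variable names (`DB_USER`, `DB_PASSWORD`, `DB_NAME`, `DB_PORT`) are hypothetical, and the sketch assumes the proxy is listening on TCP port 3306 unless told otherwise.

```python
from typing import Dict, Mapping, Union


def proxy_connection_settings(env: Mapping[str, str]) -> Dict[str, Union[str, int]]:
    """Build MySQL connection settings for an app that sits next to a proxy sidecar.

    The sidecar shares the pod's network namespace, so the host is always
    127.0.0.1. Only the credentials, database name, and (optionally) the
    port come from the environment. All variable names are illustrative.
    """
    return {
        "host": "127.0.0.1",
        "port": int(env.get("DB_PORT", "3306")),
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        "database": env["DB_NAME"],
    }


if __name__ == "__main__":
    settings = proxy_connection_settings(
        {"DB_USER": "app", "DB_PASSWORD": "example-only", "DB_NAME": "orders"}
    )
    print(settings["host"], settings["port"])
```

The resulting dictionary can be passed to whichever MySQL client library the application already uses; the application itself needs no knowledge of the Cloud SQL instance’s address.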
We now have a more secure, versatile, and powerful data cloud infrastructure, and because it’s fully managed by Google Cloud, our developers have more time to focus on expanding our customer products.