New Cassandra to Spanner adapter simplifies Yahoo's migration journey

November 18, 2024
Nitin Sagar

Sr. Product Manager, Google Cloud

Eike Falkenberg

Engineering Manager, Google

Cassandra, a key-value NoSQL database, is prized for its speed and scalability, and is used broadly for applications that require rapid data retrieval and storage, such as caching, session management, and real-time analytics. Its simple key-value pair structure helps ensure high performance and easy management, especially for large datasets.

But this simplicity also leads to limitations like poor support for complex queries, potential data redundancy, and difficulty in modeling intricate relationships. Spanner, Google Cloud’s always-on, globally consistent, and virtually unlimited-scale database, combines the scalability and availability of NoSQL with the strong consistency and relational model of traditional databases, positioning it for traditional Cassandra workloads. And today, it’s easier than ever to switch from Cassandra to Spanner, with the introduction of the Cassandra to Spanner Proxy Adapter, an open-source tool for plug-and-play migrations of Cassandra workloads to Spanner, without any changes to the application logic.

Spanner for NoSQL workloads

Spanner provides strong consistency, high availability, virtually unlimited scalability, and a familiar relational data model with support for SQL and ACID transactions for data integrity. As a fully managed service, it helps simplify operations, allowing teams to focus on application development rather than database administration. Furthermore, Spanner's high availability, even at a massive global scale, supports business continuity by minimizing database downtime.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_G0mCmB0.max-1600x1600.png

We’re constantly evolving Spanner to meet the needs of modern businesses. Some of the latest additions include multi-model capabilities such as graph, full-text search, and vector search; improved performance for analytical queries with Spanner Data Boost; and unique enterprise features such as geo-partitioning and dual-region configurations. For Cassandra users, these powerful features, along with Spanner’s compelling price-performance, unlock a world of new, exciting possibilities.

The Cassandra to Spanner adapter — battle-tested by Yahoo!

If you’re wondering, “Spanner sounds like a leap forward from Cassandra. How do I get started?” the proxy adapter provides a plug-and-play way to forward your client applications' Cassandra Query Language (CQL) traffic to Spanner. Under the hood, the adapter presents itself to the application as a Cassandra endpoint, but internally it interacts with Spanner for all data manipulation tasks. With the Cassandra to Spanner proxy adapter, you don’t need to migrate your application code: it just works!

Yahoo successfully migrated from Cassandra to Spanner, reaping the benefits of improved performance, scalability, consistency, and operational efficiency. And the proxy adapter made it easy to migrate. 

“The Cassandra Adapter has provided a foundation for migrating the Yahoo Contacts workload from Cassandra to Spanner without changing any of our CQL queries. Our migration strategy has more flexibility, and we can focus on other engineering activities while utilizing the scale, redundancy, and support of Spanner without updating the codebase. Spanner is cost-effective for our specific needs, delivering the performance required for a business of our scale. This transition enables us to maintain operational continuity while optimizing cost and performance.” - Patrick JD Newnan, Principal Product Manager, Core Mail and Analytics, Yahoo 

Another Google Cloud customer that successfully migrated from Cassandra to Spanner recently is Reltio. Reltio benefited from an effortless migration process to minimize downtime and disruption to their services while reaping the benefits of a fully managed, globally distributed, and strongly consistent database.

These success stories demonstrate that migrating from Cassandra to Spanner can be a transformative step for businesses seeking to modernize their data infrastructure, unlock new capabilities, and accelerate innovation.

How does the new proxy adapter simplify your migration?

A typical database migration involves the following steps:

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_t87AI4p.max-1200x1200.png

Some of these steps — migrate your application (step 4) and migrate the data (step 6) — are more complex than others. The proxy adapter vastly simplifies migrating a Cassandra-backed application to point to Spanner. Here's a high-level overview of the steps involved when using the new proxy adapter:

1. Assessment: Evaluate your Cassandra schema, data model, and query patterns, and identify which ones you can simplify after moving to Spanner.

2. Schema design: Spanner’s table declaration syntax and data types are similar to Cassandra’s; the documentation covers these similarities and differences in depth. With Spanner, you can also take advantage of relational capabilities and features like interleaved tables for optimal performance.

3. Data migration: There are several steps to migrate your data:

  • Bulk load: Export data from Cassandra and import it into Spanner using tools like the Spanner Dataflow connector or BigQuery reverse ETL.
  • Replicate incoming data: Replicate incoming updates to your Cassandra cluster to Spanner in real-time using Cassandra’s Change Data Capture (CDC).

    Another possibility is to update your application logic to perform dual-writes to Cassandra and Spanner. We don’t recommend this approach if you’re trying to minimize changes to your application code.

4. Set up the proxy adapter and update your Cassandra configuration: Download and run the Cassandra to Spanner Proxy Adapter, which runs as a sidecar next to your application. By default, the proxy adapter runs on port 9042. If you choose a different port, update your application’s connection configuration to point to the proxy adapter.

5. Testing: Thoroughly test your migrated application and data in a non-production environment to ensure everything works as expected.

6. Cutover: Once you're confident in the migration, switch your application traffic to Spanner. Monitor closely for any issues and fine-tune performance as needed.
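As an illustration of step 4, the sketch below shows what pointing an existing client at the proxy adapter might look like in Python. It assumes the DataStax `cassandra-driver` package and a proxy listening on the default port 9042. The environment variable names `CASSANDRA_HOST` and `CASSANDRA_PORT` are hypothetical and exist only for this example; your application may store its endpoint settings differently.

```python
import os

DEFAULT_PROXY_PORT = 9042  # the adapter's default port, as noted in step 4


def proxy_endpoint(env=None):
    """Return the (host, port) the Cassandra driver should connect to.

    CASSANDRA_HOST and CASSANDRA_PORT are hypothetical variable names
    used only in this sketch.
    """
    env = os.environ if env is None else env
    host = env.get("CASSANDRA_HOST", "127.0.0.1")
    port = int(env.get("CASSANDRA_PORT", DEFAULT_PROXY_PORT))
    return host, port


def connect(env=None):
    """Open a driver session against the proxy instead of a Cassandra node."""
    # Imported lazily so this module loads even without the driver installed.
    from cassandra.cluster import Cluster

    host, port = proxy_endpoint(env)
    cluster = Cluster(contact_points=[host], port=port)
    return cluster.connect()
```

Only the endpoint changes here; the CQL statements the application sends through the session stay as they are.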

What’s under the hood of the new proxy adapter?

The new proxy adapter presents itself to the application as a Cassandra endpoint. From the application's perspective, the only noticeable change is the IP address or hostname of the Cassandra endpoint, which now points to the proxy adapter. This streamlines the Spanner migration without requiring extensive modifications to application code.

https://storage.googleapis.com/gweb-cloudblog-publish/images/3a.max-1600x1600.png

We designed the proxy adapter to establish a one-to-one mapping between each Cassandra cluster and a corresponding Spanner database. The proxy instance employs a multi-listener architecture, with each listener bound to a distinct port. This facilitates concurrent handling of multiple client connections, where each listener manages a distinct connection with the specified Spanner database. 
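To make the mapping concrete, here is a minimal sketch of how a set of listeners could be modeled and checked. It is not the adapter's actual configuration format: the `Listener` class, its field names, and `validate_listeners` are hypothetical, and the sketch assumes that the one-to-one mapping means no two listeners share a port or a Spanner database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listener:
    """One proxy listener: a local port paired with one Spanner database."""
    name: str              # hypothetical label for the source Cassandra cluster
    port: int              # port the application connects to
    spanner_database: str  # projects/<p>/instances/<i>/databases/<d>


def validate_listeners(listeners):
    """Check the one-to-one mapping and return a port -> database table."""
    ports = [item.port for item in listeners]
    databases = [item.spanner_database for item in listeners]
    if len(set(ports)) != len(ports):
        raise ValueError("each listener must bind a distinct port")
    if len(set(databases)) != len(databases):
        raise ValueError("each cluster must map to its own Spanner database")
    return {item.port: item.spanner_database for item in listeners}


# Example with placeholder project, instance, and database names.
routes = validate_listeners([
    Listener("contacts", 9042, "projects/my-project/instances/my-instance/databases/contacts"),
    Listener("sessions", 9043, "projects/my-project/instances/my-instance/databases/sessions"),
])
```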

The proxy’s translation layer handles the intricacies of the Cassandra protocol. This layer performs message decoding and encoding, manages buffers and caches, and crucially, parses incoming CQL queries and translates them into Spanner-compatible equivalents.
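The toy function below hints at what such a translation involves. It is an illustration only, not the adapter's implementation: `translate_cql` is a hypothetical helper that handles just two things, rewriting positional `?` bind markers as Spanner-style named parameters and mapping a CQL `INSERT` (which overwrites an existing row with the same primary key) to GoogleSQL's `INSERT OR UPDATE`. A real translator needs a proper CQL parser and must also handle data types, TTLs, collections, and more.

```python
import re


def translate_cql(cql):
    """Translate one simple CQL statement into (sql, parameter_names).

    Toy illustration: a regex cannot tell a bind marker from a '?' inside
    a string literal, so this only suits simple parameterized statements.
    """
    names = []

    def named(_match):
        # Replace the n-th '?' with the named parameter @pn.
        names.append(f"p{len(names) + 1}")
        return "@" + names[-1]

    sql = re.sub(r"\?", named, cql.strip().rstrip(";"))
    # CQL INSERT has upsert semantics, so INSERT OR UPDATE is the closest
    # GoogleSQL statement form.
    sql = re.sub(r"^INSERT\s+INTO\b", "INSERT OR UPDATE INTO", sql,
                 flags=re.IGNORECASE)
    return sql, names


insert_sql, insert_params = translate_cql(
    "INSERT INTO users (id, name) VALUES (?, ?);"
)
select_sql, select_params = translate_cql("SELECT name FROM users WHERE id = ?")
```

Here `insert_sql` becomes `INSERT OR UPDATE INTO users (id, name) VALUES (@p1, @p2)`, and `select_sql` becomes `SELECT name FROM users WHERE id = @p1`.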

The proxy adapter supports OpenTelemetry to collect and export traces to Cloud Trace.

For more details about different ways of setting up the adapter, limitations, mapping of CQL data types to Spanner, and more, refer to the proxy adapter documentation.

Addressing common concerns and challenges

Let's address a few concerns you may have with your migrations:

  • Cost: Accenture’s benchmark results show that Spanner delivers not only consistent latency and throughput but also cost efficiency. Furthermore, Spanner now offers a new tiered pricing model (Spanner editions) that delivers better cost transparency and cost savings opportunities to help you take advantage of all of Spanner’s capabilities.

  • Latency increases: To minimize any increase in query latency, we recommend running the proxy adapter on the same host as the client application (as a sidecar proxy), or on the same Docker network if the proxy adapter runs in a Docker container. We also recommend keeping the CPU utilization of the proxy adapter host under 80%.

  • Schema flexibility: While Cassandra offers schema flexibility, Spanner's stricter relational schema provides advantages in terms of data integrity, query power, and consistency.

  • Learning curve: Spanner’s data types differ in some respects from Cassandra’s. The documentation covers these differences in depth and can ease the transition.
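For the latency concern above, one way to check the impact is to time the same query against Cassandra and against the proxy adapter and compare percentiles. The helper below is a minimal sketch using only the Python standard library; `measure_latencies` and `summarize` are hypothetical names, and `run_query` stands for any zero-argument callable that executes one query through your driver session.

```python
import statistics
import time


def measure_latencies(run_query, iterations=100):
    """Call run_query repeatedly and return each latency in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples):
    """Return the median and 99th percentile of the samples."""
    # quantiles(n=100) yields 99 cut points; index 98 is the 99th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": statistics.median(samples), "p99": cuts[98]}
```

Running `summarize(measure_latencies(...))` once per backend under comparable load gives a like-for-like view of median and tail latency.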

Get started today 

The benefits of strong consistency, simplified operations, enhanced data integrity, and global scalability make Spanner a compelling option for businesses looking to leverage the cloud's full potential for NoSQL workloads. With the new Cassandra to Spanner proxy adapter, we are making it easier to plan and execute on your migration strategy, so you can unlock a new era of data-driven innovation for your organization.

Download the new Cassandra to Spanner proxy adapter, and try it out on a Spanner Free Trial instance at no cost today.