Cloud Spanner myths busted


Intro to Cloud Spanner

Cloud Spanner is an enterprise-grade, globally distributed, externally consistent database that offers unlimited scalability and industry-leading 99.999% availability. It requires no maintenance windows and offers a familiar PostgreSQL interface. It combines the benefits of relational databases with the unmatched scalability and availability of non-relational databases. 

As organizations modernize and simplify their tech stack, Spanner provides a unique opportunity to transform the way they think about and use databases as part of building new applications and customer experiences.

But choosing a database for your workload can be challenging; there are so many options in the market and each one has a different onboarding and operating experience. At Google Cloud we know it’s hard to navigate this choice and are here to help you. In this blog post, I want to bust the seven most common misconceptions that I regularly hear about Spanner so that you can confidently make your decision.


Myth #1 Only use Spanner if you have a massive workload

The truth is that Spanner powers Google’s most popular, globally available products, like YouTube, Drive, and Gmail, and has enabled many large-scale transformations, including those of Uber, Niantic, and ShareChat. It is also true that Spanner processes more than 1 billion queries per second at peak.

At the same time, many customers also use Spanner for their smaller workloads (both in terms of transactions per second and storage size) for availability and scalability reasons. For example, Google Password Manager has small workloads that run on Spanner. These customers cannot tolerate downtime, require high availability to power their applications and seek scale insurance for future growth scenarios.

Limitless scalability with the highest availability is critical in many industry verticals such as gaming and retail, especially when a newly launched game goes viral and becomes an overnight success, or when a retailer has to handle a sudden surge in traffic during a Black Friday/Cyber Monday sale.

Regardless of workload size, every customer on the journey to the cloud wants the benefits of scalability and availability while reducing the operational burden and the costs associated with patching, upgrades and other maintenance.


Myth #2 Spanner is too expensive

The truth is that when looking at the cost of a database, it is better to consider the Total Cost of Ownership (TCO) and the value it offers rather than the raw list price. Spanner delivers significant value for the price, including critical things like availability, price-performance, and reduced operational costs:

  • Availability: Spanner provides high availability and reliability by synchronously replicating data. When it comes to disaster recovery, Spanner offers zero RPO and zero RTO for zonal failures in the case of regional instances, and for regional failures in the case of multi-region instances. Less downtime, more revenue!

  • Price-performance: Spanner offers one of the industry’s leading price-performance ratios, which makes it a great choice if you are running a demanding, performance-sensitive application. Great customer experiences require consistent, optimal latencies!

  • Reduced operational cost: With Spanner, customers enjoy zero downtime upgrades and schema changes, and no maintenance windows. Sharding is automatically handled so the challenges associated with scaling up traditional databases don't exist. Spend more time innovating, and less time administering!

  • Security & compliance: By default, Spanner encrypts data in transit via its client libraries and data at rest using Google-managed encryption keys. CMEK support lets you take complete control of the encryption keys. Spanner also supports VPC Service Controls and holds the compliance certifications and approvals necessary for workloads requiring ISO 27001, 27017, 27018, PCI DSS, SOC 1/2/3, HIPAA, and FedRAMP.

  • Committed use discounts: Spanner offers committed use discounts (CUDs) that provide deeply discounted prices in exchange for your commitment to continuously use Spanner compute capacity for a year or longer. You can reduce your costs by purchasing either a one-year CUD for a 20% discount or a three-year CUD for a 40% discount. Spanner CUDs provide full flexibility in how discounts are applied: a single commitment covers all projects, regions, and multi-regions associated with a billing account.
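As a back-of-the-envelope illustration of the CUD math (the hourly rate below is a hypothetical placeholder, not a real Spanner price):

```python
# Hypothetical on-demand rate for illustration only; check the Spanner
# pricing page for real numbers.
on_demand_hourly = 1.00  # $ per node-hour (placeholder)
hours_per_month = 730    # average hours in a month

def monthly_cost(nodes, discount=0.0):
    """Monthly compute cost after applying a CUD discount."""
    return nodes * on_demand_hourly * hours_per_month * (1 - discount)

list_price = monthly_cost(3)                 # no commitment
one_year   = monthly_cost(3, discount=0.20)  # one-year CUD: 20% off
three_year = monthly_cost(3, discount=0.40)  # three-year CUD: 40% off
```

The same commitment applies across every project, region, and multi-region under the billing account, so the discount tracks your aggregate usage rather than a single instance.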

With Spanner, you have peace of mind knowing that your data’s security, availability and reliability won’t be compromised.

And best of all, with the introduction of Granular Instance Sizing, you can now get started for as little as $65/month and unlock the tremendous value Spanner offers.

Pro tip: Use the autoscaler to right-size your Spanner instances, and take advantage of time to live (TTL) to reduce the amount of data stored.
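TTL is configured as a row deletion policy in the schema; a sketch (the table and column names here are made up for illustration):

```sql
-- Delete rows from the hypothetical Events table 30 days after creation.
ALTER TABLE Events
  ADD ROW DELETION POLICY (OLDER_THAN(CreatedAt, INTERVAL 30 DAY));
```

Spanner then deletes expired rows in the background, without a batch job on your side.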


Myth #3 You have to make a trade-off between scale, consistency, and latency

The truth is that, depending on the use case and instance configuration, you can use Spanner such that you don’t have to pick among consistency, latency, and scale.

To provide strong data consistency, Spanner uses a synchronous, Paxos-based replication scheme in which replicas acknowledge every write request. A write is committed when a majority of the replicas (e.g., 2 out of 3), called a quorum, agree to commit the write. In regional instances, the replicas are within one region, so writes are faster than in multi-region instances, where the replicas are distributed across multiple regions and forming a quorum on writes can result in slightly higher latency. Nevertheless, Spanner multi-region configurations are carefully designed geographically so that replicas can communicate quickly and write latencies remain acceptably low.
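The quorum arithmetic is simple to sketch (a conceptual illustration, not Spanner code):

```python
def majority(replicas: int) -> int:
    """Smallest number of votes that forms a majority quorum."""
    return replicas // 2 + 1

# A regional instance has 3 voting replicas, so a write commits once 2 agree.
# Typical multi-region configurations have 5 voting replicas, so 3 must agree.
regional_quorum = majority(3)
multi_region_quorum = majority(5)
```

Because only a majority (not every replica) must acknowledge a write, a single slow or unavailable zone does not block commits.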

A read can be served strong (the default) or stale. A strong read is a read at the current timestamp and is guaranteed to see all data committed up until the start of the read; in some cases, this means the serving replica has to contact the leader to confirm it has the latest data. In a multi-region instance where a strong read is served from a non-leader replica, read latency can therefore be slightly higher than when it is served from the leader region. A stale read is executed at a timestamp in the past and can be served at very low latency by the closest replica that is caught up to that timestamp. If your application is latency-sensitive, stale reads may be a good option, and we recommend a staleness of 15 seconds.
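To make the strong-versus-stale distinction concrete, here is a toy multi-version store in plain Python (a conceptual model of versioned reads only, not the Spanner client API):

```python
import bisect

class VersionedStore:
    """Toy MVCC store: each key maps to a sorted list of (timestamp, value)."""

    def __init__(self):
        self._versions = {}  # key -> list of (ts, value), appended in ts order

    def write(self, key, value, ts):
        self._versions.setdefault(key, []).append((ts, value))

    def read(self, key, ts):
        """Return the newest value committed at or before `ts`, else None."""
        versions = self._versions.get(key, [])
        # Find the rightmost version whose timestamp is <= ts.
        i = bisect.bisect_right(versions, (ts, chr(0x10FFFF)))
        return versions[i - 1][1] if i > 0 else None

store = VersionedStore()
store.write("user:1", "alice", ts=100)
store.write("user:1", "alicia", ts=200)

strong = store.read("user:1", ts=200)  # read at "now": sees the latest value
stale = store.read("user:1", ts=150)   # read in the past: sees the older value
```

Because a stale read targets a fixed past timestamp, any replica that is caught up to that timestamp can serve it without coordinating with the leader, which is exactly why it is cheaper.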


Myth #4 Spanner does not have a familiar interface

The truth is that Spanner offers the flexibility to interact with the database via a SQL dialect based on the ANSI SQL:2011 standard, as well as via REST and gRPC APIs, which are optimized for performance and ease of use. In addition, we recently introduced a PostgreSQL interface for Spanner that leverages the ubiquity of PostgreSQL to meet development teams where they are, with an interface they already know. The PostgreSQL interface provides a rich subset of the open-source PostgreSQL SQL dialect, including common query syntax, functions, and operators. It supports a core collection of open-source PostgreSQL data types, DDL syntax, and information schema views. You get PostgreSQL familiarity and relational semantics at Spanner scale.
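For instance, a PostgreSQL-dialect Spanner database accepts familiar PostgreSQL DDL and queries (the table and columns here are illustrative):

```sql
-- Standard PostgreSQL syntax and types, running on Spanner.
CREATE TABLE orders (
  order_id   bigint NOT NULL,
  customer   varchar(100),
  total      numeric,
  created_at timestamptz,
  PRIMARY KEY (order_id)
);

SELECT customer, count(*) AS order_count
FROM orders
GROUP BY customer;
```

Existing PostgreSQL tooling and muscle memory carry over, while Spanner handles replication and sharding underneath.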

Learn more about our PostgreSQL interface here.


Myth #5 The only way to get observability data is via the Spanner Console

The truth is that Spanner client libraries support OpenCensus tracing and metrics, which give insight into client internals and aid in debugging production issues. For instance, client-side traces and metrics include session- and transaction-related information.

Spanner also supports the OpenTelemetry receiver, which provides an easy way for you to process and visualize metrics from Cloud Spanner system tables and export them to the application performance monitoring (APM) tool of your choice. This could be an open-source combination of a time-series database like Prometheus coupled with a Grafana dashboard, or a commercial offering like Splunk, Datadog, Dynatrace, New Relic, or AppDynamics. We’ve also published reference Grafana dashboards so that you can debug the most common user journeys, such as “Why is my tail latency high?” or “Why do I see a CPU spike when my workload did not change?”. Here is a sample Docker service showing how the Cloud Spanner receiver can work with the Prometheus exporter and Grafana dashboards.

We are continuing to embrace open standards and to integrate with our partner ecosystem. We also continue to evolve the observability experience in the Google Cloud console so that our customers get the best experience wherever they are.


Myth #6 Spanner is only for global workloads requiring copies in multiple regions 

The truth is that, while Spanner offers a range of multi-region instance configurations, it also offers a regional configuration in every Google Cloud region. A regional instance is replicated across three zones within the region, while a multi-region instance maintains at least five replicas across multiple regions. A regional configuration offers 99.99% (four nines) availability and protection against zonal failures.

Typically, multi-region instance configurations are indicated if your application serves workloads in multiple geographic locations or your business needs 99.999% availability and protection against regional failures. Learn more here.


Myth #7 Spanner schema changes require expensive locks

The truth is that Spanner never takes table-level locks. Spanner uses a multi-version concurrency control (MVCC) architecture to manage concurrent versions of schema and data, allowing ad-hoc, online schema changes that require no downtime, additional tools, migration pipelines, or complex rollback/backup plans. When issuing a schema update, you can continue reading from and writing to the database without interruption while Spanner backfills the update, whether your table has 10 rows or 10 billion.
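For example, adding a column is a single online DDL statement; reads and writes continue while it applies (the table and column names are illustrative):

```sql
-- Applied online: no table-level locks, no downtime, no migration pipeline.
ALTER TABLE Users ADD COLUMN Email STRING(320);
```

The statement returns while Spanner carries out any backfill in the background, regardless of table size.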

The same mechanism powers point-in-time recovery (PITR) and snapshot queries: using stale reads, you can restore both the schema and the state of the data at a given timestamp, up to a maximum of seven days in the past.

Now that we’ve learned the truth about Cloud Spanner, I invite you to get started: visit our website.