Spanner for Cassandra users

This document compares Apache Cassandra and Spanner concepts and practices. It assumes you're familiar with Cassandra and want to migrate existing applications or design new applications while using Spanner as your database.

Cassandra and Spanner are both large-scale distributed databases built for applications requiring high scalability and low latency. While both databases can support demanding NoSQL workloads, Spanner provides advanced features for data modeling, querying, and transactional operations. For more information about how Spanner meets NoSQL database criteria, see Spanner for non-relational workloads.

Migrate from Cassandra to Spanner

To migrate from Cassandra to Spanner, you can use the Cassandra to Spanner Proxy Adapter. This open source tool lets you migrate workloads from Cassandra or DataStax Enterprise (DSE) to Spanner without any changes to your application logic.

Core concepts

This section compares key Cassandra and Spanner concepts.

Terminology

Cassandra | Spanner
Cluster | Instance

A Cassandra cluster is equivalent to a Spanner instance: a collection of servers and storage resources. Because Spanner is a managed service, you don't have to configure the underlying hardware or software. You only need to specify the number of nodes you want to reserve for your instance, or choose autoscaling to scale the instance automatically. An instance acts as a container for databases, and the data replication topology (regional, dual-region, or multi-region) is chosen at the instance level.
Keyspace | Database

A Cassandra keyspace is equivalent to a Spanner database, which is a collection of tables and other schema elements (for example, indexes and roles). Unlike with a keyspace, you don't need to configure a replication factor. Spanner automatically replicates your data to the regions designated in your instance configuration.
Table | Table

In both Cassandra and Spanner, tables are a collection of rows identified by a primary key specified in the table schema.
Partition | Split

Both Cassandra and Spanner scale by sharding data. In Cassandra, each shard is called a partition; in Spanner, each shard is called a split. Cassandra uses hash partitioning: each row is independently assigned to a storage node based on a hash of the primary key. Spanner is range-sharded: rows that are contiguous in primary key space are contiguous in storage as well (except at split boundaries). Spanner takes care of splitting and merging based on load and storage, and this is transparent to the application. The key implication is that, unlike in Cassandra, range scans over a prefix of the primary key are efficient operations in Spanner (see the example at the end of this section).
Row | Row

In both Cassandra and Spanner, a row is a collection of columns identified uniquely by a primary key. Like Cassandra, Spanner supports composite primary keys. Unlike Cassandra, Spanner doesn't make a distinction between partition key and sort key, because data is range-sharded. One can think of Spanner as only having sort keys, with partitioning managed behind the scenes.
Column | Column

In both Cassandra and Spanner, a column is a set of data values that have the same type, with one value for each row of a table. For more information about how Cassandra column types map to Spanner, see Data types.
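To make the range-scan point in the Partition | Split comparison concrete, here is a minimal sketch in Go. It assumes the user_items table defined later in this document and a client created as shown in Basic usage patterns:

Go

stmt := spanner.Statement{
  SQL: `SELECT user_id, item_id, first_name, last_name
        FROM user_items
        WHERE user_id = @user_id
        ORDER BY item_id`,
  Params: map[string]any{"user_id": 1},
}

// Rows sharing the user_id prefix are contiguous in storage, so this
// query reads one contiguous key range instead of scattering across shards.
err := client.Single().Query(ctx, stmt).Do(func(row *spanner.Row) error {
  var userID, itemID int64
  var firstName, lastName string
  return row.Columns(&userID, &itemID, &firstName, &lastName)
})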

Architecture

A Cassandra cluster consists of a set of servers and storage colocated with those servers. A hash function maps rows from a partition key space to a virtual node (vnode). A set of vnodes is then randomly assigned to each server to serve a portion of the cluster key space. Storage for the vnodes is locally attached to the serving node. Client drivers connect directly to the serving nodes and handle load balancing and query routing.

A Spanner instance consists of a set of servers in a replication topology. Spanner dynamically shards each table into row ranges based on CPU and disk usage. Shards are assigned to compute nodes for serving. Data is physically stored on Colossus, Google's distributed file system, separate from the compute nodes. Client drivers connect to Spanner's frontend servers, which perform request routing and load balancing. To learn more, see the Life of Spanner Reads & Writes whitepaper.

At a high level, both architectures scale as resources are added to the underlying cluster. Spanner's separation of compute and storage allows faster rebalancing of load between compute nodes in response to workload changes. Unlike in Cassandra, shard moves don't involve data moves, because the data stays on Colossus. Moreover, Spanner's range-based partitioning might be more natural for applications that expect data to be sorted by partition key. The flip side of range-based partitioning is that workloads that write to one end of the key space (for example, tables keyed by current timestamp) might face hotspotting without additional schema design considerations. For more information about techniques for overcoming hotspotting, see Schema design best practices.
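One such technique (a sketch, not the only option) is to key rows by a random UUIDv4 instead of the current timestamp, so that writes spread across the key space. The events table and its columns here are hypothetical:

Go

import "github.com/google/uuid"

...

// Sketch: a random UUIDv4 key spreads sequential inserts across splits
// instead of concentrating them at one end of the key range. The event
// time is kept as an ordinary column instead of the leading key part.
_, err := client.Apply(ctx, []*spanner.Mutation{
  spanner.InsertMap("events", map[string]any{
    "event_id": uuid.NewString(),
    "event_ts": spanner.CommitTimestamp,
    "payload":  "example-payload",
  }),
})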

Consistency

With Cassandra, you must specify a consistency level for each operation. If you use the quorum consistency level, a replica node majority must respond to the coordinator node for the operation to be considered successful. If you use a consistency level of one, Cassandra needs a single replica node to respond for the operation to be considered successful.

Spanner provides strong consistency. The Spanner API doesn't expose replicas to the client; clients interact with Spanner as if it were a single-machine database. A write is always applied to a majority of replicas before being acknowledged to the user, and any subsequent read reflects the newly written data. Applications can also choose to read a snapshot of the database at a time in the past, which might have performance benefits over strong reads. For more information about the consistency properties of Spanner, see the Transactions overview.

Spanner was built to support the consistency and availability needed in large-scale applications. It provides strong consistency at scale and with high performance. For use cases that can tolerate it, Spanner supports snapshot reads that relax freshness requirements, as shown in the following sketch.
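For example, a read-only transaction can be given a staleness bound. A minimal sketch, where the 10-second bound is an arbitrary choice for illustration:

Go

// Sketch: read a snapshot that is at most 10 seconds stale. Stale reads
// can often be served by a nearby replica without a leader round trip.
ro := client.Single().WithTimestampBound(spanner.MaxStaleness(10 * time.Second))
row, err := ro.ReadRow(ctx, "users", spanner.Key{1}, []string{"first_name"})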

Data modeling

This section compares Cassandra and Spanner data models.

Table declaration

Table declaration syntax is fairly similar across Cassandra and Spanner. You specify the table name, column names and types, and the primary key which uniquely identifies a row. The key difference is that Cassandra is hash-partitioned and makes a distinction between partition key and sort key, whereas Spanner is range-partitioned. Spanner can be thought of as only having sort keys, with partitions automatically maintained behind the scenes. Like Cassandra, Spanner supports composite primary keys.

Single primary key part

The difference between Cassandra and Spanner is in the type names and the location of the primary key clause.

Cassandra example

CREATE TABLE users (
  user_id    bigint,
  first_name text,
  last_name  text,
  PRIMARY KEY (user_id)
)

Spanner example

CREATE TABLE users (
  user_id    INT64,
  first_name STRING(MAX),
  last_name  STRING(MAX),
) PRIMARY KEY (user_id)

Multiple primary key parts

In Cassandra, the first primary key part is the "partition key" and the subsequent primary key parts are "sort keys". Spanner has no separate partition key; data is stored sorted by the entire composite primary key.

Cassandra example

CREATE TABLE user_items (
  user_id    bigint,
  item_id    bigint,
  first_name text,
  last_name  text,
  PRIMARY KEY (user_id, item_id)
)

Spanner example

CREATE TABLE user_items (
  user_id    INT64,
  item_id    INT64,
  first_name STRING(MAX),
  last_name  STRING(MAX),
) PRIMARY KEY (user_id, item_id)

Composite partition key

In Cassandra, the partition key can be composite. Spanner has no separate partition key; data is stored sorted by the entire composite primary key.

Cassandra example

CREATE TABLE user_category_items (
  user_id     bigint,
  category_id bigint,
  item_id     bigint,
  first_name  text,
  last_name   text,
  PRIMARY KEY ((user_id, category_id), item_id)
)

Spanner example

CREATE TABLE user_category_items (
  user_id     INT64,
  category_id INT64,
  item_id     INT64,
  first_name  STRING(MAX),
  last_name   STRING(MAX),
) PRIMARY KEY (user_id, category_id, item_id)

Data types

This section compares Cassandra and Spanner data types. For more information about Spanner types, see Data types in GoogleSQL.

Numeric types

Cassandra standard integers:

bigint (64-bit signed integer)
int (32-bit signed integer)
smallint (16-bit signed integer)
tinyint (8-bit signed integer)

Spanner equivalent: int64 (64-bit signed integer). Spanner supports a single 64-bit wide data type for signed integers.

Cassandra standard floating point:

double (64-bit IEEE-754 floating point)
float (32-bit IEEE-754 floating point)

Spanner equivalents: float64 (64-bit IEEE-754 floating point) and float32 (32-bit IEEE-754 floating point).

Cassandra variable precision numbers:

varint (variable precision integer)
decimal (variable precision decimal)

Spanner equivalent: for fixed precision decimal numbers, use numeric (precision 38, scale 9). Otherwise, use string in conjunction with an application-layer variable precision integer library.

String types

Cassandra text and varchar map to Spanner string(max). Both text and varchar store and validate UTF-8 strings. In Spanner, string columns must declare a maximum length (there is no impact on storage; the length is used for validation).

Cassandra blob maps to Spanner bytes(max). To store binary data, use the bytes data type.

Date and time types

Cassandra date maps to Spanner date.

Cassandra duration maps to Spanner int64. Spanner doesn't support a dedicated duration data type. Use int64 to store a duration in nanoseconds.

Cassandra time maps to Spanner int64. Spanner doesn't support a dedicated time-within-day data type. Use int64 to store a nanosecond offset within a day.

Cassandra timestamp maps to Spanner timestamp.

Container types

Cassandra user-defined types map to Spanner json or proto.

Cassandra list maps to Spanner array. Use array to store a list of typed objects.

Cassandra map maps to Spanner json or proto. Spanner doesn't support a dedicated map type. Use json or proto columns to represent maps (see the sketch after this section). For more information, see Store large maps as interleaved tables.

Cassandra set maps to Spanner array. Spanner doesn't support a dedicated set type. Use array columns to represent a set, with the application managing set uniqueness. For more information, see Store large maps as interleaved tables, which can also be used to store large sets.
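To illustrate the map-to-JSON mapping above, here is a minimal sketch in Go that stores a small string map in a JSON column. The user_prefs column is hypothetical and not part of the schemas shown elsewhere in this document:

Go

// Sketch: store a small map in a hypothetical JSON column named user_prefs.
// spanner.NullJSON wraps any value that can be marshaled to JSON.
_, err := client.Apply(ctx, []*spanner.Mutation{
  spanner.InsertOrUpdateMap("users", map[string]any{
    "user_id":    1,
    "user_prefs": spanner.NullJSON{Value: map[string]string{"theme": "dark"}, Valid: true},
  }),
})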

Basic usage patterns

The following code examples show the difference between Cassandra and Spanner client code in Go. For more information, see Spanner client libraries.

Client initialization

In Cassandra clients, you create a cluster object representing the underlying Cassandra cluster, instantiate a session object which abstracts a connection to the cluster, and issue queries on the session. In Spanner, you create a client object bound to a specific database, and issue database requests on the client object.

Cassandra example

Go

import "github.com/gocql/gocql"

...

cluster := gocql.NewCluster("<address>")
cluster.Keyspace = "<keyspace>"
session, err := cluster.CreateSession()
if err != nil {
  return err
}
defer session.Close()

// session.Query(...)

Spanner example

Go

import "cloud.google.com/go/spanner"

...

client, err := spanner.NewClient(ctx,
    fmt.Sprintf("projects/%s/instances/%s/databases/%s", project, instance, database))
if err != nil {
  return err
}
defer client.Close()

// client.Apply(...)

Read data

Reads in Spanner can be performed through both a key-value style API and a query API. As a Cassandra user, you might find the query API more familiar. A key difference is that Spanner requires named query parameters, unlike Cassandra's positional ? arguments. The name of a parameter in a Spanner query must be prefixed with @.

Cassandra example

Go

stmt := `SELECT
           user_id, first_name, last_name
         FROM
           users
         WHERE
           user_id = ?`

var (
  userID    int
  firstName string
  lastName  string
)

err := session.Query(stmt, 1).Scan(&userID, &firstName, &lastName)

Spanner example

Go

stmt := spanner.Statement{
  SQL: `SELECT
          user_id, first_name, last_name
        FROM
          users
        WHERE
          user_id = @user_id`,
  Params: map[string]any{"user_id": 1},
}

var (
  userID    int64
  firstName string
  lastName  string
)

err := client.Single().Query(ctx, stmt).Do(func(row *spanner.Row) error {
  return row.Columns(&userID, &firstName, &lastName)
})

Insert data

A Cassandra INSERT is equivalent to a Spanner INSERT OR UPDATE. You must specify the full primary key for an insert. Spanner supports both DML and a key-value style mutation API. The key-value style mutation API is recommended for trivial writes due to lower latency. The Spanner DML API has more features as it supports the full SQL surface (including the use of expressions in the DML statement).

Cassandra example

Go

stmt := `INSERT INTO
           users (user_id, first_name, last_name)
         VALUES
           (?, ?, ?)`
err := session.Query(stmt, 1, "John", "Doe").Exec()

Spanner example

Go

_, err := client.Apply(ctx, []*spanner.Mutation{
  spanner.InsertOrUpdateMap("users", map[string]any{
    "user_id":    1,
    "first_name": "John",
    "last_name":  "Doe",
  }),
})
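
For comparison, here is a sketch of the same upsert issued through the DML API inside a read-write transaction. The INSERT OR UPDATE DML form gives the same upsert behavior as the mutation above:

Go

_, err := client.ReadWriteTransaction(ctx, func(ctx context.Context, txn *spanner.ReadWriteTransaction) error {
  stmt := spanner.Statement{
    SQL: `INSERT OR UPDATE INTO users (user_id, first_name, last_name)
          VALUES (@user_id, @first_name, @last_name)`,
    Params: map[string]any{
      "user_id":    1,
      "first_name": "John",
      "last_name":  "Doe",
    },
  }
  // Update returns the number of affected rows.
  _, err := txn.Update(ctx, stmt)
  return err
})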

Batch insert data

In Cassandra, you can insert multiple rows using a batch statement. In Spanner, a commit operation can contain multiple mutations, which Spanner applies to the database atomically.

Cassandra example

Go

stmt := `INSERT INTO
           users (user_id, first_name, last_name)
         VALUES
           (?, ?, ?)`
b := session.NewBatch(gocql.UnloggedBatch)
b.Entries = []gocql.BatchEntry{
  {Stmt: stmt, Args: []any{1, "John", "Doe"}},
  {Stmt: stmt, Args: []any{2, "Mary", "Poppins"}},
}
err = session.ExecuteBatch(b)

Spanner example

Go

_, err := client.Apply(ctx, []*spanner.Mutation{
  spanner.InsertOrUpdateMap("users", map[string]any{
    "user_id":    1,
    "first_name": "John",
    "last_name":  "Doe",
  }),
  spanner.InsertOrUpdateMap("users", map[string]any{
    "user_id":    2,
    "first_name": "Mary",
    "last_name":  "Poppins",
  }),
})
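
If you prefer DML, here is a sketch of the same batch using BatchUpdate, which executes several DML statements in a single round trip within a read-write transaction:

Go

_, err := client.ReadWriteTransaction(ctx, func(ctx context.Context, txn *spanner.ReadWriteTransaction) error {
  sql := `INSERT OR UPDATE INTO users (user_id, first_name, last_name)
          VALUES (@user_id, @first_name, @last_name)`
  // BatchUpdate returns the affected row count for each statement.
  _, err := txn.BatchUpdate(ctx, []spanner.Statement{
    {SQL: sql, Params: map[string]any{"user_id": 1, "first_name": "John", "last_name": "Doe"}},
    {SQL: sql, Params: map[string]any{"user_id": 2, "first_name": "Mary", "last_name": "Poppins"}},
  })
  return err
})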

Delete data

Cassandra deletes require specifying the primary key of the rows to be deleted. This is similar to the DELETE mutation in Spanner.

Cassandra example

Go

stmt := `DELETE FROM
           users
         WHERE
           user_id = ?`
err := session.Query(stmt, 1).Exec()

Spanner example

Go

_, err := client.Apply(ctx, []*spanner.Mutation{
  spanner.Delete("users", spanner.Key{1}),
})
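
Because Spanner is range-sharded, a single mutation can also delete a contiguous range of primary keys. A sketch:

Go

// Sketch: delete all users with user_id between 1 and 10, inclusive.
_, err := client.Apply(ctx, []*spanner.Mutation{
  spanner.Delete("users", spanner.KeyRange{
    Start: spanner.Key{1},
    End:   spanner.Key{10},
    Kind:  spanner.ClosedClosed,
  }),
})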

Advanced topics

This section contains information on how to use more advanced Cassandra features in Spanner.

Write timestamp

Cassandra allows mutations to explicitly specify a write timestamp for a particular cell using the USING TIMESTAMP clause. Typically, this feature is used to manipulate Cassandra's last-writer-wins semantics.

Spanner doesn't allow clients to specify the timestamp of each write. Each cell is internally marked with the TrueTime timestamp at the time when the cell value was committed. Because Spanner provides a strongly consistent and strictly serializable interface, most applications don't need the functionality of USING TIMESTAMP.

If you rely on Cassandra's USING TIMESTAMP for application specific logic, you can add an extra TIMESTAMP column to your Spanner schema, which can track modification time at the application level. Updates to a row can then be wrapped in a read-write transaction. For example:

Cassandra example

Go

stmt := `INSERT INTO
           users (user_id, first_name, last_name)
         VALUES
           (?, ?, ?)
         USING TIMESTAMP
           ?`
err := session.Query(stmt, 1, "John", "Doe", ts).Exec()

Spanner example

  1. Create schema with an explicit update timestamp column.

    GoogleSQL

    CREATE TABLE users (
      user_id    INT64,
      first_name STRING(MAX),
      last_name  STRING(MAX),
      update_ts  TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true),
    ) PRIMARY KEY (user_id)
    
  2. Customize logic to update the row and include a timestamp.

    Go

    func ShouldUpdateRow(ctx context.Context, txn *spanner.ReadWriteTransaction, updateTs time.Time) (bool, error) {
      // Read the existing commit timestamp.
      row, err := txn.ReadRow(ctx, "users", spanner.Key{1}, []string{"update_ts"})
    
      // Treat non-existent row as NULL timestamp - the row should be updated.
      if spanner.ErrCode(err) == codes.NotFound {
        return true, nil
      }
    
      // Propagate unexpected errors.
      if err != nil {
        return false, err
      }
    
      // Read the stored commit timestamp.
      var committedTs *time.Time
      err = row.Columns(&committedTs)
      if err != nil {
        return false, err
      }

      // Skip the update if the stored commit timestamp is already newer
      // than the update timestamp (last-writer-wins).
      if committedTs != nil && committedTs.After(updateTs) {
        return false, nil
      }

      // The stored commit timestamp is older than the update timestamp, so
      // the row should be updated.
      return true, nil
    }
    
  3. Check custom condition before updating the row.

    Go

    _, err := client.ReadWriteTransaction(ctx, func(ctx context.Context, txn *spanner.ReadWriteTransaction) error {
      // Check if the row should be updated.
      ok, err := ShouldUpdateRow(ctx, txn, time.Now())
      if err != nil {
        return err
      }
      if !ok {
        return nil
      }
    
      // Update the row, recording the commit timestamp in update_ts, and
      // propagate any error from buffering the mutation.
      return txn.BufferWrite([]*spanner.Mutation{
        spanner.InsertOrUpdateMap("users", map[string]any{
          "user_id":    1,
          "first_name": "John",
          "last_name":  "Doe",
          "update_ts":  spanner.CommitTimestamp,
        }),
      })
    })
    

Conditional mutations

The INSERT ... IF NOT EXISTS statement in Cassandra is equivalent to the INSERT mutation in Spanner. In both cases, the insert fails if the row already exists.

In Cassandra, you can also write DML statements that specify a condition; the statement isn't applied if the condition evaluates to false. In Spanner, you can implement the same check inside a read-write transaction: read the current value, evaluate the condition in application code, and buffer the update only if the condition holds. For example, to update a row only if a particular condition is met:

Cassandra example

Go

stmt := `UPDATE
           users
         SET
           last_name = ?
         WHERE
           user_id = ?
         IF
           first_name = ?`
err := session.Query(stmt, "Smith", 1, "John").Exec()

Spanner example

  1. Customize logic to update the row and include a condition.

    Go

    func ShouldUpdateRow(ctx context.Context, txn *spanner.ReadWriteTransaction) (bool, error) {
      row, err := txn.ReadRow(ctx, "users", spanner.Key{1}, []string{"first_name"})
      if err != nil {
        return false, err
      }

      var firstName *string
      err = row.Columns(&firstName)
      if err != nil {
        return false, err
      }
      // Update the row only if first_name currently equals "John",
      // mirroring the IF clause in the Cassandra statement.
      if firstName == nil || *firstName != "John" {
        return false, nil
      }
      return true, nil
    }
    
    
  2. Check custom condition before updating the row.

    Go

    _, err := client.ReadWriteTransaction(ctx, func(ctx context.Context, txn *spanner.ReadWriteTransaction) error {
      // Check if the row should be updated.
      ok, err := ShouldUpdateRow(ctx, txn)
      if err != nil {
        return err
      }
      if !ok {
        return nil
      }

      // Update the row.
      return txn.BufferWrite([]*spanner.Mutation{
        spanner.InsertOrUpdateMap("users", map[string]any{
          "user_id":   1,
          "last_name": "Smith",
        }),
      })
    })
    

TTL

Cassandra supports setting a time to live (TTL) value at the row or column level. In Spanner, TTL is configured at the row level, and you designate a named column as the expiration time for the row. For more information, see the Time to live (TTL) overview.

Cassandra example

Go

stmt := `INSERT INTO
           users (user_id, first_name, last_name)
         VALUES
           (?, ?, ?)
         USING TTL 86400`
err := session.Query(stmt, 1, "John", "Doe").Exec()

Spanner example

  1. Create a schema with an explicit update timestamp column and a row deletion policy.

    GoogleSQL

    CREATE TABLE users (
      user_id    INT64,
      first_name STRING(MAX),
      last_name  STRING(MAX),
      update_ts  TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true),
    ) PRIMARY KEY (user_id),
      ROW DELETION POLICY (OLDER_THAN(update_ts, INTERVAL 1 DAY));
    
  2. Insert rows with a commit timestamp.

    Go

    _, err := client.Apply(ctx, []*spanner.Mutation{
      spanner.InsertOrUpdateMap("users", map[string]any{
        "user_id":    1,
        "first_name": "John",
        "last_name":  "Doe",
        "update_ts":  spanner.CommitTimestamp,
      }),
    })
    
    

Store large maps as interleaved tables

Cassandra supports the map type for storing ordered key-value pairs. To store map types that contain a small amount of data in Spanner, you can use the JSON or PROTO types, which let you store semi-structured and structured data, respectively. Updates to such columns require the entire column value to be rewritten. If a large amount of data is stored in a Cassandra map and only a small portion of the map needs to be updated at a time, interleaved tables might be a better fit. For example, to associate a large amount of key-value data with a particular user:

Cassandra example

CREATE TABLE users (
  user_id     bigint,
  attachments map<string, string>,
  PRIMARY KEY (user_id)
)

Spanner example

CREATE TABLE users (
  user_id  INT64,
) PRIMARY KEY (user_id);

CREATE TABLE user_attachments (
  user_id        INT64,
  attachment_key STRING(MAX),
  attachment_val STRING(MAX),
) PRIMARY KEY (user_id, attachment_key),
  INTERLEAVE IN PARENT users ON DELETE CASCADE;

In this case, each user_attachments row is stored colocated with the corresponding users row, and can be retrieved and updated efficiently along with it. You can use Spanner's read and write APIs to interact with interleaved tables. For more information on interleaving, see Create parent and child tables.
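
For example, here is a sketch that upserts a user together with one attachment entry in a single atomic commit (the attachment key and value are hypothetical):

Go

_, err := client.Apply(ctx, []*spanner.Mutation{
  spanner.InsertOrUpdateMap("users", map[string]any{
    "user_id": 1,
  }),
  spanner.InsertOrUpdateMap("user_attachments", map[string]any{
    "user_id":        1,
    "attachment_key": "profile_photo",   // hypothetical map key
    "attachment_val": "attachment-data", // hypothetical map value
  }),
})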

Developer experience

This section compares Spanner and Cassandra developer tools.

Local development

You can run Cassandra locally for development and unit testing. Spanner provides a similar environment for local development through the Spanner emulator. The emulator provides a high fidelity environment for interactive development and unit tests. For more information, see Emulate Spanner locally.
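
Once the emulator is running, the Go client connects to it whenever the SPANNER_EMULATOR_HOST environment variable is set, and no real credentials are required. A sketch, where the project, instance, and database names are placeholders:

Go

// Sketch: point the client at a locally running emulator.
os.Setenv("SPANNER_EMULATOR_HOST", "localhost:9010")
client, err := spanner.NewClient(ctx,
    "projects/test-project/instances/test-instance/databases/test-db")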

Command line

The Spanner equivalent to Cassandra's nodetool is the Google Cloud CLI. You can perform control plane and data plane operations using gcloud spanner. For more information, see the Google Cloud CLI Spanner reference guide.
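
For example, to issue a query from the command line (the instance and database names below are placeholders):

gcloud spanner databases execute-sql example-db \
  --instance=test-instance \
  --sql='SELECT user_id, first_name FROM users'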

If you need a REPL interface to issue queries to Spanner similar to cqlsh, you can use the spanner-cli tool. To install and run spanner-cli in Go:

go install github.com/cloudspannerecosystem/spanner-cli@latest

$(go env GOPATH)/bin/spanner-cli

For more information, see the spanner-cli GitHub repository.