Developers & Practitioners

Google Cloud Spanner Dialect for SQLAlchemy

We're very excited to announce the general availability of the Google Cloud Spanner dialect for SQLAlchemy, which enables SQLAlchemy applications to take advantage of Cloud Spanner's scale, strong consistency, and up to 99.999% availability. SQLAlchemy is a Python SQL toolkit and object-relational mapper (ORM) that gives application developers the full power and flexibility of SQL. The ORM allows the object model and database schema to be developed in a cleanly decoupled way, while remaining transparent to the user about the underlying SQL and object-relational details.

Here, we'll show how you can get started with the dialect and highlight the Spanner-specific features it supports.

Set up the Dialect

To set up the Cloud Spanner dialect for SQLAlchemy in your application, install the package that is available through PyPI.

  pip3 install sqlalchemy-spanner

You may also install the dialect from source.

  git clone https://github.com/googleapis/python-spanner-sqlalchemy.git
  cd python-spanner-sqlalchemy
  python setup.py install

Set up Cloud Spanner

Before you begin using Cloud Spanner:

  1. Follow the Set Up guide to configure a Cloud Project, authentication and authorization.

  2. Then, create a Cloud Spanner instance and database following the Quickstart using the Cloud Console.

Quickstart application

Create an Engine

First, we create an Engine that can then be used to make a connection to a Cloud Spanner database. We provide the Database URL as an input which has the format of spanner:///projects/<project-id>/instances/<instance-id>/databases/<database-id>.

  from sqlalchemy import create_engine

  engine = create_engine(
      "spanner:///projects/project-id/instances/instance-id/databases/database-id"
  )

Create tables

We then create a MetaData object bound to the engine that can hold a collection of Table objects and their schemas. We declare three tables representing "Singers", "Albums" and "Tracks", then call MetaData.create_all() to create the tables.

  from sqlalchemy import (
      Column,
      Computed,
      ForeignKey,
      Integer,
      MetaData,
      String,
      Table,
  )

  metadata = MetaData(bind=engine)

  singers = Table(
      "Singers",
      metadata,
      Column("SingerId", String(36), primary_key=True, nullable=False),
      Column("FirstName", String(200)),
      Column("LastName", String(200), nullable=False),
      Column("FullName", String(400), Computed("COALESCE(FirstName || ' ', '') || LastName")),
  )

  albums = Table(
      "Albums",
      metadata,
      Column("AlbumId", String(36), primary_key=True, nullable=False),
      Column("Title", String(100), nullable=False),
      Column("SingerId", String(36), ForeignKey("Singers.SingerId", name="FK_Albums_Singers"), nullable=False),
  )

  tracks = Table(
      "Tracks",
      metadata,
      Column("AlbumId", String(36), primary_key=True, nullable=False),
      Column("TrackId", Integer, primary_key=True, nullable=False),
      Column("Title", String(200), nullable=False),
      spanner_interleave_in="Albums",
      spanner_interleave_on_delete_cascade=True,
  )
  tracks.add_is_dependent_on(albums)

  metadata.create_all(engine)

Notice that we use two different approaches to defining relationships between tables:

  1. Albums reference Singers using a foreign key constraint, by including a "SingerId" column in the "Albums" table. This ensures that each Album references an existing Singer record, and that a Singer cannot be deleted without first deleting all Albums of that Singer.

  2. Tracks reference Albums by being interleaved in the parent "Albums" table, configured with the "spanner_interleave_in" keyword argument. This ensures that all Track records are stored physically together with the parent Album, which makes accessing them together more efficient. Setting "spanner_interleave_on_delete_cascade" ensures that when an Album is deleted, the interleaved Tracks are also deleted. We also call Table.add_is_dependent_on() to ensure that the Albums table is created before the Tracks table.

Insert data

Data can be inserted into the created tables by calling the Table.insert() method through Connection.execute(). We use the uuid module to generate primary key values, which avoids the monotonically increasing keys that can result in hotspots.

  import uuid

  from sqlalchemy import MetaData, Table

  singers = Table("Singers", MetaData(bind=engine), autoload=True)
  albums = Table("Albums", MetaData(bind=engine), autoload=True)
  tracks = Table("Tracks", MetaData(bind=engine), autoload=True)

  with engine.begin() as connection:
      singer_id = uuid.uuid4().hex[:6].lower()
      connection.execute(singers.insert(), {"SingerId": singer_id, "FirstName": "Bob", "LastName": "Allison"})
      album_id = uuid.uuid4().hex[:6].lower()
      connection.execute(albums.insert(), {"AlbumId": album_id, "Title": "Let's Go", "SingerId": singer_id})
      connection.execute(tracks.insert(), {"AlbumId": album_id, "TrackId": 1, "Title": "Go, Go, Go"})

Query data

We can then query the inserted data by running a select() statement through Connection.execute().

  from sqlalchemy import MetaData, Table, select

  singers = Table("Singers", MetaData(bind=engine), autoload=True)

  with engine.begin() as connection:
      for row in connection.execute(select([singers]).where(singers.c.FullName == "Bob Allison")):
          print(row)

Migrate an existing database

The Cloud Spanner dialect for SQLAlchemy supports migrations through Alembic.

Note that a migration script can produce a large number of DDL statements, and if each statement is executed separately, the migration will be slow. It is therefore highly recommended that you use the Alembic batch context feature to group DDL statements into batches.
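
As a sketch, grouping operations with Alembic's batch context could look like the following; the "Singers" table changes and the column names are hypothetical, and this fragment belongs inside a generated Alembic migration script:

```python
# Hypothetical Alembic migration script (e.g. versions/xxxx_add_singer_columns.py).
import sqlalchemy as sa
from alembic import op


def upgrade():
    # Group several column additions into one batch context so they can be
    # applied together instead of as individual DDL statements.
    with op.batch_alter_table("Singers") as batch_op:
        batch_op.add_column(sa.Column("BirthDate", sa.String(10)))
        batch_op.add_column(sa.Column("Country", sa.String(100)))
```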

Best practices

For optimal performance, use explicit connections, and reuse the connection across multiple queries:

  with engine.begin() as connection:
    # execute() is called on a Connection() object
    connection.execute(user.insert(), {"user_id": 1, "user_name": "Full Name"})

It's also possible to use an implicit connection, but this is not recommended, as the dialect will then need to establish a new connection to the database for every execute() call. Therefore, avoid calls like the following:

  # execute() is called on a statement object, which is not a Connection() object
  insert(user).values(user_id=1, user_name="Full Name").execute()

Features

Transaction support

By default, a connection executes all transactions in ReadWrite mode. If the "read_only" execution option is provided when creating a connection, as in the following example, transactions are executed in ReadOnly mode.

  with engine.connect().execution_options(read_only=True) as connection:
    connection.execute(select(["*"], from_obj=table)).fetchall()

Isolation levels

The dialect supports "SERIALIZABLE" and "AUTOCOMMIT" isolation levels. "SERIALIZABLE" is the default isolation level, and transactions need to be committed explicitly. If the "AUTOCOMMIT" isolation level is selected, then each statement will be committed automatically.

The isolation level can be set as follows:

  engine = create_engine("spanner:///projects/project-id/instances/instance-id/databases/database-id")
  autocommit_engine = engine.execution_options(isolation_level="AUTOCOMMIT")

Cloud Spanner features

Besides the interleaved tables and generated columns shown above, the dialect also supports the following Cloud Spanner features.

Query Hints

Cloud Spanner supports various statement hints and table hints, which can be configured in the dialect through Query.with_hint().
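
As a minimal sketch with SQLAlchemy Core, a table hint can be attached to a statement like this; the "Singers" table definition and the "SingersByLastName" index name are hypothetical:

```python
from sqlalchemy import Column, MetaData, String, Table, select

metadata = MetaData()
# Hypothetical table definition; in an application you would reuse or
# reflect the existing "Singers" table.
singers = Table(
    "Singers",
    metadata,
    Column("SingerId", String(36), primary_key=True),
    Column("LastName", String(200)),
)

# Attach a table hint to the statement; when the statement is compiled
# for Spanner, the dialect renders the hint next to the table name.
stmt = select(singers).with_hint(singers, "@{FORCE_INDEX=SingersByLastName}")
```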

Stale Reads

Cloud Spanner provides two read types: strong reads and stale reads. All read-only transactions perform strong reads by default. You can opt into a stale read when querying data by supplying an explicit timestamp bound. Note that the connection must be a "read_only" connection to perform a stale read.
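
As a sketch, assuming the driver's "staleness" execution option and its "exact_staleness" timestamp-bound key, a 15-second exact-staleness bound could be set up like this:

```python
import datetime

# Timestamp bound for the stale read. "exact_staleness" is one of the
# bound types assumed here; "max_staleness", "read_timestamp" and
# "min_read_timestamp" would be alternative keys.
staleness = {"exact_staleness": datetime.timedelta(seconds=15)}

# The stale read must run on a read-only connection, e.g.:
# with engine.connect().execution_options(read_only=True, staleness=staleness) as connection:
#     connection.execute(select([singers])).fetchall()
```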

Differences and limitations

Unique Constraints

Cloud Spanner doesn't support creating UNIQUE constraints directly. To enforce uniqueness of column values, use UNIQUE indexes instead.
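
For instance, a unique index can be declared alongside the table definition; the dialect then emits CREATE UNIQUE INDEX rather than a UNIQUE constraint. The "Venues" table and index name below are hypothetical:

```python
from sqlalchemy import Column, Index, MetaData, String, Table

metadata = MetaData()
venues = Table(
    "Venues",
    metadata,
    Column("VenueId", String(36), primary_key=True),
    Column("Name", String(100), nullable=False),
)

# A unique index instead of a UNIQUE constraint; it is created together
# with the table when metadata.create_all(engine) runs.
venues_name_idx = Index("ix_venues_name", venues.c.Name, unique=True)
```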

DDL in Transactions

Cloud Spanner doesn't allow DDL statements to be executed in a transaction. Therefore, DDL statements will not be rolled back on transaction rollback.

Spanner Mutations

The dialect and the underlying DB API driver don't support mutations. Therefore, updates can only be executed through DML statements.

Other Limitations

  • "WITH RECURSIVE" statements and named schemas are not supported since Spanner doesn't support these features.

  • "CREATE TEMPORARY TABLE" is not supported since Spanner doesn't support this feature.

  • The scale and precision for numeric types are fixed, as Spanner doesn't support arbitrary scale or precision in numeric types.

Getting involved

The Cloud Spanner dialect for SQLAlchemy is an open-source project on GitHub and we welcome contributions in the form of feedback or pull requests.

We would like to thank Skylar Pottinger, Ilya Gurov, Hemang Chothani, Alex Vaksman and Knut Olav Løite for their work on this integration.
