Objectives
This tutorial walks you through the following steps using the Spanner client library for Python:
- Create a Spanner instance and database.
- Write, read, and execute SQL queries on data in the database.
- Update the database schema.
- Update data using a read-write transaction.
- Add a secondary index to the database.
- Use the index to read and execute SQL queries on data.
- Retrieve data using a read-only transaction.
Costs
This tutorial uses Spanner, which is a billable component of Google Cloud. For information on the cost of using Spanner, see Pricing.
Before you begin
Complete the steps described in Set up, which cover creating and setting a default Google Cloud project, enabling billing, enabling the Cloud Spanner API, and setting up OAuth 2.0 to get authentication credentials to use the Cloud Spanner API.
In particular, make sure that you run gcloud auth application-default login to set up your local development environment with authentication credentials.
Prepare your local Python environment
Follow the instructions in Setting Up a Python Development Environment.
Clone the sample app repository to your local machine:
git clone https://github.com/googleapis/python-spanner
Alternatively, you can download the sample as a zip file and extract it.
Change to the directory that contains the Spanner sample code:
cd python-spanner/samples/samples
Create an isolated Python environment, and install dependencies:
virtualenv env
source env/bin/activate
pip install -r requirements.txt
Create an instance
When you first use Spanner, you must create an instance, which is an allocation of resources that are used by Spanner databases. When you create an instance, you choose an instance configuration, which determines where your data is stored, and also the number of nodes to use, which determines the amount of serving and storage resources in your instance.
Execute the following command to create a Spanner instance in the region us-central1 with 1 node:
gcloud spanner instances create test-instance --config=regional-us-central1 \
--description="Test Instance" --nodes=1
Note that this creates an instance with the following characteristics:
- Instance ID: test-instance
- Display name: Test Instance
- Instance configuration: regional-us-central1 (regional configurations store data in one region, while multi-region configurations distribute data across multiple regions; for more information, see About instances)
- Node count of 1 (node_count corresponds to the amount of serving and storage resources available to databases in the instance; learn more in Nodes and processing units)
You should see:
Creating instance...done.
Look through sample files
The samples repository contains a sample that shows how to use Spanner with Python.
Take a look through the snippets.py file, which shows how to create and use a new database. The data uses the example schema shown in the Schema and data model page.
Create a database
GoogleSQL
python snippets.py test-instance --database-id example-db create_database
PostgreSQL
python pg_snippets.py test-instance --database-id example-db create_database
You should see:
Created database example-db on instance test-instance
The following code creates a database and two tables in the database.
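The repository carries this step in snippets.py (GoogleSQL) and pg_snippets.py (PostgreSQL). The following is a minimal GoogleSQL sketch of what the create_database step does, not the repository's exact code; it uses the instance and database IDs from this tutorial and the Singers and Albums tables from the Schema and data model page.

from google.cloud import spanner

# Create example-db on test-instance with two interleaved tables (GoogleSQL DDL).
spanner_client = spanner.Client()
instance = spanner_client.instance("test-instance")

database = instance.database(
    "example-db",
    ddl_statements=[
        """CREATE TABLE Singers (
            SingerId   INT64 NOT NULL,
            FirstName  STRING(1024),
            LastName   STRING(1024),
            SingerInfo BYTES(MAX)
        ) PRIMARY KEY (SingerId)""",
        """CREATE TABLE Albums (
            SingerId   INT64 NOT NULL,
            AlbumId    INT64 NOT NULL,
            AlbumTitle STRING(MAX)
        ) PRIMARY KEY (SingerId, AlbumId),
        INTERLEAVE IN PARENT Singers ON DELETE CASCADE""",
    ],
)

# create() starts a long-running operation; wait for it to complete.
operation = database.create()
operation.result(120)
print("Created database example-db on instance test-instance")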
The next step is to write data to your database.
Create a database client
Before you can do reads or writes, you must create aClient
. You
can think of a Client
as a database connection: all of your interactions with
Spanner must go through a Client
. Typically you create a Client
when
your application starts up, then you re-use that Client
to read, write, and
execute transactions. The following code shows how to create a client.
Read more in the Client
reference.
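A minimal sketch, using the instance and database IDs from this tutorial:

from google.cloud import spanner

# The Client picks up the credentials configured with
# `gcloud auth application-default login`.
spanner_client = spanner.Client()

# Handles for the instance and database used in the rest of this tutorial.
instance = spanner_client.instance("test-instance")
database = instance.database("example-db")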
Write data with DML
You can insert data using Data Manipulation Language (DML) in a read-write transaction.
You use the execute_update() method to execute a DML statement.
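A sketch of this step follows; the first row matches the parameterized query output shown later in this tutorial, and the remaining rows are illustrative.

from google.cloud import spanner

database = spanner.Client().instance("test-instance").database("example-db")

def insert_singers(transaction):
    # execute_update returns the number of rows affected by the DML statement.
    row_ct = transaction.execute_update(
        "INSERT INTO Singers (SingerId, FirstName, LastName) VALUES "
        "(12, 'Melissa', 'Garcia'), "
        "(13, 'Russell', 'Morales'), "
        "(14, 'Jacqueline', 'Long'), "
        "(15, 'Dylan', 'Shaw')"
    )
    print("{} record(s) inserted.".format(row_ct))

# run_in_transaction runs the function in a read-write transaction,
# retrying on transient aborts and committing on success.
database.run_in_transaction(insert_singers)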
Run the sample using the insert_with_dml argument.
python snippets.py test-instance --database-id example-db insert_with_dml
You should see:
4 record(s) inserted.
Write data with mutations
You can also insert data using mutations.
You write data using a Batch object. A Batch object is a container for mutation operations. A mutation represents a sequence of inserts, updates, and deletes that Spanner applies atomically to different rows and tables in a Spanner database.
The insert() method in the Batch class adds one or more insert mutations to the batch. All mutations in a single batch are applied atomically.
This code shows how to write the data using mutations:
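A sketch of that step, not the repository's exact snippet; the album rows match the query results shown later in this tutorial, and the singer names are illustrative sample data.

from google.cloud import spanner

database = spanner.Client().instance("test-instance").database("example-db")

# All mutations added to the batch are committed atomically when the
# `with` block exits.
with database.batch() as batch:
    batch.insert(
        table="Singers",
        columns=("SingerId", "FirstName", "LastName"),
        values=[
            (1, "Marc", "Richards"),
            (2, "Catalina", "Smith"),
            (3, "Alice", "Trentor"),
            (4, "Lea", "Martin"),
            (5, "David", "Lomond"),
        ],
    )
    batch.insert(
        table="Albums",
        columns=("SingerId", "AlbumId", "AlbumTitle"),
        values=[
            (1, 1, "Total Junk"),
            (1, 2, "Go, Go, Go"),
            (2, 1, "Green"),
            (2, 2, "Forever Hold Your Peace"),
            (2, 3, "Terrified"),
        ],
    )

print("Inserted data.")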
Run the sample using the insert_data argument.
python snippets.py test-instance --database-id example-db insert_data
You should see:
Inserted data.
Query data using SQL
Spanner supports a SQL interface for reading data, which you can access on the command line using the Google Cloud CLI or programmatically using the Spanner client library for Python.
On the command line
Execute the following SQL statement to read the values of all columns from the Albums table:
gcloud spanner databases execute-sql example-db --instance=test-instance \
--sql='SELECT SingerId, AlbumId, AlbumTitle FROM Albums'
The result should be:
SingerId AlbumId AlbumTitle
1 1 Total Junk
1 2 Go, Go, Go
2 1 Green
2 2 Forever Hold Your Peace
2 3 Terrified
Use the Spanner client library for Python
In addition to executing a SQL statement on the command line, you can issue the same SQL statement programmatically using the Spanner client library for Python.
Use the execute_sql() method of a Snapshot object to run the SQL query. To get a Snapshot object, call the snapshot() method of the Database class in a with statement.
Here's how to issue the query and access the data:
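A minimal sketch of this query:

from google.cloud import spanner

database = spanner.Client().instance("test-instance").database("example-db")

# A single-use snapshot; the `with` block releases the session when done.
with database.snapshot() as snapshot:
    results = snapshot.execute_sql(
        "SELECT SingerId, AlbumId, AlbumTitle FROM Albums"
    )
    for row in results:
        print("SingerId: {}, AlbumId: {}, AlbumTitle: {}".format(*row))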
Run the sample using the query_data argument.
python snippets.py test-instance --database-id example-db query_data
You should see the following result:
SingerId: 2, AlbumId: 2, AlbumTitle: Forever Hold Your Peace
SingerId: 1, AlbumId: 2, AlbumTitle: Go, Go, Go
SingerId: 2, AlbumId: 1, AlbumTitle: Green
SingerId: 2, AlbumId: 3, AlbumTitle: Terrified
SingerId: 1, AlbumId: 1, AlbumTitle: Total Junk
Query using a SQL parameter
If your application has a frequently executed query, you can improve its performance by parameterizing it. The resulting parametric query can be cached and reused, which reduces compilation costs. For more information, see Use query parameters to speed up frequently executed queries.
Here is an example of using a parameter in the WHERE clause to query records containing a specific value for LastName.
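A sketch of the parameterized query; the parameter name lastName is an assumption, and the value and its Spanner type are passed alongside the SQL text.

from google.cloud import spanner

database = spanner.Client().instance("test-instance").database("example-db")

with database.snapshot() as snapshot:
    results = snapshot.execute_sql(
        "SELECT SingerId, FirstName, LastName FROM Singers "
        "WHERE LastName = @lastName",
        # Supply the parameter value and its type separately.
        params={"lastName": "Garcia"},
        param_types={"lastName": spanner.param_types.STRING},
    )
    for row in results:
        print("SingerId: {}, FirstName: {}, LastName: {}".format(*row))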
Run the sample using the query_data_with_parameter argument.
python snippets.py test-instance --database-id example-db query_data_with_parameter
You should see the following result:
SingerId: 12, FirstName: Melissa, LastName: Garcia
Read data using the read API
In addition to Spanner's SQL interface, Spanner also supports a read interface.
Use the read() method of a Snapshot object to read rows from the database. To get a Snapshot object, call the snapshot() method of the Database class in a with statement.
Use a KeySet object to define a collection of keys and key ranges to read.
Here's how to read the data:
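A minimal sketch, reading every row of Albums with an all-keys KeySet:

from google.cloud import spanner

database = spanner.Client().instance("test-instance").database("example-db")

with database.snapshot() as snapshot:
    # KeySet(all_=True) selects every row in the table.
    keyset = spanner.KeySet(all_=True)
    results = snapshot.read(
        table="Albums",
        columns=("SingerId", "AlbumId", "AlbumTitle"),
        keyset=keyset,
    )
    for row in results:
        print("SingerId: {}, AlbumId: {}, AlbumTitle: {}".format(*row))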
Run the sample using the read_data argument.
python snippets.py test-instance --database-id example-db read_data
You should see output similar to:
SingerId: 1, AlbumId: 1, AlbumTitle: Total Junk
SingerId: 1, AlbumId: 2, AlbumTitle: Go, Go, Go
SingerId: 2, AlbumId: 1, AlbumTitle: Green
SingerId: 2, AlbumId: 2, AlbumTitle: Forever Hold Your Peace
SingerId: 2, AlbumId: 3, AlbumTitle: Terrified
Update the database schema
Assume you need to add a new column called MarketingBudget to the Albums table. Adding a new column to an existing table requires an update to your database schema. Spanner supports schema updates to a database while the database continues to serve traffic. Schema updates don't require taking the database offline and they don't lock entire tables or columns; you can continue writing data to the database during the schema update. Read more about supported schema updates and schema change performance in Make schema updates.
Add a column
You can add a column on the command line using the Google Cloud CLI or programmatically using the Spanner client library for Python.
On the command line
Use the following ALTER TABLE command to add the new column to the table:
GoogleSQL
gcloud spanner databases ddl update example-db --instance=test-instance \
--ddl='ALTER TABLE Albums ADD COLUMN MarketingBudget INT64'
PostgreSQL
gcloud spanner databases ddl update example-db --instance=test-instance \
--ddl='ALTER TABLE Albums ADD COLUMN MarketingBudget BIGINT'
You should see:
Schema updating...done.
Use the Spanner client library for Python
Use the update_ddl() method of the Database class to modify the schema:
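A minimal GoogleSQL sketch of this schema change:

from google.cloud import spanner

database = spanner.Client().instance("test-instance").database("example-db")

# update_ddl starts a long-running schema-change operation.
operation = database.update_ddl(
    ["ALTER TABLE Albums ADD COLUMN MarketingBudget INT64"]
)
operation.result(120)  # wait for the schema change to finish
print("Added the MarketingBudget column.")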
Run the sample using the add_column argument.
python snippets.py test-instance --database-id example-db add_column
You should see:
Added the MarketingBudget column.
Write data to the new column
The following code writes data to the new column. It sets MarketingBudget to 100000 for the row keyed by Albums(1, 1) and to 500000 for the row keyed by Albums(2, 2).
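A minimal sketch of that step, using an update mutation in a Batch:

from google.cloud import spanner

database = spanner.Client().instance("test-instance").database("example-db")

# Update mutations must include the full primary key plus the columns to change.
with database.batch() as batch:
    batch.update(
        table="Albums",
        columns=("SingerId", "AlbumId", "MarketingBudget"),
        values=[
            (1, 1, 100000),
            (2, 2, 500000),
        ],
    )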
Run the sample using the update_data argument.
python snippets.py test-instance --database-id example-db update_data
You can also execute a SQL query or a read call to fetch the values that you just wrote.
Here's the code to execute the query:
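A minimal sketch of that query, which reads the new column back:

from google.cloud import spanner

database = spanner.Client().instance("test-instance").database("example-db")

with database.snapshot() as snapshot:
    results = snapshot.execute_sql(
        "SELECT SingerId, AlbumId, MarketingBudget FROM Albums"
    )
    for row in results:
        print("SingerId: {}, AlbumId: {}, MarketingBudget: {}".format(*row))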
To execute this query, run the sample using the query_data_with_new_column argument.
python snippets.py test-instance --database-id example-db query_data_with_new_column
You should see:
SingerId: 2, AlbumId: 2, MarketingBudget: 500000
SingerId: 1, AlbumId: 2, MarketingBudget: None
SingerId: 2, AlbumId: 1, MarketingBudget: None
SingerId: 2, AlbumId: 3, MarketingBudget: None
SingerId: 1, AlbumId: 1, MarketingBudget: 100000
Update data
You can update data using DML in a read-write transaction.
You use the execute_update() method to execute a DML statement.
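A condensed sketch of what this step does: read both budgets in a read-write transaction, then move 200000 from Albums(2, 2) to Albums(1, 1) with DML. The repository's snippet also checks that the source budget is large enough before transferring.

from google.cloud import spanner

database = spanner.Client().instance("test-instance").database("example-db")

def transfer_budget(transaction):
    # Reads and writes happen in one function run by run_in_transaction,
    # so the transfer is atomic.
    second = list(transaction.execute_sql(
        "SELECT MarketingBudget FROM Albums WHERE SingerId = 2 AND AlbumId = 2"
    ))[0][0]
    first = list(transaction.execute_sql(
        "SELECT MarketingBudget FROM Albums WHERE SingerId = 1 AND AlbumId = 1"
    ))[0][0]

    transfer = 200000
    transaction.execute_update(
        "UPDATE Albums SET MarketingBudget = @budget "
        "WHERE SingerId = 2 AND AlbumId = 2",
        params={"budget": second - transfer},
        param_types={"budget": spanner.param_types.INT64},
    )
    transaction.execute_update(
        "UPDATE Albums SET MarketingBudget = @budget "
        "WHERE SingerId = 1 AND AlbumId = 1",
        params={"budget": first + transfer},
        param_types={"budget": spanner.param_types.INT64},
    )
    print("Transferred 200000 from Album2's budget to Album1's")

database.run_in_transaction(transfer_budget)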
Run the sample using the write_with_dml_transaction argument.
python snippets.py test-instance --database-id example-db write_with_dml_transaction
You should see:
Transferred 200000 from Album2's budget to Album1's
Use a secondary index
Suppose you want to fetch all rows of Albums that have AlbumTitle values in a certain range. You could read all values from the AlbumTitle column using a SQL statement or a read call, and then discard the rows that don't meet the criteria, but doing this full table scan is expensive, especially for tables with a lot of rows. Instead, you can speed up the retrieval of rows when searching by non-primary key columns by creating a secondary index on the table.
Adding a secondary index to an existing table requires a schema update. Like other schema updates, Spanner supports adding an index while the database continues to serve traffic. Spanner automatically backfills the index with your existing data. Backfills might take a few minutes to complete, but you don't need to take the database offline or avoid writing to the indexed table during this process. For more details, see Add a secondary index.
After you add a secondary index, Spanner automatically uses it for SQL queries that are likely to run faster with the index. If you use the read interface, you must specify the index that you want to use.
Add a secondary index
You can add an index on the command line using the gcloud CLI or programmatically using the Spanner client library for Python.
On the command line
Use the following CREATE INDEX command to add an index to the database:
gcloud spanner databases ddl update example-db --instance=test-instance \
--ddl='CREATE INDEX AlbumsByAlbumTitle ON Albums(AlbumTitle)'
You should see:
Schema updating...done.
Use the Spanner client library for Python
Use the update_ddl() method of the Database class to add an index:
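A minimal sketch of this step:

from google.cloud import spanner

database = spanner.Client().instance("test-instance").database("example-db")

operation = database.update_ddl(
    ["CREATE INDEX AlbumsByAlbumTitle ON Albums(AlbumTitle)"]
)
operation.result(240)  # index creation and backfill can take a few minutes
print("Added the AlbumsByAlbumTitle index.")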
Run the sample using the add_index argument.
python snippets.py test-instance --database-id example-db add_index
Adding an index can take a few minutes. After the index is added, you should see:
Added the AlbumsByAlbumTitle index.
Read using the index
For SQL queries, Spanner automatically uses an appropriate index. In the read interface, you must specify the index in your request.
To use the index in the read interface, provide an index argument to the read() method of a Snapshot object. To get a Snapshot object, call the snapshot() method of the Database class in a with statement.
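A minimal sketch of a read that scans the new index:

from google.cloud import spanner

database = spanner.Client().instance("test-instance").database("example-db")

with database.snapshot() as snapshot:
    keyset = spanner.KeySet(all_=True)
    # The index argument tells the read API which index to use.
    results = snapshot.read(
        table="Albums",
        columns=("AlbumId", "AlbumTitle"),
        keyset=keyset,
        index="AlbumsByAlbumTitle",
    )
    for row in results:
        print("AlbumId: {}, AlbumTitle: {}".format(*row))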
Run the sample using the read_data_with_index argument.
python snippets.py test-instance --database-id example-db read_data_with_index
You should see:
AlbumId: 2, AlbumTitle: Forever Hold Your Peace
AlbumId: 2, AlbumTitle: Go, Go, Go
AlbumId: 1, AlbumTitle: Green
AlbumId: 3, AlbumTitle: Terrified
AlbumId: 1, AlbumTitle: Total Junk
Add an index for index-only reads
You might have noticed that the previous read example doesn't include reading the MarketingBudget column. This is because Spanner's read interface doesn't support the ability to join an index with a data table to look up values that are not stored in the index.
Create an alternate definition of AlbumsByAlbumTitle that stores a copy of MarketingBudget in the index.
On the command line
GoogleSQL
gcloud spanner databases ddl update example-db --instance=test-instance \
--ddl='CREATE INDEX AlbumsByAlbumTitle2 ON Albums(AlbumTitle) STORING (MarketingBudget)'
PostgreSQL
gcloud spanner databases ddl update example-db --instance=test-instance \
--ddl='CREATE INDEX AlbumsByAlbumTitle2 ON Albums(AlbumTitle) INCLUDE (MarketingBudget)'
Adding an index can take a few minutes. After the index is added, you should see:
Schema updating...done.
Use the Spanner client library for Python
Use the update_ddl() method of the Database class to add an index with a STORING clause:
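A minimal GoogleSQL sketch of this step:

from google.cloud import spanner

database = spanner.Client().instance("test-instance").database("example-db")

operation = database.update_ddl(
    [
        "CREATE INDEX AlbumsByAlbumTitle2 ON Albums(AlbumTitle) "
        "STORING (MarketingBudget)"
    ]
)
operation.result(240)  # index creation and backfill can take a few minutes
print("Added the AlbumsByAlbumTitle2 index.")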
Run the sample using the add_storing_index argument.
python snippets.py test-instance --database-id example-db add_storing_index
You should see:
Added the AlbumsByAlbumTitle2 index.
Now you can execute a read that fetches all AlbumId, AlbumTitle, and MarketingBudget columns from the AlbumsByAlbumTitle2 index:
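A minimal sketch of that read:

from google.cloud import spanner

database = spanner.Client().instance("test-instance").database("example-db")

with database.snapshot() as snapshot:
    keyset = spanner.KeySet(all_=True)
    # MarketingBudget can be returned directly because the index STORES it.
    results = snapshot.read(
        table="Albums",
        columns=("AlbumId", "AlbumTitle", "MarketingBudget"),
        keyset=keyset,
        index="AlbumsByAlbumTitle2",
    )
    for row in results:
        print("AlbumId: {}, AlbumTitle: {}, MarketingBudget: {}".format(*row))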
Run the sample using the read_data_with_storing_index argument.
python snippets.py test-instance --database-id example-db read_data_with_storing_index
You should see output similar to:
AlbumId: 2, AlbumTitle: Forever Hold Your Peace, MarketingBudget: 300000
AlbumId: 2, AlbumTitle: Go, Go, Go, MarketingBudget: None
AlbumId: 1, AlbumTitle: Green, MarketingBudget: None
AlbumId: 3, AlbumTitle: Terrified, MarketingBudget: None
AlbumId: 1, AlbumTitle: Total Junk, MarketingBudget: 300000
Retrieve data using read-only transactions
Suppose you want to execute more than one read at the same timestamp. Read-only transactions observe a consistent prefix of the transaction commit history, so your application always gets consistent data.
Use a Snapshot object for executing read-only transactions. To get a Snapshot object, call the snapshot() method of the Database class in a with statement.
The following shows how to run a query and perform a read in the same read-only transaction:
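A minimal sketch: with multi_use=True, the same snapshot serves both a SQL query and a read API call, and both observe the database at the same timestamp.

from google.cloud import spanner

database = spanner.Client().instance("test-instance").database("example-db")

with database.snapshot(multi_use=True) as snapshot:
    # First read: SQL query.
    results = snapshot.execute_sql(
        "SELECT SingerId, AlbumId, AlbumTitle FROM Albums"
    )
    print("Results from first read:")
    for row in results:
        print("SingerId: {}, AlbumId: {}, AlbumTitle: {}".format(*row))

    # Second read at the same timestamp: read API.
    keyset = spanner.KeySet(all_=True)
    results = snapshot.read(
        table="Albums",
        columns=("SingerId", "AlbumId", "AlbumTitle"),
        keyset=keyset,
    )
    print("Results from second read:")
    for row in results:
        print("SingerId: {}, AlbumId: {}, AlbumTitle: {}".format(*row))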
Run the sample using the read_only_transaction argument.
python snippets.py test-instance --database-id example-db read_only_transaction
You should see output similar to:
Results from first read:
SingerId: 2, AlbumId: 2, AlbumTitle: Forever Hold Your Peace
SingerId: 1, AlbumId: 2, AlbumTitle: Go, Go, Go
SingerId: 2, AlbumId: 1, AlbumTitle: Green
SingerId: 2, AlbumId: 3, AlbumTitle: Terrified
SingerId: 1, AlbumId: 1, AlbumTitle: Total Junk
Results from second read:
SingerId: 1, AlbumId: 1, AlbumTitle: Total Junk
SingerId: 1, AlbumId: 2, AlbumTitle: Go, Go, Go
SingerId: 2, AlbumId: 1, AlbumTitle: Green
SingerId: 2, AlbumId: 2, AlbumTitle: Forever Hold Your Peace
SingerId: 2, AlbumId: 3, AlbumTitle: Terrified
Cleanup
To avoid incurring additional charges to your Cloud Billing account for the resources used in this tutorial, drop the database and delete the instance that you created.
Delete the database
If you delete an instance, all databases within it are automatically deleted. This step shows how to delete a database without deleting an instance (you would still incur charges for the instance).
On the command line
gcloud spanner databases delete example-db --instance=test-instance
Using the Google Cloud console
Go to the Spanner Instances page in the Google Cloud console.
Click the instance.
Click the database that you want to delete.
In the Database details page, click Delete.
Confirm that you want to delete the database and click Delete.
Delete the instance
Deleting an instance automatically drops all databases created in that instance.
On the command line
gcloud spanner instances delete test-instance
Using the Google Cloud console
Go to the Spanner Instances page in the Google Cloud console.
Click your instance.
Click Delete.
Confirm that you want to delete the instance and click Delete.
What's next
Learn how to access Spanner with a virtual machine instance.
Learn about authorization and authentication credentials in Authenticate to Cloud services using client libraries.
Learn more about Spanner Schema design best practices.