Reads

This page describes how to perform reads in Cloud Spanner outside the context of read-only and read-write transactions. If either of the following applies, you should read the Transactions page instead:

  • If you need to write depending on the value of one or more reads, execute the read as part of a read-write transaction. Read more about read-write transactions.

  • If you are making multiple read calls that require a consistent view of your data, you should execute the reads as part of a read-only transaction. Read more about read-only transactions.

If you need to make a single read call or read data in parallel and you don't also need to write, read on.

Read types

Cloud Spanner lets you determine how current the data should be when you read it by offering two types of reads:

  • A strong read is a read at a current timestamp and is guaranteed to see all data that has been committed up until the start of this read. Cloud Spanner defaults to using strong reads to serve read requests.
  • A stale read is a read at a timestamp in the past. If your application is latency sensitive but tolerant of stale data, then stale reads can provide performance benefits.

Choosing timestamp bounds

To choose which type of read you want, set a timestamp bound on the read request. Use the following best practices when choosing a timestamp bound:

  • Choose strong reads whenever possible. Strong reads are the default timestamp bound for Cloud Spanner reads, including read-only transactions. They are guaranteed to observe the effects of all transactions that committed before the start of the operation, regardless of which replica receives the read. Because of this, strong reads make application code simpler and applications more trustworthy. Read more about Cloud Spanner's consistency properties in TrueTime and External Consistency.

  • If latency makes strong reads infeasible in some situations, use stale reads (bounded-staleness or exact-staleness) to improve performance in places where you do not need reads to be as recent as possible. As described in Cloud Spanner Replication, 15 seconds is a reasonable staleness value to use for good performance; a short sketch contrasting the two bounds follows this list.
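
For illustration, here is a minimal Python sketch that contrasts the two bounds, using the client library shown later on this page (the instance and database IDs are placeholders):

from datetime import timedelta

from google.cloud import spanner

spanner_client = spanner.Client()
database = spanner_client.instance('my-instance').database('my-database')

# Strong read (the default): sees all data committed before the read starts.
with database.snapshot() as snapshot:
    results = snapshot.execute_sql('SELECT SingerId FROM Singers')

# Stale read: trades freshness for latency using an exact-staleness bound.
with database.snapshot(exact_staleness=timedelta(seconds=15)) as snapshot:
    results = snapshot.execute_sql('SELECT SingerId FROM Singers')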

Single read methods

Cloud Spanner supports single read methods (that is, a read outside the context of a transaction) on a database for:

  • Executing the read as a SQL query statement or using Cloud Spanner's read API.
  • Performing a strong read from a single row or multiple rows in a table.
  • Performing a stale read from a single row or multiple rows in a table.
  • Reading from a single row or multiple rows in a secondary index.

The sections below describe how to use read methods using the Cloud Client Libraries for the Cloud Spanner API.

Execute a query

The following shows how to execute a SQL query statement against a database.

C#

Use ExecuteReaderAsync() to query the database.

string connectionString =
    $"Data Source=projects/{projectId}/instances/"
    + $"{instanceId}/databases/{databaseId}";
// Create connection to Cloud Spanner.
using (var connection = new SpannerConnection(connectionString))
{
    var cmd = connection.CreateSelectCommand(
        "SELECT SingerId, AlbumId, AlbumTitle FROM Albums");
    using (var reader = await cmd.ExecuteReaderAsync())
    {
        while (await reader.ReadAsync())
        {
            Console.WriteLine("SingerId : "
                + reader.GetFieldValue<string>("SingerId")
                + " AlbumId : "
                + reader.GetFieldValue<string>("AlbumId")
                + " AlbumTitle : "
                + reader.GetFieldValue<string>("AlbumTitle"));
        }
    }
}

Go

Use Client.Single().Query to query the database.

func query(ctx context.Context, w io.Writer, client *spanner.Client) error {
	stmt := spanner.Statement{SQL: `SELECT SingerId, AlbumId, AlbumTitle FROM Albums`}
	iter := client.Single().Query(ctx, stmt)
	defer iter.Stop()
	for {
		row, err := iter.Next()
		if err == iterator.Done {
			return nil
		}
		if err != nil {
			return err
		}
		var singerID, albumID int64
		var albumTitle string
		if err := row.Columns(&singerID, &albumID, &albumTitle); err != nil {
			return err
		}
		fmt.Fprintf(w, "%d %d %s\n", singerID, albumID, albumTitle)
	}
}

Java

Use ReadContext.executeQuery to query the database.

static void query(DatabaseClient dbClient) {
  // singleUse() can be used to execute a single read or query against Cloud Spanner.
  ResultSet resultSet =
      dbClient
          .singleUse()
          .executeQuery(Statement.of("SELECT SingerId, AlbumId, AlbumTitle FROM Albums"));
  while (resultSet.next()) {
    System.out.printf(
        "%d %d %s\n", resultSet.getLong(0), resultSet.getLong(1), resultSet.getString(2));
  }
}

Node.js

Use Database.run to query the database.

// Imports the Google Cloud client library
const Spanner = require('@google-cloud/spanner');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'my-project-id';
// const instanceId = 'my-instance';
// const databaseId = 'my-database';

// Creates a client
const spanner = new Spanner({
  projectId: projectId,
});

// Gets a reference to a Cloud Spanner instance and database
const instance = spanner.instance(instanceId);
const database = instance.database(databaseId);

const query = {
  sql: 'SELECT SingerId, AlbumId, AlbumTitle FROM Albums',
};

// Queries rows from the Albums table
database
  .run(query)
  .then(results => {
    const rows = results[0];

    rows.forEach(row => {
      const json = row.toJSON();
      console.log(
        `SingerId: ${json.SingerId}, AlbumId: ${json.AlbumId}, AlbumTitle: ${
          json.AlbumTitle
        }`
      );
    });
  })
  .catch(err => {
    console.error('ERROR:', err);
  })
  .then(() => {
    // Close the database when finished.
    return database.close();
  });

PHP

Use Database::execute to query the database.

use Google\Cloud\Spanner\SpannerClient;

/**
 * Queries sample data from the database using SQL.
 * Example:
 * ```
 * query_data($instanceId, $databaseId);
 * ```
 *
 * @param string $instanceId The Spanner instance ID.
 * @param string $databaseId The Spanner database ID.
 */
function query_data($instanceId, $databaseId)
{
    $spanner = new SpannerClient();
    $instance = $spanner->instance($instanceId);
    $database = $instance->database($databaseId);

    $results = $database->execute(
        'SELECT SingerId, AlbumId, AlbumTitle FROM Albums'
    );

    foreach ($results as $row) {
        printf('SingerId: %s, AlbumId: %s, AlbumTitle: %s' . PHP_EOL,
            $row['SingerId'], $row['AlbumId'], $row['AlbumTitle']);
    }
}

Python

Use Database.execute_sql to query the database.

def query_data(instance_id, database_id):
    """Queries sample data from the database using SQL."""
    spanner_client = spanner.Client()
    instance = spanner_client.instance(instance_id)
    database = instance.database(database_id)

    with database.snapshot() as snapshot:
        results = snapshot.execute_sql(
            'SELECT SingerId, AlbumId, AlbumTitle FROM Albums')

        for row in results:
            print(u'SingerId: {}, AlbumId: {}, AlbumTitle: {}'.format(*row))

Ruby

Use Client#execute to query the database.

# project_id  = "Your Google Cloud project ID"
# instance_id = "Your Spanner instance ID"
# database_id = "Your Spanner database ID"

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new project: project_id
client  = spanner.client instance_id, database_id

client.execute("SELECT SingerId, AlbumId, AlbumTitle FROM Albums").rows.each do |row|
  puts "#{row[:SingerId]} #{row[:AlbumId]} #{row[:AlbumTitle]}"
end

Consult the SQL Query Syntax and Functions and Operators references when constructing a SQL statement.
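
Queries can also be parameterized. The following minimal Python sketch binds a query parameter with the client library used in the samples above; the parameter name and value are illustrative:

from google.cloud import spanner

spanner_client = spanner.Client()
database = spanner_client.instance('my-instance').database('my-database')

with database.snapshot() as snapshot:
    # @singer_id is bound through params; param_types declares its type.
    results = snapshot.execute_sql(
        'SELECT SingerId, AlbumId, AlbumTitle FROM Albums '
        'WHERE SingerId = @singer_id',
        params={'singer_id': 1},
        param_types={'singer_id': spanner.param_types.INT64})

    for row in results:
        print(u'SingerId: {}, AlbumId: {}, AlbumTitle: {}'.format(*row))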

Perform a strong read

The following shows how to perform a strong read of zero or more rows from a database.

C#

The code to read data is the same as the previous sample for querying Cloud Spanner by executing a SQL query.

string connectionString =
    $"Data Source=projects/{projectId}/instances/"
    + $"{instanceId}/databases/{databaseId}";
// Create connection to Cloud Spanner.
using (var connection = new SpannerConnection(connectionString))
{
    var cmd = connection.CreateSelectCommand(
        "SELECT SingerId, AlbumId, AlbumTitle FROM Albums");
    using (var reader = await cmd.ExecuteReaderAsync())
    {
        while (await reader.ReadAsync())
        {
            Console.WriteLine("SingerId : "
                + reader.GetFieldValue<string>("SingerId")
                + " AlbumId : "
                + reader.GetFieldValue<string>("AlbumId")
                + " AlbumTitle : "
                + reader.GetFieldValue<string>("AlbumTitle"));
        }
    }
}

Go

Use Client.Single().Read to read rows from the database.

func read(ctx context.Context, w io.Writer, client *spanner.Client) error {
	iter := client.Single().Read(ctx, "Albums", spanner.AllKeys(),
		[]string{"SingerId", "AlbumId", "AlbumTitle"})
	defer iter.Stop()
	for {
		row, err := iter.Next()
		if err == iterator.Done {
			return nil
		}
		if err != nil {
			return err
		}
		var singerID, albumID int64
		var albumTitle string
		if err := row.Columns(&singerID, &albumID, &albumTitle); err != nil {
			return err
		}
		fmt.Fprintf(w, "%d %d %s\n", singerID, albumID, albumTitle)
	}
}

The example uses AllKeys to define a collection of keys or key ranges to read.

Java

Use ReadContext.read to read rows from the database.

static void read(DatabaseClient dbClient) {
  ResultSet resultSet =
      dbClient
          .singleUse()
          .read("Albums",
              // KeySet.all() can be used to read all rows in a table. KeySet exposes other
              // methods to read only a subset of the table.
              KeySet.all(),
              Arrays.asList("SingerId", "AlbumId", "AlbumTitle"));
  while (resultSet.next()) {
    System.out.printf(
        "%d %d %s\n", resultSet.getLong(0), resultSet.getLong(1), resultSet.getString(2));
  }
}

The example uses KeySet to define a collection of keys or key ranges to read.

Node.js

Use Table.read to read rows from the database.

// Imports the Google Cloud client library
const Spanner = require('@google-cloud/spanner');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'my-project-id';
// const instanceId = 'my-instance';
// const databaseId = 'my-database';

// Creates a client
const spanner = new Spanner({
  projectId: projectId,
});

// Gets a reference to a Cloud Spanner instance and database
const instance = spanner.instance(instanceId);
const database = instance.database(databaseId);

// Reads rows from the Albums table
const albumsTable = database.table('Albums');

const query = {
  columns: ['SingerId', 'AlbumId', 'AlbumTitle'],
  keySet: {
    all: true,
  },
};

albumsTable
  .read(query)
  .then(results => {
    const rows = results[0];

    rows.forEach(row => {
      const json = row.toJSON();
      console.log(
        `SingerId: ${json.SingerId}, AlbumId: ${json.AlbumId}, AlbumTitle: ${
          json.AlbumTitle
        }`
      );
    });
  })
  .catch(err => {
    console.error('ERROR:', err);
  })
  .then(() => {
    // Close the database when finished.
    return database.close();
  });

The example uses keySet to define a collection of keys or key ranges to read.

PHP

Use Database::read to read rows from the database.

use Google\Cloud\Spanner\SpannerClient;

/**
 * Reads sample data from the database.
 * Example:
 * ```
 * read_data($instanceId, $databaseId);
 * ```
 *
 * @param string $instanceId The Spanner instance ID.
 * @param string $databaseId The Spanner database ID.
 */
function read_data($instanceId, $databaseId)
{
    $spanner = new SpannerClient();
    $instance = $spanner->instance($instanceId);
    $database = $instance->database($databaseId);

    $keySet = $spanner->keySet(['all' => true]);
    $results = $database->read(
        'Albums',
        $keySet,
        ['SingerId', 'AlbumId', 'AlbumTitle']
    );

    foreach ($results->rows() as $row) {
        printf('SingerId: %s, AlbumId: %s, AlbumTitle: %s' . PHP_EOL,
            $row['SingerId'], $row['AlbumId'], $row['AlbumTitle']);
    }
}

The example uses keySet to define a collection of keys or key ranges to read.

Python

Use Database.read to read rows from the database.

def read_data(instance_id, database_id):
    """Reads sample data from the database."""
    spanner_client = spanner.Client()
    instance = spanner_client.instance(instance_id)
    database = instance.database(database_id)

    with database.snapshot() as snapshot:
        keyset = spanner.KeySet(all_=True)
        results = snapshot.read(
            table='Albums',
            columns=('SingerId', 'AlbumId', 'AlbumTitle',),
            keyset=keyset,)

        for row in results:
            print(u'SingerId: {}, AlbumId: {}, AlbumTitle: {}'.format(*row))

The example uses KeySet to define a collection of keys or key ranges to read.
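
A KeySet can also select individual keys or contiguous key ranges instead of every row. A minimal Python sketch, with hypothetical key values; pass either KeySet as the keyset argument of snapshot.read in the sample above:

from google.cloud import spanner

# Each key lists the table's primary key parts in order (SingerId, AlbumId).
specific_rows = spanner.KeySet(keys=[[1, 1], [2, 1]])

# A KeyRange selects a contiguous span of primary keys: here, SingerId 1 to 2.
albums_range = spanner.KeySet(
    ranges=[spanner.KeyRange(start_closed=[1], end_closed=[2])])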

Ruby

Use Client#read to read rows from the database.

# project_id  = "Your Google Cloud project ID"
# instance_id = "Your Spanner instance ID"
# database_id = "Your Spanner database ID"

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new project: project_id
client  = spanner.client instance_id, database_id

client.read("Albums", [:SingerId, :AlbumId, :AlbumTitle]).rows.each do |row|
  puts "#{row[:SingerId]} #{row[:AlbumId]} #{row[:AlbumTitle]}"
end

Perform a stale read

The following sample code shows how to perform a stale read of zero or more rows from a database using an exact-staleness timestamp bound. For instructions on how to perform a stale read using a bounded-staleness timestamp bound, see the note after the sample code. See Timestamp Bounds for more information on the different types of timestamp bounds that are available.

C#

Use the BeginReadOnlyTransactionAsync method on a connection with a specified TimestampBound.OfExactStaleness() value to query the database.

string connectionString =
    $"Data Source=projects/{projectId}/instances/{instanceId}"
    + $"/databases/{databaseId}";

// Create connection to Cloud Spanner.
using (var connection = new SpannerConnection(connectionString))
{
    await connection.OpenAsync();

    // Open a new read only transaction.
    var staleness = TimestampBound.OfExactStaleness(
        TimeSpan.FromSeconds(15));
    using (var transaction =
        await connection.BeginReadOnlyTransactionAsync(staleness))
    {
        var cmd = connection.CreateSelectCommand(
            "SELECT SingerId, AlbumId, AlbumTitle FROM Albums");
        cmd.Transaction = transaction;

        using (var reader = await cmd.ExecuteReaderAsync())
        {
            while (await reader.ReadAsync())
            {
                Console.WriteLine("SingerId : "
                    + reader.GetFieldValue<string>("SingerId")
                    + " AlbumId : "
                    + reader.GetFieldValue<string>("AlbumId")
                    + " AlbumTitle : "
                    + reader.GetFieldValue<string>("AlbumTitle"));
            }
        }
    }
}

Go

Use Client.ReadOnlyTransaction().WithTimestampBound() and specify an ExactStaleness value to perform a read of rows from the database using an exact-staleness timestamp bound.

func readStaleData(ctx context.Context, w io.Writer, client *spanner.Client) error {
	ro := client.ReadOnlyTransaction().WithTimestampBound(spanner.ExactStaleness(15 * time.Second))
	defer ro.Close()

	iter := ro.Read(ctx, "Albums", spanner.AllKeys(), []string{"SingerId", "AlbumId", "AlbumTitle"})
	defer iter.Stop()
	for {
		row, err := iter.Next()
		if err == iterator.Done {
			return nil
		}
		if err != nil {
			return err
		}
		var singerID int64
		var albumID int64
		var albumTitle string
		if err := row.Columns(&singerID, &albumID, &albumTitle); err != nil {
			return err
		}
		fmt.Fprintf(w, "%d %d %s\n", singerID, albumID, albumTitle)
	}
}

The example uses AllKeys to define a collection of keys or key ranges to read.

Java

Use the read method of a ReadContext that has a specified TimestampBound.ofExactStaleness() to perform a read of rows from the database using an exact-staleness timestamp bound.

static void readStaleData(DatabaseClient dbClient) {
  ResultSet resultSet =
      dbClient
          .singleUse(TimestampBound.ofExactStaleness(15, TimeUnit.SECONDS))
          .read("Albums",
              KeySet.all(),
              Arrays.asList("SingerId", "AlbumId", "MarketingBudget"));
  while (resultSet.next()) {
    System.out.printf(
        "%d %d %s\n", resultSet.getLong(0), resultSet.getLong(1),
        resultSet.isNull(2) ? "NULL" : resultSet.getLong("MarketingBudget"));
  }
}

The example uses KeySet to define a collection of keys or key ranges to read.

Node.js

Use Table.read with the exactStaleness option to perform a read of rows from the database using an exact-staleness timestamp bound.

// Imports the Google Cloud client library
const Spanner = require('@google-cloud/spanner');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'my-project-id';
// const instanceId = 'my-instance';
// const databaseId = 'my-database';

// Creates a client
const spanner = new Spanner({
  projectId: projectId,
});

// Gets a reference to a Cloud Spanner instance and database
const instance = spanner.instance(instanceId);
const database = instance.database(databaseId);

// Reads rows from the Albums table
const albumsTable = database.table('Albums');

const query = {
  columns: ['SingerId', 'AlbumId', 'AlbumTitle', 'MarketingBudget'],
  keySet: {
    all: true,
  },
};

const options = {
  // Guarantees that all writes committed more than 15 seconds ago are visible
  exactStaleness: 15,
};

albumsTable
  .read(query, options)
  .then(results => {
    const rows = results[0];

    rows.forEach(row => {
      const json = row.toJSON();
      const id = json.SingerId;
      const album = json.AlbumId;
      const title = json.AlbumTitle;
      const budget = json.MarketingBudget ? json.MarketingBudget : '';
      console.log(
        `SingerId: ${id}, AlbumId: ${album}, AlbumTitle: ${title}, MarketingBudget: ${budget}`
      );
    });
  })
  .catch(err => {
    console.error('ERROR:', err);
  })
  .then(() => {
    // Close the database when finished.
    return database.close();
  });

The example uses keySet to define a collection of keys or key ranges to read.

PHP

Use Database::read with an exactStaleness value specified to perform a read of rows from the database using an exact-staleness timestamp bound.

use Google\Cloud\Spanner\Duration;
use Google\Cloud\Spanner\SpannerClient;

/**
 * Reads sample data from the database. The data is exactly 15 seconds stale.
 * Example:
 * ```
 * read_stale_data($instanceId, $databaseId);
 * ```
 *
 * @param string $instanceId The Spanner instance ID.
 * @param string $databaseId The Spanner database ID.
 */
function read_stale_data($instanceId, $databaseId)
{
    $spanner = new SpannerClient();
    $instance = $spanner->instance($instanceId);
    $database = $instance->database($databaseId);
    $keySet = $spanner->keySet(['all' => true]);
    $results = $database->read(
        'Albums',
        $keySet,
        ['SingerId', 'AlbumId', 'AlbumTitle'],
        ['exactStaleness' => new Duration(15)]
    );

    foreach ($results->rows() as $row) {
        printf('SingerId: %s, AlbumId: %s, AlbumTitle: %s' . PHP_EOL,
            $row['SingerId'], $row['AlbumId'], $row['AlbumTitle']);
    }
}

The example uses keySet to define a collection of keys or key ranges to read.

Python

Use the read method of a Database snapshot that has a specified exact_staleness value to perform a read of rows from the database using an exact-staleness timestamp bound.

def read_stale_data(instance_id, database_id):
    """Reads sample data from the database. The data is exactly 15 seconds
    stale."""
    import datetime

    spanner_client = spanner.Client()
    instance = spanner_client.instance(instance_id)
    database = instance.database(database_id)
    staleness = datetime.timedelta(seconds=15)

    with database.snapshot(exact_staleness=staleness) as snapshot:
        keyset = spanner.KeySet(all_=True)
        results = snapshot.read(
            table='Albums',
            columns=('SingerId', 'AlbumId', 'AlbumTitle',),
            keyset=keyset)

        for row in results:
            print(u'SingerId: {}, AlbumId: {}, AlbumTitle: {}'.format(*row))

The example uses KeySet to define a collection of keys or key ranges to read.

Ruby

Use the read method of a client snapshot created with a specified staleness value (in seconds) to perform a read of rows from the database using an exact-staleness timestamp bound.

# project_id  = "Your Google Cloud project ID"
# instance_id = "Your Spanner instance ID"
# database_id = "Your Spanner database ID"
require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new project: project_id
client  = spanner.client instance_id, database_id

# Perform a read with a data staleness of 15 seconds
client.snapshot staleness: 15 do |snapshot|
  snapshot.read("Albums", [:SingerId, :AlbumId, :AlbumTitle]).rows.each do |row|
    puts "#{row[:SingerId]} #{row[:AlbumId]} #{row[:AlbumTitle]}"
  end
end
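
Note: Bounded-staleness timestamp bounds can only be used with single-use read-only transactions; they are not supported in multi-use read-only transactions. As a hedged Python sketch of a bounded-staleness read against the same Albums schema (max_staleness is the Python client's parameter for this bound):

from datetime import timedelta

from google.cloud import spanner

spanner_client = spanner.Client()
database = spanner_client.instance('my-instance').database('my-database')

# max_staleness lets Cloud Spanner pick the freshest timestamp that is at
# most 15 seconds old; it is valid only for single-use snapshots.
with database.snapshot(max_staleness=timedelta(seconds=15)) as snapshot:
    results = snapshot.read(
        table='Albums',
        columns=('SingerId', 'AlbumId', 'AlbumTitle'),
        keyset=spanner.KeySet(all_=True))

    for row in results:
        print(u'SingerId: {}, AlbumId: {}, AlbumTitle: {}'.format(*row))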

Perform a read using an index

The following shows how to read zero or more rows from a database using an index.

C#

Read data using the index by executing a query that explicitly specifies the index:

string connectionString =
    $"Data Source=projects/{projectId}/instances/{instanceId}"
    + $"/databases/{databaseId}";
// Create connection to Cloud Spanner.
using (var connection = new SpannerConnection(connectionString))
{
    var cmd = connection.CreateSelectCommand(
        "SELECT AlbumId, AlbumTitle, MarketingBudget FROM Albums"
        + "@{FORCE_INDEX=AlbumsByAlbumTitle} "
        + "WHERE AlbumTitle >= @startTitle "
        + "AND AlbumTitle < @endTitle",
        new SpannerParameterCollection {
            {"startTitle", SpannerDbType.String},
            {"endTitle", SpannerDbType.String} });
    cmd.Parameters["startTitle"].Value = startTitle;
    cmd.Parameters["endTitle"].Value = endTitle;
    using (var reader = await cmd.ExecuteReaderAsync())
    {
        while (await reader.ReadAsync())
        {
            Console.WriteLine("AlbumId : "
            + reader.GetFieldValue<string>("AlbumId")
            + " AlbumTitle : "
            + reader.GetFieldValue<string>("AlbumTitle")
            + " MarketingBudget : "
            + reader.GetFieldValue<string>("MarketingBudget"));
        }
    }
}

Go

Use Client.Single().ReadUsingIndex to read rows from the database using an index.

func readUsingIndex(ctx context.Context, w io.Writer, client *spanner.Client) error {
	iter := client.Single().ReadUsingIndex(ctx, "Albums", "AlbumsByAlbumTitle", spanner.AllKeys(),
		[]string{"AlbumId", "AlbumTitle"})
	defer iter.Stop()
	for {
		row, err := iter.Next()
		if err == iterator.Done {
			return nil
		}
		if err != nil {
			return err
		}
		var albumID int64
		var albumTitle string
		if err := row.Columns(&albumID, &albumTitle); err != nil {
			return err
		}
		fmt.Fprintf(w, "%d %s\n", albumID, albumTitle)
	}
}

Java

Use ReadContext.readUsingIndex to read rows from the database using an index.

static void readUsingIndex(DatabaseClient dbClient) {
  ResultSet resultSet =
      dbClient
          .singleUse()
          .readUsingIndex(
              "Albums",
              "AlbumsByAlbumTitle",
              KeySet.all(),
              Arrays.asList("AlbumId", "AlbumTitle"));
  while (resultSet.next()) {
    System.out.printf("%d %s\n", resultSet.getLong(0), resultSet.getString(1));
  }
}

Node.js

Use Table.read and specify the index in the query to read rows from the database using an index.

// Imports the Google Cloud client library
const Spanner = require('@google-cloud/spanner');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'my-project-id';
// const instanceId = 'my-instance';
// const databaseId = 'my-database';

// Creates a client
const spanner = new Spanner({
  projectId: projectId,
});

// Gets a reference to a Cloud Spanner instance and database
const instance = spanner.instance(instanceId);
const database = instance.database(databaseId);

const albumsTable = database.table('Albums');

const query = {
  columns: ['AlbumId', 'AlbumTitle'],
  keySet: {
    all: true,
  },
  index: 'AlbumsByAlbumTitle',
};

// Reads the Albums table using an index
albumsTable
  .read(query)
  .then(results => {
    const rows = results[0];

    rows.forEach(row => {
      const json = row.toJSON();
      console.log(`AlbumId: ${json.AlbumId}, AlbumTitle: ${json.AlbumTitle}`);
    });
  })
  .catch(err => {
    console.error('ERROR:', err);
  })
  .then(() => {
    // Close the database when finished.
    return database.close();
  });

PHP

Use Database::read and specify the index to read rows from the database using an index.

use Google\Cloud\Spanner\SpannerClient;

/**
 * Reads sample data from the database using an index.
 *
 * The index must exist before running this sample. You can add the index
 * by running the `add_index` sample or by running this DDL statement against
 * your database:
 *
 *     CREATE INDEX AlbumsByAlbumTitle ON Albums(AlbumTitle)
 *
 * Example:
 * ```
 * read_data_with_index($instanceId, $databaseId);
 * ```
 *
 * @param string $instanceId The Spanner instance ID.
 * @param string $databaseId The Spanner database ID.
 */
function read_data_with_index($instanceId, $databaseId)
{
    $spanner = new SpannerClient();
    $instance = $spanner->instance($instanceId);
    $database = $instance->database($databaseId);

    $keySet = $spanner->keySet(['all' => true]);
    $results = $database->read(
        'Albums',
        $keySet,
        ['AlbumId', 'AlbumTitle'],
        ['index' => 'AlbumsByAlbumTitle']
    );

    foreach ($results->rows() as $row) {
        printf('AlbumId: %s, AlbumTitle: %s' . PHP_EOL,
            $row['AlbumId'], $row['AlbumTitle']);
    }
}

Python

Use Database.read and specify the index to read rows from the database using an index.

def read_data_with_index(instance_id, database_id):
    """Reads sample data from the database using an index.

    The index must exist before running this sample. You can add the index
    by running the `add_index` sample or by running this DDL statement against
    your database:

        CREATE INDEX AlbumsByAlbumTitle ON Albums(AlbumTitle)

    """
    spanner_client = spanner.Client()
    instance = spanner_client.instance(instance_id)
    database = instance.database(database_id)

    with database.snapshot() as snapshot:
        keyset = spanner.KeySet(all_=True)
        results = snapshot.read(
            table='Albums',
            columns=('AlbumId', 'AlbumTitle'),
            keyset=keyset,
            index='AlbumsByAlbumTitle')

        for row in results:
            print('AlbumId: {}, AlbumTitle: {}'.format(*row))

Ruby

Use Client#read and specify the index to read rows from the database using an index.

# project_id  = "Your Google Cloud project ID"
# instance_id = "Your Spanner instance ID"
# database_id = "Your Spanner database ID"

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new project: project_id
client  = spanner.client instance_id, database_id

result = client.read "Albums", [:AlbumId, :AlbumTitle],
                     index: "AlbumsByAlbumTitle"

result.rows.each do |row|
  puts "#{row[:AlbumId]} #{row[:AlbumTitle]}"
end

Read data in parallel

When reading or querying large amounts of data from Cloud Spanner, it can be useful to divide the query into smaller pieces, or partitions, and use multiple machines to fetch the partitions in parallel.

You can perform any read API operation in parallel using the Cloud Spanner client libraries. However, you can only partition SQL queries where the first operator in the query execution plan is Distributed Union. To view the query execution plan for a particular SQL query, follow the instructions from SQL Best Practices.

After you retrieve the query execution plan, verify that its first operator is Distributed Union.
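
As a rough Python sketch of that check, assuming the Python client's query_mode parameter and that plan nodes are returned with the root operator first:

from google.cloud import spanner
from google.cloud.spanner_v1 import ExecuteSqlRequest

spanner_client = spanner.Client()
database = spanner_client.instance('my-instance').database('my-database')

with database.snapshot() as snapshot:
    # PLAN mode returns the query execution plan without running the query.
    results = snapshot.execute_sql(
        'SELECT SingerId, FirstName, LastName FROM Singers',
        query_mode=ExecuteSqlRequest.QueryMode.PLAN)
    list(results)  # Drain the (empty) stream so the plan is populated.
    root = results.stats.query_plan.plan_nodes[0]
    print('Root operator:', root.display_name)  # Expect 'Distributed Union'.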

C#

This example fetches partitions of a SQL query of the Singers table and executes the query over each partition through the following steps:

  • Creating a Cloud Spanner batch transaction.
  • Generating partitions for the query, so that the partitions can be distributed to multiple workers.
  • Retrieving the query results for each partition.

private static int s_partitionId;
private static int s_rowsRead;
public static object BatchReadRecords(string projectId,
     string instanceId, string databaseId)
{
    var responseTask =
        DistributedReadAsync(projectId, instanceId, databaseId);
    Console.WriteLine("Waiting for operation to complete...");
    responseTask.Wait();
    Console.WriteLine($"Operation status: {responseTask.Status}");
    return ExitCode.Success;
}

private static async Task DistributedReadAsync(string projectId,
    string instanceId, string databaseId)
{
    string connectionString =
        $"Data Source=projects/{projectId}/instances/{instanceId}"
        + $"/databases/{databaseId}";
    using (var connection = new SpannerConnection(connectionString))
    {
        await connection.OpenAsync();

        using (var transaction =
            await connection.BeginReadOnlyTransactionAsync())
        using (var cmd =
            connection.CreateSelectCommand(
                "SELECT SingerId, FirstName, LastName FROM Singers"))
        {
            transaction.DisposeBehavior =
                DisposeBehavior.CloseResources;
            cmd.Transaction = transaction;
            var partitions = await cmd.GetReaderPartitionsAsync();
            var transactionId = transaction.TransactionId;
            await Task.WhenAll(partitions.Select(
                    x => DistributedReadWorkerAsync(x, transactionId)))
                        .ConfigureAwait(false);
        }
        Console.WriteLine($"Done reading!  Total rows read: "
            + $"{s_rowsRead:N0} with {s_partitionId} partition(s)");
    }
}

private static async Task DistributedReadWorkerAsync(
    CommandPartition readPartition, TransactionId id)
{
    var localId = Interlocked.Increment(ref s_partitionId);
    using (var connection = new SpannerConnection(id.ConnectionString))
    using (var transaction = connection.BeginReadOnlyTransaction(id))
    {
        using (var cmd = connection.CreateCommandWithPartition(
            readPartition, transaction))
        {
            using (var reader =
                await cmd.ExecuteReaderAsync().ConfigureAwait(false))
            {
                while (await reader.ReadAsync())
                {
                    Interlocked.Increment(ref s_rowsRead);
                    Console.WriteLine($"Partition ({localId}) "
                        + $"{reader.GetFieldValue<string>("SingerId")}"
                        + $" {reader.GetFieldValue<string>("FirstName")}"
                        + $" {reader.GetFieldValue<string>("LastName")}");
                }
            }
        }
        Console.WriteLine($"Done with single reader {localId}.");
    }
}

Go

This example fetches partitions of a SQL query of the Singers table and executes the query over each partition through the following steps:

  • Creating a Cloud Spanner client and a transaction.
  • Generating partitions for the query, so that the partitions can be distributed to multiple workers.
  • Retrieving the query results for each partition.

func readBatchData(ctx context.Context, w io.Writer, client *spanner.Client) error {
	txn, err := client.BatchReadOnlyTransaction(ctx, spanner.StrongRead())
	if err != nil {
		return err
	}
	defer txn.Close()

	// Singer represents a row in the Singers table.
	type Singer struct {
		SingerID   int64
		FirstName  string
		LastName   string
		SingerInfo []byte
	}
	stmt := spanner.Statement{SQL: "SELECT SingerId, FirstName, LastName FROM Singers;"}
	partitions, err := txn.PartitionQuery(ctx, stmt, spanner.PartitionOptions{})
	if err != nil {
		return err
	}
	recordCount := 0
	for i, p := range partitions {
		iter := txn.Execute(ctx, p)
		defer iter.Stop()
		for {
			row, err := iter.Next()
			if err == iterator.Done {
				break
			} else if err != nil {
				return err
			}
			var s Singer
			if err := row.ToStruct(&s); err != nil {
				return err
			}
			fmt.Fprintf(w, "Partition (%d) %v\n", i, s)
			recordCount++
		}
	}
	fmt.Fprintf(w, "Total partition count: %v\n", len(partitions))
	fmt.Fprintf(w, "Total record count: %v\n", recordCount)
	return nil
}

Java

This example fetches partitions of a SQL query of the Singers table and executes the query over each partition through the following steps:

  • Creating a Cloud Spanner batch client and a transaction.
  • Generating partitions for the query, so that the partitions can be distributed to multiple workers.
  • Retrieving the query results for each partition.

int numThreads = Runtime.getRuntime().availableProcessors();
ExecutorService executor = Executors.newFixedThreadPool(numThreads);

// Statistics
int totalPartitions;
AtomicInteger totalRecords = new AtomicInteger(0);

try {
  BatchClient batchClient = spanner.getBatchClient(
      DatabaseId.of(options.getProjectId(), instanceId, databaseId));

  final BatchReadOnlyTransaction txn =
      batchClient.batchReadOnlyTransaction(TimestampBound.strong());

  // A Partition object is serializable and can be used from a different process.
  List<Partition> partitions = txn.partitionQuery(PartitionOptions.getDefaultInstance(),
      Statement.of("SELECT SingerId, FirstName, LastName FROM Singers"));

  totalPartitions = partitions.size();

  for (final Partition p : partitions) {
    executor.execute(() -> {
      try (ResultSet results = txn.execute(p)) {
        while (results.next()) {
          long singerId = results.getLong(0);
          String firstName = results.getString(1);
          String lastName = results.getString(2);
          System.out.println("[" + singerId + "] " + firstName + " " + lastName);
          totalRecords.getAndIncrement();
        }
      }
    });
  }
} finally {
  executor.shutdown();
  executor.awaitTermination(1, TimeUnit.HOURS);
  spanner.close();
}

double avgRecordsPerPartition = 0.0;
if (totalPartitions != 0) {
  avgRecordsPerPartition = (double) totalRecords.get() / totalPartitions;
}
System.out.println("totalPartitions=" + totalPartitions);
System.out.println("totalRecords=" + totalRecords);
System.out.println("avgRecordsPerPartition=" + avgRecordsPerPartition);

Node.js

This example shows how to fetch partitions of a SQL query of the Singers table, then execute the query over each partition.

This portion of the code generates partitions:

// Imports the Google Cloud client library
const Spanner = require('@google-cloud/spanner');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'my-project-id';
// const instanceId = 'my-instance';
// const databaseId = 'my-database';
// const identifier = {};

// Creates a client
const spanner = new Spanner({
  projectId: projectId,
});

// Gets a reference to a Cloud Spanner instance and database
const instance = spanner.instance(instanceId);
const database = instance.database(databaseId);
const transaction = database.batchTransaction(identifier);

const query = 'SELECT * FROM Singers';

transaction
  .createQueryPartitions(query)
  .then(data => {
    const partitions = data[0];
    console.log(
      `Successfully created ${partitions.length} query partitions.`
    );
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

Then, this portion of the code executes the query over each partition:

// Imports the Google Cloud client library
const Spanner = require('@google-cloud/spanner');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'my-project-id';
// const instanceId = 'my-instance';
// const databaseId = 'my-database';
// const identifier = {};
// const partition = {};

// Creates a client
const spanner = new Spanner({
  projectId: projectId,
});

// Gets a reference to a Cloud Spanner instance and database
const instance = spanner.instance(instanceId);
const database = instance.database(databaseId);
const transaction = database.batchTransaction(identifier);

transaction
  .execute(partition)
  .then(data => {
    const rows = data[0];
    console.log(
      `Successfully received ${rows.length} from executed partition.`
    );
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

PHP

This example fetches partitions of a SQL query of the Singers table and executes the query over each partition through the following steps:

  • Creating a Cloud Spanner client and a batch.
  • Generating partitions for the query, so that the partitions can be distributed to multiple workers.
  • Retrieving the query results for each partition.

use Google\Cloud\Spanner\SpannerClient;

/**
 * Queries sample data from the database using SQL.
 * Example:
 * ```
 * batch_query_data($instanceId, $databaseId);
 * ```
 *
 * @param string $instanceId The Spanner instance ID.
 * @param string $databaseId The Spanner database ID.
 */
function batch_query_data($instanceId, $databaseId)
{
    $spanner = new SpannerClient();
    $batch = $spanner->batch($instanceId, $databaseId);
    $snapshot = $batch->snapshot();
    $queryString = 'SELECT SingerId, FirstName, LastName FROM Singers';
    $partitions = $snapshot->partitionQuery($queryString);
    $totalPartitions = count($partitions);
    $totalRecords = 0;
    foreach ($partitions as $partition) {
        $result = $snapshot->executePartition($partition);
        $rows = $result->rows();
        foreach ($rows as $row) {
            $singerId = $row['SingerId'];
            $firstName = $row['FirstName'];
            $lastName = $row['LastName'];
            printf('SingerId: %s, FirstName: %s, LastName: %s' . PHP_EOL, $singerId, $firstName, $lastName);
            $totalRecords++;
        }
    }
    printf('Total Partitions: %d' . PHP_EOL, $totalPartitions);
    printf('Total Records: %d' . PHP_EOL, $totalRecords);
    $averageRecordsPerPartition = $totalRecords / $totalPartitions;
    printf('Average Records Per Partition: %f' . PHP_EOL, $averageRecordsPerPartition);
}

Python

This example generates partitions for a read of the Singers table and performs the read over each partition through the following steps:

  • Creating a Cloud Spanner client and a batch snapshot.
  • Generating partitions for the read, so that the partitions can be distributed to multiple workers.
  • Retrieving the read results for each partition.

import concurrent.futures
import time


def run_batch_query(instance_id, database_id):
    """Runs an example batch query."""

    # Expected Table Format:
    # CREATE TABLE Singers (
    #   SingerId   INT64 NOT NULL,
    #   FirstName  STRING(1024),
    #   LastName   STRING(1024),
    #   SingerInfo BYTES(MAX),
    # ) PRIMARY KEY (SingerId);

    spanner_client = spanner.Client()
    instance = spanner_client.instance(instance_id)
    database = instance.database(database_id)

    # Create the batch transaction and generate partitions
    snapshot = database.batch_snapshot()
    partitions = snapshot.generate_read_batches(
        table='Singers',
        columns=('SingerId', 'FirstName', 'LastName',),
        keyset=spanner.KeySet(all_=True)
    )

    # Create a pool of workers for the tasks
    start = time.time()
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(process, snapshot, p) for p in partitions]

        for future in concurrent.futures.as_completed(futures, timeout=3600):
            finish, row_ct = future.result()
            elapsed = finish - start
            print(u'Completed {} rows in {} seconds'.format(row_ct, elapsed))

    # Clean up
    snapshot.close()


def process(snapshot, partition):
    """Processes one partition of the read in a worker thread."""
    print('Started processing partition.')
    row_ct = 0
    for row in snapshot.process_read_batch(partition):
        print(u'SingerId: {}, FirstName: {}, LastName: {}'.format(*row))
        row_ct += 1
    return time.time(), row_ct
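
The Python sample above partitions a read rather than a SQL query; the client library also offers a query-based equivalent through generate_query_batches and process_query_batch. A minimal sketch against the same Singers schema:

def run_batch_sql_query(instance_id, database_id):
    """Partitions a SQL query and processes each partition serially."""
    from google.cloud import spanner

    spanner_client = spanner.Client()
    database = spanner_client.instance(instance_id).database(database_id)

    snapshot = database.batch_snapshot()
    partitions = snapshot.generate_query_batches(
        'SELECT SingerId, FirstName, LastName FROM Singers')

    for partition in partitions:
        # process_query_batch executes the query over a single partition.
        for row in snapshot.process_query_batch(partition):
            print(u'SingerId: {}, FirstName: {}, LastName: {}'.format(*row))

    snapshot.close()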

Ruby

This example fetches partitions of a SQL query of the Singers table and executes the query over each partition through the following steps:

  • Creating a Cloud Spanner batch client.
  • Creating partitions for the query, so that the partitions can be distributed to multiple workers.
  • Retrieving the query results for each partition.

# project_id  = "Your Google Cloud project ID"
# instance_id = "Your Spanner instance ID"
# database_id = "Your Spanner database ID"

require "google/cloud/spanner"

# Prepare a thread pool with number of processors
processor_count  = Concurrent.processor_count
thread_pool      = Concurrent::FixedThreadPool.new processor_count

# Prepare AtomicFixnum to count total records using multiple threads
total_records = Concurrent::AtomicFixnum.new

# Create a new Spanner batch client
spanner        = Google::Cloud::Spanner.new project: project_id
batch_client   = spanner.batch_client instance_id, database_id

# Get a strong timestamp bound batch_snapshot
batch_snapshot = batch_client.batch_snapshot strong: true

# Get partitions for specified query
partitions       = batch_snapshot.partition_query "SELECT SingerId, FirstName, LastName FROM Singers"
total_partitions = partitions.size

# Enqueue a new thread pool job
partitions.each_with_index do |partition, partition_index|
  thread_pool.post do
    # Increment total_records per new row
    batch_snapshot.execute_partition(partition).rows.each do |row|
      total_records.increment
    end
  end
end

# Wait for queued jobs to complete
thread_pool.shutdown
thread_pool.wait_for_termination

# Close the client connection and release resources.
batch_snapshot.close

# Collect statistics for batch query
average_records_per_partition = 0.0
if total_partitions != 0
  average_records_per_partition = total_records.value.to_f / total_partitions.to_f
end

puts "Total Partitions: #{total_partitions}"
puts "Total Records: #{total_records.value}"
puts "Average records per Partition: #{average_records_per_partition}"
# project_id  = "Your Google Cloud project ID"
# instance_id = "Your Spanner instance ID"
# database_id = "Your Spanner database ID"

require "google/cloud/spanner"

# Prepare a thread pool with number of processors
processor_count  = Concurrent.processor_count
thread_pool      = Concurrent::FixedThreadPool.new processor_count

# Prepare AtomicFixnum to count total records using multiple threads
total_records = Concurrent::AtomicFixnum.new

# Create a new Spanner batch client
spanner        = Google::Cloud::Spanner.new project: project_id
batch_client   = spanner.batch_client instance_id, database_id

# Get a strong timestamp bound batch_snapshot
batch_snapshot = batch_client.batch_snapshot strong: true

# Get partitions for specified query
partitions       = batch_snapshot.partition_query "SELECT SingerId, FirstName, LastName FROM Singers"
total_partitions = partitions.size

# Enqueue a new thread pool job
partitions.each_with_index do |partition, partition_index|
  thread_pool.post do
    # Increment total_records per new row
    batch_snapshot.execute_partition(partition).rows.each do |row|
      total_records.increment
    end
  end
end

# Wait for queued jobs to complete
thread_pool.shutdown
thread_pool.wait_for_termination

# Close the client connection and release resources.
batch_snapshot.close

# Collect statistics for batch query
average_records_per_partition = 0.0
if total_partitions != 0
  average_records_per_partition = total_records.value.to_f / total_partitions.to_f
end

puts "Total Partitions: #{total_partitions}"
puts "Total Records: #{total_records.value}"
puts "Average records per Partition: #{average_records_per_partition}"
