Cloud Spanner end-to-end latency guide

This page:

  • Describes the high-level components involved in a Cloud Spanner API request.
  • Explains how to extract, capture, and visualize the latencies associated with these components so that you can identify where latency originates.

The high-level components that are used to make a Spanner API request include:

  • Spanner client libraries, which provide a layer of abstraction on top of gRPC, and handle server communication details, such as session management, transactions, and retries.

  • The Google Front End (GFE), which is an infrastructure service that's common to all Google Cloud services, including Spanner. The GFE ensures that all Transport Layer Security (TLS) connections are terminated, and it applies protections against Denial of Service attacks. To learn more about the GFE, see Google Front End Service.

  • The Spanner API front end, which performs various checks on the API request (including authentication, authorization, and quota checks), and maintains sessions and transaction states.

  • The Spanner database, which executes reads and writes to the database.

When you make a remote procedure call to Spanner, the Spanner client libraries prepare the API request. Then, the API request passes through both the GFE and the Spanner API front end before reaching the Spanner database.

By measuring and comparing the request latencies between different components and the database, you can determine which component is causing the problem. These latencies include client round-trip, GFE, Spanner API request, and query latencies.

Spanner architecture diagram.

Client round-trip latency

Client round-trip latency is the length of time (in milliseconds) between the first byte of the Spanner API request that the client sends to the database (through both the GFE and the Spanner API front end) and the last byte of the response that the client receives from the database.

Spanner architecture diagram for client round-trip latency.

Spanner client libraries provide statistics and traces using the OpenCensus instrumentation framework. This framework gives insight into the internals of the client and aids in troubleshooting end-to-end (round-trip) latency.

By default, the framework is disabled. To learn how to enable this framework, see Capture client round-trip latency.

The grpc.io/client/roundtrip_latency metric measures the time from the first byte of the API request sent to the last byte of the response received.

Capture client round-trip latency

You can capture client round-trip latency for the following languages:

Java

static void captureGrpcMetric(DatabaseClient dbClient) {
  // Register basic gRPC views.
  RpcViews.registerClientGrpcBasicViews();

  // Enable OpenCensus exporters to export metrics to Stackdriver Monitoring.
  // Exporters use Application Default Credentials to authenticate.
  // See https://developers.google.com/identity/protocols/application-default-credentials
  // for more details.
  try {
    StackdriverStatsExporter.createAndRegister();
  } catch (IOException | IllegalStateException e) {
    System.out.println("Error during StackdriverStatsExporter");
  }

  try (ResultSet resultSet =
      dbClient
          .singleUse() // Execute a single read or query against Cloud Spanner.
          .executeQuery(Statement.of("SELECT SingerId, AlbumId, AlbumTitle FROM Albums"))) {
    while (resultSet.next()) {
      System.out.printf(
          "%d %d %s", resultSet.getLong(0), resultSet.getLong(1), resultSet.getString(2));
    }
  }
}

Go


import (
	"context"
	"fmt"
	"io"
	"regexp"

	"cloud.google.com/go/spanner"
	"google.golang.org/api/iterator"

	"contrib.go.opencensus.io/exporter/stackdriver"
	"go.opencensus.io/plugin/ocgrpc"
	"go.opencensus.io/stats/view"
)

var validDatabasePattern = regexp.MustCompile("^projects/(?P<project>[^/]+)/instances/(?P<instance>[^/]+)/databases/(?P<database>[^/]+)$")

func queryWithGRPCMetric(w io.Writer, db string) error {
	projectID, _, _, err := parseDatabaseName(db)
	if err != nil {
		return err
	}

	ctx := context.Background()
	client, err := spanner.NewClient(ctx, db)
	if err != nil {
		return err
	}
	defer client.Close()

	// Register OpenCensus views.
	if err := view.Register(ocgrpc.DefaultClientViews...); err != nil {
		return err
	}

	// Create OpenCensus Stackdriver exporter.
	sd, err := stackdriver.NewExporter(stackdriver.Options{
		ProjectID: projectID,
	})
	if err != nil {
		return err
	}
	// It is imperative to invoke flush before your main function exits
	defer sd.Flush()

	// Start the metrics exporter
	sd.StartMetricsExporter()
	defer sd.StopMetricsExporter()

	stmt := spanner.Statement{SQL: `SELECT SingerId, AlbumId, AlbumTitle FROM Albums`}
	iter := client.Single().Query(ctx, stmt)
	defer iter.Stop()
	for {
		row, err := iter.Next()
		if err == iterator.Done {
			return nil
		}
		if err != nil {
			return err
		}
		var singerID, albumID int64
		var albumTitle string
		if err := row.Columns(&singerID, &albumID, &albumTitle); err != nil {
			return err
		}
		fmt.Fprintf(w, "%d %d %s\n", singerID, albumID, albumTitle)
	}
}

func parseDatabaseName(databaseUri string) (project, instance, database string, err error) {
	matches := validDatabasePattern.FindStringSubmatch(databaseUri)
	if len(matches) == 0 {
		return "", "", "", fmt.Errorf("failed to parse database name from %q according to pattern %q",
			databaseUri, validDatabasePattern.String())
	}
	return matches[1], matches[2], matches[3], nil
}
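
The following is a minimal, hypothetical entry point that shows one way to call queryWithGRPCMetric; the database path is a placeholder, and error handling is reduced to a log-and-exit sketch.

package main

import (
	"log"
	"os"
)

func main() {
	// Placeholder database path; replace with your own project, instance, and database.
	db := "projects/my-project/instances/my-instance/databases/my-database"
	if err := queryWithGRPCMetric(os.Stdout, db); err != nil {
		log.Fatal(err)
	}
}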

Visualize client round-trip latency

After retrieving the metrics, you can visualize client round-trip latency in Cloud Monitoring.

Here's an example of a graph that illustrates the 5th percentile latency for the client round-trip latency metric. To change the percentile latency to either the 50th or the 99th percentile, use the Aggregator menu.

The program creates an OpenCensus view called roundtrip_latency. This string becomes part of the name of the metric when it's exported to Cloud Monitoring.

Cloud Monitoring client round-trip latency.
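
If you want to verify the exported data programmatically instead of charting it, you can read the time series back through the Cloud Monitoring API. The following Go sketch assumes the Stackdriver exporter's default metric prefix, under which the view typically appears as custom.googleapis.com/opencensus/grpc.io/client/roundtrip_latency; adjust the metric type if you configured a custom prefix. The listRoundtripLatency function name is illustrative.

import (
	"context"
	"fmt"
	"io"
	"time"

	monitoring "cloud.google.com/go/monitoring/apiv3"
	"google.golang.org/api/iterator"
	monitoringpb "google.golang.org/genproto/googleapis/monitoring/v3"
	"google.golang.org/protobuf/types/known/timestamppb"
)

// listRoundtripLatency lists the round-trip latency time series exported
// during the last hour and prints the number of points in each series.
func listRoundtripLatency(w io.Writer, projectID string) error {
	ctx := context.Background()
	client, err := monitoring.NewMetricClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	now := time.Now()
	req := &monitoringpb.ListTimeSeriesRequest{
		Name: "projects/" + projectID,
		// Assumes the exporter's default metric prefix.
		Filter: `metric.type = "custom.googleapis.com/opencensus/grpc.io/client/roundtrip_latency"`,
		Interval: &monitoringpb.TimeInterval{
			StartTime: timestamppb.New(now.Add(-time.Hour)),
			EndTime:   timestamppb.New(now),
		},
		View: monitoringpb.ListTimeSeriesRequest_FULL,
	}
	it := client.ListTimeSeries(ctx, req)
	for {
		ts, err := it.Next()
		if err == iterator.Done {
			return nil
		}
		if err != nil {
			return err
		}
		fmt.Fprintf(w, "%s: %d points\n", ts.GetMetric().GetType(), len(ts.GetPoints()))
	}
}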

GFE latency

GFE latency is the length of time (in milliseconds) between when Google's network receives a remote procedure call from the client and when the GFE receives the first byte of the response. This latency doesn't include any TCP/SSL handshake.

Spanner architecture diagram for GFE latency.

Every response from Spanner, whether it's REST or gRPC, includes a header that contains the total time, for both the request and the response, spent between the GFE and the back end (the Spanner service). This helps you better distinguish whether latency originates in the client or within Google's network.
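
For example, a response carries a server-timing header of the following form, where dur is the GFE latency in milliseconds (the value shown is illustrative):

server-timing: gfet4t7; dur=123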

The cloud.google.com/[language]/spanner/gfe_latency metric captures and exposes GFE latency for Spanner requests.

Capture GFE latency

You can capture GFE latency for the following languages:

Java

static void captureGfeMetric(DatabaseClient dbClient) {
  // Capture GFE Latency.
  SpannerRpcViews.registerGfeLatencyView();

  // Capture GFE Latency and GFE Header missing count.
  // SpannerRpcViews.registerGfeLatencyAndHeaderMissingCountViews();

  // Capture only GFE Header missing count.
  // SpannerRpcViews.registerGfeHeaderMissingCountView();

  // Enable OpenCensus exporters to export metrics to Stackdriver Monitoring.
  // Exporters use Application Default Credentials to authenticate.
  // See https://developers.google.com/identity/protocols/application-default-credentials
  // for more details.
  try {
    StackdriverStatsExporter.createAndRegister();
  } catch (IOException | IllegalStateException e) {
    System.out.println("Error during StackdriverStatsExporter");
  }

  try (ResultSet resultSet =
      dbClient
          .singleUse() // Execute a single read or query against Cloud Spanner.
          .executeQuery(Statement.of("SELECT SingerId, AlbumId, AlbumTitle FROM Albums"))) {
    while (resultSet.next()) {
      System.out.printf(
          "%d %d %s", resultSet.getLong(0), resultSet.getLong(1), resultSet.getString(2));
    }
  }
}

Go


// We are in the process of adding support in the Cloud Spanner Go Client Library
// to capture the gfe_latency metric.

import (
	"context"
	"fmt"
	"io"
	"strconv"
	"strings"

	spanner "cloud.google.com/go/spanner/apiv1"
	gax "github.com/googleapis/gax-go/v2"
	sppb "google.golang.org/genproto/googleapis/spanner/v1"
	"google.golang.org/grpc"
	"google.golang.org/grpc/metadata"

	"contrib.go.opencensus.io/exporter/stackdriver"
	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

// OpenCensus Tag, Measure and View.
var (
	KeyMethod    = tag.MustNewKey("grpc_client_method")
	GFELatencyMs = stats.Int64("cloud.google.com/go/spanner/gfe_latency",
		"Latency between Google's network receives an RPC and reads back the first byte of the response", "ms")
	GFELatencyView = view.View{
		Name:        "cloud.google.com/go/spanner/gfe_latency",
		Measure:     GFELatencyMs,
		Description: "Latency between Google's network receives an RPC and reads back the first byte of the response",
		Aggregation: view.Distribution(0.0, 0.01, 0.05, 0.1, 0.3, 0.6, 0.8, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0, 13.0,
			16.0, 20.0, 25.0, 30.0, 40.0, 50.0, 65.0, 80.0, 100.0, 130.0, 160.0, 200.0, 250.0,
			300.0, 400.0, 500.0, 650.0, 800.0, 1000.0, 2000.0, 5000.0, 10000.0, 20000.0, 50000.0,
			100000.0),
		TagKeys: []tag.Key{KeyMethod}}
)

func queryWithGFELatency(w io.Writer, db string) error {
	projectID, _, _, err := parseDatabaseName(db)
	if err != nil {
		return err
	}

	ctx := context.Background()
	client, err := spanner.NewClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	// Register OpenCensus views.
	err = view.Register(&GFELatencyView)
	if err != nil {
		return err
	}

	// Create OpenCensus Stackdriver exporter.
	sd, err := stackdriver.NewExporter(stackdriver.Options{
		ProjectID: projectID,
	})
	if err != nil {
		return err
	}
	// It is imperative to invoke flush before your main function exits
	defer sd.Flush()

	// Start the metrics exporter
	sd.StartMetricsExporter()
	defer sd.StopMetricsExporter()

	// Create a session.
	req := &sppb.CreateSessionRequest{Database: db}
	session, err := client.CreateSession(ctx, req)
	if err != nil {
		return err
	}
	// Delete the session when the function returns to avoid leaking sessions.
	defer client.DeleteSession(ctx, &sppb.DeleteSessionRequest{Name: session.Name})

	// Execute a SQL query and retrieve the GFE server-timing header in gRPC metadata.
	req2 := &sppb.ExecuteSqlRequest{
		Session: session.Name,
		Sql:     `SELECT SingerId, AlbumId, AlbumTitle FROM Albums`,
	}
	var md metadata.MD
	resultSet, err := client.ExecuteSql(ctx, req2, gax.WithGRPCOptions(grpc.Header(&md)))
	if err != nil {
		return err
	}
	for _, row := range resultSet.GetRows() {
		for _, value := range row.GetValues() {
			fmt.Fprintf(w, "%s ", value.GetStringValue())
		}
		fmt.Fprintf(w, "\n")
	}

	// The format is: "server-timing: gfet4t7; dur=[GFE latency in ms]"
	srvTimingHeaders := md.Get("server-timing")
	if len(srvTimingHeaders) == 0 {
		return fmt.Errorf("server-timing header is missing from the response metadata")
	}
	gfeLtcy, err := strconv.Atoi(strings.TrimPrefix(srvTimingHeaders[0], "gfet4t7; dur="))
	if err != nil {
		return err
	}
	// Record GFE t4t7 latency with OpenCensus.
	ctx, err = tag.New(ctx, tag.Insert(KeyMethod, "ExecuteSql"))
	if err != nil {
		return err
	}
	stats.Record(ctx, GFELatencyMs.M(int64(gfeLtcy)))

	return nil
}

Visualize GFE latency

After retrieving the metrics, you can visualize GFE latency in Cloud Monitoring.

Here's an example of a graph that illustrates the 5th percentile latency for the GFE latency metric. To change the percentile latency to either the 50th or the 99th percentile, use the Aggregator menu.

The program creates an OpenCensus view called gfe_latency. This string becomes part of the name of the metric when it's exported to Cloud Monitoring.

Cloud Monitoring GFE latency.

Spanner API request latency

Spanner API request latency is the length of time (in seconds) between the first byte of the request that the Spanner API front end receives and the last byte of the response that the Spanner API front end sends. The latency includes the time needed for processing API requests in both the Spanner back end and the API layer. However, this latency doesn't include network or reverse-proxy overhead between Spanner clients and servers.

Spanner architecture diagram for Spanner API request latency.

The spanner.googleapis.com/api/request_latencies metric captures and exposes Spanner API front end latency for Spanner requests.

Capture Spanner API request latency

By default, this latency is available as part of Cloud Monitoring metrics. You don't have to do anything to capture and export it.

Visualize Spanner API request latency

You can use the Metrics Explorer charting tool to visualize the graph for the spanner.googleapis.com/api/request_latencies metric in Cloud Monitoring.
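
For example, selecting the Cloud Spanner Instance resource type and this metric corresponds to a Monitoring filter such as the following (a sketch; add grouping, alignment, and percentile aggregation as needed):

resource.type="spanner_instance" AND metric.type="spanner.googleapis.com/api/request_latencies"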

Here's an example of a graph that illustrates the 5th percentile latency for the Spanner API request latency metric. To change the percentile latency to either the 50th or the 99th percentile, use the Aggregator menu.

Cloud Monitoring API request latency.

Query latency

Query latency is the length of time (in milliseconds) that it takes to run SQL queries in the Spanner database.

Spanner architecture diagram for query latency.

Query latency is available for both the executeSql and the executeStreamingSql APIs.

If the QueryMode parameter is set to PROFILE, then Spanner's ResultSetStats are available in the responses.

Setting QueryMode to PROFILE returns the query plan and execution statistics along with the results. In addition, ResultSetStats includes the elapsed time for running queries in the Spanner database.
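
As an illustration, the following Go sketch sets QueryMode to PROFILE on a raw ExecuteSqlRequest and reads the elapsed_time field from the returned statistics. It reuses the low-level API client and a session created as in the GFE example above; the executeProfiledQuery name and its parameters are illustrative, and the capture samples below use the higher-level client libraries instead.

import (
	"context"
	"fmt"
	"io"

	spanner "cloud.google.com/go/spanner/apiv1"
	sppb "google.golang.org/genproto/googleapis/spanner/v1"
)

// executeProfiledQuery runs a query in PROFILE mode so that the response
// includes ResultSetStats, and prints the reported elapsed_time.
func executeProfiledQuery(ctx context.Context, w io.Writer, client *spanner.Client, sessionName string) error {
	req := &sppb.ExecuteSqlRequest{
		Session:   sessionName,
		Sql:       `SELECT SingerId, AlbumId, AlbumTitle FROM Albums`,
		QueryMode: sppb.ExecuteSqlRequest_PROFILE,
	}
	resultSet, err := client.ExecuteSql(ctx, req)
	if err != nil {
		return err
	}
	// elapsed_time is reported as a string such as "1.3 msecs".
	elapsed := resultSet.GetStats().GetQueryStats().GetFields()["elapsed_time"].GetStringValue()
	fmt.Fprintf(w, "elapsed_time: %s\n", elapsed)
	return nil
}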

Capture query latency

You can capture query latency for the following languages:

Java

private static final String MILLISECOND = "ms";
static final List<Double> RPC_MILLIS_BUCKET_BOUNDARIES =
    Collections.unmodifiableList(
        Arrays.asList(
            0.0, 0.01, 0.05, 0.1, 0.3, 0.6, 0.8, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0, 13.0,
            16.0, 20.0, 25.0, 30.0, 40.0, 50.0, 65.0, 80.0, 100.0, 130.0, 160.0, 200.0, 250.0,
            300.0, 400.0, 500.0, 650.0, 800.0, 1000.0, 2000.0, 5000.0, 10000.0, 20000.0, 50000.0,
            100000.0));
static final Aggregation AGGREGATION_WITH_MILLIS_HISTOGRAM =
    Distribution.create(BucketBoundaries.create(RPC_MILLIS_BUCKET_BOUNDARIES));

static MeasureDouble QUERY_STATS_ELAPSED =
    MeasureDouble.create(
        "cloud.google.com/java/spanner/query_stats_elapsed",
        "The execution of the query",
        MILLISECOND);

// Register the view. It is imperative that this step exists,
// otherwise recorded metrics will be dropped and never exported.
static View QUERY_STATS_LATENCY_VIEW = View
    .create(Name.create("cloud.google.com/java/spanner/query_stats_elapsed"),
        "The execution of the query",
        QUERY_STATS_ELAPSED,
        AGGREGATION_WITH_MILLIS_HISTOGRAM,
        Collections.emptyList());

static ViewManager manager = Stats.getViewManager();
private static final StatsRecorder STATS_RECORDER = Stats.getStatsRecorder();

static void captureQueryStatsMetric(DatabaseClient dbClient) {
  manager.registerView(QUERY_STATS_LATENCY_VIEW);

  // Enable OpenCensus exporters to export metrics to Cloud Monitoring.
  // Exporters use Application Default Credentials to authenticate.
  // See https://developers.google.com/identity/protocols/application-default-credentials
  // for more details.
  try {
    StackdriverStatsExporter.createAndRegister();
  } catch (IOException | IllegalStateException e) {
    System.out.println("Error during StackdriverStatsExporter");
  }

  try (ResultSet resultSet = dbClient.singleUse()
      .analyzeQuery(Statement.of("SELECT SingerId, AlbumId, AlbumTitle FROM Albums"),
          QueryAnalyzeMode.PROFILE)) {

    while (resultSet.next()) {
      System.out.printf(
          "%d %d %s", resultSet.getLong(0), resultSet.getLong(1), resultSet.getString(2));
    }
    Value value = resultSet.getStats().getQueryStats()
        .getFieldsOrDefault("elapsed_time", Value.newBuilder().setStringValue("0 msecs").build());
    double elapsedTime = Double.parseDouble(value.getStringValue().replaceAll(" msecs", ""));
    STATS_RECORDER.newMeasureMap()
        .put(QUERY_STATS_ELAPSED, elapsedTime)
        .record();
  }
}

Go


import (
	"context"
	"fmt"
	"io"
	"strconv"
	"strings"

	"cloud.google.com/go/spanner"
	"google.golang.org/api/iterator"

	"contrib.go.opencensus.io/exporter/stackdriver"
	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

// OpenCensus Tag, Measure and View.
var (
	QueryStatsElapsed = stats.Float64("cloud.google.com/go/spanner/query_stats_elapsed",
		"The execution of the query", "ms")
	QueryStatsLatencyView = view.View{
		Name:        "cloud.google.com/go/spanner/query_stats_elapsed",
		Measure:     QueryStatsElapsed,
		Description: "The execution of the query",
		Aggregation: view.Distribution(0.0, 0.01, 0.05, 0.1, 0.3, 0.6, 0.8, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0, 13.0,
			16.0, 20.0, 25.0, 30.0, 40.0, 50.0, 65.0, 80.0, 100.0, 130.0, 160.0, 200.0, 250.0,
			300.0, 400.0, 500.0, 650.0, 800.0, 1000.0, 2000.0, 5000.0, 10000.0, 20000.0, 50000.0,
			100000.0),
		TagKeys: []tag.Key{}}
)

func queryWithQueryStats(w io.Writer, db string) error {
	projectID, _, _, err := parseDatabaseName(db)
	if err != nil {
		return err
	}

	ctx := context.Background()
	client, err := spanner.NewClient(ctx, db)
	if err != nil {
		return err
	}
	defer client.Close()

	// Register OpenCensus views.
	err = view.Register(&QueryStatsLatencyView)
	if err != nil {
		return err
	}

	// Create OpenCensus Stackdriver exporter.
	sd, err := stackdriver.NewExporter(stackdriver.Options{
		ProjectID: projectID,
	})
	if err != nil {
		return err
	}
	// It is imperative to invoke flush before your main function exits
	defer sd.Flush()

	// Start the metrics exporter
	sd.StartMetricsExporter()
	defer sd.StopMetricsExporter()

	// Execute a SQL query and get the query stats.
	stmt := spanner.Statement{SQL: `SELECT SingerId, AlbumId, AlbumTitle FROM Albums`}
	iter := client.Single().QueryWithStats(ctx, stmt)
	defer iter.Stop()
	for {
		row, err := iter.Next()
		if err == iterator.Done {
			// Record query execution time with OpenCensus.
			elapsedTime, ok := iter.QueryStats["elapsed_time"].(string)
			if !ok {
				return fmt.Errorf("elapsed_time is missing from the query stats")
			}
			elapsedTimeMs, err := strconv.ParseFloat(strings.TrimSuffix(elapsedTime, " msecs"), 64)
			if err != nil {
				return err
			}
			stats.Record(ctx, QueryStatsElapsed.M(elapsedTimeMs))
			return nil
		}
		if err != nil {
			return err
		}
		var singerID, albumID int64
		var albumTitle string
		if err := row.Columns(&singerID, &albumID, &albumTitle); err != nil {
			return err
		}
		fmt.Fprintf(w, "%d %d %s\n", singerID, albumID, albumTitle)
	}
}

Visualize query latency

After retrieving the metrics, you can visualize query latency in Cloud Monitoring.

Here's an example of a graph that illustrates the 5th percentile latency for the query latency metric. To change the percentile latency to either the 50th or the 99th percentile, use the Aggregator menu.

The program creates an OpenCensus view called query_stats_elapsed. This string becomes part of the name of the metric when it's exported to Cloud Monitoring.

Cloud Monitoring query latency.

Troubleshoot latency issues

By using Cloud Monitoring to capture and visualize client round-trip, GFE, Spanner API request, and query latencies, you can compare them, side-by-side. This helps you identify the source of the latency.

The following examples describe common latency symptoms and their likely causes:

  • High client round-trip latency, but low GFE latency and low Spanner API request latency: there might be an issue in the application code, or a networking issue between the client and the regional GFE. If your application has a performance issue that causes some code paths to be slow, then the client round-trip latency for each API request might increase. The latency can also be caused by issues in the computing infrastructure on the client side (for example, VM, CPU, or memory utilization, connections, file descriptors, and so on).

  • High client round-trip latency and high Spanner API request latency:
      1. Some of your queries cause higher latencies because they scan and fetch a lot of data. To analyze the query profile so that you can optimize your queries, see Tuning a query using the query plan visualizer.
      2. If the query-per-second rate is low, then the latency might come from outliers. To reduce the impact of the outliers, see Configuring custom timeouts and retries.

  • High GFE latency, but low Spanner API request latency:
      1. Accessing a database from another region can lead to high GFE latency but lower Spanner API request latency. For example, traffic from a client in the us-east1 region to an instance in the us-central1 region might show a high GFE latency but a lower Spanner API request latency.
      2. Check the Google Cloud Status Dashboard for ongoing networking issues in your region. If there aren't any issues, then open a support case and include this information so that support engineers can help with troubleshooting the GFE.

  • High Spanner API request latency, but low query latency: check the Google Cloud Status Dashboard for ongoing networking issues in your region. If there aren't any issues, then open a support case and include this information so that support engineers can help with troubleshooting the Spanner API front end.

  • High query latency: to view metrics for your queries and to troubleshoot your query latency issue, see Query statistics. For better performance, optimize your schema, indexing, and queries. For more information, see SQL best practices and Troubleshooting performance regressions.

What's next