Optimizing for network latency

This document lists best practices for using the Cloud Healthcare API. The guidelines on this page are designed to help you achieve greater efficiency and accuracy, and to get optimal response times from the service.

Understanding latency performance

The performance of the Cloud Healthcare API is measured by the latency between:

  1. When you send a request to the Cloud Healthcare API.
  2. When you receive a full response to the request.

Latency comprises three components:

  • Round-trip time (RTT)
  • Server processing latency
  • Server throughput

The geographical distance between you and the server you are making requests to can have a significant impact on RTT and server throughput. The measured inter-region latency and throughput for Google Cloud networks can be found in a live dashboard. The dashboard shows the performance a client can expect from different locations when making requests to Cloud Healthcare API servers.

Measuring latency performance

The following tools and dashboards provide ways to measure the performance of requests to and from Cloud Healthcare API servers:

  • Google Cloud Console latency metrics: You can view the server-side latency of Cloud Healthcare API requests in the Google Cloud Console. For more information, see Google Cloud metrics.

  • Cloud Logging custom metrics: You can create distribution metrics using Logging. Distribution metrics let you configure and understand end-to-end latency in your applications. You can also monitor and report on any custom-defined latency measurements.

  • Chrome network panel: You can inspect network activity in Chrome DevTools to view the performance details of an HTTP request sent from a browser.

Reducing request latency

This section describes various methods of reducing the latency of requests sent to the Cloud Healthcare API.

Sending requests to the closest regional location

To get the best RTT and server throughput performance, send requests from the client to the closest Cloud Healthcare API regional location. See Regions for a list of available regions.

Compressing the response body

If a client has limited bandwidth, a simple way to reduce the bandwidth needed for each request is to enable gzip compression. gzip is a form of data compression: it typically reduces the size of a file. This allows the file to be transferred faster and stored using less space than if it were not compressed. Compressing a file can reduce both cost and transfer time.

Although enabling gzip compression requires additional CPU time to decompress the results, the bandwidth savings typically make it worthwhile. However, if limited bandwidth is not a concern, the extra CPU cost might outweigh the benefit.

To receive a gzip-encoded response, you must set an Accept-Encoding header in your request.

The following sample shows a properly formed HTTP header for enabling gzip compression:

Accept-Encoding: gzip

Sending warmup requests

When a client sends its first request to a Cloud Healthcare API server during a session, it performs TCP and TLS handshakes with the server to establish connections for HTTP requests. Subsequent requests can reuse these established connections, allowing the client to avoid the handshake overhead typically associated with a new connection. This results in better performance when sending requests. Sending one or more lightweight warmup requests at the start of a session ensures that these connections are already open before latency-sensitive traffic begins.

Sending requests concurrently with HTTP/1.1 or HTTP/2

To obtain the best performance for a series of requests, send the requests concurrently. Use the following guidelines when sending concurrent requests:

  • When sending concurrent requests, determine the optimal number of concurrent requests. The optimal number depends on several factors, including your hardware, your network capabilities, and how many requests you are sending. Conduct tests to find the optimal number.
  • Send requests from the client using HTTP/2 whenever possible. HTTP/2 provides better performance than HTTP/1.1 because HTTP/2 multiplexes multiple requests, whether sent sequentially or concurrently, over a single TCP connection. As a result, you can avoid the overhead of repeated TCP handshakes.
  • If it's not possible to use HTTP/2, use HTTP/1.1 with a persistent connection. You can avoid TCP handshake overhead if warmup requests have already been sent. Using a persistent connection might require you to manage an optimized connection with a connection pool for your HTTP library.

    For example, to set a connection pool with 20 concurrent requests using the Apache HttpClient library for Java, your code would include the following:

    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

    PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
    // Allow up to 20 concurrent requests per route (host).
    cm.setDefaultMaxPerRoute(20);
    // Cap the total number of pooled connections across all routes.
    cm.setMaxTotal(100);
    CloseableHttpClient HTTP_CLIENT = HttpClients.custom().setConnectionManager(cm).build();
    

    To set a connection pool with 20 concurrent requests using Node.js, your code would include the following:

    // Allow up to 20 concurrent sockets on the default HTTP and HTTPS agents.
    require('http').globalAgent.maxSockets = 20;
    require('https').globalAgent.maxSockets = 20;