Performance Dashboard overview

Performance Dashboard gives you visibility into the performance of the entire Google Cloud network, as well as into the performance of your project's resources.

With these performance-monitoring capabilities, you can distinguish between a problem in your application and a problem in the underlying Google Cloud network. You can also investigate historical network performance problems.

Performance Dashboard also exports data to Cloud Monitoring. You can use Monitoring to query the data and get access to additional information. For details, see Performance Dashboard metrics reference.

Project performance view

In the project performance view, Performance Dashboard shows packet loss or latency metrics only for zones where your project has virtual machine (VM) instances. The metrics cover, for example, VM-to-VM traffic and VM-to-internet traffic. You can select up to five regions where the workloads are deployed. The dashboard lets you see and understand the following:

  • Packet loss summary
  • Packet loss average between region pairs of the regions selected
  • Packet loss average between zone pairs of the regions selected
  • Latency summary
  • Latency median between region pairs of the regions selected
  • Latency median between zone pairs of the regions selected

Traffic between VM instances

Performance Dashboard shows packet loss and latency metrics (in summary charts and heatmap views) for zones where you have Compute Engine virtual machine (VM) instances. It provides current data and metrics for the past six weeks. For example, suppose your project has a Virtual Private Cloud (VPC) network with VMs in zones A and B. In that case, Performance Dashboard provides packet loss and latency metrics for your project between those two zones.

You can also view the aggregated latency metrics from a sample of your actual VM-to-VM traffic in a tabular view depending on the selected time period. The latency details table lists the VMs and their corresponding latency details.

Traffic between Google Cloud and internet locations

Performance Dashboard shows latency metrics for regions where you have Compute Engine virtual machine (VM) instances and the internet locations of the end devices that communicate with the VMs. It provides the current latency metrics and six weeks' worth of historical data. For example, suppose your project has a Virtual Private Cloud network with VMs in region A that receive traffic from clients in cities X and Y. In that case, Performance Dashboard provides latency metrics for your project between region A and cities X and Y.

To view project metrics, click View project performance at the top of the Performance Dashboard page. For more examples and details about what is measured, see Metrics.

Google Cloud performance view

In the Google Cloud performance view, Performance Dashboard shows zone-to-zone packet loss and latency metrics across all of Google Cloud. The metrics cover, for example, VM-to-VM traffic and Google Cloud-to-internet traffic. The dashboard shows the status of the Google Cloud network and lets you compare the performance across all of Google Cloud to the performance observed in your projects. You can select up to five Google Cloud regions to see and understand the following:

  • Packet loss summary. This view can show up to 50 zone pairs with VM-to-VM packet loss in all of Google Cloud.
  • Packet loss average between zone pairs.
  • Latency summary. This view can show up to 50 zone pairs with VM-to-VM round trip time (RTT) in all of Google Cloud.
  • Median latency between zone pairs.

Traffic between VM instances

Performance Dashboard shows the packet loss and latency metrics across all of Google Cloud. These metrics can help you understand whether issues evident in the per-project dashboard are unique to your project.

The Google Cloud performance view shows time series data for up to 50 zone pairs for the selected time window, which by default is one hour.

You can view network performance for any Google Cloud zone pair, even if your project is not deployed in those zones. You can view the performance at both the region level and the zone level. A summary time series chart shows up to 50 zone pairs with the highest aggregated VM-to-VM packet loss or latency across all of Google Cloud.

Traffic between Google Cloud and internet locations

Performance Dashboard shows latency metrics between VMs across all Google Cloud regions and internet endpoints. You can aggregate the traffic to city, geographic region, and country levels. You can view the latency metrics that correspond to specific region-geographic location pairs if there is sufficient Google Cloud traffic for that pair.

These metrics can help you assess whether issues apparent in the per-project dashboard are unique to your project. The global metrics can also help you plan future deployments.

To view the Google Cloud performance metrics, click View performance for all of Google Cloud at the top of the Performance Dashboard page. To view the Google Cloud performance metrics from the project performance view, you can hold the pointer over the specific zone pairs. For more examples and details about what is measured, see Metrics.

Metrics

Performance Dashboard provides two kinds of metrics: packet loss and latency (round-trip time, or RTT). To get packet loss metrics for your project, you need a sufficient number of VMs in the project. To get latency metrics, you need a sufficient amount of traffic. Other than that, Performance Dashboard requires no setup.

The following sections describe both metrics in more detail.

Packet loss

Packet loss metrics show the results of active probing between the following:

  • VMs within a single VPC network.

  • VMs in peered VPC networks, when one or both networks sit within your project. If the peered networks sit in different projects, packet loss is visible in the destination project.

  • VMs in a Shared VPC network that is used by your project. Packet loss between two projects that use a Shared VPC network is visible in the destination service project.

For example, suppose project A includes two VPC networks: network A, which has VMs only in zone A, and network M, which has VMs only in zone M. If those two networks are peered, project A's Performance Dashboard shows the packet loss data for the A/M zone pair. If the networks are not peered, Performance Dashboard does not show the packet loss metric for that zone pair.

On the other hand, suppose that these two networks are not in the same project. That is, suppose network A is part of project A, and network M is part of project M. When the networks are peered, project M's Performance Dashboard shows packet loss data for situations where zone M is the destination zone. Conversely, when zone A is the destination zone, the packet loss data is visible only to project A. If the networks are not peered, neither project's Performance Dashboard shows packet loss data for the zone pair.
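
The visibility rules in the two examples above can be sketched as a small function. This is only an illustrative model, not how Performance Dashboard is implemented: the `Endpoint` type and `visible_in_project` function are hypothetical names, and Shared VPC is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical probe endpoint: the project, VPC network, and zone of a VM."""
    project: str
    network: str
    zone: str


def visible_in_project(src: Endpoint, dst: Endpoint, peered: bool) -> Optional[str]:
    """Return the project whose dashboard shows src -> dst packet loss, or None.

    Assumption: `peered` states whether the two networks are VPC peers.
    """
    same_network = (src.project, src.network) == (dst.project, dst.network)
    if not same_network and not peered:
        # Different networks that are not peered: no packet loss data is shown.
        return None
    # Same network or peered networks: the destination project sees the data.
    return dst.project


zone_a = Endpoint("project-a", "network-a", "zone-a")
zone_m = Endpoint("project-m", "network-m", "zone-m")
print(visible_in_project(zone_a, zone_m, peered=True))   # project-m
print(visible_in_project(zone_m, zone_a, peered=True))   # project-a
print(visible_in_project(zone_a, zone_m, peered=False))  # None
```

When both networks sit in the same project, the destination project is that project, which matches the first example.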

The data gathered through all the probes is aggregated in Performance Dashboard. That is, Performance Dashboard does not let you isolate data about intra-project packet loss versus other types (such as packet loss related to a peered VPC network in another project). However, you can use Monitoring to drill down and see results that are more granular. For details, see Performance Dashboard metrics reference.

Performance Dashboard does not send probes over Cloud VPN connections.

Methodology

Performance Dashboard runs workers on the physical hosts that house your VMs. These workers insert and receive probe packets that run on the same network as your traffic. Because the workers run on the physical hosts and not on your VMs, these workers do not consume VM resources, and the traffic is not visible on your VMs.

The probes cover the entire mesh of VMs that can communicate with each other, which is not necessarily the same as your traffic pattern. Therefore, you might see indications of packet loss in Performance Dashboard, but no evidence of packet loss in your application.

For each probed VM, Performance Dashboard tries to reach the VM by using both its internal IP address and its external IP address (if one exists). The probes do not leave Google Cloud, but by using external IP addresses, Performance Dashboard can cover part of the path that would be used by external traffic, such as traffic coming from the internet.

Packet loss for internal IP addresses is measured by using UDP packets, and packet loss for external IP addresses is measured by using TCP packets.

Metric availability and confidence levels

Performance Dashboard probes a subset of all VM-to-VM pairs in the network. The data gathered is then used to estimate the packet loss that you might experience. Google's confidence in the data depends on the probing rate, and the probing rate depends on the number of VMs that you have in each zone, as well as the number of zones where you have VMs deployed. For example, having 10 VMs in two zones generates more confidence than having 10 VMs in 10 zones.

All VMs, including those created by Google Kubernetes Engine (GKE), count toward the total number of VMs.

The varying levels of confidence are described in the following table. Lower levels of confidence are flagged in the heatmap with an asterisk (*) or N/A.

| Level | Required number of VMs in each zone | What Performance Dashboard shows on the heatmap |
|---|---|---|
| 95% confidence | 10 VMs × the number of zones in the project. For example, if you have 12 zones in your project, you must have 120 VMs in each zone. | A measurement without any additional notations |
| 90% confidence | 2.5 VMs × the number of zones in the project. For example, if you have 12 zones in your project, you must have 30 VMs in each zone. | A measurement without any additional notations |
| Low confidence | | A measurement with an asterisk |
| Not enough probes to have meaningful data | | N/A |
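
The per-zone thresholds in the table can be worked out for any number of zones. The following is a minimal sketch: the function names are hypothetical, and rounding a fractional threshold up to the next whole VM is an assumption. The table does not state where low confidence ends and N/A begins, so the sketch does not try to separate the two.

```python
import math


def required_vms_per_zone(num_zones: int) -> dict:
    """VMs needed in each zone for the 95% and 90% confidence levels."""
    if num_zones < 1:
        raise ValueError("num_zones must be at least 1")
    return {
        "95%": 10 * num_zones,
        # Assumption: a fractional result is rounded up to a whole VM.
        "90%": math.ceil(2.5 * num_zones),
    }


def confidence_level(vms_per_zone: int, num_zones: int) -> str:
    """Classify a deployment that has the same number of VMs in every zone."""
    required = required_vms_per_zone(num_zones)
    if vms_per_zone >= required["95%"]:
        return "95% confidence"
    if vms_per_zone >= required["90%"]:
        return "90% confidence"
    # The source gives no numeric boundary between these two outcomes.
    return "low confidence or N/A"


print(required_vms_per_zone(12))   # {'95%': 120, '90%': 30}
print(confidence_level(30, 12))    # 90% confidence
```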

The Google Cloud packet loss metrics are always available. An asterisk (*) is displayed if there are fewer than 400 probes per minute.
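
As a rough illustration of the asterisk rule, the sketch below formats a packet loss value and appends the marker when the probe rate is under 400 per minute. The function name and the two-decimal format are assumptions. Packet loss is computed as lost probes divided by sent probes, which is the usual definition rather than something this page specifies.

```python
LOW_PROBE_THRESHOLD = 400  # probes per minute, from the text above


def format_loss_cell(probes_sent_per_min: int, probes_lost_per_min: int) -> str:
    """Return a packet loss percentage, marked with '*' at low probe rates."""
    if probes_sent_per_min <= 0:
        raise ValueError("at least one probe is needed to compute packet loss")
    if not 0 <= probes_lost_per_min <= probes_sent_per_min:
        raise ValueError("lost probes must be between 0 and probes sent")
    loss_pct = 100.0 * probes_lost_per_min / probes_sent_per_min
    marker = "*" if probes_sent_per_min < LOW_PROBE_THRESHOLD else ""
    return f"{loss_pct:.2f}%{marker}"


print(format_loss_cell(1000, 5))  # 0.50%
print(format_loss_cell(200, 1))   # 0.50%*
```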

Project-specific latency

Latency metrics are measured by using customer traffic between the following:

  • VMs within a single VPC network
  • VMs between peered VPC networks, if the networks sit in the same project
  • VMs and internet endpoints

Additionally, the Performance Dashboard for a service project within a Shared VPC network shows data only for the zones within the service project. That is, suppose a VM in zone A and service project A uses the host project to communicate with a VM in zone B and service project B. Measurements about that traffic are not available to either service project or the host project.

Google Cloud latency

Latency metrics are measured by using actual customer traffic between the following:

  • VMs within a single VPC network
  • VMs between peered VPC networks
  • VMs and internet endpoints

Methodology for project and Google Cloud latency

Latency is measured by using TCP packets.

Based on a sample of your actual traffic, latency is calculated as the time that elapses between sending a TCP sequence number (SEQ) and receiving the corresponding ACK. This interval includes both the network RTT and any TCP stack-related delay. The dashboard shows latency as the median of all relevant measurements.
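
The calculation described above can be sketched in a few lines. This is an illustrative model only: the function names and the millisecond timestamps are assumptions, and the sample data is made up for the example.

```python
from statistics import median
from typing import Iterable, Tuple


def rtt_ms(seq_sent_ms: int, ack_received_ms: int) -> int:
    """Time between sending a SEQ and receiving its ACK, in milliseconds."""
    if ack_received_ms < seq_sent_ms:
        raise ValueError("ACK cannot arrive before the SEQ was sent")
    return ack_received_ms - seq_sent_ms


def median_latency_ms(samples: Iterable[Tuple[int, int]]) -> float:
    """Median RTT over sampled (seq_sent_ms, ack_received_ms) pairs."""
    rtts = [rtt_ms(sent, acked) for sent, acked in samples]
    if not rtts:
        raise ValueError("no samples to aggregate")
    return median(rtts)


# Three sampled SEQ/ACK pairs with RTTs of 12 ms, 10 ms, and 45 ms.
samples = [(0, 12), (100, 110), (200, 245)]
print(median_latency_ms(samples))  # 12
```

Using the median rather than the mean keeps a single slow sample, such as the 45 ms one here, from dominating the reported latency.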

The latency metric is based on the same data source and sampling methodology as VPC Flow Logs.

The project-specific latency is based on samples from your project. The Google Cloud latency is based on samples from all of Google Cloud.

The global latency metrics are derived from passive sampling of TCP traffic headers, and not through active probing from Google Cloud to internet endpoints.

Latency metric anomalies

Note the following latency metric anomalies:

  • For low-rate environments, Network Intelligence Center uses sixty-second probes for latency metrics. Therefore, RTT metrics based on packet sampling might report false high latency levels when TCP-based services return a delayed application-level response. You can usually recognize the false high RTT levels by checking if they correspond with application-level delays.

    Although the TCP-based service responds quickly with an ACK, the sampling misses the ACK and counts a later data response as the closing ACK to a much earlier SEND, which distorts the overall RTT measurement. In these cases, you can disregard the RTT metrics.

  • Sometimes, the project-specific latency data doesn't align with the global latency data. Such misalignment can happen if the global dataset also incorporates other network paths with significantly different latencies relative to the network path used by the specific project.
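
The first anomaly can be illustrated with a toy model of packet sampling. This is not the actual sampling logic: `observed_rtt_ms` is a hypothetical function that pairs a send with the first sampled packet that acknowledges it.

```python
from typing import List, Optional, Tuple


def observed_rtt_ms(send_ms: int, acks: List[Tuple[int, bool]]) -> Optional[int]:
    """RTT as seen by a sampler that might miss some packets.

    `acks` lists (arrival_ms, was_sampled) for every packet that
    acknowledges the send, including later data responses.
    """
    for arrival_ms, was_sampled in sorted(acks):
        if was_sampled and arrival_ms >= send_ms:
            return arrival_ms - send_ms
    return None  # No acknowledging packet was sampled.


# The service ACKs after 20 ms and sends its data response after 1500 ms.
print(observed_rtt_ms(0, [(20, True), (1500, True)]))   # 20
# If the sampler misses the quick ACK, the data response is counted instead.
print(observed_rtt_ms(0, [(20, False), (1500, True)]))  # 1500
```

In the second call the reported value tracks the application's response delay rather than the network, which is why such readings line up with application-level delays.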

Metric availability

The Google Cloud latency metric is always available. The per-project latency metric is available only if TCP traffic is around 1000 packets per minute or higher.

Metrics summary table

The following table summarizes the probing methods and protocols used for reporting packet loss and latency metrics.

| | Packet loss | Latency |
|---|---|---|
| Probing method | Active probing (synthetic VM traffic) | Passive probing (actual VM traffic) |
| Protocol | UDP (internal IP address), TCP (external IP address) | TCP (internal/external IP addresses) |

Latency views

The latency details for the Internet to Google Cloud traffic type are available in three views: Table view, Map view, and Timeline view.

Table view

The Table view shows the median RTT between the selected geographic areas and the regions that contain VM instances in your project. The table includes the following details:

  • Country: The name of the country.
  • Cities: The number of cities. You can view the latency details of each specific city in the country details graph.
  • Destination regions: The number of destination regions with traffic for users from a given country.
  • Median latency: The median RTT between the country and regions in milliseconds.

Map view

The Map view displays the geographic locations (metro areas or cities) and Google Cloud regions.

  • You can view the median latency of specific locations and Google Cloud regions.
  • You can select a Google Cloud region and view the locations with traffic to the selected region.
  • You can view location-specific details in a latency graph in the sidebar.
  • You can search for locations by using the search box in the map.
  • Locations are shown on the map as circles that are color-graded in shades of blue to indicate ranges of median latency. The darker the shade of blue, the greater the latency between that location and a given Google Cloud region.

    Ranges of median latency on the map.

Timeline view

The Timeline view shows the median RTT between the selected geographic areas and Google Cloud regions. It provides the current latency metrics and six weeks' worth of historical data. You can use the filters to further aggregate the traffic to city, geographic region, and country levels. You can view the latency metrics that correspond to specific region-geographic location pairs only if there is sufficient Google Cloud traffic for that pair.

Permissions

To access Performance Dashboard data, either through the Google Cloud console or through Monitoring, you must have the monitoring.timeSeries.list permission. This permission is included in the Monitoring roles listed in the following table.

| Role name | Role ID |
|---|---|
| Monitoring Viewer | roles/monitoring.viewer |
| Monitoring Editor | roles/monitoring.editor |
| Monitoring Admin | roles/monitoring.admin |

For information about other roles that include the monitoring.timeSeries.list permission, see Understanding roles.

Change project scope

To make use of an existing metrics scope and monitor multiple Google Cloud projects in a single view, select the scoping project by using the Google Cloud console project picker or the Change Scope button. You can also select a single monitoring project by using these options. For more information, see Performance Dashboard metrics reference.

You can create alerts for high packet loss based on predefined conditions. The alerting policy for packet loss is triggered when the packet loss exceeds 5% for 5 minutes for any region pair.
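
The predefined condition can be sketched as a check over per-minute samples. This is a simplified model, not the Monitoring alerting engine. It assumes one packet loss sample per minute for each region pair and reads "for 5 minutes" as five consecutive samples above the threshold. The function name is hypothetical, and the region names are used only as example data.

```python
from typing import Dict, List, Tuple

THRESHOLD_PCT = 5.0  # packet loss threshold from the text above
DURATION_MIN = 5     # minutes the threshold must be exceeded

RegionPair = Tuple[str, str]


def pairs_in_violation(loss_by_pair: Dict[RegionPair, List[float]]) -> List[RegionPair]:
    """Return region pairs whose loss stayed above the threshold long enough.

    Each list holds per-minute packet loss percentages, oldest first.
    """
    violating = []
    for pair, series in loss_by_pair.items():
        run = 0  # consecutive minutes above the threshold
        for loss_pct in series:
            run = run + 1 if loss_pct > THRESHOLD_PCT else 0
            if run >= DURATION_MIN:
                violating.append(pair)
                break
    return violating


samples = {
    ("us-east1", "us-west1"): [6.0, 7.0, 8.0, 6.0, 9.0],
    ("us-east1", "europe-west1"): [6.0, 7.0, 1.0, 8.0, 9.0, 6.0],
}
print(pairs_in_violation(samples))  # [('us-east1', 'us-west1')]
```

The second pair never triggers because the dip to 1% resets the count before five consecutive minutes are reached.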

Alerting gives you timely awareness of problems in your cloud applications so that you can resolve the problems quickly. An alerting policy describes the circumstances under which you want to be alerted and how you want to be notified. For more information about creating and managing alerting policies, see Introduction to alerting.
