Performance Dashboard overview

Performance Dashboard gives you visibility into the network performance underlying your Google Cloud project. It provides packet loss and latency metrics for zones where you have virtual machine (VM) instances.

Performance Dashboard provides current data, as well as metrics for the past six weeks. It illustrates metrics using summary charts and heatmap views for zones where you have VMs.

As one example, suppose your project has a VPC network with VMs in zones A and B. In that case, Performance Dashboard would provide data about packet loss and latency between those two zones. For more examples and details about what we measure, see Metrics.

Additionally, Performance Dashboard exports data to Cloud Monitoring. You can use Monitoring to query the data and get access to additional information. For details, see Viewing monitoring metrics.
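
For example, if you use the google-cloud-monitoring Python client, a query for this data might look like the following sketch. The project ID and metric type below are placeholders; check Viewing monitoring metrics for the actual metric names that Performance Dashboard exports.

    import time

    from google.cloud import monitoring_v3

    # Placeholder values: replace with your project ID and with an actual
    # Performance Dashboard metric type from the Viewing monitoring metrics page.
    PROJECT_ID = "my-project"
    METRIC_TYPE = "networking.googleapis.com/EXAMPLE_METRIC"

    client = monitoring_v3.MetricServiceClient()

    # Query the last hour of data.
    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
    )

    results = client.list_time_series(
        request={
            "name": f"projects/{PROJECT_ID}",
            "filter": f'metric.type = "{METRIC_TYPE}"',
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )

    for series in results:
        # Each series carries labels (such as source and destination zone)
        # and a list of timestamped points.
        print(series.metric.labels)
        for point in series.points:
            print(point.interval.end_time, point.value)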

With these performance-monitoring capabilities, you can distinguish between a problem in your application and a problem in the underlying Google Cloud network. You can also debug historical network performance problems.

Metrics

Performance Dashboard provides two kinds of metrics: packet loss and latency (Round Trip Time, or RTT). To get packet loss metrics, you need a sufficient number of VMs in the project, and to get latency metrics, you need a sufficient amount of traffic. Other than that, Performance Dashboard requires no setup.

The following sections describe both metrics in more detail.

Packet loss

Packet loss metrics show the results of active probing between the following:

  • VMs within a single VPC network.

  • VMs in peered VPC networks, when one or both networks sit within your project. If the peered networks sit in different projects, packet loss is visible in the destination project.

  • VMs in a Shared VPC network that is used by your project. Packet loss between two projects that use a Shared VPC network is visible in the destination service project.

For example, suppose project A includes two VPC networks: network A, which has VMs only in zone A, and network M, which has VMs only in zone M. If those two networks are peered, project A's Performance Dashboard shows packet loss data for the A-M zone pair. If the networks are not peered, Performance Dashboard does not show packet loss data for that zone pair.

On the other hand, suppose that these two networks are not in the same project. That is, suppose network A is part of project A, and network M is part of project M. When the networks are peered, project M's Performance Dashboard would show packet loss data for situations where zone M is the destination zone. (Conversely, when zone A is the destination zone, the packet loss data would be visible only to project A.) If the networks are not peered, neither project's Performance Dashboard shows packet loss data for the zone pair.

The data gathered through all probes is aggregated in Performance Dashboard. That is, Performance Dashboard does not let you isolate data about intra-project packet loss versus other types (such as packet loss related to a peered VPC network in another project). However, you can use Monitoring to drill down and see results that are more granular. For details, see Viewing monitoring metrics.

Performance Dashboard does not send probes over Cloud VPN connections.

Methodology

Performance Dashboard runs workers on the physical hosts that house your VMs. These workers insert and receive probe packets that travel over the same network as your traffic. Because the workers run on the physical hosts and not on your VMs, they do not consume VM resources, and the probe traffic is not visible on your VMs.

The probes cover the entire mesh of VMs that can communicate with each other, which is not necessarily the same as your traffic pattern. Therefore, you might see indications of packet loss in Performance Dashboard, but no evidence of packet loss in your application.

For each probed VM, we try to reach the VM by using both its internal IP address and its external IP address (if one exists). The probes do not leave Google Cloud, but by using external IP addresses, Performance Dashboard can cover part of the path that would be used by external traffic, such as traffic coming from the internet.

Packet loss for internal IP addresses is measured by using UDP packets, and packet loss for external IP addresses is measured by using TCP packets.
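
Conceptually, the packet loss reported for a source-destination zone pair is the fraction of probes that were sent but not answered. The following minimal sketch illustrates that calculation; it is not Google's implementation, and the probe counts are hypothetical.

    def packet_loss_percent(probes_sent: int, probes_received: int) -> float:
        """Illustrative packet loss calculation for one zone pair."""
        if probes_sent == 0:
            return float("nan")  # not enough probes to have meaningful data
        return 100.0 * (probes_sent - probes_received) / probes_sent

    # Example: 2,000 probes sent from zone A to zone B, 1,990 answered.
    print(packet_loss_percent(2000, 1990))  # 0.5 (% packet loss)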

Metric availability and confidence levels

Performance Dashboard probes a subset of all VM-VM pairs in the network. The data gathered is then used to estimate the packet loss that you might experience. Google's confidence in the data depends on the probing rate, and the probing rate depends on the number of VMs you have in each zone, as well as the number of zones where you have VMs deployed. For example, having 10 VMs in each of two zones would generate more confidence than having 10 VMs in each of 10 zones.

All VMs, including those created by Google Kubernetes Engine (GKE), count toward the total number of VMs.

The varying levels of confidence are described in the following table. Lower levels of confidence are flagged in the heatmap with an asterisk or NA.

Level | Required number of VMs in each zone | What Performance Dashboard shows on the heatmap
95% confidence | 10 VMs x the number of zones in the project. For example, if you have 12 zones in your project, you must have 120 VMs in each zone. | A measurement without any additional notations
90% confidence | 2.5 VMs x the number of zones in the project. For example, if you have 12 zones in your project, you must have 30 VMs in each zone. | A measurement without any additional notations
Low confidence | | A measurement with an asterisk
Not enough probes to have meaningful data | | NA
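
As a worked example of the thresholds in the preceding table, the following sketch maps a zone count and a per-zone VM count to a confidence tier. It mirrors the table's arithmetic only; treat it as illustrative rather than as the exact rule that Performance Dashboard applies.

    def confidence_level(zones_in_project: int, vms_in_zone: int) -> str:
        """Illustrative mapping of VM counts to the confidence tiers above."""
        if vms_in_zone >= 10 * zones_in_project:
            return "95% confidence"
        if vms_in_zone >= 2.5 * zones_in_project:
            return "90% confidence"
        return "low confidence (asterisk) or NA"

    # Example: a project with VMs in 12 zones.
    print(confidence_level(12, 120))  # 95% confidence
    print(confidence_level(12, 30))   # 90% confidence
    print(confidence_level(12, 5))    # low confidence (asterisk) or NA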

Latency

Latency metrics are measured using actual customer traffic between the following:

  • VMs within a single VPC network.

  • VMs in peered VPC networks, if the networks sit in the same project.

Additionally, the Performance Dashboard for a service project within a Shared VPC network shows data only for the zones within that service project. That is, suppose a VM in zone A that belongs to service project A uses the host project's network to communicate with a VM in zone B that belongs to service project B. Measurements for that traffic are not available to either service project or to the host project.

Performance Dashboard does not show latency data for the following:

  • Traffic between peered VPC networks, if one VPC network is in a different project.

  • Traffic sent through Cloud VPN connections.

Methodology

Latency is measured by using TCP packets.

Based on a sample of your actual traffic, latency is calculated as the time that elapses between sending a TCP sequence number (SEQ) and receiving the corresponding acknowledgment (ACK). This measurement includes the network RTT as well as TCP stack-related delay. The UI shows latency as the median of all relevant measurements.
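
The following minimal sketch illustrates that calculation with hypothetical send and acknowledgment timestamps; it is a conceptual illustration only, not the dashboard's implementation.

    import statistics

    # Hypothetical samples: (time the SEQ was sent, time the matching ACK arrived),
    # in seconds, taken from sampled TCP traffic between two zones.
    samples = [
        (100.000, 100.034),
        (101.250, 101.281),
        (102.500, 102.539),
        (103.750, 103.770),
    ]

    # Each sample's latency is the elapsed time between SEQ and ACK; it includes
    # the network RTT plus TCP stack-related delay.
    latencies_ms = [(ack - seq) * 1000 for seq, ack in samples]

    # The reported value is the median of all relevant measurements.
    print(f"median latency: {statistics.median(latencies_ms):.1f} ms")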

The latency metric is based on the same data source and sampling methodology as VPC Flow Logs.

Metric availability

The latency metric is available only when TCP traffic is approximately 1,000 packets per minute or higher.

Metrics summary table

The following table summarizes the probing methods and protocols used for reporting packet loss and latency metrics.

 | Packet loss | Latency
Probing method | Active probing (synthetic VM traffic) | Passive probing (actual VM traffic)
Protocol | UDP (internal IP), TCP (external IP) | TCP (internal/external IP)

Although the preceding table references external IP addresses, the Performance Dashboard UI shows data only about internal traffic. However, you can use Monitoring to find aggregated data about traffic that your VMs receive from external sources.

Permissions

To access Performance Dashboard data, either through the console or through Monitoring, you must have the monitoring.timeSeries.list permission. This permission is included in the Monitoring roles listed in the following table.

Role name | Role ID
Monitoring Viewer | roles/monitoring.viewer
Monitoring Editor | roles/monitoring.editor
Monitoring Admin | roles/monitoring.admin

For information about other roles that include the monitoring.timeSeries.list permission, see Understanding roles.
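
If you want to verify programmatically that your credentials hold this permission, you can call the Resource Manager testIamPermissions method. The following sketch assumes the google-cloud-resource-manager Python client and uses a placeholder project ID.

    from google.cloud import resourcemanager_v3

    client = resourcemanager_v3.ProjectsClient()
    response = client.test_iam_permissions(
        request={
            "resource": "projects/my-project",  # placeholder project ID
            "permissions": ["monitoring.timeSeries.list"],
        }
    )

    # The response lists only the permissions that the caller actually holds.
    if "monitoring.timeSeries.list" in response.permissions:
        print("You can read Performance Dashboard data.")
    else:
        print("Missing monitoring.timeSeries.list; request a Monitoring role.")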

Use cases

The following sections describe various ways that Performance Dashboard can help you.

Current performance diagnostics: Is it the network or the application?

Performance Dashboard gives you live visibility into the network performance underlying your project. It helps you determine whether application issues are the result of software or network problems. If you see significant packet loss or high latency, a Google Cloud network problem might be at least part of the issue. If packet loss and latency look normal, the problem likely lies in the application.

Scenario: Investigate an issue happening right now

You open Performance Dashboard and see a big spike in the Packet loss summary chart within the last hour. Because this chart summarizes packet loss across all zones, you don't yet know where the packet loss took place.

Current packet loss.

To investigate further, you click the time that the spike took place. Doing so opens a heatmap specific to the selected time.

You can adjust your selection by clicking and dragging your cursor on the time axis of the summary chart.

Click a time to see details for that time.

The heatmap displayed shows you data for the time that you selected. The heatmap squares are color-coded. As summarized in the legend to the left of the heatmap, each color reflects a different percentage of packet loss.

Packet loss heatmap for a specific time.

Packet loss is measured in one direction only. A square showing packet loss indicates packet loss from the zone indicated in the source axis to the zone indicated in the destination axis.

To see a chart specifically for a zone pair, you click the purple square for source zone europe-west1-b and destination zone us-central1-a. The details chart keeps your time selection from the previous page, indicated by the blue pin.

Packet loss for a selected timeframe.

You see two lines on the chart, one for each direction of data flow. In this example, the purple line shows packet loss for traffic from source zone europe-west1-b to destination zone us-central1-a. A red line shows the reverse direction, from source zone us-central1-a to destination zone europe-west1-b.

The chart shows that this spike in packet loss is an outlier. You can change the time window of data displayed for this zone pair by clicking the time selector on the top right. You can view up to six weeks of data. In this example, you click 7 days to see the packet loss trend for the selected zone pair.

Packet loss for a 7-day timeframe.

Historical performance diagnostics

Scenario: Investigate an issue that happened in the recent past

You are investigating an issue with latency that happened earlier this week. You use the historical performance data in the Performance Dashboard to examine the zone in question.

To change the view, you click the Latency tab.

Latency tab.

To adjust the time window of the Latency summary chart, you use the time selector on the top right. In this example, it is set to 1 hour. To see the heatmap for latency at a certain time, you click that time on the chart's time axis.

Select a time.

Because there are consistently higher values on the left of the chart, you click that part of the time axis to see the heatmap for latency at that time.

Latency heatmap.

The bright purple squares in the heatmap show that the latency was 261 milliseconds (ms) between zones asia-east1-b and europe-west2-c. To investigate further, you click a bright purple square. The latency details chart that opens keeps your time selection from the previous page, indicated by the blue pin.

Latency spike.

The blue line shows a spike in latency for traffic traveling from europe-west2-c to asia-east1-b.

To zoom in on the spike, you click and drag your mouse.

Click and drag to zoom.

You can now see that the spike lasted for two to three minutes, reaching a peak at 8:19 AM.

Latency spike details.

Data visualization when viewing historical data

When viewing data for a time period of one day or more, the chart provides additional data in a lighter color (a halo) around the primary data. Due to the longer time period, data is aggregated over longer intervals. For example, one hour of data is aggregated at one-minute intervals, and 24 hours of data is aggregated at five-minute intervals. The lighter color surrounding the line shows the range of values, from the lowest to the highest, that were aggregated to draw the primary line.
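
The following sketch illustrates that aggregation with hypothetical samples: points are grouped into fixed intervals, the primary line comes from a per-interval aggregate, and the halo spans each interval's lowest and highest values. It is only an illustration of the behavior described above, not the dashboard's code.

    from collections import defaultdict

    # Hypothetical (timestamp_seconds, latency_ms) samples.
    samples = [(0, 31.0), (90, 34.5), (170, 30.2), (320, 45.9), (400, 29.8)]

    INTERVAL = 300  # for example, 24 hours of data aggregated at five-minute intervals

    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts // INTERVAL * INTERVAL].append(value)

    for start in sorted(buckets):
        values = sorted(buckets[start])
        line = values[len(values) // 2]   # value used to draw the primary line
        lo, hi = values[0], values[-1]    # halo: lowest to highest aggregated value
        print(f"{start:>4}s  line={line:.1f} ms  halo={lo:.1f}-{hi:.1f} ms")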

Aggregated historical data.

What's next