Auditing network performance

Assume that you're a network administrator who supports a network that includes several load-balanced applications. You've been asked to audit the network configurations that support those applications to ensure that the configurations are consistent with the expected state of your network. By doing this audit, you can ensure that customers are getting the lowest possible latency to your applications.

The following use case demonstrates how Network Topology can help you check your existing configurations. For example, you can check that all client requests are being served by application instances from the Google Cloud region that's closest to the client. You can also verify that cross-region traffic is low and comes only from databases that replicate data globally.

Topology overview

The deployment spans three Google Cloud regions (us-central1, europe-west1, and asia-east1). All external client requests are served by a single external HTTP(S) load balancer that has multiple backends in each of the three regions. Client requests that come from one of three business regions (Americas, EMEA, and APAC) are served by application instances in the closest Google Cloud region.

The following graph shows the top-level hierarchy for the deployment.

Resources and traffic paths

In this example, the project contains the following Google Cloud resources:

  • 1 external HTTP(S) load balancer

  • 4 backend services: browse, shopping_cart, checkout, and feeds

  • 12 instance groups (which are the load balancer's backends)

    There is one instance group for each backend service in each of the three regions.

  • 3 database instances, one in each region
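The instance group count follows from the layout: four backend services times three regions. A minimal sketch in Python makes the arithmetic explicit. The `service-region` naming scheme is hypothetical (the source doesn't name the load balancer's instance groups), and the per-region total assumes that each database instance sits in its own instance group.

```python
from itertools import product

REGIONS = ["us-central1", "europe-west1", "asia-east1"]
BACKEND_SERVICES = ["browse", "shopping_cart", "checkout", "feeds"]

# One instance group per (backend service, region) pair.
# The "<service>-<region>" names are hypothetical placeholders.
instance_groups = [
    f"{service}-{region}"
    for service, region in product(BACKEND_SERVICES, REGIONS)
]

# Assumption: each region's database instance lives in its own instance group,
# so a region holds one group per backend service plus one database group.
groups_per_region = len(BACKEND_SERVICES) + 1

print(len(instance_groups), "load balancer backends")
print(groups_per_region, "instance groups per region")
```

Under that assumption, each region holds five instance groups in total.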

You expect traffic to follow these paths:

  • Traffic from countries in the Americas business region goes to backends in the us-central1 region. For example, traffic from an external client in Canada travels through the load balancer to the checkout backend in the us-central1 region.
  • Traffic from countries in the EMEA business region goes to backends in the europe-west1 region. For example, traffic from an external client in Poland travels through the load balancer to the checkout backend in the europe-west1 region.
  • Traffic from countries in the APAC business region goes to backends in the asia-east1 region. For example, traffic from an external client in Japan travels through the load balancer to the checkout backend in the asia-east1 region.
  • Traffic to a database instance comes from a backend in the same region. For example, the backends in asia-east1 send data only to the database instance in asia-east1.
  • Cross-region traffic is limited to database replication. For example, traffic between us-central1 and europe-west1 travels only between database instances in those regions.
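The client-facing expectations above can be written down as data and checked against flow records that you collect yourself. The following is a minimal sketch in Python, not a Network Topology feature: the tuple format, the name `find_misrouted`, and the sample throughput values are all assumptions made for illustration.

```python
# Expected mapping from business region to the closest Google Cloud region,
# taken from the list above.
EXPECTED_REGION = {
    "Americas": "us-central1",
    "EMEA": "europe-west1",
    "APAC": "asia-east1",
}


def find_misrouted(flows):
    """Return the flows whose destination isn't the expected region.

    `flows` is a hypothetical list of (business_region, cloud_region, mbps)
    tuples. It isn't a format that Network Topology exports.
    """
    return [
        (source, destination, mbps)
        for source, destination, mbps in flows
        if EXPECTED_REGION.get(source) != destination
    ]


# Illustrative sample data only.
observed = [
    ("Americas", "us-central1", 120),
    ("EMEA", "europe-west1", 90),
    ("EMEA", "us-central1", 10),
    ("APAC", "asia-east1", 75),
]

for source, destination, mbps in find_misrouted(observed):
    print(f"Unexpected: {source} -> {destination} ({mbps} MBps)")
```

A source region that isn't in the mapping is reported as misrouted, which keeps unknown traffic visible instead of silently accepting it.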

Unexpected traffic flow

In this scenario, you discover that traffic from the EMEA business region is now going to two different Google Cloud regions, us-central1 and europe-west1. By using Network Topology, you discover that one of the backends is overutilized.

  1. You want to check that external traffic that is going through the load balancer eventually goes to the correct Google Cloud region. You filter the graph to show only the traffic for your external load balancer shopping-site-lb.

    After you apply the filter, Network Topology shows only the connections related to the load balancer, as shown in the following example.

  2. You hold the pointer over each business region to highlight the communication to that region.

    When you hold the pointer over Americas and APAC, you see traffic going to the nearest Google Cloud region: us-central1 and asia-east1, respectively. However, when you hold the pointer over EMEA, you see traffic going to us-central1 and europe-west1. Ideally, to reduce latency, all traffic from EMEA should go to europe-west1.

  3. Next, you click EMEA to study the throughput between it and the Google Cloud regions. Network Topology overlays bandwidth values on each connection. You see that about 10 MBps is going to us-central1 and 90 MBps is going to europe-west1. Most of the traffic is being directed as you expect, but about 10% of it is flowing to us-central1.

    The figure is for reference. Its data doesn't reflect the use case.

  4. To investigate further, you expand us-central1 to view where traffic is going. Because there's only one network with a single subnet in that region, Network Topology doesn't show those levels of the hierarchy and skips to the instance groups.

    You see that traffic is going to an instance group that's associated with the load balancer's backend service. Because only a relatively small amount of traffic is going to us-central1, it's possible that resources in europe-west1 are overutilized and causing traffic to overflow to us-central1.

    The figure is for reference. Its data doesn't reflect the use case.

  5. To confirm your conclusion, you expand the europe-west1 region until you reach the instance that is associated with the same load balancer's backend service. Network Topology shows time-series charts in the details pane for the instance.

    In the chart, you notice that the CPU utilization rate for the instance is at 82%. Because the threshold for this example is 80%, the instance is oversubscribed. You resolve this issue by scaling up the instance group so that traffic returns to the ideal flow.

    The figure is for reference. Its data doesn't reflect the use case.
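The two numeric checks in this walkthrough, the share of EMEA traffic that overflows and the CPU threshold comparison, can be sketched in a few lines of Python. The function names are hypothetical, and the values are the ones quoted in the steps above rather than data read from Network Topology.

```python
def traffic_share(mbps_by_region):
    """Return each region's fraction of the total throughput."""
    total = sum(mbps_by_region.values())
    if total == 0:
        return {region: 0.0 for region in mbps_by_region}
    return {region: mbps / total for region, mbps in mbps_by_region.items()}


def is_oversubscribed(cpu_utilization, threshold=0.80):
    """Return True when CPU utilization exceeds the threshold.

    The 0.80 default is the threshold used in this example, not a
    Google Cloud default.
    """
    return cpu_utilization > threshold


# Throughput from EMEA clients, as read from the connection overlays.
emea_mbps = {"us-central1": 10, "europe-west1": 90}
shares = traffic_share(emea_mbps)

print(f"Overflow to us-central1: {shares['us-central1']:.0%}")
print("Oversubscribed:", is_oversubscribed(0.82))
```

With these inputs, the overflow share is 10% and the instance counts as oversubscribed, which matches the reasoning in steps 3 through 5.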

Inter-region traffic

In this section, you check that internal traffic between regions is limited to database instance traffic.

  1. To focus on internal traffic, you clear the External clients and Load balancers checkboxes in the Entities list. Because you are viewing only traffic within your application, you don't need to see external clients and external load balancer traffic.

  2. You expand the asia-east1 region, and you see five instance groups. They are not aggregated by network, subnet, zone, or infrastructure segment because they all share the same network, subnet, zone, and infrastructure segment.

    You notice that only one instance group (db-group-asia) contains a path for inter-region traffic. All other instance groups are communicating within the region.

    You continue to expand the db-group-asia group until you reach the base entity. In this scenario, the base entity is a virtual machine (VM) instance (db-instance-asia) that acts as a database server. It's communicating with other regions to replicate data, which is what you expected, so no further investigations are required.

    The figure is for reference. Its data doesn't reflect the use case.
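The rule that you verify here, that cross-region traffic is allowed only between database instances, can also be expressed as a small check. This is a minimal sketch in Python under stated assumptions: the flow record format and the function name are hypothetical, and every instance name except db-instance-asia is made up for illustration.

```python
def unexpected_cross_region(flows, database_instances):
    """Return cross-region flows where either endpoint isn't a database."""
    return [
        flow
        for flow in flows
        if flow["src_region"] != flow["dst_region"]
        and not (
            flow["src"] in database_instances
            and flow["dst"] in database_instances
        )
    ]


# db-instance-asia comes from this use case; the other names are hypothetical.
DATABASES = {"db-instance-asia", "db-instance-us", "db-instance-europe"}

sample_flows = [
    # Database replication between regions: expected.
    {"src": "db-instance-asia", "src_region": "asia-east1",
     "dst": "db-instance-us", "dst_region": "us-central1"},
    # Backend to database within one region: expected.
    {"src": "checkout-asia-vm", "src_region": "asia-east1",
     "dst": "db-instance-asia", "dst_region": "asia-east1"},
    # Backend to a database in another region: should be flagged.
    {"src": "checkout-asia-vm", "src_region": "asia-east1",
     "dst": "db-instance-us", "dst_region": "us-central1"},
]

for flow in unexpected_cross_region(sample_flows, DATABASES):
    print(f"Unexpected cross-region flow: {flow['src']} -> {flow['dst']}")
```

Requiring both endpoints to be database instances is stricter than checking only the source, so a backend that writes directly to a remote database is flagged too.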

What's next