Flow Analyzer overview

Flow Analyzer (preview) lets you quickly and efficiently understand your VPC traffic flows without writing complex SQL queries to analyze VPC Flow Logs. Flow Analyzer lets you perform opinionated network traffic analysis with 5-tuple granularity (source IP, destination IP, source port, destination port, and protocol).

Developed using Log Analytics and powered by BigQuery, Flow Analyzer enables in-depth analysis of inbound and outbound traffic of your VM instances. It lets you monitor, troubleshoot, and optimize your networking deployment for better performance and enhanced security, which helps you ensure compliance and save on costs.

Flow Analyzer analyzes VPC Flow Logs data stored in a log bucket (record format). To use Flow Analyzer, you must select a project with a log bucket that contains VPC Flow Logs. For more information, see the VPC Flow Logs overview. VPC Flow Logs can be used for network monitoring, forensics, real-time security analysis, and expense optimization.

Flow Analyzer runs queries on the fields included in VPC Flow Logs. For more information, see Key properties of VPC Flow Logs.

Using Flow Analyzer, you can perform the following tasks:

  • Build and run a simple query on VPC Flow Logs
  • Build a SQL filter (using a WHERE clause) for the query on VPC Flow Logs
  • Organize the results using selected fields and sort the query results using the total traffic and aggregate packets
  • View the traffic at chosen time intervals
  • View the top five highest traffic flows over time in a graphical format, compared with the rest of the traffic
  • View the resources with highest traffic aggregated over the selected duration in a tabular format
  • View the details of the traffic between a specific source and destination pair from the query results
  • Drill down the query results using the remaining fields available in VPC Flow Logs

How it works

VPC Flow Logs records a sample of network flows sent from and received by VPC resources, such as VM instances and Google Kubernetes Engine nodes.

The flow logs can be viewed in Cloud Logging, and can be exported to any destination that Logging export supports. You can use Log Analytics to run queries that analyze log data, and then you can display the query results in the form of charts and tables.

Flow Analyzer uses Log Analytics to let you run queries on VPC Flow Logs and learn more about the traffic flows by providing information such as the highest data flows chart and a table that provides details about all the data flows.

Query components

To analyze and understand your traffic flows, you must run a query on VPC Flow Logs. Flow Analyzer helps you build the query, customize the display options, and drill down to view and monitor your traffic flows.

Traffic aggregation

To analyze VPC traffic flows, you must determine the aggregation approach to filter the flows between the resources. Flow Analyzer organizes the flow logs for aggregation in the following ways:

  • Source and destination: this option uses the SRC and DEST information included in VPC Flow Logs. This view aggregates the traffic from source to destination.
  • Client and server: this option tries to find the initiator of the connection. The resource with the smaller port number is considered the server. Resources with a gke_service definition are also considered servers because services don't initiate requests. This view aggregates the traffic in both directions.
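The client and server heuristic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Flow Analyzer's actual implementation: the Endpoint class, the is_gke_service flag, and the tie-break on IP address when both ports are equal are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Endpoint:
    ip: str
    port: int
    # Hypothetical flag standing in for a gke_service definition.
    is_gke_service: bool = False


def pick_server(a: Endpoint, b: Endpoint) -> Endpoint:
    """Return the endpoint that the heuristic treats as the server."""
    # Services don't initiate requests, so a GKE service is the server.
    if a.is_gke_service != b.is_gke_service:
        return a if a.is_gke_service else b
    # Otherwise the smaller port number wins. The IP tie-break is an
    # assumption that keeps the result independent of argument order.
    return min(a, b, key=lambda e: (e.port, e.ip))


def client_server_key(src: Endpoint, dest: Endpoint) -> Tuple[Endpoint, Endpoint]:
    """Return (client, server) so both directions share one aggregation key."""
    server = pick_server(src, dest)
    client = dest if server is src else src
    return client, server
```

Because the key is the same for requests and responses, traffic in both directions lands in the same aggregate, which matches the behavior described for this view.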

Time-range selector

The default time range is one hour, but you can select from preset time options, specify a custom start and end time, or center the time range around a specific timestamp by using the time-range selector. For example, if you want to view the data for the past week, then select Last 1 week from the time-range selector.

You can also set your time zone preferences by using the time-range selector.

Basic filters

You can build the query by organizing the flows according to the resources in both directions.

To use the filters, select the fields from the list and specify values for these fields.

You can add multiple filter expressions to filter flows that match the selected key-value pairs. If you select multiple filters for the same field, an OR operator is used. If you select filters for different fields, an AND operator is used.

For example, if you select two IP address values, 1.2.3.4 and 10.20.10.30, and two Country values, US and France, the following filter logic is applied to the query:

(IP=1.2.3.4 OR IP=10.20.10.30) AND (Country=US OR Country=France)
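The OR-within-a-field, AND-across-fields rule can be illustrated with a minimal Python sketch. The function names build_filter and matches are hypothetical and exist only for this example; the output mirrors the simplified notation used above rather than the SQL that Flow Analyzer generates.

```python
from typing import Dict, List


def build_filter(selected: Dict[str, List[str]]) -> str:
    """Join values of one field with OR, and different fields with AND."""
    groups = []
    for field, values in selected.items():
        if not values:
            continue  # a field with no selected values adds no constraint
        ors = " OR ".join(f"{field}={value}" for value in values)
        groups.append(f"({ors})")
    return " AND ".join(groups)


def matches(flow: Dict[str, str], selected: Dict[str, List[str]]) -> bool:
    """Apply the same logic to a single flow record."""
    return all(
        flow.get(field) in values
        for field, values in selected.items()
        if values
    )


selection = {"IP": ["1.2.3.4", "10.20.10.30"], "Country": ["US", "France"]}
print(build_filter(selection))
```

With the selection shown, a flow from 1.2.3.4 in France matches, while a flow from 1.2.3.4 in Germany doesn't, because every field must match at least one of its selected values.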

If you modify the endpoint filters or change the traffic options, the results might change. Run the query again to view the updated results.

To build and run the query using the basic filters, see Build and run the query.

SQL filters

To build complex queries, you can use SQL filters. Using complex queries, you can perform tasks such as the following:

  1. Comparing field values with each other
  2. Building complex boolean logic using AND/OR and nested OR operations
  3. Performing complex operations on IP addresses using BigQuery functions

The SQL filter queries use BigQuery SQL syntax. For more information, see the BigQuery SQL syntax.

To view filter expression syntax and examples, click Filter expression syntax and examples.

To build and run the query using SQL filters, see Build and run a SQL query.

Query results

The query results include the following components:

  • Highest data flows chart: displays the top five highest traffic flows over time along with the rest of the traffic. You can spot trends like traffic spikes using this chart.
  • All data flows table: shows the top traffic flows up to 10,000 rows aggregated over the selected duration. This table displays the fields selected for organizing the flows while defining the filters for the query.
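As a rough sketch of how a "top five plus the rest" split works, the following Python ranks flows by total traffic and folds everything outside the top five into one remainder. The function name and the flow labels are hypothetical, and the sketch works on totals only, whereas the chart plots the same split over time.

```python
from typing import Dict, List, Tuple


def top_flows_with_rest(
    totals: Dict[str, int], n: int = 5
) -> Tuple[List[Tuple[str, int]], int]:
    """Return the n largest flows and the combined traffic of all others."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:n]
    rest = sum(value for _, value in ranked[n:])
    return top, rest


# Hypothetical per-flow byte totals for the selected duration.
totals = {"a": 70, "b": 60, "c": 50, "d": 40, "e": 30, "f": 20, "g": 10}
top, rest = top_flows_with_rest(totals)
```

Here the top five are flows a through e, and the remaining flows f and g are reported together as 30 bytes.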

Display options

After running the query, you can further refine the results by using the various display options. Both the chart and the table get updated to reflect the newly selected options. To select the custom options and run the query, see Customize display options.

Metric types

You can choose to view one of the following metric types.

  • Bytes sent: contains information about the payload volumes and doesn't include headers. This metric value can be zero because some packets have only headers and don't include any payload.

  • Packets sent: indicates the number of packets sent from the source to the destination.

For both the metric types, you can choose additional metric aggregations.

Metric aggregation

You can view metric aggregation in the following ways.

If you select Bytes sent as the metric and Source and destination as the traffic aggregation, the following options are available:

  • Total traffic: this is always enabled by default and shows the total traffic for the chosen time period.
  • Average traffic rate: shows the average traffic rate (in bytes per second) for the chosen time period, calculated only for the alignment periods during which the traffic was observed. For more information, see Alignment period.
  • Median traffic rate: displays the median traffic rate (in bytes per second) for the chosen time period, calculated only for the alignment periods during which the traffic was observed. For more information, see Alignment period.
  • P95 traffic rate: shows the 95th percentile traffic rate in bytes per second for the chosen time period, calculated only for the alignment periods during which the traffic was observed. For more information, see Alignment period.
  • Maximum traffic rate: shows the maximum traffic rate in bytes per second for the chosen time period.

If you select Packets sent as the metric and Source and destination as the traffic aggregation, the following options are available:

  • Aggregate packets: shows the count of packets sent for the chosen time period. Enabled by default.
  • Average packets rate: displays the average packets rate for the chosen time period, calculated only for the alignment periods during which the traffic was observed. For more information, see Alignment period.
  • Median packets rate: displays the median packets rate for the chosen time period, calculated only for the alignment periods during which the traffic was observed. For more information, see Alignment period.
  • P95 packets rate: shows the 95th percentile packets rate for the chosen time period, calculated only for the alignment periods during which the traffic was observed. For more information, see Alignment period.
  • Maximum packets rate: shows the maximum packets rate for the chosen time period.
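The rate aggregations in the two lists above can be sketched as follows. This is an illustration, not the query that Flow Analyzer runs: the function name is hypothetical, None is used here to mark an alignment period with no observed traffic, and the nearest-rank percentile method is an assumption, because the percentile definition isn't stated.

```python
import math
import statistics
from typing import Dict, List, Optional


def rate_aggregations(
    per_period: List[Optional[int]], alignment_seconds: int
) -> Optional[Dict[str, float]]:
    """Aggregate bytes (or packets) counted per alignment period.

    None marks a period with no observed traffic. Such periods are
    skipped, so they don't pull the average or median toward zero.
    """
    observed = [value for value in per_period if value is not None]
    if not observed:
        return None
    rates = sorted(value / alignment_seconds for value in observed)
    # Nearest-rank 95th percentile (an assumption for this sketch).
    p95_index = math.ceil(0.95 * len(rates)) - 1
    return {
        "total": sum(observed),
        "average_rate": statistics.fmean(rates),
        "median_rate": statistics.median(rates),
        "p95_rate": rates[p95_index],
        "max_rate": rates[-1],
    }


# Five 60-second periods, two of them without traffic.
result = rate_aggregations([600, None, 1200, 1800, None], 60)
```

In this example the average rate is 20 bytes per second over the three periods with traffic. Dividing the same 3,600 bytes by all five periods would give 12 bytes per second instead, which is the difference that the "only for the alignment periods during which the traffic was observed" wording describes.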

Alignment period

You can choose an alignment period from 5 seconds to 1 day for the chart. Automatic mode selects the optimal alignment period depending on the length of the selected time range.

Every point on the timeline represents aggregated data for a specific time period. The length of this period is called the alignment period.

Query performance declines as the alignment period gets smaller. For higher values of the alignment period, the chart becomes less granular. You might not be able to view short spikes with higher values.

For large time durations, a small alignment period is not helpful. For example, if you select a 1-minute alignment period for a 30-day time range, Flow Analyzer generates more than 43,000 data points. Because that's more than 10 times the horizontal pixel count of a 4K display, you can't view all the details. For this reason, some options are disabled for large time durations.
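The arithmetic behind this example is easy to check. The following Python sketch counts how many points a chart needs for a given time range and alignment period; the function name data_points is hypothetical.

```python
import math
from datetime import timedelta


def data_points(time_range: timedelta, alignment: timedelta) -> int:
    """Number of alignment periods needed to cover the time range."""
    return math.ceil(time_range / alignment)


thirty_days = timedelta(days=30)
per_minute = data_points(thirty_days, timedelta(minutes=1))  # 43200
per_hour = data_points(thirty_days, timedelta(hours=1))      # 720
per_day = data_points(thirty_days, timedelta(days=1))        # 30
```

A 1-minute alignment period over 30 days gives 30 × 24 × 60 = 43,200 points, while a 1-hour alignment period gives 720 points for the same range.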

For more information about how sampling is done and how the alignment period is determined to display the query results, see Metrics and alignment period.

Sampling point

For VM-to-VM network communication, flow logs are available (with sampling applied) at both the sending and the receiving VM. If both endpoint VMs are in subnets that have VPC Flow Logs enabled, the same flow gets reported twice. You can choose one of the following four approaches to determine which VPC Flow Logs contribute to the computed metrics and how they are evaluated:

  • Source endpoint: the number of bytes sent or packets sent reported at the source endpoint of a flow
  • Destination endpoint: the number of bytes sent or packets sent reported at the destination endpoint of a flow
  • Sum of source and destination endpoint: the sum of bytes sent or packets sent reported by both endpoints of a flow
  • Average of source and destination endpoint: an average of bytes sent or packets sent reported by both endpoints of a flow if both the source and the destination information are available in VPC Flow Logs
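The four approaches can be compared with a small Python sketch that combines the two values reported for one flow. The function name and the mode strings are hypothetical. Using the single available report when only one endpoint logs the flow is an assumption made for this example.

```python
from typing import Optional


def combine_reports(
    src: Optional[int], dest: Optional[int], mode: str
) -> Optional[float]:
    """Combine the values reported at the two endpoints of one flow.

    None means that the endpoint reported nothing, for example because
    VPC Flow Logs isn't enabled on its subnet.
    """
    if mode == "source":
        return src
    if mode == "destination":
        return dest
    if mode == "sum":
        return (src or 0) + (dest or 0)
    if mode == "average":
        if src is not None and dest is not None:
            return (src + dest) / 2
        # Assumption: with only one report, use that value as is.
        return src if src is not None else dest
    raise ValueError(f"unknown mode: {mode}")
```

For a flow reported as 1,000 bytes at the source and 980 bytes at the destination, the sum mode yields 1,980 bytes, roughly double the real traffic, while the average mode yields 990 bytes.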

Traffic deduplication

To prevent traffic reported at the source and destination VMs from being counted twice, you can choose the Average of source and destination endpoint sampling option. Flow Analyzer identifies equivalent flows within each alignment period and calculates the averages of the reported metrics values (bytes count and packets count).

For alignment periods where equivalent flows are reported at both SRC and DEST, all traffic attributed to a given alignment period is divided by two.
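The per-period rule can be sketched in Python as follows. The record layout (period index, reporter, bytes sent) and the function name are hypothetical, and the sketch assumes that all records already belong to one set of equivalent flows.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def deduplicated_bytes(
    records: Iterable[Tuple[int, str, int]]
) -> Dict[int, float]:
    """Halve a period's traffic only when both endpoints reported in it.

    Each record is (period_index, reporter, bytes_sent), where reporter
    is "SRC" or "DEST".
    """
    totals: Dict[int, int] = defaultdict(int)
    reporters: Dict[int, Set[str]] = defaultdict(set)
    for period, reporter, bytes_sent in records:
        totals[period] += bytes_sent
        reporters[period].add(reporter)
    return {
        period: totals[period] / 2
        if reporters[period] == {"SRC", "DEST"}
        else totals[period]
        for period in sorted(totals)
    }


records = [
    (0, "SRC", 1000), (0, "DEST", 980),  # reported twice: halved
    (1, "SRC", 500),                     # reported once: kept as is
    (2, "DEST", 300), (2, "DEST", 100),  # one reporter only: kept as is
]
per_period = deduplicated_bytes(records)
```

Period 0 comes out as 990 bytes instead of 1,980, while periods 1 and 2 keep their reported totals of 500 and 400 bytes.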

View flow details

In the All data flows table, click Show details for any flow. The Flow details panel appears. This panel provides information such as the source, destination, traffic and possible drill down options.

You can drill down by splitting a selected traffic flow using an extra field. For example, if a flow includes generic details about 1,000 GiB traffic from Google Cloud zone X to zone Y, you can drill down using another field such as the source IP address. The results include several IP addresses that make up the original flow.

The fields that appear in the drill down component are selected as follows:

  • When you access the flow details, Flow Analyzer runs several queries. Each query tries to drill down the selected flow using a field that is available in VPC Flow Logs and not yet used in the original query. For example, if the executed query already includes the IP address details, Flow Analyzer doesn't run a query with this field again, and you can't drill down using this field.
  • If any of the additional queries returns a single field value, that value gets added to the source and destination details section even though it wasn't fetched earlier.
  • If any of the query results include more than one field value, the corresponding field appears in the drill down list.
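The selection rules above can be sketched in Python. The function name and the flat field names (src_zone, src_ip, protocol) are hypothetical simplifications, and the sketch inspects in-memory records instead of running one query per field.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def classify_drilldown_fields(
    flow_logs: List[Dict[str, Any]],
    used_fields: Set[str],
    candidate_fields: Iterable[str],
) -> Tuple[Dict[str, Any], List[str]]:
    """Split unused fields into fixed details and drill down options."""
    details: Dict[str, Any] = {}
    drill_down: List[str] = []
    for field in candidate_fields:
        if field in used_fields:
            continue  # already part of the original query
        values = {log[field] for log in flow_logs if log.get(field) is not None}
        if len(values) == 1:
            details[field] = next(iter(values))  # one value: show as a detail
        elif len(values) > 1:
            drill_down.append(field)  # several values: offer as drill down
    return details, drill_down


logs = [
    {"src_zone": "zone-x", "src_ip": "10.0.0.1", "protocol": 6},
    {"src_zone": "zone-x", "src_ip": "10.0.0.2", "protocol": 6},
]
details, options = classify_drilldown_fields(
    logs, {"src_zone"}, ["src_zone", "src_ip", "protocol"]
)
```

In this example, protocol has a single value and becomes a detail, src_ip has two values and becomes a drill down option, and src_zone is skipped because the original query already used it.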

When you select a field in the drill down list, the drill down table and the chart get updated to display the top three traffic flows.

You can also use the Compare to past toggle. Select this feature to view six lines: three solid lines for the three top talkers from the drill down and three dashed lines in corresponding colors representing the past traffic.

To drill down traffic flows using more fields, see Drill down traffic flows.

Explore in Log Analytics

You can view the raw SQL query in Log Analytics.

For advanced analysis, you can directly modify SQL code used to visualize the traffic. The Explore in Log Analytics feature directs you to the Log Analytics page with a prefilled query.

What's next