Deploy network monitoring and telemetry capabilities in Google Cloud

Last reviewed 2024-02-13 UTC

Network telemetry collects network traffic data from devices on your network so that the data can be analyzed. Network telemetry lets security operations teams detect network-based threats and hunt for advanced adversaries, which is essential for autonomic security operations. To obtain network telemetry, you need to capture and store network data. This blueprint describes how you can use Packet Mirroring and Zeek to capture network data in Google Cloud.

This blueprint is intended for security analysts and network administrators who want to mirror network traffic, store this data, and forward it for analysis. This blueprint assumes that you have working knowledge of networking and network monitoring.

This document is part of a security blueprint that's made up of the following:

A GitHub repository that contains a set of Terraform configurations and scripts.
A guide to the architecture, design, and security controls that you implement with the blueprint (this document).

Using this blueprint, you capture network packets (including network metadata) using Packet Mirroring, transform the network packets into Zeek logs, and then store them in Cloud Logging. The blueprint extracts metadata such as IP addresses, ports, protocols, and Layer 7 headers and requests. Storing network metadata as Zeek logs uses less data volume than storing raw packet data and is therefore more cost effective.
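To make the kind of metadata described above concrete, the following is a minimal sketch of reading a Zeek connection log. It assumes that the log uses Zeek's default tab-separated format with a `#fields` header line. The sample record, its addresses, and the `parse_zeek_tsv` helper are made up for illustration and aren't part of the blueprint. If your collector image emits logs in another format (for example, JSON), the parsing differs.

```python
def parse_zeek_tsv(text):
    """Parse Zeek's default tab-separated log text into a list of dicts."""
    fields = []
    records = []
    for line in text.splitlines():
        if line.startswith("#fields"):
            # The header names every column: ts, uid, id.orig_h, and so on.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            # Data line: pair each value with its column name.
            records.append(dict(zip(fields, line.split("\t"))))
    return records


# Illustrative sample only (addresses come from documentation ranges).
SAMPLE = "\n".join([
    "#separator \\x09",
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto\tservice",
    "1700000000.000000\tCabc123\t10.0.0.5\t44321\t203.0.113.10\t80\ttcp\thttp",
])

records = parse_zeek_tsv(SAMPLE)
first = records[0]
print(first["id.orig_h"], first["id.resp_h"], first["id.resp_p"], first["proto"])
```

Each record carries only the connection metadata (addresses, ports, protocol, service), not the packet payload, which is why this form of storage is typically much smaller than raw packet capture.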

This document assumes that you have already configured a foundational set of security controls, as described in the Google Cloud enterprise foundations guide.

Supported use cases

This blueprint supports the following use cases:

Your security operations center (SOC) requires access to Google Cloud network log data in a centralized location so that they can investigate security incidents. This blueprint translates network packet data into logs that you can forward to your analysis and investigation tools. Analysis and investigation tools include BigQuery, Google Security Operations (Google SecOps), Flowmon, ExtraHop, or a security information and event management (SIEM) system.
Your security teams require visibility into Google Cloud networks to perform threat hunting using tools such as Google SecOps. You can use this blueprint to create a pipeline for Google Cloud network traffic.
You want to demonstrate how your organization meets compliance requirements for network detection and response. For example, your organization must demonstrate compliance with Memorandum M-21-31 from the United States Office of Management and Budget (OMB).
Your network security analysts require long-term network log data. This blueprint supports both long-term monitoring and on-demand monitoring.

If you also require packet capture (pcap) data, you need to use network protocol analyzer tools (for example, Wireshark or tcpdump). The use of network protocol analyzer tools is not in the scope of this blueprint.

You can't deploy this blueprint with Cloud Intrusion Detection System. Both this solution and Cloud Intrusion Detection System use Packet Mirroring policies, and these policies can only be used by one service at a time.

Costs

This blueprint can affect your costs because you are adding computing resources and storing significant amounts of data in Cloud Logging. Consider the following when you deploy the blueprint:

Each collector virtual machine (VM) in Compute Engine runs as an e2-medium instance.
You can control storage and network costs with the following:
- Using Packet Mirroring filters.
- Not mirroring across zones to avoid inter-zonal egress charges.
- Storing data only as long as required by your organization.

You can use the Pricing Calculator to get an estimate for your computing, logging, and storage costs.

Architecture

The following architectural diagram shows the infrastructure that you implement with this blueprint:

The network telemetry architecture.

The architecture shown in the preceding image uses a combination of the following Google Cloud services and features:

Two Virtual Private Cloud (VPC) networks:
- A VPC network for the mirrored sources.
- A VPC network for the collector instances.
These VPC networks must be in the same project.
Compute Engine or Google Kubernetes Engine (GKE) instances (called the mirrored sources) in specific regions and subnets that are the source for the network packets. You identify which instances to mirror using one of the following methods:
- Network tags
- Compute instance names
- Subnet name
Compute Engine instances that function as the collector instances behind an internal passthrough Network Load Balancer, in the same region as the mirrored sources. These instances run the Zeek-Fluentd Golden Image or your custom zeek-fluentd image. The VMs are e2-medium and the supported throughput is 4 Gbps.
An internal passthrough Network Load Balancer that receives packets from the mirrored sources and forwards them to the collector instances for processing. The forwarding rule for the load balancer uses the --is-mirroring-collector flag.
VPC firewall rules that permit the following:
- Egress from mirrored sources to the internal passthrough Network Load Balancer.
- Ingress to the collector instances from the mirrored instances.
A Packet Mirroring policy that defines the region, subnet, mirrored instances, protocols, direction, and forwarding rule. Each region requires its own Packet Mirroring policy.
VPC Network Peering to permit connectivity using internal IP addresses between highly available Compute Engine VMs across multiple regions. VPC Network Peering allows the mirrored sources to communicate with the internal passthrough Network Load Balancer.
A Cloud Logging instance that collects the Zeek logs generated from the mirrored packets, for storage and retrieval by an analysis and investigation tool.
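Two constraints in the preceding list are easy to check before you deploy: the collectors and their load balancer sit in the same region as the mirrored sources, and each region needs its own Packet Mirroring policy. The following is a minimal planning sketch of those two checks. The `MirroringPolicy` class, the helper functions, and the policy names are hypothetical and aren't taken from the blueprint's Terraform code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirroringPolicy:
    name: str
    region: str            # region of the mirrored sources
    collector_region: str  # region of the collector load balancer


def region_mismatches(policies):
    """Return a description of every policy whose collector is in another region."""
    return [
        f"{p.name}: collector is in {p.collector_region}, sources are in {p.region}"
        for p in policies
        if p.region != p.collector_region
    ]


def regions_missing_policy(source_regions, policies):
    """Return the regions with mirrored sources but no Packet Mirroring policy."""
    covered = {p.region for p in policies}
    return sorted(set(source_regions) - covered)


# Hypothetical plan: the second policy points at a collector in the wrong region.
plan = [
    MirroringPolicy("pm-us", "us-central1", "us-central1"),
    MirroringPolicy("pm-eu", "europe-west1", "us-central1"),
]
print(region_mismatches(plan))
print(regions_missing_policy(["us-central1", "europe-west1", "asia-east1"], plan))
```

An empty result from both functions suggests that the plan is consistent with these two constraints. It doesn't validate anything else about the deployment.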

Understand the security controls that you need

This section discusses the security controls within Google Cloud that you can use to help secure the different components of the network monitoring architecture.

VPC network security controls

You create VPC networks around your mirrored sources and your collectors. When you create the VPC network for the collectors, you delete the system-generated default route, which means that all default internet gateway routes are turned off. Turning off default internet gateways helps reduce your network attack surface from external attackers.

You create subnets in your VPC network for each region. Subnets let you control the flow of traffic between your workloads on Google Cloud and also from external sources. The subnets have Private Google Access enabled. Private Google Access also helps reduce your network attack surface, while permitting VMs to communicate with Google APIs and services.

To permit communication between the VPC networks, you enable VPC Network Peering. VPC Network Peering uses subnet routes for internal IP address connectivity. You import and export custom routes to allow a direct connection between the mirrored sources and the collectors. You must restrict all communication to regional routes because the internal passthrough Network Load Balancer for the collectors doesn't support global routes.

Firewall rules

You use firewall rules to define the connections that the mirrored sources and collectors can make. You set up an ingress rule to allow for regular uptime health checks, an egress rule for all protocols on the mirrored sources, and an ingress rule for all protocols on the collectors.

Collector VM security controls

The collector VMs are responsible for receiving the packet data. The collector VMs are identical VMs that operate as managed instance groups (MIGs). You turn on health checks to permit automatic recreation of an unresponsive VM. In addition, you allow the collectors to autoscale based on your usage requirements.

Each collector VM runs the zeek-fluentd Packer image. This image consists of Zeek, which generates the logs, and Fluentd, which forwards the logs to Cloud Logging. After you deploy the Terraform module, you can update the VM OS and Zeek packages and apply the security controls that are required for your organization.

Internal load balancer security controls

The internal passthrough Network Load Balancer directs network packet traffic from the mirrored sources to the collector VMs for processing. All the collector VMs must run in the same region as the internal passthrough Network Load Balancer.

The forwarding rule for the internal passthrough Network Load Balancer allows traffic on all ports, but doesn't allow global access. In addition, the forwarding rule defines this load balancer as a mirroring collector, using the --is-mirroring-collector flag.

You don't need to set up a load balancer for storage, as each collector VM directly uploads logs to Cloud Logging.

Packet Mirroring

Packet Mirroring requires you to identify the instances that you want to mirror. You can identify the instances that you want to mirror using network tags, instance names, or the subnet that the instances are located in. In addition, you can further filter traffic by using one or more of the following:

Layer 4 protocols, such as TCP, UDP, or ICMP.
IPv4 CIDR ranges in the IP headers, such as 10.0.0.0/8.
Direction of the traffic that you want to mirror, such as ingress, egress, or both.
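The way these three filters combine can be sketched as a small matching function. This is a simplified model for reasoning about which traffic a policy would select, not the Packet Mirroring implementation. The `MirrorFilter` class and `is_mirrored` function are hypothetical, and the sketch assumes that an empty protocol or CIDR list means "match everything" and that the CIDR range is compared against the remote peer's address. Confirm the exact semantics in the Packet Mirroring documentation.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorFilter:
    protocols: frozenset  # for example {"tcp", "udp"}; empty means any protocol (assumption)
    cidr_ranges: tuple    # for example ("10.0.0.0/8",); empty means any address (assumption)
    direction: str        # "ingress", "egress", or "both"


def is_mirrored(flt, protocol, remote_ip, direction):
    """Return True if a packet would be selected by the simplified filter."""
    # Direction must match unless the filter mirrors both directions.
    if flt.direction != "both" and flt.direction != direction:
        return False
    # Layer 4 protocol must be listed, if any protocols are listed.
    if flt.protocols and protocol.lower() not in flt.protocols:
        return False
    # The remote address must fall inside at least one listed IPv4 range.
    if flt.cidr_ranges:
        addr = ipaddress.ip_address(remote_ip)
        return any(addr in ipaddress.ip_network(cidr) for cidr in flt.cidr_ranges)
    return True


# Mirror only inbound TCP traffic from the 10.0.0.0/8 range.
flt = MirrorFilter(frozenset({"tcp"}), ("10.0.0.0/8",), "ingress")
print(is_mirrored(flt, "tcp", "10.1.2.3", "ingress"))
print(is_mirrored(flt, "udp", "10.1.2.3", "ingress"))
```

Narrow filters like this one reduce the volume of mirrored traffic that the collectors must process, which also reduces the amount of log data that you store.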

Service accounts and access controls

Service accounts are identities that Google Cloud can use to run API requests on your behalf. Service accounts ensure that user identities don't have direct access to services.

To deploy the Terraform code, you must impersonate a service account that has the following roles in the project:

The collector VMs also require this service account so that they can authenticate to Google Cloud services, get the network packets, and forward them to Cloud Logging.

Data retention practices

You can specify how long Cloud Logging stores your network logs using retention rules for your log buckets. To determine how long to store the data, review your organization's regulatory requirements.
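As a small worked example of choosing a retention period, the following sketch takes the longest period among a set of requirements and checks that it falls within the range that a log bucket accepts. The requirement names and day counts are hypothetical placeholders. The 1 to 3650 day range reflects the custom retention limits for log buckets as understood at the time of writing, so confirm it against the current Cloud Logging documentation.

```python
# Custom retention range for log buckets, in days (verify in current docs).
MIN_RETENTION_DAYS = 1
MAX_RETENTION_DAYS = 3650


def choose_retention_days(requirements):
    """Return the longest required retention period, validated against the range."""
    if not requirements:
        raise ValueError("at least one retention requirement is needed")
    days = max(requirements.values())
    if not MIN_RETENTION_DAYS <= days <= MAX_RETENTION_DAYS:
        raise ValueError(f"{days} days is outside the supported retention range")
    return days


# Hypothetical requirements: an internal policy and an external regulation.
requirements = {"internal-policy": 90, "example-regulation": 365}
retention = choose_retention_days(requirements)
print(retention)
```

Taking the maximum satisfies every requirement at once, at the cost of storing all network logs for the longest period. If your requirements differ widely by log type, separate log buckets with different retention settings might be more economical.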

Logging and auditing

You can use Cloud Monitoring to analyze the performance of the collector VMs and set up alerts for uptime checks and performance conditions such as CPU load.

You can track administrator access or changes to the data and configuration using Cloud Audit Logs. Audit logging is supported by Compute Engine, Cloud Load Balancing, and Cloud Logging.

You can export monitoring information as follows:

To Google SecOps for additional analysis. For more information, see Ingesting Google Cloud Logs in to Google SecOps.
To a third-party SIEM, using Pub/Sub and Dataflow. For more information, see Export Google Cloud security data to your SIEM system.

Bringing it all together

To implement the architecture described in this document, do the following:

Deploy a secure baseline in Google Cloud, as described in the Google Cloud enterprise foundations blueprint. If you choose not to deploy the enterprise foundations blueprint, ensure that your environment has a similar security baseline in place.
Review the Readme for the blueprint and ensure that you meet all the prerequisites.
In your testing environment, deploy one of the example network telemetry configurations to see the blueprint in action. As part of your testing process, do the following:
1. Verify that the Packet Mirroring policies and subnets were created.
2. Verify that you have the Logs Viewer (roles/logging.viewer) role and run a curl command to view your log data. For example:
  
  curl http://example.com/
  
  You should see that log data is stored in Cloud Logging.
3. Use Security Command Center to scan the newly created resources against your compliance requirements.
4. Verify that your system is capturing and storing the appropriate network packets, and fine-tune the performance as necessary.
Deploy the blueprint into your production environment.
Connect Cloud Logging to your SIEM or Google SecOps so that your SOC and network security analysts can incorporate the new telemetry into their dashboards.

What's next

Work through the blueprint.
Read about When to use five telemetry types in security threat monitoring.
Read about Leveraging Network Telemetry in Google Cloud.
Read about transforming your SOC using autonomic security operations.