Configure VPC Service Controls


VPC Service Controls enable organizations to define a perimeter around Google Cloud resources to mitigate data exfiltration risks.

Cloud Composer environments can be deployed within a service perimeter. By configuring your environment with VPC Service Controls, you can keep sensitive data private while taking advantage of the fully-managed workflow orchestration capabilities of Cloud Composer.

VPC Service Controls support for Cloud Composer means that:

  • Cloud Composer can now be selected as a secured service inside a VPC Service Controls perimeter.
  • All underlying resources used by Cloud Composer are configured to support VPC Service Controls architecture and follow its rules.

Deploying Cloud Composer environments with VPC Service Controls gives you:

  • Reduced risk of data exfiltration.
  • Protection against data exposure due to misconfigured access controls.
  • Reduced risk of malicious users copying data to unauthorized Google Cloud resources, or external attackers accessing Google Cloud resources from the internet.

Airflow web server in VPC Service Controls mode

In VPC Service Controls mode, Cloud Composer runs two instances of the Airflow web server. Identity-Aware Proxy load balances user traffic between these instances. Airflow web servers run in "read-only" mode, which means:

  • DAG Serialization is enabled. As a result, Airflow web server does not parse DAG definition files.

  • Plugins are not synced to the web server, so you cannot modify or extend the web server functionality with plugins.

  • The Airflow web server uses a container image that is pre-built by the Cloud Composer service. If you install PyPI packages in your environment, these packages are not installed on the web server container image.

We recommend protecting access to the Airflow web server with Network ACLs. You can specify the IP ranges that can access the Airflow web server for a new environment or for an existing one.

Creating a service perimeter

See Creating a service perimeter to learn how to create and configure service perimeters. Make sure to select Cloud Composer as one of the services secured within the perimeter.

Creating environments in a perimeter

Deploying Cloud Composer inside a perimeter requires additional steps. When creating your Cloud Composer environment:

  1. Enable Access Context Manager API and Cloud Composer API for your project. See Enabling APIs for reference.

  2. Add the following services to the perimeter for maximum protection of your environment: Cloud SQL, Pub/Sub, Monitoring, Cloud Storage, GKE, Container Registry, Artifact Registry, and Compute Engine.

  3. Use version composer-1.10.4 or later.

  4. Make sure that DAG serialization is enabled. If your environment uses Cloud Composer version 1.15.0 or later, serialization is enabled by default.

  5. Create a new Cloud Composer environment with Private IP enabled. This setting can only be configured when you create the environment.

  6. When creating your environment, remember to configure access to the Airflow web server. For maximum protection, only allow access to the web server from specific IP ranges. For details, see the "Configure web server network access" step in Creating a new environment.
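The version requirements in steps 3 and 4 can be checked with a short script. The following is a minimal sketch, not an official tool. It assumes the image version string has the form composer-X.Y.Z-airflow-A.B.C, and the function names are hypothetical.

```python
import re

# Thresholds taken from steps 3 and 4 above.
MIN_VPC_SC_VERSION = (1, 10, 4)
SERIALIZATION_DEFAULT_VERSION = (1, 15, 0)


def parse_composer_version(image_version):
    """Extract (major, minor, patch) from a string such as
    'composer-1.17.0-airflow-1.10.15' (assumed format)."""
    match = re.match(r"composer-(\d+)\.(\d+)\.(\d+)", image_version)
    if match is None:
        raise ValueError(f"unrecognized image version: {image_version!r}")
    return tuple(int(part) for part in match.groups())


def check_version(image_version):
    """Report whether the version meets the minimum and whether
    DAG serialization is enabled by default for it."""
    version = parse_composer_version(image_version)
    # Tuples compare element by element, so 1.9.x sorts before 1.10.x.
    return {
        "meets_minimum": version >= MIN_VPC_SC_VERSION,
        "serialization_on_by_default": version >= SERIALIZATION_DEFAULT_VERSION,
    }
```

Comparing integer tuples rather than raw strings avoids the common mistake of treating 1.9 as newer than 1.10.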

Configuring existing environments with VPC Service Controls

You can add the project containing your environment to the perimeter if:

Installing PyPI packages

In the default VPC Service Controls configuration, Cloud Composer only supports installing PyPI packages from private repositories that are reachable from the private IP address space of the VPC network. The recommended configuration for this process is to set up a private PyPI repository, populate it with vetted packages used by your organization, then configure Cloud Composer to install Python dependencies from a private repository.
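As an illustration of pointing pip at a private repository, the sketch below renders a standard pip configuration file with an index-url entry. The repository URL is a placeholder and the function name is hypothetical. How the file is delivered to the environment is not covered here, so treat this as an assumption about the general approach rather than the exact Cloud Composer procedure.

```python
import configparser
import io


def render_pip_conf(index_url):
    """Return the text of a pip configuration file that makes pip
    install packages from the given index URL."""
    # Interpolation is disabled so a '%' in the URL is kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Placeholder URL for a repository reachable from the private IP space.
print(render_pip_conf("https://pypi.example.internal/simple/"))
```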

It's also possible to install PyPI packages from repositories outside the private IP space. Follow these steps:

  1. Configure Cloud NAT to allow Cloud Composer running in the private IP space to connect with external PyPI repositories.
  2. Configure your firewall rules to allow outbound connections from the Composer cluster to the repository.

When using this setup, make sure you understand the risks of using external repositories. Be sure that you trust the content and integrity of any external repositories, because these connections could potentially be used as an exfiltration vector.

Network configuration checklist

Your VPC network must be configured properly to create Cloud Composer environments inside a perimeter. Make sure to follow the configuration requirements listed below.

Firewall rules

Navigate to the VPC network -> Firewall section in the Cloud Console, and verify that the following requirements are met.

  • Configure DNS service in your VPC as described in VPC Service Controls support for Cloud DNS. As an alternative, you can allow egress from GKE Node IP range to anywhere on port 53.

  • Allow egress from GKE Node IP range to GKE Node IP range, all ports.

  • Allow egress from GKE Node IP range to GKE Pods IP range, all ports.

  • Allow egress from GKE Node IP range to GKE Master IP range, all ports.

  • Allow egress from GKE Node IP range to 199.36.153.4/30, port 443 (restricted.googleapis.com).

  • Allow ingress from GCP Health Checks (130.211.0.0/22, 35.191.0.0/16) to GKE Node IP range, TCP ports 80 and 443.

  • Allow egress from GKE Node IP range to GCP Health Checks, TCP ports 80 and 443.

  • Allow egress from GKE Node IP range to Web server IP range, TCP ports 3306 and 3307.

See Using firewall rules to learn how to check, add, and update rules for your VPC network.
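To review the checklist against your own ranges, it can help to write the rules down as data. The sketch below does that. The Rule class, the required_rules function, and the example ranges in the usage note are hypothetical, and the DNS rule is left out because it has two alternative forms.

```python
import ipaddress
from dataclasses import dataclass

RESTRICTED_VIP = "199.36.153.4/30"
HEALTH_CHECK_RANGES = ("130.211.0.0/22", "35.191.0.0/16")


@dataclass(frozen=True)
class Rule:
    direction: str      # "egress" or "ingress"
    source: tuple       # CIDR ranges
    destination: tuple  # CIDR ranges
    ports: str          # "all" or a port list


def required_rules(node, pods, master, web_server):
    """Build the checklist rules for the given IP ranges."""
    nodes = (node,)
    rules = [
        Rule("egress", nodes, nodes, "all"),
        Rule("egress", nodes, (pods,), "all"),
        Rule("egress", nodes, (master,), "all"),
        Rule("egress", nodes, (RESTRICTED_VIP,), "443"),
        Rule("ingress", HEALTH_CHECK_RANGES, nodes, "tcp:80,443"),
        Rule("egress", nodes, HEALTH_CHECK_RANGES, "tcp:80,443"),
        Rule("egress", nodes, (web_server,), "tcp:3306,3307"),
    ]
    for rule in rules:
        for cidr in rule.source + rule.destination:
            # Raises ValueError if a range is malformed.
            ipaddress.ip_network(cidr)
    return rules
```

For example, required_rules("10.0.0.0/20", "10.4.0.0/14", "172.16.0.0/28", "172.31.245.0/24") returns seven rules that you can compare with the rules shown in the console. The four ranges are made-up values for illustration.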

Connectivity to the restricted.googleapis.com endpoint

Configure connectivity to the restricted.googleapis.com endpoint:

  • Verify the existence of a DNS mapping from *.googleapis.com to restricted.googleapis.com.

  • *.gcr.io should resolve to 199.36.153.4/30, as with the googleapis.com endpoint. To do that, create a new DNS zone with these records:
      CNAME  *.gcr.io  ->  gcr.io.
      A      gcr.io.   ->  199.36.153.4, 199.36.153.5, 199.36.153.6, 199.36.153.7

  • *.pkg.dev should resolve to 199.36.153.4/30, as with the googleapis.com endpoint. To do that, create a new DNS zone with these records:
      CNAME  *.pkg.dev  ->  pkg.dev.
      A      pkg.dev.   ->  199.36.153.4, 199.36.153.5, 199.36.153.6, 199.36.153.7

For more information, see Setting up private connectivity to Google APIs and services.
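The four A-record addresses listed above are exactly the addresses in 199.36.153.4/30. The following minimal sketch shows that relationship and offers a helper for checking DNS answers against the range. The function names are hypothetical.

```python
import ipaddress

RESTRICTED_VIP = ipaddress.ip_network("199.36.153.4/30")


def restricted_addresses():
    """List every address in the restricted.googleapis.com range."""
    return [str(address) for address in RESTRICTED_VIP]


def all_in_restricted_range(resolved):
    """Return True if the list is non-empty and every resolved
    address falls inside 199.36.153.4/30."""
    return bool(resolved) and all(
        ipaddress.ip_address(ip) in RESTRICTED_VIP for ip in resolved
    )
```

From a machine inside the VPC network, you could pass in the address list returned by socket.gethostbyname_ex("gcr.io")[2] to confirm that the zone resolves as intended.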

Limitations

  • Displaying a rendered template with functions in the web UI with DAG serialization enabled is supported for environments running Cloud Composer version 1.12.0 or later and Airflow version 1.10.9 or later.

  • Setting the async_dagbag_loader flag to True is not supported while DAG serialization is enabled.

  • Enabling DAG serialization disables all Airflow web server plugins, because they could put the security of the VPC network where Cloud Composer is deployed at risk. This doesn't affect the behavior of scheduler or worker plugins, including Airflow operators and sensors.

  • When Cloud Composer is running inside a perimeter, access to public PyPI repositories is restricted. See Installing Python dependencies to learn how to install PyPI modules in Private IP mode.