Configure VPC Service Controls


VPC Service Controls enable organizations to define a perimeter around Google Cloud resources to mitigate data exfiltration risks.

Cloud Composer environments can be deployed within a service perimeter. By configuring your environment with VPC Service Controls, you can keep sensitive data private while taking advantage of the fully-managed workflow orchestration capabilities of Cloud Composer.

VPC Service Controls support for Cloud Composer means that:

  • Cloud Composer can now be selected as a secured service inside a VPC Service Controls perimeter.
  • All underlying resources used by Cloud Composer are configured to support VPC Service Controls architecture and follow its rules.

Deploying Cloud Composer environments with VPC Service Controls gives you:

  • Reduced risk of data exfiltration.
  • Protection against data exposure due to misconfigured access controls.
  • Reduced risk of malicious users copying data to unauthorized Google Cloud resources, or external attackers accessing Google Cloud resources from the internet.

Airflow web server in VPC Service Controls mode

In VPC Service Controls mode, Cloud Composer runs two instances of the Airflow web server. Identity-Aware Proxy load balances user traffic between these instances. Airflow web servers run in "read-only" mode, which means:

  • DAG serialization is enabled. As a result, the Airflow web server does not parse DAG definition files.

  • Plugins are not synced to the web server, so you cannot modify or extend the web server functionality with plugins.

  • The Airflow web server uses a container image that is pre-built by the Cloud Composer service. If you install PyPI packages in your environment, these packages are not installed on the web server container image.

Creating a service perimeter

See Creating a service perimeter to learn how to create and configure service perimeters. Make sure to select Cloud Composer as one of the services secured within the perimeter.

Creating environments in a perimeter

There are additional steps required to deploy Cloud Composer inside a perimeter. When creating your Cloud Composer environment:

  1. Enable Access Context Manager API and Cloud Composer API for your project. See Enabling APIs for reference.

  2. Make sure your service perimeter has the following VPC accessible services, otherwise your environment may fail to create:

    • Cloud Composer API (composer.googleapis.com)
    • Compute Engine API (compute.googleapis.com)
    • Kubernetes Engine API (container.googleapis.com)
    • Container Registry API (containerregistry.googleapis.com)
    • Artifact Registry API (artifactregistry.googleapis.com)
    • Cloud Storage API (storage.googleapis.com)
    • Cloud SQL Admin API (sqladmin.googleapis.com)
    • Cloud Build API (cloudbuild.googleapis.com)
    • Cloud Logging API (logging.googleapis.com)
    • Cloud Monitoring API (monitoring.googleapis.com)
    • Cloud Pub/Sub API (pubsub.googleapis.com)
    • Cloud Resource Manager API (cloudresourcemanager.googleapis.com)
    • Service Directory API (servicedirectory.googleapis.com)
    • Cloud Key Management Service API (cloudkms.googleapis.com), if you are using Cloud KMS or CMEK keys
    • Secret Manager API (secretmanager.googleapis.com), if you are using Secret Manager as a secret backend
  3. Use version composer-1.10.4 or later.

  4. Make sure that DAG serialization is enabled. If your environment uses Cloud Composer version 1.15.0 or later, serialization is enabled by default.

  5. Create a new Cloud Composer environment with Private IP enabled. Note that this setting must be configured during the environment creation.

  6. When creating your environment, remember to configure access to the Airflow web server. For maximum protection, only allow access to the web server from specific IP ranges. For details, see Configure web server network access.
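
The service list in step 2 can be checked mechanically before you create an environment. The following is a minimal sketch, assuming you have already obtained the perimeter's list of VPC accessible services as plain strings by some other means; the function name and its parameters are hypothetical and are not part of any Google Cloud library.

```python
# Services that the perimeter must list as VPC accessible services (step 2).
REQUIRED_SERVICES = frozenset({
    "composer.googleapis.com",
    "compute.googleapis.com",
    "container.googleapis.com",
    "containerregistry.googleapis.com",
    "artifactregistry.googleapis.com",
    "storage.googleapis.com",
    "sqladmin.googleapis.com",
    "cloudbuild.googleapis.com",
    "logging.googleapis.com",
    "monitoring.googleapis.com",
    "pubsub.googleapis.com",
    "cloudresourcemanager.googleapis.com",
    "servicedirectory.googleapis.com",
})


def missing_services(configured, uses_cmek=False, uses_secret_manager=False):
    """Return the sorted list of required services absent from `configured`.

    `configured` is any iterable of service names taken from the perimeter.
    The two flags add the services that are only needed for optional features.
    """
    required = set(REQUIRED_SERVICES)
    if uses_cmek:
        required.add("cloudkms.googleapis.com")
    if uses_secret_manager:
        required.add("secretmanager.googleapis.com")
    return sorted(required - set(configured))
```

An empty result means every service from the list is present; any names returned are the ones to add to the perimeter before creating the environment.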

Configuring existing environments with VPC Service Controls

You can add the project containing your environment to the perimeter if:

Installing PyPI packages

In the default VPC Service Controls configuration, Cloud Composer only supports installing PyPI packages from private repositories that are reachable from the private IP address space of the VPC network. The recommended configuration for this process is to set up a private PyPI repository, populate it with vetted packages used by your organization, then configure Cloud Composer to install Python dependencies from a private repository.
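
pip is commonly pointed at a private repository through a pip.conf file with an index-url setting. The sketch below only renders such a file as text. The repository URL is a hypothetical placeholder, and where the file must be placed for a Cloud Composer environment is not covered here; see the Cloud Composer documentation on installing Python dependencies.

```python
import configparser
import io


def render_pip_conf(index_url):
    """Render pip.conf text that points pip at a single package index."""
    # Interpolation is disabled so that "%" characters in a URL are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical URL of a private repository reachable from the VPC network.
PRIVATE_INDEX_URL = "https://pypi.internal.example/simple/"
pip_conf_text = render_pip_conf(PRIVATE_INDEX_URL)
```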

It's also possible to install PyPI packages from repositories outside the private IP space. Follow these steps:

  1. Configure Cloud NAT to allow Cloud Composer running in the private IP space to connect with external PyPI repositories.
  2. Configure your firewall rules to allow outbound connections from the Composer cluster to the repository.

When using this setup, make sure you understand the risks of using external repositories. Be sure that you trust the content and integrity of any external repositories, because these connections could potentially be used as an exfiltration vector.

Configure connectivity to Google APIs and services

In a VPC Service Controls configuration, to control network traffic, configure access to Google APIs and services through the restricted.googleapis.com domain. This domain blocks access to Google APIs and services that do not support VPC Service Controls.

Cloud Composer environments use the following domains:

  • *.googleapis.com is used to access other Google services.

  • *.pkg.dev is used to get environment images, such as when creating or updating an environment.

  • *.gcr.io is used because GKE requires connectivity to the Container Registry domain regardless of Cloud Composer version.

Configure connectivity to the restricted.googleapis.com endpoint:

Domain: *.googleapis.com
DNS name: googleapis.com.
CNAME Record:
  • DNS Name: *.googleapis.com.
  • Resource record type: CNAME
  • Canonical name: googleapis.com.
A Record:
  • Resource record type: A
  • IPv4 addresses: 199.36.153.4, 199.36.153.5, 199.36.153.6, 199.36.153.7

Domain: *.pkg.dev
DNS name: pkg.dev.
CNAME Record:
  • DNS Name: *.pkg.dev.
  • Resource record type: CNAME
  • Canonical name: pkg.dev.
A Record:
  • Resource record type: A
  • IPv4 addresses: 199.36.153.4, 199.36.153.5, 199.36.153.6, 199.36.153.7

Domain: *.gcr.io
DNS name: gcr.io.
CNAME Record:
  • DNS Name: *.gcr.io.
  • Resource record type: CNAME
  • Canonical name: gcr.io.
A Record:
  • Resource record type: A
  • IPv4 addresses: 199.36.153.4, 199.36.153.5, 199.36.153.6, 199.36.153.7

To create a DNS rule:

  1. Create a new DNS zone and use the DNS name listed above for the domain as the DNS name of this zone.

    Example: pkg.dev.

  2. Add a record set for CNAME Record.

    Example:

    • DNS Name: *.pkg.dev.
    • Resource record type: CNAME
    • Canonical name: pkg.dev.
  3. Add a record set for A Record.

    Example:

    • Resource record type: A
    • IPv4 addresses: 199.36.153.4, 199.36.153.5, 199.36.153.6, 199.36.153.7

For more information, see Setting up private connectivity to Google APIs and services.
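
The three steps above repeat for each domain, so the record sets can be generated rather than typed by hand. The following is a minimal sketch that only builds plain data structures; it does not call Cloud DNS. The dictionary keys are hypothetical names chosen for illustration. Placing the A record at the zone's own DNS name is an inference from the CNAME's canonical name, because the text above does not state the A record's name explicitly.

```python
# IPv4 addresses of restricted.googleapis.com, as listed above.
RESTRICTED_VIP_IPS = ["199.36.153.4", "199.36.153.5", "199.36.153.6", "199.36.153.7"]

# Domains used by Cloud Composer environments, without the wildcard prefix.
DOMAINS = ["googleapis.com", "pkg.dev", "gcr.io"]


def record_sets_for(domain):
    """Describe the DNS zone and the two record sets for one domain."""
    zone_dns_name = domain + "."  # fully qualified, for example "pkg.dev."
    return {
        "zone_dns_name": zone_dns_name,
        "records": [
            # Wildcard CNAME that sends every subdomain to the zone name.
            {"name": "*." + zone_dns_name, "type": "CNAME", "rrdatas": [zone_dns_name]},
            # A record for the zone name (assumed location, see the note above).
            {"name": zone_dns_name, "type": "A", "rrdatas": list(RESTRICTED_VIP_IPS)},
        ],
    }


dns_plan = [record_sets_for(domain) for domain in DOMAINS]
```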

Configure firewall rules

If your project has non-default firewall rules, such as rules that override implied firewall rules, or modify pre-populated rules in the default network, then verify that the following firewall rules are configured.

For example, Cloud Composer might fail to create an environment if you have a firewall rule that denies all egress traffic. To avoid issues, define selective allow rules that follow the list below and have a higher priority than the global deny rule.

Configure your VPC network to allow traffic from your environment:

  • See Using firewall rules to learn how to check, add and update rules for your VPC network.
  • Use Connectivity Tool to validate the connectivity between IP ranges.
  • You can use networking tags to further limit access. You can set these tags when you create an environment.

Description: DNS
  • Configure as described in VPC Service Controls support for Cloud DNS.

Description: Google APIs and services
  • Direction: Egress
  • Action: Allow
  • Source or Destination: IPv4 addresses of restricted.googleapis.com that you use for Google APIs and services
  • Protocols: TCP
  • Ports: 443

Description: Environment's cluster Nodes
  • Direction: Egress
  • Action: Allow
  • Source or Destination: Environment's subnetwork primary IP address range
  • Protocols: TCP, UDP
  • Ports: all

Description: Environment's cluster Pods
  • Direction: Egress
  • Action: Allow
  • Source or Destination: Secondary IP address range for Pods in the environment's subnetwork
  • Protocols: TCP, UDP
  • Ports: all

Description: Environment's cluster Control Plane
  • Direction: Egress
  • Action: Allow
  • Source or Destination: GKE Control Plane IP range
  • Protocols: TCP, UDP
  • Ports: all

Description: Web server
  • Direction: Egress
  • Action: Allow
  • Source or Destination: Web server network IP range
  • Protocols: TCP
  • Ports: 3306, 3307
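
To see how a selective allow rule interacts with a global deny rule, the rules listed above can be modeled in a few lines. This is a simplified sketch of VPC firewall evaluation, not the real implementation. It assumes that a lower priority number means higher priority, that a deny rule wins a tie, and that traffic matching no rule is allowed by the implied egress rule. The rule names and priority values are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    name: str
    priority: int      # lower number = higher priority
    action: str        # "allow" or "deny"
    destination: str   # destination range in CIDR notation
    protocol: str      # "tcp", "udp", or "all"
    ports: tuple = ()  # empty tuple means all ports


def matches(rule, dest_ip, protocol, port):
    """Return True if the rule applies to this outbound connection."""
    in_range = ipaddress.ip_address(dest_ip) in ipaddress.ip_network(rule.destination)
    protocol_ok = rule.protocol in ("all", protocol)
    port_ok = not rule.ports or port in rule.ports
    return in_range and protocol_ok and port_ok


def decide(rules, dest_ip, protocol, port):
    """Return "allow" or "deny" for an outbound connection."""
    candidates = [rule for rule in rules if matches(rule, dest_ip, protocol, port)]
    if not candidates:
        return "allow"  # modeled implied allow-egress rule
    # Pick the lowest priority number; on a tie, a deny rule sorts first.
    best = min(candidates, key=lambda rule: (rule.priority, rule.action != "deny"))
    return best.action


RULES = [
    EgressRule("deny-all-egress", 65000, "deny", "0.0.0.0/0", "all"),
    # 199.36.153.4/30 covers exactly 199.36.153.4 through 199.36.153.7.
    EgressRule("allow-restricted-googleapis", 1000, "allow", "199.36.153.4/30", "tcp", (443,)),
]
```

With these two rules, a TCP connection to 199.36.153.5 on port 443 is allowed, while the same connection on port 80, or any connection to an address outside the listed ranges, falls through to the deny rule.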

To obtain IP ranges:

  • Pod, Service, and Control Plane address ranges are available on the Clusters page of your environment's cluster:

    1. In Google Cloud console, go to the Environments page.

      Go to Environments

    2. In the list of environments, click the name of your environment. The Environment details page opens.

    3. Go to the Environment configuration tab.

    4. Follow the view cluster details link.

  • You can see the environment's web server IP range on the Environment configuration tab.

  • You can see the environment's network ID on the Environment configuration tab. To get IP ranges for a subnetwork, go to the VPC Networks page and click the network's name to see details:

    Go to VPC Networks

VPC Service Controls logs

When troubleshooting environment creation issues, you can analyze audit logs generated by VPC Service Controls.

In addition to other log messages, you can check logs for information about cloud-airflow-prod@system.gserviceaccount.com and service-PROJECT_ID@cloudcomposer-accounts.iam.gserviceaccount.com service accounts that configure components of your environments.

The Cloud Composer service uses the cloud-airflow-prod@system.gserviceaccount.com service account to manage tenant project components of your environments.

The service-PROJECT_ID@cloudcomposer-accounts.iam.gserviceaccount.com service account, also known as the Composer Service Agent service account, manages environment components in service and host projects.
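
When searching the logs, it can help to narrow the results to VPC Service Controls entries produced by these two service accounts. The sketch below only assembles a filter string. The field names and the metadata type are assumptions based on the general audit log format, so verify them against the Cloud Logging query language documentation before relying on them. The PROJECT_ID placeholder is substituted exactly as it is written in the account name above.

```python
# Assumed metadata type that marks VPC Service Controls audit log entries.
VPC_SC_METADATA_TYPE = (
    "type.googleapis.com/google.cloud.audit.VpcServiceControlAuditMetadata"
)


def composer_service_accounts(project_id):
    """Return the two service accounts named in the text for this project."""
    return [
        "cloud-airflow-prod@system.gserviceaccount.com",
        f"service-{project_id}@cloudcomposer-accounts.iam.gserviceaccount.com",
    ]


def build_log_filter(project_id):
    """Build a log filter string for entries caused by either account."""
    quoted = [f'"{account}"' for account in composer_service_accounts(project_id)]
    principals = " OR ".join(quoted)
    return (
        f'protoPayload.metadata.@type="{VPC_SC_METADATA_TYPE}"\n'
        f"AND protoPayload.authenticationInfo.principalEmail=({principals})"
    )


# "example-project" is a hypothetical value used only for illustration.
log_filter = build_log_filter("example-project")
```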

Limitations

  • Displaying a rendered template with functions in the web UI with DAG serialization enabled is supported for environments running Cloud Composer version 1.12.0 or later and Airflow version 1.10.9 or later.

  • Setting the async_dagbag_loader flag to True is not supported while DAG serialization is enabled.

  • Enabling DAG serialization disables all Airflow web server plugins, as they could risk the security of the VPC network where Cloud Composer is deployed. This doesn't impact the behaviour of scheduler or worker plugins, including Airflow operators, sensors etc.

  • When Cloud Composer is running inside a perimeter, access to public PyPI repositories is restricted. See Installing Python dependencies to learn how to install PyPI modules in Private IP mode.