Configuring VPC Service Controls

Virtual Private Cloud Service Controls (VPC Service Controls) enable organizations to define a perimeter around Google Cloud resources to mitigate data exfiltration risks.

Cloud Composer environments can be deployed within a service perimeter. By configuring your environment with VPC Service Controls, you can keep sensitive data private while taking advantage of the fully managed workflow orchestration capabilities of Cloud Composer.

VPC Service Controls support for Cloud Composer means that:

  • Cloud Composer can now be selected as a secured service inside a VPC Service Controls perimeter.
  • All underlying resources used by Cloud Composer are configured to support VPC Service Controls architecture and follow its rules.

Deploying Cloud Composer environments with VPC Service Controls gives you:

  • Reduced risk of data exfiltration.
  • Protection against data exposure due to misconfigured access controls.
  • Reduced risk of malicious users copying data to unauthorized Google Cloud resources, or external attackers accessing Google Cloud resources from the Internet.

Creating a service perimeter

See Creating a service perimeter to learn how to create and configure service perimeters. Make sure to select Cloud Composer as one of the services secured within the perimeter.
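As an illustration, the perimeter configuration can be assembled programmatically. The sketch below builds two things: a basic access level spec (YAML) admitting the two service accounts listed in step 2 of the next section, and the argument list for `gcloud access-context-manager perimeters create`. It restricts Cloud Composer plus the services recommended in step 3. The perimeter name, access level name, policy ID, and project number are hypothetical placeholders. The spec would be saved to a file and passed to `gcloud access-context-manager levels create` with `--basic-level-spec`. Treat this as a sketch under those assumptions and check the gcloud reference for your SDK version.

```python
# Minimal sketch: assemble a Composer perimeter configuration.
# Names, policy ID, and project number are hypothetical placeholders.

# Cloud Composer plus the services recommended for maximum protection.
RESTRICTED_SERVICES = [
    "composer.googleapis.com",           # Cloud Composer
    "sqladmin.googleapis.com",           # Cloud SQL
    "pubsub.googleapis.com",             # Pub/Sub
    "monitoring.googleapis.com",         # Cloud Monitoring
    "storage.googleapis.com",            # Cloud Storage
    "container.googleapis.com",          # Kubernetes Engine
    "containerregistry.googleapis.com",  # Container Registry
]


def access_level_spec(project_number: str) -> str:
    """Return a basic access level spec (YAML) admitting the two service accounts."""
    members = [
        f"serviceAccount:{project_number}@cloudservices.gserviceaccount.com",
        "serviceAccount:cloud-logs@system.gserviceaccount.com",
    ]
    lines = ["- members:"] + [f"    - {member}" for member in members]
    return "\n".join(lines) + "\n"


def perimeter_create_args(name: str, policy_id: str, project_number: str,
                          access_level: str) -> list:
    """Return argv for creating a perimeter that protects one project."""
    return [
        "gcloud", "access-context-manager", "perimeters", "create", name,
        f"--title={name}",
        f"--resources=projects/{project_number}",
        "--restricted-services=" + ",".join(RESTRICTED_SERVICES),
        f"--access-levels={access_level}",
        f"--policy={policy_id}",
    ]


if __name__ == "__main__":
    print(access_level_spec("123456789012"))
    print(" ".join(perimeter_create_args(
        "composer_perimeter", "POLICY_ID", "123456789012", "composer_accounts")))
```

Returning an argument list rather than a shell string avoids quoting problems if you later pass it to `subprocess.run`.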

Creating environments in a perimeter

Deploying Cloud Composer inside a perimeter requires some additional steps. When creating your Cloud Composer environment:

  1. Enable the Access Context Manager API and the Cloud Composer API for your project. See Enabling APIs for reference.

  2. Allow the following service accounts through the perimeter by adding them to an access level and attaching that access level to the service perimeter:

    • PROJECT_NUMBER@cloudservices.gserviceaccount.com - This service account (the Google APIs Service Agent) runs internal Google processes on your behalf and is not listed in the Service Accounts section of the Cloud Console. To learn more about it, see the documentation on Google-managed service accounts.
    • cloud-logs@system.gserviceaccount.com - This service account enables Cloud Composer to store logs in your project's Cloud Logging service.

  3. Add the following services to the perimeter for maximum protection of your environment: Cloud SQL, Pub/Sub, Monitoring, Cloud Storage, Kubernetes Engine, Container Registry.

  4. Use Cloud Composer version composer-1.10.4 or later.

  5. Enable DAG serialization in the Airflow database. To do this, when creating the environment, add the store_serialized_dags=True and store_dag_code=True Airflow configuration overrides in the [core] section. See DAG serialization for details.

  6. Create a new Cloud Composer environment with Private IP enabled. Private IP can only be configured when the environment is created; it cannot be enabled later.

  7. When creating your environment, remember to configure access to the Airflow web server. For maximum protection, only allow access to the web server from specific IP ranges. For details, see step 5 in Creating a new environment.
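Steps 4 to 7 can be gathered into a single creation command. The sketch below checks the minimum version and then builds argv for `gcloud beta composer environments create`. It assumes the `--enable-private-environment`, `--airflow-configs` (with `SECTION-KEY=VALUE` pairs), and `--web-server-allow-ip` flags available in the gcloud beta track. The environment name, location, image version, and IP range are hypothetical. An alias such as `composer-latest` is deliberately rejected, because the check needs an explicit version.

```python
import re

# Minimum Cloud Composer version that supports VPC Service Controls (step 4).
MIN_COMPOSER_VERSION = (1, 10, 4)


def parse_composer_version(image_version: str) -> tuple:
    """Extract (major, minor, patch) from e.g. 'composer-1.10.4-airflow-1.10.6'."""
    match = re.match(r"composer-(\d+)\.(\d+)\.(\d+)", image_version)
    if match is None:
        raise ValueError(f"unrecognized image version: {image_version!r}")
    return tuple(int(part) for part in match.groups())


def create_environment_args(name, location, image_version, web_server_ip_range):
    """Return argv for creating a private-IP environment suitable for a perimeter."""
    if parse_composer_version(image_version) < MIN_COMPOSER_VERSION:
        raise ValueError("VPC Service Controls require composer-1.10.4 or later")
    # Step 5: DAG serialization overrides, written as SECTION-KEY=VALUE.
    airflow_configs = {
        "core-store_serialized_dags": "True",
        "core-store_dag_code": "True",
    }
    return [
        "gcloud", "beta", "composer", "environments", "create", name,
        f"--location={location}",
        f"--image-version={image_version}",
        # Step 6: Private IP can only be set at creation time.
        "--enable-private-environment",
        "--airflow-configs=" + ",".join(f"{k}={v}" for k, v in airflow_configs.items()),
        # Step 7: only allow web server access from one IP range.
        f"--web-server-allow-ip=ip_range={web_server_ip_range}",
    ]


if __name__ == "__main__":
    print(" ".join(create_environment_args(
        "secure-env", "us-central1", "composer-1.10.4-airflow-1.10.6", "203.0.113.0/24")))
```

Comparing version tuples of integers, rather than strings, keeps `1.9.0` correctly below `1.10.4`.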

Configuring existing environments with VPC Service Controls

You can configure an existing environment to work inside a perimeter if it meets the following conditions:

  1. The environment was created using the Cloud Composer Beta API, with Private IP enabled.

  2. DAG serialization is turned on.

If these conditions are met, you can add the project that contains your environment to the perimeter, provided that the perimeter is configured as described in the previous sections.
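The two conditions can be expressed as a simple pre-flight check. In the sketch below, `ExistingEnvironment` is a hypothetical summary type, not part of any Google Cloud client library. You would fill it in yourself, for example from the output of `gcloud beta composer environments describe`. It assumes that Airflow overrides are keyed as `SECTION-KEY`, as in `core-store_serialized_dags`.

```python
from dataclasses import dataclass, field


@dataclass
class ExistingEnvironment:
    """Hypothetical summary of an environment's settings (not an API type)."""
    created_with_beta_api: bool
    private_ip_enabled: bool
    airflow_config_overrides: dict = field(default_factory=dict)


def unmet_perimeter_conditions(env: ExistingEnvironment) -> list:
    """Return the conditions the environment fails; an empty list means it qualifies."""
    problems = []
    # Condition 1: created with the Beta API, with Private IP enabled.
    if not (env.created_with_beta_api and env.private_ip_enabled):
        problems.append("must be created with the Beta API and Private IP enabled")
    # Condition 2: DAG serialization is turned on.
    serialized = str(env.airflow_config_overrides.get("core-store_serialized_dags", ""))
    if serialized.strip().lower() != "true":
        problems.append("DAG serialization is not turned on")
    return problems


if __name__ == "__main__":
    env = ExistingEnvironment(True, True, {"core-store_serialized_dags": "True"})
    print(unmet_perimeter_conditions(env) or "eligible")
```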

Installing PyPI packages

In the default VPC Service Controls configuration (shown above), Cloud Composer only supports installing PyPI packages from private repositories reachable from the private IP address space of the VPC network. The recommended approach is to set up a private PyPI repository, populate it with vetted packages used by your organization, and then configure Cloud Composer to install Python dependencies from that repository.
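Pointing pip at a private repository is typically done with a `pip.conf` file. The Installing Python dependencies guide describes where Cloud Composer expects this file (the `config/pip/` folder of the environment's bucket); confirm the location for your version there. The sketch below renders a minimal `pip.conf` using the standard pip `index-url` and `trusted-host` options. The repository URL is a hypothetical example.

```python
import configparser
import io


def private_pip_conf(index_url: str, trusted_host: str = "") -> str:
    """Render a pip.conf that points pip at a single private package index."""
    # interpolation=None so '%' characters in URLs are written literally.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"index-url": index_url}
    if trusted_host:
        # Only needed if the repository is not served over verified HTTPS.
        config["global"]["trusted-host"] = trusted_host
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical internal repository reachable from the private IP space.
    print(private_pip_conf("https://pypi.internal.example.com/simple/"))
```

Setting `index-url` (rather than `extra-index-url`) replaces the public index entirely, which matches the perimeter's restriction on public PyPI access.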

It's also possible to install PyPI packages from repositories outside the private IP space. Follow these steps:

  1. Configure Cloud NAT so that Cloud Composer, running in the private IP space, can connect to external PyPI repositories.
  2. Configure your firewall rules to allow outbound connections from the Cloud Composer GKE cluster to the repository.

When using this setup, make sure you understand and accept the risks of using external repositories. Only use external repositories whose content and integrity you trust, because these connections could be used as an exfiltration vector.

Network configuration checklist

Your VPC network must be configured properly to create Cloud Composer environments inside a perimeter. Make sure to follow the configuration requirements listed below.

  • Navigate to the VPC network -> Firewall section in the Cloud Console, and verify that the following firewall rules are configured:

    • Allow egress from the GKE node IP range to anywhere, port 53 (DNS).
    • Allow egress from the GKE node IP range to the GKE node IP range, all ports.
    • Allow egress from the GKE node IP range to the GKE master IP range, all ports.
    • Allow egress from the GKE node IP range to 199.36.153.4/30 (restricted.googleapis.com), port 443.
    • Allow ingress from the Google Cloud health check ranges (130.211.0.0/22 and 35.191.0.0/16) to the GKE node IP range, TCP ports 80 and 443.
    • Allow egress from the GKE node IP range to the Google Cloud health check ranges, TCP ports 80 and 443.

    See Using firewall rules to learn how to check, add, and update rules for your VPC network.
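To sanity-check a planned rule set, you can model the required egress rules and test destinations against them. This is a simplified sketch. `NODE_RANGE` and `MASTER_RANGE` are hypothetical CIDRs that you would replace with your cluster's values. The model ignores protocols, rule priority, and deny rules, and it does not cover the ingress rule from the health check ranges.

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical ranges; substitute your cluster's actual node and master CIDRs.
NODE_RANGE = "10.0.0.0/22"
MASTER_RANGE = "172.16.0.0/28"


@dataclass(frozen=True)
class EgressRule:
    destination: str                 # CIDR the rule allows traffic to
    ports: Optional[FrozenSet[int]]  # None means all ports


# Egress rules from the checklist above, all with the node range as the source.
REQUIRED_EGRESS = [
    EgressRule("0.0.0.0/0", frozenset({53})),            # DNS
    EgressRule(NODE_RANGE, None),                        # node to node
    EgressRule(MASTER_RANGE, None),                      # node to master
    EgressRule("199.36.153.4/30", frozenset({443})),     # restricted.googleapis.com
    EgressRule("130.211.0.0/22", frozenset({80, 443})),  # health checks
    EgressRule("35.191.0.0/16", frozenset({80, 443})),   # health checks
]


def egress_allowed(rules, dest_ip: str, port: int) -> bool:
    """True if any rule allows traffic from the nodes to dest_ip:port."""
    address = ipaddress.ip_address(dest_ip)
    return any(
        address in ipaddress.ip_network(rule.destination)
        and (rule.ports is None or port in rule.ports)
        for rule in rules
    )


if __name__ == "__main__":
    for ip, port in [("199.36.153.6", 443), ("199.36.153.8", 443), ("8.8.8.8", 53)]:
        print(ip, port, egress_allowed(REQUIRED_EGRESS, ip, port))
```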

  • Configure connectivity to the restricted.googleapis.com endpoint.

    • Verify that a DNS mapping from *.googleapis.com to restricted.googleapis.com exists.
    • Make sure that *.gcr.io names also resolve to 199.36.153.4/30, as the googleapis.com names do. To do that, create a new DNS zone for gcr.io with the following records:
        CNAME *.gcr.io -> gcr.io.
        A gcr.io. -> 199.36.153.4, 199.36.153.5, 199.36.153.6, 199.36.153.7

    To learn more, see Setting up private connectivity to Google APIs and services.
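The gcr.io records can be derived directly from the restricted range, because 199.36.153.4/30 contains exactly the four addresses used in the A record. The sketch below builds the records and the matching `gcloud dns record-sets create` argument lists. It assumes that a private managed zone for `gcr.io.` already exists. The zone name `gcr-io` and the 300-second TTL are hypothetical choices.

```python
import ipaddress

# restricted.googleapis.com virtual IP range from the checklist above.
RESTRICTED_VIP_RANGE = ipaddress.ip_network("199.36.153.4/30")


def gcr_zone_records():
    """Return (name, type, values) tuples for a private gcr.io zone."""
    # Iterating a /30 yields all four addresses: .4, .5, .6, and .7.
    a_values = [str(address) for address in RESTRICTED_VIP_RANGE]
    return [
        ("*.gcr.io.", "CNAME", ["gcr.io."]),
        ("gcr.io.", "A", a_values),
    ]


def record_set_commands(zone: str, ttl: int = 300):
    """Return gcloud argv lists that would create each record in `zone`."""
    return [
        ["gcloud", "dns", "record-sets", "create", name,
         f"--zone={zone}", f"--type={rtype}", f"--ttl={ttl}",
         "--rrdatas=" + ",".join(values)]
        for name, rtype, values in gcr_zone_records()
    ]


if __name__ == "__main__":
    for command in record_set_commands("gcr-io"):
        print(" ".join(command))
```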

Limitations

  • All VPC Service Controls network constraints also apply to your Cloud Composer environments. See the VPC Service Controls documentation for details.

  • Enabling DAG serialization prevents Airflow from displaying a rendered template with functions in the web UI. This may be fixed in a future version of Airflow and Cloud Composer.

  • Setting the async_dagbag_loader flag to True is not supported while DAG serialization is enabled.

  • Enabling DAG serialization disables all Airflow web server plugins, because they could compromise the security of the VPC network where Cloud Composer is deployed. This doesn't affect the behavior of scheduler or worker plugins, including Airflow operators and sensors.

  • When Cloud Composer is running inside a perimeter, access to public PyPI repositories is restricted. See Installing Python dependencies to learn how to install PyPI packages in Private IP mode.
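Some of these limitations can be caught before you submit configuration overrides. The sketch below flags the one combination stated as unsupported: `async_dagbag_loader` set to True while DAG serialization is enabled. It assumes overrides keyed as `SECTION-KEY`, and it assumes that `async_dagbag_loader` lives in the `[webserver]` section. Verify the section name against your Cloud Composer version.

```python
def override_conflicts(overrides: dict) -> list:
    """Return messages for override combinations that the limitations rule out."""
    def is_true(key: str) -> bool:
        return str(overrides.get(key, "")).strip().lower() == "true"

    conflicts = []
    # Assumed key name: the flag is taken to be in the [webserver] section.
    if is_true("core-store_serialized_dags") and is_true("webserver-async_dagbag_loader"):
        conflicts.append("async_dagbag_loader=True is not supported with DAG serialization")
    return conflicts


if __name__ == "__main__":
    print(override_conflicts({
        "core-store_serialized_dags": "True",
        "webserver-async_dagbag_loader": "True",
    }))
```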