Troubleshooting environment creation


This page provides troubleshooting information for problems that you might encounter while creating Cloud Composer environments.

For troubleshooting information related to updating and upgrading environments, see Troubleshooting environment updates and upgrades.

Most failures during Cloud Composer environment creation have one of the following causes:

  • Service account permission problems

  • Incorrect firewall, DNS, or routing configuration

  • Network-related issues, for example, an invalid VPC configuration, IP address conflicts, or network IP ranges that are too narrow

  • Quota-related issues

  • Incompatible organization policies

Insufficient permissions to create an environment

If Cloud Composer cannot create an environment because your account has insufficient permissions, it outputs one of the following error messages:

ERROR: (gcloud.composer.environments.create) PERMISSION_DENIED: The caller
does not have permission

or

ERROR: (gcloud.composer.environments.create) PERMISSION_DENIED: User not
authorized to act as service account <service-account-name>.
The user must be granted iam.serviceAccounts.actAs permission, included in
Owner, Editor, Service Account User role. See
https://cloud.google.com/iam/docs/understanding-service-accounts for
additional details.

Solution: Assign roles both to your account and to the service account of your environment, as described in Access control.

  • In Cloud Composer 2, make sure that the Cloud Composer Service Agent account (service-PROJECT_NUMBER@cloudcomposer-accounts.iam.gserviceaccount.com) has the Cloud Composer v2 API Service Agent Extension role.

  • Make sure that the Google APIs Service Agent (PROJECT_NUMBER@cloudservices.gserviceaccount.com) has the Editor role.

  • In a Shared VPC configuration, follow the Configure Shared VPC instructions.
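
The service agent checks above can be scripted. The following Python sketch derives the two service agent addresses from a project number and prints the `gcloud projects add-iam-policy-binding` commands that would grant the roles. It assumes that the role IDs `roles/composer.ServiceAgentV2Ext` and `roles/editor` correspond to the role names mentioned above; the project ID and project number are placeholders, and the function names are illustrative. Verify the role IDs against Access control before you run any printed command.

```python
import shlex


def service_agents(project_number: str) -> dict:
    """Return the two service agent addresses named on this page."""
    return {
        "composer": (
            f"service-{project_number}"
            "@cloudcomposer-accounts.iam.gserviceaccount.com"
        ),
        "google_apis": f"{project_number}@cloudservices.gserviceaccount.com",
    }


def grant_commands(project_id: str, project_number: str) -> list:
    """Build (but do not run) the gcloud commands that grant the roles."""
    agents = service_agents(project_number)
    # Role IDs are assumptions; confirm them in Access control.
    bindings = [
        (agents["composer"], "roles/composer.ServiceAgentV2Ext"),
        (agents["google_apis"], "roles/editor"),
    ]
    return [
        shlex.join([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member=serviceAccount:{member}",
            f"--role={role}",
        ])
        for member, role in bindings
    ]


if __name__ == "__main__":
    # Placeholder project ID and number.
    for command in grant_commands("example-project", "123456789012"):
        print(command)
```

Printing the commands instead of executing them keeps the sketch free of side effects, so you can review each binding before applying it.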

The service account of the environment has insufficient permissions

When creating a Cloud Composer environment, you specify a service account that runs the environment's GKE cluster nodes. If this service account does not have enough permissions for the requested operation, Cloud Composer outputs the following error:

Errors in: [Web server]; Error messages:
  Creation of airflow web server version failed. This may be an intermittent
  issue of the App Engine service. You may retry the operation later.
{"ResourceType":"appengine.v1.version","ResourceErrorCode":"504",
"ResourceErrorMessage":"Your deployment has failed to become healthy in the
allotted time and therefore was rolled back. If you believe this was an error,
try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check'
section."}

Solution: Assign roles both to your account and to the service account of your environment, as described in Access control.

Warnings about missing IAM roles in service accounts

When environment creation fails, Cloud Composer generates the following warning message after the error: The issue may be caused by missing IAM roles in the following Service Accounts ....

This warning message highlights possible causes for the error. Cloud Composer checks for required roles on the service accounts in your project, and if these roles are not present, it generates this warning message.

Solution: Check that service accounts mentioned in the warning message have the required roles. For more information about roles and permissions in Cloud Composer, see Access control.

In some cases, you can ignore this warning. Cloud Composer checks for roles, not for the individual permissions that roles contain. For example, if you use custom IAM roles, the service account mentioned in the warning message might already have all required permissions.
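
A role check of this kind can be reproduced locally. The sketch below assumes that you have saved the output of `gcloud projects get-iam-policy PROJECT_ID --format=json` and parsed it into a dictionary; the function name `missing_roles`, the member addresses, and the roles in the example are hypothetical. Like the warning itself, the sketch compares role names only, so it also reports a service account that receives the same permissions through a custom role.

```python
def missing_roles(policy: dict, required: dict) -> dict:
    """Map each member to the required roles it lacks in an IAM policy.

    policy:   parsed IAM policy with a "bindings" list of
              {"role": ..., "members": [...]} entries.
    required: {member: [role, ...]} describing the roles you expect.
    """
    granted = {}
    for binding in policy.get("bindings", []):
        for member in binding.get("members", []):
            granted.setdefault(member, set()).add(binding["role"])

    result = {}
    for member, roles in required.items():
        lacking = set(roles) - granted.get(member, set())
        if lacking:
            result[member] = sorted(lacking)
    return result


if __name__ == "__main__":
    # Hypothetical policy and expectations, for illustration only.
    policy = {
        "bindings": [
            {
                "role": "roles/editor",
                "members": ["serviceAccount:123@cloudservices.gserviceaccount.com"],
            }
        ]
    }
    required = {
        "serviceAccount:123@cloudservices.gserviceaccount.com": ["roles/editor"],
        "serviceAccount:env-sa@example-project.iam.gserviceaccount.com": [
            "roles/composer.worker"
        ],
    }
    print(missing_roles(policy, required))
```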

A VPC network selected for the environment does not exist

You can specify a VPC network and a subnet for your Cloud Composer environment when you create it. If you do not specify a VPC network, then the Cloud Composer service selects the default VPC and the default subnet for the environment's region and zone.

If the specified VPC network and subnet do not exist, then Cloud Composer outputs the following error:

Errors in: [GKE cluster]; Error messages:
        {"ResourceType":"gcp-types/container-v1:projects.locations.clusters",
        "ResourceErrorCode":"400",
        "ResourceErrorMessage":{"code":400,
        "message":"Project \"<your composer project>\" has no network named \"non-existing-vpc\".",
        "status":"INVALID_ARGUMENT","statusMessage":"Bad Request",
        "requestPath":"https://container.googleapis.com/v1/projects/<your composer project>/locations/<zone>/clusters",
        "httpMethod":"POST"}}

Solution:

  • In Cloud Composer 2, you can create environments that use Private Service Connect instead of VPC networks.
  • Before creating an environment, make sure that the VPC network and the subnet for your new environment exist.
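
The existence check can be done before you submit the create request. The following sketch assumes that you have parsed the output of `gcloud compute networks list --format=json` and `gcloud compute networks subnets list --format=json` into Python lists, and that each entry carries the `name`, `region`, and `network` fields of the Compute Engine resource representation. The function name and the sample values are hypothetical.

```python
def network_and_subnet_exist(networks, subnets, network, subnet, region):
    """Return True if the network exists and the subnet belongs to it in the region."""
    if not any(n.get("name") == network for n in networks):
        return False
    for s in subnets:
        same_name = s.get("name") == subnet
        # "region" and "network" are resource URLs; compare their last segment.
        same_region = s.get("region", "").rstrip("/").endswith("/" + region)
        same_network = s.get("network", "").rstrip("/").endswith("/" + network)
        if same_name and same_region and same_network:
            return True
    return False


if __name__ == "__main__":
    # Sample data shaped like the assumed gcloud JSON output.
    base = "https://www.googleapis.com/compute/v1/projects/example-project"
    networks = [{"name": "composer-vpc"}]
    subnets = [
        {
            "name": "composer-subnet",
            "region": base + "/regions/us-central1",
            "network": base + "/global/networks/composer-vpc",
        }
    ]
    print(network_and_subnet_exist(
        networks, subnets, "composer-vpc", "composer-subnet", "us-central1"))
    print(network_and_subnet_exist(
        networks, subnets, "non-existing-vpc", "composer-subnet", "us-central1"))
```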

Incorrect network configuration

Creating a Cloud Composer environment requires correct network and DNS configuration. Follow these instructions to configure connectivity to Google APIs and services:

If you configure Cloud Composer environments in Shared VPC mode, also follow the Shared VPC instructions.

A Cloud Composer environment uses a subnet for cluster nodes and IP ranges for Pods and Services. To ensure communication with these and other IP ranges, follow these instructions to configure firewall rules:

You can also check the log entries in the GCE Networking and Subnetwork configuration categories in Activities to see whether any errors were reported during environment creation.

Quota issues encountered when creating environments in large-scale networks

When creating Cloud Composer environments in large-scale networks, you might encounter the following quota limitations:

  • The maximum number of VPC peerings per single VPC network is reached.
  • The maximum number of primary and secondary subnet IP ranges is reached.
  • The maximum number of forwarding rules in the peering group for Internal TCP/UDP Load Balancing is reached.

Solution:

Incompatible organization policies

The following policies must be configured appropriately so that Cloud Composer environments can be created successfully.

| Organization Policy | Cloud Composer 1 | Cloud Composer 2 |
|---|---|---|
| compute.disableSerialPortLogging | Disabled for versions earlier than 1.13.0; otherwise any value | Must be disabled |
| compute.requireOsLogin | Must be disabled | Any value is allowed |
| compute.vmCanIpForward | Must be allowed (required for Cloud Composer-owned GKE clusters) when VPC native mode (using alias IP) is not configured | Any value is allowed |
| compute.vmExternalIpAccess | Must be allowed for Public IP environments | Must be allowed for Public IP environments |
| compute.restrictVpcPeering | Cannot be enforced | Cannot be enforced |
| compute.disablePrivateServiceConnectCreationForConsumers | Any value is allowed | Cannot disallow "SERVICE_PRODUCERS" if Private Service Connect is used |

For more information, see the Known issues page and Organization policy constraints.

Restricting services used within organization or project

Organization or project administrators can restrict which Google services can be used in their projects by using the gcp.restrictServiceUsage organization policy constraint.

When using this organization policy, it's important to allow all the services required by Cloud Composer.

400 Error Messages: Failed to deploy the Airflow web server.

This error might be caused by a failure to create a Private IP environment's GKE cluster because of overlapping IP ranges.

Solution: Check logs for any failures in your environment's cluster and resolve the issue based on the GKE error message.
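
Overlaps between IP ranges can be detected before you create the environment. The sketch below uses Python's standard `ipaddress` module to compare every pair of ranges; the range names and CIDR values are hypothetical and stand in for the ranges you plan to use for nodes, Pods, and Services.

```python
import ipaddress
from itertools import combinations


def overlapping_ranges(ranges: dict) -> list:
    """Return pairs of range names whose CIDR blocks overlap."""
    networks = {
        name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()
    }
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    # Hypothetical ranges: the Services range falls inside the nodes range.
    planned = {
        "nodes": "10.0.0.0/20",
        "pods": "10.4.0.0/14",
        "services": "10.0.8.0/24",
    }
    print(overlapping_ranges(planned))  # [('nodes', 'services')]
```

An empty result means that no two of the listed ranges overlap. It does not rule out conflicts with ranges that are already in use elsewhere in the VPC network, so include those in the input as well.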

Cloud Build fails to build environment images

If the Cloud Build service account (PROJECT_NUMBER@cloudbuild.gserviceaccount.com) does not have the Cloud Build Service Account (roles/cloudbuild.builds.builder) role in your project, then attempts to create or update an environment might fail with permission-related errors.

For example, the Cloud Build logs might contain the message denied: Permission "artifactregistry.repositories.uploadArtifacts" denied, followed by ERROR: failed to push because we ran out of retries.

To solve this issue, make sure that the Cloud Build service account has the Cloud Build Service Account role.

What's next