This page provides troubleshooting information for problems that you might encounter while updating or upgrading Cloud Composer environments.
For troubleshooting information related to creating environments, see Troubleshooting environment creation.
When Cloud Composer environments are updated, most issues occur for the following reasons:
- Service account permission problems
- PyPI dependency issues
- Size of the Airflow database
Insufficient permissions to update or upgrade an environment
If Cloud Composer cannot update or upgrade an environment because of insufficient permissions, it outputs the following error message:
ERROR: (gcloud.composer.environments.update) PERMISSION_DENIED: The caller does not have permission
Solution: Assign roles both to your account and to the service account of your environment, as described in Access control.
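As a sketch of this fix (project, account, and role names below are placeholders; the exact roles your operation requires are listed in Access control):

```shell
# Placeholder values -- replace with your own project, user, and service account.
PROJECT_ID=my-project
USER_EMAIL=user@example.com
ENV_SA=composer-env-sa@my-project.iam.gserviceaccount.com

# Grant your own account permission to manage environments.
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="user:$USER_EMAIL" \
    --role="roles/composer.environmentAndStorageObjectAdmin"

# Grant the environment's service account the Composer Worker role.
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:$ENV_SA" \
    --role="roles/composer.worker"
```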
The service account of the environment has insufficient permissions
When creating a Cloud Composer environment, you specify a service account that runs the environment's GKE cluster nodes. If this service account does not have enough permissions for the requested operation, Cloud Composer outputs an error:
UPDATE operation on this environment failed 3 minutes ago with the
following error message:
Composer Backend timed out. Currently running tasks are [stage:
CP_COMPOSER_AGENT_RUNNING
description: "No agent response published."
response_timestamp {
seconds: 1618203503
nanos: 291000000
}
].
Solution: Assign roles both to your account and to the service account of your environment, as described in Access control.
The size of the Airflow database is too big to perform the operation
A Cloud Composer upgrade operation might fail because the Airflow database is too large for upgrade operations.
If the size of the Airflow database is more than 16 GB, Cloud Composer outputs the following error:
Airflow database uses more than 16 GB. Please clean the database before upgrading.
Solution: Perform the Airflow database cleanup, as described in Airflow database maintenance.
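On Airflow 2.3 and later, one way to run this cleanup is through the Airflow CLI's db clean command, invoked via gcloud. A hedged sketch; environment name, location, and the retention timestamp are placeholders you must adjust:

```shell
# Placeholder environment name and location.
ENVIRONMENT=my-environment
LOCATION=us-central1

# Remove database entries older than the given timestamp.
# --skip-archive drops the rows instead of archiving them to extra tables.
gcloud composer environments run "$ENVIRONMENT" \
    --location "$LOCATION" \
    db clean -- \
    --clean-before-timestamp "2023-01-01 00:00:00" \
    --skip-archive --yes
```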
An upgrade to a new Cloud Composer version fails because of PyPI package conflicts
When you upgrade an environment with installed custom PyPI packages, you might encounter errors related to PyPI package conflicts. This might happen because the new Cloud Composer image contains newer versions of preinstalled packages that cause dependency conflicts with PyPI packages that you installed in your environment.
Solution:
- To get detailed information about package conflicts, run an upgrade check.
- Loosen version constraints for installed custom PyPI packages. For example, instead of pinning a version as ==1.0.1, specify it as >=1.0.1.
- For more information about changing version requirements to resolve conflicting dependencies, see the pip documentation.
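The upgrade check mentioned above can be run from the Google Cloud CLI. A sketch with placeholder names (depending on your gcloud version, the command may live under the beta component, and the target image version must be replaced with your own):

```shell
ENVIRONMENT=my-environment
LOCATION=us-central1

# Simulate the upgrade and report PyPI package conflicts without
# changing the environment. Replace the image version with the
# actual version you plan to upgrade to.
gcloud beta composer environments check-upgrade "$ENVIRONMENT" \
    --location "$LOCATION" \
    --image-version composer-2.9.1-airflow-2.9.3
```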
It's not possible to upgrade an environment to a version that is still supported
Cloud Composer environments can be upgraded only to a limited number of the most recent versions.
The version limitations for creating new environments and upgrading existing environments are different. The Cloud Composer version you choose when creating a new environment might not be available when upgrading existing environments.
You can perform the upgrade operation using the Google Cloud CLI, API, or Terraform. In the Google Cloud console, only the latest versions are available as upgrade choices.
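To see which versions a specific environment can be upgraded to, you can list the available upgrades with gcloud. A sketch; environment name and location are placeholders:

```shell
ENVIRONMENT=my-environment
LOCATION=us-central1

# List the Cloud Composer image versions this environment
# can be upgraded to.
gcloud composer environments list-upgrades "$ENVIRONMENT" \
    --location "$LOCATION"
```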
Lack of connectivity to DNS can cause problems while performing upgrades or updates
Such connectivity problems might result in log entries like the following:
WARNING - Compute Engine Metadata server unavailable attempt 1 of 5. Reason: [Errno -3] Temporary failure in name resolution Error
This error usually means that there is no route to DNS. Make sure that the metadata.google.internal DNS name can be resolved to an IP address from within the cluster, Pods, and Services networks. Check that Private Google Access is turned on in the VPC (in the host or service project) where your environment is created.
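One way to verify name resolution from inside the environment's cluster is to run a short-lived test pod. This sketch assumes you have kubectl access to the environment's GKE cluster; cluster name and region are placeholders:

```shell
# Get credentials for the environment's cluster first, for example:
#   gcloud container clusters get-credentials CLUSTER_NAME --region REGION

# Run a temporary pod and try to resolve the metadata server name.
# The pod is deleted automatically after the command exits.
kubectl run dns-test --rm -it --restart=Never \
    --image=busybox:1.36 -- \
    nslookup metadata.google.internal
```

If the lookup fails inside the pod, the problem is in the cluster's DNS or network configuration rather than in Cloud Composer itself.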
Triggerer CPU exceeds the 1 vCPU limit
Cloud Composer 2 versions 2.4.4 and higher introduce a different triggerer resource allocation strategy to improve performance scaling. If you encounter an error related to triggerer CPU when performing an environment update, it means that your current triggerers are configured to use more than 1 vCPU each.
Solution:
- Adjust triggerer resource allocation to meet the 1 vCPU limit.
- If you anticipate issues with DAGs that use deferrable operators, we recommend that you also increase the number of triggerers.
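Both adjustments can be made in a single update call. A sketch with placeholder values; the triggerer flags apply to Composer 2 environments:

```shell
ENVIRONMENT=my-environment
LOCATION=us-central1

# Keep each triggerer at or below 1 vCPU, and scale out the number of
# triggerers instead if DAGs with deferrable operators need more capacity.
gcloud composer environments update "$ENVIRONMENT" \
    --location "$LOCATION" \
    --triggerer-cpu 1 \
    --triggerer-count 2
```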
Inspect failed migration warnings
When you upgrade Airflow to a later version, new constraints are sometimes applied to the Airflow database. If these constraints cannot be applied, Airflow creates new tables to store the rows for which the constraints couldn't be applied. The Airflow UI displays a warning message until the moved data tables are renamed or dropped.
Solution:
You can use the following two DAGs to inspect the moved data and rename the tables.
The list_moved_tables_after_upgrade_dag DAG lists rows that were moved from every table where constraints could not be applied. Inspect the data and decide whether you want to keep it. To keep it, you must manually fix the data in the Airflow database, for example by adding the rows back with the correct data.
If you don't need the data, or if you already fixed it, then you can run the rename_moved_tables_after_upgrade_dag DAG. This DAG renames the moved tables. The tables and their data are not deleted, so you can review the data at a later point.
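Both DAGs can be triggered through the Airflow CLI wrapper in gcloud. A sketch; environment name and location are placeholders:

```shell
ENVIRONMENT=my-environment
LOCATION=us-central1

# First, list the moved rows and inspect them in the task logs.
gcloud composer environments run "$ENVIRONMENT" \
    --location "$LOCATION" \
    dags trigger -- list_moved_tables_after_upgrade_dag

# After inspecting (and fixing or discarding) the data,
# rename the moved tables to clear the warning.
gcloud composer environments run "$ENVIRONMENT" \
    --location "$LOCATION" \
    dags trigger -- rename_moved_tables_after_upgrade_dag
```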