Negsignal.SIGSEGV error: Cloud Composer tasks marked as failed while the jobs they trigger are unaffected

Problem

Cloud Composer tasks fail with the Negsignal.SIGSEGV error message, but the Dataproc or Dataflow jobs they trigger appear to run and finish successfully.

Example:

[YYYY-MM-DD hh:mm:ss,sss] {local_task_job.py:102} INFO - Task exited with return code Negsignal.SIGSEGV
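If you have many task logs to triage, a small filter can find the affected runs. The following is a minimal sketch. It assumes the log line format matches the example above, which may differ across Airflow versions. The helper name find_sigsegv_exits and the sample log text are hypothetical and used only for illustration.

```python
import re
from typing import List

# Matches the task-runner line from the example above. The format is assumed
# from that single example and may vary between Airflow versions.
SIGSEGV_PATTERN = re.compile(
    r"\{local_task_job\.py:\d+\} INFO - Task exited with return code Negsignal\.SIGSEGV"
)


def find_sigsegv_exits(log_text: str) -> List[str]:
    """Return every log line that reports a task exit caused by SIGSEGV."""
    return [line for line in log_text.splitlines() if SIGSEGV_PATTERN.search(line)]


# Hypothetical sample log: one normal exit and one SIGSEGV exit.
sample_log = (
    "[YYYY-MM-DD hh:mm:ss,sss] {local_task_job.py:102} INFO - Task exited with return code 0\n"
    "[YYYY-MM-DD hh:mm:ss,sss] {local_task_job.py:102} INFO - Task exited with return code Negsignal.SIGSEGV\n"
)
print(len(find_sigsegv_exits(sample_log)))  # 1
```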

Environment

  • Cloud Composer version composer-1.14.4-airflow-1.10.14 and later.
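To check whether an environment's image name falls in this range, you can compare only the Composer part of the version. The following is a minimal sketch. It assumes image names follow the composer-X.Y.Z-airflow-A.B.C pattern shown above, and the helper names are hypothetical. It does not detect whether an image already includes the fix released after 2021-09-21.

```python
import re
from typing import Optional, Tuple

# Image names assumed to look like "composer-1.14.4-airflow-1.10.14".
IMAGE_PATTERN = re.compile(r"^composer-(\d+)\.(\d+)\.(\d+)-airflow-(\d+)\.(\d+)\.(\d+)$")

# First affected Composer version, taken from the Environment section.
FIRST_AFFECTED = (1, 14, 4)


def parse_composer_version(image: str) -> Optional[Tuple[int, ...]]:
    """Return the Composer part of an image name as an int tuple, or None."""
    match = IMAGE_PATTERN.match(image.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups()[:3])


def is_in_affected_range(image: str) -> bool:
    """True if the Composer version is at or above the first affected version."""
    version = parse_composer_version(image)
    return version is not None and version >= FIRST_AFFECTED
```

Tuples compare element by element, so (1, 17, 0) >= (1, 14, 4) is true while (1, 13, 0) is not.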

Solution

  1. Set GRPC_POLL_STRATEGY to epoll1 in the Override Airflow configuration options.
  2. Alternatively, upgrade to a Cloud Composer image released after 2021-09-21, which should also resolve this issue. See the Cloud Composer release notes.
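When reproducing the problem outside Cloud Composer, for example in a standalone script that calls a Dataproc or Dataflow client, the same setting can be applied to the process environment. The following is a minimal sketch. It assumes that gRPC reads GRPC_POLL_STRATEGY when it initializes, so the variable must be set before any gRPC-based library is imported. The helper name pin_grpc_poll_strategy is hypothetical.

```python
import os
from typing import MutableMapping, Optional

POLL_STRATEGY_VAR = "GRPC_POLL_STRATEGY"


def pin_grpc_poll_strategy(
    env: MutableMapping[str, str] = os.environ, strategy: str = "epoll1"
) -> Optional[str]:
    """Set GRPC_POLL_STRATEGY in env and return its previous value (None if unset).

    Assumption: gRPC reads this variable at initialization, so call this
    before importing any gRPC-based client library.
    """
    previous = env.get(POLL_STRATEGY_VAR)
    env[POLL_STRATEGY_VAR] = strategy
    return previous
```

The env parameter accepts any mapping, so the function can be exercised against a plain dict without touching the real process environment.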

Cause

In gRPC v1.30.0, the default polling strategy was epoll1. epoll1 and poll are the only polling strategies compatible with OS fork support, which the Dataproc, Dataflow, and BigQuery job operators rely on to get the status of their jobs.

gRPC v1.31.0 changed the default polling strategy from epoll1 to epollex, which is not one of the fork-compatible polling strategies and is therefore not compatible with Celery workers.

Cloud Composer versions composer-1.14.4-airflow-1.10.14 and later use grpcio v1.33.2 or later, as listed in the Cloud Composer version list.
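The reasoning above can be summarized as follows: the effective polling strategy is the GRPC_POLL_STRATEGY override if one is set, and otherwise the grpcio default. The worker is safe only if that strategy is fork-compatible. The following is a minimal sketch. It encodes only the two defaults described in this article (epoll1 up to v1.30.x, epollex from v1.31.0) and does not model later gRPC releases. It assumes plain numeric version strings, and all helper names are hypothetical.

```python
from typing import Mapping, Tuple

# Polling strategies that work with fork, per the Cause section above.
FORK_COMPATIBLE = {"epoll1", "poll"}


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain numeric version such as "1.33.2" into (1, 33, 2)."""
    return tuple(int(part) for part in version.split(".")[:3])


def default_poll_strategy(grpcio_version: str) -> str:
    """Default strategy as described in this article only."""
    return "epollex" if parse_version(grpcio_version) >= (1, 31, 0) else "epoll1"


def effective_poll_strategy(grpcio_version: str, env: Mapping[str, str]) -> str:
    """An explicit GRPC_POLL_STRATEGY override wins over the version default."""
    override = env.get("GRPC_POLL_STRATEGY", "").strip()
    return override or default_poll_strategy(grpcio_version)


def is_fork_compatible(grpcio_version: str, env: Mapping[str, str]) -> bool:
    return effective_poll_strategy(grpcio_version, env) in FORK_COMPATIBLE
```

For example, with grpcio 1.33.2 and no override the result is epollex, which is not fork-compatible. Setting GRPC_POLL_STRATEGY to epoll1 changes the outcome.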