Access the Airflow database


This page explains how to connect to a Cloud SQL instance that runs the Airflow database of your Cloud Composer environment and run SQL queries.

For example, you might want to run queries directly on the Airflow database, make database backups, gather statistics based on the database content, or retrieve any other custom information from the database.

For a Cloud Composer environment, one way to do it is by connecting to the Airflow database from a VM in the GKE cluster of your environment.

Get the name and zone of your environment's cluster

  1. In the Google Cloud Console, go to the Environments page.

    Go to Environments

  2. Select your environment.

  3. Go to the Environment configuration tab.

  4. The GKE cluster item lists the name and the zone of your environment's GKE cluster.

    For example, if the value of this parameter is projects/example-project/zones/europe-west3-a/clusters/europe-west3-composer-exam--7b206598-gke, then the cluster name is europe-west3-composer-exam--7b206598-gke and the cluster zone is europe-west3-a.
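
The cluster name and zone can also be extracted from that resource path with standard shell parameter expansion. A minimal sketch, assuming the path has the projects/…/zones/…/clusters/… format shown above:

```shell
# Full resource path as shown in the GKE cluster item of the
# Environment configuration tab.
CLUSTER_PATH="projects/example-project/zones/europe-west3-a/clusters/europe-west3-composer-exam--7b206598-gke"

# Cluster name: everything after the "clusters/" segment.
CLUSTER_NAME="${CLUSTER_PATH##*/clusters/}"

# Cluster zone: the segment between "zones/" and the next "/".
CLUSTER_ZONE="${CLUSTER_PATH#*/zones/}"
CLUSTER_ZONE="${CLUSTER_ZONE%%/*}"

echo "Cluster name: ${CLUSTER_NAME}"
echo "Cluster zone: ${CLUSTER_ZONE}"
```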

Get the database connection parameters

Airflow 2

  1. Access the Airflow web interface for your environment.
  2. In the Airflow web interface, go to Admin > Configurations.

  3. Find the sql_alchemy_conn parameter.

  4. Get the user name, password, and database name from the value of this parameter.

    For example, if the value of this parameter is postgresql+psycopg2://root:example-password@127.0.0.1:3306/composer-1-17-0-airflow-2-0-1-400fa094, then the user name is root, the password is example-password, and the database name is composer-1-17-0-airflow-2-0-1-400fa094.
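
The connection parameters can be split out of the sql_alchemy_conn value with shell parameter expansion. A minimal sketch, assuming the scheme://user:password@host:port/database format shown above (it does not handle passwords that contain : or @ characters):

```shell
# Connection string as shown in Admin > Configurations.
SQL_ALCHEMY_CONN="postgresql+psycopg2://root:example-password@127.0.0.1:3306/composer-1-17-0-airflow-2-0-1-400fa094"

# Drop the scheme prefix up to "://".
REST="${SQL_ALCHEMY_CONN#*://}"

# User name: everything before the first ":".
DB_USER="${REST%%:*}"

# Password: the part between the first ":" and the "@".
CREDENTIALS="${REST%%@*}"
DB_PASSWORD="${CREDENTIALS#*:}"

# Database name: everything after the last "/".
DB_NAME="${REST##*/}"

echo "User: ${DB_USER}"
echo "Password: ${DB_PASSWORD}"
echo "Database: ${DB_NAME}"
```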

Airflow 1

Cloud Composer 2 supports only Airflow 2.

Get the SQL proxy endpoint address

  1. In the Google Cloud Console, go to the Kubernetes Engine > Services & Ingress page:

    Go to Services & Ingress

  2. Select the airflow-sqlproxy-service service for your cluster. You can use the name of the cluster or the listed IP range for endpoints to look up the correct item.

  3. On the Service details page, find the IP address of a serving pod. It is listed in the Serving pods section, in the Endpoints column.

Create a VM instance

Create a new VM instance with the following parameters:

  • Region. Select the same region as your cluster.
  • Zone. Select the same zone as your cluster.
  • Boot disk. Select a Linux distribution, for example, Ubuntu 21.04.
  • (Only for Private IP) Specify an IP address in the cluster's IP range:

    1. Expand the Management, security, disks, networking, sole tenancy item.
    2. Go to the Networking tab.
    3. In Network interfaces, expand the default interface item.
    4. Expand Show alias IP ranges.
    5. In the Subnet range list, select the gke-services IP range for your cluster, for example, gke-europe-west3-composer-exam7b206598-gke-services-115b26e7. You obtained the name of the cluster in a previous step.
    6. In the Alias IP range field, specify an IP address for the VM instance that is in the IP address range of the selected subnet range.

Connect to the VM instance and install the SQL client package

Airflow 2

  1. Connect to the VM instance that you created in the previous step.

  2. Install the postgresql-client package.

    sudo apt-get update
    sudo apt-get install postgresql-client
    

Airflow 1

Cloud Composer 2 supports only Airflow 2.

Connect to the Airflow database

While connected to the VM instance, run the following command to connect to the Airflow database:

Airflow 2

psql --user=USERNAME --password --host=SQL_ENDPOINT --port=3306 DB_NAME

Airflow 1

Cloud Composer 2 supports only Airflow 2.

Replace the following with the values that you obtained in the previous steps:

  • USERNAME with the user name.
  • SQL_ENDPOINT with the IP address of the SQL endpoint.
  • DB_NAME with the database name.
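
The substitution can also be scripted. A sketch using the example values from the earlier steps; the SQL_ENDPOINT address here is hypothetical, so replace it with the serving pod IP you looked up in Services & Ingress:

```shell
DB_USER="root"
SQL_ENDPOINT="10.0.32.5"  # hypothetical serving pod IP; use your own
DB_NAME="composer-1-17-0-airflow-2-0-1-400fa094"

# The --password flag makes psql prompt for the password interactively,
# so it never appears on the command line or in shell history.
PSQL_CMD="psql --user=${DB_USER} --password --host=${SQL_ENDPOINT} --port=3306 ${DB_NAME}"
echo "${PSQL_CMD}"
```

Running the printed command opens the interactive psql prompt described in the next section.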

Run SQL queries

A prompt appears. You can run SQL queries in this prompt. For example:

SELECT * FROM dag LIMIT 10;

Dump database contents and transfer them to a bucket

To dump database contents into a file, run the following command:

Airflow 2

pg_dump --user=USERNAME --password --host=SQL_ENDPOINT --port=3306 \
    DB_NAME > DUMP_FILE

Airflow 1

Cloud Composer 2 supports only Airflow 2.

Replace:

  • USERNAME with the user name.
  • SQL_ENDPOINT with the IP address of the SQL endpoint.
  • DB_NAME with the database name.
  • DUMP_FILE with the name of the dump file. For example, dump.sql.

To transfer the dump file to a Cloud Storage bucket:

gsutil cp DUMP_FILE BUCKET_ADDRESS

Replace:

  • DUMP_FILE with the name of the dump file.
  • BUCKET_ADDRESS with the address of the bucket. For example, gs://europe-west3-db-dump.

As an alternative, you can transfer the dump file locally with gcloud compute scp.

What's next