Access the Airflow database


This page explains how to connect to a Cloud SQL instance that runs the Airflow database of your Cloud Composer environment and run SQL queries.

For example, you might want to run queries directly on the Airflow database, make database backups, gather statistics based on the database content, or retrieve any other custom information from the database.

For a Cloud Composer environment, one way to do this is to connect to the Airflow database from a VM in the GKE cluster of your environment.

Get the name and region of your environment's cluster

  1. In the Google Cloud console, go to the Environments page.

    Go to Environments

  2. Select your environment.

  3. Go to the Environment configuration tab.

  4. The GKE cluster item lists the name and the region of your environment's GKE cluster.

    For example, if the value of this parameter is projects/example-project/locations/us-central1/clusters/us-central1-composer-exam-e-23125af4-gke, then the cluster name is us-central1-composer-exam-e-23125af4-gke and the cluster region is us-central1.
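
Alternatively, you can read the same value from the command line. The following is a minimal sketch that assumes your environment is named example-environment and is located in us-central1; the config.gkeCluster field of the environment holds the full cluster resource name.

gcloud composer environments describe example-environment \
  --location us-central1 \
  --format="value(config.gkeCluster)"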

Get the database connection parameters

Airflow 2

  1. Access the Airflow web interface for your environment.
  2. In the Airflow web interface, go to Admin > Configurations.

  3. Find the sql_alchemy_conn parameter.

  4. Get the username, password, and database name from the value of this parameter.

    For example, if the value of this parameter is postgresql+psycopg2://root:example-password@127.0.0.1:3306/composer-1-17-0-airflow-2-0-1-400fa094, then the username is root, the password is example-password, and the database name is composer-1-17-0-airflow-2-0-1-400fa094.
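
In general, the value of sql_alchemy_conn is a SQLAlchemy connection URI, so you can read the individual parts positionally:

postgresql+psycopg2://USERNAME:PASSWORD@HOST:PORT/DB_NAME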

Airflow 1

Cloud Composer 2 supports only Airflow 2.

Get the database endpoint address

Public IP

  1. In the Google Cloud console, go to the Kubernetes Engine > Gateways, Services & Ingress page:

    Go to Services & Ingress

  2. Select the airflow-sqlproxy-service service for your cluster. You can use the name of the cluster or the listed IP range for endpoints to look up the correct item.

  3. On the Service details page, find the IP address of a serving pod. It is listed in the Serving pods section, in the Endpoints column.
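
If you prefer the command line, you can look up the same endpoint with kubectl. This is a sketch that assumes kubectl is installed and that you have fetched credentials for the cluster whose name and region you obtained earlier:

gcloud container clusters get-credentials CLUSTER_NAME --region CLUSTER_REGION
kubectl get endpoints --all-namespaces | grep airflow-sqlproxy-service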

Private IP - PSC

  1. In the Google Cloud console, go to the Network services > Private Service Connect page:

    Go to Private Service Connect

  2. Click the endpoint with the name of your environment. For example, it can be named us-central1-example-environ-d9cb394f-psc.

  3. Your environment connects to the database through the IP address specified in the IP address row.
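
Because a Private Service Connect endpoint is typically backed by a forwarding rule in your project, you can also list it from the command line. This is a hedged sketch; the filter expression is only an example and might need adjusting to match your endpoint's name:

gcloud compute forwarding-rules list \
  --filter="name~psc" \
  --format="table(name, IPAddress)"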

Private IP - VPC peerings

  1. In the Google Cloud console, go to the Kubernetes Engine > Gateways, Services & Ingress page:

    Go to Services & Ingress

  2. Select the airflow-sqlproxy-service service for your cluster. You can use the name of the cluster or the listed IP range for endpoints to look up the correct item.

  3. On the Service details page, find the IP address of a serving pod. It is listed in the Serving pods section, in the Endpoints column.

Create a VM instance

Create a new VM instance with the following parameters:

  • Region. Select the same region as your cluster.

  • Zone. Select any zone in the same region as your cluster.

  • Boot disk. Select a Linux distribution, for example, Ubuntu 22.04 LTS.

  • Specify an IP address in the cluster's IP range:

    1. Expand the Management, security, disks, networking, sole tenancy item.
    2. Go to the Networking tab.
    3. In Network interfaces, expand the default interface item.
    4. Expand Show alias IP ranges.
    5. In Subnet range, select the gke-services IP range for your cluster, for example, gke-europe-west3-composer-exam7b206598-gke-services-115b26e7. You obtained the name of the cluster in a previous step.
    6. In Alias IP range, specify an IP address for the VM instance that is within the IP address range of the selected subnet range.
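
If you prefer to create the VM with the gcloud CLI instead of the console, the following sketch shows the general shape. The VM name airflow-db-client is hypothetical, and the zone, image, subnet, secondary range name, and alias IP are placeholders or examples that you need to adapt to your own cluster:

gcloud compute instances create airflow-db-client \
  --zone us-central1-a \
  --image-family ubuntu-2204-lts \
  --image-project ubuntu-os-cloud \
  --network-interface "subnet=SUBNET_NAME,aliases=GKE_SERVICES_RANGE_NAME:ALIAS_IP/32"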

Connect to the VM instance and install the SQL client package

Airflow 2

  1. Connect to the VM instance that you created in the previous step, for example, over SSH (a gcloud CLI sketch follows this procedure).

  2. Install the postgresql-client package.

    sudo apt-get update
    sudo apt-get install postgresql-client
    
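To connect to the VM in step 1, you can use SSH in the Google Cloud console or the gcloud CLI. For example, with the hypothetical VM name from the earlier sketch:

gcloud compute ssh airflow-db-client --zone us-central1-a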

Airflow 1

Cloud Composer 2 supports only Airflow 2.

Connect to the Airflow database

While connected to the VM instance, run the following command to connect to the Airflow database:

Airflow 2

psql --user=USERNAME --password \
  --host=DB_ENDPOINT \
  --port=3306 \
  DB_NAME

Airflow 1

Cloud Composer 2 supports only Airflow 2.

Replace the following with the values that you obtained in the previous steps:

  • USERNAME with the username.
  • DB_ENDPOINT with the IP address of the database endpoint that you obtained earlier in this guide.
  • DB_NAME with the database name.

Run SQL queries

A prompt appears. You can run SQL queries in this prompt. For example:

SELECT * FROM dag LIMIT 10;
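
You can also run one-off queries non-interactively by passing them with psql's --command flag. The following sketch counts task instances by state; the task_instance table and its state column are part of the Airflow 2 metadata schema:

psql --user=USERNAME --password \
  --host=DB_ENDPOINT \
  --port=3306 \
  --command="SELECT state, COUNT(*) FROM task_instance GROUP BY state;" \
  DB_NAME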

Dump database contents and transfer them to a bucket

To dump database contents into a file, run the following command:

Airflow 2

pg_dump --user=USERNAME --password \
  --host=DB_ENDPOINT \
  --port=3306 \
  DB_NAME > DUMP_FILE

Airflow 1

Cloud Composer 2 supports only Airflow 2.

Replace:

  • USERNAME with the username.
  • DB_ENDPOINT with the IP address of the database endpoint.
  • DB_NAME with the database name.
  • DUMP_FILE with the name of the dump file. For example, dump.sql.

To transfer the dump file to a Cloud Storage bucket:

gcloud storage cp DUMP_FILE BUCKET_ADDRESS

Replace:

  • DUMP_FILE with the name of the dump file.
  • BUCKET_ADDRESS with the address of the bucket. For example, gs://europe-west3-db-dump.

As an alternative, you can transfer the dump file locally with gcloud compute scp.
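
For example, run the following from your local machine. VM_NAME, VM_ZONE, and LOCAL_PATH are placeholders for your VM's name, its zone, and a local destination path:

gcloud compute scp VM_NAME:DUMP_FILE LOCAL_PATH --zone=VM_ZONE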

What's next