Build a BigQuery processing pipeline with Eventarc


This tutorial shows you how to use Eventarc to build a processing pipeline that schedules queries to a public BigQuery dataset, generates charts based on the data, and shares links to the charts through email.

Objectives

In this tutorial, you will build and deploy three Cloud Run services that allow unauthenticated access and that receive events using Eventarc:

  1. Query Runner—Triggered when Cloud Scheduler jobs publish a message to a Pub/Sub topic, this service uses the BigQuery API to retrieve data from a public COVID-19 dataset, and saves the results in a new BigQuery table.
  2. Chart Creator—Triggered when the Query Runner service publishes a message to a Pub/Sub topic, this service generates charts using the Python plotting library, Matplotlib, and saves the charts to a Cloud Storage bucket.
  3. Notifier—Triggered by audit logs when the Chart Creator service stores a chart in a Cloud Storage bucket, this service uses the email service, SendGrid, to send links to the charts to an email address.

The following diagram shows the high-level architecture:

BigQuery processing pipeline

Costs

In this document, you use the following billable components of Google Cloud:

  • BigQuery
  • Cloud Build
  • Cloud Run
  • Cloud Scheduler
  • Cloud Storage
  • Container Registry
  • Eventarc
  • Pub/Sub

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

Security constraints defined by your organization might prevent you from completing the following steps. For troubleshooting information, see Develop applications in a constrained Google Cloud environment.

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Cloud Build, Cloud Logging, Cloud Run, Cloud Scheduler, Container Registry, Eventarc, and Pub/Sub APIs.

    Enable the APIs

  5. Install the Google Cloud CLI.
  6. To initialize the gcloud CLI, run the following command:

    gcloud init
  7. Update gcloud components:
    gcloud components update
  8. Log in using your account:
    gcloud auth login
  9. Enable Cloud Audit Logs for Cloud Storage, which the Notifier trigger relies on: select Google Cloud Storage and enable the Admin Read, Data Read, and Data Write log types:

    Go to Cloud Audit Logs

  10. Grant the eventarc.eventReceiver role to the Compute Engine default service account:

    export PROJECT_NUMBER="$(gcloud projects describe $(gcloud config get-value project) --format='value(projectNumber)')"
    
    gcloud projects add-iam-policy-binding $(gcloud config get-value project) \
        --member=serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com \
        --role='roles/eventarc.eventReceiver'
    

  11. If you enabled the Pub/Sub service account on or before April 8, 2021, grant the iam.serviceAccountTokenCreator role to the Pub/Sub service account:

    gcloud projects add-iam-policy-binding $(gcloud config get-value project) \
        --member="serviceAccount:service-${PROJECT_NUMBER}@gcp-sa-pubsub.iam.gserviceaccount.com" \
        --role='roles/iam.serviceAccountTokenCreator'
    

  12. Set the defaults used in this tutorial:
    export REGION=REGION
    gcloud config set run/region ${REGION}
    gcloud config set run/platform managed
    gcloud config set eventarc/location ${REGION}
    

    Replace REGION with the supported Eventarc location of your choice.

  13. Download and install the Git source code management tool.

Create a SendGrid API key

SendGrid is a cloud-based email provider that lets you send email without having to maintain email servers.

  1. Sign in to SendGrid and go to Settings > API Keys.
  2. Click Create API Key.
  3. Select the permissions for the key. At a minimum, the key must have Mail Send permissions to send email.
  4. Click Save to create the key.
  5. SendGrid generates a new key. This is the only copy of the key, so make sure that you copy the key and save it for later.
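
To confirm that the key works before wiring it into the pipeline, you can send yourself a test message with SendGrid's Python client library. This is a minimal sketch; the sender and recipient addresses are placeholders you must replace:

  # pip install sendgrid
  import os

  from sendgrid import SendGridAPIClient
  from sendgrid.helpers.mail import Mail

  message = Mail(
      from_email="sender@example.com",    # placeholder: a verified sender
      to_emails="recipient@example.com",  # placeholder: any test recipient
      subject="SendGrid test",
      html_content="<p>If you can read this, the API key works.</p>",
  )
  response = SendGridAPIClient(os.environ["SENDGRID_API_KEY"]).send(message)
  print(response.status_code)  # 202 means SendGrid accepted the message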

Create a Cloud Storage bucket

Create a unique Cloud Storage bucket to save the charts. Make sure that the bucket and the charts are publicly available and in the same region as your Cloud Run service:

  export BUCKET="$(gcloud config get-value core/project)-charts"
  gsutil mb -l $(gcloud config get-value run/region) gs://${BUCKET}
  gsutil uniformbucketlevelaccess set on gs://${BUCKET}
  gsutil iam ch allUsers:objectViewer gs://${BUCKET}
  

Deploy the Notifier service

Deploy a Cloud Run service that receives Chart Creator events and uses SendGrid to email links to the generated charts.
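
The notifier/python sample in the repository implements this logic; the sketch below shows the general shape of such a handler, assuming a Flask HTTP service that receives the Cloud Storage audit log event as JSON and reads the uploaded object's name from protoPayload.resourceName. Names and parsing details are illustrative, not the sample's exact code:

  import os

  from flask import Flask, request

  app = Flask(__name__)

  @app.route("/", methods=["POST"])
  def handle_event():
      event = request.get_json()
      # An audit log resourceName looks like:
      # projects/_/buckets/BUCKET/objects/chart-cyprus.png
      resource = event["protoPayload"]["resourceName"]
      object_name = resource.split("/objects/")[-1]
      chart_url = f"https://storage.googleapis.com/{os.environ['BUCKET']}/{object_name}"
      # ...email chart_url to TO_EMAILS with SendGrid (see the earlier sketch)...
      return ("", 200)

  if __name__ == "__main__":
      app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))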

  1. Clone the GitHub repository and change to the notifier/python directory:

    git clone https://github.com/GoogleCloudPlatform/eventarc-samples
    cd eventarc-samples/processing-pipelines/bigquery/notifier/python/
    
  2. Build and push the container image:

    export SERVICE_NAME=notifier
    docker build -t gcr.io/$(gcloud config get-value project)/${SERVICE_NAME}:v1 .
    docker push gcr.io/$(gcloud config get-value project)/${SERVICE_NAME}:v1
    
  3. Deploy the container image to Cloud Run, passing in an address to send emails to, and the SendGrid API key:

    export TO_EMAILS=EMAIL_ADDRESS
    export SENDGRID_API_KEY=YOUR_SENDGRID_API_KEY
    gcloud run deploy ${SERVICE_NAME} \
        --image gcr.io/$(gcloud config get-value project)/${SERVICE_NAME}:v1 \
        --update-env-vars TO_EMAILS=${TO_EMAILS},SENDGRID_API_KEY=${SENDGRID_API_KEY},BUCKET=${BUCKET} \
        --allow-unauthenticated
    

    Replace the following:

    • EMAIL_ADDRESS with an email address to send the links to the generated charts
    • YOUR_SENDGRID_API_KEY with the SendGrid API key you noted previously

When you see the service URL, the deployment is complete.

Create a trigger for the Notifier service

The Eventarc trigger for the Notifier service deployed on Cloud Run filters for Cloud Storage audit logs where the methodName is storage.objects.create.

  1. Create the trigger:

    gcloud eventarc triggers create trigger-${SERVICE_NAME} \
        --destination-run-service=${SERVICE_NAME} \
        --destination-run-region=${REGION} \
        --event-filters="type=google.cloud.audit.log.v1.written" \
        --event-filters="serviceName=storage.googleapis.com" \
        --event-filters="methodName=storage.objects.create" \
        --service-account=${PROJECT_NUMBER}-compute@developer.gserviceaccount.com
    

    This creates a trigger called trigger-notifier.

Deploy the Chart Creator service

Deploy a Cloud Run service that receives Query Runner events, retrieves data from a BigQuery table for a specific country, and then generates a chart, using Matplotlib, from the data. The chart is uploaded to a Cloud Storage bucket.
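
At its core, the service's work can be sketched in a few lines of Python. The table layout and column names below are assumptions for illustration, not the sample's exact schema:

  # pip install google-cloud-bigquery[pandas] google-cloud-storage matplotlib
  import os

  import matplotlib
  matplotlib.use("Agg")  # headless rendering inside a container
  import matplotlib.pyplot as plt
  from google.cloud import bigquery, storage

  def create_chart(country: str) -> None:
      client = bigquery.Client()
      # Assumed layout of the table written earlier by the Query Runner.
      table = country.replace(" ", "").lower()
      df = client.query(
          f"SELECT date, num_reports FROM covid19.{table} ORDER BY date"
      ).to_dataframe()

      df.plot(x="date", y="num_reports", title=f"COVID-19 cases in {country}")
      file_name = f"chart-{table}.png"
      plt.savefig(file_name)

      # Upload the chart to the public bucket created earlier.
      bucket = storage.Client().bucket(os.environ["BUCKET"])
      bucket.blob(file_name).upload_from_filename(file_name)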

  1. Change to the chart-creator/python directory:

    cd ../../chart-creator/python
    
  2. Build and push the container image:

    export SERVICE_NAME=chart-creator
    docker build -t gcr.io/$(gcloud config get-value project)/${SERVICE_NAME}:v1 .
    docker push gcr.io/$(gcloud config get-value project)/${SERVICE_NAME}:v1
    
  3. Deploy the container image to Cloud Run, passing in BUCKET:

    gcloud run deploy ${SERVICE_NAME} \
        --image gcr.io/$(gcloud config get-value project)/${SERVICE_NAME}:v1 \
        --update-env-vars BUCKET=${BUCKET} \
        --allow-unauthenticated
    

When you see the service URL, the deployment is complete.

Create a trigger for the Chart Creator service

The Eventarc trigger for the Chart Creator service deployed on Cloud Run filters for messages published to a Pub/Sub topic.
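
Eventarc delivers each message to the service as an HTTP POST request whose JSON body wraps the Pub/Sub message, with the payload base64-encoded in the message's data field (the standard Pub/Sub push format). A minimal decoding sketch:

  import base64

  def extract_payload(body: dict) -> str:
      # Pub/Sub push bodies wrap the message; data is base64-encoded.
      return base64.b64decode(body["message"]["data"]).decode("utf-8")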

  1. Create the trigger:

    gcloud eventarc triggers create trigger-${SERVICE_NAME} \
        --destination-run-service=${SERVICE_NAME} \
        --destination-run-region=${REGION} \
        --event-filters="type=google.cloud.pubsub.topic.v1.messagePublished"
    

    This creates a trigger called trigger-chart-creator.

  2. Set the Pub/Sub topic environment variable.

    export TOPIC_QUERY_COMPLETED=$(basename $(gcloud eventarc triggers describe trigger-${SERVICE_NAME} --format='value(transport.pubsub.topic)'))
    

Deploy the Query Runner service

Deploy a Cloud Run service that receives Cloud Scheduler events, retrieves data from a public COVID-19 dataset, and saves the results in a new BigQuery table.
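
The repository's query-runner sample is written in C#; the following Python sketch shows the equivalent flow. The public table, column names, and destination dataset are assumptions for illustration, not necessarily the sample's exact configuration:

  # pip install google-cloud-bigquery google-cloud-pubsub
  import os

  from google.cloud import bigquery, pubsub_v1

  def run_query(country: str) -> None:
      client = bigquery.Client()
      table = country.replace(" ", "").lower()
      job_config = bigquery.QueryJobConfig(
          destination=f"{os.environ['PROJECT_ID']}.covid19.{table}",
          write_disposition="WRITE_TRUNCATE",  # overwrite on each daily run
          query_parameters=[
              bigquery.ScalarQueryParameter("country", "STRING", country)
          ],
      )
      query = """
          SELECT date, SUM(confirmed) AS num_reports
          FROM `bigquery-public-data.covid19_jhu_csse.summary`
          WHERE country_region = @country
          GROUP BY date ORDER BY date
      """
      client.query(query, job_config=job_config).result()  # wait for the job

      # Tell the Chart Creator that the table is ready.
      publisher = pubsub_v1.PublisherClient()
      topic = publisher.topic_path(os.environ["PROJECT_ID"], os.environ["TOPIC_ID"])
      publisher.publish(topic, country.encode("utf-8")).result()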

  1. Change to the processing-pipelines directory:

    cd ../../..
    
  2. Build and push the container image:

    export SERVICE_NAME=query-runner
    docker build -t gcr.io/$(gcloud config get-value project)/${SERVICE_NAME}:v1 -f bigquery/${SERVICE_NAME}/csharp/Dockerfile .
    docker push gcr.io/$(gcloud config get-value project)/${SERVICE_NAME}:v1
    
  3. Deploy the container image to Cloud Run, passing in PROJECT_ID and TOPIC_QUERY_COMPLETED:

    gcloud run deploy ${SERVICE_NAME} \
        --image gcr.io/$(gcloud config get-value project)/${SERVICE_NAME}:v1 \
        --update-env-vars PROJECT_ID=$(gcloud config get-value project),TOPIC_ID=${TOPIC_QUERY_COMPLETED} \
        --allow-unauthenticated
    

When you see the service URL, the deployment is complete.

Create a trigger for the Query Runner service

The Eventarc trigger for the Query Runner service deployed on Cloud Run filters for messages published to a Pub/Sub topic.

  1. Create the trigger:

    gcloud eventarc triggers create trigger-${SERVICE_NAME} \
        --destination-run-service=${SERVICE_NAME} \
        --destination-run-region=${REGION} \
        --event-filters="type=google.cloud.pubsub.topic.v1.messagePublished"
    

    This creates a trigger called trigger-query-runner.

  2. Set an environment variable for the Pub/Sub topic.

    export TOPIC_QUERY_SCHEDULED=$(gcloud eventarc triggers describe trigger-${SERVICE_NAME} --format='value(transport.pubsub.topic)')
    

Schedule the jobs

The processing pipeline is triggered by two Cloud Scheduler jobs.

  1. Create an App Engine app, which is required by Cloud Scheduler, and specify an appropriate location:

    export APP_ENGINE_LOCATION=LOCATION
    gcloud app create --region=${APP_ENGINE_LOCATION}
    
  2. Create two Cloud Scheduler jobs that publish to a Pub/Sub topic once per day:

    gcloud scheduler jobs create pubsub cre-scheduler-uk \
        --schedule="0 16 * * *" \
        --topic=${TOPIC_QUERY_SCHEDULED} \
        --message-body="United Kingdom"
    
    gcloud scheduler jobs create pubsub cre-scheduler-cy \
        --schedule="0 17 * * *" \
        --topic=${TOPIC_QUERY_SCHEDULED} \
        --message-body="Cyprus"
    

    The schedule is specified in unix-cron format. For example, 0 16 * * * means that the job runs at 16:00 (4 PM) UTC every day.

Run the pipeline

  1. First, confirm that all the triggers were successfully created:

    gcloud eventarc triggers list
    

    The output should be similar to the following:

    NAME                   TYPE                                           DESTINATION_RUN_SERVICE  DESTINATION_RUN_PATH  ACTIVE
    trigger-chart-creator  google.cloud.pubsub.topic.v1.messagePublished  chart-creator                                  Yes
    trigger-notifier       google.cloud.audit.log.v1.written              notifier                                       Yes
    trigger-query-runner   google.cloud.pubsub.topic.v1.messagePublished  query-runner                                   Yes
    
  2. Retrieve the Cloud Scheduler job IDs:

    gcloud scheduler jobs list
    

    The output should be similar to the following:

    ID                LOCATION      SCHEDULE (TZ)         TARGET_TYPE  STATE
    cre-scheduler-cy  us-central1   0 17 * * * (Etc/UTC)  Pub/Sub      ENABLED
    cre-scheduler-uk  us-central1   0 16 * * * (Etc/UTC)  Pub/Sub      ENABLED
    
  3. Although the jobs are scheduled to run daily at 4 PM and 5 PM UTC, you can also run the Cloud Scheduler jobs manually:

    gcloud scheduler jobs run cre-scheduler-cy
    gcloud scheduler jobs run cre-scheduler-uk
    
  4. After a few minutes, confirm that there are two charts in the Cloud Storage bucket:

    gsutil ls gs://${BUCKET}
    

    The output should be similar to the following:

    gs://BUCKET/chart-cyprus.png
    gs://BUCKET/chart-unitedkingdom.png
    

Congratulations! You should also receive two emails with links to the charts.

Clean up

If you created a new project for this tutorial, delete the project. If you used an existing project and wish to keep it without the changes added in this tutorial, delete the resources created for the tutorial.

Delete the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete tutorial resources

  1. Delete the Cloud Run services you deployed in this tutorial:

    gcloud run services delete SERVICE_NAME

    Where SERVICE_NAME is your chosen service name. Repeat the command for each of the three services (notifier, chart-creator, and query-runner).

    You can also delete Cloud Run services from the Google Cloud console.

  2. Remove any gcloud CLI default configurations you added during the tutorial setup.

    For example:

    gcloud config unset run/region

    or

    gcloud config unset project

  3. Delete other Google Cloud resources created in this tutorial:

    • Delete the Eventarc triggers:
      gcloud eventarc triggers delete TRIGGER_NAME
      
      Replace TRIGGER_NAME with the name of your trigger, and repeat for each of the three triggers.

What's next