
Scheduling Cloud Bigtable Backups

May 5, 2021
Tracy Cui

Software Engineer, Cloud Bigtable

Jordan Hambleton

Cloud Data Engineer, Google Cloud

Cloud Bigtable backups let you save a copy of a table's schema and data, then restore from the backup to a new table at a later time. In this tutorial, you'll learn how to create backups at regularly scheduled intervals (such as daily or weekly) using the Cloud Bigtable Scheduled Backups example.


This example uses Cloud Scheduler to periodically send backup creation requests as Pub/Sub messages. Each Pub/Sub message triggers a Cloud Function, which initiates a backup using the Cloud Bigtable Java client library. The function could be adapted to any Cloud Bigtable client library whose language is supported in Cloud Functions.


This solution uses the following Google Cloud services:

  • Cloud Scheduler to trigger tasks with a cron-based schedule

  • Cloud Pub/Sub to pass the message request from Cloud Scheduler to Cloud Functions

  • Cloud Functions to initiate an operation for creating a Cloud Bigtable backup

  • Cloud Logging to create logs-based metrics

  • Cloud Monitoring to create alerts based on conditions of the logs-based metrics

https://storage.googleapis.com/gweb-cloudblog-publish/images/image3_hOYgV5N.max-800x800.png
System Architecture

Costs

This tutorial uses billable components of Google Cloud, including the following:

Use the pricing calculator to generate a cost estimate based on your projected usage.

Before you begin

Before proceeding with the tutorial, ensure the following:

APIs and IAM Roles Setup

The diagram below shows the flow of actions between human roles and APIs.

https://storage.googleapis.com/gweb-cloudblog-publish/images/image7_nk3FLBI.max-1500x1500.png

IAM Roles for Administrators

The administrator should be granted specific roles to deploy the services needed for the solution.


Role | Purpose
roles/bigtable.admin | Cloud Bigtable administrator
roles/cloudfunctions.admin | Deploy and manage Cloud Functions
roles/deploymentmanager.editor | Deploy monitoring metrics
roles/pubsub.editor | Create and manage Pub/Sub topics
roles/cloudscheduler.admin | Set up a schedule in Cloud Scheduler
roles/appengine.appAdmin | Use Cloud Scheduler to deploy a cron service
roles/monitoring.admin | Set up alerting policies for failure notifications
roles/logging.admin | Add logs-based user metrics to track failures

The administrator also needs to be assigned a custom role that has the following permissions:

  • appengine.applications.create - for Cloud Scheduler to create an App Engine app

  • serviceusage.services.use - for Cloud Scheduler to use the App Engine app
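As a rough sketch of how these grants could be scripted, the functions below wrap standard gcloud IAM commands. The function names, the custom role ID cbtSchedulerAppEngine, and its title are made up for illustration; the member string (for example user:admin@example.com) is supplied by the caller.

```shell
# Sketch only: grant the predefined roles listed above to one administrator.
grant_admin_roles() {
  project_id="$1"
  admin_member="$2"   # e.g. user:admin@example.com (illustrative)
  for role in \
    roles/bigtable.admin \
    roles/cloudfunctions.admin \
    roles/deploymentmanager.editor \
    roles/pubsub.editor \
    roles/cloudscheduler.admin \
    roles/appengine.appAdmin \
    roles/monitoring.admin \
    roles/logging.admin
  do
    gcloud projects add-iam-policy-binding "$project_id" \
      --member "$admin_member" --role "$role"
  done
}

# Create a custom role with the two extra permissions and grant it.
# The role ID cbtSchedulerAppEngine is a hypothetical name.
create_admin_custom_role() {
  project_id="$1"
  admin_member="$2"
  gcloud iam roles create cbtSchedulerAppEngine \
    --project "$project_id" \
    --title "Cloud Scheduler App Engine setup" \
    --permissions appengine.applications.create,serviceusage.services.use
  gcloud projects add-iam-policy-binding "$project_id" \
    --member "$admin_member" \
    --role "projects/${project_id}/roles/cbtSchedulerAppEngine"
}
```

These commands must be run by someone who is already allowed to change the project's IAM policy, such as a project owner.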

Service Account for Cloud Functions

The Cloud Function is triggered when a message arrives on the Pub/Sub topic, and it calls the Cloud Bigtable API to create a backup. To run successfully, the function must be able to consume messages from the Pub/Sub topic and must have permission to create Cloud Bigtable backups. To set this up, perform the following steps:

  1. Create a service account (e.g. cbt-scheduled-backups@<project-id>.iam.gserviceaccount.com).

  2. Create a custom role with the following permissions:

    • bigtable.backups.create

    • bigtable.backups.delete

    • bigtable.backups.get

    • bigtable.backups.list

    • bigtable.backups.restore

    • bigtable.backups.update

    • bigtable.instances.get

    • bigtable.tables.create

    • bigtable.tables.readRows

  3. Assign the custom role and roles/pubsub.subscriber to the service account. This allows Cloud Functions to read messages from the Pub/Sub topic and initiate a `create backup` request.

  4. Add the administrator as a member of the service account with role roles/iam.serviceAccountUser. This allows the administrator to deploy Cloud Functions.
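The four steps above can be sketched with gcloud as follows. This is one possible arrangement, not the contents of the example's script: the function name, the custom role ID cbtScheduledBackups, and the display and title strings are assumptions, and the service account email assumes a standard (non-domain-scoped) project ID.

```shell
SA_NAME="cbt-scheduled-backups"
ROLE_ID="cbtScheduledBackups"   # hypothetical custom role ID

# Permissions from step 2, split over two assignments for readability.
PERMS="bigtable.backups.create,bigtable.backups.delete,bigtable.backups.get"
PERMS="${PERMS},bigtable.backups.list,bigtable.backups.restore,bigtable.backups.update"
PERMS="${PERMS},bigtable.instances.get,bigtable.tables.create,bigtable.tables.readRows"

setup_backup_service_account() {
  project_id="$1"
  admin_member="$2"   # e.g. user:admin@example.com (illustrative)
  sa_email="${SA_NAME}@${project_id}.iam.gserviceaccount.com"

  # Step 1: create the service account.
  gcloud iam service-accounts create "$SA_NAME" \
    --project "$project_id" --display-name "Bigtable scheduled backups"

  # Step 2: create the custom role.
  gcloud iam roles create "$ROLE_ID" \
    --project "$project_id" --title "Bigtable scheduled backups" \
    --permissions "$PERMS"

  # Step 3: grant the custom role and roles/pubsub.subscriber.
  gcloud projects add-iam-policy-binding "$project_id" \
    --member "serviceAccount:${sa_email}" \
    --role "projects/${project_id}/roles/${ROLE_ID}"
  gcloud projects add-iam-policy-binding "$project_id" \
    --member "serviceAccount:${sa_email}" \
    --role roles/pubsub.subscriber

  # Step 4: let the administrator act as the service account.
  gcloud iam service-accounts add-iam-policy-binding "$sa_email" \
    --project "$project_id" \
    --member "$admin_member" --role roles/iam.serviceAccountUser
}
```

For example, `setup_backup_service_account <project-id> user:admin@example.com` would run all four steps in order.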

Creating Scheduled Backups

Create a Pub/Sub topic

Create a Cloud Pub/Sub topic named cloud-bigtable-scheduled-backups. It serves as the target of the Cloud Scheduler job and triggers the Cloud Function. For example:

gcloud pubsub topics create cloud-bigtable-scheduled-backups --project <project-id>

Then go to the Pub/Sub UI and verify that you can see the newly created topic:

https://storage.googleapis.com/gweb-cloudblog-publish/images/image10_WCxLKcs.max-900x900.png
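If you prefer the command line to the UI, a small wrapper like the one below can confirm that the topic exists. The function name is invented for this sketch; it simply calls `gcloud pubsub topics describe`, which returns an error if the topic is missing.

```shell
# Sketch: check from the CLI that the topic was created.
verify_backup_topic() {
  project_id="$1"
  gcloud pubsub topics describe cloud-bigtable-scheduled-backups \
    --project "$project_id"
}
```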

Deploy a function to Cloud Functions

Create and deploy a Cloud Function named cbt-create-backup-function, which is called whenever a Pub/Sub message arrives in the cloud-bigtable-scheduled-backups topic. The deploy-backup-function function in the scheduled_backups.sh script wraps the gcloud command that does this.

./scripts/scheduled_backups.sh deploy-backup-function
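For orientation, a deployment wrapper of this kind might look roughly like the sketch below. It is not the actual content of scheduled_backups.sh: the entry-point class name is a placeholder, the java11 runtime is an assumption, and the service account email follows the naming used earlier in this tutorial.

```shell
FUNCTION_NAME="cbt-create-backup-function"
TOPIC="cloud-bigtable-scheduled-backups"
# Placeholder: replace with the fully qualified class of your function.
ENTRY_POINT="com.example.CreateBackup"

deploy_backup_function() {
  project_id="$1"
  source_dir="$2"   # directory containing the function's source
  sa_email="cbt-scheduled-backups@${project_id}.iam.gserviceaccount.com"
  gcloud functions deploy "$FUNCTION_NAME" \
    --project "$project_id" \
    --runtime java11 \
    --entry-point "$ENTRY_POINT" \
    --trigger-topic "$TOPIC" \
    --service-account "$sa_email" \
    --source "$source_dir"
}
```

The `--trigger-topic` flag is what subscribes the function to the topic, and `--service-account` makes it run with the custom role created earlier rather than the default identity.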

Go to the Cloud Functions UI to view the function. The function subscribes to the Pub/Sub topic cloud-bigtable-scheduled-backups.

https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_W4icAzu.max-600x600.png
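Before wiring up the schedule, it can help to trigger the function once by hand. The sketch below publishes a message to the topic and then reads the function's recent logs. The JSON field names (projectId, instanceId, tableId, clusterId, expireHours) are assumptions about the message format, so check them against what your function actually parses.

```shell
# Build a JSON request body; the field names are assumed, not confirmed.
build_backup_message() {
  printf '{"projectId":"%s","instanceId":"%s","tableId":"%s","clusterId":"%s","expireHours":%s}' \
    "$1" "$2" "$3" "$4" "$5"
}

# Publish one backup request to the topic that triggers the function.
# Arguments: project, instance, table, cluster, expiry in hours.
trigger_backup_once() {
  message=$(build_backup_message "$@")
  gcloud pubsub topics publish cloud-bigtable-scheduled-backups \
    --project "$1" --message "$message"
}

# Read recent log entries written by the function.
show_function_logs() {
  gcloud functions logs read cbt-create-backup-function \
    --project "$1" --limit 20
}
```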

Deploying scheduled jobs using Cloud Scheduler

Note: To use Cloud Scheduler, you must create an App Engine app. This can be done explicitly before the next step or indirectly when running the next step.
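To create the App Engine app explicitly, a call along these lines should work. The function name is invented here, and the region value (for example us-central) is up to you.

```shell
# Sketch: create the App Engine app that Cloud Scheduler depends on.
create_app_engine_app() {
  project_id="$1"
  region="$2"   # e.g. us-central (illustrative)
  gcloud app create --project "$project_id" --region "$region"
}
```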

Next, deploy the scheduled backup configuration to Cloud Scheduler. The configuration includes the cron schedule of the job, the Pub/Sub topic name, and the message to publish. This step is also wrapped in a function in the script, and the configuration values can be set in the properties file.

./scripts/scheduled_backups.sh create-schedule
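The underlying call is presumably some form of `gcloud scheduler jobs create pubsub`. A minimal sketch, in which the job name cbt-scheduled-backup-job and the UTC time zone are assumptions and the schedule and message body are passed in by the caller:

```shell
# Sketch: create a Cloud Scheduler job that publishes to the backup topic.
create_backup_schedule() {
  project_id="$1"
  schedule="$2"   # cron expression, e.g. "0 8 * * *" for daily at 08:00
  message="$3"    # request body expected by the function
  gcloud scheduler jobs create pubsub cbt-scheduled-backup-job \
    --project "$project_id" \
    --schedule "$schedule" \
    --time-zone "Etc/UTC" \
    --topic cloud-bigtable-scheduled-backups \
    --message-body "$message"
}
```

Quote the cron expression when calling the function so the shell does not expand the asterisks. Newer gcloud releases may also expect a `--location` flag for the job.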

The job is now visible in the Cloud Scheduler UI:

https://storage.googleapis.com/gweb-cloudblog-publish/images/image2_NBfFzEc.max-500x500.png
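Rather than waiting for the next scheduled run, you can force the job to run and then list the backups on the cluster. The helper names below are invented for this sketch, and the job name is whatever you configured in the properties file.

```shell
# Sketch: run the scheduled job immediately.
run_backup_job_now() {
  project_id="$1"
  job_name="$2"
  gcloud scheduler jobs run "$job_name" --project "$project_id"
}

# Sketch: list the backups stored on one cluster of an instance.
list_backups() {
  project_id="$1"
  instance_id="$2"
  cluster_id="$3"
  gcloud bigtable backups list \
    --project "$project_id" \
    --instance "$instance_id" \
    --cluster "$cluster_id"
}
```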

Email notification of backup failures


To get email notifications on backup creation failures, follow these steps:

  1. Follow this guide to add your email address as a notification channel.

https://storage.googleapis.com/gweb-cloudblog-publish/images/image4_UaCLUcl.max-1900x1900.png
  2. Create and deploy a custom metrics configuration file that filters logs generated by Cloud Functions, Cloud Scheduler, and Cloud Bigtable. This example uses Deployment Manager to create the custom metrics; an example file can be found in ./config/metrics.yaml. Deploy the custom metrics to Cloud Logging:

./scripts/scheduled_backups.sh add-metrics

After this, you should see two user-defined metrics under Logs-based Metrics in Cloud Logging.
https://storage.googleapis.com/gweb-cloudblog-publish/images/image9_jeyJAxA.max-2000x2000.png
  3. From there, create an alerting policy for each metric: choose an Aggregator, such as sum or mean, for the target metric, then define a condition that triggers an alert. For example, you can choose the following:

    • Condition triggers if: Any time series violates

    • Condition: is above

    • Threshold: 0

    • For: 1 minute
https://storage.googleapis.com/gweb-cloudblog-publish/images/image5_KelDPD7.max-2000x2000.png

  4. Add the notification channel you created in step 1 to the alerting policies. Whenever the condition is met, you will receive an email notification.

https://storage.googleapis.com/gweb-cloudblog-publish/images/image6_hahrP6E.max-1500x1500.png
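The tutorial creates its logs-based metrics through Deployment Manager in step 2. As an alternative for quick experiments, a single metric can also be created directly with `gcloud logging metrics create`. In the sketch below, the metric name and the log filter are assumptions and are not taken from ./config/metrics.yaml.

```shell
# Sketch: a logs-based counter of error-level entries from the backup function.
create_backup_failure_metric() {
  project_id="$1"
  # Assumed filter: error-or-worse log entries written by the function.
  filter='resource.type="cloud_function" AND resource.labels.function_name="cbt-create-backup-function" AND severity>=ERROR'
  gcloud logging metrics create cbt-backup-function-errors \
    --project "$project_id" \
    --description "Errors logged by the scheduled backup function" \
    --log-filter "$filter"
}
```

With the example condition from step 3 (is above 0 for 1 minute), a metric like this would raise an alert whenever the function logs an error.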

Considerations

To use Cloud Scheduler, you must create an App Engine app. Once you set a region for the App Engine app, you cannot change it. Your Cloud Scheduler job must run in the same region as your App Engine app.
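If the project already has an App Engine app, you can check where it lives before creating the job. A small sketch (the function name is invented) that prints the app's location ID:

```shell
# Sketch: print the location of the project's existing App Engine app.
show_app_engine_region() {
  project_id="$1"
  gcloud app describe --project "$project_id" --format "value(locationId)"
}
```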



