Scheduling Cloud Bigtable Backups
Tracy Cui
Software Engineer, Cloud Bigtable
Jordan Hambleton
Cloud Data Engineer, Google Cloud
Cloud Bigtable backups let you save a copy of a table's schema and data, then restore from the backup to a new table at a later time. In this tutorial, you'll learn how to create backups at regularly scheduled intervals (such as daily or weekly) using the Cloud Bigtable Scheduled Backups example.
This example uses Cloud Scheduler to periodically send backup creation requests as Pub/Sub messages. The Pub/Sub messages trigger a Cloud Function which initiates a backup using the Cloud Bigtable Java client library. The function could be adapted to any of the clients that are supported in Cloud Functions.
This solution uses the following Google Cloud services:
Cloud Scheduler to trigger tasks with a cron-based schedule
Cloud Pub/Sub to pass the message request from Cloud Scheduler to Cloud Functions
Cloud Functions to initiate an operation for creating a Cloud Bigtable backup
Cloud Logging to create logs-based metrics
Cloud Monitoring to create alerts based on conditions of the logs-based metrics
Costs
This tutorial uses billable components of Google Cloud, including the following:
Use the pricing calculator to generate a cost estimate based on your projected usage.
Before you begin
Before proceeding with the tutorial, ensure the following:
A Cloud Bigtable table exists in the Google Cloud project you use for this tutorial. See the Cloud Bigtable documentation if you need to create one.
The Google Cloud SDK is installed.
APIs and IAM Roles Setup
The diagram below shows the flow of actions between human roles and APIs.
IAM Roles for Administrators
The administrator should be granted specific roles to deploy the services needed for the solution.
Role | Purpose |
roles/bigtable.admin | Cloud Bigtable administrator |
roles/cloudfunctions.admin | Deploy and manage Cloud Functions |
roles/deploymentmanager.editor | Deploy monitoring metrics |
roles/pubsub.editor | Create and manage Pub/Sub topics |
roles/cloudscheduler.admin | Set up a schedule in Cloud Scheduler |
roles/appengine.appAdmin | Use Cloud Scheduler to deploy a cron service |
roles/monitoring.admin | Set up alerting policies for failure notifications |
roles/logging.admin | Add logs-based user-defined metrics to track failures |
The administrator also needs to be assigned a custom role that has the following permissions:
appengine.applications.create - for Cloud Scheduler to create an App Engine app
serviceusage.services.use - for Cloud Scheduler to use the App Engine app
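The predefined roles in the table above can also be granted from the command line. The following is a minimal sketch, assuming a hypothetical project ID `my-project` and a hypothetical administrator account `admin@example.com`; neither value comes from this tutorial. The block only defines a function, so nothing changes in your project until you call it.

```shell
# Hypothetical values; replace them with your own project and administrator.
PROJECT_ID="my-project"
ADMIN_MEMBER="user:admin@example.com"

# Predefined roles listed in the table above, one per line.
ADMIN_ROLES="roles/bigtable.admin
roles/cloudfunctions.admin
roles/deploymentmanager.editor
roles/pubsub.editor
roles/cloudscheduler.admin
roles/appengine.appAdmin
roles/monitoring.admin
roles/logging.admin"

# Grants each predefined role to the administrator at the project level.
# Stops at the first binding that fails.
grant_admin_roles() {
  for role in $ADMIN_ROLES; do
    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="$ADMIN_MEMBER" \
      --role="$role" || return 1
  done
}
```

Run `grant_admin_roles` to apply the bindings. The custom role with the two App Engine related permissions is not covered by this sketch and still has to be created and assigned separately.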
Service Account for Cloud Functions
The Cloud Function is triggered when a message arrives on the Pub/Sub topic, and it then calls the Cloud Bigtable API to create a backup. To run successfully, the function must be able to consume messages from the Pub/Sub topic and must have permission to create Cloud Bigtable backups. To accomplish this, perform the following steps:
Create a service account (e.g. cbt-scheduled-backups@<project-id>.iam.gserviceaccount.com).
Create a custom role with the following permissions:
bigtable.backups.create
bigtable.backups.delete
bigtable.backups.get
bigtable.backups.list
bigtable.backups.restore
bigtable.backups.update
bigtable.instances.get
bigtable.tables.create
bigtable.tables.readRows
Assign the custom role and roles/pubsub.subscriber to the service account. This allows Cloud Functions to read messages from the Pub/Sub topic and initiate a `create backup` request.
Add the administrator as a member of the service account with the role roles/iam.serviceAccountUser. This allows the administrator to deploy Cloud Functions that run as the service account.
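The four steps above can be sketched with gcloud as follows. The service account name and the permission list come from the steps themselves; the project ID `my-project`, the custom role ID `cbtScheduledBackups`, and the administrator account are hypothetical placeholders. Treat this as one possible way to do the setup, not as the sample's own script.

```shell
# Hypothetical values; replace them with your own.
PROJECT_ID="my-project"
ADMIN_MEMBER="user:admin@example.com"
ROLE_ID="cbtScheduledBackups"   # hypothetical custom role ID

SA_NAME="cbt-scheduled-backups"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Permissions for the custom role, as listed in the steps above.
BACKUP_PERMISSIONS="bigtable.backups.create,bigtable.backups.delete,\
bigtable.backups.get,bigtable.backups.list,bigtable.backups.restore,\
bigtable.backups.update,bigtable.instances.get,bigtable.tables.create,\
bigtable.tables.readRows"

setup_backup_service_account() {
  # Step 1: create the service account.
  gcloud iam service-accounts create "$SA_NAME" \
    --project="$PROJECT_ID" \
    --display-name="Cloud Bigtable scheduled backups" || return 1

  # Step 2: create the custom role with the backup permissions.
  gcloud iam roles create "$ROLE_ID" \
    --project="$PROJECT_ID" \
    --title="Cloud Bigtable Scheduled Backups" \
    --permissions="$BACKUP_PERMISSIONS" || return 1

  # Step 3: bind the custom role and roles/pubsub.subscriber to the account.
  for role in "projects/${PROJECT_ID}/roles/${ROLE_ID}" "roles/pubsub.subscriber"; do
    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:${SA_EMAIL}" \
      --role="$role" || return 1
  done

  # Step 4: let the administrator act as the service account.
  gcloud iam service-accounts add-iam-policy-binding "$SA_EMAIL" \
    --project="$PROJECT_ID" \
    --member="$ADMIN_MEMBER" \
    --role="roles/iam.serviceAccountUser"
}
```

Calling `setup_backup_service_account` runs the steps in order and stops at the first failure, which makes it easier to see which step needs attention.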
Creating Scheduled Backups
Create a Pub/Sub topic
Create a Cloud Pub/Sub topic named cloud-bigtable-scheduled-backups. It serves as the target of the Cloud Scheduler job and triggers the Cloud Function. For example:
gcloud pubsub topics create cloud-bigtable-scheduled-backups --project <project-id>
Then go to the Pub/Sub UI and verify that the newly created topic is listed.
Deploy a function to Cloud Functions
Create and deploy a Cloud Function named cbt-create-backup-function, which is called whenever a Pub/Sub message arrives on the cloud-bigtable-scheduled-backups topic. The deploy-backup-function function in the script scheduled_backups.sh wraps the gcloud command that does this.
./scripts/scheduled_backups.sh deploy-backup-function
Go to the Cloud Functions UI to view the function. The function subscribes to the Pub/Sub topic cloud-bigtable-scheduled-backups.
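For reference, deploying a Pub/Sub-triggered function directly with gcloud might look like the sketch below. This is not the content of scheduled_backups.sh. The function and topic names come from this tutorial, but the project ID, region, runtime, entry point class, and source directory are all assumptions that you would replace with the values used by the sample.

```shell
# Names taken from this tutorial.
FUNCTION_NAME="cbt-create-backup-function"
TOPIC="cloud-bigtable-scheduled-backups"

# Hypothetical values; replace them with your own.
PROJECT_ID="my-project"
REGION="us-central1"
SA_EMAIL="cbt-scheduled-backups@${PROJECT_ID}.iam.gserviceaccount.com"
ENTRY_POINT="com.example.CreateBackup"   # hypothetical class name
RUNTIME="java17"                         # assumption: any supported Java runtime

# Deploys the function from the current directory, triggered by the topic
# and running as the backup service account.
deploy_backup_function() {
  gcloud functions deploy "$FUNCTION_NAME" \
    --project="$PROJECT_ID" \
    --region="$REGION" \
    --runtime="$RUNTIME" \
    --entry-point="$ENTRY_POINT" \
    --trigger-topic="$TOPIC" \
    --service-account="$SA_EMAIL" \
    --source=.
}
```

The `--trigger-topic` flag creates the subscription that connects the function to the topic, and `--service-account` makes the function run with the permissions of the custom role rather than the default runtime identity.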
Deploying scheduled jobs using Cloud Scheduler
Note: To use Cloud Scheduler, you must create an App Engine app. This can be done explicitly before the next step or indirectly when running the next step.
Next, deploy the scheduled backup configuration to Cloud Scheduler. The configuration includes the cron schedule of the job, the Pub/Sub topic name, and the message to send. This step is also wrapped in a function in the script, and the settings can be specified in the properties file.
./scripts/scheduled_backups.sh create-schedule
The job is now visible in the Cloud Scheduler UI.
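To illustrate what such a configuration contains, the sketch below creates a daily Pub/Sub job directly with gcloud. Only the topic name comes from this tutorial. The job name, location, schedule, time zone, and especially the JSON fields in the message body are hypothetical; the real message format is defined by the sample's properties file and function code.

```shell
# Name taken from this tutorial.
TOPIC="cloud-bigtable-scheduled-backups"

# Hypothetical values; replace them with your own.
PROJECT_ID="my-project"
LOCATION="us-central1"
JOB_NAME="cbt-daily-backup"
SCHEDULE="0 0 * * *"      # unix-cron format: every day at 00:00
TIME_ZONE="Etc/UTC"

# Hypothetical payload; the field names are assumptions, not the sample's schema.
MESSAGE_BODY='{"instanceId":"my-instance","clusterId":"my-cluster","tableId":"my-table"}'

# Creates a Cloud Scheduler job that publishes the payload to the topic
# on the given schedule.
create_backup_schedule() {
  gcloud scheduler jobs create pubsub "$JOB_NAME" \
    --project="$PROJECT_ID" \
    --location="$LOCATION" \
    --schedule="$SCHEDULE" \
    --time-zone="$TIME_ZONE" \
    --topic="$TOPIC" \
    --message-body="$MESSAGE_BODY"
}
```

For a weekly backup instead, a schedule such as `0 0 * * 0` runs the job every Sunday at midnight in the configured time zone.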
Email notification of backup failures
To get email notifications on backup creation failures, follow these steps:
1. Follow this guide to add your email address as a notification channel.
2. Create the logs-based metrics that track backup failures:
./scripts/scheduled_backups.sh add-metrics
3. Create alerting policies on those logs-based metrics, using the following condition settings:
Condition triggers if: Any time series violates
Condition: is above
Threshold: 0
4. Add the notification channel you just created to the alerting policies. Whenever the condition is met, you will receive an email notification.
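The role table earlier in this tutorial suggests that the sample deploys its metrics through Deployment Manager. As a rough stand-in for a single metric, the sketch below creates a logs-based counter metric with gcloud. The project ID, the metric name, and the log filter are assumptions, not the sample's actual definitions; the filter simply counts error-level log entries written by the backup function.

```shell
# Name taken from this tutorial.
FUNCTION_NAME="cbt-create-backup-function"

# Hypothetical values; replace them with your own.
PROJECT_ID="my-project"
METRIC_NAME="cbt-backup-function-errors"

# Assumed filter: error-level (or worse) entries logged by the backup function.
LOG_FILTER="resource.type=\"cloud_function\" AND resource.labels.function_name=\"${FUNCTION_NAME}\" AND severity>=ERROR"

# Creates a user-defined logs-based counter metric from the filter.
create_failure_metric() {
  gcloud logging metrics create "$METRIC_NAME" \
    --project="$PROJECT_ID" \
    --description="Errors logged by the scheduled backup function" \
    --log-filter="$LOG_FILTER"
}
```

With a threshold of 0 and the comparison "is above", an alerting policy on a metric like this fires as soon as a single matching log entry is counted.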
Considerations
To use Cloud Scheduler, you must create an App Engine app. Once you set a region for the App Engine app, you cannot change it. Your Cloud Scheduler job must run in the same region as your App Engine app.
Learn More
To get started, create a Cloud Bigtable instance or try it out with a Cloud Bigtable Qwiklab.
Check out this GitHub sample for details about this Scheduled Backup tool.
Learn more about the managed backups feature of Cloud Bigtable.