Scheduling Cloud SQL database exports using Cloud Scheduler

This tutorial shows how to use Cloud Scheduler and Cloud Functions to automatically export a Cloud SQL for MySQL database to Cloud Storage. Having database exports on Cloud Storage lets you create a robust, diverse disaster recovery plan. For example, you can export to a different region, and import to other Cloud SQL instances or other MySQL databases.

Architecture

This tutorial uses the following Google Cloud components:

  • Cloud Scheduler
  • Pub/Sub
  • Cloud Functions
  • Cloud SQL for MySQL
  • Cloud Storage

A Cloud Scheduler job posts a message on a Pub/Sub topic with information about the Cloud SQL instance name, the database, the project ID, and the Cloud Storage location at which to store the backup. This event triggers a Cloud Function that gets this payload and starts a database export on Cloud SQL through the SQL Admin API. The database generates the export and saves it to Cloud Storage. The following diagram shows this process.

Workflow from Cloud Scheduler to Pub/Sub, which triggers a Cloud Function that starts the export.
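
For reference, the message that the Cloud Scheduler job publishes is a small JSON payload. The following is an illustrative example with placeholder values for the instance, project, and bucket; the exact message body is defined when you create the Cloud Scheduler job later in this tutorial:

    {
      "db": "recommendation_spark",
      "instance": "my-sql-instance",
      "project": "my-project-id",
      "gs": "gs://my-backup-bucket"
    }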

Objectives

  • Create custom IAM roles, a Cloud Storage bucket, and a Cloud SQL for MySQL instance populated with sample data.
  • Create a Pub/Sub topic, a Cloud Function, and a Cloud Scheduler job that together export the database to Cloud Storage on a schedule.
  • Test the solution by running the Cloud Scheduler job manually.

Costs

This tutorial uses the following billable components of Google Cloud:

  • Cloud SQL
  • Cloud Storage
  • Cloud Functions
  • Cloud Scheduler
  • Pub/Sub

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Cleaning up.

Before you begin

  1. In the Cloud Console, on the project selector page, select or create a Cloud project.

    Go to the project selector page

  2. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.

  3. In the Cloud Console, activate Cloud Shell.

    Activate Cloud Shell

  4. Enable the Cloud SQL Admin, Cloud Functions, Cloud Scheduler, and App Engine APIs.

    Enable the APIs

Throughout this tutorial, you run all commands from Cloud Shell.

Setting up your environment

To get started, you first clone the repository that contains the sample data. You then configure your environment and create custom roles that have the permissions needed for this tutorial.

  1. Clone the repository that contains the sample data:

    git clone https://github.com/GoogleCloudPlatform/training-data-analyst.git
    

    You use the data from the training-data-analyst repository to create a database with some mock records.

  2. Configure the following environment variables:

    export PROJECT_ID=`gcloud config get-value project`
    export DEMO="sql-export-tutorial"
    export BUCKET_NAME=${USER}-mysql-$(date +%s)
    export SQL_INSTANCE="${DEMO}-sql"
    export GCF_NAME="${DEMO}-gcf"
    export PUBSUB_TOPIC="${DEMO}-topic"
    export SCHEDULER_JOB="${DEMO}-job"
    export SQL_ROLE="sqlExporter"
    export STORAGE_ROLE="simpleStorageRole"
    export REGION="us-west2"
    
  3. Create two custom roles that have only the permissions needed for this tutorial:

    gcloud iam roles create ${STORAGE_ROLE} --project ${PROJECT_ID} \
        --title "Simple Storage Role" \
        --description "Grant permissions to view and create objects in Cloud Storage" \
        --permissions "storage.objects.create,storage.objects.get"
    
    gcloud iam roles create ${SQL_ROLE} --project ${PROJECT_ID} \
        --title "SQL Exporter Role" \
        --description "Grant permissions to export data from a Cloud SQL instance to a Cloud Storage bucket as a SQL dump or CSV file" \
        --permissions "cloudsql.instances.export"
    

    These roles reduce the scope of access of Cloud Functions and Cloud SQL service accounts, following the principle of least privilege.
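
    If you want to confirm that each role was created with only those permissions, you can optionally inspect the roles; for example:

    gcloud iam roles describe ${STORAGE_ROLE} --project ${PROJECT_ID}
    gcloud iam roles describe ${SQL_ROLE} --project ${PROJECT_ID}

    The output of each command lists the role's includedPermissions.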

Creating a Cloud Storage bucket and a Cloud SQL instance

In this section, you first create a Cloud Storage bucket and a Cloud SQL for MySQL instance. Then you create a sample database and populate it with sample data.

Create a Cloud Storage bucket

You use the gsutil command-line tool to create a Cloud Storage bucket.

  • Create a Cloud Storage bucket where you want to save the data exports:

    gsutil mb -l ${REGION} gs://${BUCKET_NAME}
    
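
    If you want to double-check that the bucket was created in the intended region, you can optionally list its metadata; for example:

    gsutil ls -L -b gs://${BUCKET_NAME}

    The output includes the bucket's location constraint, which should correspond to ${REGION}.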

Create a Cloud SQL instance and grant permissions to its service account

Next, you create a Cloud SQL instance and grant its service account the permissions to export data to Cloud Storage.

  1. Create a Cloud SQL for MySQL 5.7 instance:

    gcloud sql instances create ${SQL_INSTANCE} --database-version MYSQL_5_7 --region ${REGION}
    

    This operation takes a few minutes to complete.

  2. Verify that the Cloud SQL instance is running:

    gcloud sql instances list --filter name=${SQL_INSTANCE}
    

    The output looks similar to the following:

    NAME                     DATABASE_VERSION  LOCATION    TIER              PRIMARY_ADDRESS  PRIVATE_ADDRESS  STATUS
    sql-export-tutorial-sql  MYSQL_5_7         us-west2-b  db-n1-standard-1  34.94.173.98     -                RUNNABLE
    
  3. Grant your Cloud SQL service account the permissions to export data to Cloud Storage with the Simple Storage Role:

    export SQL_SA=(`gcloud sql instances describe ${SQL_INSTANCE} \
        --project ${PROJECT_ID} \
        --format "value(serviceAccountEmailAddress)"`)
    
    gsutil iam ch serviceAccount:${SQL_SA}:projects/${PROJECT_ID}/roles/${STORAGE_ROLE} gs://${BUCKET_NAME}
    
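
    To optionally verify that the binding was applied, you can read back the bucket's IAM policy; for example:

    gsutil iam get gs://${BUCKET_NAME}

    The Cloud SQL service account should appear in the policy with the custom Simple Storage Role.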

Populate the Cloud SQL instance with sample data

Now you can upload files to your bucket and create and populate your sample database.

  1. Go to the repository that you cloned:

    cd training-data-analyst/CPB100/lab3a/cloudsql
    
  2. Upload the files in the directory to your new bucket:

    gsutil cp * gs://${BUCKET_NAME}
    
  3. Create and populate a sample database. Whenever you are prompted, type Y to continue:

    gcloud sql import sql ${SQL_INSTANCE} gs://${BUCKET_NAME}/table_creation.sql --project ${PROJECT_ID}
    
    gcloud sql import csv ${SQL_INSTANCE} gs://${BUCKET_NAME}/accommodation.csv \
        --database recommendation_spark \
        --table Accommodation
    
    gcloud sql import csv ${SQL_INSTANCE} gs://${BUCKET_NAME}/rating.csv \
        --database recommendation_spark \
        --table Rating
    
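
    After the imports finish, you can optionally confirm that the sample database exists; for example:

    gcloud sql databases list --instance ${SQL_INSTANCE}

    The output should include the recommendation_spark database.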

Creating a Pub/Sub topic, a Cloud Function, and a Cloud Scheduler job

In this section, you create a custom service account and bind it to the custom SQL role that you created earlier. You then create a Pub/Sub topic that's used to trigger the execution of a Cloud Function. You also create a Cloud Scheduler job to periodically execute the data export function.

Create a service account for the Cloud Function

The first step is to create a custom service account and bind it to the custom SQL role that you created earlier.

  1. Create an IAM service account to be used by the Cloud Function:

    gcloud iam service-accounts create ${GCF_NAME} \
        --display-name "Service Account for GCF and SQL Admin API"
    
  2. Grant the Cloud Function service account access to the custom SQL role:

    gcloud projects add-iam-policy-binding ${PROJECT_ID} \
        --member="serviceAccount:${GCF_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
        --role="projects/${PROJECT_ID}/roles/${SQL_ROLE}"
    
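
    If you want to verify the binding, you can optionally inspect the project's IAM policy for the new service account; for example:

    gcloud projects get-iam-policy ${PROJECT_ID} \
        --flatten="bindings[].members" \
        --filter="bindings.members:${GCF_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
        --format="table(bindings.role)"

    The output should list the custom SQL Exporter role.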

Create a Pub/Sub topic

The next step is to create a Pub/Sub topic that's used to trigger the Cloud Function that interacts with the Cloud SQL database.

  • Create the Pub/Sub topic:

    gcloud pubsub topics create ${PUBSUB_TOPIC}
    
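
    You can optionally confirm that the topic exists; for example:

    gcloud pubsub topics describe ${PUBSUB_TOPIC}

    The output shows the topic's fully qualified name in the form projects/PROJECT_ID/topics/TOPIC_NAME.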

Create a Cloud Function

Next, you create the Cloud Function.

  1. Create a main.py file by pasting the following into Cloud Shell:

    cat <<EOF > main.py
    
    import base64
    import logging
    import json
    
    from datetime import datetime
    from httplib2 import Http
    
    from googleapiclient import discovery
    from googleapiclient.errors import HttpError
    from oauth2client.client import GoogleCredentials
    
    def main(event, context):
        # Decode the Pub/Sub message into a dict with the keys db, instance, project, and gs.
        pubsub_message = json.loads(base64.b64decode(event['data']).decode('utf-8'))
        credentials = GoogleCredentials.get_application_default()

        # Build a client for the Cloud SQL Admin API.
        service = discovery.build('sqladmin', 'v1beta4', http=credentials.authorize(Http()), cache_discovery=False)

        datestamp = datetime.now().strftime("%Y%m%d%H%M") # format timestamp: YearMonthDayHourMinute
        uri = "{0}/backup-{1}-{2}.gz".format(pubsub_message['gs'], pubsub_message['db'], datestamp)

        # Request body for the instances.export method: dump the database as SQL to the Cloud Storage URI.
        instances_export_request_body = {
            "exportContext": {
                "kind": "sql#exportContext",
                "fileType": "SQL",
                "uri": uri,
                "databases": [
                    pubsub_message['db']
                ]
            }
        }

        # Start the export; the operation then runs asynchronously in Cloud SQL.
        try:
            request = service.instances().export(
                project=pubsub_message['project'],
                instance=pubsub_message['instance'],
                body=instances_export_request_body
            )
            response = request.execute()
        except HttpError as err:
            logging.error("Could NOT run backup. Reason: {}".format(err))
        else:
            logging.info("Backup task status: {}".format(response))
    EOF
    
  2. Create a requirements.txt file by pasting the following into Cloud Shell:

    cat <<EOF > requirements.txt
    google-api-python-client
    oauth2client
    EOF
    
  3. Deploy the code. When you are asked if you want to allow unauthenticated invocations of the new function, answer no.

    gcloud functions deploy ${GCF_NAME} \
        --trigger-topic ${PUBSUB_TOPIC} \
        --runtime python37 \
        --entry-point main \
        --service-account ${GCF_NAME}@${PROJECT_ID}.iam.gserviceaccount.com
    
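
    Deployment can take a few minutes. To optionally confirm that the function deployed successfully, you can describe it; for example:

    gcloud functions describe ${GCF_NAME} --format "value(status)"

    The output should be ACTIVE.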

Create a Cloud Scheduler job

Finally, you create a Cloud Scheduler job to periodically execute the data export function.

  1. Create an App Engine app, which is required for the Cloud Scheduler job:

    gcloud app create --region=${REGION}
    
  2. Create a Cloud Scheduler job to periodically execute the data export function:

    gcloud scheduler jobs create pubsub ${SCHEDULER_JOB} \
        --schedule '0 23 * * *' \
        --topic ${PUBSUB_TOPIC} \
        --message-body '{"db":"recommendation_spark","instance":'\"${SQL_INSTANCE}\"',"project":'\"${PROJECT_ID}\"',"gs":'\"gs://${BUCKET_NAME}\"'}' \
        --time-zone 'America/Los_Angeles'
    

    This job is scheduled to run at 11 PM every day.
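
    The schedule uses standard cron syntax: '0 23 * * *' means minute 0 of hour 23, every day. If you later want a different cadence, for example every six hours, you could update the job with something like the following:

    gcloud scheduler jobs update pubsub ${SCHEDULER_JOB} --schedule '0 */6 * * *'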

Testing your solution

The final step is to test your solution. You start by running the Cloud Scheduler job.

  1. Run the Cloud Scheduler job manually to trigger a MySQL dump of your database:

    gcloud scheduler jobs run ${SCHEDULER_JOB}
    
  2. List the operations performed on the MySQL instance, and verify that there's an operation of type EXPORT:

    gcloud sql operations list --instance ${SQL_INSTANCE} --limit 1
    

    The output shows a completed export job—for example:

    NAME                                  TYPE    START                          END                            ERROR  STATUS
    8b031f0b-9d66-47fc-ba21-67dc20193749  EXPORT  2020-02-06T21:55:22.240+00:00  2020-02-06T21:55:32.614+00:00  -      DONE
    
  3. Check the Cloud Storage bucket to see if the database dump file was created:

    gsutil ls gs://${BUCKET_NAME} | grep backup-recommendation_spark
    

    You see a file named backup-database_name-timestamp.gz after the operation from the previous step reports a STATUS of DONE.
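
    If you want to exercise the export pipeline without going through Cloud Scheduler, you can also publish the same message directly to the Pub/Sub topic and then inspect the function's logs; for example:

    gcloud pubsub topics publish ${PUBSUB_TOPIC} --message '{"db":"recommendation_spark","instance":'\"${SQL_INSTANCE}\"',"project":'\"${PROJECT_ID}\"',"gs":'\"gs://${BUCKET_NAME}\"'}'

    gcloud functions logs read ${GCF_NAME} --limit 10

    The log output should include a "Backup task status" entry written by the function.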

Cleaning up

You can avoid incurring charges to your Google Cloud account for the resources used in this tutorial by following these steps. The easiest way to eliminate billing is to delete the project you created for the tutorial.

  1. In the Cloud Console, go to the Manage resources page.

    Go to the Manage resources page

  2. In the project list, select the project that you want to delete and then click Delete.
  3. In the dialog, type the project ID and then click Shut down to delete the project.
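
If you prefer to keep the project, you can instead delete only the individual resources created in this tutorial. Assuming the environment variables from earlier in the tutorial are still set, the cleanup might look like the following (note that an App Engine application cannot be deleted, only disabled):

    gcloud scheduler jobs delete ${SCHEDULER_JOB} --quiet
    gcloud functions delete ${GCF_NAME} --quiet
    gcloud pubsub topics delete ${PUBSUB_TOPIC} --quiet
    gcloud sql instances delete ${SQL_INSTANCE} --quiet
    gcloud iam service-accounts delete ${GCF_NAME}@${PROJECT_ID}.iam.gserviceaccount.com --quiet
    gcloud iam roles delete ${SQL_ROLE} --project ${PROJECT_ID} --quiet
    gcloud iam roles delete ${STORAGE_ROLE} --project ${PROJECT_ID} --quiet
    gsutil -m rm -r gs://${BUCKET_NAME}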

What's next