Using a DLP proxy to interact with Cloud SQL

This tutorial demonstrates how you can use a proxy that passes the results of a query through Cloud Data Loss Prevention (Cloud DLP) so the results returned are tokenized or de-identified. This proxy is a demonstration for this tutorial and is not intended for production purposes.

The tutorial is intended for database admins, security professionals, and cloud architects interested in using Cloud DLP to tokenize and de-identify data stored in Cloud SQL. This tutorial assumes you are familiar with Cloud Shell and Cloud SQL.

In every organization, there is an ever increasing amount of data being stored, much of which can be considered as sensitive data. Not everyone accessing this stored data needs to see the sensitive data when running queries against the database.

This tutorial shows one way you can allow your users to access a Cloud SQL database that contains sensitive data, but prevent this sensitive data from being viewed when looking at query results.

The following diagram shows the deployment architecture for this tutorial.

Deployment architecture where users can access Cloud SQL database.

In this tutorial, you set up a DLP proxy that acts as an interface between Cloud SQL and a simple SQL client.

Objectives

  • Create a database and populate a table with sample sensitive data.
  • Create Cloud DLP templates.
  • Download the DLP proxy server and client JAR files.
  • Configure the DLP proxy to connect to the Cloud SQL instance database and to use the Cloud DLP templates.
  • Test the proxy features.

Costs

This tutorial uses the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Cleaning up.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. In the Cloud Console, on the project selector page, select or create a Cloud project.

    Go to the project selector page

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.

  4. Enable the required APIs.

    Enable the APIs

Creating a Cloud SQL for MySQL database

In this section, you create a Cloud SQL instance, create a Cloud SQL for MySQL database, and add some sample sensitive data.

  1. In the Cloud Console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Cloud Console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Cloud SDK already installed, including the gcloud command-line tool, and with values already set for your current project. It can take a few seconds for the session to initialize.

  2. All commands in this tutorial are run in Cloud Shell.
  3. Copy the files needed to complete the tutorial from Google's public Cloud Storage bucket into a folder called gcp-dlp-tutorial:
    mkdir gcp-dlp-proxy-tutorial
    gsutil cp  gs://solutions-public-assets/dlp-cloudsql-proxy/*.* gcp-dlp-proxy-tutorial
    
  4. Set the Cloud project:
    gcloud config set project project-id
    

    Replace the following:

    • project-id: Your Cloud project ID.
  5. Change directories to the folder you copied the tutorial files to:
    cd  gcp-dlp-proxy-tutorial
    
  6. To see a list of zones, run the following command:
    gcloud compute zones list
    

    For more information about zones, see Geography and regions.

  7. Choose a zone to create the Cloud SQL instance and export the zone as an environment variable:
    export ZONE=zone
    

    Replace the following:

    • zone: A zone of your choosing where you create the Cloud SQL instance.
  8. Create a Cloud SQL for MySQL instance called dlp-test-instance:
    gcloud sql instances create dlp-test-instance  --zone  $ZONE
    

    The instance is created when the output is similar to the following:

    Creating Cloud SQL instance...done.
    Created [https://www.googleapis.com/sql/v1beta4/projects/[PROJECT_ID]/instances/dlp-test-instance].
    NAME               DATABASE_VERSION  LOCATION  TIER              PRIMARY_ADDRESS  PRIVATE_ADDRESS  STATUS
    dlp-test-instance  MYSQL_5_7         [ZONE]    db-n1-standard-1  35.195.26.16     -                RUNNABLE
    
  9. Set the password for the root@% MYSQL user.
    gcloud sql users set-password root \
      --host=% --instance=dlp-test-instance \
      --password=password
    

    Replace the following:

    • password: A strong password.

    The output displays the following:

    Updating Cloud SQL user...done.
    
  10. Create a database in your instance called dlp-test-db:
    gcloud sql databases create  dlp-test-db --instance dlp-test-instance
    

    The output is similar to the following:

    Creating Cloud SQL database...done.
    Created database [dlp-test-db].
    instance: dlp-test-instance
    name: dlp-test-db
    project: project-id
    
  11. Create a database user called dlptester:
    gcloud sql users create dlptester \
      --host=% --instance=dlp-test-instance --password=password-database
    

    Replace the following:

    • password-database: Substitute a strong password. Make a note of the password because you need it later in the tutorial.

    The output displays the following:

    Creating Cloud SQL user...done.
    Created user [dlptester].
    
  12. Connect to the Cloud SQL for MySQL instance:
    gcloud sql connect dlp-test-instance  --user=dlptester
    

    When prompted, enter password-database.

  13. At the MySQL prompt, create a table called TEST_DATA and populate it with a few rows of sample sensitive data:
    source create_test_data.sql
    
  14. At the MySQL prompt, run a query to show that the test data was loaded into the table:
    select * from TEST_DATA LIMIT 10;
    

    The results of the query are similar to the following:

    MySQL [dlp-test-db]> select * from TEST_DATA LIMIT 10;
    +------+---------------------+--------------+-------------+--------+
    | id   | email               | phone        | ssn         | metric |
    +------+---------------------+--------------+-------------+--------+
    |    1 | mallory@example.org | 858-222-0222 | 222-22-2222 |      5 |
    |    2 | james@example.org   | 858-333-0333 | 333-33-3333 |      8 |
    |    3 | mallory@example.org | 858-222-0222 | 222-22-2222 |      8 |
    |    4 | maria@example.org   | 858-444-0444 | 444-44-4444 |      1 |
    |    1 | mallory@example.org | 858-222-0222 | 222-22-2222 |      5 |
    |    2 | james@example.org   | 858-333-0333 | 333-33-3333 |      8 |
    |    3 | mallory@example.org | 858-222-0222 | 222-22-2222 |      8 |
    |    4 | maria@example.org   | 858-444-0444 | 444-44-4444 |      1 |
    +------+---------------------+--------------+-------------+--------+
    8 rows in set (0.00 sec)
    
  15. Exit the MySQL client:
    exit
    

Creating Cloud DLP templates

In this section, you create two Cloud DLP templates. An inspect template to inspect the data in the TEST_DATA tables for sensitive data as defined by the InfoTypes. The second template is a de-identification template to tokenize the sensitive data defined by the InfoTypes.

Create an inspect template

  1. In the Cloud Console, go to the Cloud DLP page.

    GO TO CLOUD DLP

  2. Click Create and select Template.

  3. In the Template ID field, enter inspect-template-1.

  4. Click Continue.

  5. Set the InfoTypes to EMAIL_ADDRESS, PHONE_NUMBER, and US_SOCIAL_SECURITY_NUMBER.

  6. Click Create.

Create a de-identification template

In this section, you create a de-identification template named de-identify-template-1, configured to replace the sensitive information with a surrogate value generated by using cryptographic hashing. For more information, see the cryptoHashConfig section of the Cloud DLP documentation.

  1. Go to the API Explorer.

  2. In the parent field, replace [YOUR-PROJECT] with your Cloud project ID.

  3. Select Execute, and when prompted, select Execute again.

    You get a 200 response and output similar to the following:

    {
      "name": "projects/YOUR-PROJECT-ID/deidentifyTemplates/de-identify-template-1",
      "createTime": "2019-10-15T12:46:51.944253Z",
      "updateTime": "2019-10-15T12:46:51.944253Z",
      "deidentifyConfig": {
        "infoTypeTransformations": {
          "transformations": [
            {
              "infoTypes": [
                {
                  "name": "PHONE_NUMBER"
                },
                {
                  "name": "US_SOCIAL_SECURITY_NUMBER"
                },
                {
                  "name": "EMAIL_ADDRESS"
                }
              ],
              "primitiveTransformation": {
                "cryptoDeterministicConfig": {
                  "cryptoKey": {
                    "transient": {
                      "name": "abc"
                    }
                  },
                  "surrogateInfoType": {
                    "name": "S1"
                  }
    
  4. To validate that your de-identify template was successfully saved, go back to the API Explorer to make a call to list the templates.

  5. In the parent field, enter projects/project-id.

  6. Select Execute, and when prompted select Execute again.

    The output is similar to the following and includes the template you created:

    Output of API Explorer.

Configuring and using the DLP proxy

In this section, you configure the DLP proxy, connect to the proxy by using the client, and test that the query results are processed by Cloud DLP.

  1. In Cloud Shell, change to the directory with the folder that you copied the tutorial files to:

    cd  ~/gcp-dlp-proxy-tutorial
    
  2. Edit the DLP proxy server configuration file:

    nano config.json
    
  3. Edit the following lines replacing the placeholders with the values that reflect your configuration

    "db_instance": "project_id:region:instance_name",
    "db_dbname": "dbase_name",
    "projectID": "project_id",
    "audit_template": "inspect_template",
     "deidentify_template": "deidentify_template",
    

    Replace the following:

    • project_id: Your Cloud project ID.
    • region: The region name corresponding to the zone where your Cloud SQL instance was created. For example, if your instance was created in the us-central1-a zone, then the region is us-central1.
    • instance_name: Your Cloud SQL instance name such as dlp-test-instance.
    • dbase_name: Your database name such as dlp-test-db.
    • inspect_template: Your inspect template name such as inspect-template-1.
    • inspect_template: Your de-identification template name such as de-identify-template-1.
  4. To start the DLP proxy, run the proxy in the background by directing terminal output to thelog.txt file:

    java -jar dlp-cloudsql-proxy-server.jar database_userpassword-databaselog.txt 2>&1 &
    

    Replace the following:

    • database_user: The username for the database user.
    • password-database: The password for the database user that you previously created.
  5. To check that the DLP proxy is running, check that the process is running:

    ps -ax | grep dlp-cloudsql-proxy-server.jar
    

    The process running the DLP proxy is shown in the output:

    1233 pts/1    Sl     0:08 java -jar dlP-cloudsql-proxy-Server.jar dlptester XXXXXXXX
    1266 pts/1    S+     0:00 grep --color=auto dlp-cloud sql-proxy-Server
    

    If you don't see the process running, check the log.txt file for error messages.

  6. Start the simple client provided to connect to the DLP proxy on port 5000:

    java -jar dlp-sql-proxy-client-0.1.0.jar 5000
    
  7. From the client, run a simple query. This command can take a few moments because Cloud DLP needs to process the data before returning it to you.

    select * from TEST_DATA LIMIT 10;
    

    The output contains the phone number, email address, and SSN values in the query results replaced with cryptographically hashed surrogate values. Compare this to the results returned when not using the proxy earlier.

    Connected to SQL Proxy
    Enter query here (enter "quit" to disconnect):
    select * from TEST_DATA LIMIT 10;
    id: 1
    phone: S1(40):AYnPXA6QcxlpOITeaZZM3+U/s7KtXyTqv5KAbB8=
    metric: 5
    email: S1(48):AS8hnzz6g1fGCabK2cbhPn5X7Qvc7FKmLTXnKF1iKF2nRjGQ
    ssn: S1(40):AQ1jSCdIsctniCkCHNEbsc+kliJArHmM6bOJRg==
    
    id: 2
    phone: S1(40):AfiaKlOYHAzgWGtrkpvaCWUY1e2yvMaK7IQvM/M=
    metric: 8
    email: S1(48):ASF7UBFuZ/xUzXJjD1Ap745xcGmKpTjw+IwIBB/F/OX53w==
    ssn: S1(40):AWYXh7U8PhNUZu+fXLuLFS0KCMmcM5uKH17rig==
    
    id: 3
    phone: S1(40):AYnPXA6QcxlpOITeaZZM3+U/s7KtXyTqv5KAbB8=
    metric: 8
    email: S1(48):AS8hnzz6g1fGCabK2cbhPn5X7Qvc7FKmLTXnKF1iKF2nRjGQ
    ssn: S1(40):AQ1jSCdIsctniCkCHNEbsc+kliJArHmM6bOJRg==
    
    id: 4
    phone: S1(40):AXFnPkf620wIBXxW0uuBXSVERzRzvhdgbCdaYIw=
    metric: 1
    email: S1(48):AZgs1pfjEPLL6sDAcIZWuCZhC/saw/rQrujxAFy/O60uIA==
    ssn: S1(40):AatdICHDpwCc19ELEjCS8zAmeSVvx/1KB5/S2Q==
    
    End of query results
    
  8. Quit the client:

    quit
    
  9. Terminate the DLP proxy server:

    killall java
    
  10. Write the log.txt file to the terminal to show the output from the DLP proxy:

    cat  log.txt
    

    The output from the audit phase is shown at the end of the file, matching the three InfoTypes you configured earlier:

    RunCommand
    DLP Result
    DLP Findings:
            Quote:                          Info type: PHONE_NUMBER Likelihood: VERY_LIKELY
            Quote:                          Info type: EMAIL_ADDRESS        Likelihood: LIKELY
            Quote:                          Info type: US_SOCIAL_SECURITY_NUMBER    Likelihood: VERY_LIKELY
            Quote:                          Info type: PHONE_NUMBER Likelihood: VERY_LIKELY
            Quote:                          Info type: EMAIL_ADDRESS        Likelihood: LIKELY
            Quote:                          Info type: US_SOCIAL_SECURITY_NUMBER    Likelihood: VERY_LIKELY
            Quote:                          Info type: PHONE_NUMBER Likelihood: VERY_LIKELY
            Quote:                          Info type: EMAIL_ADDRESS        Likelihood: LIKELY
            Quote:                          Info type: US_SOCIAL_SECURITY_NUMBER    Likelihood: VERY_LIKELY
            Quote:                          Info type: PHONE_NUMBER Likelihood: VERY_LIKELY
            Quote:                          Info type: EMAIL_ADDRESS        Likelihood: LIKELY
            Quote:                          Info type: US_SOCIAL_SECURITY_NUMBER    Likelihood: VERY_LIKELY
    
    Got DLP Result
    Sensitive information found
    write log: 1503994
    Logged Main.Query: 1503994
    

Cleaning up

The easiest way to eliminate billing is to delete the Cloud project you created for the tutorial. Alternatively, you can delete the individual resources.

Delete the project

  1. In the Cloud Console, go to the Manage resources page.

    Go to the Manage resources page

  2. In the project list, select the project that you want to delete and then click Delete .
  3. In the dialog, type the project ID and then click Shut down to delete the project.

What's next