This tutorial demonstrates how you can use a proxy that passes the results of a query through Cloud Data Loss Prevention (Cloud DLP) so the results returned are tokenized or de-identified. This proxy is a demonstration for this tutorial and is not intended for production purposes.
The tutorial is intended for database admins, security professionals, and cloud architects interested in using Cloud DLP to tokenize and de-identify data stored in Cloud SQL. This tutorial assumes you are familiar with Cloud Shell and Cloud SQL.
In every organization, there is an ever increasing amount of data being stored, much of which can be considered as sensitive data. Not everyone accessing this stored data needs to see the sensitive data when running queries against the database.
This tutorial shows one way you can allow your users to access a Cloud SQL database that contains sensitive data, but prevent this sensitive data from being viewed when looking at query results.
The following diagram shows the deployment architecture for this tutorial.
In this tutorial, you set up a DLP proxy that acts as an interface between Cloud SQL and a simple SQL client.
Objectives
- Create a database and populate a table with sample sensitive data.
- Create Cloud DLP templates.
- Download the DLP proxy server and client JAR files.
- Configure the DLP proxy to connect to the Cloud SQL instance database and to use the Cloud DLP templates.
- Test the proxy features.
Costs
This tutorial uses the following billable components of Google Cloud:
To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.
When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Cleaning up.
Before you begin
-
Sign in to your Google Account.
If you don't already have one, sign up for a new account.
-
In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
-
Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.
- Enable the required APIs.
Creating a Cloud SQL for MySQL database
In this section, you create a Cloud SQL instance, create a Cloud SQL for MySQL database, and add some sample sensitive data.
-
In the Cloud Console, activate Cloud Shell.
At the bottom of the Cloud Console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Cloud SDK already installed, including the
gcloud
command-line tool, and with values already set for your current project. It can take a few seconds for the session to initialize.
All commands in this tutorial are run in Cloud Shell.
- Copy the files needed to complete the tutorial from Google's public
Cloud Storage bucket into a folder called
gcp-dlp-tutorial
:mkdir gcp-dlp-proxy-tutorial gsutil cp gs://solutions-public-assets/dlp-cloudsql-proxy/*.* gcp-dlp-proxy-tutorial
- Set the Cloud project:
gcloud config set project project-id
Replace the following:
project-id
: Your Cloud project ID.
- Change directories to the folder you copied the tutorial files to:
cd gcp-dlp-proxy-tutorial
- To see a list of zones, run the following command:
gcloud compute zones list
For more information about zones, see Geography and regions.
- Choose a zone to create the Cloud SQL instance and export the
zone as an environment variable:
export ZONE=zone
Replace the following:
zone
: A zone of your choosing where you create the Cloud SQL instance.
-
Create a Cloud SQL for MySQL instance called
dlp-test-instance
:gcloud sql instances create dlp-test-instance --zone $ZONE
The instance is created when the output is similar to the following:
Creating Cloud SQL instance...done. Created [https://www.googleapis.com/sql/v1beta4/projects/[PROJECT_ID]/instances/dlp-test-instance]. NAME DATABASE_VERSION LOCATION TIER PRIMARY_ADDRESS PRIVATE_ADDRESS STATUS dlp-test-instance MYSQL_5_7 [ZONE] db-n1-standard-1 35.195.26.16 - RUNNABLE
- Set the password for the
root@%
MYSQL user.gcloud sql users set-password root \ --host=% --instance=dlp-test-instance \ --password=password
Replace the following:
password
: A strong password.
The output displays the following:
Updating Cloud SQL user...done.
- Create a database in your instance called
dlp-test-db
:gcloud sql databases create dlp-test-db --instance dlp-test-instance
The output is similar to the following:
Creating Cloud SQL database...done. Created database [dlp-test-db]. instance: dlp-test-instance name: dlp-test-db project: project-id
- Create a database user called
dlptester
:gcloud sql users create dlptester \ --host=% --instance=dlp-test-instance --password=password-database
Replace the following:
password-database
: Substitute a strong password. Make a note of the password because you need it later in the tutorial.
The output displays the following:
Creating Cloud SQL user...done. Created user [dlptester].
-
Connect to the Cloud SQL for MySQL instance:
gcloud sql connect dlp-test-instance --user=dlptester
When prompted, enter
password-database
. - At the MySQL prompt, create a table called
TEST_DATA
and populate it with a few rows of sample sensitive data:source create_test_data.sql
- At the MySQL prompt, run a query to show that the test data was loaded
into the table:
select * from TEST_DATA LIMIT 10;
The results of the query are similar to the following:
MySQL [dlp-test-db]> select * from TEST_DATA LIMIT 10; +------+---------------------+--------------+-------------+--------+ | id | email | phone | ssn | metric | +------+---------------------+--------------+-------------+--------+ | 1 | mallory@example.org | 858-222-0222 | 222-22-2222 | 5 | | 2 | james@example.org | 858-333-0333 | 333-33-3333 | 8 | | 3 | mallory@example.org | 858-222-0222 | 222-22-2222 | 8 | | 4 | maria@example.org | 858-444-0444 | 444-44-4444 | 1 | | 1 | mallory@example.org | 858-222-0222 | 222-22-2222 | 5 | | 2 | james@example.org | 858-333-0333 | 333-33-3333 | 8 | | 3 | mallory@example.org | 858-222-0222 | 222-22-2222 | 8 | | 4 | maria@example.org | 858-444-0444 | 444-44-4444 | 1 | +------+---------------------+--------------+-------------+--------+ 8 rows in set (0.00 sec)
- Exit the MySQL client:
exit
Creating Cloud DLP templates
In this section, you create two Cloud DLP templates. An inspect
template to inspect the data in the TEST_DATA
tables for sensitive data as
defined by the InfoTypes. The second template is a de-identification template
to tokenize the sensitive data defined by the InfoTypes.
Create an inspect template
In the Cloud Console, go to the Cloud DLP page.
Click Create and select Template.
In the Template ID field, enter
inspect-template-1
.Click Continue.
Set the InfoTypes to
EMAIL_ADDRESS
,PHONE_NUMBER
, andUS_SOCIAL_SECURITY_NUMBER
.Click Create.
Create a de-identification template
In this section, you create a de-identification template named
de-identify-template-1
, configured to replace the sensitive information with a
surrogate value generated by using cryptographic hashing. For more information,
see the
cryptoHashConfig
section of the Cloud DLP documentation.
Go to the API Explorer.
In the parent field, replace
[YOUR-PROJECT]
with your Cloud project ID.Select Execute, and when prompted, select Execute again.
You get a
200
response and output similar to the following:{ "name": "projects/YOUR-PROJECT-ID/deidentifyTemplates/de-identify-template-1", "createTime": "2019-10-15T12:46:51.944253Z", "updateTime": "2019-10-15T12:46:51.944253Z", "deidentifyConfig": { "infoTypeTransformations": { "transformations": [ { "infoTypes": [ { "name": "PHONE_NUMBER" }, { "name": "US_SOCIAL_SECURITY_NUMBER" }, { "name": "EMAIL_ADDRESS" } ], "primitiveTransformation": { "cryptoDeterministicConfig": { "cryptoKey": { "transient": { "name": "abc" } }, "surrogateInfoType": { "name": "S1" }
To validate that your de-identify template was successfully saved, go back to the API Explorer to make a call to list the templates.
In the parent field, enter
projects/project-id
.Select Execute, and when prompted select Execute again.
The output is similar to the following and includes the template you created:
Configuring and using the DLP proxy
In this section, you configure the DLP proxy, connect to the proxy by using the client, and test that the query results are processed by Cloud DLP.
In Cloud Shell, change to the directory with the folder that you copied the tutorial files to:
cd ~/gcp-dlp-proxy-tutorial
Edit the DLP proxy server configuration file:
nano config.json
Edit the following lines replacing the placeholders with the values that reflect your configuration
"db_instance": "project_id:region:instance_name", "db_dbname": "dbase_name", "projectID": "project_id", "audit_template": "inspect_template", "deidentify_template": "deidentify_template",
Replace the following:
project_id
: Your Cloud project ID.region
: The region name corresponding to the zone where your Cloud SQL instance was created. For example, if your instance was created in theus-central1-a
zone, then the region isus-central1.
instance_name
: Your Cloud SQL instance name such asdlp-test-instance
.dbase_name
: Your database name such asdlp-test-db
.inspect_template
: Your inspect template name such asinspect-template-1
.inspect_template
: Your de-identification template name such asde-identify-template-1
.
To start the DLP proxy, run the proxy in the background by directing terminal output to the
log.txt
file:java -jar dlp-cloudsql-proxy-server.jar database_userpassword-databaselog.txt 2>&1 &
Replace the following:
database_user
: The username for the database user.password-database
: The password for the database user that you previously created.
To check that the DLP proxy is running, check that the process is running:
ps -ax | grep dlp-cloudsql-proxy-server.jar
The process running the DLP proxy is shown in the output:
1233 pts/1 Sl 0:08 java -jar dlP-cloudsql-proxy-Server.jar dlptester XXXXXXXX 1266 pts/1 S+ 0:00 grep --color=auto dlp-cloud sql-proxy-Server
If you don't see the process running, check the
log.txt
file for error messages.Start the simple client provided to connect to the DLP proxy on port
5000
:java -jar dlp-sql-proxy-client-0.1.0.jar 5000
From the client, run a simple query. This command can take a few moments because Cloud DLP needs to process the data before returning it to you.
select * from TEST_DATA LIMIT 10;
The output contains the phone number, email address, and SSN values in the query results replaced with cryptographically hashed surrogate values. Compare this to the results returned when not using the proxy earlier.
Connected to SQL Proxy Enter query here (enter "quit" to disconnect): select * from TEST_DATA LIMIT 10; id: 1 phone: S1(40):AYnPXA6QcxlpOITeaZZM3+U/s7KtXyTqv5KAbB8= metric: 5 email: S1(48):AS8hnzz6g1fGCabK2cbhPn5X7Qvc7FKmLTXnKF1iKF2nRjGQ ssn: S1(40):AQ1jSCdIsctniCkCHNEbsc+kliJArHmM6bOJRg== id: 2 phone: S1(40):AfiaKlOYHAzgWGtrkpvaCWUY1e2yvMaK7IQvM/M= metric: 8 email: S1(48):ASF7UBFuZ/xUzXJjD1Ap745xcGmKpTjw+IwIBB/F/OX53w== ssn: S1(40):AWYXh7U8PhNUZu+fXLuLFS0KCMmcM5uKH17rig== id: 3 phone: S1(40):AYnPXA6QcxlpOITeaZZM3+U/s7KtXyTqv5KAbB8= metric: 8 email: S1(48):AS8hnzz6g1fGCabK2cbhPn5X7Qvc7FKmLTXnKF1iKF2nRjGQ ssn: S1(40):AQ1jSCdIsctniCkCHNEbsc+kliJArHmM6bOJRg== id: 4 phone: S1(40):AXFnPkf620wIBXxW0uuBXSVERzRzvhdgbCdaYIw= metric: 1 email: S1(48):AZgs1pfjEPLL6sDAcIZWuCZhC/saw/rQrujxAFy/O60uIA== ssn: S1(40):AatdICHDpwCc19ELEjCS8zAmeSVvx/1KB5/S2Q== End of query results
Quit the client:
quit
Terminate the DLP proxy server:
killall java
Write the
log.txt
file to the terminal to show the output from the DLP proxy:cat log.txt
The output from the audit phase is shown at the end of the file, matching the three InfoTypes you configured earlier:
RunCommand DLP Result DLP Findings: Quote: Info type: PHONE_NUMBER Likelihood: VERY_LIKELY Quote: Info type: EMAIL_ADDRESS Likelihood: LIKELY Quote: Info type: US_SOCIAL_SECURITY_NUMBER Likelihood: VERY_LIKELY Quote: Info type: PHONE_NUMBER Likelihood: VERY_LIKELY Quote: Info type: EMAIL_ADDRESS Likelihood: LIKELY Quote: Info type: US_SOCIAL_SECURITY_NUMBER Likelihood: VERY_LIKELY Quote: Info type: PHONE_NUMBER Likelihood: VERY_LIKELY Quote: Info type: EMAIL_ADDRESS Likelihood: LIKELY Quote: Info type: US_SOCIAL_SECURITY_NUMBER Likelihood: VERY_LIKELY Quote: Info type: PHONE_NUMBER Likelihood: VERY_LIKELY Quote: Info type: EMAIL_ADDRESS Likelihood: LIKELY Quote: Info type: US_SOCIAL_SECURITY_NUMBER Likelihood: VERY_LIKELY Got DLP Result Sensitive information found write log: 1503994 Logged Main.Query: 1503994
Cleaning up
The easiest way to eliminate billing is to delete the Cloud project you created for the tutorial. Alternatively, you can delete the individual resources.Delete the project
- In the Cloud Console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- Read Example architecture for using a DLP proxy to query a database containing sensitive data for examples of how you can use the DLP proxy in your architecture
- Try out other Google Cloud features for yourself. Have a look at our tutorials.