You can schedule inspection scans of your content by using the job trigger feature of Cloud Data Loss Prevention (DLP). Job triggers are events that automatically run Cloud DLP jobs to scan GCP storage repositories (Cloud Storage, BigQuery, and Cloud Datastore).
Before you begin
This quickstart assumes that you already have a storage repository in mind that you want to scan. If not, consider scanning one of the available BigQuery public datasets.
Open Cloud DLP
To access Cloud DLP in the GCP Console, do the following:
- In the GCP Console, if the navigation menu isn't visible, click the navigation button in the upper-left corner of the page.
- Point to Security, and then click Data Loss Prevention.
The main Cloud DLP page opens.
Create a new job trigger and choose input data
To create a job trigger in Cloud DLP:
In the GCP Console, open Cloud DLP.
From the Create menu, choose Job or job trigger.
On the Create job or job trigger page, first enter a name for the job. You can use letters, numbers, and hyphens.
Next, from the Storage type menu, choose the kind of repository that stores the data you want to scan: Cloud Storage, BigQuery, or Cloud Datastore.
- For Cloud Storage, either enter the URL of the bucket you want to scan, or choose Include/exclude from the Location type menu, and then click Browse to navigate to the bucket or subfolder you want to scan. Select the Scan folder recursively checkbox to scan the specified directory and all contained directories. Leave it unselected to scan only the specified directory and no deeper.
- For BigQuery, enter the identifiers for the project, dataset, and table that you want to scan.
- For Cloud Datastore, enter the identifiers for the project, namespace, and kind that you want to scan.
Once you're finished specifying the data location and any advanced configuration details, click Continue.
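If you later create the same trigger through the Cloud DLP API instead of the console, the three storage types correspond to three variants of the job's storageConfig field. The following is a minimal sketch, not the console's own implementation: the helper name storage_config and its keyword arguments are hypothetical, and the BigQuery public table used in the usage example is only an illustration.

```python
def storage_config(storage_type, **ids):
    """Build a storageConfig dict for one of the three repository types.

    The helper and its argument names are hypothetical; the nested field
    names follow the Cloud DLP v2 REST representation.
    """
    if storage_type == "cloud_storage":
        # ids["url"] is a bucket or path URL such as gs://my-bucket/folder/
        return {"cloudStorageOptions": {"fileSet": {"url": ids["url"]}}}
    if storage_type == "bigquery":
        return {
            "bigQueryOptions": {
                "tableReference": {
                    "projectId": ids["project"],
                    "datasetId": ids["dataset"],
                    "tableId": ids["table"],
                }
            }
        }
    if storage_type == "datastore":
        return {
            "datastoreOptions": {
                "partitionId": {
                    "projectId": ids["project"],
                    "namespaceId": ids["namespace"],
                },
                "kind": {"name": ids["kind"]},
            }
        }
    raise ValueError(f"unsupported storage type: {storage_type}")


# Usage: point the scan at a BigQuery public table (illustrative choice).
bq = storage_config(
    "bigquery",
    project="bigquery-public-data",
    dataset="samples",
    table="shakespeare",
)
```

Each branch needs exactly the identifiers the console asks for: a URL for Cloud Storage; project, dataset, and table for BigQuery; project, namespace, and kind for Cloud Datastore.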
Configure detection parameters
The Configure detection section is where you specify the types of sensitive data you want to scan for.
For this quickstart, leave these sections at their default values. With the defaults, Cloud DLP scans a portion of the data repository you've specified (50% of all files in Cloud Storage; up to 1,000 rows in BigQuery) for all of the basic built-in information types (infoTypes).
For detailed information about the settings in this section, see Configure detection in "Creating and scheduling Cloud DLP inspection jobs."
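The default sampling described above can be written out explicitly when a job is defined through the API. The sketch below is an illustration under assumptions: the function name with_default_sampling is hypothetical, and it simply copies a storageConfig dict and applies the limits this quickstart mentions (50% of files, 1,000 rows) using the filesLimitPercent and rowsLimit fields.

```python
import copy


def with_default_sampling(storage_config):
    """Return a copy of storage_config with the quickstart's sampling limits.

    Hypothetical helper: the limits mirror the console defaults described
    in this quickstart, not values read from the service.
    """
    config = copy.deepcopy(storage_config)
    if "cloudStorageOptions" in config:
        # Scan 50% of the files in the bucket or folder.
        config["cloudStorageOptions"]["filesLimitPercent"] = 50
    if "bigQueryOptions" in config:
        # Scan at most 1,000 rows of the table.
        config["bigQueryOptions"]["rowsLimit"] = 1000
    return config


original = {"cloudStorageOptions": {"fileSet": {"url": "gs://my-bucket/"}}}
sampled = with_default_sampling(original)
```

The input dict is deep-copied so that the unsampled configuration stays available if you later want a full scan.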
Add post-scan actions
The Add actions section is where you specify actions for Cloud DLP to take with the results of the inspection scan after it has completed. In this step, you will choose to save the inspection results to a new BigQuery table.
For a detailed explanation of each option, see Add actions in "Creating and scheduling Cloud DLP inspection jobs."
Click the BigQuery toggle. In the Project ID field, type your project identifier. In the Dataset ID field, type the name you've given your dataset. Leave the Table ID field blank so that Cloud DLP creates a new table. When you're done, click Continue.
For more information about actions, see the Actions conceptual topic.
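In API terms, saving results to BigQuery corresponds to a saveFindings action. The sketch below is a minimal illustration; the function name and the project and dataset values are hypothetical. Leaving out the table ID mirrors leaving the Table ID field blank in the console.

```python
def save_findings_action(project_id, dataset_id, table_id=None):
    """Build a saveFindings action that writes results to BigQuery.

    Hypothetical helper. When table_id is omitted, the tableId field is
    left out so that Cloud DLP can create a new table.
    """
    table = {"projectId": project_id, "datasetId": dataset_id}
    if table_id:
        table["tableId"] = table_id
    return {"saveFindings": {"outputConfig": {"table": table}}}


# "my-project" and "dlp_results" are placeholder identifiers.
action = save_findings_action("my-project", "dlp_results")
```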
Set a schedule
The Schedule section is where you tell Cloud DLP how often you want it to kick off the job trigger and run the job you've just specified.
Choose Create a trigger to run the job on a periodic schedule from the menu. The default value for how often the job runs is 24 hours. You can change this to any value between 1 and 60 days, specifying the span in hours, days, or weeks.
Select the Limit scans to only new content added or modified after previous scans are completed checkbox to scan only content that is new since the last scan. This applies only to content added since the storage repository was last scanned by a job that this job trigger started.
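When the same schedule is expressed through the API, the interval is given as a duration in seconds (for example, "86400s" for 24 hours). The sketch below converts a value in hours, days, or weeks and checks it against the 1 to 60 day range described above. The function and constant names are hypothetical, and the flag that limits scans to new content is shown in a comment as the field I understand it maps to, which is an assumption worth verifying.

```python
SECONDS_PER_UNIT = {"hours": 3600, "days": 86400, "weeks": 604800}
MIN_SECONDS = 1 * 86400    # 1 day
MAX_SECONDS = 60 * 86400   # 60 days


def schedule_trigger(value, unit="hours"):
    """Build a schedule trigger entry for a job trigger (hypothetical helper)."""
    seconds = value * SECONDS_PER_UNIT[unit]
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError("schedule must be between 1 and 60 days")
    return {"schedule": {"recurrencePeriodDuration": f"{seconds}s"}}


daily = schedule_trigger(24, "hours")   # the default: every 24 hours
weekly = schedule_trigger(1, "weeks")

# Assumed API counterpart of the "limit scans to only new content" checkbox,
# placed inside the job's storageConfig:
new_content_only = {"timespanConfig": {"enableAutoPopulationOfTimespanConfig": True}}
```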
Review the job trigger
The Review section contains a JSON-formatted summary of the job settings you just specified.
Click Create to create the job trigger.
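For orientation, the following sketch assembles a job trigger definition as a Python dict and prints it as JSON. It shows roughly the shape of a job trigger in the Cloud DLP v2 API, not the exact summary the console displays. All names and identifiers (quickstart-trigger, my-project, dlp_results) are placeholders, and inspectConfig is omitted on the assumption that the service then falls back to its default detectors.

```python
import json

# Placeholder values throughout; replace them with your own identifiers.
job_trigger = {
    "displayName": "quickstart-trigger",
    "inspectJob": {
        "storageConfig": {
            "bigQueryOptions": {
                "tableReference": {
                    "projectId": "bigquery-public-data",
                    "datasetId": "samples",
                    "tableId": "shakespeare",
                },
                "rowsLimit": 1000,
            }
        },
        # Save findings to BigQuery; no tableId, so a new table is created.
        "actions": [
            {
                "saveFindings": {
                    "outputConfig": {
                        "table": {
                            "projectId": "my-project",
                            "datasetId": "dlp_results",
                        }
                    }
                }
            }
        ],
    },
    # Run every 24 hours.
    "triggers": [{"schedule": {"recurrencePeriodDuration": "86400s"}}],
    "status": "HEALTHY",
}

summary = json.dumps(job_trigger, indent=2)
print(summary)
```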
Run the job trigger and view results
Once you create the job trigger, the Trigger details page appears.
To trigger a job immediately, click Run now at the top of the screen.
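Outside the console, the counterpart of Run now is the API's jobTriggers.activate method. The sketch below only constructs the HTTP request and does not send it. The helper name and the project and trigger IDs are placeholders, and a real call would also need an OAuth access token in an Authorization header.

```python
import urllib.request

DLP_ENDPOINT = "https://dlp.googleapis.com/v2"


def activate_request(project_id, trigger_id):
    """Build (but do not send) a POST request that activates a job trigger."""
    url = f"{DLP_ENDPOINT}/projects/{project_id}/jobTriggers/{trigger_id}:activate"
    return urllib.request.Request(
        url,
        data=b"{}",  # empty JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "my-project" and "quickstart-trigger" are placeholder identifiers.
req = activate_request("my-project", "quickstart-trigger")
```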
Jobs that have been triggered by this job trigger are listed in the Triggered jobs section of the details page. After the job trigger you created has run once, select the job by clicking its name beneath the Name column.
The Job details page lists the job's findings first, followed by information about what was scanned for.
If you chose to save results to BigQuery, on the Trigger details page, click View findings in BigQuery. Within the dataset you specified, Cloud DLP has created a new table with the results of the scan. (If Cloud DLP didn't find any matches to your search criteria, no new table will be present.)
To avoid incurring charges to your GCP account for the resources used in this quickstart:
Delete the project
The easiest way to eliminate billing is to delete the project that you created for the quickstart.
To delete the project:
- In the GCP Console, go to the Projects page.
- In the project list, select the project you want to delete and click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete the job trigger
If you created the job trigger in an existing project that you want to keep:
If necessary, choose the name of the project in which you created a job trigger from the menu at the top of the GCP Console. Then open Cloud DLP in the GCP Console.
Click the Job triggers tab. The console displays a list of all job triggers for the current project.
In the Actions column for the job trigger you want to delete, click the three vertical dots, and then click Delete.
Alternatively, from the list of job triggers, click the name of the job trigger you want to delete. On the job trigger's detail page, click Delete.
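A job trigger can also be deleted through the API with an HTTP DELETE on the trigger's resource name. As a hedged sketch, the code below builds that request without sending it. The helper name and identifiers are placeholders, and authentication (an OAuth access token) is left out.

```python
import urllib.request


def delete_trigger_request(project_id, trigger_id):
    """Build (but do not send) a DELETE request for a job trigger."""
    name = f"projects/{project_id}/jobTriggers/{trigger_id}"
    url = f"https://dlp.googleapis.com/v2/{name}"
    return urllib.request.Request(url, method="DELETE")


# Placeholder identifiers; substitute your own project and trigger IDs.
delete_req = delete_trigger_request("my-project", "quickstart-trigger")
```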
- Learn more about creating inspection jobs and job triggers, using either Cloud DLP in the GCP Console, the Cloud DLP API, or client libraries in several programming languages: Creating and scheduling Cloud DLP inspection jobs.