Using Cloud DLP with Cloud Data Fusion

This guide explains how to use Cloud Data Loss Prevention (DLP) with Cloud Data Fusion.

Cloud Data Fusion provides a Cloud DLP plugin with two transforms that filter or redact sensitive data:

  • The PII Filter transform allows you to filter sensitive records from an input stream of data.

  • The Redact transform allows you to transform sensitive data, for example by masking it or applying a one-way hash function.

Costs

This guide uses the following billable components of Google Cloud:

  • Cloud Data Fusion
  • Cloud DLP

Use the pricing calculator to generate a cost estimate based on your projected usage. New Google Cloud users might be eligible for a free trial.

Before you begin

  1. In the Cloud Console, on the project selector page, select or create a Cloud project.

  2. Enable the Cloud Data Fusion API for your project.

  3. Enable the Cloud DLP API for your project.

  4. Create a Cloud Data Fusion instance.

Get Cloud DLP permissions

  1. Open the Cloud IAM page in the Cloud Console.

  2. In the permissions table, in the Member column, find the service account that matches the format service-project-number@gcp-sa-datafusion.iam.gserviceaccount.com.

  3. Click the pencil icon to the right of the service account.

  4. Click Add Another Role.

  5. Click the dropdown that appears.

  6. Use the search bar to find DLP Administrator, and then select it.

  7. Click Save. Check that DLP Administrator appears in the Role column.
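
To make the service account format in step 2 concrete, the following minimal sketch builds the expected email address from a project number. The function name and the sample project number are hypothetical and exist only for illustration. The address pattern itself is the one given in step 2.

```python
def data_fusion_service_account(project_number: int) -> str:
    """Return the Cloud Data Fusion service account email for a project number.

    The pattern follows step 2 above; the function name is hypothetical.
    """
    if project_number <= 0:
        raise ValueError("project_number must be a positive integer")
    return f"service-{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com"


# Hypothetical project number, used only to show the resulting format.
print(data_fusion_service_account(123456789012))
```

You can compare the printed value with the entries in the Member column to locate the right row.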

Deploy the Cloud DLP plugin

  1. In the Cloud Console, open the Instances page.

  2. In the Action column, click View Instance. The Cloud Data Fusion web UI opens in a new browser tab.

  3. In the Cloud Data Fusion web UI, click Hub in the upper right.

  4. Click the Data Loss Prevention plugin.

  5. Click Deploy.

  6. Click Finish.

  7. Click Create a pipeline.

Use the PII Filter transform

This transform separates sensitive records from non-sensitive records. A record is considered sensitive if it matches criteria that you define in a Cloud DLP template. For example, when you create your template, you can define sensitive data to be credit card information or Social Security numbers.
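
As an illustration of what such a template might contain, the following sketch assembles a request body for an inspection template that detects credit card numbers and US Social Security numbers. It assumes the general shape of the Cloud DLP REST `inspectTemplates.create` request (`templateId` plus an `inspectTemplate` with an `inspectConfig`). Verify the field names against the Cloud DLP API reference before relying on them. The helper name, template ID, and display name are hypothetical, and the sketch only prints JSON without calling any API.

```python
import json


def build_inspect_template_request(template_id, display_name, info_types):
    """Assemble a request body for creating an inspection template.

    Assumption: the body shape mirrors the Cloud DLP REST API's
    inspectTemplates.create request. Nothing is sent over the network.
    """
    return {
        "templateId": template_id,
        "inspectTemplate": {
            "displayName": display_name,
            "inspectConfig": {
                # Each infoType names one kind of sensitive data to detect.
                "infoTypes": [{"name": name} for name in info_types],
            },
        },
    }


request_body = build_inspect_template_request(
    template_id="pii-filter-template",  # hypothetical template ID
    display_name="Credit cards and SSNs",  # hypothetical display name
    info_types=["CREDIT_CARD_NUMBER", "US_SOCIAL_SECURITY_NUMBER"],
)
print(json.dumps(request_body, indent=2))
```

The `templateId` value is what you would later enter under Template ID in the transform's properties.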

  1. Create a Cloud DLP inspection template.

  2. In the Studio page of the Cloud Data Fusion web UI, click to expand the Transform menu.

  3. Click the PII Filter transform.

  4. Hold the pointer over the PII Filter node and click Properties.

  5. Under Filter on, choose whether you want to filter records or fields.

    Cloud DLP limits apply: if a record exceeds 0.5 MB, your Cloud Data Fusion pipeline fails. To avoid this failure, filter by field instead of by record.

  6. Under Template ID, enter the template ID of the Cloud DLP template you created.

  7. Under Error Handling, define how to proceed when your pipeline encounters sensitive data. Choose one of the following error handling options:

    • Stop pipeline: Stops the pipeline as soon as an error is encountered.
    • Skip record: Skips the record that caused the error. The pipeline continues to run, and no error is reported.
    • Send to error: Sends errors to the error port. The pipeline continues to run.

  8. Click the X button.
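
Before choosing between records and fields in step 5, it can help to check a sample of your data against the 0.5 MB limit. The sketch below is one rough way to do that, under two stated assumptions: that 0.5 MB means 524,288 bytes, and that the size of a record can be approximated by its UTF-8 encoded JSON form. The plugin may measure size differently, and all names here are hypothetical.

```python
import json

# Assumption: 0.5 MB is interpreted as 512 * 1024 bytes.
MAX_REQUEST_BYTES = 512 * 1024


def encoded_size(value) -> int:
    """Approximate the size of a value as UTF-8 encoded JSON (an assumption)."""
    return len(json.dumps(value, ensure_ascii=False).encode("utf-8"))


def oversized_records(records, limit=MAX_REQUEST_BYTES):
    """Return the indexes of records whose approximate size exceeds the limit."""
    return [i for i, record in enumerate(records) if encoded_size(record) > limit]


def suggest_filter_mode(records, limit=MAX_REQUEST_BYTES):
    """Suggest 'field' if any whole record is too large, otherwise 'record'."""
    return "field" if oversized_records(records, limit) else "record"


# Hypothetical sample data: one small record and one with a very large field.
sample = [
    {"id": 1, "note": "short"},
    {"id": 2, "note": "x" * 600_000},
]
print(oversized_records(sample))
print(suggest_filter_mode(sample))
```

This check is only a heuristic. A single field that is itself larger than the limit would presumably still be too large when filtering by field.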

Use the Redact transform

This transform identifies sensitive records in the input stream and applies transformations that you define to those records. A record is considered sensitive if it matches pre-defined Cloud DLP filters you chose or a custom template you defined.

  1. In the Studio page of the Cloud Data Fusion web UI, click to expand the Transform menu.

  2. Click the Redact transform.

  3. Hold the pointer over the Redact node and click Properties.

  4. Choose whether to apply transformations to pre-defined filters or to a custom template that you create.

    You cannot combine these two options: use either pre-defined filters or a custom template.

    Pre-defined filters

    To apply transformations to pre-defined filters, leave Custom Template set to No, and under Matching, define a rule:

    1. After Apply, click the dropdown and choose a transformation. Learn more about the available transformations in the Description section of the plugin's Documentation tab.

    2. After on, click the dropdown and choose a category, which is a set of pre-defined Cloud DLP filters grouped together by type. For the full list of provided categories and the filters they contain, see the DLP Filter Mapping section in the plugin's Documentation tab.

    To set multiple matching rules, click the + button.

    Custom template

    To apply transformations according to a custom template, set Custom Template to Yes.

    1. Create a custom Cloud DLP template.

    2. Back in the Cloud Data Fusion web UI, in the Redact properties menu, under Template ID, enter the template ID of the custom template you created.

  5. Click the X button.
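
To show what masking and one-way hashing do to a value, here is a small standalone sketch. It does not use Cloud DLP or the plugin: the regular expression is a toy stand-in for a DLP filter, the SHA-256 hash is an illustrative choice rather than the plugin's actual algorithm, and all names are hypothetical.

```python
import hashlib
import re

# Toy detector for US Social Security number-like strings.
# This is NOT Cloud DLP; it only stands in for a pre-defined filter.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask(match, masking_char="#"):
    """Replace every digit of the matched text with the masking character."""
    return re.sub(r"\d", masking_char, match.group(0))


def one_way_hash(match):
    """Replace the matched text with its SHA-256 hex digest (illustrative)."""
    return hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()


def redact(text, transform):
    """Apply the chosen transformation to every match in the text."""
    return SSN_PATTERN.sub(transform, text)


record = "Customer SSN is 123-45-6789."
print(redact(record, mask))          # prints "Customer SSN is ###-##-####."
print(redact(record, one_way_hash))  # the SSN is replaced by a 64-character digest
```

Masking keeps the shape of the value but discards its content, while hashing maps equal inputs to equal outputs so records can still be joined. An unkeyed hash of a short, structured value can be reversed by enumerating all candidates, so treat this sketch as a demonstration of the idea only.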

What's next