Inspect data from external sources using hybrid jobs

This topic describes how to use hybrid jobs and hybrid job triggers to inspect external data for sensitive information. To learn more about hybrid jobs and hybrid job triggers—including examples of hybrid environments—see Hybrid jobs and hybrid job triggers.

Introduction to hybrid jobs and hybrid job triggers

Hybrid jobs and hybrid job triggers let you broaden the scope of protection that Cloud Data Loss Prevention provides beyond simple content inspection requests and Google Cloud storage repository scanning. Using hybrid jobs and hybrid job triggers, you can stream data from virtually any source—including outside Google Cloud—directly to Cloud DLP, and let Cloud DLP inspect the data for sensitive information. Cloud DLP automatically saves and aggregates the scan results for further analysis.

Comparison of hybrid jobs and hybrid job triggers

When you create hybrid jobs, they run until you stop them. They accept all incoming data as long as the data is properly routed and formatted.

Hybrid job triggers work in a similar manner to hybrid jobs, but you don't need to explicitly stop a job within a hybrid job trigger. Cloud DLP automatically stops jobs within hybrid job triggers at the end of each day.

In addition, with a hybrid job trigger, you can stop and start new jobs within the trigger without having to reconfigure your hybridInspect requests. For example, you can send data to a hybrid job trigger, then stop the active job, change its configuration, start a new job within that trigger, and then continue to send data to the same trigger.

For more guidance about which option fits your use case, see Typical hybrid inspection scenarios on this page.

Definition of terms

This topic uses the following terms:

  • External data: data stored outside Google Cloud or data that Cloud DLP doesn't natively support.

  • Hybrid job: an inspection job that is configured to scan data from virtually any source.

  • Hybrid job trigger: a job trigger that is configured to scan data from virtually any source.

  • hybridInspect request: a request that contains the external data that you want to inspect. When sending this request, you specify the hybrid job or hybrid job trigger to send the request to.

For general information about jobs and job triggers, see Jobs and job triggers.

Hybrid inspection process

There are three steps in the hybrid inspection process.

  1. Choose the data that you want to send to Cloud DLP.

    The data can originate from within Google Cloud or outside it. For example, you can configure a custom script or application to send data to Cloud DLP, enabling you to inspect data in flight, from another cloud service, an on-premises data repository, or virtually any other data source.

  2. Set up a hybrid job or hybrid job trigger in Cloud DLP from scratch or using an inspection template.

    After you set up a hybrid job or hybrid job trigger, Cloud DLP actively listens for data sent to it. When your custom script or application sends data to this hybrid job or hybrid job trigger, Cloud DLP inspects the data and stores the results according to the configuration.

    When you set up the hybrid job or hybrid job trigger, you can specify where you want to save or publish the findings. Options include saving to BigQuery and publishing notifications to Pub/Sub, Cloud Monitoring, or email.

  3. Send a hybridInspect request to the hybrid job or hybrid job trigger.

    A hybridInspect request contains the data to be scanned. In the request, include metadata (also referred to as labels and table identifiers) that describes the content and lets Cloud DLP identify the information you want to track. For example, if you're scanning related data across several requests (such as rows in the same database table), you can use the same metadata in those related requests. You can then collect, tally, and analyze findings for that database table.

As the hybrid job runs and inspects requests, inspection results are available when Cloud DLP generates them. In contrast, actions, like Pub/Sub notifications, don't occur until your application ends the hybrid job.

Diagram depicting the hybrid job inspection process

Considerations

When working with hybrid jobs and job triggers, consider the following points:

  • Hybrid jobs and hybrid job triggers don't support filtering and sampling.
  • Jobs and job triggers aren't subject to service level objectives (SLO), but there are steps you can take to reduce latency. For more information, see Job latency.

Before you begin

Before setting up and using hybrid jobs or hybrid job triggers, be sure you've done the following:

Create a new project, enable billing, and enable Cloud DLP

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.

  4. Enable the Cloud DLP API.

    Enable the API

Configure the data source

Before Cloud DLP can inspect your data, you must send the data to Cloud DLP. Regardless of what method you use to configure the hybrid job or hybrid job trigger, you must set up your external source to send data to the DLP API.

For information about the required format for hybrid inspection requests, see Hybrid content item formatting. For information about the types of metadata that you can include with the data in your request, see Types of metadata you can provide.

Create a hybrid job or hybrid job trigger

To let Cloud DLP inspect the data that you send to it, you must first set up a hybrid job or hybrid job trigger. For information about which one to create, see Typical hybrid inspection scenarios on this page.

Console

In the Google Cloud console, go to the Create job or job trigger page:

Go to Create job or job trigger

The following sections describe how to fill in the sections of the Create job or job trigger page that are relevant to hybrid inspection operations.

Choose input data

In this section, you specify the input data for Cloud DLP to inspect.

  1. Optional: For Name, enter a value in the Job ID field to name the job. If you leave this field blank, Cloud DLP auto-generates an identifier.
  2. Optional: From the Resource location menu, choose the region where you want to store the hybrid job or hybrid job trigger. For more information, see Specifying processing locations.
  3. For Storage type, select Hybrid.

  4. Optional: For Description, describe the hybrid job or hybrid job trigger that you're creating. For example, you can include information about the source of the data to be inspected.

  5. Optional: For Required labels, click Add label, and enter a label that you want to require from hybridInspect requests. A hybridInspect request that doesn't specify this label isn't processed by this hybrid job or hybrid job trigger. You can add up to 10 required labels. For more information, see Require labels from hybridInspect requests on this page.

  6. Optional: For Optional labels, enter any key-value pairs that you want to attach to the results of all hybridInspect requests sent to this job or job trigger. You can add up to 10 optional labels. For more information, see Optional labels.

  7. Optional: For Tabular data options, enter the field name of the primary key column if you plan to send tabular data in your hybridInspect requests. For more information, see Tabular data options.

  8. Click Continue.

Configure detection

In this section, you specify the types of sensitive data that Cloud DLP will inspect the input data for. Your choices are:

  • Template: If you've already created a template in the current project that you want to use to define the Cloud DLP detection parameters, click the Template name field, and then choose the template from the list that appears.
  • InfoTypes: Cloud DLP selects the most common built-in infoTypes to detect. To change the infoTypes, or to choose a custom infoType to use, click Manage infoTypes. You can also fine-tune the detection criteria in the Inspection rulesets and Confidence threshold sections. For more details, see Configure detection.

After configuring detection parameters, click Continue.

Add actions

In this section, you specify where to save the findings from each inspection scan and whether to be notified by email or Pub/Sub message when a scan completes. If you don't save findings to BigQuery, the scan results only contain statistics about the number and infoTypes of the findings.

  • Save to BigQuery: Every time a scan runs, Cloud DLP saves scan findings to the BigQuery table you specify here. If you don't specify a table ID, BigQuery will assign a default name to a new table the first time the scan runs. If you specify an existing table, Cloud DLP appends scan findings to it.
  • Publish to Pub/Sub: When a job is done, a Pub/Sub message will be emitted.
  • Notify by email: When a job is done, an email message will be sent.
  • Publish to Cloud Monitoring: When a job is done, its findings will be published to Monitoring.

After choosing actions, click Continue.

Schedule

In this section, you specify whether to create a single job that runs immediately or a job trigger that runs every time Cloud DLP receives properly routed and formatted data.

Do one of the following:

  • To run the hybrid job immediately, choose None (run the one-off job immediately upon creation).

  • To configure the job so that data received from the source triggers the job, choose Create a trigger to run the job on a periodic schedule.

    Hybrid job triggers aggregate API calls, letting you see finding results and trends over time.

For more information, see Comparison of hybrid jobs and hybrid job triggers.

Review

You can review a JSON summary of the scan here. Be sure to note the name of the hybrid job or hybrid job trigger; you need this information when sending data to Cloud DLP for inspection.

After reviewing the JSON summary, click Create.

Cloud DLP starts the hybrid job or hybrid job trigger immediately. An inspection scan is started when you send a hybridInspect request to this hybrid job or hybrid job trigger.

API

A job is represented in the DLP API by the DlpJobs resource. To create a hybrid job, you call the projects.locations.dlpJobs.create method.

A job trigger is represented in the DLP API by the JobTrigger resource. To create a hybrid job trigger, you call the projects.locations.jobTriggers.create method.

The DlpJobs or JobTrigger object that you create must have the following settings:

  1. In the inspectJob field, set an InspectJobConfig object.
  2. In the InspectJobConfig object, in the storageConfig field, set a StorageConfig object.
  3. In the StorageConfig object, in the hybridOptions field, set a HybridOptions object. This object contains metadata about the data that you want to inspect.
  4. In the InspectJobConfig object, in the actions field, add any actions (Action) that you want Cloud DLP to perform at the end of each job.

    The publishSummaryToCscc and publishFindingsToCloudDataCatalog actions aren't supported for this operation. For more information about actions, see Actions.

  5. Specify what to scan for and how to scan by doing one or both of the following:

    • Set the inspectTemplateName field to the full resource name of an inspection template that you want to use, if available.

    • Set the inspectConfig field.

    If you set both inspectTemplateName and inspectConfig fields, their settings are combined.
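
The nesting described in the preceding steps can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper name build_inspect_job and its parameters are hypothetical, and the sample values are borrowed from the JSON examples on this page. It only assembles the request body and doesn't call the DLP API.

```python
import json


def build_inspect_job(description, required_label_keys, labels,
                      identifying_fields, info_types,
                      actions=None, inspect_template_name=None):
    """Return an InspectJobConfig-shaped dict for hybrid inspection.

    Hypothetical helper: it mirrors the field nesting described above
    (inspectJob > storageConfig > hybridOptions) and nothing more.
    """
    hybrid_options = {
        "description": description,
        "requiredFindingLabelKeys": list(required_label_keys),
        "labels": dict(labels),
        "tableOptions": {
            "identifyingFields": [{"name": name} for name in identifying_fields]
        },
    }
    inspect_job = {
        "storageConfig": {"hybridOptions": hybrid_options},
        "inspectConfig": {"infoTypes": [{"name": name} for name in info_types]},
        "actions": list(actions or []),
    }
    # The template name is optional; when present, its settings are
    # combined with inspectConfig.
    if inspect_template_name:
        inspect_job["inspectTemplateName"] = inspect_template_name
    return inspect_job


inspect_job = build_inspect_job(
    description="Hybrid job for appointment booking comments",
    required_label_keys=["appointment-bookings-comments"],
    labels={"env": "prod"},
    identifying_fields=["booking_id"],
    info_types=["EMAIL_ADDRESS"],
    actions=[{"jobNotificationEmails": {}}, {"publishToStackdriver": {}}],
)
print(json.dumps({"jobId": "postgresql-table-comments",
                  "inspectJob": inspect_job}, indent=2))
```

The same inspectJob object can be reused for a hybrid job or a hybrid job trigger; only the wrapper around it differs.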

About the JSON examples

The following tabs contain JSON examples that you can send to Cloud DLP to create a hybrid job or a hybrid job trigger. These hybrid job and hybrid job trigger examples are configured to do the following:

  • Process any hybridInspect request if the request has the label appointment-bookings-comments.
  • Scan the content in the hybridInspect request for email addresses.
  • Attach the "env": "prod" label to findings.
  • For tabular data, get the value of the cell in the booking_id column (the primary key) that's in the same row as the cell where the sensitive data was found. Cloud DLP attaches this identifier to the finding, so that you can trace the finding to the specific row it came from.
  • Send an email when the job stops. The email goes to IAM project owners and technical Essential Contacts.
  • Send the findings to Cloud Monitoring when the job is stopped.

To view the JSON examples, see the following tabs.

Hybrid job

This tab contains a JSON example that you can use to create a hybrid job.

To create a hybrid job, send a POST request to the following endpoint.

HTTP method and URL

POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/dlpJobs

Replace the following:

  • PROJECT_ID: the ID of the project where you want to store the hybrid job.
  • REGION: the geographic region where you want to store the hybrid job.

JSON input

{
  "jobId": "postgresql-table-comments",
  "inspectJob": {
    "actions": [
      {
        "jobNotificationEmails": {}
      },
      {
        "publishToStackdriver": {}
      }
    ],
    "inspectConfig": {
      "infoTypes": [
        {
          "name": "EMAIL_ADDRESS"
        }
      ],
      "minLikelihood": "POSSIBLE",
      "includeQuote": true
    },
    "storageConfig": {
      "hybridOptions": {
        "description": "Hybrid job for data from the comments field of a table that contains customer appointment bookings",
        "requiredFindingLabelKeys": [
          "appointment-bookings-comments"
        ],
        "labels": {
          "env": "prod"
        },
        "tableOptions": {
          "identifyingFields": [
            {
              "name": "booking_id"
            }
          ]
        }
      }
    }
  }
}

JSON output

{
"name": "projects/PROJECT_ID/locations/REGION/dlpJobs/i-postgresql-table-comments",
"type": "INSPECT_JOB",
"state": "ACTIVE",
"inspectDetails": {
  "requestedOptions": {
    "snapshotInspectTemplate": {},
    "jobConfig": {
      "storageConfig": {
        "hybridOptions": {
          "description": "Hybrid job for data from the comments field of a table that contains customer appointment bookings",
          "requiredFindingLabelKeys": [
            "appointment-bookings-comments"
          ],
          "labels": {
            "env": "prod"
          },
          "tableOptions": {
            "identifyingFields": [
              {
                "name": "booking_id"
              }
            ]
          }
        }
      },
      "inspectConfig": {
        "infoTypes": [
          {
            "name": "EMAIL_ADDRESS"
          }
        ],
        "minLikelihood": "POSSIBLE",
        "limits": {},
        "includeQuote": true
      },
      "actions": [
        {
          "jobNotificationEmails": {}
        },
        {
          "publishToStackdriver": {}
        }
      ]
    }
  },
  "result": {
    "hybridStats": {}
  }
},
"createTime": "JOB_CREATION_DATETIME",
"startTime": "JOB_START_DATETIME"
}

Cloud DLP creates the hybrid job and generates a job ID. In this example, the job ID is i-postgresql-table-comments. Take note of the job ID. You need it in your hybridInspect request.

To stop a hybrid job, you must call the projects.locations.dlpJobs.finish method explicitly. The DLP API doesn't automatically stop hybrid jobs. In contrast, the DLP API automatically stops jobs within hybrid job triggers at the end of each day.
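
As a rough illustration of how the returned name relates to the job ID and to the finish call, the following Python sketch derives both from the name field of the create response. The helper names are hypothetical, and the URL pattern for finish is assumed to follow the same resource-name convention as the other endpoints on this page; confirm it against the API reference before relying on it.

```python
BASE_URL = "https://dlp.googleapis.com/v2"


def job_id_from_name(job_name):
    """Return the last path segment of a job resource name (the job ID)."""
    return job_name.rsplit("/", 1)[-1]


def finish_url(job_name):
    """Build the URL used to stop a hybrid job.

    Assumes the finish method is addressed as <job resource name>:finish.
    """
    return f"{BASE_URL}/{job_name}:finish"


# "name" as it appears in the JSON output above (placeholders kept as-is).
created = {
    "name": "projects/PROJECT_ID/locations/REGION/dlpJobs/i-postgresql-table-comments"
}
print(job_id_from_name(created["name"]))
print(finish_url(created["name"]))
```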

Hybrid job trigger

This tab contains a JSON example that you can use to create a hybrid job trigger.

To create a hybrid job trigger, send a POST request to the following endpoint.

HTTP method and URL

POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/jobTriggers

Replace the following:

  • PROJECT_ID: the ID of the project where you want to store the hybrid job trigger.
  • REGION: the geographic region where you want to store the hybrid job trigger.

JSON input

{
    "triggerId": "postgresql-table-comments",
    "jobTrigger": {
      "triggers": [
        {
          "manual": {}
        }
      ],
      "inspectJob": {
        "actions": [
          {
            "jobNotificationEmails": {}
          },
          {
            "publishToStackdriver": {}
          }
        ],
        "inspectConfig": {
          "infoTypes": [
              {
                "name": "EMAIL_ADDRESS"
              }
          ],
          "minLikelihood": "POSSIBLE",
          "limits": {},
          "includeQuote": true
        },
        "storageConfig": {
          "hybridOptions": {
            "description": "Hybrid job trigger for data from the comments field of a table that contains customer appointment bookings",
            "requiredFindingLabelKeys": [
                "appointment-bookings-comments"
              ],
            "labels": {
              "env": "prod"
            },
            "tableOptions": {
              "identifyingFields": [
                {
                  "name": "booking_id"
                }
              ]
            }
          }
        }
      }
    }
  }

JSON output

{
"name": "projects/PROJECT_ID/locations/REGION/jobTriggers/postgresql-table-comments",
"inspectJob": {
  "storageConfig": {
    "hybridOptions": {
      "description": "Hybrid job trigger for data from the comments field of a table that contains customer appointment bookings",
      "requiredFindingLabelKeys": [
        "appointment-bookings-comments"
      ],
      "labels": {
        "env": "prod"
      },
      "tableOptions": {
        "identifyingFields": [
          {
            "name": "booking_id"
          }
        ]
      }
    }
  },
  "inspectConfig": {
    "infoTypes": [
      {
        "name": "EMAIL_ADDRESS"
      }
    ],
    "minLikelihood": "POSSIBLE",
    "limits": {},
    "includeQuote": true
  },
  "actions": [
    {
      "jobNotificationEmails": {}
    },
    {
      "publishToStackdriver": {}
    }
  ]
},
"triggers": [
  {
    "manual": {}
  }
],
"createTime": "JOB_CREATION_DATETIME",
"updateTime": "TRIGGER_UPDATE_DATETIME",
"status": "HEALTHY"
}

Cloud DLP creates the hybrid job trigger. The output contains the name of the hybrid job trigger. In this example, that is postgresql-table-comments. Take note of the name. You need it in your hybridInspect request.

Unlike with hybrid jobs, the DLP API automatically stops jobs within hybrid job triggers at the end of each day. Thus, you don't need to explicitly call the projects.locations.dlpJobs.finish method.
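
The two request bodies on this page differ only in their outer wrapper. The following Python sketch, which uses a hypothetical helper named to_trigger_request, shows one way to derive a hybrid job trigger request from a hybrid job request. It assumes you want the trigger ID to match the job ID and a manual trigger, as in the example above.

```python
import copy


def to_trigger_request(job_request):
    """Wrap a hybrid job request body as a hybrid job trigger request body.

    The inspectJob object is copied unchanged; only the wrapper differs:
    jobId becomes triggerId, and inspectJob moves under jobTrigger next
    to a manual trigger.
    """
    return {
        "triggerId": job_request["jobId"],
        "jobTrigger": {
            "triggers": [{"manual": {}}],
            "inspectJob": copy.deepcopy(job_request["inspectJob"]),
        },
    }


job_request = {
    "jobId": "postgresql-table-comments",
    "inspectJob": {
        "inspectConfig": {"infoTypes": [{"name": "EMAIL_ADDRESS"}]},
        "storageConfig": {
            "hybridOptions": {
                "requiredFindingLabelKeys": ["appointment-bookings-comments"],
                "labels": {"env": "prod"},
            }
        },
    },
}
trigger_request = to_trigger_request(job_request)
print(sorted(trigger_request["jobTrigger"]))
```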

When creating a hybrid job or hybrid job trigger, you can use the APIs Explorer on the following API reference pages, respectively:

  • projects.locations.dlpJobs.create
  • projects.locations.jobTriggers.create

In the Request parameters field, enter projects/PROJECT_ID/locations/REGION. Then, in the Request body field, paste the sample JSON for the object you're trying to create.

A successful request, even one created in APIs Explorer, creates a hybrid job or hybrid job trigger.

For general information about using JSON to send requests to the DLP API, see the JSON quickstart.

Send data to the hybrid job or hybrid job trigger

To inspect data, you must send a hybridInspect request, in the correct format, to either a hybrid job or hybrid job trigger.

Hybrid content item formatting

The following is a simple example of a hybridInspect request sent to Cloud DLP for processing by a hybrid job or hybrid job trigger. Note the structure of the JSON object, including the hybridItem field, which contains the following fields:

  • item: contains the actual content to inspect.
  • findingDetails: contains metadata to associate with the content.

{
  "hybridItem": {
    "item": {
      "value": "My email is test@example.org"
    },
    "findingDetails": {
      "containerDetails": {
        "fullPath": "10.0.0.2:logs1:app1",
        "relativePath": "app1",
        "rootPath": "10.0.0.2:logs1",
        "type": "logging_sys",
        "version": "1.2"
      },
      "labels": {
        "env": "prod",
        "appointment-bookings-comments": ""
      }
    }
  }
}

For comprehensive information about the contents of hybrid inspection items, see the API reference content for the HybridContentItem object.
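
For tabular data, the item field can carry a table instead of a plain value. The following Python sketch builds such a request body from a list of row dictionaries. The helper name table_hybrid_item is hypothetical, and the table, headers, rows, values, and stringValue field names reflect the ContentItem and Table objects as generally documented in the DLP API reference; verify them there before use.

```python
def table_hybrid_item(headers, rows, labels, container_details=None):
    """Build a hybridInspect request body for tabular data.

    headers: column names, in order.
    rows: list of dicts keyed by column name.
    Every cell is sent as a string for simplicity.
    """
    item = {
        "table": {
            "headers": [{"name": header} for header in headers],
            "rows": [
                {"values": [{"stringValue": str(row[header])} for header in headers]}
                for row in rows
            ],
        }
    }
    finding_details = {"labels": dict(labels)}
    if container_details:
        finding_details["containerDetails"] = dict(container_details)
    return {"hybridItem": {"item": item, "findingDetails": finding_details}}


body = table_hybrid_item(
    headers=["booking_id", "comments"],
    rows=[{"booking_id": 1001, "comments": "My email is test@example.org"}],
    labels={"appointment-bookings-comments": "", "env": "prod"},
)
print(body["hybridItem"]["item"]["table"]["headers"])
```

If the hybrid job or hybrid job trigger lists booking_id as an identifying field, including that column in every row lets you trace a finding to the row it came from.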

Hybrid inspection endpoints

For data to be inspected using a hybrid job or hybrid job trigger, you must send a hybridInspect request to the correct endpoint.

HTTP method and URL for hybrid jobs

POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/dlpJobs/JOB_ID:hybridInspect

For more information about this endpoint, see the API reference page for the projects.locations.dlpJobs.hybridInspect method.

HTTP method and URL for hybrid job triggers

POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/jobTriggers/TRIGGER_NAME:hybridInspect

For more information about this endpoint, see the API reference page for the projects.locations.jobTriggers.hybridInspect method.

Replace the following:

  • PROJECT_ID: your project identifier.
  • REGION: the geographical region where you want to store the hybridInspect request. This region must be the same as the region of the hybrid job or hybrid job trigger.
  • JOB_ID: the ID that you gave the hybrid job, prefixed with i-.

    To look up the job ID, in Cloud DLP, click Inspection > Inspect jobs.

  • TRIGGER_NAME: the name that you gave the hybrid job trigger.

    To look up the name of the job trigger, in Cloud DLP, click Inspection > Job triggers.
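
The two endpoint patterns can be captured in a small helper. The following Python sketch uses a hypothetical function named hybrid_inspect_url that builds either URL from the placeholders described above and rejects calls that name both a job and a trigger, or neither.

```python
BASE_URL = "https://dlp.googleapis.com/v2"


def hybrid_inspect_url(project_id, region, *, job_id=None, trigger_name=None):
    """Return the hybridInspect URL for a hybrid job or a hybrid job trigger.

    Exactly one of job_id or trigger_name must be given. job_id is expected
    to already include the i- prefix.
    """
    if (job_id is None) == (trigger_name is None):
        raise ValueError("Specify exactly one of job_id or trigger_name")
    parent = f"{BASE_URL}/projects/{project_id}/locations/{region}"
    if job_id is not None:
        return f"{parent}/dlpJobs/{job_id}:hybridInspect"
    return f"{parent}/jobTriggers/{trigger_name}:hybridInspect"


print(hybrid_inspect_url("my-project", "us", job_id="i-postgresql-table-comments"))
print(hybrid_inspect_url("my-project", "us", trigger_name="postgresql-table-comments"))
```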

Require labels from hybridInspect requests

If you want to control which hybridInspect requests can be processed by a hybrid job or hybrid job trigger, you can set required labels. Any hybridInspect requests for that hybrid job or hybrid job trigger that don't include these required labels are rejected.

To set a required label, do the following:

  1. When creating the hybrid job or hybrid job trigger, set the requiredFindingLabelKeys field to a list of required labels.

    The following example sets appointment-bookings-comments as a required label in a hybrid job or hybrid job trigger.

    "hybridOptions": {
      ...
      "requiredFindingLabelKeys": [
        "appointment-bookings-comments"
      ],
      "labels": {
        "env": "prod"
      },
      ...
    }
    
  2. In the hybridInspect request, in the labels field, add each required label as a key in a key-value pair. The corresponding value can be an empty string.

    The following example sets the required label, appointment-bookings-comments, in a hybridInspect request.

    {
      "hybridItem": {
        "item": {
          "value": "My email is test@example.org"
        },
        "findingDetails": {
          "containerDetails": {...},
          "labels": {
            "appointment-bookings-comments": ""
          }
        }
      }
    }
    

If you don't include the required label in your hybridInspect request, you get an error like the following:

{
  "error": {
    "code": 400,
    "message": "Trigger required labels that were not included: [appointment-bookings-comments]",
    "status": "INVALID_ARGUMENT"
  }
}
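
To avoid this error, you can check a request body for the required labels before sending it. The following Python sketch is a client-side convenience only; the function name missing_required_labels is hypothetical, and the check assumes that only the label keys matter, because the value can be an empty string.

```python
def missing_required_labels(required_keys, request_body):
    """Return the required label keys that are absent from a hybridInspect body."""
    labels = (
        request_body.get("hybridItem", {})
        .get("findingDetails", {})
        .get("labels", {})
    )
    # Only the presence of the key is checked, not its value.
    return [key for key in required_keys if key not in labels]


request_body = {
    "hybridItem": {
        "item": {"value": "My email is test@example.org"},
        "findingDetails": {"labels": {"env": "prod"}},
    }
}
missing = missing_required_labels(["appointment-bookings-comments"], request_body)
if missing:
    print("Add these labels before sending:", missing)
```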

Typical hybrid inspection scenarios

The following sections describe typical uses for hybrid inspection and their corresponding workflows.

Perform a one-off scan

Execute a one-off scan of a database outside of Google Cloud as part of a quarterly spot check of databases.

  1. Create a hybrid job using the Google Cloud console or the DLP API.

  2. Send data to the job by calling projects.locations.dlpJobs.hybridInspect. If you want to inspect more data, repeat this step as many times as needed.

  3. After sending data for inspection, call the projects.locations.dlpJobs.finish method.

    Cloud DLP performs the actions specified in your projects.locations.dlpJobs.create request.
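
The one-off workflow can be sketched as a short Python function. This is an outline under stated assumptions: post is a hypothetical callable that you supply to send an authenticated POST request (authentication and error handling are omitted), and the job is assumed to exist already. The stub used here only records calls so that the sequence is visible without contacting the API.

```python
def run_one_off_scan(post, job_name, texts, labels):
    """Send each text to an existing hybrid job, then finish the job.

    post(url, body) is a caller-supplied function that performs the
    HTTP POST. job_name is the job's full resource name.
    """
    base = f"https://dlp.googleapis.com/v2/{job_name}"
    sent = 0
    for text in texts:
        body = {
            "hybridItem": {
                "item": {"value": text},
                "findingDetails": {"labels": dict(labels)},
            }
        }
        post(f"{base}:hybridInspect", body)
        sent += 1
    # Hybrid jobs don't stop on their own; finishing the job lets
    # Cloud DLP perform the configured actions.
    post(f"{base}:finish", {})
    return sent


calls = []


def record(url, body):
    """Stand-in for a real HTTP client: record the call and return nothing useful."""
    calls.append((url, body))
    return {}


count = run_one_off_scan(
    record,
    "projects/my-project/locations/us/dlpJobs/i-spot-check",
    ["My email is test@example.org", "No sensitive data here"],
    {"appointment-bookings-comments": ""},
)
print(count, [url.rsplit(":", 1)[-1] for url, _ in calls])
```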

Configure continuous monitoring

Monitor all new content added daily to a database that Cloud DLP does not natively support.

  1. Create a hybrid job trigger using the Google Cloud console or the DLP API.

  2. Activate the job trigger by calling the projects.locations.jobTriggers.activate method.

  3. Send data to the job trigger by calling projects.locations.jobTriggers.hybridInspect. If you want to inspect more data, repeat this step as many times as needed.

In this case, you don't need to call the projects.locations.dlpJobs.finish method. Cloud DLP auto-partitions the data that you send. As long as the job trigger is active, at the end of each day, Cloud DLP performs the actions you specified when you created your hybrid job trigger.

Scan data coming into a database

Scan data coming into a database, while controlling how the data is partitioned. Each job in a job trigger is a single partition.

  1. Create a hybrid job trigger using the Google Cloud console or the DLP API.

  2. Activate the job trigger by calling the projects.locations.jobTriggers.activate method.

    The system returns the job ID of a single job. You need this job ID in the next step.

  3. Send data to the job by calling projects.locations.dlpJobs.hybridInspect.

    In this case, you send the data to the job instead of the job trigger. This approach lets you control how the data that you send for inspection is partitioned. If you want to add more data for inspection in the current partition, repeat this step.

  4. After sending data to the job, call the projects.locations.dlpJobs.finish method.

    Cloud DLP performs the actions specified in your projects.locations.jobTriggers.create request.

  5. If you want to create another job for the next partition, activate the job trigger again, and then send the data to the resulting job.
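
The partition-controlled workflow can be outlined in Python as follows. As before, post is a hypothetical caller-supplied function that sends an authenticated POST request and returns the parsed JSON response; the sketch assumes that the activate response includes the new job's resource name in its name field. The stub stands in for the API so that the call order is visible.

```python
BASE_URL = "https://dlp.googleapis.com/v2"


def scan_partition(post, trigger_name, texts, labels):
    """Activate a trigger, send data to the resulting job, then finish it.

    Each call handles one partition and returns the job's resource name.
    """
    job = post(f"{BASE_URL}/{trigger_name}:activate", {})
    job_name = job["name"]
    for text in texts:
        body = {
            "hybridItem": {
                "item": {"value": text},
                "findingDetails": {"labels": dict(labels)},
            }
        }
        # Data goes to the job, not the trigger, to keep it in this partition.
        post(f"{BASE_URL}/{job_name}:hybridInspect", body)
    post(f"{BASE_URL}/{job_name}:finish", {})
    return job_name


calls = []


def fake_post(url, body):
    """Stand-in for a real HTTP client; returns a made-up job name on activate."""
    calls.append(url)
    if url.endswith(":activate"):
        return {"name": "projects/my-project/locations/us/dlpJobs/i-12345"}
    return {}


trigger = "projects/my-project/locations/us/jobTriggers/postgresql-table-comments"
labels = {"appointment-bookings-comments": ""}
first_job = scan_partition(fake_post, trigger, ["row 1", "row 2"], labels)
print(first_job, len(calls))
```

Calling scan_partition again with the next batch of data corresponds to step 5: a new activation creates a new job, and therefore a new partition.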

Monitor traffic from a proxy

Monitor traffic from a proxy installed between two custom applications.

  1. Create a hybrid job trigger using the Google Cloud console or the DLP API.

  2. Activate the job trigger by calling the projects.locations.jobTriggers.activate method.

  3. Send data to the job trigger by calling projects.locations.jobTriggers.hybridInspect. If you want to inspect more data, repeat this step as many times as needed.

    You can call this request indefinitely for all network traffic. Make sure you include metadata in each request.

In this case, you don't need to call the projects.locations.dlpJobs.finish method. Cloud DLP auto-partitions the data that you send. As long as the job trigger is active, at the end of each day, Cloud DLP performs the actions you specified when you created your hybrid job trigger.
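
For the proxy scenario, the following Python sketch shows one way to attach metadata to every intercepted message before sending it to the job trigger. The function name proxy_inspect_bodies, the "proxy" container type, and the way paths are composed from the proxy host and route are illustrative assumptions, not requirements of the API.

```python
def proxy_inspect_bodies(messages, proxy_host, labels):
    """Yield one hybridInspect request body per (route, payload) pair.

    Every body carries the same labels plus container details that
    identify where in the proxied traffic the payload was seen.
    """
    for route, payload in messages:
        yield {
            "hybridItem": {
                "item": {"value": payload},
                "findingDetails": {
                    "containerDetails": {
                        "rootPath": proxy_host,
                        "relativePath": route,
                        "fullPath": f"{proxy_host}:{route}",
                        "type": "proxy",  # illustrative value
                    },
                    "labels": dict(labels),
                },
            }
        }


traffic = [
    ("/bookings", "My email is test@example.org"),
    ("/health", "ok"),
]
bodies = list(proxy_inspect_bodies(traffic, "10.0.0.2", {"env": "prod"}))
for body in bodies:
    print(body["hybridItem"]["findingDetails"]["containerDetails"]["fullPath"])
```

Each yielded body can then be posted to the job trigger's hybridInspect endpoint by your proxy or a sidecar process.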

What's next