This topic describes how to use hybrid jobs and hybrid job triggers to inspect external data for sensitive information. To learn more about hybrid jobs and hybrid job triggers—including examples of hybrid environments—see Hybrid jobs and hybrid job triggers.
Introduction to hybrid jobs and hybrid job triggers
Hybrid jobs and hybrid job triggers let you broaden the scope of protection that Sensitive Data Protection provides beyond simple content inspection requests and Google Cloud storage repository scanning. Using hybrid jobs and hybrid job triggers, you can stream data from virtually any source—including outside Google Cloud—directly to Sensitive Data Protection, and let Sensitive Data Protection inspect the data for sensitive information. Sensitive Data Protection automatically saves and aggregates the scan results for further analysis.
Comparison of hybrid jobs and hybrid job triggers
When you create hybrid jobs, they run until you stop them. They accept all incoming data as long as the data is properly routed and formatted.
Hybrid job triggers work in a similar manner to hybrid jobs, but you don't need to explicitly stop a job within a hybrid job trigger. Sensitive Data Protection automatically stops jobs within hybrid job triggers at the end of each day.
In addition, with a hybrid job trigger, you can stop and start new jobs within
the trigger without having to reconfigure your hybridInspect
requests. For
example, you can send data to a hybrid job trigger, then stop the active job,
change its configuration, start a new job within that trigger, and then continue
to send data to the same trigger.
For more guidance about which option fits your use case, see Typical hybrid inspection scenarios on this page.
Definition of terms
This topic uses the following terms:
- External data: data stored outside Google Cloud or data that Sensitive Data Protection doesn't natively support.
- Hybrid job: an inspection job that is configured to scan data from virtually any source.
- Hybrid job trigger: a job trigger that is configured to scan data from virtually any source.
- hybridInspect request: a request that contains the external data that you want to inspect. When sending this request, you specify the hybrid job or hybrid job trigger to send the request to.
For general information about jobs and job triggers, see Jobs and job triggers.
Hybrid inspection process
There are three steps in the hybrid inspection process.
Choose the data that you want to send to Sensitive Data Protection.
The data can originate from within Google Cloud or outside it. For example, you can configure a custom script or application to send data to Sensitive Data Protection, enabling you to inspect data in flight, from another cloud service, an on-premises data repository, or virtually any other data source.
Set up a hybrid job or hybrid job trigger in Sensitive Data Protection from scratch or using an inspection template.
After you set up a hybrid job or hybrid job trigger, Sensitive Data Protection actively listens for data sent to it. When your custom script or application sends data to this hybrid job or hybrid job trigger, the data is inspected and its results stored according to the configuration.
When you set up the hybrid job or hybrid job trigger, you can specify where you want to save or publish the findings. Options include saving to BigQuery and publishing notifications to Pub/Sub, Cloud Monitoring, or email.
Send a hybridInspect request to the hybrid job or hybrid job trigger.

A hybridInspect request contains the data to be scanned. In the request, include metadata (also referred to as labels and table identifiers) that describes the content and lets Sensitive Data Protection identify the information you want to track. For example, if you're scanning related data across several requests (such as rows in the same database table), you can use the same metadata in those related requests. You can then collect, tally, and analyze findings for that database table.
As the hybrid job runs and inspects requests, inspection results are available as soon as Sensitive Data Protection generates them. In contrast, actions, such as Pub/Sub notifications, don't occur until your application ends the hybrid job.
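As a sketch of step 3, the following stdlib-only Python helper builds a hybridInspect request body in the format described later in this topic. The helper name and its parameters are illustrative, not part of the DLP API.

```python
import json

def build_hybrid_inspect_body(content, labels, container=None):
    # Shape follows the hybridInspect request format shown in this topic:
    # "item" holds the content to scan, "findingDetails" holds metadata.
    details = {"labels": dict(labels)}
    if container:
        details["containerDetails"] = container
    return {
        "hybridItem": {
            "item": {"value": content},
            "findingDetails": details,
        }
    }

# Rows from the same table reuse the same labels, so their findings
# can be collected and tallied together later.
body = build_hybrid_inspect_body(
    "My email is test@example.org",
    labels={"env": "prod", "appointment-bookings-comments": ""},
)
print(json.dumps(body, indent=2))
```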
Considerations
When working with hybrid jobs and job triggers, consider the following points:
- Hybrid jobs and hybrid job triggers don't support filtering and sampling.
- Jobs and job triggers aren't subject to service level objectives (SLO), but there are steps you can take to reduce latency. For more information, see Job latency.
Before you begin
Before setting up and using hybrid jobs or hybrid job triggers, be sure you've done the following:
Create a new project, enable billing, and enable Sensitive Data Protection
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the Sensitive Data Protection API.
Configure the data source
Before Sensitive Data Protection can inspect your data, you must send the data to Sensitive Data Protection. Regardless of the method you use to configure the hybrid job or hybrid job trigger, you must set up your external source to send data to the DLP API.
For information about the required format for hybrid inspection requests, see Hybrid content item formatting. For information about the types of metadata that you can include with the data in your request, see Types of metadata you can provide.
Create a hybrid job or hybrid job trigger
To let Sensitive Data Protection inspect the data that you send to it, you must first set up a hybrid job or hybrid job trigger. For information about which one to create, see Typical hybrid inspection scenarios on this page.
Console
In the Google Cloud console, go to the Create job or job trigger page:
The following sections describe how to fill in the sections of the Create job or job trigger page that are relevant to hybrid inspection operations.
Choose input data
In this section, you specify the input data for Sensitive Data Protection to inspect.
- Optional: For Name, name the job by entering a value in the Job ID field. Leaving this field blank causes Sensitive Data Protection to auto-generate an identifier.
- Optional: From the Resource location menu, choose the region where you want to store the hybrid job or hybrid job trigger. For more information, see Specifying processing locations.
- For Storage type, select Hybrid.
- Optional: For Description, describe the hybrid job or hybrid job trigger that you're creating. For example, you can include information about the source of the data to be inspected.
- Optional: For Required labels, click Add label, and enter a label that you want to require from hybridInspect requests. A hybridInspect request that doesn't specify this label isn't processed by this hybrid job or hybrid job trigger. You can add up to 10 required labels. For more information, see Require labels from hybridInspect requests on this page.
- Optional: For Optional labels, enter any key-value pairs that you want to attach to the results of all hybridInspect requests sent to this job or job trigger. You can add up to 10 optional labels. For more information, see Optional labels.
- Optional: For Tabular data options, enter the field name of the primary key column if you plan to send tabular data in your hybridInspect requests. For more information, see Tabular data options.
- Click Continue.
Configure detection
In this section, you specify the types of sensitive data that Sensitive Data Protection looks for in the input data. Your choices are:
- Template: If you've already created a template in the current project that you want to use to define the Sensitive Data Protection detection parameters, click the Template name field, and then choose the template from the list that appears.
- InfoTypes: Sensitive Data Protection selects the most common built-in infoTypes to detect. To change the infoTypes, or to choose a custom infoType to use, click Manage infoTypes. You can also fine-tune the detection criteria in the Inspection rulesets and Confidence threshold sections. For more details, see Configure detection.
After configuring detection parameters, click Continue.
Add actions
This section is where you specify where to save the findings from each inspection scan and whether to be notified by email or Pub/Sub notification message whenever a scan has completed. If you don't save findings to BigQuery, the scan results only contain statistics about the number and infoTypes of the findings.
- Save to BigQuery: Every time a scan runs, Sensitive Data Protection saves scan findings to the BigQuery table you specify here. If you don't specify a table ID, BigQuery will assign a default name to a new table the first time the scan runs. If you specify an existing table, Sensitive Data Protection appends scan findings to it.
- Publish to Pub/Sub: When a job is done, a Pub/Sub message will be emitted.
- Notify by email: When a job is done, an email message will be sent.
- Publish to Cloud Monitoring: When a job is done, its findings will be published to Monitoring.
After choosing actions, click Continue.
Schedule
This section is where you specify whether to create a single job that runs immediately or a job trigger that runs every time properly routed and formatted data is received by Sensitive Data Protection.
Do one of the following:
- To run the hybrid job immediately, choose None (run the one-off job immediately upon creation).
- To configure the job so that data received from the source triggers the job, choose Create a trigger to run the job on a periodic schedule. Hybrid job triggers aggregate API calls, letting you see finding results and trends over time. For more information, see Comparison of hybrid jobs and hybrid job triggers.
Review
You can review a JSON summary of the scan here. Be sure to note the name of the hybrid job or hybrid job trigger; you need this information when sending data to Sensitive Data Protection for inspection.
After reviewing the JSON summary, click Create.
Sensitive Data Protection starts the hybrid job or hybrid job trigger immediately.
An inspection scan is started when you send a hybridInspect
request to this hybrid job or hybrid job trigger.
API
A job is represented in the DLP API by the
DlpJobs
resource. To create a hybrid job,
you call the
projects.locations.dlpJobs.create
method.
A job trigger is represented in the DLP API by the
JobTrigger
resource. To
create a hybrid job trigger, you call the
projects.locations.jobTriggers.create
method.
The DlpJobs or JobTrigger object that you create must have the following settings:

- In the inspectJob field, set an InspectJobConfig object.
- In the InspectJobConfig object, in the storageConfig field, set a StorageConfig object.
- In the StorageConfig object, in the hybridOptions field, set a HybridOptions object. This object contains metadata about the data that you want to inspect.
- In the InspectJobConfig object, in the actions field, add any actions (Action) that you want Sensitive Data Protection to perform at the end of each job. The publishSummaryToCscc and publishFindingsToCloudDataCatalog actions aren't supported for this operation. For more information about actions, see Actions.
- Specify what to scan for and how to scan by doing one or both of the following:
  - Set the inspectTemplateName field to the full resource name of an inspection template that you want to use, if available.
  - Set the inspectConfig field.

  If you set both inspectTemplateName and inspectConfig fields, their settings are combined.
About the JSON examples
The following tabs contain JSON examples that you can send to Sensitive Data Protection to create a hybrid job or a hybrid job trigger. These hybrid job and hybrid job trigger examples are configured to do the following:
- Process any hybridInspect request if the request has the label appointment-bookings-comments.
- Scan the content in the hybridInspect request for email addresses.
- Attach the "env": "prod" label to findings.
- For tabular data, get the value of the cell in the booking_id column (the primary key) that's in the same row as the cell where the sensitive data was found. Sensitive Data Protection attaches this identifier to the finding, so that you can trace the finding to the specific row it came from.
- Send an email when the job stops. The email goes to IAM project owners and technical Essential Contacts.
- Send the findings to Cloud Monitoring when the job stops.
To view the JSON examples, see the following tabs.
Hybrid job
This tab contains a JSON example that you can use to create a hybrid job.
To create a hybrid job, send a POST
request to the following endpoint.
HTTP method and URL
POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/dlpJobs
Replace the following:
- PROJECT_ID: the ID of the project where you want to store the hybrid job.
- REGION: the geographic region where you want to store the hybrid job.
JSON input
{
"jobId": "postgresql-table-comments",
"inspectJob": {
"actions": [
{
"jobNotificationEmails": {}
},
{
"publishToStackdriver": {}
}
],
"inspectConfig": {
"infoTypes": [
{
"name": "EMAIL_ADDRESS"
}
],
"minLikelihood": "POSSIBLE",
"includeQuote": true
},
"storageConfig": {
"hybridOptions": {
"description": "Hybrid job for data from the comments field of a table that contains customer appointment bookings",
"requiredFindingLabelKeys": [
"appointment-bookings-comments"
],
"labels": {
"env": "prod"
},
"tableOptions": {
"identifyingFields": [
{
"name": "booking_id"
}
]
}
}
}
}
}
JSON output
{
  "name": "projects/PROJECT_ID/locations/REGION/dlpJobs/i-postgresql-table-comments",
  "type": "INSPECT_JOB",
  "state": "ACTIVE",
  "inspectDetails": {
    "requestedOptions": {
      "snapshotInspectTemplate": {},
      "jobConfig": {
        "storageConfig": {
          "hybridOptions": {
            "description": "Hybrid job for data from the comments field of a table that contains customer appointment bookings",
            "requiredFindingLabelKeys": [
              "appointment-bookings-comments"
            ],
            "labels": {
              "env": "prod"
            },
            "tableOptions": {
              "identifyingFields": [
                {
                  "name": "booking_id"
                }
              ]
            }
          }
        },
        "inspectConfig": {
          "infoTypes": [
            {
              "name": "EMAIL_ADDRESS"
            }
          ],
          "minLikelihood": "POSSIBLE",
          "limits": {},
          "includeQuote": true
        },
        "actions": [
          {
            "jobNotificationEmails": {}
          },
          {
            "publishToStackdriver": {}
          }
        ]
      }
    },
    "result": {
      "hybridStats": {}
    }
  },
  "createTime": "JOB_CREATION_DATETIME",
  "startTime": "JOB_START_DATETIME"
}
Sensitive Data Protection creates the hybrid job and generates a job ID. In this
example, the job ID is i-postgresql-table-comments
. Take note of the job ID.
You need it in your hybridInspect
request.
To stop a hybrid job, you must call the
projects.locations.dlpJobs.finish
method explicitly. The DLP API doesn't automatically stop hybrid
jobs. In contrast, the DLP API automatically stops jobs within hybrid
job triggers at the end of each day.
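Because your application must make this call explicitly, it can help to wrap it in a small helper. The following sketch calls the finish endpoint with the Python standard library; the function names are illustrative, and the access token is assumed to come from your own auth setup (for example, gcloud auth print-access-token).

```python
import urllib.request

def finish_job_url(project_id, region, job_id):
    # Endpoint for the projects.locations.dlpJobs.finish method; the DLP API
    # never stops a hybrid job on its own, so your application must call this.
    return (f"https://dlp.googleapis.com/v2/projects/{project_id}"
            f"/locations/{region}/dlpJobs/{job_id}:finish")

def finish_job(project_id, region, job_id, access_token):
    # POST an empty JSON body to the :finish endpoint.
    req = urllib.request.Request(
        finish_job_url(project_id, region, job_id),
        data=b"{}",
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)

# Example (requires a real project and OAuth token):
# finish_job("PROJECT_ID", "us-central1", "i-postgresql-table-comments", TOKEN)
```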
Hybrid job trigger
This tab contains a JSON example that you can use to create a hybrid job trigger.
To create a hybrid job trigger, send a POST
request to the following endpoint.
HTTP method and URL
POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/jobTriggers
Replace the following:
- PROJECT_ID: the ID of the project where you want to store the hybrid job trigger.
- REGION: the geographic region where you want to store the hybrid job trigger.
JSON input
{
"triggerId": "postgresql-table-comments",
"jobTrigger": {
"triggers": [
{
"manual": {}
}
],
"inspectJob": {
"actions": [
{
"jobNotificationEmails": {}
},
{
"publishToStackdriver": {}
}
],
"inspectConfig": {
"infoTypes": [
{
"name": "EMAIL_ADDRESS"
}
],
"minLikelihood": "POSSIBLE",
"limits": {},
"includeQuote": true
},
"storageConfig": {
"hybridOptions": {
"description": "Hybrid job trigger for data from the comments field of a table that contains customer appointment bookings",
"requiredFindingLabelKeys": [
"appointment-bookings-comments"
],
"labels": {
"env": "prod"
},
"tableOptions": {
"identifyingFields": [
{
"name": "booking_id"
}
]
}
}
}
}
}
}
JSON output
{
  "name": "projects/PROJECT_ID/locations/REGION/jobTriggers/postgresql-table-comments",
  "inspectJob": {
    "storageConfig": {
      "hybridOptions": {
        "description": "Hybrid job trigger for data from the comments field of a table that contains customer appointment bookings",
        "requiredFindingLabelKeys": [
          "appointment-bookings-comments"
        ],
        "labels": {
          "env": "prod"
        },
        "tableOptions": {
          "identifyingFields": [
            {
              "name": "booking_id"
            }
          ]
        }
      }
    },
    "inspectConfig": {
      "infoTypes": [
        {
          "name": "EMAIL_ADDRESS"
        }
      ],
      "minLikelihood": "POSSIBLE",
      "limits": {},
      "includeQuote": true
    },
    "actions": [
      {
        "jobNotificationEmails": {}
      },
      {
        "publishToStackdriver": {}
      }
    ]
  },
  "triggers": [
    {
      "manual": {}
    }
  ],
  "createTime": "JOB_CREATION_DATETIME",
  "updateTime": "TRIGGER_UPDATE_DATETIME",
  "status": "HEALTHY"
}
Sensitive Data Protection creates the hybrid job trigger. The output contains the
name of the hybrid job trigger. In this example, that is
postgresql-table-comments
. Take note of the name. You need it in your
hybridInspect
request.
Unlike with hybrid jobs, the DLP API automatically stops jobs within
hybrid job triggers at the end of each day. Thus, you don't need to explicitly
call the
projects.locations.dlpJobs.finish
method.
When creating a hybrid job or hybrid job trigger, you can use the APIs Explorer on the API reference pages for the projects.locations.dlpJobs.create and projects.locations.jobTriggers.create methods, respectively.

In the Request parameters field, enter projects/PROJECT_ID/locations/REGION. Then, in the Request body field, paste the sample JSON for the object you're trying to create.
A successful request, even one created in APIs Explorer, creates a hybrid job or hybrid job trigger.
For general information about using JSON to send requests to the DLP API, see the JSON quickstart.
Send data to the hybrid job or hybrid job trigger
To inspect data, you must send a hybridInspect
request, in the correct format,
to either a hybrid job or hybrid job trigger.
Hybrid content item formatting
The following is a simple example of a hybridInspect
request sent to
Sensitive Data Protection for processing by a hybrid job or hybrid job trigger.
Note the structure of the JSON object, including the hybridItem field, which contains the following fields:

- item: contains the actual content to inspect.
- findingDetails: contains metadata to associate with the content.
{
"hybridItem": {
"item": {
"value": "My email is test@example.org"
},
"findingDetails": {
"containerDetails": {
"fullPath": "10.0.0.2:logs1:app1",
"relativePath": "app1",
"rootPath": "10.0.0.2:logs1",
"type": "logging_sys",
"version": "1.2"
},
"labels": {
"env": "prod",
"appointment-bookings-comments": ""
}
}
}
}
For comprehensive information about the contents of hybrid inspection items,
see the API reference content for the HybridContentItem
object.
Hybrid inspection endpoints
For data to be inspected using a hybrid job or hybrid job trigger, you must send
a hybridInspect
request to the correct endpoint.
HTTP method and URL for hybrid jobs
POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/dlpJobs/JOB_ID:hybridInspect
For more information about this endpoint, see the API reference page for the
projects.locations.dlpJobs.hybridInspect
method.
HTTP method and URL for hybrid job triggers
POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/jobTriggers/TRIGGER_NAME:hybridInspect
For more information about this endpoint, see the API reference page for the
projects.locations.jobTriggers.hybridInspect
method.
Replace the following:

- PROJECT_ID: your project identifier.
- REGION: the geographic region where you want to store the hybridInspect request. This region must be the same as the hybrid job's region.
- JOB_ID: the ID that you gave the hybrid job, prefixed with i-. To look up the job ID, in Sensitive Data Protection, click Inspection > Inspect jobs.
- TRIGGER_NAME: the name that you gave the hybrid job trigger. To look up the name of the job trigger, in Sensitive Data Protection, click Inspection > Job triggers.
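The difference between the two endpoints is only the collection segment and the resource name. A small, illustrative helper can build either URL (the function name and the target argument are not part of the DLP API):

```python
def hybrid_inspect_url(project_id, region, target, name):
    # target is "job" or "trigger"; hybrid jobs use the dlpJobs collection
    # (job IDs carry the i- prefix), hybrid job triggers use jobTriggers.
    collection = {"job": "dlpJobs", "trigger": "jobTriggers"}[target]
    return (f"https://dlp.googleapis.com/v2/projects/{project_id}"
            f"/locations/{region}/{collection}/{name}:hybridInspect")

# Example: the job endpoint for a job named i-postgresql-table-comments.
url = hybrid_inspect_url("my-project", "us-central1", "job",
                         "i-postgresql-table-comments")
```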
Require labels from hybridInspect requests
If you want to control which hybridInspect
requests can be processed by a hybrid job or hybrid job trigger, you can
set required labels. Any hybridInspect
requests for that hybrid job or hybrid job trigger that don't include these
required labels are rejected.
To set a required label, do the following:

1. When creating the hybrid job or hybrid job trigger, set the requiredFindingLabelKeys field to a list of required labels.

   The following example sets appointment-bookings-comments as a required label in a hybrid job or hybrid job trigger.

   "hybridOptions": {
     ...
     "requiredFindingLabelKeys": [
       "appointment-bookings-comments"
     ],
     "labels": {
       "env": "prod"
     },
     ...
   }

2. In the hybridInspect request, in the labels field, add each required label as a key in a key-value pair. The corresponding value can be an empty string.

   The following example sets the required label, appointment-bookings-comments, in a hybridInspect request.

   {
     "hybridItem": {
       "item": {
         "value": "My email is test@example.org"
       },
       "findingDetails": {
         "containerDetails": {...},
         "labels": {
           "appointment-bookings-comments": ""
         }
       }
     }
   }
If you don't include the required label in your hybridInspect
request, you get
an error like the following:
{
  "error": {
    "code": 400,
    "message": "Trigger required labels that were not included: [appointment-bookings-comments]",
    "status": "INVALID_ARGUMENT"
  }
}
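To avoid these rejections, you can check requests client-side before sending them. The following sketch mirrors the server-side rule (every required key must appear in findingDetails.labels; values may be empty strings). The function name is illustrative.

```python
def missing_required_labels(required_keys, request_body):
    # A hybridInspect request is rejected if any required label key is
    # absent from findingDetails.labels; empty-string values are fine.
    labels = (request_body.get("hybridItem", {})
                          .get("findingDetails", {})
                          .get("labels", {}))
    return [key for key in required_keys if key not in labels]

# A request with no labels would be rejected for the missing key.
bad = {"hybridItem": {"item": {"value": "..."},
                      "findingDetails": {"labels": {}}}}
ok = {"hybridItem": {"item": {"value": "..."},
                     "findingDetails": {
                         "labels": {"appointment-bookings-comments": ""}}}}
```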
Code sample: Create a hybrid job trigger and send data to it
C#
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Go
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
PHP
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
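The language tabs above assume the official client libraries. As a rough alternative sketch, the following Python script drives the same flow (create a hybrid job trigger, activate it, then send a hybridInspect request) through the REST endpoints shown in this topic, using only the standard library. The helper names, region, and token handling are illustrative; a real application would obtain credentials through Application Default Credentials.

```python
import json
import urllib.request

API_ROOT = "https://dlp.googleapis.com/v2"

def _post(url, body, access_token):
    # Minimal JSON POST helper; access_token comes from your own auth setup.
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def create_trigger_body(trigger_id):
    # Same shape as the hybrid job trigger JSON example earlier in this topic.
    return {
        "triggerId": trigger_id,
        "jobTrigger": {
            "triggers": [{"manual": {}}],
            "inspectJob": {
                "inspectConfig": {
                    "infoTypes": [{"name": "EMAIL_ADDRESS"}],
                    "minLikelihood": "POSSIBLE",
                },
                "storageConfig": {
                    "hybridOptions": {
                        "requiredFindingLabelKeys": [
                            "appointment-bookings-comments"
                        ],
                        "labels": {"env": "prod"},
                    }
                },
                "actions": [{"jobNotificationEmails": {}}],
            },
        },
    }

def run(project_id, region, trigger_id, access_token):
    parent = f"{API_ROOT}/projects/{project_id}/locations/{region}"
    # 1. Create the hybrid job trigger.
    _post(f"{parent}/jobTriggers", create_trigger_body(trigger_id), access_token)
    # 2. Activate it (projects.locations.jobTriggers.activate).
    _post(f"{parent}/jobTriggers/{trigger_id}:activate", {}, access_token)
    # 3. Send a hybridInspect request that carries the required label.
    item = {
        "hybridItem": {
            "item": {"value": "My email is test@example.org"},
            "findingDetails": {
                "labels": {"appointment-bookings-comments": ""}
            },
        }
    }
    return _post(f"{parent}/jobTriggers/{trigger_id}:hybridInspect",
                 item, access_token)

# Example (requires a real project and OAuth token):
# run("PROJECT_ID", "us-central1", "postgresql-table-comments", TOKEN)
```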
Typical hybrid inspection scenarios
The following sections describe typical uses for hybrid inspection and their corresponding workflows.
Perform a one-off scan
Execute a one-off scan of a database outside of Google Cloud as part of a quarterly spot check of databases.

1. Create a hybrid job using the Google Cloud console or the DLP API.
2. Send data to the job by calling projects.locations.dlpJobs.hybridInspect. If you want to inspect more data, repeat this step as many times as needed.
3. After sending data for inspection, call the projects.locations.dlpJobs.finish method.

Sensitive Data Protection performs the actions specified in your projects.locations.dlpJobs.create request.
Configure continuous monitoring
Monitor all new content added daily to a database that Sensitive Data Protection does not natively support.
1. Create a hybrid job trigger using the Google Cloud console or the DLP API.
2. Activate the job trigger by calling the projects.locations.jobTriggers.activate method.
3. Send data to the job trigger by calling projects.locations.jobTriggers.hybridInspect. If you want to inspect more data, repeat this step as many times as needed.
In this case, you don't need to call the projects.locations.dlpJobs.finish
method. Sensitive Data Protection auto-partitions the data that you send. As long
as the job trigger is active, at the end of each day, Sensitive Data Protection
performs the actions you specified when you created your hybrid job trigger.
Scan data coming into a database
Scan data coming into a database, while controlling how the data is partitioned. Each job in a job trigger is a single partition.
1. Create a hybrid job trigger using the Google Cloud console or the DLP API.
2. Activate the job trigger by calling the projects.locations.jobTriggers.activate method.

   The system returns the job ID of a single job. You need this job ID in the next step.
3. Send data to the job by calling projects.locations.dlpJobs.hybridInspect.

   In this case, you send the data to the job instead of the job trigger. This approach lets you control how the data that you send for inspection is partitioned. If you want to add more data for inspection in the current partition, repeat this step.
4. After sending data to the job, call the projects.locations.dlpJobs.finish method.

Sensitive Data Protection performs the actions specified in your projects.locations.jobTriggers.create request.

If you want to create another job for the next partition, activate the job trigger again, and then send the data to the resulting job.
Monitor traffic from a proxy
Monitor traffic from a proxy installed between two custom applications.
1. Create a hybrid job trigger using the Google Cloud console or the DLP API.
2. Activate the job trigger by calling the projects.locations.jobTriggers.activate method.
3. Send data to the job trigger by calling projects.locations.jobTriggers.hybridInspect. If you want to inspect more data, repeat this step as many times as needed. You can send this request indefinitely for all network traffic. Make sure you include metadata in each request.
In this case, you don't need to call the projects.locations.dlpJobs.finish
method. Sensitive Data Protection auto-partitions the data that you send. As long
as the job trigger is active, at the end of each day, Sensitive Data Protection
performs the actions you specified when you created your hybrid job trigger.
What's next
- Learn more about how hybrid jobs and hybrid job triggers work.