Jobs and job triggers

A job is an action that Sensitive Data Protection runs to either scan content for sensitive data or calculate the risk of re-identification. Sensitive Data Protection creates and runs a job resource whenever you tell it to inspect your data.

There are currently two types of Sensitive Data Protection jobs:

  • Inspection jobs inspect your content for sensitive data according to your criteria and generate summary reports of where and what type of sensitive data exists.
  • Risk analysis jobs analyze de-identified data and return metrics about the likelihood that the data can be re-identified.
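
Both kinds of jobs appear as DlpJob resources in the DLP API. As a minimal sketch, assuming the google-cloud-dlp Python client library and a hypothetical project ID, you can list a project's recent jobs of either type:

# A minimal sketch, assuming the google-cloud-dlp Python client library
# and a hypothetical project ID.
import google.cloud.dlp_v2 as dlp_v2

dlp = dlp_v2.DlpServiceClient()

# Both inspection jobs and risk analysis jobs are returned here as DlpJob
# resources; each job reports its current state (for example, RUNNING or DONE).
parent = "projects/my-project"
for job in dlp.list_dlp_jobs(request={"parent": parent}):
    print(job.name, job.state.name)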

You can schedule when Sensitive Data Protection runs jobs by creating job triggers. A job trigger is an event that automates the creation of Sensitive Data Protection jobs to scan Google Cloud storage repositories, including Cloud Storage buckets, BigQuery tables, and Datastore kinds.

Job triggers enable you to schedule scan jobs by setting the interval at which each trigger runs, from 1 day to 60 days. You can configure a trigger to look for only new findings since the last scan run, which helps you monitor changes or additions to content, or to generate up-to-date findings reports.

Next steps

For more information about how to create, edit, and run jobs and job triggers, see the following topics:

In addition, the following quickstart is available:

The JobTrigger object

A job trigger is represented in the DLP API by the JobTrigger object.

Job trigger configuration fields

Each JobTrigger contains several configuration fields, including:

  • The trigger's name and display name, and a description.
  • A collection of Trigger objects, each of which contains a Schedule object, which defines the scan recurrence in seconds.
  • An InspectJobConfig object, which contains the configuration information for the triggered job.
  • A Status enumeration, which indicates whether the trigger is currently active.
  • Timestamp fields representing creation, update, and last run times.
  • A collection of Error objects, if any were encountered when the trigger was activated.
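
For example, a trigger that scans a Cloud Storage bucket once a day might be created with the Python client library as in the following sketch; the project, bucket, and trigger IDs are hypothetical, and the name, timestamp, and error fields are set by the service rather than by you:

# A minimal sketch of creating a job trigger, assuming the google-cloud-dlp
# Python client library. The project, bucket, and trigger IDs are hypothetical.
import google.cloud.dlp_v2 as dlp_v2

dlp = dlp_v2.DlpServiceClient()

job_trigger = {
    "display_name": "Daily bucket scan",
    "description": "Scans the staging bucket for email addresses once a day",
    # One Trigger object containing a Schedule; the recurrence is a
    # duration in seconds (here, one day).
    "triggers": [{"schedule": {"recurrence_period_duration": {"seconds": 86400}}}],
    # The InspectJobConfig for the triggered job.
    "inspect_job": {
        "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
        "storage_config": {
            "cloud_storage_options": {"file_set": {"url": "gs://my-bucket/**"}}
        },
    },
    # Status enumeration: HEALTHY means the trigger is active.
    "status": dlp_v2.JobTrigger.Status.HEALTHY,
}

response = dlp.create_job_trigger(
    request={
        "parent": "projects/my-project",
        "job_trigger": job_trigger,
        "trigger_id": "daily-bucket-scan",
    }
)
print(response.name)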

Job trigger methods

Each JobTrigger object also includes several built-in methods. Using these methods, you can create, update, delete, retrieve, list, and activate job triggers.
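
A few of these operations are sketched below using the Python client library; the project and trigger IDs are hypothetical:

# A minimal sketch of common JobTrigger operations, assuming the
# google-cloud-dlp Python client library; names and IDs are illustrative.
import google.cloud.dlp_v2 as dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project"                      # hypothetical project
trigger_name = f"{parent}/jobTriggers/my-trigger"   # hypothetical trigger ID

# List all job triggers in the project.
for trigger in dlp.list_job_triggers(request={"parent": parent}):
    print(trigger.name, trigger.status.name)

# Retrieve a single trigger.
trigger = dlp.get_job_trigger(request={"name": trigger_name})
print(trigger.display_name, trigger.status.name)

# Run the trigger immediately instead of waiting for its schedule;
# this returns the DlpJob that was started.
job = dlp.activate_job_trigger(request={"name": trigger_name})
print(job.name)

# Delete the trigger when it is no longer needed.
dlp.delete_job_trigger(request={"name": trigger_name})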

Job latency

No service level objectives (SLOs) are guaranteed for jobs and job triggers. Latency is affected by several factors, including the amount of data to scan, the storage repository being scanned, the type and number of infoTypes that you're scanning for, the region where the job is processed, and the computing resources available in that region. Therefore, the latency of inspection jobs can't be determined in advance.

To help reduce job latency, you can try the following (a configuration sketch that applies some of these tips follows the list):

  • If sampling is available for your job or job trigger, enable it.
  • Avoid enabling infoTypes that you don't need. Although the following are useful in certain scenarios, these infoTypes can make requests run much more slowly than requests that don't include them:

    • PERSON_NAME
    • FEMALE_NAME
    • MALE_NAME
    • FIRST_NAME
    • LAST_NAME
    • DATE_OF_BIRTH
    • LOCATION
    • STREET_ADDRESS
    • ORGANIZATION_NAME
  • Always specify infoTypes explicitly. Do not use an empty infoTypes list.

  • If possible, use a different processing region.
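
The following sketch shows an inspection job configuration that applies these tips, assuming the google-cloud-dlp Python client library; the project, region, and bucket names are hypothetical:

# A minimal sketch of an inspection job that specifies infoTypes explicitly,
# enables sampling, and names a processing region. The project, region, and
# bucket are hypothetical.
import google.cloud.dlp_v2 as dlp_v2

dlp = dlp_v2.DlpServiceClient()

inspect_job = {
    # Specify infoTypes explicitly, and only the ones you need.
    "inspect_config": {
        "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "CREDIT_CARD_NUMBER"}],
    },
    "storage_config": {
        "cloud_storage_options": {
            "file_set": {"url": "gs://my-bucket/**"},
            # Sampling: scan at most the first 1 MB of each file, and only
            # 10% of the eligible files.
            "bytes_limit_per_file": 1048576,
            "files_limit_percent": 10,
        }
    },
}

# The parent can name a specific processing region.
job = dlp.create_dlp_job(
    request={
        "parent": "projects/my-project/locations/us-central1",
        "inspect_job": inspect_job,
    }
)
print(job.name)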

If you're still having latency issues with jobs after trying these techniques, consider using content.inspect or content.deidentify requests instead of jobs. These methods are covered under the Service Level Agreement. For more information, see Sensitive Data Protection Service Level Agreement.
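
For example, a synchronous inspection of a small piece of text might look like the following sketch, assuming the google-cloud-dlp Python client library and a hypothetical project ID:

# A minimal sketch of a synchronous content.inspect request, assuming the
# google-cloud-dlp Python client library; the project ID is hypothetical.
import google.cloud.dlp_v2 as dlp_v2

dlp = dlp_v2.DlpServiceClient()

response = dlp.inspect_content(
    request={
        "parent": "projects/my-project",
        "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
        "item": {"value": "You can reach me at jane.doe@example.com."},
    }
)
for finding in response.result.findings:
    print(finding.info_type.name, finding.likelihood.name)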

Limit scans to only new content

You can configure your job trigger to automatically set the timespan date for files stored in Cloud Storage or BigQuery. When you set the TimespanConfig object to auto-populate, Sensitive Data Protection only scans data that was added or modified since the trigger last ran:

...
  timespan_config {
    enable_auto_population_of_timespan_config: true
  }
...
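
In the Python client library, this setting corresponds to the timespan_config field on the storage_config of the triggered inspect job; the following fragment is a sketch of the relevant portion, with a hypothetical bucket name:

# The storage_config of a triggered inspect job, with timespan
# auto-population enabled; the bucket name is hypothetical.
storage_config = {
    "cloud_storage_options": {"file_set": {"url": "gs://my-bucket/**"}},
    "timespan_config": {
        # Scan only data added or changed since the trigger last ran.
        "enable_auto_population_of_timespan_config": True,
    },
}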

For BigQuery inspection, only rows that are at least three hours old are included in the scan. See the known issue related to this operation.

Trigger jobs at file upload

In addition to the support for job triggers—which is built into Sensitive Data Protection—Google Cloud also has a variety of other components that you can use to integrate or trigger Sensitive Data Protection jobs. For example, you can use Cloud Run functions to trigger a Sensitive Data Protection scan every time a file is uploaded to Cloud Storage.

For information about how to set up this operation, see Automating the classification of data uploaded to Cloud Storage.
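
As a rough sketch of the idea, an event-driven function (Python, using functions-framework and the google-cloud-dlp client) could start an inspection job for each newly uploaded object. All names below are hypothetical, and the guide above describes a complete setup:

# A minimal sketch of a function that starts a Sensitive Data Protection
# inspection job whenever a file is uploaded to a Cloud Storage bucket.
# It assumes a Cloud Storage "object finalized" event trigger; the project
# ID and infoTypes are hypothetical.
import functions_framework
import google.cloud.dlp_v2 as dlp_v2

dlp = dlp_v2.DlpServiceClient()
PARENT = "projects/my-project"  # hypothetical project


@functions_framework.cloud_event
def inspect_uploaded_file(cloud_event):
    # For a Cloud Storage object-finalized trigger, the event payload
    # includes the bucket and object names.
    bucket = cloud_event.data["bucket"]
    name = cloud_event.data["name"]

    inspect_job = {
        "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
        "storage_config": {
            "cloud_storage_options": {"file_set": {"url": f"gs://{bucket}/{name}"}}
        },
    }
    job = dlp.create_dlp_job(request={"parent": PARENT, "inspect_job": inspect_job})
    print(f"Started inspection job {job.name} for gs://{bucket}/{name}")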