Hybrid jobs and job triggers

Hybrid jobs and job triggers encompass a set of asynchronous API methods that allow you to scan payloads of data sent from virtually any source for sensitive information, and then store the findings in Google Cloud. Hybrid jobs enable you to write your own data crawlers that behave like, and feed data to, the Cloud Data Loss Prevention (DLP) storage inspection methods.

Using hybrid jobs, you can stream data from any source to Cloud DLP. Cloud DLP inspects the data for sensitive information or PII, and then saves the inspection results to a Cloud DLP job resource. You can examine the scan results in the Cloud DLP console UI or through the API, or you can specify post-scan actions to run, such as saving the inspection results to a BigQuery table or emitting a Pub/Sub notification.
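For a sense of what the streaming side looks like, the following is a minimal sketch that sends a single payload to an already-running hybrid job using the Python client library. The project and job names are placeholders, and the optional labels are illustrative:

```python
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()

# Placeholder resource name of a hybrid job that is already running.
job_name = "projects/my-project/locations/global/dlpJobs/i-1234567890"

# Stream one payload to the job. Cloud DLP inspects the payload and stores
# any findings in the hybrid job resource.
client.hybrid_inspect_dlp_job(
    request={
        "name": job_name,
        "hybrid_item": {
            "item": {"value": "Customer email: jane.doe@example.com"},
            # Optional metadata stored alongside the findings.
            "finding_details": {"labels": {"source": "crm-export"}},
        },
    }
)
```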

The hybrid jobs workflow is summarized in the following diagram:

Diagram of the hybrid jobs dataflow: your application sends data from an external source to Cloud DLP, Cloud DLP inspects the data, and then either saves or publishes the findings.

This conceptual topic describes hybrid jobs and job triggers and how they work. To learn how to implement hybrid jobs and job triggers, see Inspecting external data using hybrid jobs.

About hybrid environments

"Hybrid" environments are common in organizations. Many organizations store and process sensitive data using some combination of the following:

  • Other cloud providers
  • On-premises servers or other data repositories
  • Non-native storage systems, such as systems running inside a virtual machine
  • Web and mobile apps
  • Google Cloud-based solutions

Using hybrid jobs, Cloud DLP can inspect data sent to it from any of these sources. Listed here are some example scenarios:

  • Inspect data stored in Amazon Relational Database Service (RDS), MySQL running inside a virtual machine, or an on-premises database.
  • Inspect and tokenize data as you migrate from on-premises to the cloud, or between production, development, and analytics environments.
  • Inspect and redact transactions from a web or mobile application before storing the data at rest.

Inspection options

As described in more detail in Method types, Cloud DLP provides three options for inspecting content for sensitive data:

  • Content methods inspection: Using content inspection, you stream small payloads of data to Cloud DLP along with instructions about what to inspect for. Cloud DLP inspects the data for sensitive content and PII, and then returns the results of the scan to you.
  • Storage methods inspection: Using storage inspection, Cloud DLP inspects a Google Cloud-based storage repository such as a BigQuery database, Cloud Storage bucket, or Datastore kind. You tell Cloud DLP what to inspect and what to inspect for, and then Cloud DLP runs a job that scans the repository. After the scan is complete, Cloud DLP saves a summary of the results of the scan back to the job. You can additionally specify that the results are sent to another Google Cloud product for analysis, such as a separate BigQuery table.
  • Hybrid jobs inspection: Hybrid jobs provide the benefits of both of the previous two methods. They enable you to stream data as you would with the content methods, while gaining the storage, visualization, and actions of storage inspection jobs. All inspection configuration is managed within Cloud DLP, with no extra configuration required on the client side (see the configuration sketch after this list). Hybrid jobs can be useful for scanning non-native storage systems, such as a database running in a virtual machine (VM), on-premises, or on another cloud. Hybrid methods can also be useful for inspecting processing systems, such as migration workloads, or even for proxying service-to-service communication. While content methods can also do this, hybrid methods provide a findings storage backend that aggregates your data across multiple API calls, so that you don't have to.
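To make the server-side configuration concrete, here is a hedged sketch of a hybrid job's inspection configuration using the Python client library. The storage_config block uses hybrid_options instead of pointing at a Google Cloud repository; the description, labels, and infoTypes below are illustrative placeholders:

```python
from google.cloud import dlp_v2

# Sketch of an InspectJobConfig for a hybrid job. The hybrid_options block
# replaces the usual pointer to a BigQuery, Cloud Storage, or Datastore
# repository; the data itself arrives later via hybridInspect requests.
inspect_job = {
    "storage_config": {
        "hybrid_options": {
            "description": "Scan of an external MySQL database",  # illustrative
            # Requests that omit this label key are rejected, which lets the
            # job enforce consistent metadata across callers.
            "required_finding_label_keys": ["env"],
            "labels": {"source": "external-mysql"},
        }
    },
    "inspect_config": {
        "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
        "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
    },
}
```

The same inspect_job payload can be attached either to a one-off hybrid job or to a hybrid job trigger.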

About hybrid jobs and job triggers

A hybrid job is effectively a hybrid of content methods and storage methods. The basic workflow for using hybrid jobs and job triggers is as follows:

  1. You write a script or create a workflow that sends data to Cloud DLP for inspection along with some metadata.
  2. You configure and create a hybrid job resource or trigger and enable it to activate when it receives data.
  3. Your script or workflow runs on the client side and sends data to Cloud DLP. The data includes an activation message and the job trigger's identifier, which triggers the inspection.
  4. Cloud DLP inspects the data according to the criteria you set in the hybrid job or trigger.
  5. Cloud DLP saves the results of the scan to the hybrid job resource, along with metadata that you provide. You can examine the results using the Cloud DLP UI in Cloud Console.
  6. Optionally, Cloud DLP can run post-scan actions, such as saving inspection results data to a BigQuery table or notifying you by email or Pub/Sub.

A hybrid job trigger enables you to create, activate, and stop jobs so that you can trigger actions whenever you need. By ensuring that your script or code sends data that includes the hybrid job trigger's identifier, you don't need to update your script or code whenever a new job is started.
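Putting the workflow together, the following is a rough end-to-end sketch in Python. The project ID and payload are placeholders, and error handling is omitted:

```python
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"  # placeholder project

# Create a hybrid job trigger. The inspection criteria live here, on the
# server side, not in the client that sends the data.
trigger = client.create_job_trigger(
    request={
        "parent": parent,
        "job_trigger": {
            "inspect_job": {
                "storage_config": {"hybrid_options": {}},
                "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
            },
            "status": dlp_v2.JobTrigger.Status.HEALTHY,
            "triggers": [{"manual": {}}],
        },
    }
)

# Stream a payload that names the trigger. If no hybrid job is currently
# active for the trigger, this request activates one.
client.hybrid_inspect_job_trigger(
    request={
        "name": trigger.name,
        "hybrid_item": {"item": {"value": "Contact: jane.doe@example.com"}},
    }
)
```

When your workflow has finished sending data, you can stop the active job (for example, with finish_dlp_job), which causes end-of-job actions such as Pub/Sub or email notifications to run.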

Typical hybrid job scenarios

Following are some typical scenarios that hybrid jobs are well-suited for:

  • You want to execute a one-off scan of a database outside of Google Cloud as part of a quarterly spot check of databases.
  • You want to monitor all new content added on a daily basis to a database that Cloud DLP does not natively support.
  • You want to monitor traffic in a network using Cloud DLP Filter for Envoy (a WebAssembly HTTP filter for Envoy sidecar proxies) to identify problematic sensitive data movement.

Hybrid job-supported actions

Like other Cloud DLP jobs, hybrid jobs support actions, although not all actions apply to them. Following are the currently supported actions, along with information about how they work; a configuration sketch follows the list. Be aware that with the Pub/Sub, email, and Cloud Monitoring actions, findings are made available only when the job ends.

  • Save findings to DLP and Save findings to BigQuery: Findings are saved to a Cloud DLP resource or BigQuery table, respectively. These actions work with hybrid jobs similarly to how they work with other job types, with one important difference: With hybrid jobs, findings are made available while the job is running; with other job types, findings are made available when the job ends.
  • Send Pub/Sub: When the job is done, Cloud DLP emits a Pub/Sub message.
  • Send Email: When the job is done, Cloud DLP sends an email message.
  • Publish to Cloud Monitoring: When the job is done, Cloud DLP publishes the job's findings to Monitoring.
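For reference, here is a sketch of how these actions might be declared in a hybrid job's actions list using the Python client library; all resource names are placeholders:

```python
# Sketch of an actions list for a hybrid job configuration. The project,
# dataset, table, and topic names are placeholders.
actions = [
    # Findings become available in BigQuery while the hybrid job runs.
    {
        "save_findings": {
            "output_config": {
                "table": {
                    "project_id": "my-project",
                    "dataset_id": "dlp_results",
                    "table_id": "hybrid_findings",
                }
            }
        }
    },
    # These actions fire when the job ends.
    {"pub_sub": {"topic": "projects/my-project/topics/dlp-job-done"}},
    {"job_notification_emails": {}},  # email notification to project owners
    {"publish_to_stackdriver": {}},   # publish finding counts to Monitoring
]
```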

Summary

Following are some key features and benefits of using hybrid jobs and job triggers:

  • Hybrid jobs enable you to stream data to Cloud DLP from virtually any source, on- or off-cloud.
  • Hybrid job triggers activate when Cloud DLP receives a data stream that includes an activation message and the job trigger's identifier.
  • You can wait until the inspection scan has completed, or you can stop the job manually. Inspection results are saved to the Cloud DLP hybrid job resource or to BigQuery whether you allow the job to finish or stop it early.
  • Cloud DLP inspection scan results from a hybrid job trigger are saved to a hybrid job resource within Cloud DLP.
  • You can examine the inspection scan results by viewing the hybrid job resource within Cloud DLP.
  • Using an action, you can also instruct Cloud DLP to send hybrid job results to a BigQuery table and to notify you by email or Pub/Sub notification.

What's next