InspectJobConfig

Controls what and how to inspect for findings.

JSON representation
{
  "storageConfig": {
    object(StorageConfig)
  },
  "inspectConfig": {
    object(InspectConfig)
  },
  "inspectTemplateName": string,
  "actions": [
    {
      object(Action)
    }
  ]
}
Fields
storageConfig

object(StorageConfig)

The data to scan.

inspectConfig

object(InspectConfig)

How and what to scan for.

inspectTemplateName

string

If provided, this template will be used as the default for all values in InspectConfig; the inspectConfig specified here will be merged into the values persisted as part of the template.

actions[]

object(Action)

Actions to execute at the completion of the job; they are executed in the order provided.
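
For orientation, here is an illustrative (not normative) InspectJobConfig that scans a Cloud Storage bucket for email addresses and saves findings to BigQuery. The project, bucket, and dataset names are placeholders, and the infoTypes field belongs to InspectConfig, which is documented separately:

{
  "storageConfig": {
    "cloudStorageOptions": {
      "fileSet": { "url": "gs://my-bucket/*" }
    }
  },
  "inspectConfig": {
    "infoTypes": [ { "name": "EMAIL_ADDRESS" } ]
  },
  "actions": [
    {
      "saveFindings": {
        "outputConfig": {
          "table": {
            "projectId": "my-project",
            "datasetId": "dlp_results"
          }
        }
      }
    }
  ]
}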

StorageConfig

Shared message indicating Cloud storage type.

JSON representation
{
  "timespanConfig": {
    object(TimespanConfig)
  },

  // Union field type can be only one of the following:
  "datastoreOptions": {
    object(DatastoreOptions)
  },
  "cloudStorageOptions": {
    object(CloudStorageOptions)
  },
  "bigQueryOptions": {
    object(BigQueryOptions)
  }
  // End of list of possible types for union field type.
}
Fields
timespanConfig

object(TimespanConfig)

Configuration of the timespan of the items to include in scanning; see TimespanConfig below.

Union field type.

type can be only one of the following:

datastoreOptions

object(DatastoreOptions)

Google Cloud Datastore options specification.

cloudStorageOptions

object(CloudStorageOptions)

Google Cloud Storage options specification.

bigQueryOptions

object(BigQueryOptions)

BigQuery options specification.
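
To make the union concrete, here is a hypothetical StorageConfig that targets a BigQuery table and lets the service auto-populate the timespan; all identifiers are placeholders:

{
  "timespanConfig": {
    "enableAutoPopulationOfTimespanConfig": true
  },

  // Exactly one of the three options messages may be set:
  "bigQueryOptions": {
    "tableReference": {
      "projectId": "my-project",
      "datasetId": "my_dataset",
      "tableId": "users"
    }
  }
}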

DatastoreOptions

Options defining a data set within Google Cloud Datastore.

JSON representation
{
  "partitionId": {
    object(PartitionId)
  },
  "kind": {
    object(KindExpression)
  }
}
Fields
partitionId

object(PartitionId)

A partition ID identifies a grouping of entities. The grouping is always by project and namespace, though the namespace ID may be empty.

kind

object(KindExpression)

The kind to process.
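
For illustration, a DatastoreOptions value that scans the Person kind in the default namespace of a hypothetical project could look like:

{
  "partitionId": {
    "projectId": "my-project"
    // namespaceId omitted: the default (empty) namespace is used
  },
  "kind": {
    "name": "Person"
  }
}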

PartitionId

Datastore partition ID. A partition ID identifies a grouping of entities. The grouping is always by project and namespace, though the namespace ID may be empty.

A partition ID contains two dimensions: the project ID and the namespace ID.

JSON representation
{
  "projectId": string,
  "namespaceId": string
}
Fields
projectId

string

The ID of the project to which the entities belong.

namespaceId

string

If not empty, the ID of the namespace to which the entities belong.

KindExpression

A representation of a Datastore kind.

JSON representation
{
  "name": string
}
Fields
name

string

The name of the kind.

CloudStorageOptions

Options defining a file or a set of files within a Google Cloud Storage bucket.

JSON representation
{
  "fileSet": {
    object(FileSet)
  },
  "bytesLimitPerFile": string,
  "bytesLimitPerFilePercent": number,
  "fileTypes": [
    enum(FileType)
  ],
  "sampleMethod": enum(SampleMethod),
  "filesLimitPercent": number
}
Fields
fileSet

object(FileSet)

The set of one or more files to scan.

bytesLimitPerFile

string (int64 format)

Max number of bytes to scan from a file. If a scanned file's size is bigger than this value, then the rest of the bytes are omitted. Only one of bytesLimitPerFile and bytesLimitPerFilePercent can be specified.

bytesLimitPerFilePercent

number

Max percentage of bytes to scan from a file. The rest are omitted. The number of bytes scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one of bytesLimitPerFile and bytesLimitPerFilePercent can be specified.

fileTypes[]

enum(FileType)

List of file type groups to include in the scan. If empty, all files are scanned and available data format processors are applied. In addition, the binary content of the selected files is always scanned as well.

sampleMethod

enum(SampleMethod)

How to sample bytes if not all bytes are scanned; see SampleMethod below.

filesLimitPercent

number

Limits the number of files to scan to this percentage of the input FileSet. The number of files scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and 100 mean no limit. Defaults to 0.
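
As a sketch, a CloudStorageOptions value that samples at most 1 MiB from each text file under a placeholder bucket might be:

{
  "fileSet": {
    "url": "gs://my-bucket/data/*"
  },
  "bytesLimitPerFile": "1048576",  // int64 values are encoded as strings
  "fileTypes": [ "TEXT_FILE" ],
  "sampleMethod": "RANDOM_START"   // see the SampleMethod enum below
}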

FileSet

Set of files to scan.

JSON representation
{
  "url": string,
  "regexFileSet": {
    object(CloudStorageRegexFileSet)
  }
}
Fields
url

string

The Cloud Storage URL of the file(s) to scan, in the format gs://<bucket>/<path>. A trailing wildcard in the path is allowed. Exactly one of url or regexFileSet must be set.

regexFileSet

object(CloudStorageRegexFileSet)

The regex-filtered set of files to scan. Exactly one of url or regexFileSet must be set.

CloudStorageRegexFileSet

Message representing a set of files in a Cloud Storage bucket. Regular expressions are used to allow fine-grained control over which files in the bucket to include.

Included files are those that match at least one item in includeRegex and do not match any items in excludeRegex. Note that a file that matches items from both lists will not be included. For a match to occur, the entire file path (i.e., everything in the URL after the bucket name) must match the regular expression.

For example, given the input {bucketName: "mybucket", includeRegex: ["directory1/.*"], excludeRegex: ["directory1/excluded.*"]}:

  • gs://mybucket/directory1/myfile will be included
  • gs://mybucket/directory1/directory2/myfile will be included (.* matches across /)
  • gs://mybucket/directory0/directory1/myfile will not be included (the full path doesn't match any items in includeRegex)
  • gs://mybucket/directory1/excludedfile will not be included (the path matches an item in excludeRegex)

If includeRegex is left empty, it will match all files by default (this is equivalent to setting includeRegex: [".*"]).

Some other common use cases:

  • {bucketName: "mybucket", excludeRegex: [".*\.pdf"]} will include all files in mybucket except for .pdf files
  • {bucketName: "mybucket", includeRegex: ["directory/[^/]+"]} will include all files directly under gs://mybucket/directory/, without matching across /
JSON representation
{
  "bucketName": string,
  "includeRegex": [
    string
  ],
  "excludeRegex": [
    string
  ]
}
Fields
bucketName

string

The name of a Cloud Storage bucket. Required.

includeRegex[]

string

A list of regular expressions matching file paths to include. All files in the bucket that match at least one of these regular expressions will be included in the set of files, except for those that also match an item in excludeRegex. Leaving this field empty will match all files by default (this is equivalent to including .* in the list).

Regular expressions use RE2 syntax; a guide can be found under the google/re2 repository on GitHub.

excludeRegex[]

string

A list of regular expressions matching file paths to exclude. All files in the bucket that match at least one of these regular expressions will be excluded from the scan.

Regular expressions use RE2 syntax; a guide can be found under the google/re2 repository on GitHub.
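
Putting these fields together, the first example above (from the CloudStorageRegexFileSet introduction) is expressed as a complete JSON value like so:

{
  "bucketName": "mybucket",
  "includeRegex": [
    "directory1/.*"
  ],
  "excludeRegex": [
    "directory1/excluded.*"
  ]
}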

FileType

Definitions of file type groups to scan.

Enums
FILE_TYPE_UNSPECIFIED Includes all files.
BINARY_FILE Includes all file extensions not covered by text file types.
TEXT_FILE Included file extensions: asc, brf, c, cc, cpp, csv, cxx, c++, cs, css, dart, eml, go, h, hh, hpp, hxx, h++, hs, html, htm, shtml, shtm, xhtml, lhs, ini, java, js, json, ocaml, md, mkd, markdown, m, ml, mli, pl, pm, php, phtml, pht, py, pyw, rb, rbw, rs, rc, scala, sh, sql, tex, txt, text, tsv, vcard, vcs, wml, xml, xsl, xsd, yml, yaml.

SampleMethod

How to sample bytes if not all bytes are scanned. Meaningful only when used in conjunction with bytesLimitPerFile. If not specified, scanning starts from the top.

Enums
SAMPLE_METHOD_UNSPECIFIED
TOP Scan from the top (default).
RANDOM_START For each file larger than bytesLimitPerFile, randomly pick the offset to start scanning. The scanned bytes are contiguous.

BigQueryOptions

Options defining BigQuery table and row identifiers.

JSON representation
{
  "tableReference": {
    object(BigQueryTable)
  },
  "identifyingFields": [
    {
      object(FieldId)
    }
  ],
  "rowsLimit": string,
  "rowsLimitPercent": number,
  "sampleMethod": enum(SampleMethod),
  "excludedFields": [
    {
      object(FieldId)
    }
  ]
}
Fields
tableReference

object(BigQueryTable)

Complete BigQuery table reference.

identifyingFields[]

object(FieldId)

References to fields uniquely identifying rows within the table. Nested fields, in a format like person.birthdate.year, are allowed.

rowsLimit

string (int64 format)

Max number of rows to scan. If the table has more rows than this value, the rest of the rows are omitted. If not set, or if set to 0, all rows will be scanned. Only one of rowsLimit and rowsLimitPercent can be specified. Cannot be used in conjunction with TimespanConfig.

rowsLimitPercent

number

Max percentage of rows to scan. The rest are omitted. The number of rows scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one of rowsLimit and rowsLimitPercent can be specified. Cannot be used in conjunction with TimespanConfig.

sampleMethod

enum(SampleMethod)

How to sample rows if not all rows are scanned; see SampleMethod below.

excludedFields[]

object(FieldId)

References to fields excluded from scanning. This allows you to skip inspection of entire columns which you know have no findings.
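
As an illustration with placeholder identifiers, a BigQueryOptions value that samples 20% of a table's rows and skips a known-clean column might be:

{
  "tableReference": {
    "projectId": "my-project",
    "datasetId": "my_dataset",
    "tableId": "users"
  },
  "rowsLimitPercent": 20,
  "sampleMethod": "RANDOM_START",  // see the SampleMethod enum below
  "excludedFields": [ { "name": "internal_row_id" } ]
}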

SampleMethod

How to sample rows if not all rows are scanned. Meaningful only when used in conjunction with rowsLimit. If not specified, scanning starts from the top.

Enums
SAMPLE_METHOD_UNSPECIFIED
TOP Scan from the top (default).
RANDOM_START Randomly pick the row to start scanning. The scanned rows are contiguous.

TimespanConfig

Configuration of the timespan of the items to include in scanning. Currently only supported when inspecting Google Cloud Storage and BigQuery.

JSON representation
{
  "startTime": string,
  "endTime": string,
  "timestampField": {
    object(FieldId)
  },
  "enableAutoPopulationOfTimespanConfig": boolean
}
Fields
startTime

string (Timestamp format)

Exclude files or rows older than this value.

A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z".

endTime

string (Timestamp format)

Exclude files or rows newer than this value. If set to zero, no upper time limit is applied.

A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z".

timestampField

object(FieldId)

Specification of the field containing the timestamp of scanned items. Used for data sources like Datastore and BigQuery. If not specified for BigQuery, the table's last modification timestamp is checked against the given time span. The valid data types of the timestamp field are: for BigQuery, timestamp, date, and datetime; for Datastore, timestamp. A Datastore entity will be scanned if the timestamp property does not exist or its value is empty or invalid.

enableAutoPopulationOfTimespanConfig

boolean

When the job is started by a JobTrigger, a valid startTime is determined automatically to avoid scanning files that have not been modified since the last execution of the JobTrigger; the cutoff is based on the time of that last run.
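
For example, a hypothetical TimespanConfig restricting a scan to items stamped within June 2018 by a placeholder update_time column:

{
  "startTime": "2018-06-01T00:00:00Z",
  "endTime": "2018-07-01T00:00:00Z",
  "timestampField": { "name": "update_time" }
}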

Action

A task to execute on the completion of a job. See https://cloud.google.com/dlp/docs/concepts-actions to learn more.

JSON representation
{

  // Union field action can be only one of the following:
  "saveFindings": {
    object(SaveFindings)
  },
  "pubSub": {
    object(PublishToPubSub)
  },
  "publishSummaryToCscc": {
    object(PublishSummaryToCscc)
  }
  // End of list of possible types for union field action.
}
Fields

Union field action.

action can be only one of the following:

saveFindings

object(SaveFindings)

Save resulting findings in a provided location.

pubSub

object(PublishToPubSub)

Publish a notification to a Cloud Pub/Sub topic.

publishSummaryToCscc

object(PublishSummaryToCscc)

Publish summary to Cloud Security Command Center (Alpha).

SaveFindings

If set, the detailed findings will be persisted to the specified OutputStorageConfig. Only a single instance of this action can be specified. Compatible with: Inspect, Risk

JSON representation
{
  "outputConfig": {
    object(OutputStorageConfig)
  }
}
Fields
outputConfig

object(OutputStorageConfig)

The cloud repository in which to persist findings; see OutputStorageConfig below.

OutputStorageConfig

Cloud repository for storing output.

JSON representation
{
  "outputSchema": enum(OutputSchema),
  "table": {
    object(BigQueryTable)
  }
}
Fields
outputSchema

enum(OutputSchema)

Schema used for writing the findings for Inspect jobs. This field is only used for Inspect and must be unspecified for Risk jobs. Columns are derived from the Finding object. If appending to an existing table, any columns from the predefined schema that are missing will be added. No columns in the existing table will be deleted.

If unspecified, then all available columns will be used for a new table or an (existing) table with no schema, and no changes will be made to an existing table that has a schema.

table

object(BigQueryTable)

Store findings in an existing table or a new table in an existing dataset. If tableId is not set, a new one will be generated with the format dlp_googleapis_yyyy_mm_dd_[dlp_job_id]; the Pacific time zone is used when generating the date details.

For Inspect, each column in an existing output table must have the same name, type, and mode as a field in the Finding object.

For Risk, an existing output table should be the output of a previous Risk analysis job run on the same source table, with the same privacy metric and quasi-identifiers. Risk jobs that analyze the same table but compute a different privacy metric, or use different sets of quasi-identifiers, cannot store their results in the same table.
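
As a sketch, a SaveFindings action writing basic columns to a placeholder dataset, and letting the service generate the table name, might look like:

{
  "saveFindings": {
    "outputConfig": {
      "outputSchema": "BASIC_COLUMNS",
      "table": {
        "projectId": "my-project",
        "datasetId": "dlp_results"
        // tableId omitted: a dlp_googleapis_yyyy_mm_dd_[dlp_job_id] name is generated
      }
    }
  }
}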

OutputSchema

Predefined schemas for storing findings.

Enums
OUTPUT_SCHEMA_UNSPECIFIED
BASIC_COLUMNS Basic schema including only infoType, quote, certainty, and timestamp.
GCS_COLUMNS Schema tailored to findings from scanning Google Cloud Storage.
DATASTORE_COLUMNS Schema tailored to findings from scanning Google Datastore.
BIG_QUERY_COLUMNS Schema tailored to findings from scanning Google BigQuery.
ALL_COLUMNS Schema containing all columns.

PublishToPubSub

Publish the results of a DlpJob to a Cloud Pub/Sub topic. Compatible with: Inspect, Risk

JSON representation
{
  "topic": string
}
Fields
topic

string

Cloud Pub/Sub topic to send notifications to. The topic must have granted publishing access rights to the DLP API service account that executes the long-running DlpJob sending the notifications. The format is projects/{project}/topics/{topic}.
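
For example, with a placeholder project and topic name:

{
  "topic": "projects/my-project/topics/dlp-job-notifications"
}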

PublishSummaryToCscc

Publish the result summary of a DlpJob to Cloud Security Command Center (CSCC, Alpha). This action is only available for projects that are part of an organization and whitelisted for the alpha Cloud Security Command Center. The action publishes the count of finding instances and their infoTypes. The summary of findings is persisted in CSCC and governed by CSCC service-specific policy; see https://cloud.google.com/terms/service-terms. Only a single instance of this action can be specified. Compatible with: Inspect
