- JSON representation
- StorageConfig
- DatastoreOptions
- PartitionId
- KindExpression
- CloudStorageOptions
- FileSet
- CloudStorageRegexFileSet
- SampleMethod
- BigQueryOptions
- SampleMethod
- HybridOptions
- TableOptions
- TimespanConfig
Controls what and how to inspect for findings.
| JSON representation |
| --- |
| `{ "storageConfig": { object (StorageConfig) }, "inspectConfig": { object (InspectConfig) }, "inspectTemplateName": string, "actions": [ { object (Action) } ] }` |

| Fields | |
| --- | --- |
| `storageConfig` | The data to scan. |
| `inspectConfig` | How and what to scan for. |
| `inspectTemplateName` | If provided, will be used as the default for all values in InspectConfig. |
| `actions[]` | Actions to execute at the completion of the job. |
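As a sketch, a minimal `InspectJobConfig` that scans a Cloud Storage bucket for email addresses and publishes a summary to Security Command Center might look like the following; the bucket path is a placeholder:

```json
{
  "storageConfig": {
    "cloudStorageOptions": {
      "fileSet": { "url": "gs://example-bucket/*" }
    }
  },
  "inspectConfig": {
    "infoTypes": [ { "name": "EMAIL_ADDRESS" } ]
  },
  "actions": [
    { "publishSummaryToCscc": {} }
  ]
}
```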
StorageConfig
Shared message indicating Cloud storage type.
| JSON representation |
| --- |
| `{ "timespanConfig": { object (TimespanConfig) }, "datastoreOptions": { object (DatastoreOptions) }, "cloudStorageOptions": { object (CloudStorageOptions) }, "bigQueryOptions": { object (BigQueryOptions) }, "hybridOptions": { object (HybridOptions) } }` |

| Fields | |
| --- | --- |
| `timespanConfig` | Configuration of the timespan of the items to include in scanning. |
| Union field `type`. Type of storage system to inspect. `type` can be only one of the following: | |
| `datastoreOptions` | Google Cloud Datastore options. |
| `cloudStorageOptions` | Cloud Storage options. |
| `bigQueryOptions` | BigQuery options. |
| `hybridOptions` | Hybrid inspection options. |
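For example, a `StorageConfig` sketch that selects the `cloudStorageOptions` branch of the union and adds a timespan filter could look like this; the bucket path is a placeholder:

```json
{
  "cloudStorageOptions": {
    "fileSet": { "url": "gs://example-bucket/*" }
  },
  "timespanConfig": {
    "enableAutoPopulationOfTimespanConfig": true
  }
}
```

Because `type` is a union field, setting more than one of `datastoreOptions`, `cloudStorageOptions`, `bigQueryOptions`, or `hybridOptions` in the same `StorageConfig` is invalid.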
DatastoreOptions
Options defining a data set within Google Cloud Datastore.
| JSON representation |
| --- |
| `{ "partitionId": { object (PartitionId) }, "kind": { object (KindExpression) } }` |

| Fields | |
| --- | --- |
| `partitionId` | A partition ID identifies a grouping of entities. The grouping is always by project and namespace, however the namespace ID may be empty. |
| `kind` | The kind to process. |
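A minimal `DatastoreOptions` value might look like the following sketch; the project, namespace, and kind names are placeholders:

```json
{
  "partitionId": {
    "projectId": "example-project",
    "namespaceId": "example-namespace"
  },
  "kind": { "name": "Task" }
}
```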
PartitionId
Datastore partition ID. A partition ID identifies a grouping of entities. The grouping is always by project and namespace, however the namespace ID may be empty.
A partition ID contains several dimensions: project ID and namespace ID.
| JSON representation |
| --- |
| `{ "projectId": string, "namespaceId": string }` |

| Fields | |
| --- | --- |
| `projectId` | The ID of the project to which the entities belong. |
| `namespaceId` | If not empty, the ID of the namespace to which the entities belong. |
KindExpression
A representation of a Datastore kind.
| JSON representation |
| --- |
| `{ "name": string }` |

| Fields | |
| --- | --- |
| `name` | The name of the kind. |
CloudStorageOptions
Options defining a file or a set of files within a Cloud Storage bucket.
| JSON representation |
| --- |
| `{ "fileSet": { object (FileSet) }, "bytesLimitPerFile": string, "bytesLimitPerFilePercent": integer, "fileTypes": [ enum (FileType) ], "sampleMethod": enum (SampleMethod), "filesLimitPercent": integer }` |

| Fields | |
| --- | --- |
| `fileSet` | The set of one or more files to scan. |
| `bytesLimitPerFile` | Max number of bytes to scan from a file. If a scanned file's size is bigger than this value then the rest of the bytes are omitted. Only one of `bytesLimitPerFile` and `bytesLimitPerFilePercent` can be specified. This field can't be set if de-identification is requested. For certain file types, setting this field has no effect. For more information, see Limits on bytes scanned per file. |
| `bytesLimitPerFilePercent` | Max percentage of bytes to scan from a file. The rest are omitted. The number of bytes scanned is rounded down. Must be between 0 and 100, inclusively. Both 0 and 100 mean no limit. Defaults to 0. Only one of `bytesLimitPerFile` and `bytesLimitPerFilePercent` can be specified. This field can't be set if de-identification is requested. For certain file types, setting this field has no effect. For more information, see Limits on bytes scanned per file. |
| `fileTypes[]` | List of file type groups to include in the scan. If empty, all files are scanned and available data format processors are applied. In addition, the binary content of the selected files is always scanned as well. Images are scanned only as binary if the specified region does not support image inspection and no fileTypes were specified. Image inspection is restricted to 'global', 'us', 'asia', and 'europe'. |
| `sampleMethod` | How to sample the data. |
| `filesLimitPercent` | Limits the number of files to scan to this percentage of the input FileSet. Number of files scanned is rounded down. Must be between 0 and 100, inclusively. Both 0 and 100 mean no limit. Defaults to 0. |
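Putting these limits together, a `CloudStorageOptions` sketch that scans half of the matching files and at most 10% of each file's bytes might look like this; the bucket path is a placeholder, and the example assumes the `TEXT_FILE` and `CSV` file type groups:

```json
{
  "fileSet": { "url": "gs://example-bucket/reports/*" },
  "bytesLimitPerFilePercent": 10,
  "fileTypes": [ "TEXT_FILE", "CSV" ],
  "filesLimitPercent": 50
}
```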
FileSet
Set of files to scan.
| JSON representation |
| --- |
| `{ "url": string, "regexFileSet": { object (CloudStorageRegexFileSet) } }` |

| Fields | |
| --- | --- |
| `url` | The Cloud Storage url of the file(s) to scan, in the format `gs://<bucket>/<path>`. Trailing wildcard in the path is allowed. If the url ends in a trailing slash, the bucket or directory represented by the url will be scanned non-recursively (content in sub-directories will not be scanned). This means that `gs://mybucket/` is equivalent to `gs://mybucket/*`, and `gs://mybucket/directory/` is equivalent to `gs://mybucket/directory/*`. Exactly one of `url` or `regexFileSet` must be set. |
| `regexFileSet` | The regex-filtered set of files to scan. Exactly one of `url` or `regexFileSet` must be set. |
CloudStorageRegexFileSet
Message representing a set of files in a Cloud Storage bucket. Regular expressions are used to allow fine-grained control over which files in the bucket to include.
Included files are those that match at least one item in `includeRegex` and do not match any items in `excludeRegex`. Note that a file that matches items from both lists will not be included. For a match to occur, the entire file path (i.e., everything in the url after the bucket name) must match the regular expression.

For example, given the input `{bucketName: "mybucket", includeRegex: ["directory1/.*"], excludeRegex: ["directory1/excluded.*"]}`:

- `gs://mybucket/directory1/myfile` will be included
- `gs://mybucket/directory1/directory2/myfile` will be included (`.*` matches across `/`)
- `gs://mybucket/directory0/directory1/myfile` will not be included (the full path doesn't match any items in `includeRegex`)
- `gs://mybucket/directory1/excludedfile` will not be included (the path matches an item in `excludeRegex`)

If `includeRegex` is left empty, it will match all files by default (this is equivalent to setting `includeRegex: [".*"]`).

Some other common use cases:

- `{bucketName: "mybucket", excludeRegex: [".*\.pdf"]}` will include all files in `mybucket` except for .pdf files
- `{bucketName: "mybucket", includeRegex: ["directory/[^/]+"]}` will include all files directly under `gs://mybucket/directory/`, without matching across `/`
| JSON representation |
| --- |
| `{ "bucketName": string, "includeRegex": [ string ], "excludeRegex": [ string ] }` |

| Fields | |
| --- | --- |
| `bucketName` | The name of a Cloud Storage bucket. Required. |
| `includeRegex[]` | A list of regular expressions matching file paths to include. All files in the bucket that match at least one of these regular expressions will be included in the set of files, except for those that also match an item in `excludeRegex`. Leaving this field empty will match all files by default (this is equivalent to including `.*` in the list). Regular expressions use RE2 syntax; a guide can be found under the google/re2 repository on GitHub. |
| `excludeRegex[]` | A list of regular expressions matching file paths to exclude. All files in the bucket that match at least one of these regular expressions will be excluded from the scan. Regular expressions use RE2 syntax; a guide can be found under the google/re2 repository on GitHub. |
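Written out as a full `CloudStorageRegexFileSet`, the first worked example above becomes:

```json
{
  "bucketName": "mybucket",
  "includeRegex": [ "directory1/.*" ],
  "excludeRegex": [ "directory1/excluded.*" ]
}
```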
SampleMethod
How to sample bytes if not all bytes are scanned. Meaningful only when used in conjunction with bytesLimitPerFile. If not specified, scanning would start from the top.
| Enums | |
| --- | --- |
| `SAMPLE_METHOD_UNSPECIFIED` | No sampling. |
| `TOP` | Scan from the top (default). |
| `RANDOM_START` | For each file larger than `bytesLimitPerFile`, randomly pick the offset to start scanning. The scanned bytes are contiguous. |
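A sketch of how `sampleMethod` pairs with `bytesLimitPerFile` inside `CloudStorageOptions`; the bucket path is a placeholder, and `"1048576"` is 1 MiB:

```json
{
  "fileSet": { "url": "gs://example-bucket/archive/*" },
  "bytesLimitPerFile": "1048576",
  "sampleMethod": "RANDOM_START"
}
```

With this configuration, each file larger than 1 MiB is scanned as a single contiguous 1 MiB window starting at a randomly chosen offset.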
BigQueryOptions
Options defining BigQuery table and row identifiers.
| JSON representation |
| --- |
| `{ "tableReference": { object (BigQueryTable) }, "identifyingFields": [ { object (FieldId) } ], "rowsLimit": string, "rowsLimitPercent": integer, "sampleMethod": enum (SampleMethod), "excludedFields": [ { object (FieldId) } ], "includedFields": [ { object (FieldId) } ] }` |

| Fields | |
| --- | --- |
| `tableReference` | Complete BigQuery table reference. |
| `identifyingFields[]` | Table fields that may uniquely identify a row within the table. When `actions.saveFindings.outputConfig.table` is specified, the values of columns specified here are available in the output table under `location.content_locations.record_location.record_key.id_values`. Nested fields such as `person.birthdate.year` are allowed. |
| `rowsLimit` | Max number of rows to scan. If the table has more rows than this value, the rest of the rows are omitted. If not set, or if set to 0, all rows will be scanned. Only one of `rowsLimit` and `rowsLimitPercent` can be specified. Cannot be used in conjunction with TimespanConfig. |
| `rowsLimitPercent` | Max percentage of rows to scan. The rest are omitted. The number of rows scanned is rounded down. Must be between 0 and 100, inclusively. Both 0 and 100 mean no limit. Defaults to 0. Only one of `rowsLimit` and `rowsLimitPercent` can be specified. Cannot be used in conjunction with TimespanConfig. Caution: A known issue is causing the `rowsLimitPercent` field to behave unexpectedly. We recommend using `rowsLimit` instead. |
| `sampleMethod` | How to sample the data. |
| `excludedFields[]` | References to fields excluded from scanning. This allows you to skip inspection of entire columns which you know have no findings. When inspecting a table, we recommend that you inspect all columns. Otherwise, findings might be affected because hints from excluded columns will not be used. |
| `includedFields[]` | Limit scanning only to these fields. When inspecting a table, we recommend that you inspect all columns. Otherwise, findings might be affected because hints from excluded columns will not be used. |
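A `BigQueryOptions` sketch that samples up to 10,000 rows starting at random row groups; the project, dataset, table, and column names are placeholders:

```json
{
  "tableReference": {
    "projectId": "example-project",
    "datasetId": "example_dataset",
    "tableId": "example_table"
  },
  "rowsLimit": "10000",
  "sampleMethod": "RANDOM_START",
  "identifyingFields": [ { "name": "row_id" } ]
}
```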
SampleMethod
How to sample rows if not all rows are scanned. Meaningful only when used in conjunction with either rowsLimit or rowsLimitPercent. If not specified, rows are scanned in the order BigQuery reads them.
| Enums | |
| --- | --- |
| `SAMPLE_METHOD_UNSPECIFIED` | No sampling. |
| `TOP` | Scan groups of rows in the order BigQuery provides (default). Multiple groups of rows may be scanned in parallel, so results may not appear in the same order the rows are read. |
| `RANDOM_START` | Randomly pick groups of rows to scan. |
HybridOptions
Configuration to control jobs where the content being inspected is outside of Google Cloud Platform.
| JSON representation |
| --- |
| `{ "description": string, "requiredFindingLabelKeys": [ string ], "labels": { string: string, ... }, "tableOptions": { object (TableOptions) } }` |

| Fields | |
| --- | --- |
| `description` | A short description of where the data is coming from. Will be stored once in the job. 256 max length. |
| `requiredFindingLabelKeys[]` | These are labels that each inspection request must include within their 'finding_labels' map. Requests may contain other labels, but a request missing any one of these will be rejected. Label keys must be between 1 and 63 characters long and must conform to the following regular expression: `[a-z]([-a-z0-9]*[a-z0-9])?`. No more than 10 keys can be required. |
| `labels` | To organize findings, these labels will be added to each finding. Label keys must be between 1 and 63 characters long and must conform to the following regular expression: `[a-z]([-a-z0-9]*[a-z0-9])?`. Label values must be between 0 and 63 characters long and must conform to the regular expression `([a-z]([-a-z0-9]*[a-z0-9])?)?`. No more than 10 labels can be associated with a given finding. Examples: `"environment" : "production"`, `"pipeline" : "etl"`. An object containing a list of `"key": value` pairs. Example: `{ "name": "wrench", "mass": "1.3kg", "count": "3" }`. |
| `tableOptions` | If the container is a table, additional information to make findings meaningful such as the columns that are primary keys. |
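A `HybridOptions` sketch for content streamed from outside Google Cloud; the description, label keys and values, and column name are placeholders (TableOptions is defined in the next section):

```json
{
  "description": "Records from an on-prem inventory system",
  "requiredFindingLabelKeys": [ "environment" ],
  "labels": { "pipeline": "etl" },
  "tableOptions": {
    "identifyingFields": [ { "name": "item_id" } ]
  }
}
```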
TableOptions
Instructions regarding the table content being inspected.
| JSON representation |
| --- |
| `{ "identifyingFields": [ { object (FieldId) } ] }` |

| Fields | |
| --- | --- |
| `identifyingFields[]` | The columns that are the primary keys for table objects included in ContentItem. A copy of this cell's value will be stored alongside each finding so that the finding can be traced to the specific row it came from. No more than 3 may be provided. |
TimespanConfig
Configuration of the timespan of the items to include in scanning. Currently only supported when inspecting Cloud Storage and BigQuery.
| JSON representation |
| --- |
| `{ "startTime": string, "endTime": string, "timestampField": { object (FieldId) }, "enableAutoPopulationOfTimespanConfig": boolean }` |

| Fields | |
| --- | --- |
| `startTime` | Exclude files, tables, or rows older than this value. If not set, no lower time limit is applied. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: `"2014-10-02T15:01:23Z"` and `"2014-10-02T15:01:23.045123456Z"`. |
| `endTime` | Exclude files, tables, or rows newer than this value. If not set, no upper time limit is applied. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: `"2014-10-02T15:01:23Z"` and `"2014-10-02T15:01:23.045123456Z"`. |
| `timestampField` | Specification of the field containing the timestamp of scanned items. Used for data sources like Datastore and BigQuery. **For BigQuery:** If this value is not specified and the table was modified between the given start and end times, the entire table will be scanned. If this value is specified, then rows are filtered based on the given start and end times. Rows with a `NULL` value in the provided BigQuery column are skipped. Valid data types of the provided BigQuery column are: `INTEGER`, `DATE`, `TIMESTAMP`, and `DATETIME`. If your BigQuery table is partitioned at ingestion time, you can use any of the following pseudo-columns as your timestamp field: `_PARTITIONTIME`, `_PARTITIONDATE`, `_PARTITION_LOAD_TIME`. When used with Cloud DLP, these pseudo-column names are case sensitive. **For Datastore:** If this value is specified, then entities are filtered based on the given start and end times. If an entity does not contain the provided timestamp property or contains empty or invalid values, then it is included. Valid data types of the provided timestamp property are: `TIMESTAMP`. See the known issue related to this operation. |
| `enableAutoPopulationOfTimespanConfig` | When the job is started by a JobTrigger, we will automatically figure out a valid startTime to avoid scanning files that have not been modified since the last time the JobTrigger executed. This will be based on the time of the execution of the last run of the JobTrigger or the timespan endTime used in the last run of the JobTrigger. **For BigQuery:** Inspect jobs triggered by automatic population will scan data that is at least three hours old when the job starts. This is because streaming buffer rows are not read during inspection and reading up to the current timestamp will result in skipped rows. See the known issue related to this operation. |
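A `TimespanConfig` sketch that limits a BigQuery scan to rows stamped within a one-week window; the column name is a placeholder:

```json
{
  "startTime": "2014-10-02T15:01:23Z",
  "endTime": "2014-10-09T15:01:23Z",
  "timestampField": { "name": "last_modified" }
}
```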