InspectJobConfig

Controls what and how to inspect for findings.

JSON representation
{
  "storageConfig": {
    object(StorageConfig)
  },
  "inspectConfig": {
    object(InspectConfig)
  },
  "inspectTemplateName": string,
  "actions": [
    {
      object(Action)
    }
  ]
}
Fields
storageConfig

object(StorageConfig)

The data to scan.

inspectConfig

object(InspectConfig)

How and what to scan for.

inspectTemplateName

string

If provided, this template will be used as the default for all values in InspectConfig; the inspectConfig specified here will be merged into the values persisted as part of the template.

actions[]

object(Action)

Actions to execute at the completion of the job; they are executed in the order provided.
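
For orientation, here is an illustrative (not normative) InspectJobConfig that scans a Cloud Storage bucket for email addresses and saves findings to BigQuery. The project, bucket, and dataset names are placeholders, and the infoTypes field belongs to InspectConfig, which is documented separately:

{
  "storageConfig": {
    "cloudStorageOptions": {
      "fileSet": { "url": "gs://my-bucket/*" }
    }
  },
  "inspectConfig": {
    "infoTypes": [ { "name": "EMAIL_ADDRESS" } ]
  },
  "actions": [
    {
      "saveFindings": {
        "outputConfig": {
          "table": {
            "projectId": "my-project",
            "datasetId": "dlp_results"
          }
        }
      }
    }
  ]
}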

StorageConfig

Shared message indicating Cloud storage type.

JSON representation
{
  "timespanConfig": {
    object(TimespanConfig)
  },

  // Union field type can be only one of the following:
  "datastoreOptions": {
    object(DatastoreOptions)
  },
  "cloudStorageOptions": {
    object(CloudStorageOptions)
  },
  "bigQueryOptions": {
    object(BigQueryOptions)
  }
  // End of list of possible types for union field type.
}
Fields
timespanConfig

object(TimespanConfig)

Configuration of the timespan of the items to include in scanning; see TimespanConfig below.

Union field type.

type can be only one of the following:

datastoreOptions

object(DatastoreOptions)

Google Cloud Datastore options specification.

cloudStorageOptions

object(CloudStorageOptions)

Google Cloud Storage options specification.

bigQueryOptions

object(BigQueryOptions)

BigQuery options specification.
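
To make the union concrete, here is a hypothetical StorageConfig that targets a BigQuery table and lets the service auto-populate the timespan; all identifiers are placeholders:

{
  "timespanConfig": {
    "enableAutoPopulationOfTimespanConfig": true
  },

  // Exactly one of the three options messages may be set:
  "bigQueryOptions": {
    "tableReference": {
      "projectId": "my-project",
      "datasetId": "my_dataset",
      "tableId": "users"
    }
  }
}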

DatastoreOptions

Options defining a data set within Google Cloud Datastore.

JSON representation
{
  "partitionId": {
    object(PartitionId)
  },
  "kind": {
    object(KindExpression)
  }
}
Fields
partitionId

object(PartitionId)

A partition ID identifies a grouping of entities. The grouping is always by project and namespace, though the namespace ID may be empty.

kind

object(KindExpression)

The kind to process.
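
For illustration, a DatastoreOptions value that scans the Person kind in the default namespace of a hypothetical project could look like:

{
  "partitionId": {
    "projectId": "my-project"
    // namespaceId omitted: the default (empty) namespace is used
  },
  "kind": {
    "name": "Person"
  }
}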

PartitionId

Datastore partition ID. A partition ID identifies a grouping of entities. The grouping is always by project and namespace, though the namespace ID may be empty.

A partition ID contains two dimensions: the project ID and the namespace ID.

JSON representation
{
  "projectId": string,
  "namespaceId": string
}
Fields
projectId

string

The ID of the project to which the entities belong.

namespaceId

string

If not empty, the ID of the namespace to which the entities belong.

KindExpression

A representation of a Datastore kind.

JSON representation
{
  "name": string
}
Fields
name

string

The name of the kind.

CloudStorageOptions

Options defining a file or a set of files within a Google Cloud Storage bucket.

JSON representation
{
  "fileSet": {
    object(FileSet)
  },
  "bytesLimitPerFile": string,
  "bytesLimitPerFilePercent": number,
  "fileTypes": [
    enum(FileType)
  ],
  "sampleMethod": enum(SampleMethod),
  "filesLimitPercent": number
}
Fields
fileSet

object(FileSet)

The set of one or more files to scan.

bytesLimitPerFile

string (int64 format)

Max number of bytes to scan from a file. If a scanned file's size is bigger than this value, then the rest of the bytes are omitted. Only one of bytesLimitPerFile and bytesLimitPerFilePercent can be specified.

bytesLimitPerFilePercent

number

Max percentage of bytes to scan from a file. The rest are omitted. The number of bytes scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one of bytesLimitPerFile and bytesLimitPerFilePercent can be specified.

fileTypes[]

enum(FileType)

List of file type groups to include in the scan. If empty, all files are scanned and available data format processors are applied. In addition, the binary content of the selected files is always scanned as well.

sampleMethod

enum(SampleMethod)

How to sample bytes if not all bytes are scanned; see SampleMethod below.

filesLimitPercent

number

Limits the number of files to scan to this percentage of the input FileSet. The number of files scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and 100 mean no limit. Defaults to 0.
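
As a sketch, a CloudStorageOptions value that samples at most 1 MiB from each text file under a placeholder bucket might be:

{
  "fileSet": {
    "url": "gs://my-bucket/data/*"
  },
  "bytesLimitPerFile": "1048576",  // int64 values are encoded as strings
  "fileTypes": [ "TEXT_FILE" ],
  "sampleMethod": "RANDOM_START"   // see the SampleMethod enum below
}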

FileSet

Set of files to scan.

JSON representation
{
  "url": string,
  "regexFileSet": {
    object(CloudStorageRegexFileSet)
  }
}
Fields
url

string

The Cloud Storage URL of the file(s) to scan, in the format gs://<bucket>/<path>. A trailing wildcard in the path is allowed. Exactly one of url or regexFileSet must be set.

regexFileSet

object(CloudStorageRegexFileSet)

The regex-filtered set of files to scan. Exactly one of url or regexFileSet must be set.

CloudStorageRegexFileSet

Message representing a set of files in a Cloud Storage bucket. Regular expressions are used to allow fine-grained control over which files in the bucket to include.

Included files are those that match at least one item in includeRegex and do not match any items in excludeRegex. Note that a file that matches items from both lists will not be included. For a match to occur, the entire file path (i.e., everything in the URL after the bucket name) must match the regular expression.

For example, given the input {bucketName: "mybucket", includeRegex: ["directory1/.*"], excludeRegex: ["directory1/excluded.*"]}:

  • gs://mybucket/directory1/myfile will be included
  • gs://mybucket/directory1/directory2/myfile will be included (.* matches across /)
  • gs://mybucket/directory0/directory1/myfile will not be included (the full path doesn't match any items in includeRegex)
  • gs://mybucket/directory1/excludedfile will not be included (the path matches an item in excludeRegex)

If includeRegex is left empty, it will match all files by default (this is equivalent to setting includeRegex: [".*"]).

Some other common use cases:

  • {bucketName: "mybucket", excludeRegex: [".*\.pdf"]} will include all files in mybucket except for .pdf files
  • {bucketName: "mybucket", includeRegex: ["directory/[^/]+"]} will include all files directly under gs://mybucket/directory/, without matching across /
JSON representation
{
  "bucketName": string,
  "includeRegex": [
    string
  ],
  "excludeRegex": [
    string
  ]
}
Fields
bucketName

string

The name of a Cloud Storage bucket. Required.

includeRegex[]

string

A list of regular expressions matching file paths to include. All files in the bucket that match at least one of these regular expressions will be included in the set of files, except for those that also match an item in excludeRegex. Leaving this field empty will match all files by default (this is equivalent to including .* in the list).

Regular expressions use RE2 syntax; a guide can be found under the google/re2 repository on GitHub.

excludeRegex[]

string

A list of regular expressions matching file paths to exclude. All files in the bucket that match at least one of these regular expressions will be excluded from the scan.

Regular expressions use RE2 syntax; a guide can be found under the google/re2 repository on GitHub.
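
Putting these fields together, the first example above (from the CloudStorageRegexFileSet introduction) is expressed as a complete JSON value like so:

{
  "bucketName": "mybucket",
  "includeRegex": [
    "directory1/.*"
  ],
  "excludeRegex": [
    "directory1/excluded.*"
  ]
}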

FileType

Definitions of file type groups to scan.

Enums
FILE_TYPE_UNSPECIFIED Includes all files.
BINARY_FILE Includes all file extensions not covered by text file types.
TEXT_FILE Included file extensions: asc, brf, c, cc, cpp, csv, cxx, c++, cs, css, dart, eml, go, h, hh, hpp, hxx, h++, hs, html, htm, shtml, shtm, xhtml, lhs, ini, java, js, json, ocaml, md, mkd, markdown, m, ml, mli, pl, pm, php, phtml, pht, py, pyw, rb, rbw, rs, rc, scala, sh, sql, tex, txt, text, tsv, vcard, vcs, wml, xml, xsl, xsd, yml, yaml.

SampleMethod

How to sample bytes if not all bytes are scanned. Meaningful only when used in conjunction with bytesLimitPerFile. If not specified, scanning starts from the top.

Enums
SAMPLE_METHOD_UNSPECIFIED
TOP Scan from the top (default).
RANDOM_START For each file larger than bytesLimitPerFile, randomly pick the offset to start scanning. The scanned bytes are contiguous.

BigQueryOptions

Options defining BigQuery table and row identifiers.

JSON representation
{
  "tableReference": {
    object(BigQueryTable)
  },
  "identifyingFields": [
    {
      object(FieldId)
    }
  ],
  "rowsLimit": string,
  "rowsLimitPercent": number,
  "sampleMethod": enum(SampleMethod),
  "excludedFields": [
    {
      object(FieldId)
    }
  ]
}
Fields
tableReference

object(BigQueryTable)

Complete BigQuery table reference.

identifyingFields[]

object(FieldId)

References to fields uniquely identifying rows within the table. Nested fields, in a format like person.birthdate.year, are allowed.

rowsLimit

string (int64 format)

Max number of rows to scan. If the table has more rows than this value, the rest of the rows are omitted. If not set, or if set to 0, all rows will be scanned. Only one of rowsLimit and rowsLimitPercent can be specified. Cannot be used in conjunction with TimespanConfig.

rowsLimitPercent

number

Max percentage of rows to scan. The rest are omitted. The number of rows scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one of rowsLimit and rowsLimitPercent can be specified. Cannot be used in conjunction with TimespanConfig.

sampleMethod

enum(SampleMethod)

How to sample rows if not all rows are scanned; see SampleMethod below.

excludedFields[]

object(FieldId)

References to fields excluded from scanning. This allows you to skip inspection of entire columns which you know have no findings.
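
As an illustration with placeholder identifiers, a BigQueryOptions value that samples 20% of a table's rows and skips a known-clean column might be:

{
  "tableReference": {
    "projectId": "my-project",
    "datasetId": "my_dataset",
    "tableId": "users"
  },
  "rowsLimitPercent": 20,
  "sampleMethod": "RANDOM_START",  // see the SampleMethod enum below
  "excludedFields": [ { "name": "internal_row_id" } ]
}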

SampleMethod

How to sample rows if not all rows are scanned. Meaningful only when used in conjunction with rowsLimit. If not specified, scanning starts from the top.

Enums
SAMPLE_METHOD_UNSPECIFIED
TOP Scan from the top (default).
RANDOM_START Randomly pick the row to start scanning. The scanned rows are contiguous.

TimespanConfig

Configuration of the timespan of the items to include in scanning. Currently only supported when inspecting Google Cloud Storage and BigQuery.

JSON representation
{
  "startTime": string,
  "endTime": string,
  "timestampField": {
    object(FieldId)
  },
  "enableAutoPopulationOfTimespanConfig": boolean
}
Fields
startTime

string (Timestamp format)

Exclude files or rows older than this value.

A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z".

endTime

string (Timestamp format)

Exclude files or rows newer than this value. If set to zero, no upper time limit is applied.

A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z".

timestampField

object(FieldId)

Specification of the field containing the timestamp of scanned items. Used for data sources like Datastore and BigQuery. If not specified for BigQuery, the table's last modification timestamp is checked against the given time span. The valid data types of the timestamp field are: for BigQuery, timestamp, date, and datetime; for Datastore, timestamp. A Datastore entity will be scanned if the timestamp property does not exist or its value is empty or invalid.

enableAutoPopulationOfTimespanConfig

boolean

When the job is started by a JobTrigger, a valid startTime is determined automatically to avoid scanning files that have not been modified since the last execution of the JobTrigger; the cutoff is based on the time of that last run.
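
For example, a hypothetical TimespanConfig restricting a scan to items stamped within June 2018 by a placeholder update_time column:

{
  "startTime": "2018-06-01T00:00:00Z",
  "endTime": "2018-07-01T00:00:00Z",
  "timestampField": { "name": "update_time" }
}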

Action

A task to execute on the completion of a job. See https://cloud.google.com/dlp/docs/concepts-actions to learn more.

JSON representation
{

  // Union field action can be only one of the following:
  "saveFindings": {
    object(SaveFindings)
  },
  "pubSub": {
    object(PublishToPubSub)
  },
  "publishSummaryToCscc": {
    object(PublishSummaryToCscc)
  }
  // End of list of possible types for union field action.
}
Fields

Union field action.

action can be only one of the following:

saveFindings

object(SaveFindings)

Save resulting findings in a provided location.

pubSub

object(PublishToPubSub)

Publish a notification to a Cloud Pub/Sub topic.

publishSummaryToCscc

object(PublishSummaryToCscc)

Publish summary to Cloud Security Command Center (Alpha).

SaveFindings

If set, the detailed findings will be persisted to the specified OutputStorageConfig. Only a single instance of this action can be specified. Compatible with: Inspect, Risk

JSON representation
{
  "outputConfig": {
    object(OutputStorageConfig)
  }
}
Fields
outputConfig

object(OutputStorageConfig)

The cloud repository in which to persist findings; see OutputStorageConfig below.

OutputStorageConfig

Cloud repository for storing output.

JSON representation
{
  "outputSchema": enum(OutputSchema),
  "table": {
    object(BigQueryTable)
  }
}
Fields
outputSchema

enum(OutputSchema)

Schema used for writing the findings for Inspect jobs. This field is only used for Inspect and must be unspecified for Risk jobs. Columns are derived from the Finding object. If appending to an existing table, any columns from the predefined schema that are missing will be added. No columns in the existing table will be deleted.

If unspecified, then all available columns will be used for a new table or an (existing) table with no schema, and no changes will be made to an existing table that has a schema.

table

object(BigQueryTable)

Store findings in an existing table or a new table in an existing dataset. If tableId is not set, a new one will be generated with the format dlp_googleapis_yyyy_mm_dd_[dlp_job_id]; the Pacific time zone is used when generating the date details.

For Inspect, each column in an existing output table must have the same name, type, and mode as a field in the Finding object.

For Risk, an existing output table should be the output of a previous Risk analysis job run on the same source table, with the same privacy metric and quasi-identifiers. Risk jobs that analyze the same table but compute a different privacy metric, or use different sets of quasi-identifiers, cannot store their results in the same table.
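
As a sketch, a SaveFindings action writing basic columns to a placeholder dataset, and letting the service generate the table name, might look like:

{
  "saveFindings": {
    "outputConfig": {
      "outputSchema": "BASIC_COLUMNS",
      "table": {
        "projectId": "my-project",
        "datasetId": "dlp_results"
        // tableId omitted: a dlp_googleapis_yyyy_mm_dd_[dlp_job_id] name is generated
      }
    }
  }
}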

OutputSchema

Predefined schemas for storing findings.

Enums
OUTPUT_SCHEMA_UNSPECIFIED
BASIC_COLUMNS Basic schema including only infoType, quote, certainty, and timestamp.
GCS_COLUMNS Schema tailored to findings from scanning Google Cloud Storage.
DATASTORE_COLUMNS Schema tailored to findings from scanning Google Datastore.
BIG_QUERY_COLUMNS Schema tailored to findings from scanning Google BigQuery.
ALL_COLUMNS Schema containing all columns.

PublishToPubSub

Publish the results of a DlpJob to a Cloud Pub/Sub topic. Compatible with: Inspect, Risk

JSON representation
{
  "topic": string
}
Fields
topic

string

Cloud Pub/Sub topic to send notifications to. The topic must have granted publishing access rights to the DLP API service account that executes the long-running DlpJob sending the notifications. The format is projects/{project}/topics/{topic}.
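
For example, with a placeholder project and topic name:

{
  "topic": "projects/my-project/topics/dlp-job-notifications"
}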

PublishSummaryToCscc

Publish the result summary of a DlpJob to Cloud Security Command Center (CSCC, Alpha). This action is only available for projects that are part of an organization and whitelisted for the alpha Cloud Security Command Center. The action publishes the count of finding instances and their infoTypes. The summary of findings is persisted in CSCC and governed by CSCC service-specific policy; see https://cloud.google.com/terms/service-terms. Only a single instance of this action can be specified. Compatible with: Inspect
