REST Resource: projects.locations.dataScans

Resource: DataScan

Represents a user-visible job that provides insights for the related data source.

For example:

  • Data Quality: generates queries based on the rules and runs them against the data to get data quality check results.
  • Data Profile: analyzes the data in table(s) and generates insights about the structure, content and relationships (such as null percent, cardinality, min/max/mean, etc.).
JSON representation
{
  "name": string,
  "uid": string,
  "description": string,
  "displayName": string,
  "labels": {
    string: string,
    ...
  },
  "state": enum (State),
  "createTime": string,
  "updateTime": string,
  "data": {
    object (DataSource)
  },
  "executionSpec": {
    object (ExecutionSpec)
  },
  "executionStatus": {
    object (ExecutionStatus)
  },
  "type": enum (DataScanType),

  // Union field spec can be only one of the following:
  "dataQualitySpec": {
    object (DataQualitySpec)
  },
  "dataProfileSpec": {
    object (DataProfileSpec)
  }
  // End of list of possible types for union field spec.

  // Union field result can be only one of the following:
  "dataQualityResult": {
    object (DataQualityResult)
  },
  "dataProfileResult": {
    object (DataProfileResult)
  }
  // End of list of possible types for union field result.
}
Fields
name

string

Output only. The relative resource name of the scan, of the form: projects/{project}/locations/{locationId}/dataScans/{datascan_id}, where project refers to a projectId or project_number and locationId refers to a GCP region.

uid

string

Output only. System-generated globally unique ID for the scan. This ID will be different if the scan is deleted and re-created with the same name.

description

string

Optional. Description of the scan.

  • Must be between 1 and 1024 characters.
displayName

string

Optional. User-friendly display name.

  • Must be between 1 and 256 characters.
labels

map (key: string, value: string)

Optional. User-defined labels for the scan.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

state

enum (State)

Output only. Current state of the DataScan.

createTime

string (Timestamp format)

Output only. The time when the scan was created.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".

updateTime

string (Timestamp format)

Output only. The time when the scan was last updated.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".

data

object (DataSource)

Required. The data source for DataScan.

executionSpec

object (ExecutionSpec)

Optional. DataScan execution settings.

If not specified, its fields use their default values.

executionStatus

object (ExecutionStatus)

Output only. Status of the data scan execution.

type

enum (DataScanType)

Output only. The type of DataScan.

Union field spec. Data scan related settings. This field is required and immutable: once data_quality_spec is set, it cannot be changed to data_profile_spec. spec can be only one of the following:
dataQualitySpec

object (DataQualitySpec)

DataQualityScan related setting.

dataProfileSpec

object (DataProfileSpec)

DataProfileScan related setting.

Union field result. The result of the data scan. result can be only one of the following:
dataQualityResult

object (DataQualityResult)

Output only. The result of the data quality scan.

dataProfileResult

object (DataProfileResult)

Output only. The result of the data profile scan.
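The field rules above (one member of the spec union, length limits on description and displayName, output-only fields left out) can be checked on the client before sending a request. The following is a minimal sketch, not part of the API: the function name build_data_scan_body and its parameters are hypothetical, and it only assembles a JSON-ready dict.

```python
from typing import Any, Dict, Optional

# Members of the "spec" union field; exactly one may be set.
SPEC_FIELDS = ("dataQualitySpec", "dataProfileSpec")


def build_data_scan_body(
    data: Dict[str, Any],
    spec_field: str,
    spec: Dict[str, Any],
    description: Optional[str] = None,
    display_name: Optional[str] = None,
    labels: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble the writable fields of a DataScan as a JSON-ready dict.

    Hypothetical helper: output-only fields (name, uid, state, createTime,
    updateTime, executionStatus, type, results) are never included.
    """
    if spec_field not in SPEC_FIELDS:
        raise ValueError(f"spec_field must be one of {SPEC_FIELDS}")
    if description is not None and not 1 <= len(description) <= 1024:
        raise ValueError("description must be between 1 and 1024 characters")
    if display_name is not None and not 1 <= len(display_name) <= 256:
        raise ValueError("displayName must be between 1 and 256 characters")

    # "data" is required; taking a single spec_field argument guarantees
    # that only one member of the union is present in the body.
    body: Dict[str, Any] = {"data": data, spec_field: spec}
    if description is not None:
        body["description"] = description
    if display_name is not None:
        body["displayName"] = display_name
    if labels:
        body["labels"] = dict(labels)
    return body
```

Passing the union member as one argument, rather than two optional ones, makes it impossible to set both dataQualitySpec and dataProfileSpec by accident.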

DataSource

The data source for DataScan.

JSON representation
{

  // Union field source can be only one of the following:
  "entity": string,
  "resource": string
  // End of list of possible types for union field source.
}
Fields
Union field source. The source is required and immutable. Once it is set, it cannot be changed. source can be only one of the following:
entity

string

Immutable. The Dataplex entity that represents the data source (e.g. BigQuery table) for DataScan, of the form: projects/{project_number}/locations/{locationId}/lakes/{lakeId}/zones/{zoneId}/entities/{entityId}.

resource

string

Immutable. The service-qualified full resource name of the cloud resource for a DataScan job to scan against. The field could be a BigQuery table of type "TABLE" for DataProfileScan/DataQualityScan, in the format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID
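As an illustration of the source union, the sketch below builds a DataSource dict and enforces that exactly one of entity or resource is set. The helper names bigquery_table_resource and make_data_source are hypothetical; only the resource name format comes from the description above.

```python
from typing import Dict, Optional


def bigquery_table_resource(project_id: str, dataset_id: str, table_id: str) -> str:
    """Format the service-qualified full resource name of a BigQuery table."""
    return (
        "//bigquery.googleapis.com/"
        f"projects/{project_id}/datasets/{dataset_id}/tables/{table_id}"
    )


def make_data_source(
    entity: Optional[str] = None, resource: Optional[str] = None
) -> Dict[str, str]:
    """Return a DataSource dict with exactly one member of the source union."""
    # Both set, or neither set, violates the "only one of" rule.
    if (entity is None) == (resource is None):
        raise ValueError("set exactly one of entity or resource")
    if entity is not None:
        return {"entity": entity}
    return {"resource": resource}
```

For example, make_data_source(resource=bigquery_table_resource("my-project", "sales", "orders")) produces a dict with a single "resource" key; the project, dataset and table names here are placeholders.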

ExecutionSpec

DataScan execution settings.

JSON representation
{
  "trigger": {
    object (Trigger)
  },

  // Union field incremental can be only one of the following:
  "field": string
  // End of list of possible types for union field incremental.
}
Fields
trigger

object (Trigger)

Optional. Spec related to how often and when a scan should be triggered.

If not specified, the default is OnDemand, which means the scan will not run until the user calls the dataScans.run API.

Union field incremental. Spec related to incremental scan of the data.

When an option is selected for incremental scan, it cannot be unset or changed. If not specified, a data scan will run for all data in the table. incremental can be only one of the following:

field

string

Immutable. The unnested field (of type Date or Timestamp) that contains values which monotonically increase over time.

If not specified, a data scan will run for all data in the table.
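Because an incremental option cannot be unset or changed once selected, a client may want to catch such edits before sending an update. The following is a minimal sketch of that check, assuming both specs are plain dicts in the JSON shape shown above; check_incremental_unchanged is a hypothetical name, and the sketch does not decide whether a field may be added to a scan that was created without one.

```python
from typing import Any, Dict


def check_incremental_unchanged(
    current_spec: Dict[str, Any], updated_spec: Dict[str, Any]
) -> None:
    """Raise ValueError if an update would unset or change the incremental field.

    Hypothetical client-side check; the service remains the authority.
    """
    current_field = current_spec.get("field")
    if current_field is None:
        # No incremental option was selected, so there is nothing to protect.
        return
    if updated_spec.get("field") != current_field:
        raise ValueError(
            f"incremental field {current_field!r} cannot be unset or changed"
        )
```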

Trigger

DataScan scheduling and trigger settings.

JSON representation
{

  // Union field mode can be only one of the following:
  "onDemand": {
    object (OnDemand)
  },
  "schedule": {
    object (Schedule)
  }
  // End of list of possible types for union field mode.
}
Fields

Union field mode. DataScan scheduling and trigger settings.

If not specified, the default is onDemand. mode can be only one of the following:

onDemand

object (OnDemand)

The scan runs once via the dataScans.run API.

schedule

object (Schedule)

The scan is scheduled to run periodically.

OnDemand

This type has no fields.

The scan runs once via the dataScans.run API.

Schedule

The scan is scheduled to run periodically.

JSON representation
{
  "cron": string
}
Fields
cron

string

Required. Cron schedule for running scans periodically.

To set a time zone explicitly, prefix the cron expression with "CRON_TZ=${IANA_TIME_ZONE}" or "TZ=${IANA_TIME_ZONE}". ${IANA_TIME_ZONE} must be a valid string from the IANA time zone database. For example, CRON_TZ=America/New_York 1 * * * *, or TZ=America/New_York 1 * * * *.

This field is required for Schedule scans.
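The time zone prefix described above is plain string composition, so it can be sketched in a few lines. The helpers below are hypothetical (schedule_trigger and split_time_zone are not API names) and do not validate the cron expression or the time zone name; they only build and take apart the string.

```python
from typing import Any, Dict, Optional, Tuple


def schedule_trigger(cron_expr: str, time_zone: Optional[str] = None) -> Dict[str, Any]:
    """Build a Trigger dict in schedule mode, optionally with a CRON_TZ prefix."""
    cron = f"CRON_TZ={time_zone} {cron_expr}" if time_zone else cron_expr
    return {"schedule": {"cron": cron}}


def split_time_zone(cron: str) -> Tuple[Optional[str], str]:
    """Split a cron string into (time zone or None, cron expression)."""
    first, _, rest = cron.partition(" ")
    for prefix in ("CRON_TZ=", "TZ="):
        if first.startswith(prefix):
            return first[len(prefix):], rest.strip()
    # No recognized prefix: the whole string is the cron expression.
    return None, cron
```

With the example from the field description, schedule_trigger("1 * * * *", "America/New_York") yields a cron value of "CRON_TZ=America/New_York 1 * * * *".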

ExecutionStatus

Status of the data scan execution.

JSON representation
{
  "latestJobStartTime": string,
  "latestJobEndTime": string
}
Fields
latestJobStartTime

string (Timestamp format)

The time when the latest DataScanJob started.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".

latestJobEndTime

string (Timestamp format)

The time when the latest DataScanJob ended.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".
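The timestamps in this resource can carry up to nine fractional digits, while Python's datetime stores microseconds only. The sketch below parses the "Zulu" form shown in the examples, truncates the fraction to six digits, and derives the duration of the latest job. The names parse_timestamp and latest_job_duration are hypothetical, and the truncation is a choice of this sketch, not API behavior.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

# RFC 3339 UTC "Zulu" form with an optional fraction of one to nine digits.
_TIMESTAMP = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,9}))?Z$"
)


def parse_timestamp(value: str) -> datetime:
    """Parse a "Zulu" timestamp into an aware UTC datetime (microsecond precision)."""
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"not an RFC 3339 UTC 'Zulu' timestamp: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    # Pad the fraction to nine digits, then keep the first six (microseconds).
    fraction = (match.group(7) or "").ljust(9, "0")
    return datetime(
        year, month, day, hour, minute, second, int(fraction[:6]), tzinfo=timezone.utc
    )


def latest_job_duration(status: Dict[str, Any]) -> Optional[timedelta]:
    """Return end minus start for an ExecutionStatus dict, or None if either is missing."""
    start = status.get("latestJobStartTime")
    end = status.get("latestJobEndTime")
    if not start or not end:
        return None
    return parse_timestamp(end) - parse_timestamp(start)
```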

Methods

create

Creates a DataScan resource.

delete

Deletes a DataScan resource.

get

Gets a DataScan resource.

getIamPolicy

Gets the access control policy for a resource.

list

Lists DataScans.

patch

Updates a DataScan resource.

run

Runs an on-demand execution of a DataScan.

setIamPolicy

Sets the access control policy on the specified resource.

testIamPermissions

Returns permissions that a caller has on the specified resource.