Package google.cloud.dataplex.v1

ContentService

ContentService manages notebook and SQL script content for Dataplex.

CreateContent

rpc CreateContent(CreateContentRequest) returns (Content)

Creates a content resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.
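
As an illustration, a minimal sketch using the generated Python client (google-cloud-dataplex), assuming a lake named my-lake already exists; the dict-style request mirrors the CreateContentRequest fields documented later on this page, and the enum is passed by name:

    from google.cloud import dataplex_v1

    client = dataplex_v1.ContentServiceClient()

    # Create a SQL script under an existing lake. The dict form of the
    # request mirrors CreateContentRequest: parent, content, validate_only.
    content = client.create_content(
        request={
            "parent": "projects/my-project/locations/us-central1/lakes/my-lake",
            "content": {
                "path": "queries/orders_report",
                "data_text": "SELECT 1",
                "sql_script": {"engine": "SPARK"},  # enum passed by name
            },
        }
    )
    print(content.name)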

DeleteContent

rpc DeleteContent(DeleteContentRequest) returns (Empty)

Deletes a content resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetContent

rpc GetContent(GetContentRequest) returns (Content)

Gets a content resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetIamPolicy

rpc GetIamPolicy(GetIamPolicyRequest) returns (Policy)

Gets the access control policy for a content item resource. A NOT_FOUND error is returned if the resource does not exist. An empty policy is returned if the resource exists but does not have a policy set on it.

The caller must have the Google IAM dataplex.content.getIamPolicy permission on the resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListContent

rpc ListContent(ListContentRequest) returns (ListContentResponse)

Lists content.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

SetIamPolicy

rpc SetIamPolicy(SetIamPolicyRequest) returns (Policy)

Sets the access control policy on the specified content item resource. Replaces any existing policy.

The caller must have the Google IAM dataplex.content.setIamPolicy permission on the resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

TestIamPermissions

rpc TestIamPermissions(TestIamPermissionsRequest) returns (TestIamPermissionsResponse)

Returns the caller's permissions on a resource. If the resource does not exist, an empty set of permissions is returned (a NOT_FOUND error is not returned).

A caller is not required to have Google IAM permission to make this request.

Note: This operation is designed to be used for building permission-aware UIs and command-line tools, not for authorization checking. This operation may "fail open" without warning.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.
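
A sketch of a permission-aware check with the Python client; the content resource path and script ID are hypothetical, and the permissions listed are the two named on this page:

    from google.cloud import dataplex_v1

    client = dataplex_v1.ContentServiceClient()

    # Ask which of the listed permissions the caller holds on a content
    # item; the response echoes back the subset actually granted
    # (empty if none, or if the resource does not exist).
    response = client.test_iam_permissions(
        request={
            "resource": "projects/my-project/locations/us-central1/lakes/my-lake/content/my-script",
            "permissions": [
                "dataplex.content.getIamPolicy",
                "dataplex.content.setIamPolicy",
            ],
        }
    )
    print(list(response.permissions))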

UpdateContent

rpc UpdateContent(UpdateContentRequest) returns (Content)

Updates a content resource. Only full resource updates are supported.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DataScanService

DataScanService manages DataScan resources, which can be configured to run various types of data scanning workloads and generate enriched metadata (e.g., data profile, data quality) for the data source.

CreateDataScan

rpc CreateDataScan(CreateDataScanRequest) returns (Operation)

Creates a DataScan resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteDataScan

rpc DeleteDataScan(DeleteDataScanRequest) returns (Operation)

Deletes a DataScan resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetDataScan

rpc GetDataScan(GetDataScanRequest) returns (DataScan)

Gets a DataScan resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetDataScanJob

rpc GetDataScanJob(GetDataScanJobRequest) returns (DataScanJob)

Gets a DataScanJob resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListDataScanJobs

rpc ListDataScanJobs(ListDataScanJobsRequest) returns (ListDataScanJobsResponse)

Lists DataScanJobs under the given DataScan.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListDataScans

rpc ListDataScans(ListDataScansRequest) returns (ListDataScansResponse)

Lists DataScans.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

RunDataScan

rpc RunDataScan(RunDataScanRequest) returns (RunDataScanResponse)

Runs an on-demand execution of a DataScan.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.
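
A minimal sketch with the Python client, assuming (per the published API) that RunDataScanResponse carries the spawned job in a job field; the scan name is hypothetical:

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataScanServiceClient()

    # Trigger an immediate run of an existing DataScan; the response
    # carries the DataScanJob that was spawned.
    response = client.run_data_scan(
        request={"name": "projects/my-project/locations/us-central1/dataScans/my-scan"}
    )
    job = response.job
    print(job.name)

    # Poll the job later with GetDataScanJob.
    job = client.get_data_scan_job(request={"name": job.name})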

UpdateDataScan

rpc UpdateDataScan(UpdateDataScanRequest) returns (Operation)

Updates a DataScan resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DataTaxonomyService

DataTaxonomyService enables attribute-based governance. The resources currently offered include DataTaxonomy and DataAttribute.

CreateDataAttribute

rpc CreateDataAttribute(CreateDataAttributeRequest) returns (Operation)

Creates a DataAttribute resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreateDataAttributeBinding

rpc CreateDataAttributeBinding(CreateDataAttributeBindingRequest) returns (Operation)

Creates a DataAttributeBinding resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreateDataTaxonomy

rpc CreateDataTaxonomy(CreateDataTaxonomyRequest) returns (Operation)

Creates a DataTaxonomy resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteDataAttribute

rpc DeleteDataAttribute(DeleteDataAttributeRequest) returns (Operation)

Deletes a DataAttribute resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteDataAttributeBinding

rpc DeleteDataAttributeBinding(DeleteDataAttributeBindingRequest) returns (Operation)

Deletes a DataAttributeBinding resource. All attributes within the DataAttributeBinding must be deleted before the DataAttributeBinding can be deleted.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteDataTaxonomy

rpc DeleteDataTaxonomy(DeleteDataTaxonomyRequest) returns (Operation)

Deletes a DataTaxonomy resource. All attributes within the DataTaxonomy must be deleted before the DataTaxonomy can be deleted.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetDataAttribute

rpc GetDataAttribute(GetDataAttributeRequest) returns (DataAttribute)

Retrieves a DataAttribute resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetDataAttributeBinding

rpc GetDataAttributeBinding(GetDataAttributeBindingRequest) returns (DataAttributeBinding)

Retrieves a DataAttributeBinding resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetDataTaxonomy

rpc GetDataTaxonomy(GetDataTaxonomyRequest) returns (DataTaxonomy)

Retrieves a DataTaxonomy resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListDataAttributeBindings

rpc ListDataAttributeBindings(ListDataAttributeBindingsRequest) returns (ListDataAttributeBindingsResponse)

Lists DataAttributeBinding resources in a project and location.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListDataAttributes

rpc ListDataAttributes(ListDataAttributesRequest) returns (ListDataAttributesResponse)

Lists DataAttribute resources in a DataTaxonomy.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListDataTaxonomies

rpc ListDataTaxonomies(ListDataTaxonomiesRequest) returns (ListDataTaxonomiesResponse)

Lists DataTaxonomy resources in a project and location.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateDataAttribute

rpc UpdateDataAttribute(UpdateDataAttributeRequest) returns (Operation)

Updates a DataAttribute resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateDataAttributeBinding

rpc UpdateDataAttributeBinding(UpdateDataAttributeBindingRequest) returns (Operation)

Updates a DataAttributeBinding resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateDataTaxonomy

rpc UpdateDataTaxonomy(UpdateDataTaxonomyRequest) returns (Operation)

Updates a DataTaxonomy resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DataplexService

The Dataplex service provides data lakes as a service. The primary resources offered by this service are Lakes, Zones, and Assets, which collectively allow a data administrator to organize, manage, secure, and catalog data across their organization, located across cloud projects, in a variety of storage systems including Cloud Storage and BigQuery.

CancelJob

rpc CancelJob(CancelJobRequest) returns (Empty)

Cancels jobs running for the task resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreateAsset

rpc CreateAsset(CreateAssetRequest) returns (Operation)

Creates an asset resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreateEnvironment

rpc CreateEnvironment(CreateEnvironmentRequest) returns (Operation)

Creates an environment resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreateLake

rpc CreateLake(CreateLakeRequest) returns (Operation)

Creates a lake resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreateTask

rpc CreateTask(CreateTaskRequest) returns (Operation)

Creates a task resource within a lake.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreateZone

rpc CreateZone(CreateZoneRequest) returns (Operation)

Creates a zone resource within a lake.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteAsset

rpc DeleteAsset(DeleteAssetRequest) returns (Operation)

Deletes an asset resource. The referenced storage resource is detached (default) or deleted based on the associated Lifecycle policy.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteEnvironment

rpc DeleteEnvironment(DeleteEnvironmentRequest) returns (Operation)

Deletes the environment resource. All child resources must be deleted before environment deletion can be initiated.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteLake

rpc DeleteLake(DeleteLakeRequest) returns (Operation)

Deletes a lake resource. All zones within the lake must be deleted before the lake can be deleted.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteTask

rpc DeleteTask(DeleteTaskRequest) returns (Operation)

Deletes the task resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteZone

rpc DeleteZone(DeleteZoneRequest) returns (Operation)

Deletes a zone resource. All assets within a zone must be deleted before the zone can be deleted.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetAsset

rpc GetAsset(GetAssetRequest) returns (Asset)

Retrieves an asset resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetEnvironment

rpc GetEnvironment(GetEnvironmentRequest) returns (Environment)

Gets an environment resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetJob

rpc GetJob(GetJobRequest) returns (Job)

Gets a job resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetLake

rpc GetLake(GetLakeRequest) returns (Lake)

Retrieves a lake resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetTask

rpc GetTask(GetTaskRequest) returns (Task)

Gets a task resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetZone

rpc GetZone(GetZoneRequest) returns (Zone)

Retrieves a zone resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListAssetActions

rpc ListAssetActions(ListAssetActionsRequest) returns (ListActionsResponse)

Lists action resources in an asset.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListAssets

rpc ListAssets(ListAssetsRequest) returns (ListAssetsResponse)

Lists asset resources in a zone.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListEnvironments

rpc ListEnvironments(ListEnvironmentsRequest) returns (ListEnvironmentsResponse)

Lists environments under the given lake.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListJobs

rpc ListJobs(ListJobsRequest) returns (ListJobsResponse)

Lists Jobs under the given task.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListLakeActions

rpc ListLakeActions(ListLakeActionsRequest) returns (ListActionsResponse)

Lists action resources in a lake.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListLakes

rpc ListLakes(ListLakesRequest) returns (ListLakesResponse)

Lists lake resources in a project and location.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListSessions

rpc ListSessions(ListSessionsRequest) returns (ListSessionsResponse)

Lists session resources in an environment.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListTasks

rpc ListTasks(ListTasksRequest) returns (ListTasksResponse)

Lists tasks under the given lake.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListZoneActions

rpc ListZoneActions(ListZoneActionsRequest) returns (ListActionsResponse)

Lists action resources in a zone.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListZones

rpc ListZones(ListZonesRequest) returns (ListZonesResponse)

Lists zone resources in a lake.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

RunTask

rpc RunTask(RunTaskRequest) returns (RunTaskResponse)

Runs an on-demand execution of a Task.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.tasks.run

For more information, see the IAM documentation.
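
A sketch of triggering a task with the Python client; the task name is hypothetical, and the response is assumed to reference the spawned Job as in the published RunTaskResponse:

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataplexServiceClient()

    # The caller needs dataplex.tasks.run on the task named here.
    response = client.run_task(
        request={
            "name": "projects/my-project/locations/us-central1/lakes/my-lake/tasks/my-task"
        }
    )
    print(response.job.name)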

UpdateAsset

rpc UpdateAsset(UpdateAssetRequest) returns (Operation)

Updates an asset resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateEnvironment

rpc UpdateEnvironment(UpdateEnvironmentRequest) returns (Operation)

Updates the environment resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateLake

rpc UpdateLake(UpdateLakeRequest) returns (Operation)

Updates a lake resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateTask

rpc UpdateTask(UpdateTaskRequest) returns (Operation)

Updates the task resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateZone

rpc UpdateZone(UpdateZoneRequest) returns (Operation)

Updates a zone resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

MetadataService

MetadataService manages metadata resources such as tables, filesets, and partitions.

CreateEntity

rpc CreateEntity(CreateEntityRequest) returns (Entity)

Creates a metadata entity.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreatePartition

rpc CreatePartition(CreatePartitionRequest) returns (Partition)

Creates a metadata partition.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteEntity

rpc DeleteEntity(DeleteEntityRequest) returns (Empty)

Deletes a metadata entity.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeletePartition

rpc DeletePartition(DeletePartitionRequest) returns (Empty)

Deletes a metadata partition.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetEntity

rpc GetEntity(GetEntityRequest) returns (Entity)

Gets a metadata entity.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetPartition

rpc GetPartition(GetPartitionRequest) returns (Partition)

Gets a metadata partition of an entity.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListEntities

rpc ListEntities(ListEntitiesRequest) returns (ListEntitiesResponse)

Lists metadata entities in a zone.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.
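
A sketch of paging through entities with the Python client; note that the published ListEntitiesRequest also requires a view (TABLES or FILESETS), which is not shown on this page:

    from google.cloud import dataplex_v1

    client = dataplex_v1.MetadataServiceClient()

    # List methods return a pager that fetches further pages on demand.
    pager = client.list_entities(
        request={
            "parent": "projects/my-project/locations/us-central1/lakes/my-lake/zones/my-zone",
            "view": dataplex_v1.ListEntitiesRequest.EntityView.TABLES,
        }
    )
    for entity in pager:
        print(entity.name)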

ListPartitions

rpc ListPartitions(ListPartitionsRequest) returns (ListPartitionsResponse)

Lists metadata partitions of an entity.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateEntity

rpc UpdateEntity(UpdateEntityRequest) returns (Entity)

Updates a metadata entity. Only full resource updates are supported.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

Action

Action represents an issue requiring administrator action for resolution.

Fields
category

Category

The category of issue associated with the action.

issue

string

Detailed description of the issue requiring action.

detect_time

Timestamp

The time that the issue was detected.

name

string

Output only. The relative resource name of the action, of one of the following forms:

  • projects/{project}/locations/{location}/lakes/{lake}/actions/{action}
  • projects/{project}/locations/{location}/lakes/{lake}/zones/{zone}/actions/{action}
  • projects/{project}/locations/{location}/lakes/{lake}/zones/{zone}/assets/{asset}/actions/{action}

lake

string

Output only. The relative resource name of the lake, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

zone

string

Output only. The relative resource name of the zone, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

asset

string

Output only. The relative resource name of the asset, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/assets/{asset_id}.

data_locations[]

string

The list of data locations associated with this action. Cloud Storage locations are represented as URI paths (e.g., gs://bucket/table1/year=2020/month=Jan/). BigQuery locations refer to resource names (e.g., bigquery.googleapis.com/projects/project-id/datasets/dataset-id).

Union field details. Additional details about the action based on the action category. details can be only one of the following:
invalid_data_format

InvalidDataFormat

Details for issues related to invalid or unsupported data formats.

incompatible_data_schema

IncompatibleDataSchema

Details for issues related to incompatible schemas detected within data.

invalid_data_partition

InvalidDataPartition

Details for issues related to invalid or unsupported data partition structure.

missing_data

MissingData

Details for issues related to absence of data within managed resources.

missing_resource

MissingResource

Details for issues related to absence of a managed resource.

unauthorized_resource

UnauthorizedResource

Details for issues related to lack of permissions to access data resources.

failed_security_policy_apply

FailedSecurityPolicyApply

Details for issues related to applying security policy.

invalid_data_organization

InvalidDataOrganization

Details for issues related to invalid data arrangement.

Category

The category of issues.

Enums
CATEGORY_UNSPECIFIED Unspecified category.
RESOURCE_MANAGEMENT Resource management related issues.
SECURITY_POLICY Security policy related issues.
DATA_DISCOVERY Data and discovery related issues.

FailedSecurityPolicyApply

Failed to apply security policy to the managed resource(s) under a lake, zone, or asset. For a lake or zone resource, one or more underlying assets have a failure applying security policy to the associated managed resource.

Fields
asset

string

Resource name of one of the assets with failing security policy application. Populated for a lake or zone resource only.

IncompatibleDataSchema

Action details for incompatible schemas detected by discovery.

Fields
table

string

The name of the table containing invalid data.

existing_schema

string

The existing and expected schema of the table. The schema is provided as a JSON formatted structure listing columns and data types.

new_schema

string

The new and incompatible schema within the table. The schema is provided as a JSON formatted structure listing columns and data types.

sampled_data_locations[]

string

The list of data locations sampled and used for format/schema inference.

schema_change

SchemaChange

Whether the action relates to a schema that is incompatible or modified.

SchemaChange

Whether the action relates to a schema that is incompatible or modified.

Enums
SCHEMA_CHANGE_UNSPECIFIED Schema change unspecified.
INCOMPATIBLE Newly discovered schema is incompatible with existing schema.
MODIFIED Newly discovered schema has changed from existing schema for data in a curated zone.

InvalidDataFormat

Action details for invalid or unsupported data files detected by discovery.

Fields
sampled_data_locations[]

string

The list of data locations sampled and used for format/schema inference.

expected_format

string

The expected data format of the entity.

new_format

string

The new unexpected data format within the entity.

InvalidDataOrganization

This type has no fields.

Action details for invalid data arrangement.

InvalidDataPartition

Action details for invalid or unsupported partitions detected by discovery.

Fields
expected_structure

PartitionStructure

The issue type of InvalidDataPartition.

PartitionStructure

The expected partition structure.

Enums
PARTITION_STRUCTURE_UNSPECIFIED PartitionStructure unspecified.
CONSISTENT_KEYS Consistent Hive-style partition definition (both raw and curated zones).
HIVE_STYLE_KEYS Hive-style partition definition (curated zone only).

MissingData

This type has no fields.

Action details for absence of data detected by discovery.

MissingResource

This type has no fields.

Action details for resource references in assets that cannot be located.

UnauthorizedResource

This type has no fields.

Action details for unauthorized resource issues raised to indicate that the service account associated with the lake instance is not authorized to access or manage the resource associated with an asset.

Asset

An asset represents a cloud resource that is being managed within a lake as a member of a zone.

Fields
name

string

Output only. The relative resource name of the asset, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/assets/{asset_id}.

display_name

string

Optional. User friendly display name.

uid

string

Output only. System generated globally unique ID for the asset. This ID will be different if the asset is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the asset was created.

update_time

Timestamp

Output only. The time when the asset was last updated.

labels

map<string, string>

Optional. User defined labels for the asset.

description

string

Optional. Description of the asset.

state

State

Output only. Current state of the asset.

resource_spec

ResourceSpec

Required. Specification of the resource that is referenced by this asset.

resource_status

ResourceStatus

Output only. Status of the resource referenced by this asset.

security_status

SecurityStatus

Output only. Status of the security policy applied to resource referenced by this asset.

discovery_spec

DiscoverySpec

Optional. Specification of the discovery feature applied to data referenced by this asset. When this spec is left unset, the asset will use the spec set on the parent zone.

discovery_status

DiscoveryStatus

Output only. Status of the discovery feature applied to data referenced by this asset.

DiscoverySpec

Settings to manage the metadata discovery and publishing for an asset.

Fields
enabled

bool

Optional. Whether discovery is enabled.

include_patterns[]

string

Optional. The list of patterns to apply for selecting data to include during discovery, if only a subset of the data should be considered. For Cloud Storage bucket assets, these are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these are interpreted as patterns to match table names.

exclude_patterns[]

string

Optional. The list of patterns to apply for selecting data to exclude during discovery. For Cloud Storage bucket assets, these are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these are interpreted as patterns to match table names.

csv_options

CsvOptions

Optional. Configuration for CSV data.

json_options

JsonOptions

Optional. Configuration for JSON data.

Union field trigger. Determines when discovery is triggered. trigger can be only one of the following:
schedule

string

Optional. Cron schedule (https://en.wikipedia.org/wiki/Cron) for running discovery periodically. Successive discovery runs must be scheduled at least 60 minutes apart. The default value is to run discovery every 60 minutes. To explicitly set a time zone, apply a "CRON_TZ=${IANA_TIME_ZONE}" or "TZ=${IANA_TIME_ZONE}" prefix to the cron tab. The ${IANA_TIME_ZONE} must be a valid string from the IANA time zone database. For example, CRON_TZ=America/New_York 1 * * * *, or TZ=America/New_York 1 * * * *.

CsvOptions

Describes CSV and similar semi-structured data formats.

Fields
header_rows

int32

Optional. The number of rows to interpret as header rows that should be skipped when reading data rows.

delimiter

string

Optional. The delimiter being used to separate values. This defaults to ','.

encoding

string

Optional. The character encoding of the data. The default is UTF-8.

disable_type_inference

bool

Optional. Whether to disable the inference of data type for CSV data. If true, all columns will be registered as strings.

JsonOptions

Describes the JSON data format.

Fields
encoding

string

Optional. The character encoding of the data. The default is UTF-8.

disable_type_inference

bool

Optional. Whether to disable the inference of data types for JSON data. If true, all columns will be registered as their primitive types (string, number, or boolean).
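
A sketch of a discovery spec combining the schedule, pattern, and CSV settings above, assuming the nested message names follow the generated Python client layout (Asset.DiscoverySpec, Asset.DiscoverySpec.CsvOptions):

    from google.cloud import dataplex_v1

    # Run discovery every six hours, pinned to a named time zone; scan
    # only objects under events/, and treat the first CSV row as a header.
    discovery_spec = dataplex_v1.Asset.DiscoverySpec(
        enabled=True,
        schedule="CRON_TZ=America/New_York 0 */6 * * *",
        include_patterns=["events/**"],
        csv_options=dataplex_v1.Asset.DiscoverySpec.CsvOptions(
            header_rows=1,
            delimiter=",",
        ),
    )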

DiscoveryStatus

Status of discovery for an asset.

Fields
state

State

The current status of the discovery feature.

message

string

Additional information about the current state.

update_time

Timestamp

Last update time of the status.

last_run_time

Timestamp

The start time of the last discovery run.

stats

Stats

Data Stats of the asset reported by discovery.

last_run_duration

Duration

The duration of the last discovery run.

State

Current state of discovery.

Enums
STATE_UNSPECIFIED State is unspecified.
SCHEDULED Discovery for the asset is scheduled.
IN_PROGRESS Discovery for the asset is running.
PAUSED Discovery for the asset is currently paused (e.g. due to a lack of available resources). It will be automatically resumed.
DISABLED Discovery for the asset is disabled.

Stats

The aggregated data statistics for the asset reported by discovery.

Fields
data_items

int64

The count of data items within the referenced resource.

data_size

int64

The number of stored data bytes within the referenced resource.

tables

int64

The count of table entities within the referenced resource.

filesets

int64

The count of fileset entities within the referenced resource.

ResourceSpec

Identifies the cloud resource that is referenced by this asset.

Fields
name

string

Immutable. Relative name of the cloud resource that contains the data that is being managed within a lake. For example:

  • projects/{project_number}/buckets/{bucket_id}
  • projects/{project_number}/datasets/{dataset_id}

type

Type

Required. Immutable. Type of resource.

read_access_mode

AccessMode

Optional. Determines how read permissions are handled for each asset and its associated tables. Only available for storage bucket assets.

AccessMode

Access Mode determines how data stored within the resource is read. This is only applicable to storage bucket assets.

Enums
ACCESS_MODE_UNSPECIFIED Access mode unspecified.
DIRECT Default. Data is accessed directly using storage APIs.
MANAGED Data is accessed through a managed interface using BigQuery APIs.

Type

Type of resource.

Enums
TYPE_UNSPECIFIED Type not specified.
STORAGE_BUCKET Cloud Storage bucket.
BIGQUERY_DATASET BigQuery dataset.
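
Putting ResourceSpec to work: a sketch of creating a Cloud Storage-backed asset with the Python client, assuming the nested message and enum names follow the generated client layout (Asset.ResourceSpec, Asset.ResourceSpec.Type); all names are hypothetical:

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataplexServiceClient()

    asset = dataplex_v1.Asset(
        display_name="Raw events",
        resource_spec=dataplex_v1.Asset.ResourceSpec(
            # `type` collides with a Python builtin, so the generated
            # field is exposed as `type_`.
            type_=dataplex_v1.Asset.ResourceSpec.Type.STORAGE_BUCKET,
            name="projects/123456789/buckets/my-bucket",
        ),
    )

    # CreateAsset returns a long-running operation; result() blocks
    # until the asset is created.
    operation = client.create_asset(
        request={
            "parent": "projects/my-project/locations/us-central1/lakes/my-lake/zones/my-zone",
            "asset_id": "raw-events",
            "asset": asset,
        }
    )
    print(operation.result().name)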

ResourceStatus

Status of the resource referenced by an asset.

Fields
state

State

The current state of the managed resource.

message

string

Additional information about the current state.

update_time

Timestamp

Last update time of the status.

managed_access_identity

string

Output only. Service account associated with the BigQuery Connection.

State

The state of a resource.

Enums
STATE_UNSPECIFIED State unspecified.
READY Resource does not have any errors.
ERROR Resource has errors.

SecurityStatus

Security policy status of the asset. The data security policy (i.e., readers, writers, and owners) should be specified in the lake/zone/asset IAM policy.

Fields
state

State

The current state of the security policy applied to the attached resource.

message

string

Additional information about the current state.

update_time

Timestamp

Last update time of the status.

State

The state of the security policy.

Enums
STATE_UNSPECIFIED State unspecified.
READY Security policy has been successfully applied to the attached resource.
APPLYING Security policy is in the process of being applied to the attached resource.
ERROR Security policy could not be applied to the attached resource due to errors.

AssetStatus

Aggregated status of the underlying assets of a lake or zone.

Fields
update_time

Timestamp

Last update time of the status.

active_assets

int32

Number of active assets.

security_policy_applying_assets

int32

Number of assets that are in the process of updating the security policy on attached resources.

CancelJobRequest

Cancel task jobs.

Fields
name

string

Required. The resource name of the job: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}/jobs/{job_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.tasks.cancel

Content

Content represents a user-visible notebook or SQL script.

Fields
name

string

Output only. The relative resource name of the content, of the form: projects/{project_id}/locations/{location_id}/lakes/{lake_id}/content/{content_id}

uid

string

Output only. System generated globally unique ID for the content. This ID will be different if the content is deleted and re-created with the same name.

path

string

Required. The path for the Content file, represented as a directory structure. Unique within a lake. Limited to alphanumerics, hyphens, underscores, dots, and slashes.

create_time

Timestamp

Output only. Content creation time.

update_time

Timestamp

Output only. The time when the content was last updated.

labels

map<string, string>

Optional. User defined labels for the content.

description

string

Optional. Description of the content.

Union field data. Only returned for GetContent requests, not for ListContent requests. data can be only one of the following:
data_text

string

Required. Content data in string format.

Union field content. The type of the content. content can be only one of the following:
sql_script

SqlScript

SQL script related configurations.

notebook

Notebook

Notebook related configurations.

Notebook

Configuration for Notebook content.

Fields
kernel_type

KernelType

Required. Kernel Type of the notebook.

KernelType

Kernel Type of the Jupyter notebook.

Enums
KERNEL_TYPE_UNSPECIFIED Kernel Type unspecified.
PYTHON3 Python 3 Kernel.

SqlScript

Configuration for SQL script content.

Fields
engine

QueryEngine

Required. The query engine to be used for the SQL query.

QueryEngine

The query engine type of the SQL script.

Enums
QUERY_ENGINE_UNSPECIFIED Value was unspecified.
SPARK Spark SQL Query.

CreateAssetRequest

Create asset request.

Fields
parent

string

Required. The resource name of the parent zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.assets.create
asset_id

string

Required. Asset identifier. This ID will be used to generate names such as table names when publishing metadata to Hive Metastore and BigQuery.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must end with a number or a letter.
  • Must be between 1-63 characters.
  • Must be unique within the zone.

asset

Asset

Required. Asset resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateContentRequest

Create content request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_id}/locations/{location_id}/lakes/{lake_id}

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.content.create
content

Content

Required. Content resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateDataAttributeBindingRequest

Create DataAttributeBinding request.

Fields
parent

string

Required. The resource name of the parent location: projects/{project_number}/locations/{location_id}

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.dataAttributeBindings.create
data_attribute_binding_id

string

Required. DataAttributeBinding identifier.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must be between 1-63 characters.
  • Must end with a number or a letter.
  • Must be unique within the Location.

data_attribute_binding

DataAttributeBinding

Required. DataAttributeBinding resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateDataAttributeRequest

Create DataAttribute request.

Fields
parent

string

Required. The resource name of the parent data taxonomy: projects/{project_number}/locations/{location_id}/dataTaxonomies/{data_taxonomy_id}

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.dataAttributes.create
data_attribute_id

string

Required. DataAttribute identifier.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must be between 1-63 characters.
  • Must end with a number or a letter.
  • Must be unique within the DataTaxonomy.

data_attribute

DataAttribute

Required. DataAttribute resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateDataScanRequest

Create dataScan request.

Fields
parent

string

Required. The resource name of the parent location: projects/{project}/locations/{location_id} where project refers to a project_id or project_number and location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.datascans.create
data_scan

DataScan

Required. DataScan resource.

data_scan_id

string

Required. DataScan identifier.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must end with a number or a letter.
  • Must be between 1-63 characters.
  • Must be unique within the customer project / location.
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.
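
A sketch of the create flow with the Python client; the data and data_profile_spec fields of DataScan are not documented on this page and are assumed from the published message, and all names are hypothetical:

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataScanServiceClient()

    # CreateDataScan returns a long-running operation; result() blocks
    # until the scan resource exists.
    operation = client.create_data_scan(
        request={
            "parent": "projects/my-project/locations/us-central1",
            "data_scan_id": "orders-profile",  # lowercase letters, numbers, hyphens
            "data_scan": {
                "data": {
                    "resource": "//bigquery.googleapis.com/projects/my-project/datasets/sales/tables/orders"
                },
                "data_profile_spec": {},  # profile scan with default settings
            },
        }
    )
    print(operation.result().name)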

CreateDataTaxonomyRequest

Create DataTaxonomy request.

Fields
parent

string

Required. The resource name of the data taxonomy location, of the form: projects/{project_number}/locations/{location_id} where location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.dataTaxonomies.create
data_taxonomy_id

string

Required. DataTaxonomy identifier.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must be between 1-63 characters.
  • Must end with a number or a letter.
  • Must be unique within the Project.

data_taxonomy

DataTaxonomy

Required. DataTaxonomy resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateEntityRequest

Create a metadata entity request.

Fields
parent

string

Required. The resource name of the parent zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.entities.create
entity

Entity

Required. Entity resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateEnvironmentRequest

Create environment request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_id}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.environments.create
environment_id

string

Required. Environment identifier.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must be between 1-63 characters.
  • Must end with a number or a letter.
  • Must be unique within the lake.

environment

Environment

Required. Environment resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateLakeRequest

Create lake request.

Fields
parent

string

Required. The resource name of the lake location, of the form: projects/{project_number}/locations/{location_id} where location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.lakes.create
lake_id

string

Required. Lake identifier. This ID will be used to generate names such as database and dataset names when publishing metadata to Hive Metastore and BigQuery.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must end with a number or a letter.
  • Must be between 1-63 characters.
  • Must be unique within the customer project / location.

lake

Lake

Required. Lake resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.
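
A sketch of the request with the Python client, showing the long-running operation that CreateLake returns; names are hypothetical:

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataplexServiceClient()

    # CreateLake returns a long-running operation; result() blocks until
    # the lake is provisioned, or raises if provisioning fails.
    operation = client.create_lake(
        request={
            "parent": "projects/my-project/locations/us-central1",
            "lake_id": "my-lake",  # also seeds Hive Metastore / BigQuery names
            "lake": {"display_name": "My Lake"},
        }
    )
    lake = operation.result()
    print(lake.name)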

CreatePartitionRequest

Create metadata partition request.

Fields
parent

string

Required. The resource name of the parent entity: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.partitions.create
partition

Partition

Required. Partition resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateTaskRequest

Create task request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.tasks.create
task_id

string

Required. Task identifier.

task

Task

Required. Task resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateZoneRequest

Create zone request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.zones.create
zone_id

string

Required. Zone identifier. This ID will be used to generate names such as database and dataset names when publishing metadata to Hive Metastore and BigQuery.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must end with a number or a letter.
  • Must be between 1-63 characters.
  • Must be unique across all lakes from all locations in a project.
  • Must not be one of the reserved IDs (i.e. "default", "global-temp").

zone

Zone

Required. Zone resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

DataAccessSpec

DataAccessSpec holds the access control configuration to be enforced on data stored within resources (e.g., rows and columns in BigQuery tables). When associated with data, the data is only accessible to principals explicitly granted access through the DataAccessSpec. Principals with access to the containing resource are not implicitly granted access.

Fields
readers[]

string

Optional. The set of principals to be granted the reader role on data stored within resources. The format of the strings follows the pattern used by IAM bindings: user:{email}, serviceAccount:{email}, or group:{email}.

DataAttribute

Denotes one dataAttribute in a dataTaxonomy, for example, PII. DataAttribute resources can be defined in a hierarchy. A single dataAttribute resource can contain specs of multiple types:

PII
  - ResourceAccessSpec:
      - readers: foo@bar.com
  - DataAccessSpec:
      - readers: bar@foo.com
Fields
name

string

Output only. The relative resource name of the dataAttribute, of the form: projects/{project_number}/locations/{location_id}/dataTaxonomies/{dataTaxonomy}/attributes/{data_attribute_id}.

uid

string

Output only. System generated globally unique ID for the DataAttribute. This ID will be different if the DataAttribute is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the DataAttribute was created.

update_time

Timestamp

Output only. The time when the DataAttribute was last updated.

description

string

Optional. Description of the DataAttribute.

display_name

string

Optional. User friendly display name.

labels

map<string, string>

Optional. User-defined labels for the DataAttribute.

parent_id

string

Optional. The ID of the parent DataAttribute resource; it should belong to the same data taxonomy. Circular dependencies in the parent chain are not valid. The maximum allowed hierarchy depth is 4 (for example, a -> b -> c -> d -> e has depth 4).

attribute_count

int32

Output only. The number of child attributes present for this attribute.

etag

string

This checksum is computed by the server based on the value of other fields, and may be sent on update and delete requests to ensure the client has an up-to-date value before proceeding.

resource_access_spec

ResourceAccessSpec

Optional. Specified when applied to a resource (e.g., Cloud Storage bucket, BigQuery dataset, BigQuery table).

data_access_spec

DataAccessSpec

Optional. Specified when applied to data stored on the resource (e.g., rows and columns in BigQuery tables).
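
A sketch of creating the PII-style attribute from the example above with the Python client (assuming the generated DataTaxonomyServiceClient); ResourceAccessSpec is documented elsewhere and its readers field is assumed here:

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataTaxonomyServiceClient()

    # A PII-style attribute: one reader set on the resource itself,
    # another on the data stored within it.
    operation = client.create_data_attribute(
        request={
            "parent": "projects/my-project/locations/us-central1/dataTaxonomies/my-taxonomy",
            "data_attribute_id": "pii",
            "data_attribute": {
                "resource_access_spec": {"readers": ["group:foo@bar.com"]},
                "data_access_spec": {"readers": ["user:bar@foo.com"]},
            },
        }
    )
    print(operation.result().name)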

DataAttributeBinding

DataAttributeBinding represents the binding of attributes to resources. For example, bind the 'CustomerInfo' entity with the 'PII' attribute.

Fields
name

string

Output only. The relative resource name of the Data Attribute Binding, of the form: projects/{project_number}/locations/{location}/dataAttributeBindings/{data_attribute_binding_id}

uid

string

Output only. System generated globally unique ID for the DataAttributeBinding. This ID will be different if the DataAttributeBinding is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the DataAttributeBinding was created.

update_time

Timestamp

Output only. The time when the DataAttributeBinding was last updated.

description

string

Optional. Description of the DataAttributeBinding.

display_name

string

Optional. User friendly display name.

labels

map<string, string>

Optional. User-defined labels for the DataAttributeBinding.

etag

string

This checksum is computed by the server based on the value of other fields, and may be sent on update and delete requests to ensure the client has an up-to-date value before proceeding. Etags must be used when calling the DeleteDataAttributeBinding and UpdateDataAttributeBinding methods.

attributes[]

string

Optional. List of attributes to be associated with the resource, provided in the form: projects/{project}/locations/{location}/dataTaxonomies/{dataTaxonomy}/attributes/{data_attribute_id}

paths[]

Path

Optional. The list of paths for items within the associated resource (e.g., columns and partitions within a table) along with attribute bindings.

Union field resource_reference. The reference to the resource that is associated to attributes, or the query to match resources and associate attributes. resource_reference can be only one of the following:
resource

string

Optional. Immutable. The resource name of the resource that is associated to attributes. Presently, only entity resources are supported, in the form: projects/{project}/locations/{location}/lakes/{lake}/zones/{zone}/entities/{entity_id} Must belong to the same project and region as the attribute binding, and only one active binding can exist for a resource.

Path

Represents a subresource of the given resource, and associated bindings with it. Currently supported subresources are column and partition schema fields within a table.

Fields
name

string

Required. The name identifier of the path. Nested columns should be of the form: 'address.city'.

attributes[]

string

Optional. List of attributes to be associated with the path of the resource, provided in the form: projects/{project}/locations/{location}/dataTaxonomies/{dataTaxonomy}/attributes/{data_attribute_id}

DataProfileResult

DataProfileResult defines the output of DataProfileScan. Each field of the table will have a field-type-specific profile result.

Fields
row_count

int64

The count of rows scanned.

profile

Profile

The profile information per field.

scanned_data

ScannedData

The data scanned for this result.

post_scan_actions_result

PostScanActionsResult

Output only. The result of post scan actions.

PostScanActionsResult

The result of post scan actions of DataProfileScan job.

Fields
bigquery_export_result

BigQueryExportResult

Output only. The result of BigQuery export post scan action.

BigQueryExportResult

The result of BigQuery export post scan action.

Fields
state

State

Output only. Execution state for the BigQuery exporting.

message

string

Output only. Additional information about the BigQuery exporting.

State

Execution state for the exporting.

Enums
STATE_UNSPECIFIED The exporting state is unspecified.
SUCCEEDED The exporting completed successfully.
FAILED The exporting is no longer running due to an error.
SKIPPED The exporting is skipped due to no valid scan result to export (usually caused by a failed scan).

Profile

Contains the name, type, mode, and field-type-specific profile information.

Fields
fields[]

Field

List of fields with structural and profile information for each field.

Field

A field within a table.

Fields
name

string

The name of the field.

type

string

The data type retrieved from the schema of the data source. For instance, for a BigQuery native table, it is the BigQuery Table Schema. For a Dataplex Entity, it is the Entity Schema.

mode

string

The mode of the field. Possible values include:

  • REQUIRED, if it is a required field.
  • NULLABLE, if it is an optional field.
  • REPEATED, if it is a repeated field.
profile

ProfileInfo

Profile information for the corresponding field.

ProfileInfo

The profile information for each field type.

Fields
null_ratio

double

Ratio of rows with null value against total scanned rows.

distinct_ratio

double

Ratio of rows with distinct values against total scanned rows. Not available for complex non-groupable field type RECORD and fields with REPEATED mode.

top_n_values[]

TopNValue

The list of top N non-null values, frequency and ratio with which they occur in the scanned data. N is 10 or equal to the number of distinct values in the field, whichever is smaller. Not available for complex non-groupable field type RECORD and fields with REPEATED mode.

Union field field_info. Structural and profile information for a specific field type. Not available if the mode is REPEATED. field_info can be only one of the following:
string_profile

StringFieldInfo

String type field information.

integer_profile

IntegerFieldInfo

Integer type field information.

double_profile

DoubleFieldInfo

Double type field information.

DoubleFieldInfo

The profile information for a double type field.

Fields
average

double

Average of non-null values in the scanned data, or NaN if the field has a NaN.

standard_deviation

double

Standard deviation of non-null values in the scanned data, or NaN if the field has a NaN.

min

double

Minimum of non-null values in the scanned data, or NaN if the field has a NaN.

quartiles[]

double

A quartile divides the number of data points into four parts, or quarters, of more-or-less equal size. The three main quartiles are:

  • The first quartile (Q1) splits off the lowest 25% of data from the highest 75%. It is also known as the lower or 25th empirical quartile, as 25% of the data is below this point.
  • The second quartile (Q2) is the median of the data set, so 50% of the data lies below this point.
  • The third quartile (Q3) splits off the highest 25% of data from the lowest 75%. It is known as the upper or 75th empirical quartile, as 75% of the data lies below this point.

Here, the quartiles are provided as an ordered list of quartile values for the scanned data, occurring in order Q1, median, Q3.

max

double

Maximum of non-null values in the scanned data, or NaN if the field has a NaN.

IntegerFieldInfo

The profile information for an integer type field.

Fields
average

double

Average of non-null values in the scanned data, or NaN if the field has a NaN.

standard_deviation

double

Standard deviation of non-null values in the scanned data, or NaN if the field has a NaN.

min

int64

Minimum of non-null values in the scanned data, or NaN if the field has a NaN.

quartiles[]

int64

A quartile divides the number of data points into four parts, or quarters, of more-or-less equal size. The three main quartiles are:

  • The first quartile (Q1) splits off the lowest 25% of data from the highest 75%. It is also known as the lower or 25th empirical quartile, as 25% of the data is below this point.
  • The second quartile (Q2) is the median of the data set, so 50% of the data lies below this point.
  • The third quartile (Q3) splits off the highest 25% of data from the lowest 75%. It is known as the upper or 75th empirical quartile, as 75% of the data lies below this point.

Here, the quartiles are provided as an ordered list of approximate quartile values for the scanned data, occurring in order Q1, median, Q3.

max

int64

Maximum of non-null values in the scanned data. NaN if the field contains NaN values.

StringFieldInfo

The profile information for a string type field.

Fields
min_length

int64

Minimum length of non-null values in the scanned data.

max_length

int64

Maximum length of non-null values in the scanned data.

average_length

double

Average length of non-null values in the scanned data.

TopNValue

Top N non-null values in the scanned data.

Fields
value

string

String value of a top N non-null value.

count

int64

Count of the corresponding value in the scanned data.

ratio

double

Ratio of the corresponding value in the field against the total number of rows in the scanned data.

DataProfileSpec

DataProfileScan related setting.

Fields
sampling_percent

float

Optional. The percentage of the records to be selected from the dataset for DataScan.

  • Value can range between 0.0 and 100.0 with up to 3 significant decimal digits.
  • Sampling is not applied if sampling_percent is not specified, or is 0 or 100.
row_filter

string

Optional. A filter applied to all rows in a single DataScan job. The filter needs to be a valid SQL expression for a WHERE clause in BigQuery standard SQL syntax. Example: col1 >= 0 AND col2 < 10

post_scan_actions

PostScanActions

Optional. Actions to take upon job completion.

include_fields

SelectedFields

Optional. The fields to include in data profile.

If not specified, all fields at the time of profile scan job execution are included, except for ones listed in exclude_fields.

exclude_fields

SelectedFields

Optional. The fields to exclude from data profile.

If specified, the fields will be excluded from data profile, regardless of include_fields value.

PostScanActions

The configuration of post scan actions of DataProfileScan job.

Fields
bigquery_export

BigQueryExport

Optional. If set, results will be exported to the provided BigQuery table.

BigQueryExport

The configuration of BigQuery export post scan action.

Fields
results_table

string

Optional. The BigQuery table to export DataProfileScan results to. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID

SelectedFields

The specification for fields to include or exclude in data profile scan.

Fields
field_names[]

string

Optional. Expected input is a list of fully qualified names of fields as in the schema.

Only top-level field names for nested fields are supported. For instance, if 'x' is of nested field type, listing 'x' is supported but 'x.y.z' is not supported. Here 'y' and 'y.z' are nested fields of 'x'.
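
Taken together, the DataProfileSpec pieces above compose like this. What follows is a minimal sketch assuming the google-cloud-dataplex Python client (dataplex_v1); the column names and the export table are hypothetical.

    from google.cloud import dataplex_v1

    # Hypothetical sketch: profile 10% of rows, restrict the profile to a few
    # columns, and export the results to a BigQuery table.
    profile_spec = dataplex_v1.DataProfileSpec(
        sampling_percent=10.0,
        row_filter="station_id > 1000",
        include_fields=dataplex_v1.DataProfileSpec.SelectedFields(
            field_names=["station_id", "trip_duration", "start_time"],
        ),
        exclude_fields=dataplex_v1.DataProfileSpec.SelectedFields(
            field_names=["start_time"],  # excluded even though listed above
        ),
        post_scan_actions=dataplex_v1.DataProfileSpec.PostScanActions(
            bigquery_export=dataplex_v1.DataProfileSpec.PostScanActions.BigQueryExport(
                results_table=(
                    "//bigquery.googleapis.com/projects/my-project"
                    "/datasets/profile_results/tables/bikeshare_profile"
                ),
            ),
        ),
    )

Per the field descriptions above, exclude_fields takes precedence over include_fields, so start_time would be dropped from the profile in this sketch.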

DataQualityColumnResult

DataQualityColumnResult provides a more detailed, per-column view of the results.

Fields
column

string

Output only. The column specified in the DataQualityRule.

score

float

Output only. The column-level data quality score for this data scan job if and only if the 'column' field is set.

The score ranges between [0, 100] (up to two decimal points).

DataQualityDimension

A dimension captures data quality intent about a defined subset of the rules specified.

Fields
name

string

The dimension name a rule belongs to. Supported dimensions are ["COMPLETENESS", "ACCURACY", "CONSISTENCY", "VALIDITY", "UNIQUENESS", "INTEGRITY"]

DataQualityDimensionResult

DataQualityDimensionResult provides a more detailed, per-dimension view of the results.

Fields
dimension

DataQualityDimension

Output only. The dimension config specified in the DataQualitySpec, as is.

passed

bool

Whether the dimension passed or failed.

score

float

Output only. The dimension-level data quality score for this data scan job if and only if the 'dimension' field is set.

The score ranges between [0, 100] (up to two decimal points).

DataQualityResult

The output of a DataQualityScan.

Fields
passed

bool

Overall data quality result -- true if all rules passed.

dimensions[]

DataQualityDimensionResult

A list of results at the dimension level.

A dimension will have a corresponding DataQualityDimensionResult if and only if there is at least one rule with the 'dimension' field set to it.

columns[]

DataQualityColumnResult

Output only. A list of results at the column level.

A column will have a corresponding DataQualityColumnResult if and only if there is at least one rule with the 'column' field set to it.

rules[]

DataQualityRuleResult

A list of all the rules in a job, and their results.

row_count

int64

The count of rows processed.

scanned_data

ScannedData

The data scanned for this result.

post_scan_actions_result

PostScanActionsResult

Output only. The result of post scan actions.

score

float

Output only. The overall data quality score.

The score ranges between [0, 100] (up to two decimal points).
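
To show how these result fields relate, here is a hedged sketch that fetches a finished job with the google-cloud-dataplex Python client and walks its DataQualityResult; the job name is hypothetical.

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataScanServiceClient()
    job = client.get_data_scan_job(
        request=dataplex_v1.GetDataScanJobRequest(
            name=("projects/my-project/locations/us-central1"
                  "/dataScans/orders-dq/jobs/1234"),  # hypothetical job name
            view=dataplex_v1.GetDataScanJobRequest.DataScanJobView.FULL,
        )
    )
    result = job.data_quality_result

    print(f"passed={result.passed} score={result.score} rows={result.row_count}")
    # One entry per dimension that has at least one rule attached to it.
    for dim in result.dimensions:
        print(f"  {dim.dimension.name}: passed={dim.passed} score={dim.score}")
    # One entry per rule in the job.
    for rr in result.rules:
        print(f"  {rr.rule.name}: passed={rr.passed} pass_ratio={rr.pass_ratio}")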

PostScanActionsResult

The result of post scan actions of DataQualityScan job.

Fields
bigquery_export_result

BigQueryExportResult

Output only. The result of BigQuery export post scan action.

BigQueryExportResult

The result of BigQuery export post scan action.

Fields
state

State

Output only. Execution state for the BigQuery exporting.

message

string

Output only. Additional information about the BigQuery exporting.

State

Execution state for the exporting.

Enums
STATE_UNSPECIFIED The exporting state is unspecified.
SUCCEEDED The exporting completed successfully.
FAILED The exporting is no longer running due to an error.
SKIPPED The exporting is skipped due to no valid scan result to export (usually caused by scan failed).

DataQualityRule

A rule captures data quality intent about a data source.

Fields
column

string

Optional. The unnested column which this rule is evaluated against.

ignore_null

bool

Optional. Rows with null values will automatically fail a rule, unless ignore_null is true. In that case, such null rows are trivially considered passing.

This field is only valid for the following type of rules:

  • RangeExpectation
  • RegexExpectation
  • SetExpectation
  • UniquenessExpectation
dimension

string

Required. The dimension a rule belongs to. Results are also aggregated at the dimension level. Supported dimensions are ["COMPLETENESS", "ACCURACY", "CONSISTENCY", "VALIDITY", "UNIQUENESS", "INTEGRITY"]

threshold

double

Optional. The minimum ratio of passing_rows / total_rows required to pass this rule, with a range of [0.0, 1.0].

A value of 0 indicates the default (i.e. 1.0).

This field is only valid for row-level type rules.

name

string

Optional. A mutable name for the rule.

  • The name must contain only letters (a-z, A-Z), numbers (0-9), or hyphens (-).
  • The maximum length is 63 characters.
  • Must start with a letter.
  • Must end with a number or a letter.
description

string

Optional. Description of the rule.

  • The maximum length is 1,024 characters.
Union field rule_type. The rule-specific configuration. rule_type can be only one of the following:
range_expectation

RangeExpectation

Row-level rule which evaluates whether each column value lies within a specified range.

non_null_expectation

NonNullExpectation

Row-level rule which evaluates whether each column value is null.

set_expectation

SetExpectation

Row-level rule which evaluates whether each column value is contained by a specified set.

regex_expectation

RegexExpectation

Row-level rule which evaluates whether each column value matches a specified regex.

uniqueness_expectation

UniquenessExpectation

Row-level rule which evaluates whether each column value is unique.

statistic_range_expectation

StatisticRangeExpectation

Aggregate rule which evaluates whether the column aggregate statistic lies within a specified range.

row_condition_expectation

RowConditionExpectation

Row-level rule which evaluates whether each row in a table passes the specified condition.

table_condition_expectation

TableConditionExpectation

Aggregate rule which evaluates whether the provided expression is true for a table.
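
As a concrete illustration of the fields above, the sketch below builds one row-level rule and one aggregate rule, assuming the google-cloud-dataplex Python client; the column and rule names are hypothetical.

    from google.cloud import dataplex_v1

    # Row-level rule: each non-null 'discount_pct' value must lie in [0, 100],
    # and at least 95% of evaluated rows must pass.
    range_rule = dataplex_v1.DataQualityRule(
        column="discount_pct",
        dimension="VALIDITY",
        ignore_null=True,     # null rows count as trivially passing
        threshold=0.95,
        name="discount-pct-range",
        description="Discounts must be valid percentages.",
        range_expectation=dataplex_v1.DataQualityRule.RangeExpectation(
            min_value="0",
            max_value="100",
        ),
    )

    # Aggregate rule: a single scalar boolean expression over the whole table.
    table_rule = dataplex_v1.DataQualityRule(
        dimension="CONSISTENCY",
        name="table-not-empty",
        table_condition_expectation=dataplex_v1.DataQualityRule.TableConditionExpectation(
            sql_expression="COUNT(*) > 0",
        ),
    )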

NonNullExpectation

This type has no fields.

Evaluates whether each column value is null.

RangeExpectation

Evaluates whether each column value lies within a specified range.

Fields
min_value

string

Optional. The minimum column value allowed for a row to pass this validation. At least one of min_value and max_value needs to be provided.

max_value

string

Optional. The maximum column value allowed for a row to pass this validation. At least one of min_value and max_value needs to be provided.

strict_min_enabled

bool

Optional. Whether each value needs to be strictly greater than ('>') the minimum, or if equality is allowed.

Only relevant if a min_value has been defined. Default = false.

strict_max_enabled

bool

Optional. Whether each value needs to be strictly lesser than ('<') the maximum, or if equality is allowed.

Only relevant if a max_value has been defined. Default = false.

RegexExpectation

Evaluates whether each column value matches a specified regex.

Fields
regex

string

Optional. A regular expression the column value is expected to match.

RowConditionExpectation

Evaluates whether each row passes the specified condition.

The SQL expression needs to use BigQuery standard SQL syntax and should produce a boolean value per row as the result.

Example: col1 >= 0 AND col2 < 10

Fields
sql_expression

string

Optional. The SQL expression.

SetExpectation

Evaluates whether each column value is contained by a specified set.

Fields
values[]

string

Optional. Expected values for the column value.

StatisticRangeExpectation

Evaluates whether the column aggregate statistic lies within a specified range.

Fields
statistic

ColumnStatistic

Optional. The aggregate metric to evaluate.

min_value

string

Optional. The minimum column statistic value allowed for a row to pass this validation.

At least one of min_value and max_value needs to be provided.

max_value

string

Optional. The maximum column statistic value allowed for a row to pass this validation.

At least one of min_value and max_value needs to be provided.

strict_min_enabled

bool

Optional. Whether column statistic needs to be strictly greater than ('>') the minimum, or if equality is allowed.

Only relevant if a min_value has been defined. Default = false.

strict_max_enabled

bool

Optional. Whether column statistic needs to be strictly lesser than ('<') the maximum, or if equality is allowed.

Only relevant if a max_value has been defined. Default = false.

ColumnStatistic

The list of aggregate metrics a rule can be evaluated against.

Enums
STATISTIC_UNDEFINED Unspecified statistic type
MEAN Evaluate the column mean
MIN Evaluate the column min
MAX Evaluate the column max
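
A short sketch of a statistic range rule using the nested types above, again assuming the google-cloud-dataplex Python client; the column name and bounds are hypothetical.

    from google.cloud import dataplex_v1

    StatisticRange = dataplex_v1.DataQualityRule.StatisticRangeExpectation

    # Aggregate rule: the mean of 'latency_ms' must lie in (50, 200].
    mean_rule = dataplex_v1.DataQualityRule(
        column="latency_ms",
        dimension="ACCURACY",
        statistic_range_expectation=StatisticRange(
            statistic=StatisticRange.ColumnStatistic.MEAN,
            min_value="50",
            max_value="200",
            strict_min_enabled=True,    # mean must be strictly > 50
            strict_max_enabled=False,   # mean may equal 200
        ),
    )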

TableConditionExpectation

Evaluates whether the provided expression is true.

The SQL expression needs to use BigQuery standard SQL syntax and should produce a scalar boolean result.

Example: MIN(col1) >= 0

Fields
sql_expression

string

Optional. The SQL expression.

UniquenessExpectation

This type has no fields.

Evaluates whether the column has duplicates.

DataQualityRuleResult

DataQualityRuleResult provides a more detailed, per-rule view of the results.

Fields
rule

DataQualityRule

The rule specified in the DataQualitySpec, as is.

passed

bool

Whether the rule passed or failed.

evaluated_count

int64

The number of rows a rule was evaluated against.

This field is only valid for row-level type rules.

Evaluated count can be configured to either

  • include all rows (default) - with null rows automatically failing rule evaluation, or
  • exclude null rows from the evaluated_count, by setting ignore_null = true.
passed_count

int64

The number of rows which passed a rule evaluation.

This field is only valid for row-level type rules.

null_count

int64

The number of rows with null values in the specified column.

pass_ratio

double

The ratio of passed_count / evaluated_count.

This field is only valid for row-level type rules.

failing_rows_query

string

The query to find rows that did not pass this rule.

This field is only valid for row-level type rules.
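
Because failing_rows_query is a ready-to-run BigQuery query, one way to use it is to pass it straight to the BigQuery client. A hedged sketch, assuming the google-cloud-bigquery package and a DataQualityRuleResult obtained as shown under DataQualityResult:

    from google.cloud import bigquery

    def inspect_failures(rule_result, limit=20):
        """Print a sample of the rows that failed a row-level rule."""
        if not rule_result.failing_rows_query:
            return  # only populated for row-level type rules
        bq = bigquery.Client()
        rows = bq.query(rule_result.failing_rows_query).result(max_results=limit)
        for row in rows:
            print(dict(row))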

DataQualityScanRuleResult

Information about the result of a data quality rule for a data quality scan. The monitored resource is 'DataScan'.

Fields
job_id

string

Identifier of the specific data scan job this log entry is for.

data_source

string

The data source of the data scan (e.g. BigQuery table name).

column

string

The column which this rule is evaluated against.

rule_name

string

The name of the data quality rule.

rule_type

RuleType

The type of the data quality rule.

evalution_type

EvaluationType

The evaluation type of the data quality rule.

rule_dimension

string

The dimension of the data quality rule.

threshold_percent

double

The passing threshold ([0.0, 100.0]) of the data quality rule.

result

Result

The result of the data quality rule.

evaluated_row_count

int64

The number of rows evaluated against the data quality rule. This field is only valid for rules of PER_ROW evaluation type.

passed_row_count

int64

The number of rows which passed a rule evaluation. This field is only valid for rules of PER_ROW evaluation type.

null_row_count

int64

The number of rows with null values in the specified column.

EvaluationType

The evaluation type of the data quality rule.

Enums
EVALUATION_TYPE_UNSPECIFIED An unspecified evaluation type.
PER_ROW The rule evaluation is done at per row level.
AGGREGATE The rule evaluation is done for an aggregate of rows.

Result

Whether the data quality rule passed or failed.

Enums
RESULT_UNSPECIFIED An unspecified result.
PASSED The data quality rule passed.
FAILED The data quality rule failed.

RuleType

The type of the data quality rule.

Enums
RULE_TYPE_UNSPECIFIED An unspecified rule type.
NON_NULL_EXPECTATION Please see https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualityRule#nonnullexpectation.
RANGE_EXPECTATION Please see https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualityRule#rangeexpectation.
REGEX_EXPECTATION Please see https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualityRule#regexexpectation.
ROW_CONDITION_EXPECTATION Please see https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualityRule#rowconditionexpectation.
SET_EXPECTATION Please see https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualityRule#setexpectation.
STATISTIC_RANGE_EXPECTATION Please see https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualityRule#statisticrangeexpectation.
TABLE_CONDITION_EXPECTATION Please see https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualityRule#tableconditionexpectation.
UNIQUENESS_EXPECTATION Please see https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualityRule#uniquenessexpectation.

DataQualitySpec

DataQualityScan related setting.

Fields
rules[]

DataQualityRule

Required. The list of rules to evaluate against a data source. At least one rule is required.

sampling_percent

float

Optional. The percentage of the records to be selected from the dataset for DataScan.

  • Value can range between 0.0 and 100.0 with up to 3 significant decimal digits.
  • Sampling is not applied if sampling_percent is not specified, or is 0 or 100.
row_filter

string

Optional. A filter applied to all rows in a single DataScan job. The filter needs to be a valid SQL expression for a WHERE clause in BigQuery standard SQL syntax. Example: col1 >= 0 AND col2 < 10

post_scan_actions

PostScanActions

Optional. Actions to take upon job completion.
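
Putting the DataQualitySpec fields together, here is a minimal sketch with the google-cloud-dataplex Python client, reusing the hypothetical range_rule and table_rule sketched under DataQualityRule; the row filter and export table are likewise hypothetical.

    from google.cloud import dataplex_v1

    quality_spec = dataplex_v1.DataQualitySpec(
        rules=[range_rule, table_rule],   # at least one rule is required
        sampling_percent=50.0,
        row_filter="ingest_date >= '2024-01-01'",
        post_scan_actions=dataplex_v1.DataQualitySpec.PostScanActions(
            bigquery_export=dataplex_v1.DataQualitySpec.PostScanActions.BigQueryExport(
                results_table=(
                    "//bigquery.googleapis.com/projects/my-project"
                    "/datasets/dq_results/tables/orders_dq"
                ),
            ),
        ),
    )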

PostScanActions

The configuration of post scan actions of DataQualityScan.

Fields
bigquery_export

BigQueryExport

Optional. If set, results will be exported to the provided BigQuery table.

BigQueryExport

The configuration of BigQuery export post scan action.

Fields
results_table

string

Optional. The BigQuery table to export DataQualityScan results to. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID

DataScan

Represents a user-visible job which provides the insights for the related data source.

For example:

  • Data Quality: generates queries based on the rules and runs against the data to get data quality check results.
  • Data Profile: analyzes the data in table(s) and generates insights about the structure, content and relationships (such as null percent, cardinality, min/max/mean, etc).
Fields
name

string

Output only. The relative resource name of the scan, of the form: projects/{project}/locations/{location_id}/dataScans/{datascan_id}, where project refers to a project_id or project_number and location_id refers to a GCP region.

uid

string

Output only. System generated globally unique ID for the scan. This ID will be different if the scan is deleted and re-created with the same name.

description

string

Optional. Description of the scan.

  • Must be between 1-1024 characters.
display_name

string

Optional. User friendly display name.

  • Must be between 1-256 characters.
labels

map<string, string>

Optional. User-defined labels for the scan.

state

State

Output only. Current state of the DataScan.

create_time

Timestamp

Output only. The time when the scan was created.

update_time

Timestamp

Output only. The time when the scan was last updated.

data

DataSource

Required. The data source for DataScan.

execution_spec

ExecutionSpec

Optional. DataScan execution settings.

If not specified, the fields in it will use their default values.

execution_status

ExecutionStatus

Output only. Status of the data scan execution.

type

DataScanType

Output only. The type of DataScan.

Union field spec. Data Scan related setting. It is required and immutable, which means that once data_quality_spec is set, it cannot be changed to data_profile_spec. spec can be only one of the following:
data_quality_spec

DataQualitySpec

DataQualityScan related setting.

data_profile_spec

DataProfileSpec

DataProfileScan related setting.

Union field result. The result of the data scan. result can be only one of the following:
data_quality_result

DataQualityResult

Output only. The result of the data quality scan.

data_profile_result

DataProfileResult

Output only. The result of the data profile scan.
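
End to end, a DataScan ties one DataSource to exactly one spec. Below is a hedged sketch using the google-cloud-dataplex Python client; the project, location, table, and scan ID are hypothetical. CreateDataScan is a long-running operation, so the sketch blocks on its result.

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataScanServiceClient()

    scan = dataplex_v1.DataScan(
        description="Nightly profile of the orders table.",
        data=dataplex_v1.DataSource(
            resource=("//bigquery.googleapis.com/projects/my-project"
                      "/datasets/sales/tables/orders"),
        ),
        # Immutable choice: once data_profile_spec is set, this scan can
        # never be changed into a data quality scan.
        data_profile_spec=dataplex_v1.DataProfileSpec(sampling_percent=10.0),
    )

    operation = client.create_data_scan(
        parent="projects/my-project/locations/us-central1",
        data_scan=scan,
        data_scan_id="orders-profile",
    )
    created = operation.result()  # waits until the DataScan resource exists
    print(created.name, created.state)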

ExecutionSpec

DataScan execution settings.

Fields
trigger

Trigger

Optional. Spec related to how often and when a scan should be triggered.

If not specified, the default is OnDemand, which means the scan will not run until the user calls RunDataScan API.

Union field incremental. Spec related to incremental scan of the data.

When an option is selected for incremental scan, it cannot be unset or changed. If not specified, a data scan will run for all data in the table. incremental can be only one of the following:

field

string

Immutable. The unnested field (of type Date or Timestamp) that contains values which monotonically increase over time.

If not specified, a data scan will run for all data in the table.
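
A hedged sketch of the two execution choices described above, assuming the google-cloud-dataplex Python client; the cron expression and field name are hypothetical.

    from google.cloud import dataplex_v1

    # Scheduled + incremental: run daily at 02:00, scanning only rows whose
    # 'event_timestamp' value is newer than the previous run.
    scheduled_spec = dataplex_v1.DataScan.ExecutionSpec(
        trigger=dataplex_v1.Trigger(
            schedule=dataplex_v1.Trigger.Schedule(cron="0 2 * * *"),
        ),
        field="event_timestamp",
    )

    # On-demand: the scan runs only when the RunDataScan API is called.
    on_demand_spec = dataplex_v1.DataScan.ExecutionSpec(
        trigger=dataplex_v1.Trigger(on_demand=dataplex_v1.Trigger.OnDemand()),
    )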

ExecutionStatus

Status of the data scan execution.

Fields
latest_job_start_time

Timestamp

The time when the latest DataScanJob started.

latest_job_end_time

Timestamp

The time when the latest DataScanJob ended.

DataScanEvent

These messages contain information about the execution of a DataScan. The monitored resource is 'DataScan'.

Fields
data_source

string

The data source of the data scan.

job_id

string

The identifier of the specific data scan job this log entry is for.

create_time

Timestamp

The time when the data scan job was created.

start_time

Timestamp

The time when the data scan job started to run.

end_time

Timestamp

The time when the data scan job finished.

type

ScanType

The type of the data scan.

state

State

The status of the data scan job.

message

string

The message describing the data scan job event.

spec_version

string

A version identifier of the spec which was used to execute this job.

trigger

Trigger

The trigger type of the data scan job.

scope

Scope

The scope of the data scan (e.g. full, incremental).

post_scan_actions_result

PostScanActionsResult

The result of post scan actions.

Union field result. The result of the data scan job. result can be only one of the following:
data_profile

DataProfileResult

Data profile result for data profile type data scan.

data_quality

DataQualityResult

Data quality result for data quality type data scan.

Union field appliedConfigs. The applied configs in the data scan job. appliedConfigs can be only one of the following:
data_profile_configs

DataProfileAppliedConfigs

Applied configs for data profile type data scan.

data_quality_configs

DataQualityAppliedConfigs

Applied configs for data quality type data scan.

DataProfileAppliedConfigs

Applied configs for data profile type data scan job.

Fields
sampling_percent

float

The percentage of the records selected from the dataset for DataScan.

  • Value ranges between 0.0 and 100.0.
  • A value of 0.0 or 100.0 implies that sampling was not applied.
row_filter_applied

bool

Boolean indicating whether a row filter was applied in the DataScan job.

column_filter_applied

bool

Boolean indicating whether a column filter was applied in the DataScan job.

DataProfileResult

Data profile result for data scan job.

Fields
row_count

int64

The count of rows processed in the data scan job.

DataQualityAppliedConfigs

Applied configs for data quality type data scan job.

Fields
sampling_percent

float

The percentage of the records selected from the dataset for DataScan.

  • Value ranges between 0.0 and 100.0.
  • A value of 0.0 or 100.0 implies that sampling was not applied.
row_filter_applied

bool

Boolean indicating whether a row filter was applied in the DataScan job.

DataQualityResult

Data quality result for data scan job.

Fields
row_count

int64

The count of rows processed in the data scan job.

passed

bool

Whether the data quality result passed or not.

dimension_passed

map<string, bool>

The result of each dimension for the data quality result. The key of the map is the name of the dimension. The value is a bool depicting whether the dimension passed or not.

score

float

The table-level data quality score for the data scan job.

The data quality score ranges between [0, 100] (up to two decimal points).

dimension_score

map<string, float>

The score of each dimension for data quality result. The key of the map is the name of the dimension. The value is the data quality score for the dimension.

The score ranges between [0, 100] (up to two decimal points).

column_score

map<string, float>

The score of each column scanned in the data scan job. The key of the map is the name of the column. The value is the data quality score for the column.

The score ranges between [0, 100] (up to two decimal points).

PostScanActionsResult

Post scan actions result for data scan job.

Fields
bigquery_export_result

BigQueryExportResult

The result of BigQuery export post scan action.

BigQueryExportResult

The result of BigQuery export post scan action.

Fields
state

State

Execution state for the BigQuery exporting.

message

string

Additional information about the BigQuery exporting.

State

Execution state for the exporting.

Enums
STATE_UNSPECIFIED The exporting state is unspecified.
SUCCEEDED The exporting completed successfully.
FAILED The exporting is no longer running due to an error.
SKIPPED The exporting is skipped due to no valid scan result to export (usually caused by scan failed).

ScanType

The type of the data scan.

Enums
SCAN_TYPE_UNSPECIFIED An unspecified data scan type.
DATA_PROFILE Data scan for data profile.
DATA_QUALITY Data scan for data quality.

Scope

The scope of job for the data scan.

Enums
SCOPE_UNSPECIFIED An unspecified scope type.
FULL Data scan runs on all of the data.
INCREMENTAL Data scan runs on incremental data.

State

The job state of the data scan.

Enums
STATE_UNSPECIFIED Unspecified job state.
STARTED Data scan job started.
SUCCEEDED Data scan job successfully completed.
FAILED Data scan job was unsuccessful.
CANCELLED Data scan job was cancelled.
CREATED Data scan job was created.

Trigger

The trigger type for the data scan.

Enums
TRIGGER_UNSPECIFIED An unspecified trigger type.
ON_DEMAND Data scan triggers on demand.
SCHEDULE Data scan triggers as per schedule.

DataScanJob

A DataScanJob represents an instance of DataScan execution.

Fields
name

string

Output only. The relative resource name of the DataScanJob, of the form: projects/{project}/locations/{location_id}/dataScans/{datascan_id}/jobs/{job_id}, where project refers to a project_id or project_number and location_id refers to a GCP region.

uid

string

Output only. System generated globally unique ID for the DataScanJob.

start_time

Timestamp

Output only. The time when the DataScanJob was started.

end_time

Timestamp

Output only. The time when the DataScanJob ended.

state

State

Output only. Execution state for the DataScanJob.

message

string

Output only. Additional information about the current state.

type

DataScanType

Output only. The type of the parent DataScan.

Union field spec. Data Scan related setting. spec can be only one of the following:
data_quality_spec

DataQualitySpec

Output only. DataQualityScan related setting.

data_profile_spec

DataProfileSpec

Output only. DataProfileScan related setting.

Union field result. The result of the data scan. result can be only one of the following:
data_quality_result

DataQualityResult

Output only. The result of the data quality scan.

data_profile_result

DataProfileResult

Output only. The result of the data profile scan.

State

Execution state for the DataScanJob.

Enums
STATE_UNSPECIFIED The DataScanJob state is unspecified.
RUNNING The DataScanJob is running.
CANCELING The DataScanJob is canceling.
CANCELLED The DataScanJob cancellation was successful.
SUCCEEDED The DataScanJob completed successfully.
FAILED The DataScanJob is no longer running due to an error.
PENDING The DataScanJob has been created but not started to run yet.

DataScanType

The type of DataScan.

Enums
DATA_SCAN_TYPE_UNSPECIFIED The DataScan type is unspecified.
DATA_QUALITY Data Quality scan.
DATA_PROFILE Data Profile scan.

DataSource

The data source for DataScan.

Fields
Union field source. The source is required and immutable. Once set, it cannot be changed to a different source. source can be only one of the following:
entity

string

Immutable. The Dataplex entity that represents the data source (e.g. BigQuery table) for DataScan, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}.

resource

string

Immutable. The service-qualified full resource name of the cloud resource for a DataScan job to scan against. The field can be a BigQuery table of type "TABLE" for DataProfileScan/DataQualityScan, in the format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID
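
The source union admits exactly one of the two forms above, as the short sketch below shows (google-cloud-dataplex Python client; all names hypothetical).

    from google.cloud import dataplex_v1

    # Either point at a Dataplex entity...
    by_entity = dataplex_v1.DataSource(
        entity=("projects/1234567890/locations/us-central1/lakes/my-lake"
                "/zones/raw-zone/entities/orders"),
    )

    # ...or directly at the underlying BigQuery table, never both.
    by_resource = dataplex_v1.DataSource(
        resource=("//bigquery.googleapis.com/projects/my-project"
                  "/datasets/sales/tables/orders"),
    )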

DataTaxonomy

DataTaxonomy represents a set of hierarchical DataAttribute resources, grouped under a common theme. For example, a 'SensitiveDataTaxonomy' can have attributes to manage PII data. It is defined at the project level.

Fields
name

string

Output only. The relative resource name of the DataTaxonomy, of the form: projects/{project_number}/locations/{location_id}/dataTaxonomies/{data_taxonomy_id}.

uid

string

Output only. System generated globally unique ID for the dataTaxonomy. This ID will be different if the DataTaxonomy is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the DataTaxonomy was created.

update_time

Timestamp

Output only. The time when the DataTaxonomy was last updated.

description

string

Optional. Description of the DataTaxonomy.

display_name

string

Optional. User friendly display name.

labels

map<string, string>

Optional. User-defined labels for the DataTaxonomy.

attribute_count

int32

Output only. The number of attributes in the DataTaxonomy.

etag

string

This checksum is computed by the server based on the value of other fields, and may be sent on update and delete requests to ensure the client has an up-to-date value before proceeding.

class_count

int32

Output only. The number of classes in the DataTaxonomy.

DeleteAssetRequest

Delete asset request.

Fields
name

string

Required. The resource name of the asset: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/assets/{asset_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.assets.delete

DeleteContentRequest

Delete content request.

Fields
name

string

Required. The resource name of the content: projects/{project_id}/locations/{location_id}/lakes/{lake_id}/content/{content_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.content.delete

DeleteDataAttributeBindingRequest

Delete DataAttributeBinding request.

Fields
name

string

Required. The resource name of the DataAttributeBinding: projects/{project_number}/locations/{location_id}/dataAttributeBindings/{data_attribute_binding_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.dataAttributeBindings.delete
etag

string

Required. If the client-provided etag value does not match the current etag value, the DeleteDataAttributeBinding method returns an ABORTED error response. Etags must be used when calling the DeleteDataAttributeBinding.

DeleteDataAttributeRequest

Delete DataAttribute request.

Fields
name

string

Required. The resource name of the DataAttribute: projects/{project_number}/locations/{location_id}/dataTaxonomies/{dataTaxonomy}/attributes/{data_attribute_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.dataAttributes.delete
etag

string

Optional. If the client-provided etag value does not match the current etag value, the DeleteDataAttribute method returns an ABORTED error response.

DeleteDataScanRequest

Delete dataScan request.

Fields
name

string

Required. The resource name of the dataScan: projects/{project}/locations/{location_id}/dataScans/{data_scan_id} where project refers to a project_id or project_number and location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.datascans.delete

DeleteDataTaxonomyRequest

Delete DataTaxonomy request.

Fields
name

string

Required. The resource name of the DataTaxonomy: projects/{project_number}/locations/{location_id}/dataTaxonomies/{data_taxonomy_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.dataTaxonomies.delete
etag

string

Optional. If the client-provided etag value does not match the current etag value, the DeleteDataTaxonomy method returns an ABORTED error.

DeleteEntityRequest

Delete a metadata entity request.

Fields
name

string

Required. The resource name of the entity: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.entities.delete
etag

string

Required. The etag associated with the entity, which can be retrieved with a [GetEntity][] request.

DeleteEnvironmentRequest

Delete environment request.

Fields
name

string

Required. The resource name of the environment: projects/{project_id}/locations/{location_id}/lakes/{lake_id}/environments/{environment_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.environments.delete

DeleteLakeRequest

Delete lake request.

Fields
name

string

Required. The resource name of the lake: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.lakes.delete

DeletePartitionRequest

Delete metadata partition request.

Fields
name

string

Required. The resource name of the partition, in the format: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}/partitions/{partition_value_path}. The {partition_value_path} segment consists of an ordered sequence of partition values separated by "/". All values must be provided.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.partitions.delete
etag
(deprecated)

string

Optional. The etag associated with the partition.

DeleteTaskRequest

Delete task request.

Fields
name

string

Required. The resource name of the task: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.tasks.delete

DeleteZoneRequest

Delete zone request.

Fields
name

string

Required. The resource name of the zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.zones.delete

DiscoveryEvent

The payload associated with Discovery data processing.

Fields
message

string

The log message.

lake_id

string

The id of the associated lake.

zone_id

string

The id of the associated zone.

asset_id

string

The id of the associated asset.

data_location

string

The data location associated with the event.

type

EventType

The type of the event being logged.

Union field details. Additional details about the event. details can be only one of the following:
config

ConfigDetails

Details about discovery configuration in effect.

entity

EntityDetails

Details about the entity associated with the event.

partition

PartitionDetails

Details about the partition associated with the event.

action

ActionDetails

Details about the action associated with the event.

ActionDetails

Details about the action.

Fields
type

string

The type of action, e.g., IncompatibleDataSchema or InvalidDataFormat.

ConfigDetails

Details about configuration events.

Fields
parameters

map<string, string>

A list of discovery configuration parameters in effect. The keys are the field paths within DiscoverySpec, e.g., includePatterns, excludePatterns, csvOptions.disableTypeInference.

EntityDetails

Details about the entity.

Fields
entity

string

The name of the entity resource. The name is the fully-qualified resource name.

type

EntityType

The type of the entity resource.

EntityType

The type of the entity.

Enums
ENTITY_TYPE_UNSPECIFIED An unspecified event type.
TABLE Entities representing structured data.
FILESET Entities representing unstructured data.

EventType

The type of the event.

Enums
EVENT_TYPE_UNSPECIFIED An unspecified event type.
CONFIG An event representing discovery configuration in effect.
ENTITY_CREATED An event representing a metadata entity being created.
ENTITY_UPDATED An event representing a metadata entity being updated.
ENTITY_DELETED An event representing a metadata entity being deleted.
PARTITION_CREATED An event representing a partition being created.
PARTITION_UPDATED An event representing a partition being updated.
PARTITION_DELETED An event representing a partition being deleted.

PartitionDetails

Details about the partition.

Fields
partition

string

The name of the partition resource. The name is the fully-qualified resource name.

entity

string

The name of the containing entity resource. The name is the fully-qualified resource name.

type

EntityType

The type of the containing entity resource.

sampled_data_locations[]

string

The locations of the data items (e.g., Cloud Storage objects) sampled for metadata inference.

Entity

Represents tables and fileset metadata contained within a zone.

Fields
name

string

Output only. The resource name of the entity, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{id}.

display_name

string

Optional. Display name must be shorter than or equal to 256 characters.

description

string

Optional. User friendly longer description text. Must be shorter than or equal to 1024 characters.

create_time

Timestamp

Output only. The time when the entity was created.

update_time

Timestamp

Output only. The time when the entity was last updated.

id

string

Required. A user-provided entity ID. It is mutable, and will be used as the published table name. Specifying a new ID in an update entity request will override the existing value. The ID must contain only letters (a-z, A-Z), numbers (0-9), and underscores, and consist of 256 or fewer characters.

etag

string

Optional. The etag associated with the entity, which can be retrieved with a [GetEntity][] request. Required for update and delete requests.

type

Type

Required. Immutable. The type of entity.

asset

string

Required. Immutable. The ID of the asset associated with the storage location containing the entity data. The entity must be within the same zone as the asset.

data_path

string

Required. Immutable. The storage path of the entity data. For Cloud Storage data, this is the fully-qualified path to the entity, such as gs://bucket/path/to/data. For BigQuery data, this is the name of the table resource, such as projects/project_id/datasets/dataset_id/tables/table_id.

data_path_pattern

string

Optional. The set of items within the data path constituting the data in the entity, represented as a glob path. Example: gs://bucket/path/to/data/**/*.csv.

catalog_entry

string

Output only. The name of the associated Data Catalog entry.

system

StorageSystem

Required. Immutable. Identifies the storage system of the entity data.

format

StorageFormat

Required. Identifies the storage format of the entity data. It does not apply to entities with data stored in BigQuery.

compatibility

CompatibilityStatus

Output only. Metadata stores that the entity is compatible with.

access

StorageAccess

Output only. Identifies the access mechanism to the entity. Not user settable.

uid

string

Output only. System generated unique ID for the Entity. This ID will be different if the Entity is deleted and re-created with the same name.

schema

Schema

Required. The description of the data structure and layout. The schema is not included in list responses. It is only included in SCHEMA and FULL entity views of a GetEntity response.

CompatibilityStatus

Provides compatibility information for various metadata stores.

Fields
hive_metastore

Compatibility

Output only. Whether this entity is compatible with Hive Metastore.

bigquery

Compatibility

Output only. Whether this entity is compatible with BigQuery.

Compatibility

Provides compatibility information for a specific metadata store.

Fields
compatible

bool

Output only. Whether the entity is compatible and can be represented in the metadata store.

reason

string

Output only. Provides additional detail if the entity is incompatible with the metadata store.

Type

The type of entity.

Enums
TYPE_UNSPECIFIED Type unspecified.
TABLE Structured and semi-structured data.
FILESET Unstructured data.

Environment

Environment represents a user-visible compute infrastructure for analytics within a lake.

Fields
name

string

Output only. The relative resource name of the environment, of the form: projects/{project_id}/locations/{location_id}/lakes/{lake_id}/environments/{environment_id}

display_name

string

Optional. User friendly display name.

uid

string

Output only. System generated globally unique ID for the environment. This ID will be different if the environment is deleted and re-created with the same name.

create_time

Timestamp

Output only. Environment creation time.

update_time

Timestamp

Output only. The time when the environment was last updated.

labels

map<string, string>

Optional. User defined labels for the environment.

description

string

Optional. Description of the environment.

state

State

Output only. Current state of the environment.

infrastructure_spec

InfrastructureSpec

Required. Infrastructure specification for the Environment.

session_spec

SessionSpec

Optional. Configuration for sessions created for this environment.

session_status

SessionStatus

Output only. Status of sessions created for this environment.

endpoints

Endpoints

Output only. URI Endpoints to access sessions associated with the Environment.

Endpoints

URI Endpoints to access sessions associated with the Environment.

Fields
notebooks

string

Output only. URI to serve notebook APIs.

sql

string

Output only. URI to serve SQL APIs.

InfrastructureSpec

Configuration for the underlying infrastructure used to run workloads.

Fields
Union field resources. Hardware config resources can be only one of the following:
compute

ComputeResources

Optional. Compute resources needed for interactive Analyze workloads.

Union field runtime. Software config runtime can be only one of the following:
os_image

OsImageRuntime

Required. Software runtime configuration for interactive Analyze workloads.

ComputeResources

Compute resources associated with interactive Analyze workloads.

Fields
disk_size_gb

int32

Optional. Size in GB of the disk. Default is 100 GB.

node_count

int32

Optional. Total number of nodes in the sessions created for this environment.

max_node_count

int32

Optional. Max configurable nodes. If max_node_count > node_count, then auto-scaling is enabled.

OsImageRuntime

Software Runtime Configuration to run Analyze.

Fields
image_version

string

Required. Dataplex Image version.

java_libraries[]

string

Optional. List of Java jars to be included in the runtime environment. Valid input includes Cloud Storage URIs to Jar binaries. For example, gs://bucket-name/my/path/to/file.jar

python_packages[]

string

Optional. A list of python packages to be installed. Valid formats include Cloud Storage URI to a PIP installable library. For example, gs://bucket-name/my/path/to/lib.tar.gz

properties

map<string, string>

Optional. Spark properties to provide configuration for use in sessions created for this environment. The properties to set on daemon config files. Property keys are specified in prefix:property format. The prefix must be "spark".
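
The sketch below shows how the resources and runtime unions above fit together, assuming the google-cloud-dataplex Python client; the image version, bucket URIs, and property values are hypothetical.

    from google.cloud import dataplex_v1

    infra = dataplex_v1.Environment.InfrastructureSpec(
        compute=dataplex_v1.Environment.InfrastructureSpec.ComputeResources(
            disk_size_gb=100,
            node_count=2,
            max_node_count=5,   # greater than node_count, so auto-scaling is on
        ),
        os_image=dataplex_v1.Environment.InfrastructureSpec.OsImageRuntime(
            image_version="1.0",
            java_libraries=["gs://my-bucket/libs/my-udfs.jar"],
            python_packages=["gs://my-bucket/libs/my-lib.tar.gz"],
            # Keys use the prefix:property format with the "spark" prefix.
            properties={"spark:spark.executor.memory": "4g"},
        ),
    )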

SessionSpec

Configuration for sessions created for this environment.

Fields
max_idle_duration

Duration

Optional. The idle time configuration of the session. The session will be auto-terminated at the end of this period.

enable_fast_startup

bool

Optional. If True, this causes sessions to be pre-created and available for faster startup to enable interactive exploration use-cases. This defaults to False to avoid additional billed charges. These can only be set to True for the environment with name set to "default", and with default configuration.

SessionStatus

Status of sessions created for this environment.

Fields
active

bool

Output only. Indicates whether the environment is currently active, based on queries over its sessions.

GetAssetRequest

Get asset request.

Fields
name

string

Required. The resource name of the asset: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/assets/{asset_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.assets.get

GetContentRequest

Get content request.

Fields
name

string

Required. The resource name of the content: projects/{project_id}/locations/{location_id}/lakes/{lake_id}/content/{content_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.content.get
view

ContentView

Optional. Specify content view to make a partial request.

ContentView

Specifies whether the request should return the full or the partial representation.

Enums
CONTENT_VIEW_UNSPECIFIED Content view not specified. The API will default to the BASIC view.
BASIC Will not return the data_text field.
FULL Returns the complete proto.

GetDataAttributeBindingRequest

Get DataAttributeBinding request.

Fields
name

string

Required. The resource name of the DataAttributeBinding: projects/{project_number}/locations/{location_id}/dataAttributeBindings/{data_attribute_binding_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.dataAttributeBindings.get

GetDataAttributeRequest

Get DataAttribute request.

Fields
name

string

Required. The resource name of the dataAttribute: projects/{project_number}/locations/{location_id}/dataTaxonomies/{dataTaxonomy}/attributes/{data_attribute_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.dataAttributes.get

GetDataScanJobRequest

Get DataScanJob request.

Fields
name

string

Required. The resource name of the DataScanJob: projects/{project}/locations/{location_id}/dataScans/{data_scan_id}/jobs/{data_scan_job_id} where project refers to a project_id or project_number and location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource name:

  • iam.permissions.none
view

DataScanJobView

Optional. Select the DataScanJob view to return. Defaults to BASIC.

DataScanJobView

DataScanJob view options.

Enums
DATA_SCAN_JOB_VIEW_UNSPECIFIED The API will default to the BASIC view.
BASIC Basic view that does not include spec and result.
FULL Include everything.

GetDataScanRequest

Get dataScan request.

Fields
name

string

Required. The resource name of the dataScan: projects/{project}/locations/{location_id}/dataScans/{data_scan_id} where project refers to a project_id or project_number and location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource name:

  • iam.permissions.none
view

DataScanView

Optional. Select the DataScan view to return. Defaults to BASIC.

DataScanView

DataScan view options.

Enums
DATA_SCAN_VIEW_UNSPECIFIED The API will default to the BASIC view.
BASIC Basic view that does not include spec and result.
FULL Include everything.
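
A hedged sketch of requesting the FULL view with the google-cloud-dataplex Python client; the scan name is hypothetical.

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataScanServiceClient()
    scan = client.get_data_scan(
        request=dataplex_v1.GetDataScanRequest(
            name="projects/my-project/locations/us-central1/dataScans/orders-profile",
            view=dataplex_v1.GetDataScanRequest.DataScanView.FULL,
        )
    )
    # FULL includes the spec and the latest result; BASIC would omit both.
    print(scan.data_profile_result.row_count)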

GetDataTaxonomyRequest

Get DataTaxonomy request.

Fields
name

string

Required. The resource name of the DataTaxonomy: projects/{project_number}/locations/{location_id}/dataTaxonomies/{data_taxonomy_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.dataTaxonomies.get

GetEntityRequest

Get metadata entity request.

Fields
name

string

Required. The resource name of the entity: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.entities.get
view

EntityView

Optional. Used to select the subset of entity information to return. Defaults to BASIC.

EntityView

Entity views for get entity partial result.

Enums
ENTITY_VIEW_UNSPECIFIED The API will default to the BASIC view.
BASIC Minimal view that does not include the schema.
SCHEMA Include basic information and schema.
FULL Include everything. Currently, this is the same as the SCHEMA view.

GetEnvironmentRequest

Get environment request.

Fields
name

string

Required. The resource name of the environment: projects/{project_id}/locations/{location_id}/lakes/{lake_id}/environments/{environment_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.environments.get

GetJobRequest

Get job request.

Fields
name

string

Required. The resource name of the job: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}/jobs/{job_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.tasks.get

GetLakeRequest

Get lake request.

Fields
name

string

Required. The resource name of the lake: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.lakes.get

GetPartitionRequest

Get metadata partition request.

Fields
name

string

Required. The resource name of the partition: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}/partitions/{partition_value_path}. The {partition_value_path} segment consists of an ordered sequence of partition values separated by "/". All values must be provided.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.partitions.get

GetTaskRequest

Get task request.

Fields
name

string

Required. The resource name of the task: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.tasks.get

GetZoneRequest

Get zone request.

Fields
name

string

Required. The resource name of the zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.zones.get

Job

A job represents an instance of a task.

Fields
name

string

Output only. The relative resource name of the job, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}/jobs/{job_id}.

uid

string

Output only. System generated globally unique ID for the job.

start_time

Timestamp

Output only. The time when the job was started.

end_time

Timestamp

Output only. The time when the job ended.

state

State

Output only. Execution state for the job.

retry_count

uint32

Output only. The number of times the job has been retried (excluding the initial attempt).

service

Service

Output only. The underlying service running a job.

service_job

string

Output only. The full resource name for the job run under a particular service.

message

string

Output only. Additional information about the current state.

labels

map<string, string>

Output only. User-defined labels for the task.

trigger

Trigger

Output only. Job execution trigger.

execution_spec

ExecutionSpec

Output only. Spec related to how a task is executed.

Service

Enums
SERVICE_UNSPECIFIED Service used to run the job is unspecified.
DATAPROC Dataproc service is used to run this job.

State

Enums
STATE_UNSPECIFIED The job state is unknown.
RUNNING The job is running.
CANCELLING The job is cancelling.
CANCELLED The job cancellation was successful.
SUCCEEDED The job completed successfully.
FAILED The job is no longer running due to an error.
ABORTED The job was cancelled outside of Dataplex.

Trigger

Job execution trigger.

Enums
TRIGGER_UNSPECIFIED The trigger is unspecified.
TASK_CONFIG The job was triggered by Dataplex based on trigger spec from task definition.
RUN_REQUEST The job was triggered by the explicit call of Task API.

JobEvent

The payload associated with Job logs that contains events describing jobs that have run within a Lake.

Fields
message

string

The log message.

job_id

string

The unique id identifying the job.

start_time

Timestamp

The time when the job started running.

end_time

Timestamp

The time when the job ended running.

state

State

The job state on completion.

retries

int32

The number of retries.

type

Type

The type of the job.

service

Service

The service used to execute the job.

service_job

string

The reference to the job within the service.

execution_trigger

ExecutionTrigger

Job execution trigger.

ExecutionTrigger

Job Execution trigger.

Enums
EXECUTION_TRIGGER_UNSPECIFIED The job execution trigger is unspecified.
TASK_CONFIG The job was triggered by Dataplex based on trigger spec from task definition.
RUN_REQUEST The job was triggered by the explicit call of Task API.

Service

The service used to execute the job.

Enums
SERVICE_UNSPECIFIED Unspecified service.
DATAPROC Cloud Dataproc.

State

The completion status of the job.

Enums
STATE_UNSPECIFIED Unspecified job state.
SUCCEEDED Job successfully completed.
FAILED Job was unsuccessful.
CANCELLED Job was cancelled by the user.
ABORTED Job was cancelled or aborted via the service executing the job.

Type

The type of the job.

Enums
TYPE_UNSPECIFIED Unspecified job type.
SPARK Spark jobs.
NOTEBOOK Notebook jobs.

Lake

A lake is a centralized repository for managing enterprise data across the organization, distributed across many cloud projects and stored in a variety of storage services such as Google Cloud Storage and BigQuery. The resources attached to a lake are referred to as managed resources. Data within these managed resources can be structured or unstructured. A lake provides data admins with tools to organize, secure, and manage their data at scale, and provides data scientists and data engineers an integrated experience to easily search, discover, analyze, and transform data and associated metadata.

Fields
name

string

Output only. The relative resource name of the lake, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

display_name

string

Optional. User friendly display name.

uid

string

Output only. System generated globally unique ID for the lake. This ID will be different if the lake is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the lake was created.

update_time

Timestamp

Output only. The time when the lake was last updated.

labels

map<string, string>

Optional. User-defined labels for the lake.

description

string

Optional. Description of the lake.

state

State

Output only. Current state of the lake.

service_account

string

Output only. Service account associated with this lake. This service account must be authorized to access or operate on resources managed by the lake.

metastore

Metastore

Optional. Settings to manage lake and Dataproc Metastore service instance association.

asset_status

AssetStatus

Output only. Aggregated status of the underlying assets of the lake.

metastore_status

MetastoreStatus

Output only. Metastore status of the lake.

Metastore

Settings to manage association of Dataproc Metastore with a lake.

Fields
service

string

Optional. A relative reference to the Dataproc Metastore (https://cloud.google.com/dataproc-metastore/docs) service associated with the lake: projects/{project_id}/locations/{location_id}/services/{service_id}

MetastoreStatus

Status of Lake and Dataproc Metastore service instance association.

Fields
state

State

Current state of association.

message

string

Additional information about the current status.

update_time

Timestamp

Last update time of the metastore status of the lake.

endpoint

string

The URI of the endpoint used to access the Metastore service.

State

Current state of association.

Enums
STATE_UNSPECIFIED Unspecified.
NONE A Metastore service instance is not associated with the lake.
READY A Metastore service instance is attached to the lake.
UPDATING Attach/detach is in progress.
ERROR Attach/detach could not be done due to errors.

ListActionsResponse

List actions response.

Fields
actions[]

Action

Actions under the given parent lake/zone/asset.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

ListAssetActionsRequest

List asset actions request.

Fields
parent

string

Required. The resource name of the parent asset: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/assets/{asset_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.assetActions.list
page_size

int32

Optional. Maximum number of actions to return. The service may return fewer than this value. If unspecified, at most 10 actions will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListAssetActions call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListAssetActions must match the call that provided the page token.

ListAssetsRequest

List assets request.

Fields
parent

string

Required. The resource name of the parent zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.assets.list
page_size

int32

Optional. Maximum number of asset to return. The service may return fewer than this value. If unspecified, at most 10 assets will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListAssets call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListAssets must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.

ListAssetsResponse

List assets response.

Fields
assets[]

Asset

Asset under the given parent zone.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

ListContentRequest

List content request. Returns the BASIC Content view.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_id}/locations/{location_id}/lakes/{lake_id}

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.content.list
page_size

int32

Optional. Maximum number of content to return. The service may return fewer than this value. If unspecified, at most 10 content will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListContent call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListContent must match the call that provided the page token.

filter

string

Optional. Filter request. Filters are case-sensitive. The following formats are supported:

  • labels.key1 = "value1"
  • labels:key1
  • type = "NOTEBOOK"
  • type = "SQL_SCRIPT"

These restrictions can be combined with AND, OR, and NOT operators.
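
As an illustrative sketch only, a filtered ListContent call using the google-cloud-dataplex Python client (an assumption; the project, location, and lake IDs are placeholders) might look like:

  from google.cloud import dataplex_v1

  client = dataplex_v1.ContentServiceClient()
  parent = "projects/my-project/locations/us-central1/lakes/my-lake"

  # Notebooks carrying a given label; restrictions joined with AND.
  request = dataplex_v1.ListContentRequest(
      parent=parent,
      filter='type = "NOTEBOOK" AND labels.env = "prod"',
  )
  for content in client.list_content(request=request):
      print(content.name)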

ListContentResponse

List content response.

Fields
content[]

Content

Content under the given parent lake.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

ListDataAttributeBindingsRequest

List DataAttributeBindings request.

Fields
parent

string

Required. The resource name of the Location: projects/{project_number}/locations/{location_id}

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.dataAttributeBindings.list
page_size

int32

Optional. Maximum number of DataAttributeBindings to return. The service may return fewer than this value. If unspecified, at most 10 DataAttributeBindings will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListDataAttributeBindings call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDataAttributeBindings must match the call that provided the page token.

filter

string

Optional. Filter request. Supported filters:

  • Filter using resource: filter=resource:"resource-name"
  • Filter using attribute: filter=attributes:"attribute-name"
  • Filter using attribute in paths list: filter=paths.attributes:"attribute-name"

order_by

string

Optional. Order by fields for the result.

ListDataAttributeBindingsResponse

List DataAttributeBindings response.

Fields
data_attribute_bindings[]

DataAttributeBinding

DataAttributeBindings under the given parent Location.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable_locations[]

string

Locations that could not be reached.

ListDataAttributesRequest

List DataAttributes request.

Fields
parent

string

Required. The resource name of the DataTaxonomy: projects/{project_number}/locations/{location_id}/dataTaxonomies/{data_taxonomy_id}

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.dataAttributes.list
page_size

int32

Optional. Maximum number of DataAttributes to return. The service may return fewer than this value. If unspecified, at most 10 dataAttributes will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListDataAttributes call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDataAttributes must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.

ListDataAttributesResponse

List DataAttributes response.

Fields
data_attributes[]

DataAttribute

DataAttributes under the given parent DataTaxonomy.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable_locations[]

string

Locations that could not be reached.

ListDataScanJobsRequest

List DataScanJobs request.

Fields
parent

string

Required. The resource name of the parent DataScan: projects/{project}/locations/{location_id}/dataScans/{data_scan_id} where project refers to a project_id or project_number and location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.datascans.get
page_size

int32

Optional. Maximum number of DataScanJobs to return. The service may return fewer than this value. If unspecified, at most 10 DataScanJobs will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListDataScanJobs call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDataScanJobs must match the call that provided the page token.

filter

string

Optional. An expression for filtering the results of the ListDataScanJobs request.

If unspecified, all datascan jobs will be returned. Multiple filters can be applied (with AND, OR logical operators). Filters are case-sensitive.

Allowed fields are:

  • start_time
  • end_time

start_time and end_time expect RFC-3339 formatted strings (e.g. 2018-10-08T18:30:00-07:00).

For instance, 'start_time > 2018-10-08T00:00:00.123456789Z AND end_time < 2018-10-09T00:00:00.123456789Z' limits results to DataScanJobs between specified start and end times.
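
A minimal sketch of such a time-bounded ListDataScanJobs call, assuming the google-cloud-dataplex Python client (resource IDs are placeholders):

  from google.cloud import dataplex_v1

  client = dataplex_v1.DataScanServiceClient()
  parent = "projects/my-project/locations/us-central1/dataScans/my-scan"

  # Both bounds are RFC-3339 timestamps and are combined with AND.
  request = dataplex_v1.ListDataScanJobsRequest(
      parent=parent,
      filter="start_time > 2018-10-08T00:00:00.123456789Z AND "
             "end_time < 2018-10-09T00:00:00.123456789Z",
  )
  for job in client.list_data_scan_jobs(request=request):
      print(job.name, job.state)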

ListDataScanJobsResponse

List DataScanJobs response.

Fields
data_scan_jobs[]

DataScanJob

DataScanJobs (BASIC view only) under a given dataScan.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

ListDataScansRequest

List dataScans request.

Fields
parent

string

Required. The resource name of the parent location: projects/{project}/locations/{location_id} where project refers to a project_id or project_number and location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.datascans.list
page_size

int32

Optional. Maximum number of dataScans to return. The service may return fewer than this value. If unspecified, at most 500 scans will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListDataScans call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDataScans must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields (name or create_time) for the result. If not specified, the ordering is undefined.

ListDataScansResponse

List dataScans response.

Fields
data_scans[]

DataScan

DataScans (BASIC view only) under the given parent location.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable[]

string

Locations that could not be reached.

ListDataTaxonomiesRequest

List DataTaxonomies request.

Fields
parent

string

Required. The resource name of the DataTaxonomy location, of the form: projects/{project_number}/locations/{location_id} where location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.dataTaxonomies.list
page_size

int32

Optional. Maximum number of DataTaxonomies to return. The service may return fewer than this value. If unspecified, at most 10 DataTaxonomies will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListDataTaxonomies call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDataTaxonomies must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.

ListDataTaxonomiesResponse

List DataTaxonomies response.

Fields
data_taxonomies[]

DataTaxonomy

DataTaxonomies under the given parent location.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable_locations[]

string

Locations that could not be reached.

ListEntitiesRequest

List metadata entities request.

Fields
parent

string

Required. The resource name of the parent zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.entities.list
view

EntityView

Required. Specify the entity view to make a partial list request.

page_size

int32

Optional. Maximum number of entities to return. The service may return fewer than this value. If unspecified, 100 entities will be returned by default. The maximum value is 500; larger values will be truncated to 500.

page_token

string

Optional. Page token received from a previous ListEntities call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListEntities must match the call that provided the page token.

filter

string

Optional. The following filter parameters can be added to the URL to limit the entities returned by the API (see the sketch after this list):

  • Entity ID: ?filter="id=entityID"
  • Asset ID: ?filter="asset=assetID"
  • Data path: ?filter="data_path=gs://my-bucket"
  • Is HIVE compatible: ?filter="hive_compatible=true"
  • Is BigQuery compatible: ?filter="bigquery_compatible=true"
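
A minimal sketch of a partial list request combining view and filter, assuming the google-cloud-dataplex Python client (resource IDs are placeholders):

  from google.cloud import dataplex_v1

  client = dataplex_v1.MetadataServiceClient()
  parent = (
      "projects/123456789/locations/us-central1/lakes/my-lake/zones/my-zone"
  )

  # view is required; the filter narrows the listed entities further.
  request = dataplex_v1.ListEntitiesRequest(
      parent=parent,
      view=dataplex_v1.ListEntitiesRequest.EntityView.TABLES,
      filter="hive_compatible=true",
  )
  for entity in client.list_entities(request=request):
      print(entity.name)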

EntityView

Entity views.

Enums
ENTITY_VIEW_UNSPECIFIED The default unset value. Return both table and fileset entities if unspecified.
TABLES Only list table entities.
FILESETS Only list fileset entities.

ListEntitiesResponse

List metadata entities response.

Fields
entities[]

Entity

Entities in the specified parent zone.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no remaining results in the list.

ListEnvironmentsRequest

List environments request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_id}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.environments.list
page_size

int32

Optional. Maximum number of environments to return. The service may return fewer than this value. If unspecified, at most 10 environments will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListEnvironments call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListEnvironments must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.

ListEnvironmentsResponse

List environments response.

Fields
environments[]

Environment

Environments under the given parent lake.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

ListJobsRequest

List jobs request.

Fields
parent

string

Required. The resource name of the parent task: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.tasks.get
page_size

int32

Optional. Maximum number of jobs to return. The service may return fewer than this value. If unspecified, at most 10 jobs will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListJobs call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListJobs must match the call that provided the page token.

ListJobsResponse

List jobs response.

Fields
jobs[]

Job

Jobs under a given task.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

ListLakeActionsRequest

List lake actions request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.lakeActions.list
page_size

int32

Optional. Maximum number of actions to return. The service may return fewer than this value. If unspecified, at most 10 actions will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListLakeActions call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListLakeActions must match the call that provided the page token.

ListLakesRequest

List lakes request.

Fields
parent

string

Required. The resource name of the lake location, of the form: projects/{project_number}/locations/{location_id} where location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.lakes.list
page_size

int32

Optional. Maximum number of Lakes to return. The service may return fewer than this value. If unspecified, at most 10 lakes will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListLakes call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListLakes must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.
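
The page_size/page_token contract above is shared by the List methods in this reference. A minimal sketch, assuming the google-cloud-dataplex Python client, whose returned pager follows next_page_token automatically (IDs are placeholders):

  from google.cloud import dataplex_v1

  client = dataplex_v1.DataplexServiceClient()
  request = dataplex_v1.ListLakesRequest(
      parent="projects/123456789/locations/us-central1",
      page_size=100,  # per-page cap; values above 1000 are coerced to 1000
  )

  # The pager retrieves further pages on demand via next_page_token.
  for lake in client.list_lakes(request=request):
      print(lake.name, lake.state)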

ListLakesResponse

List lakes response.

Fields
lakes[]

Lake

Lakes under the given parent location.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable_locations[]

string

Locations that could not be reached.

ListPartitionsRequest

List metadata partitions request.

Fields
parent

string

Required. The resource name of the parent entity: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.partitions.list
page_size

int32

Optional. Maximum number of partitions to return. The service may return fewer than this value. If unspecified, 100 partitions will be returned by default. The maximum page size is 500; larger values will be truncated to 500.

page_token

string

Optional. Page token received from a previous ListPartitions call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListPartitions must match the call that provided the page token.

filter

string

Optional. Filter the partitions returned to the caller using a key value pair expression. Supported operators and syntax:

  • logic operators: AND, OR
  • comparison operators: <, >, >=, <=, =, !=
  • LIKE operators: the right hand side of a LIKE operator supports "." and "*" for wildcard searches, for example "value1 LIKE ".*oo.*""
  • parenthetical grouping: ( )

Sample filter expression: ?filter="key1 < value1 OR key2 > value2"

Notes:

  • Keys to the left of operators are case-insensitive.
  • Partition results are sorted first by creation time, then by lexicographic order.
  • Up to 20 key value filter pairs are allowed, but due to performance considerations, only the first 10 will be used as a filter.
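
A minimal sketch of a filtered ListPartitions call, assuming the google-cloud-dataplex Python client (resource IDs and keys are placeholders):

  from google.cloud import dataplex_v1

  client = dataplex_v1.MetadataServiceClient()
  parent = (
      "projects/123456789/locations/us-central1/lakes/my-lake/"
      "zones/my-zone/entities/my-entity"
  )

  # Keys refer to partition-schema fields; only the first 10 pairs filter.
  request = dataplex_v1.ListPartitionsRequest(
      parent=parent,
      filter="key1 < value1 OR key2 > value2",
  )
  for partition in client.list_partitions(request=request):
      print(partition.name, list(partition.values))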

ListPartitionsResponse

List metadata partitions response.

Fields
partitions[]

Partition

Partitions under the specified parent entity.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no remaining results in the list.

ListSessionsRequest

List sessions request.

Fields
parent

string

Required. The resource name of the parent environment: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/environment/{environment_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.environments.get
page_size

int32

Optional. Maximum number of sessions to return. The service may return fewer than this value. If unspecified, at most 10 sessions will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListSessions call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListSessions must match the call that provided the page token.

filter

string

Optional. Filter request. The mode filter returns only the sessions belonging to the requester when the mode is USER, and sessions of all users when the mode is ADMIN. If no filter is specified, the mode defaults to USER. Note: when the mode is ADMIN, the requester must have the dataplex.environments.listAllSessions permission to list all sessions; without that permission, the request fails.

mode = ADMIN | USER
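
A minimal sketch of an ADMIN-mode ListSessions call, assuming the google-cloud-dataplex Python client (resource IDs are placeholders):

  from google.cloud import dataplex_v1

  client = dataplex_v1.DataplexServiceClient()
  parent = (
      "projects/123456789/locations/us-central1/lakes/my-lake/"
      "environment/my-environment"
  )

  # ADMIN mode lists all users' sessions and requires the
  # dataplex.environments.listAllSessions permission; USER is the default.
  request = dataplex_v1.ListSessionsRequest(parent=parent, filter="mode = ADMIN")
  for session in client.list_sessions(request=request):
      print(session.name, session.user_id)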

ListSessionsResponse

List sessions response.

Fields
sessions[]

Session

Sessions under a given environment.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

ListTasksRequest

List tasks request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.tasks.list
page_size

int32

Optional. Maximum number of tasks to return. The service may return fewer than this value. If unspecified, at most 10 tasks will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListTasks call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListTasks must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.

ListTasksResponse

List tasks response.

Fields
tasks[]

Task

Tasks under the given parent lake.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable_locations[]

string

Locations that could not be reached.

ListZoneActionsRequest

List zone actions request.

Fields
parent

string

Required. The resource name of the parent zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.zoneActions.list
page_size

int32

Optional. Maximum number of actions to return. The service may return fewer than this value. If unspecified, at most 10 actions will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListZoneActions call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListZoneActions must match the call that provided the page token.

ListZonesRequest

List zones request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.zones.list
page_size

int32

Optional. Maximum number of zones to return. The service may return fewer than this value. If unspecified, at most 10 zones will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListZones call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListZones must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.

ListZonesResponse

List zones response.

Fields
zones[]

Zone

Zones under the given parent lake.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

OperationMetadata

Represents the metadata of a long-running operation.

Fields
create_time

Timestamp

Output only. The time the operation was created.

end_time

Timestamp

Output only. The time the operation finished running.

target

string

Output only. Server-defined resource path for the target of the operation.

verb

string

Output only. Name of the verb executed by the operation.

status_message

string

Output only. Human-readable status of the operation, if any.

requested_cancellation

bool

Output only. Identifies whether the user has requested cancellation of the operation. Operations that have been successfully cancelled have an Operation.error value with a google.rpc.Status.code of 1, corresponding to Code.CANCELLED.

api_version

string

Output only. API version used to start the operation.

Partition

Represents partition metadata contained within entity instances.

Fields
name

string

Output only. Partition values used in the HTTP URL must be double encoded. For example, url_encode(url_encode(value)) can be used to encode "US:CA/CA#Sunnyvale" so that the request URL ends with "/partitions/US%253ACA/CA%2523Sunnyvale" (see the sketch after this message). The name field in the response retains the encoded format.

values[]

string

Required. Immutable. The set of values representing the partition, which correspond to the partition schema defined in the parent entity.

location

string

Required. Immutable. The location of the entity data within the partition, for example, gs://bucket/path/to/entity/key1=value1/key2=value2 or projects/<project_id>/datasets/<dataset_id>/tables/<table_id>.

etag
(deprecated)

string

Optional. The etag for this partition.
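
A minimal sketch of the double encoding described for the name field above, using only the Python standard library (the entity path is a placeholder):

  from urllib.parse import quote

  # url_encode(url_encode(value)): quote() keeps "/" unescaped by default,
  # so the value separator survives while "%" is escaped on the second pass.
  value = "US:CA/CA#Sunnyvale"
  encoded = quote(quote(value))
  assert encoded == "US%253ACA/CA%2523Sunnyvale"

  entity = (
      "projects/123456789/locations/us-central1/lakes/my-lake/"
      "zones/my-zone/entities/my-entity"
  )
  name = f"{entity}/partitions/{encoded}"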

ResourceAccessSpec

ResourceAccessSpec holds the access control configuration to be enforced on the resources, for example, Cloud Storage bucket, BigQuery dataset, BigQuery table.

Fields
readers[]

string

Optional. The set of principals to be granted reader role on the resource. Strings follow the IAM binding format: user:{email}, serviceAccount:{email}, group:{email}.

writers[]

string

Optional. The set of principals to be granted writer role on the resource.

owners[]

string

Optional. The set of principals to be granted owner role on the resource.

RunDataScanRequest

Run DataScan Request

Fields
name

string

Required. The resource name of the DataScan: projects/{project}/locations/{location_id}/dataScans/{data_scan_id}, where project refers to a project_id or project_number and location_id refers to a GCP region.

Only OnDemand data scans are allowed.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.datascans.run

RunDataScanResponse

Run DataScan Response.

Fields
job

DataScanJob

DataScanJob created by RunDataScan request.
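
A minimal sketch of triggering an on-demand scan and reading the created job, assuming the google-cloud-dataplex Python client (the DataScan name is a placeholder):

  from google.cloud import dataplex_v1

  client = dataplex_v1.DataScanServiceClient()
  name = "projects/my-project/locations/us-central1/dataScans/my-scan"

  # Triggers a single run of an on-demand scan; the created job is returned.
  response = client.run_data_scan(
      request=dataplex_v1.RunDataScanRequest(name=name)
  )
  print(response.job.name, response.job.state)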

RunTaskRequest

Fields
name

string

Required. The resource name of the task: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.tasks.run
labels

map<string, string>

Optional. User-defined labels for the task. If the map is left empty, the task will run with existing labels from task definition. If the map contains an entry with a new key, the same will be added to existing set of labels. If the map contains an entry with an existing label key in task definition, the task will run with new label value for that entry. Clearing an existing label will require label value to be explicitly set to a hyphen "-". The label value cannot be empty.

args

map<string, string>

Optional. Execution spec arguments. If the map is left empty, the task will run with existing execution spec args from task definition. If the map contains an entry with a new key, the same will be added to existing set of args. If the map contains an entry with an existing arg key in task definition, the task will run with new arg value for that entry. Clearing an existing arg will require arg value to be explicitly set to a hyphen "-". The arg value cannot be empty.
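
A minimal sketch of a RunTask call that overrides a label and the positional args, assuming the google-cloud-dataplex Python client (resource IDs and values are placeholders):

  from google.cloud import dataplex_v1

  client = dataplex_v1.DataplexServiceClient()
  name = (
      "projects/123456789/locations/us-central1/lakes/my-lake/tasks/my-task"
  )

  request = dataplex_v1.RunTaskRequest(
      name=name,
      labels={"run-type": "backfill", "owner": "-"},  # "-" clears a label
      args={"TASK_ARGS": "--date=2023-01-01,--dry-run"},  # replaces this arg
  )
  job = client.run_task(request=request).job
  print(job.name)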

RunTaskResponse

Fields
job

Job

Jobs created by RunTask API.

ScannedData

The data scanned during processing (e.g. in incremental DataScan)

Fields
Union field data_range. The range of scanned data data_range can be only one of the following:
incremental_field

IncrementalField

The range denoted by values of an incremental field

IncrementalField

A data range denoted by a pair of start/end values of a field.

Fields
field

string

The field that contains values which monotonically increase over time (e.g. a timestamp column).

start

string

Value that marks the start of the range.

end

string

Value that marks the end of the range.

Schema

Schema information describing the structure and layout of the data.

Fields
user_managed

bool

Required. Set to true if user-managed or false if managed by Dataplex. The default is false (managed by Dataplex).

  • Set to false to enable Dataplex discovery to update the schema, including new data discovery, schema inference, and schema evolution. Users retain the ability to input and edit the schema. Dataplex treats schema input by the user as though produced by a previous Dataplex discovery operation, and it will evolve the schema and take action based on that treatment.

  • Set to true to fully manage the entity schema. This setting guarantees that Dataplex will not change schema fields.

fields[]

SchemaField

Optional. The sequence of fields describing data in table entities. Note: BigQuery SchemaFields are immutable.

partition_fields[]

PartitionField

Optional. The sequence of fields describing the partition structure in entities. If this field is empty, there are no partitions within the data.

partition_style

PartitionStyle

Optional. The structure of paths containing partition data within the entity.

Mode

Additional qualifiers to define field semantics.

Enums
MODE_UNSPECIFIED Mode unspecified.
REQUIRED The field has required semantics.
NULLABLE The field has optional semantics, and may be null.
REPEATED The field has repeated (0 or more) semantics, and is a list of values.

PartitionField

Represents a key field within the entity's partition structure. You could have up to 20 partition fields, but only the first 10 partitions have the filtering ability due to performance consideration. Note: Partition fields are immutable.

Fields
name

string

Required. Partition field name must consist of letters, numbers, and underscores only, with a maximum length of 256 characters, and must begin with a letter or underscore.

type

Type

Required. Immutable. The type of field.

PartitionStyle

The structure of paths within the entity, which represent partitions.

Enums
PARTITION_STYLE_UNSPECIFIED PartitionStyle unspecified
HIVE_COMPATIBLE Partitions are hive-compatible. Examples: gs://bucket/path/to/table/dt=2019-10-31/lang=en, gs://bucket/path/to/table/dt=2019-10-31/lang=en/late.

SchemaField

Represents a column field within a table schema.

Fields
name

string

Required. The name of the field. Must contain only letters, numbers and underscores, with a maximum length of 767 characters, and must begin with a letter or underscore.

description

string

Optional. User friendly field description. Must be less than or equal to 1024 characters.

type

Type

Required. The type of field.

mode

Mode

Required. Additional field semantics.

fields[]

SchemaField

Optional. Any nested field for complex types.

Type

Type information for fields in schemas and partition schemas.

Enums
TYPE_UNSPECIFIED SchemaType unspecified.
BOOLEAN Boolean field.
BYTE Single byte numeric field.
INT16 16-bit numeric field.
INT32 32-bit numeric field.
INT64 64-bit numeric field.
FLOAT Floating point numeric field.
DOUBLE Double precision numeric field.
DECIMAL Real value numeric field.
STRING Sequence of characters field.
BINARY Sequence of bytes field.
TIMESTAMP Date and time field.
DATE Date field.
TIME Time field.
RECORD Structured field. Nested fields that define the structure of the map. If all nested fields are nullable, this field represents a union.
NULL Null field that does not have values.
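
A minimal sketch of constructing a user-managed Schema with one field and a hive-compatible partition field, assuming the google-cloud-dataplex Python client (the field names are placeholders):

  from google.cloud import dataplex_v1

  # user_managed=True guarantees discovery will not change these fields.
  schema = dataplex_v1.Schema(
      user_managed=True,
      fields=[
          dataplex_v1.Schema.SchemaField(
              name="event_id",
              type_=dataplex_v1.Schema.Type.STRING,
              mode=dataplex_v1.Schema.Mode.REQUIRED,
          ),
      ],
      partition_fields=[
          dataplex_v1.Schema.PartitionField(
              name="dt",
              type_=dataplex_v1.Schema.Type.DATE,
          ),
      ],
      partition_style=dataplex_v1.Schema.PartitionStyle.HIVE_COMPATIBLE,
  )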

Session

Represents an active analyze session running for a user.

Fields
name

string

Output only. The relative resource name of the session, of the form: projects/{project_id}/locations/{location_id}/lakes/{lake_id}/environment/{environment_id}/sessions/{session_id}.

user_id

string

Output only. Email of user running the session.

create_time

Timestamp

Output only. Session start time.

state

State

Output only. State of the session.

SessionEvent

These messages contain information about sessions within an environment. The monitored resource is 'Environment'.

Fields
message

string

The log message.

user_id

string

The information about the user that created the session. It will be the email address of the user.

session_id

string

Unique identifier for the session.

type

EventType

The type of the event.

event_succeeded

bool

The status of the event.

fast_startup_enabled

bool

Whether the session is associated with an environment that has fast startup enabled, and was created before being assigned to a user.

unassigned_duration

Duration

The idle duration of a warm pooled session before it is assigned to a user.

Union field detail. Additional information about the Query metadata. detail can be only one of the following:
query

QueryDetail

The execution details of the query.

EventType

The type of the event.

Enums
EVENT_TYPE_UNSPECIFIED An unspecified event type.
START Event when the session is assigned to a user.
STOP Event for stop of a session.
QUERY Query events in the session.
CREATE Event for creation of a cluster. It is not yet assigned to a user. This comes before START in the sequence.

QueryDetail

Execution details of the query.

Fields
query_id

string

The unique ID identifying the query.

query_text

string

The query text executed.

engine

Engine

Query Execution engine.

duration

Duration

Time taken for execution of the query.

result_size_bytes

int64

The size of results the query produced.

data_processed_bytes

int64

The data processed by the query.

Engine

Query Execution engine.

Enums
ENGINE_UNSPECIFIED An unspecified Engine type.
SPARK_SQL Spark-sql engine is specified in Query.
BIGQUERY BigQuery engine is specified in Query.

State

State of a resource.

Enums
STATE_UNSPECIFIED State is not specified.
ACTIVE Resource is active, i.e., ready to use.
CREATING Resource is under creation.
DELETING Resource is under deletion.
ACTION_REQUIRED Resource is active but has unresolved actions.

StorageAccess

Describes the access mechanism of the data within its storage location.

Fields
read

AccessMode

Output only. Describes the read access mechanism of the data. Not user settable.

AccessMode

Access Mode determines how data stored within the Entity is read.

Enums
ACCESS_MODE_UNSPECIFIED Access mode unspecified.
DIRECT Default. Data is accessed directly using storage APIs.
MANAGED Data is accessed through a managed interface using BigQuery APIs.

StorageFormat

Describes the format of the data within its storage location.

Fields
format

Format

Output only. The data format associated with the stored data, which represents content type values. The value is inferred from mime type.

compression_format

CompressionFormat

Optional. The compression type associated with the stored data. If unspecified, the data is uncompressed.

mime_type

string

Required. The mime type descriptor for the data. Must match the pattern {type}/{subtype}. Supported values:

  • application/x-parquet
  • application/x-avro
  • application/x-orc
  • application/x-tfrecord
  • application/x-parquet+iceberg
  • application/x-avro+iceberg
  • application/x-orc+iceberg
  • application/json
  • application/{subtypes}
  • text/csv
  • text/
  • image/{image subtype}
  • video/{video subtype}
  • audio/{audio subtype}
Union field options. Additional format-specific options. options can be only one of the following:
csv

CsvOptions

Optional. Additional information about CSV formatted data.

json

JsonOptions

Optional. Additional information about JSON formatted data.

iceberg

IcebergOptions

Optional. Additional information about iceberg tables.

CompressionFormat

The specific compressed file format of the data.

Enums
COMPRESSION_FORMAT_UNSPECIFIED CompressionFormat unspecified. Implies uncompressed data.
GZIP GZip compressed set of files.
BZIP2 BZip2 compressed set of files.

CsvOptions

Describes CSV and similar semi-structured data formats.

Fields
encoding

string

Optional. The character encoding of the data. Accepts "US-ASCII", "UTF-8", and "ISO-8859-1". Defaults to UTF-8 if unspecified.

header_rows

int32

Optional. The number of rows to interpret as header rows that should be skipped when reading data rows. Defaults to 0.

delimiter

string

Optional. The delimiter used to separate values. Defaults to ','.

quote

string

Optional. The character used to quote column values. Accepts '"' (double quotation mark) or ''' (single quotation mark). Defaults to '"' (double quotation mark) if unspecified.

Format

The specific file format of the data.

Enums
FORMAT_UNSPECIFIED Format unspecified.
PARQUET Parquet-formatted structured data.
AVRO Avro-formatted structured data.
ORC Orc-formatted structured data.
CSV Csv-formatted semi-structured data.
JSON Json-formatted semi-structured data.
IMAGE Image data formats (such as jpg and png).
AUDIO Audio data formats (such as mp3, and wav).
VIDEO Video data formats (such as mp4 and mpg).
TEXT Textual data formats (such as txt and xml).
TFRECORD TensorFlow record format.
OTHER Data that doesn't match a specific format.
UNKNOWN Data of an unknown format.

IcebergOptions

Describes Iceberg data format.

Fields
metadata_location

string

Optional. The location where the Iceberg metadata is present; it must be within the table path.

JsonOptions

Describes JSON data format.

Fields
encoding

string

Optional. The character encoding of the data. Accepts "US-ASCII", "UTF-8" and "ISO-8859-1". Defaults to UTF-8 if not specified.

StorageSystem

Identifies the cloud system that manages the data storage.

Enums
STORAGE_SYSTEM_UNSPECIFIED Storage system unspecified.
CLOUD_STORAGE The entity data is contained within a Cloud Storage bucket.
BIGQUERY The entity data is contained within a BigQuery dataset.

Task

A task represents a user-visible job.

Fields
name

string

Output only. The relative resource name of the task, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}.

uid

string

Output only. System generated globally unique ID for the task. This ID will be different if the task is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the task was created.

update_time

Timestamp

Output only. The time when the task was last updated.

description

string

Optional. Description of the task.

display_name

string

Optional. User friendly display name.

state

State

Output only. Current state of the task.

labels

map<string, string>

Optional. User-defined labels for the task.

trigger_spec

TriggerSpec

Required. Spec related to how often and when a task should be triggered.

execution_spec

ExecutionSpec

Required. Spec related to how a task is executed.

execution_status

ExecutionStatus

Output only. Status of the latest task executions.

Union field config. Task template specific user-specified config. config can be only one of the following:
spark

SparkTaskConfig

Config related to running custom Spark tasks.

notebook

NotebookTaskConfig

Config related to running scheduled Notebooks.

ExecutionSpec

Execution related settings, like retry and service_account.

Fields
args

map<string, string>

Optional. The arguments to pass to the task. The args can use placeholders of the format ${placeholder} as part of a key/value string. These will be interpolated before passing the args to the driver. Currently supported placeholders:

  • ${task_id}
  • ${job_time}

To pass positional args, set the key as TASK_ARGS. The value should be a comma-separated string of all the positional arguments. To use a delimiter other than comma, refer to https://cloud.google.com/sdk/gcloud/reference/topic/escaping. If other keys are present in the args, TASK_ARGS will be passed as the last argument. (See the sketch after this message.)

service_account

string

Required. Service account to use to execute a task. If not provided, the default Compute service account for the project is used.

project

string

Optional. The project in which jobs are run. By default, the project containing the Lake is used. If a project is provided, the ExecutionSpec.service_account must belong to this project.

max_job_execution_lifetime

Duration

Optional. The maximum duration after which the job execution is expired.

kms_key

string

Optional. The Cloud KMS key to use for encryption, of the form: projects/{project_number}/locations/{location_id}/keyRings/{key-ring-name}/cryptoKeys/{key-name}.
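
A minimal sketch of an ExecutionSpec using the placeholder and TASK_ARGS conventions described above, assuming the google-cloud-dataplex Python client (the service account, keys, and URIs are placeholders):

  import datetime

  from google.cloud import dataplex_v1

  # ${task_id} and ${job_time} are interpolated by the service before the
  # args reach the driver; TASK_ARGS carries the positional arguments.
  execution_spec = dataplex_v1.Task.ExecutionSpec(
      service_account="my-task-sa@my-project.iam.gserviceaccount.com",
      args={
          "output_table": "results_${task_id}_${job_time}",
          "TASK_ARGS": "gs://my-bucket/input.csv,gs://my-bucket/output/",
      },
      max_job_execution_lifetime=datetime.timedelta(hours=1),
  )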

ExecutionStatus

Status of the task execution (e.g. Jobs).

Fields
update_time

Timestamp

Output only. Last update time of the status.

latest_job

Job

Output only. Latest job execution.

InfrastructureSpec

Configuration for the underlying infrastructure used to run workloads.

Fields
Union field resources. Hardware config. resources can be only one of the following:
batch

BatchComputeResources

Compute resources needed for a Task when using Dataproc Serverless.

Union field runtime. Software config. runtime can be only one of the following:
container_image

ContainerImageRuntime

Container Image Runtime Configuration.

Union field network. Networking config. network can be only one of the following:
vpc_network

VpcNetwork

Vpc network.

BatchComputeResources

Batch compute resources associated with the task.

Fields
executors_count

int32

Optional. Total number of job executors. Executor Count should be between 2 and 100. [Default=2]

max_executors_count

int32

Optional. Max configurable executors. If max_executors_count > executors_count, then auto-scaling is enabled. Max Executor Count should be between 2 and 1000. [Default=1000]

ContainerImageRuntime

Container Image Runtime Configuration used with Batch execution.

Fields
image

string

Optional. Container image to use.

java_jars[]

string

Optional. A list of Java JARs to add to the classpath. Valid input includes Cloud Storage URIs to JAR binaries. For example, gs://bucket-name/my/path/to/file.jar

python_packages[]

string

Optional. A list of python packages to be installed. Valid formats include Cloud Storage URI to a PIP installable library. For example, gs://bucket-name/my/path/to/lib.tar.gz

properties

map<string, string>

Optional. Override to common configuration of open source components installed on the Dataproc cluster. The properties to set on daemon config files. Property keys are specified in prefix:property format, for example core:hadoop.tmp.dir. For more information, see Cluster properties.

VpcNetwork

Cloud VPC Network used to run the infrastructure.

Fields
network_tags[]

string

Optional. List of network tags to apply to the job.

Union field network_name. The Cloud VPC network identifier. network_name can be only one of the following:
network

string

Optional. The Cloud VPC network in which the job is run. By default, the Cloud VPC network named Default within the project is used.

sub_network

string

Optional. The Cloud VPC sub-network in which the job is run.

NotebookTaskConfig

Config for running scheduled notebooks.

Fields
notebook

string

Required. Path to input notebook. This can be the Cloud Storage URI of the notebook file or the path to a Notebook Content. The execution args are accessible as environment variables (TASK_key=value).

infrastructure_spec

InfrastructureSpec

Optional. Infrastructure specification for the execution.

file_uris[]

string

Optional. Cloud Storage URIs of files to be placed in the working directory of each executor.

archive_uris[]

string

Optional. Cloud Storage URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.

SparkTaskConfig

User-specified config for running a Spark task.

Fields
file_uris[]

string

Optional. Cloud Storage URIs of files to be placed in the working directory of each executor.

archive_uris[]

string

Optional. Cloud Storage URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.

infrastructure_spec

InfrastructureSpec

Optional. Infrastructure specification for the execution.

Union field driver. Required. The specification of the main method to call to drive the job. Specify either the jar file that contains the main class or the main class name. driver can be only one of the following:
main_jar_file_uri

string

The Cloud Storage URI of the jar file that contains the main class. The execution args are passed in as a sequence of named process arguments (--key=value).

main_class

string

The name of the driver's main class. The jar file that contains the class must be in the default CLASSPATH or specified in jar_file_uris. The execution args are passed in as a sequence of named process arguments (--key=value).

python_script_file

string

The Cloud Storage URI of the main Python file to use as the driver. Must be a .py file. The execution args are passed in as a sequence of named process arguments (--key=value).

sql_script_file

string

A reference to a query file. This can be the Cloud Storage URI of the query file or it can be the path to a SqlScript Content. The execution args are used to declare a set of script variables (set key="value";).

sql_script

string

The query text. The execution args are used to declare a set of script variables (set key="value";).

TriggerSpec

Task scheduling and trigger settings.

Fields
type

Type

Required. Immutable. Trigger type of the user-specified Task.

start_time

Timestamp

Optional. The first run of the task will be after this time. If not specified, the task will run shortly after being submitted if ON_DEMAND and based on the schedule if RECURRING.

disabled

bool

Optional. Prevent the task from executing. This does not cancel already running tasks. It is intended to temporarily disable RECURRING tasks.

max_retries

int32

Optional. Number of retry attempts before aborting. Set to zero to never attempt to retry a failed task.

Union field trigger. Trigger only applies for RECURRING tasks. trigger can be only one of the following:
schedule

string

Optional. Cron schedule (https://en.wikipedia.org/wiki/Cron) for running tasks periodically. To explicitly set a timezone to the cron tab, apply a prefix in the cron tab: "CRON_TZ=${IANA_TIME_ZONE}" or "TZ=${IANA_TIME_ZONE}". The ${IANA_TIME_ZONE} may only be a valid string from IANA time zone database. For example, CRON_TZ=America/New_York 1 * * * *, or TZ=America/New_York 1 * * * *. This field is required for RECURRING tasks.
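
A minimal sketch of a RECURRING trigger with an explicit time zone in the cron tab, assuming the google-cloud-dataplex Python client (the schedule shown is an arbitrary example):

  from google.cloud import dataplex_v1

  # A recurring trigger: daily at 06:00 New York time, up to 3 retries.
  trigger = dataplex_v1.Task.TriggerSpec(
      type_=dataplex_v1.Task.TriggerSpec.Type.RECURRING,
      schedule="CRON_TZ=America/New_York 0 6 * * *",
      max_retries=3,
  )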

Type

Determines how often and when the job will run.

Enums
TYPE_UNSPECIFIED Unspecified trigger type.
ON_DEMAND The task runs one-time shortly after Task Creation.
RECURRING The task is scheduled to run periodically.

Trigger

DataScan scheduling and trigger settings.

Fields

Union field mode. DataScan scheduling and trigger settings.

If not specified, the default is onDemand. mode can be only one of the following:

on_demand

OnDemand

The scan runs once via RunDataScan API.

schedule

Schedule

The scan is scheduled to run periodically.

OnDemand

This type has no fields.

The scan runs once via RunDataScan API.

Schedule

The scan is scheduled to run periodically.

Fields
cron

string

Required. Cron schedule for running scans periodically.

To explicitly set a timezone in the cron tab, apply a prefix in the cron tab: "CRON_TZ=${IANA_TIME_ZONE}" or "TZ=${IANA_TIME_ZONE}". The ${IANA_TIME_ZONE} may only be a valid string from IANA time zone database (wikipedia). For example, CRON_TZ=America/New_York 1 * * * *, or TZ=America/New_York 1 * * * *.

This field is required for Schedule scans.

UpdateAssetRequest

Update asset request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

asset

Asset

Required. Update description. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource asset:

  • dataplex.assets.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateContentRequest

Update content request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

content

Content

Required. Update description. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource content:

  • dataplex.content.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateDataAttributeBindingRequest

Update DataAttributeBinding request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

data_attribute_binding

DataAttributeBinding

Required. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource dataAttributeBinding:

  • dataplex.dataAttributeBindings.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateDataAttributeRequest

Update DataAttribute request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

data_attribute

DataAttribute

Required. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource dataAttribute:

  • dataplex.dataAttributes.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateDataScanRequest

Update dataScan request.

Fields
data_scan

DataScan

Required. DataScan resource to be updated.

Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource dataScan:

  • dataplex.datascans.update
update_mask

FieldMask

Required. Mask of fields to update.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateDataTaxonomyRequest

Update DataTaxonomy request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

data_taxonomy

DataTaxonomy

Required. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource dataTaxonomy:

  • dataplex.dataTaxonomies.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateEntityRequest

Update a metadata entity request. The existing entity will be fully replaced by the entity in the request. The entity ID is mutable. To modify the ID, use the current entity ID in the request URL and specify the new ID in the request body.

Fields
entity

Entity

Required. Update description.

Authorization requires the following IAM permission on the specified resource entity:

  • dataplex.entities.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateEnvironmentRequest

Update environment request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

environment

Environment

Required. Update description. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource environment:

  • dataplex.environments.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateLakeRequest

Update lake request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

lake

Lake

Required. Update description. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource lake:

  • dataplex.lakes.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.
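
The update_mask/validate_only contract above is shared by the Update requests in this reference. A minimal sketch of updating only a lake's description, assuming the google-cloud-dataplex Python client (the lake name is a placeholder):

  from google.cloud import dataplex_v1
  from google.protobuf import field_mask_pb2

  client = dataplex_v1.DataplexServiceClient()
  lake = dataplex_v1.Lake(
      name="projects/123456789/locations/us-central1/lakes/my-lake",
      description="Curated sales data",
  )

  # Only the paths named in update_mask are applied; this is a long-running
  # operation, so block on result() for the updated resource.
  operation = client.update_lake(
      request=dataplex_v1.UpdateLakeRequest(
          lake=lake,
          update_mask=field_mask_pb2.FieldMask(paths=["description"]),
      )
  )
  print(operation.result().name)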

UpdateTaskRequest

Update task request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

task

Task

Required. Update description. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource task:

  • dataplex.tasks.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateZoneRequest

Update zone request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

zone

Zone

Required. Update description. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource zone:

  • dataplex.zones.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

Zone

A zone represents a logical group of related assets within a lake. A zone can be used to map to organizational structure or represent stages of data readiness from raw to curated. It provides managing behavior that is shared or inherited by all contained assets.

Fields
name

string

Output only. The relative resource name of the zone, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

display_name

string

Optional. User friendly display name.

uid

string

Output only. System generated globally unique ID for the zone. This ID will be different if the zone is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the zone was created.

update_time

Timestamp

Output only. The time when the zone was last updated.

labels

map<string, string>

Optional. User defined labels for the zone.

description

string

Optional. Description of the zone.

state

State

Output only. Current state of the zone.

type

Type

Required. Immutable. The type of the zone.

discovery_spec

DiscoverySpec

Optional. Specification of the discovery feature applied to data in this zone.

resource_spec

ResourceSpec

Required. Specification of the resources that are referenced by the assets within this zone.

asset_status

AssetStatus

Output only. Aggregated status of the underlying assets of the zone.

DiscoverySpec

Settings to manage the metadata discovery and publishing in a zone.

Fields
enabled

bool

Required. Whether discovery is enabled.

include_patterns[]

string

Optional. The list of patterns to apply for selecting data to include during discovery if only a subset of the data should be considered. For Cloud Storage bucket assets, these are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these are interpreted as patterns to match table names.

exclude_patterns[]

string

Optional. The list of patterns to apply for selecting data to exclude during discovery. For Cloud Storage bucket assets, these are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these are interpreted as patterns to match table names.

csv_options

CsvOptions

Optional. Configuration for CSV data.

json_options

JsonOptions

Optional. Configuration for Json data.

Union field trigger. Determines when discovery is triggered. trigger can be only one of the following:
schedule

string

Optional. Cron schedule (https://en.wikipedia.org/wiki/Cron) for running discovery periodically. Successive discovery runs must be scheduled at least 60 minutes apart. The default value is to run discovery every 60 minutes. To explicitly set a timezone to the cron tab, apply a prefix in the cron tab: "CRON_TZ=${IANA_TIME_ZONE}" or "TZ=${IANA_TIME_ZONE}". The ${IANA_TIME_ZONE} may only be a valid string from the IANA time zone database. For example, CRON_TZ=America/New_York 1 * * * *, or TZ=America/New_York 1 * * * *.

CsvOptions

Describe CSV and similar semi-structured data formats.

Fields
header_rows

int32

Optional. The number of rows to interpret as header rows that should be skipped when reading data rows.

delimiter

string

Optional. The delimiter being used to separate values. This defaults to ','.

encoding

string

Optional. The character encoding of the data. The default is UTF-8.

disable_type_inference

bool

Optional. Whether to disable the inference of data type for CSV data. If true, all columns will be registered as strings.

JsonOptions

Describe JSON data format.

Fields
encoding

string

Optional. The character encoding of the data. The default is UTF-8.

disable_type_inference

bool

Optional. Whether to disable the inference of data type for Json data. If true, all columns will be registered as their primitive types (strings, numbers, or booleans).

ResourceSpec

Settings for resources attached as assets within a zone.

Fields
location_type

LocationType

Required. Immutable. The location type of the resources that are allowed to be attached to the assets within this zone.

LocationType

Location type of the resources attached to a zone.

Enums
LOCATION_TYPE_UNSPECIFIED Unspecified location type.
SINGLE_REGION Resources that are associated with a single region.
MULTI_REGION Resources that are associated with a multi-region location.

Type

Type of zone.

Enums
TYPE_UNSPECIFIED Zone type not specified.
RAW A zone that contains data that needs further processing before it is considered generally ready for consumption and analytics workloads.
CURATED A zone that contains data that is considered to be ready for broader consumption and analytics workloads. Curated structured data stored in Cloud Storage must conform to certain file formats (parquet, avro, and orc) and be organized in a hive-compatible directory layout.