Package google.cloud.dataplex.v1

CatalogService

The primary resources offered by this service are EntryGroups, EntryTypes, AspectTypes, and Entries. They collectively let data administrators organize, manage, secure, and catalog data located across cloud projects in their organization in a variety of storage systems, including Cloud Storage and BigQuery.

CancelMetadataJob

rpc CancelMetadataJob(CancelMetadataJobRequest) returns (Empty)

Cancels a metadata job.

If you cancel a metadata import job that is in progress, the changes in the job might be partially applied. We recommend that you reset the state of the entry groups in your project by running another metadata job that reverts the changes from the canceled job.
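
For illustration, a minimal sketch using the google-cloud-dataplex Python client (the project, location, and job ID below are placeholders):

  from google.cloud import dataplex_v1

  client = dataplex_v1.CatalogServiceClient()
  # Cancels the job; returns Empty on success.
  client.cancel_metadata_job(
      request={"name": "projects/my-project/locations/us-central1/metadataJobs/my-job"}
  )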

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.metadataJobs.cancel

For more information, see the IAM documentation.

CreateAspectType

rpc CreateAspectType(CreateAspectTypeRequest) returns (Operation)

Creates an AspectType.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • dataplex.aspectTypes.create

For more information, see the IAM documentation.

CreateEntry

rpc CreateEntry(CreateEntryRequest) returns (Entry)

Creates an Entry.
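
A minimal sketch using the google-cloud-dataplex Python client; the resource IDs and the EntryType reference are placeholders:

  from google.cloud import dataplex_v1

  client = dataplex_v1.CatalogServiceClient()
  entry = client.create_entry(
      request={
          "parent": "projects/my-project/locations/us-central1/entryGroups/my-group",
          "entry_id": "my-entry",
          # An Entry must reference the EntryType it instantiates.
          "entry": {
              "entry_type": "projects/my-project/locations/us-central1/entryTypes/my-type"
          },
      }
  )
  print(entry.name)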

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permissions on the parent resource:

  • dataplex.aspectTypes.use
  • dataplex.entries.create
  • dataplex.entryGroups.useContactsAspect
  • dataplex.entryGroups.useGenericAspect
  • dataplex.entryGroups.useGenericEntry
  • dataplex.entryGroups.useOverviewAspect
  • dataplex.entryGroups.useSchemaAspect
  • dataplex.entryTypes.use

For more information, see the IAM documentation.

CreateEntryGroup

rpc CreateEntryGroup(CreateEntryGroupRequest) returns (Operation)

Creates an EntryGroup.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • dataplex.entryGroups.create

For more information, see the IAM documentation.

CreateEntryType

rpc CreateEntryType(CreateEntryTypeRequest) returns (Operation)

Creates an EntryType.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • dataplex.entryTypes.create

For more information, see the IAM documentation.

CreateMetadataJob

rpc CreateMetadataJob(CreateMetadataJobRequest) returns (Operation)

Creates a metadata job. For example, use a metadata job to import Dataplex Catalog entries and aspects from a third-party system into Dataplex.
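
For example, an import job might be created as follows with the google-cloud-dataplex Python client. This is a hedged sketch: the bucket URI, scope, and sync modes are placeholders.

  from google.cloud import dataplex_v1

  client = dataplex_v1.CatalogServiceClient()
  operation = client.create_metadata_job(
      request={
          "parent": "projects/my-project/locations/us-central1",
          "metadata_job_id": "my-import-job",
          "metadata_job": {
              "type_": dataplex_v1.MetadataJob.Type.IMPORT,
              "import_spec": {
                  # Cloud Storage folder holding the metadata import files.
                  "source_storage_uri": "gs://my-bucket/import/",
                  "entry_sync_mode": dataplex_v1.MetadataJob.ImportJobSpec.SyncMode.FULL,
                  "aspect_sync_mode": dataplex_v1.MetadataJob.ImportJobSpec.SyncMode.INCREMENTAL,
                  "scope": {
                      "entry_groups": [
                          "projects/my-project/locations/us-central1/entryGroups/my-group"
                      ]
                  },
              },
          },
      }
  )
  job = operation.result()  # waits for the long-running operation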

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • dataplex.metadataJobs.create

For more information, see the IAM documentation.

DeleteAspectType

rpc DeleteAspectType(DeleteAspectTypeRequest) returns (Operation)

Deletes an AspectType.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.aspectTypes.delete

For more information, see the IAM documentation.

DeleteEntry

rpc DeleteEntry(DeleteEntryRequest) returns (Entry)

Deletes an Entry.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.entries.delete

For more information, see the IAM documentation.

DeleteEntryGroup

rpc DeleteEntryGroup(DeleteEntryGroupRequest) returns (Operation)

Deletes an EntryGroup.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.entryGroups.delete

For more information, see the IAM documentation.

DeleteEntryType

rpc DeleteEntryType(DeleteEntryTypeRequest) returns (Operation)

Deletes an EntryType.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.entryTypes.delete

For more information, see the IAM documentation.

GetAspectType

rpc GetAspectType(GetAspectTypeRequest) returns (AspectType)

Gets an AspectType.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.aspectTypes.get

For more information, see the IAM documentation.

GetEntry

rpc GetEntry(GetEntryRequest) returns (Entry)

Gets an Entry.

Caution: The BigQuery metadata that is stored in Dataplex Catalog is changing. For more information, see Changes to BigQuery metadata stored in Dataplex Catalog.
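
A minimal sketch with the google-cloud-dataplex Python client (resource names are placeholders):

  from google.cloud import dataplex_v1

  client = dataplex_v1.CatalogServiceClient()
  entry = client.get_entry(
      request={
          "name": "projects/my-project/locations/us-central1/entryGroups/my-group/entries/my-entry",
          "view": dataplex_v1.EntryView.FULL,  # also return all aspects
      }
  )
  print(entry.entry_type)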

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.entries.get

For more information, see the IAM documentation.

GetEntryGroup

rpc GetEntryGroup(GetEntryGroupRequest) returns (EntryGroup)

Gets an EntryGroup.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.entryGroups.get

For more information, see the IAM documentation.

GetEntryType

rpc GetEntryType(GetEntryTypeRequest) returns (EntryType)

Gets an EntryType.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.entryTypes.get

For more information, see the IAM documentation.

GetMetadataJob

rpc GetMetadataJob(GetMetadataJobRequest) returns (MetadataJob)

Gets a metadata job.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.metadataJobs.get

For more information, see the IAM documentation.

ListAspectTypes

rpc ListAspectTypes(ListAspectTypesRequest) returns (ListAspectTypesResponse)

Lists AspectType resources in a project and location.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • dataplex.aspectTypes.list

For more information, see the IAM documentation.

ListEntries

rpc ListEntries(ListEntriesRequest) returns (ListEntriesResponse)

Lists Entries within an EntryGroup.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • dataplex.entries.list

For more information, see the IAM documentation.

ListEntryGroups

rpc ListEntryGroups(ListEntryGroupsRequest) returns (ListEntryGroupsResponse)

Lists EntryGroup resources in a project and location.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • dataplex.entryGroups.list

For more information, see the IAM documentation.

ListEntryTypes

rpc ListEntryTypes(ListEntryTypesRequest) returns (ListEntryTypesResponse)

Lists EntryType resources in a project and location.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • dataplex.entryTypes.list

For more information, see the IAM documentation.

ListMetadataJobs

rpc ListMetadataJobs(ListMetadataJobsRequest) returns (ListMetadataJobsResponse)

Lists metadata jobs.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • dataplex.metadataJobs.list

For more information, see the IAM documentation.

LookupEntry

rpc LookupEntry(LookupEntryRequest) returns (Entry)

Looks up a single Entry by name using the permission on the source system.

Caution: The BigQuery metadata that is stored in Dataplex Catalog is changing. For more information, see Changes to BigQuery metadata stored in Dataplex Catalog.
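
Unlike GetEntry, the request names both a project/location scope (name) and the entry to resolve (entry). A hedged sketch with the Python client; all resource names are placeholders:

  from google.cloud import dataplex_v1

  client = dataplex_v1.CatalogServiceClient()
  entry = client.lookup_entry(
      request={
          # The project and location whose permissions are used for the lookup.
          "name": "projects/my-project/locations/us-central1",
          # The entry to resolve, addressed by its full resource name.
          "entry": "projects/my-project/locations/us-central1/entryGroups/my-group/entries/my-entry",
      }
  )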

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

SearchEntries

rpc SearchEntries(SearchEntriesRequest) returns (SearchEntriesResponse)

Searches for Entries matching the given query and scope.
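
A minimal sketch with the Python client; the query string and location are placeholders (search is commonly issued against the global location):

  from google.cloud import dataplex_v1

  client = dataplex_v1.CatalogServiceClient()
  for result in client.search_entries(
      request={
          "name": "projects/my-project/locations/global",
          "query": "orders",  # placeholder free-text query
      }
  ):
      print(result.dataplex_entry.name)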

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.projects.search

For more information, see the IAM documentation.

UpdateAspectType

rpc UpdateAspectType(UpdateAspectTypeRequest) returns (Operation)

Updates an AspectType.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.aspectTypes.update

For more information, see the IAM documentation.

UpdateEntry

rpc UpdateEntry(UpdateEntryRequest) returns (Entry)

Updates an Entry.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permissions on the name resource:

  • dataplex.aspectTypes.use
  • dataplex.entries.create
  • dataplex.entries.update
  • dataplex.entryGroups.useContactsAspect
  • dataplex.entryGroups.useGenericAspect
  • dataplex.entryGroups.useGenericEntry
  • dataplex.entryGroups.useOverviewAspect
  • dataplex.entryGroups.useSchemaAspect
  • dataplex.entryTypes.use

For more information, see the IAM documentation.

UpdateEntryGroup

rpc UpdateEntryGroup(UpdateEntryGroupRequest) returns (Operation)

Updates an EntryGroup.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.entryGroups.update

For more information, see the IAM documentation.

UpdateEntryType

rpc UpdateEntryType(UpdateEntryTypeRequest) returns (Operation)

Updates an EntryType.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.entryTypes.update

For more information, see the IAM documentation.

ContentService

ContentService manages notebook and SQL script resources for Dataplex.

CreateContent

rpc CreateContent(CreateContentRequest) returns (Content)

Creates a content resource.
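
A hedged sketch with the Python client, creating a SQL script. The lake, path, and script body are placeholders; the SqlScript and QueryEngine shapes follow the Content reference later in this page.

  from google.cloud import dataplex_v1

  client = dataplex_v1.ContentServiceClient()
  content = client.create_content(
      request={
          "parent": "projects/my-project/locations/us-central1/lakes/my-lake",
          "content": {
              "path": "queries/daily_report.sql",
              "data_text": "SELECT 1",
              "sql_script": {
                  "engine": dataplex_v1.Content.SqlScript.QueryEngine.SPARK
              },
          },
      }
  )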

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteContent

rpc DeleteContent(DeleteContentRequest) returns (Empty)

Deletes a content resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetContent

rpc GetContent(GetContentRequest) returns (Content)

Gets a content resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetIamPolicy

rpc GetIamPolicy(GetIamPolicyRequest) returns (Policy)

Gets the access control policy for a contentitem resource. A NOT_FOUND error is returned if the resource does not exist. An empty policy is returned if the resource exists but does not have a policy set on it.

Caller must have Google IAM dataplex.content.getIamPolicy permission on the resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListContent

rpc ListContent(ListContentRequest) returns (ListContentResponse)

Lists content resources.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

SetIamPolicy

rpc SetIamPolicy(SetIamPolicyRequest) returns (Policy)

Sets the access control policy on the specified contentitem resource. Replaces any existing policy.

Caller must have Google IAM dataplex.content.setIamPolicy permission on the resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

TestIamPermissions

rpc TestIamPermissions(TestIamPermissionsRequest) returns (TestIamPermissionsResponse)

Returns the caller's permissions on a resource. If the resource does not exist, an empty set of permissions is returned (a NOT_FOUND error is not returned).

A caller is not required to have Google IAM permission to make this request.

Note: This operation is designed to be used for building permission-aware UIs and command-line tools, not for authorization checking. This operation may "fail open" without warning.
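
A minimal sketch with the Python client; the content resource name and the permissions checked are placeholders:

  from google.cloud import dataplex_v1

  client = dataplex_v1.ContentServiceClient()
  response = client.test_iam_permissions(
      request={
          "resource": "projects/my-project/locations/us-central1/lakes/my-lake/content/my-script",
          "permissions": ["dataplex.content.get", "dataplex.content.update"],
      }
  )
  # Only the subset of permissions the caller actually holds is returned.
  print(list(response.permissions))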

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateContent

rpc UpdateContent(UpdateContentRequest) returns (Content)

Updates a content resource. Only full resource update is supported.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DataScanService

DataScanService manages DataScan resources, which can be configured to run various types of data scanning workloads and generate enriched metadata (for example, Data Profile or Data Quality) for the data source.

CreateDataScan

rpc CreateDataScan(CreateDataScanRequest) returns (Operation)

Creates a DataScan resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteDataScan

rpc DeleteDataScan(DeleteDataScanRequest) returns (Operation)

Deletes a DataScan resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GenerateDataQualityRules

rpc GenerateDataQualityRules(GenerateDataQualityRulesRequest) returns (GenerateDataQualityRulesResponse)

Generates recommended data quality rules based on the results of a data profiling scan.

Use the recommendations to build rules for a data quality scan.
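
A hedged sketch with the Python client; name points at a completed data profiling scan (placeholder), and the repeated rule field on the response is an assumption based on the response type:

  from google.cloud import dataplex_v1

  client = dataplex_v1.DataScanServiceClient()
  response = client.generate_data_quality_rules(
      request={"name": "projects/my-project/locations/us-central1/dataScans/my-profile-scan"}
  )
  # The recommended rules can seed a data quality scan's spec.
  for rule in response.rule:
      print(rule)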

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetDataScan

rpc GetDataScan(GetDataScanRequest) returns (DataScan)

Gets a DataScan resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetDataScanJob

rpc GetDataScanJob(GetDataScanJobRequest) returns (DataScanJob)

Gets a DataScanJob resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListDataScanJobs

rpc ListDataScanJobs(ListDataScanJobsRequest) returns (ListDataScanJobsResponse)

Lists DataScanJobs under the given DataScan.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListDataScans

rpc ListDataScans(ListDataScansRequest) returns (ListDataScansResponse)

Lists DataScans.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

RunDataScan

rpc RunDataScan(RunDataScanRequest) returns (RunDataScanResponse)

Runs an on-demand execution of a DataScan.
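
A minimal sketch with the Python client (the scan name is a placeholder):

  from google.cloud import dataplex_v1

  client = dataplex_v1.DataScanServiceClient()
  response = client.run_data_scan(
      request={"name": "projects/my-project/locations/us-central1/dataScans/my-scan"}
  )
  # The response carries the DataScanJob created for this run.
  print(response.job.name)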

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateDataScan

rpc UpdateDataScan(UpdateDataScanRequest) returns (Operation)

Updates a DataScan resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DataTaxonomyService

DataTaxonomyService enables attribute-based governance. The resources currently offered include DataTaxonomy and DataAttribute.

CreateDataAttribute

rpc CreateDataAttribute(CreateDataAttributeRequest) returns (Operation)

Create a DataAttribute resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreateDataAttributeBinding

rpc CreateDataAttributeBinding(CreateDataAttributeBindingRequest) returns (Operation)

Create a DataAttributeBinding resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreateDataTaxonomy

rpc CreateDataTaxonomy(CreateDataTaxonomyRequest) returns (Operation)

Create a DataTaxonomy resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteDataAttribute

rpc DeleteDataAttribute(DeleteDataAttributeRequest) returns (Operation)

Deletes a DataAttribute resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteDataAttributeBinding

rpc DeleteDataAttributeBinding(DeleteDataAttributeBindingRequest) returns (Operation)

Deletes a DataAttributeBinding resource. All attributes within the DataAttributeBinding must be deleted before the DataAttributeBinding can be deleted.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteDataTaxonomy

rpc DeleteDataTaxonomy(DeleteDataTaxonomyRequest) returns (Operation)

Deletes a DataTaxonomy resource. All attributes within the DataTaxonomy must be deleted before the DataTaxonomy can be deleted.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetDataAttribute

rpc GetDataAttribute(GetDataAttributeRequest) returns (DataAttribute)

Retrieves a DataAttribute resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetDataAttributeBinding

rpc GetDataAttributeBinding(GetDataAttributeBindingRequest) returns (DataAttributeBinding)

Retrieves a DataAttributeBinding resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetDataTaxonomy

rpc GetDataTaxonomy(GetDataTaxonomyRequest) returns (DataTaxonomy)

Retrieves a DataTaxonomy resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListDataAttributeBindings

rpc ListDataAttributeBindings(ListDataAttributeBindingsRequest) returns (ListDataAttributeBindingsResponse)

Lists DataAttributeBinding resources in a project and location.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListDataAttributes

rpc ListDataAttributes(ListDataAttributesRequest) returns (ListDataAttributesResponse)

Lists DataAttribute resources in a DataTaxonomy.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListDataTaxonomies

rpc ListDataTaxonomies(ListDataTaxonomiesRequest) returns (ListDataTaxonomiesResponse)

Lists DataTaxonomy resources in a project and location.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateDataAttribute

rpc UpdateDataAttribute(UpdateDataAttributeRequest) returns (Operation)

Updates a DataAttribute resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateDataAttributeBinding

rpc UpdateDataAttributeBinding(UpdateDataAttributeBindingRequest) returns (Operation)

Updates a DataAttributeBinding resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateDataTaxonomy

rpc UpdateDataTaxonomy(UpdateDataTaxonomyRequest) returns (Operation)

Updates a DataTaxonomy resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DataplexService

The Dataplex service provides data lakes as a service. The primary resources offered by this service are Lakes, Zones, and Assets, which collectively allow a data administrator to organize, manage, secure, and catalog data across their organization, located across cloud projects in a variety of storage systems, including Cloud Storage and BigQuery.

CancelJob

rpc CancelJob(CancelJobRequest) returns (Empty)

Cancel jobs running for the task resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreateAsset

rpc CreateAsset(CreateAssetRequest) returns (Operation)

Creates an asset resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreateEnvironment

rpc CreateEnvironment(CreateEnvironmentRequest) returns (Operation)

Create an environment resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreateLake

rpc CreateLake(CreateLakeRequest) returns (Operation)

Creates a lake resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreateTask

rpc CreateTask(CreateTaskRequest) returns (Operation)

Creates a task resource within a lake.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreateZone

rpc CreateZone(CreateZoneRequest) returns (Operation)

Creates a zone resource within a lake.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteAsset

rpc DeleteAsset(DeleteAssetRequest) returns (Operation)

Deletes an asset resource. The referenced storage resource is detached (default) or deleted based on the associated Lifecycle policy.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteEnvironment

rpc DeleteEnvironment(DeleteEnvironmentRequest) returns (Operation)

Delete the environment resource. All the child resources must have been deleted before environment deletion can be initiated.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteLake

rpc DeleteLake(DeleteLakeRequest) returns (Operation)

Deletes a lake resource. All zones within the lake must be deleted before the lake can be deleted.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteTask

rpc DeleteTask(DeleteTaskRequest) returns (Operation)

Delete the task resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteZone

rpc DeleteZone(DeleteZoneRequest) returns (Operation)

Deletes a zone resource. All assets within a zone must be deleted before the zone can be deleted.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetAsset

rpc GetAsset(GetAssetRequest) returns (Asset)

Retrieves an asset resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetEnvironment

rpc GetEnvironment(GetEnvironmentRequest) returns (Environment)

Get environment resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetJob

rpc GetJob(GetJobRequest) returns (Job)

Get job resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetLake

rpc GetLake(GetLakeRequest) returns (Lake)

Retrieves a lake resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetTask

rpc GetTask(GetTaskRequest) returns (Task)

Get task resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetZone

rpc GetZone(GetZoneRequest) returns (Zone)

Retrieves a zone resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListAssetActions

rpc ListAssetActions(ListAssetActionsRequest) returns (ListActionsResponse)

Lists action resources in an asset.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListAssets

rpc ListAssets(ListAssetsRequest) returns (ListAssetsResponse)

Lists asset resources in a zone.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListEnvironments

rpc ListEnvironments(ListEnvironmentsRequest) returns (ListEnvironmentsResponse)

Lists environments under the given lake.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListJobs

rpc ListJobs(ListJobsRequest) returns (ListJobsResponse)

Lists Jobs under the given task.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListLakeActions

rpc ListLakeActions(ListLakeActionsRequest) returns (ListActionsResponse)

Lists action resources in a lake.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListLakes

rpc ListLakes(ListLakesRequest) returns (ListLakesResponse)

Lists lake resources in a project and location.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListSessions

rpc ListSessions(ListSessionsRequest) returns (ListSessionsResponse)

Lists session resources in an environment.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListTasks

rpc ListTasks(ListTasksRequest) returns (ListTasksResponse)

Lists tasks under the given lake.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListZoneActions

rpc ListZoneActions(ListZoneActionsRequest) returns (ListActionsResponse)

Lists action resources in a zone.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListZones

rpc ListZones(ListZonesRequest) returns (ListZonesResponse)

Lists zone resources in a lake.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

RunTask

rpc RunTask(RunTaskRequest) returns (RunTaskResponse)

Runs an on-demand execution of a Task.
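
A minimal sketch with the Python client (the lake and task names are placeholders):

  from google.cloud import dataplex_v1

  client = dataplex_v1.DataplexServiceClient()
  response = client.run_task(
      request={"name": "projects/my-project/locations/us-central1/lakes/my-lake/tasks/my-task"}
  )
  print(response.job.name)  # the Job created for this run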

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • dataplex.tasks.run

For more information, see the IAM documentation.

UpdateAsset

rpc UpdateAsset(UpdateAssetRequest) returns (Operation)

Updates an asset resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateEnvironment

rpc UpdateEnvironment(UpdateEnvironmentRequest) returns (Operation)

Update the environment resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateLake

rpc UpdateLake(UpdateLakeRequest) returns (Operation)

Updates a lake resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateTask

rpc UpdateTask(UpdateTaskRequest) returns (Operation)

Update the task resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateZone

rpc UpdateZone(UpdateZoneRequest) returns (Operation)

Updates a zone resource.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

MetadataService

Metadata service manages metadata resources such as tables, filesets and partitions.

CreateEntity

rpc CreateEntity(CreateEntityRequest) returns (Entity)

Create a metadata entity.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

CreatePartition

rpc CreatePartition(CreatePartitionRequest) returns (Partition)

Create a metadata partition.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeleteEntity

rpc DeleteEntity(DeleteEntityRequest) returns (Empty)

Delete a metadata entity.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

DeletePartition

rpc DeletePartition(DeletePartitionRequest) returns (Empty)

Delete a metadata partition.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetEntity

rpc GetEntity(GetEntityRequest) returns (Entity)

Get a metadata entity.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

GetPartition

rpc GetPartition(GetPartitionRequest) returns (Partition)

Get a metadata partition of an entity.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListEntities

rpc ListEntities(ListEntitiesRequest) returns (ListEntitiesResponse)

List metadata entities in a zone.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

ListPartitions

rpc ListPartitions(ListPartitionsRequest) returns (ListPartitionsResponse)

List metadata partitions of an entity.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

UpdateEntity

rpc UpdateEntity(UpdateEntityRequest) returns (Entity)

Update a metadata entity. Only supports full resource update.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

Action

Action represents an issue requiring administrator action for resolution.

Fields
category

Category

The category of issue associated with the action.

issue

string

Detailed description of the issue requiring action.

detect_time

Timestamp

The time that the issue was detected.

name

string

Output only. The relative resource name of the action, of one of the following forms:

  • projects/{project}/locations/{location}/lakes/{lake}/actions/{action}
  • projects/{project}/locations/{location}/lakes/{lake}/zones/{zone}/actions/{action}
  • projects/{project}/locations/{location}/lakes/{lake}/zones/{zone}/assets/{asset}/actions/{action}

lake

string

Output only. The relative resource name of the lake, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

zone

string

Output only. The relative resource name of the zone, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

asset

string

Output only. The relative resource name of the asset, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/assets/{asset_id}.

data_locations[]

string

The list of data locations associated with this action. Cloud Storage locations are represented as URI paths (for example, gs://bucket/table1/year=2020/month=Jan/). BigQuery locations refer to resource names (for example, bigquery.googleapis.com/projects/project-id/datasets/dataset-id).

Union field details. Additional details about the action based on the action category. details can be only one of the following:
invalid_data_format

InvalidDataFormat

Details for issues related to invalid or unsupported data formats.

incompatible_data_schema

IncompatibleDataSchema

Details for issues related to incompatible schemas detected within data.

invalid_data_partition

InvalidDataPartition

Details for issues related to invalid or unsupported data partition structure.

missing_data

MissingData

Details for issues related to absence of data within managed resources.

missing_resource

MissingResource

Details for issues related to absence of a managed resource.

unauthorized_resource

UnauthorizedResource

Details for issues related to lack of permissions to access data resources.

failed_security_policy_apply

FailedSecurityPolicyApply

Details for issues related to applying security policy.

invalid_data_organization

InvalidDataOrganization

Details for issues related to invalid data arrangement.

Category

The category of issues.

Enums
CATEGORY_UNSPECIFIED Unspecified category.
RESOURCE_MANAGEMENT Resource management related issues.
SECURITY_POLICY Security policy related issues.
DATA_DISCOVERY Data and discovery related issues.

FailedSecurityPolicyApply

Failed to apply security policy to the managed resource(s) under a lake, zone or an asset. For a lake or zone resource, one or more underlying assets has a failure applying security policy to the associated managed resource.

Fields
asset

string

Resource name of one of the assets with failing security policy application. Populated for a lake or zone resource only.

IncompatibleDataSchema

Action details for incompatible schemas detected by discovery.

Fields
table

string

The name of the table containing invalid data.

existing_schema

string

The existing and expected schema of the table. The schema is provided as a JSON formatted structure listing columns and data types.

new_schema

string

The new and incompatible schema within the table. The schema is provided as a JSON formatted structure listing columns and data types.

sampled_data_locations[]

string

The list of data locations sampled and used for format/schema inference.

schema_change

SchemaChange

Whether the action relates to a schema that is incompatible or modified.

SchemaChange

Whether the action relates to a schema that is incompatible or modified.

Enums
SCHEMA_CHANGE_UNSPECIFIED Schema change unspecified.
INCOMPATIBLE Newly discovered schema is incompatible with existing schema.
MODIFIED Newly discovered schema has changed from existing schema for data in a curated zone.

InvalidDataFormat

Action details for invalid or unsupported data files detected by discovery.

Fields
sampled_data_locations[]

string

The list of data locations sampled and used for format/schema inference.

expected_format

string

The expected data format of the entity.

new_format

string

The new unexpected data format within the entity.

InvalidDataOrganization

This type has no fields.

Action details for invalid data arrangement.

InvalidDataPartition

Action details for invalid or unsupported partitions detected by discovery.

Fields
expected_structure

PartitionStructure

The issue type of InvalidDataPartition.

PartitionStructure

The expected partition structure.

Enums
PARTITION_STRUCTURE_UNSPECIFIED PartitionStructure unspecified.
CONSISTENT_KEYS Consistent hive-style partition definition (both raw and curated zone).
HIVE_STYLE_KEYS Hive style partition definition (curated zone only).

MissingData

This type has no fields.

Action details for absence of data detected by discovery.

MissingResource

This type has no fields.

Action details for resource references in assets that cannot be located.

UnauthorizedResource

This type has no fields.

Action details for unauthorized resource issues raised to indicate that the service account associated with the lake instance is not authorized to access or manage the resource associated with an asset.

Aspect

An aspect is a single piece of metadata describing an entry.

Fields
aspect_type

string

Output only. The resource name of the type used to create this Aspect.

path

string

Output only. The path in the entry under which the aspect is attached.

create_time

Timestamp

Output only. The time when the Aspect was created.

update_time

Timestamp

Output only. The time when the Aspect was last updated.

data

Struct

Required. The content of the aspect, according to its aspect type schema. The maximum size of the field is 120KB (encoded as UTF-8).

aspect_source

AspectSource

Optional. Information related to the source system of the aspect.

AspectSource

Information related to the source system of the aspect.

Fields
create_time

Timestamp

The time the aspect was created in the source system.

update_time

Timestamp

The time the aspect was last updated in the source system.

data_version

string

The version of the data format used to produce this data. This field is used to indicate when the underlying data format changes (for example, schema modifications or changes to the source URL format definition).

AspectType

AspectType is a template for creating Aspects, and represents the JSON-schema for a given Entry, for example, BigQuery Table Schema.

Fields
name

string

Output only. The relative resource name of the AspectType, of the form: projects/{project_number}/locations/{location_id}/aspectTypes/{aspect_type_id}.

uid

string

Output only. System generated globally unique ID for the AspectType. If you delete and recreate the AspectType with the same name, then this ID will be different.

create_time

Timestamp

Output only. The time when the AspectType was created.

update_time

Timestamp

Output only. The time when the AspectType was last updated.

description

string

Optional. Description of the AspectType.

display_name

string

Optional. User-friendly display name.

labels

map<string, string>

Optional. User-defined labels for the AspectType.

etag

string

The service computes this checksum. The client may send it on update and delete requests to ensure it has an up-to-date value before proceeding.

authorization

Authorization

Immutable. Defines the Authorization for this type.

metadata_template

MetadataTemplate

Required. MetadataTemplate of the aspect.

transfer_status

TransferStatus

Output only. Denotes the transfer status of the AspectType. It is unspecified for AspectTypes created through the Dataplex API.

Authorization

Authorization for an AspectType.

Fields
alternate_use_permission

string

Immutable. The IAM permission grantable on the EntryGroup to allow access to instantiate Aspects of Dataplex-owned AspectTypes. This is settable only for Dataplex-owned types.

MetadataTemplate

MetadataTemplate definition for an AspectType.

Fields
index

int32

Optional. Index is used to encode Template messages. The value of index can range between 1 and 2,147,483,647. Index must be unique within all fields in a Template. (Nested Templates can reuse indexes). Once a Template is defined, the index cannot be changed, because it identifies the field in the actual storage format. Index is a mandatory field, but it is optional for top level fields, and map/array "values" definitions.

name

string

Required. The name of the field.

type

string

Required. The datatype of this field. The following values are supported:

Primitive types:

  • string
  • integer
  • boolean
  • double
  • datetime. Must be of the format RFC3339 UTC "Zulu" (Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z").

Complex types:

  • enum
  • array
  • map
  • record
record_fields[]

MetadataTemplate

Optional. Field definition. You must specify it if the type is record. It defines the nested fields.

enum_values[]

EnumValue

Optional. The list of values for an enum type. You must define it if the type is enum.

map_items

MetadataTemplate

Optional. If the type is map, set map_items. map_items can refer to a primitive field or a complex (record only) field. To specify a primitive field, you only need to set name and type in the nested MetadataTemplate. The recommended value for the name field is item, as this isn't used in the actual payload.

array_items

MetadataTemplate

Optional. If the type is array, set array_items. array_items can refer to a primitive field or a complex (record only) field. To specify a primitive field, you only need to set name and type in the nested MetadataTemplate. The recommended value for the name field is item, as this isn't used in the actual payload.

type_id

string

Optional. You can use type id if this definition of the field needs to be reused later. The type id must be unique across the entire template. You can only specify it if the field type is record.

type_ref

string

Optional. A reference to another field definition (not an inline definition). The value must be equal to the value of an id field defined elsewhere in the MetadataTemplate. Only fields with record type can refer to other fields.

constraints

Constraints

Optional. Specifies the constraints on this field.

annotations

Annotations

Optional. Specifies annotations on this field.
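
To make the template shape concrete, here is a hedged sketch of a small record template as it might be built with the google-cloud-dataplex Python client. The field names mirror this reference; the ReviewInfo names are invented. Per the index description above, index is omitted on the top-level field but set on each nested field.

  from google.cloud import dataplex_v1

  template = dataplex_v1.AspectType.MetadataTemplate(
      name="ReviewInfo",
      type_="record",  # the proto field `type` surfaces as `type_` in Python
      record_fields=[
          dataplex_v1.AspectType.MetadataTemplate(
              index=1,
              name="reviewer",
              type_="string",
              annotations={"display_name": "Reviewer"},
              constraints={"required": True},
          ),
          dataplex_v1.AspectType.MetadataTemplate(
              index=2,
              name="reviewed_at",
              type_="datetime",
          ),
      ],
  )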

Annotations

Definition of the annotations of a field.

Fields
deprecated

string

Optional. Marks a field as deprecated. You can include a deprecation message.

display_name

string

Optional. Display name for a field.

description

string

Optional. Description for a field.

display_order

int32

Optional. Display order for a field. You can use this to reorder where a field is rendered.

string_type

string

Optional. You can use String Type annotations to specify special meaning to string fields. The following values are supported:

  • richText: The field must be interpreted as a rich text field.
  • url: A fully qualified URL link.
  • resource: A service qualified resource reference.
string_values[]

string

Optional. Suggested hints for string fields. You can use them to suggest values to users through console.

Constraints

Definition of the constraints of a field.

Fields
required

bool

Optional. Marks this field as optional or required.

EnumValue

Definition of EnumValue, to be used for enum fields.

Fields
index

int32

Required. Index for the enum value. It can't be modified.

name

string

Required. Name of the enum value. This is the actual value that the aspect can contain.

deprecated

string

Optional. You can set this message if you need to deprecate an enum value.

Asset

An asset represents a cloud resource that is being managed within a lake as a member of a zone.

Fields
name

string

Output only. The relative resource name of the asset, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/assets/{asset_id}.

display_name

string

Optional. User-friendly display name.

uid

string

Output only. System generated globally unique ID for the asset. This ID will be different if the asset is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the asset was created.

update_time

Timestamp

Output only. The time when the asset was last updated.

labels

map<string, string>

Optional. User defined labels for the asset.

description

string

Optional. Description of the asset.

state

State

Output only. Current state of the asset.

resource_spec

ResourceSpec

Required. Specification of the resource that is referenced by this asset.

resource_status

ResourceStatus

Output only. Status of the resource referenced by this asset.

security_status

SecurityStatus

Output only. Status of the security policy applied to resource referenced by this asset.

discovery_spec

DiscoverySpec

Optional. Specification of the discovery feature applied to data referenced by this asset. When this spec is left unset, the asset will use the spec set on the parent zone.

discovery_status

DiscoveryStatus

Output only. Status of the discovery feature applied to data referenced by this asset.

DiscoverySpec

Settings to manage the metadata discovery and publishing for an asset.

Fields
enabled

bool

Optional. Whether discovery is enabled.

include_patterns[]

string

Optional. The list of patterns to apply for selecting data to include during discovery if only a subset of the data should be considered. For Cloud Storage bucket assets, these are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these are interpreted as patterns to match table names.

exclude_patterns[]

string

Optional. The list of patterns to apply for selecting data to exclude during discovery. For Cloud Storage bucket assets, these are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these are interpreted as patterns to match table names.

csv_options

CsvOptions

Optional. Configuration for CSV data.

json_options

JsonOptions

Optional. Configuration for JSON data.

Union field trigger. Determines when discovery is triggered. trigger can be only one of the following:
schedule

string

Optional. Cron schedule (https://en.wikipedia.org/wiki/Cron) for running discovery periodically. Successive discovery runs must be scheduled at least 60 minutes apart. The default value is to run discovery every 60 minutes. To explicitly set a timezone for the cron tab, apply a prefix in the cron tab: "CRON_TZ=${IANA_TIME_ZONE}" or "TZ=${IANA_TIME_ZONE}". The ${IANA_TIME_ZONE} may only be a valid string from the IANA time zone database. For example, CRON_TZ=America/New_York 1 * * * *, or TZ=America/New_York 1 * * * *.
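
Putting the discovery fields together, a hedged sketch of a spec as it might be constructed with the Python client (the schedule and patterns are placeholders):

  from google.cloud import dataplex_v1

  spec = dataplex_v1.Asset.DiscoverySpec(
      enabled=True,
      # Run every 6 hours, evaluated in the New York timezone.
      schedule="CRON_TZ=America/New_York 0 */6 * * *",
      include_patterns=["sales/*"],
      csv_options=dataplex_v1.Asset.DiscoverySpec.CsvOptions(
          header_rows=1,
          delimiter=",",
      ),
  )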

CsvOptions

Describe CSV and similar semi-structured data formats.

Fields
header_rows

int32

Optional. The number of rows to interpret as header rows that should be skipped when reading data rows.

delimiter

string

Optional. The delimiter being used to separate values. This defaults to ','.

encoding

string

Optional. The character encoding of the data. The default is UTF-8.

disable_type_inference

bool

Optional. Whether to disable the inference of data type for CSV data. If true, all columns will be registered as strings.

JsonOptions

Describe JSON data format.

Fields
encoding

string

Optional. The character encoding of the data. The default is UTF-8.

disable_type_inference

bool

Optional. Whether to disable the inference of data type for JSON data. If true, all columns will be registered as their primitive types (string, number, or boolean).

DiscoveryStatus

Status of discovery for an asset.

Fields
state

State

The current status of the discovery feature.

message

string

Additional information about the current state.

update_time

Timestamp

Last update time of the status.

last_run_time

Timestamp

The start time of the last discovery run.

stats

Stats

Data Stats of the asset reported by discovery.

last_run_duration

Duration

The duration of the last discovery run.

State

Current state of discovery.

Enums
STATE_UNSPECIFIED State is unspecified.
SCHEDULED Discovery for the asset is scheduled.
IN_PROGRESS Discovery for the asset is running.
PAUSED Discovery for the asset is currently paused (e.g. due to a lack of available resources). It will be automatically resumed.
DISABLED Discovery for the asset is disabled.

Stats

The aggregated data statistics for the asset reported by discovery.

Fields
data_items

int64

The count of data items within the referenced resource.

data_size

int64

The number of stored data bytes within the referenced resource.

tables

int64

The count of table entities within the referenced resource.

filesets

int64

The count of fileset entities within the referenced resource.

ResourceSpec

Identifies the cloud resource that is referenced by this asset.

Fields
name

string

Immutable. Relative name of the cloud resource that contains the data that is being managed within a lake. For example:

  • projects/{project_number}/buckets/{bucket_id}
  • projects/{project_number}/datasets/{dataset_id}

type

Type

Required. Immutable. Type of resource.

read_access_mode

AccessMode

Optional. Determines how read permissions are handled for each asset and their associated tables. Only available to storage buckets assets.

AccessMode

Access Mode determines how data stored within the resource is read. This is only applicable to storage bucket assets.

Enums
ACCESS_MODE_UNSPECIFIED Access mode unspecified.
DIRECT Default. Data is accessed directly using storage APIs.
MANAGED Data is accessed through a managed interface using BigQuery APIs.

Type

Type of resource.

Enums
TYPE_UNSPECIFIED Type not specified.
STORAGE_BUCKET Cloud Storage bucket.
BIGQUERY_DATASET BigQuery dataset.

ResourceStatus

Status of the resource referenced by an asset.

Fields
state

State

The current state of the managed resource.

message

string

Additional information about the current state.

update_time

Timestamp

Last update time of the status.

managed_access_identity

string

Output only. Service account associated with the BigQuery Connection.

State

The state of a resource.

Enums
STATE_UNSPECIFIED State unspecified.
READY Resource does not have any errors.
ERROR Resource has errors.

SecurityStatus

Security policy status of the asset. The data security policy (that is, readers, writers, and owners) should be specified in the lake, zone, or asset IAM policy.

Fields
state

State

The current state of the security policy applied to the attached resource.

message

string

Additional information about the current state.

update_time

Timestamp

Last update time of the status.

State

The state of the security policy.

Enums
STATE_UNSPECIFIED State unspecified.
READY Security policy has been successfully applied to the attached resource.
APPLYING Security policy is in the process of being applied to the attached resource.
ERROR Security policy could not be applied to the attached resource due to errors.

AssetStatus

Aggregated status of the underlying assets of a lake or zone.

Fields
update_time

Timestamp

Last update time of the status.

active_assets

int32

Number of active assets.

security_policy_applying_assets

int32

Number of assets that are in process of updating the security policy on attached resources.

CancelJobRequest

Cancel task jobs.

Fields
name

string

Required. The resource name of the job: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}/jobs/{job_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.tasks.cancel

CancelMetadataJobRequest

Cancel metadata job request.

Fields
name

string

Required. The resource name of the job, in the format projects/{project_id_or_number}/locations/{location_id}/metadataJobs/{metadata_job_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.metadataJobs.cancel

Content

Content represents a user-visible notebook or SQL script.

Fields
name

string

Output only. The relative resource name of the content, of the form: projects/{project_id}/locations/{location_id}/lakes/{lake_id}/content/{content_id}

uid

string

Output only. System generated globally unique ID for the content. This ID will be different if the content is deleted and re-created with the same name.

path

string

Required. The path for the Content file, represented as directory structure. Unique within a lake. Limited to alphanumerics, hyphens, underscores, dots and slashes.

create_time

Timestamp

Output only. Content creation time.

update_time

Timestamp

Output only. The time when the content was last updated.

labels

map<string, string>

Optional. User defined labels for the content.

description

string

Optional. Description of the content.

Union field data. Only returned in GetContent responses, not in ListContent responses. data can be only one of the following:
data_text

string

Required. Content data in string format.

Union field content. Types of content content can be only one of the following:
sql_script

SqlScript

SQL script related configurations.

notebook

Notebook

Notebook related configurations.

Notebook

Configuration for Notebook content.

Fields
kernel_type

KernelType

Required. Kernel Type of the notebook.

KernelType

Kernel Type of the Jupyter notebook.

Enums
KERNEL_TYPE_UNSPECIFIED Kernel Type unspecified.
PYTHON3 Python 3 Kernel.

SqlScript

Configuration for the SQL script content.

Fields
engine

QueryEngine

Required. Query engine to be used for the SQL query.

QueryEngine

Query Engine Type of the SQL Script.

Enums
QUERY_ENGINE_UNSPECIFIED Value was unspecified.
SPARK Spark SQL Query.
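
As a sketch of how these pieces fit together, the following assumes the google-cloud-dataplex Python client library and creates a SQL script content item in a hypothetical lake; the path, query text, and resource names are placeholders:

    from google.cloud import dataplex_v1

    client = dataplex_v1.ContentServiceClient()

    content = dataplex_v1.Content(
        path="queries/orders_report.sql",  # hypothetical path within the lake
        data_text="SELECT 1",              # the script body, as a string
        sql_script=dataplex_v1.Content.SqlScript(
            engine=dataplex_v1.Content.SqlScript.QueryEngine.SPARK
        ),
    )

    response = client.create_content(
        parent="projects/my-project/locations/us-central1/lakes/my-lake",
        content=content,
    )
    print(response.name)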

CreateAspectTypeRequest

Create AspectType Request.

Fields
parent

string

Required. The resource name of the parent location, of the form: projects/{project_number}/locations/{location_id}, where location_id refers to a Google Cloud region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.aspectTypes.create
aspect_type_id

string

Required. AspectType identifier.

aspect_type

AspectType

Required. AspectType Resource.

validate_only

bool

Optional. The service validates the request without performing any mutations. The default is false.

CreateAssetRequest

Create asset request.

Fields
parent

string

Required. The resource name of the parent zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.assets.create
asset_id

string

Required. Asset identifier. This ID will be used to generate names such as table names when publishing metadata to Hive Metastore and BigQuery.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must end with a number or a letter.
  • Must be between 1-63 characters.
  • Must be unique within the zone.

asset

Asset

Required. Asset resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateContentRequest

Create content request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_id}/locations/{location_id}/lakes/{lake_id}

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.content.create
content

Content

Required. Content resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateDataAttributeBindingRequest

Create DataAttributeBinding request.

Fields
parent

string

Required. The resource name of the parent location: projects/{project_number}/locations/{location_id}

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.dataAttributeBindings.create
data_attribute_binding_id

string

Required. DataAttributeBinding identifier.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must be between 1-63 characters.
  • Must end with a number or a letter.
  • Must be unique within the Location.

data_attribute_binding

DataAttributeBinding

Required. DataAttributeBinding resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateDataAttributeRequest

Create DataAttribute request.

Fields
parent

string

Required. The resource name of the parent data taxonomy projects/{project_number}/locations/{location_id}/dataTaxonomies/{data_taxonomy_id}

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.dataAttributes.create
data_attribute_id

string

Required. DataAttribute identifier.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must be between 1-63 characters.
  • Must end with a number or a letter.
  • Must be unique within the DataTaxonomy.

data_attribute

DataAttribute

Required. DataAttribute resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateDataScanRequest

Create dataScan request.

Fields
parent

string

Required. The resource name of the parent location: projects/{project}/locations/{location_id} where project refers to a project_id or project_number and location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.datascans.create
data_scan

DataScan

Required. DataScan resource.

data_scan_id

string

Required. DataScan identifier.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must end with a number or a letter.
  • Must be between 1-63 characters.
  • Must be unique within the customer project / location.
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.
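
A minimal sketch of issuing this request with the google-cloud-dataplex Python client library, assuming a hypothetical project, region, and BigQuery table; create_data_scan returns a long-running operation:

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataScanServiceClient()

    data_scan = dataplex_v1.DataScan(
        data=dataplex_v1.DataSource(
            resource="//bigquery.googleapis.com/projects/my-project/datasets/my_dataset/tables/my_table"
        ),
        data_profile_spec=dataplex_v1.DataProfileSpec(),  # default profile scan settings
    )

    operation = client.create_data_scan(
        parent="projects/my-project/locations/us-central1",
        data_scan=data_scan,
        data_scan_id="orders-profile-scan",  # must satisfy the identifier rules above
    )
    print(operation.result().name)  # blocks until the operation completes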

CreateDataTaxonomyRequest

Create DataTaxonomy request.

Fields
parent

string

Required. The resource name of the data taxonomy location, of the form: projects/{project_number}/locations/{location_id} where location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.dataTaxonomies.create
data_taxonomy_id

string

Required. DataTaxonomy identifier.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must be between 1-63 characters.
  • Must end with a number or a letter.
  • Must be unique within the Project.

data_taxonomy

DataTaxonomy

Required. DataTaxonomy resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateEntityRequest

Create a metadata entity request.

Fields
parent

string

Required. The resource name of the parent zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.entities.create
entity

Entity

Required. Entity resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateEntryGroupRequest

Create EntryGroup Request.

Fields
parent

string

Required. The resource name of the parent location, of the form: projects/{project_number}/locations/{location_id}, where location_id refers to a Google Cloud region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.entryGroups.create
entry_group_id

string

Required. EntryGroup identifier.

entry_group

EntryGroup

Required. EntryGroup Resource.

validate_only

bool

Optional. The service validates the request without performing any mutations. The default is false.

CreateEntryRequest

Create Entry request.

Fields
parent

string

Required. The resource name of the parent Entry Group: projects/{project}/locations/{location}/entryGroups/{entry_group}.

entry_id

string

Required. Entry identifier. It has to be unique within an Entry Group.

Entries corresponding to Google Cloud resources use an Entry ID format based on full resource names: the entry ID is the full resource name of the resource, with the leading double slashes (//) before the API service name removed. This allows entries to be retrieved by their associated resource name.

For example, if the full resource name of a resource is //library.googleapis.com/shelves/shelf1/books/book2, then the suggested entry_id is library.googleapis.com/shelves/shelf1/books/book2.

It is also suggested to follow the same convention for entries corresponding to resources from providers or systems other than Google Cloud.

The maximum size of the field is 4000 characters.

entry

Entry

Required. Entry resource.
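
To make the entry_id convention concrete, a small sketch (plain Python, no client required) that derives an entry ID from a full resource name as described above:

    def entry_id_from_resource_name(full_resource_name: str) -> str:
        # "//library.googleapis.com/shelves/shelf1/books/book2"
        # becomes "library.googleapis.com/shelves/shelf1/books/book2".
        # str.removeprefix requires Python 3.9 or later.
        return full_resource_name.removeprefix("//")

    assert (
        entry_id_from_resource_name("//library.googleapis.com/shelves/shelf1/books/book2")
        == "library.googleapis.com/shelves/shelf1/books/book2"
    )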

CreateEntryTypeRequest

Create EntryType Request.

Fields
parent

string

Required. The resource name of the parent location, of the form: projects/{project_number}/locations/{location_id}, where location_id refers to a Google Cloud region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.entryTypes.create
entry_type_id

string

Required. EntryType identifier.

entry_type

EntryType

Required. EntryType Resource.

validate_only

bool

Optional. The service validates the request without performing any mutations. The default is false.

CreateEnvironmentRequest

Create environment request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_id}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.environments.create
environment_id

string

Required. Environment identifier.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must be between 1-63 characters.
  • Must end with a number or a letter.
  • Must be unique within the lake.

environment

Environment

Required. Environment resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateLakeRequest

Create lake request.

Fields
parent

string

Required. The resource name of the lake location, of the form: projects/{project_number}/locations/{location_id} where location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.lakes.create
lake_id

string

Required. Lake identifier. This ID will be used to generate names such as database and dataset names when publishing metadata to Hive Metastore and BigQuery.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must end with a number or a letter.
  • Must be between 1-63 characters.
  • Must be unique within the customer project / location.

lake

Lake

Required. Lake resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.
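
A hedged sketch of this request with the google-cloud-dataplex Python client library; the project, region, and lake ID are placeholders, and create_lake returns a long-running operation:

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataplexServiceClient()

    operation = client.create_lake(
        parent="projects/my-project/locations/us-central1",
        lake_id="sales-lake",  # must satisfy the identifier rules above
        lake=dataplex_v1.Lake(display_name="Sales lake"),
    )
    lake = operation.result()  # waits for the long-running operation to finish
    print(lake.name)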

CreateMetadataJobRequest

Create metadata job request.

Fields
parent

string

Required. The resource name of the parent location, in the format projects/{project_id_or_number}/locations/{location_id}

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.metadataJobs.create
metadata_job

MetadataJob

Required. The metadata job resource.

metadata_job_id

string

Optional. The metadata job ID. If not provided, a unique ID is generated with the prefix metadata-job-.
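
A minimal sketch, assuming the google-cloud-dataplex Python client library; the metadata job body itself (its type and import spec) is documented on the MetadataJob message and is elided here:

    from google.cloud import dataplex_v1

    client = dataplex_v1.CatalogServiceClient()

    # Populate the job type and import spec per the MetadataJob message (not shown).
    metadata_job = dataplex_v1.MetadataJob()

    operation = client.create_metadata_job(
        parent="projects/my-project/locations/us-central1",
        metadata_job=metadata_job,
        # Optional; if omitted, the service generates an ID prefixed with "metadata-job-".
        metadata_job_id="metadata-job-example",
    )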

CreatePartitionRequest

Create metadata partition request.

Fields
parent

string

Required. The resource name of the parent zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.partitions.create
partition

Partition

Required. Partition resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateTaskRequest

Create task request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.tasks.create
task_id

string

Required. Task identifier.

task

Task

Required. Task resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

CreateZoneRequest

Create zone request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.zones.create
zone_id

string

Required. Zone identifier. This ID will be used to generate names such as database and dataset names when publishing metadata to Hive Metastore and BigQuery.

  • Must contain only lowercase letters, numbers and hyphens.
  • Must start with a letter.
  • Must end with a number or a letter.
  • Must be between 1-63 characters.
  • Must be unique across all lakes from all locations in a project.
  • Must not be one of the reserved IDs (i.e. "default", "global-temp").

zone

Zone

Required. Zone resource.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

DataAccessSpec

DataAccessSpec holds the access control configuration to be enforced on data stored within resources (e.g., rows and columns in BigQuery tables). When associated with data, the data is only accessible to principals explicitly granted access through the DataAccessSpec. Principals with access to the containing resource are not implicitly granted access.

Fields
readers[]

string

Optional. The set of principals to be granted the reader role on data stored within resources. The format of the strings follows the pattern used by IAM in its bindings: user:{email}, serviceAccount:{email}, or group:{email}.
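
A short sketch of the expected principal formats, assuming the google-cloud-dataplex Python client library's DataAccessSpec message; the emails and project are placeholders:

    from google.cloud import dataplex_v1

    # Principals use the same string formats as IAM bindings.
    spec = dataplex_v1.DataAccessSpec(
        readers=[
            "user:alice@example.com",
            "serviceAccount:reporting@my-project.iam.gserviceaccount.com",
            "group:analysts@example.com",
        ]
    )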

DataAttribute

Denotes one dataAttribute in a dataTaxonomy, for example, PII. DataAttribute resources can be defined in a hierarchy. A single dataAttribute resource can contain specs of multiple types:

PII
  • ResourceAccessSpec:
    • readers: foo@bar.com
  • DataAccessSpec:
    • readers: bar@foo.com
Fields
name

string

Output only. The relative resource name of the dataAttribute, of the form: projects/{project_number}/locations/{location_id}/dataTaxonomies/{dataTaxonomy}/attributes/{data_attribute_id}.

uid

string

Output only. System generated globally unique ID for the DataAttribute. This ID will be different if the DataAttribute is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the DataAttribute was created.

update_time

Timestamp

Output only. The time when the DataAttribute was last updated.

description

string

Optional. Description of the DataAttribute.

display_name

string

Optional. User friendly display name.

labels

map<string, string>

Optional. User-defined labels for the DataAttribute.

parent_id

string

Optional. The ID of the parent DataAttribute resource, which should belong to the same data taxonomy. A circular dependency in the parent chain is not allowed. The maximum allowed depth of the hierarchy is 4. [a -> b -> c -> d -> e, depth = 4]

attribute_count

int32

Output only. The number of child attributes present for this attribute.

etag

string

This checksum is computed by the server based on the value of other fields, and may be sent on update and delete requests to ensure the client has an up-to-date value before proceeding.

resource_access_spec

ResourceAccessSpec

Optional. Specified when applied to a resource (e.g., Cloud Storage bucket, BigQuery dataset, BigQuery table).

data_access_spec

DataAccessSpec

Optional. Specified when applied to data stored on the resource (e.g., rows and columns in BigQuery tables).

DataAttributeBinding

DataAttributeBinding represents the binding of attributes to resources. For example, bind the 'CustomerInfo' entity to the 'PII' attribute.

Fields
name

string

Output only. The relative resource name of the Data Attribute Binding, of the form: projects/{project_number}/locations/{location}/dataAttributeBindings/{data_attribute_binding_id}

uid

string

Output only. System generated globally unique ID for the DataAttributeBinding. This ID will be different if the DataAttributeBinding is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the DataAttributeBinding was created.

update_time

Timestamp

Output only. The time when the DataAttributeBinding was last updated.

description

string

Optional. Description of the DataAttributeBinding.

display_name

string

Optional. User friendly display name.

labels

map<string, string>

Optional. User-defined labels for the DataAttributeBinding.

etag

string

This checksum is computed by the server based on the value of other fields, and may be sent on update and delete requests to ensure the client has an up-to-date value before proceeding. Etags must be used when calling the DeleteDataAttributeBinding and UpdateDataAttributeBinding methods.

attributes[]

string

Optional. List of attributes to be associated with the resource, provided in the form: projects/{project}/locations/{location}/dataTaxonomies/{dataTaxonomy}/attributes/{data_attribute_id}

paths[]

Path

Optional. The list of paths for items within the associated resource (e.g., columns and partitions within a table) along with attribute bindings.

Union field resource_reference. The reference to the resource that is associated to attributes, or the query to match resources and associate attributes. resource_reference can be only one of the following:
resource

string

Optional. Immutable. The resource name of the resource that is associated to attributes. Currently, only entity resources are supported, in the form: projects/{project}/locations/{location}/lakes/{lake}/zones/{zone}/entities/{entity_id} Must belong to the same project and region as the attribute binding, and only one active binding can exist for a resource.

Path

Represents a subresource of the given resource, and associated bindings with it. Currently supported subresources are column and partition schema fields within a table.

Fields
name

string

Required. The name identifier of the path. Nested columns should be of the form: 'address.city'.

attributes[]

string

Optional. List of attributes to be associated with the path of the resource, provided in the form: projects/{project}/locations/{location}/dataTaxonomies/{dataTaxonomy}/attributes/{data_attribute_id}

DataDiscoveryResult

The output of a data discovery scan.

Fields
bigquery_publishing

BigQueryPublishing

Output only. Configuration for metadata publishing.

BigQueryPublishing

Describes BigQuery publishing configurations.

Fields
dataset

string

Output only. The BigQuery dataset to publish to. It takes the form projects/{project_id}/datasets/{dataset_id}. If not set, the service creates a default publishing dataset.

DataDiscoverySpec

Spec for a data discovery scan.

Fields
bigquery_publishing_config

BigQueryPublishingConfig

Optional. Configuration for metadata publishing.

Union field resource_config. The configurations of the data discovery scan resource. resource_config can be only one of the following:
storage_config

StorageConfig

Cloud Storage related configurations.

BigQueryPublishingConfig

Describes BigQuery publishing configurations.

Fields
table_type

TableType

Optional. Determines whether to publish discovered tables as BigLake external tables or non-BigLake external tables.

connection

string

Optional. The BigQuery connection used to create BigLake tables. Must be in the form projects/{project_id}/locations/{location_id}/connections/{connection_id}

TableType

Determines how discovered tables are published.

Enums
TABLE_TYPE_UNSPECIFIED Table type unspecified.
EXTERNAL Default. Discovered tables are published as BigQuery external tables whose data is accessed using the credentials of the user querying the table.
BIGLAKE Discovered tables are published as BigLake external tables whose data is accessed using the credentials of the associated BigQuery connection.

StorageConfig

Configurations related to Cloud Storage as the data source.

Fields
include_patterns[]

string

Optional. Defines the data to include during discovery when only a subset of the data should be considered. Provide a list of patterns that identify the data to include. For Cloud Storage bucket assets, these patterns are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these patterns are interpreted as patterns to match table names.

exclude_patterns[]

string

Optional. Defines the data to exclude during discovery. Provide a list of patterns that identify the data to exclude. For Cloud Storage bucket assets, these patterns are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these patterns are interpreted as patterns to match table names.

csv_options

CsvOptions

Optional. Configuration for CSV data.

json_options

JsonOptions

Optional. Configuration for JSON data.

CsvOptions

Describes CSV and similar semi-structured data formats.

Fields
header_rows

int32

Optional. The number of rows to interpret as header rows that should be skipped when reading data rows.

delimiter

string

Optional. The delimiter that is used to separate values. The default is , (comma).

encoding

string

Optional. The character encoding of the data. The default is UTF-8.

type_inference_disabled

bool

Optional. Whether to disable the inference of data types for CSV data. If true, all columns are registered as strings.

quote

string

Optional. The character used to quote column values. Accepts " (double quotation mark) or ' (single quotation mark). If unspecified, defaults to " (double quotation mark).

JsonOptions

Describes JSON data format.

Fields
encoding

string

Optional. The character encoding of the data. The default is UTF-8.

type_inference_disabled

bool

Optional. Whether to disable the inference of data types for JSON data. If true, all columns are registered as their primitive types (string, number, or boolean).
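
Pulling the preceding messages together, a hedged sketch of a data discovery spec, assuming a recent google-cloud-dataplex Python client release that exposes these nested types; the glob patterns and settings are placeholders:

    from google.cloud import dataplex_v1

    spec = dataplex_v1.DataDiscoverySpec(
        storage_config=dataplex_v1.DataDiscoverySpec.StorageConfig(
            include_patterns=["raw/**/*.csv"],  # glob patterns matching object names
            exclude_patterns=["raw/tmp/**"],
            csv_options=dataplex_v1.DataDiscoverySpec.StorageConfig.CsvOptions(
                header_rows=1,   # skip one header row when reading data rows
                delimiter=",",
            ),
        ),
        bigquery_publishing_config=dataplex_v1.DataDiscoverySpec.BigQueryPublishingConfig(
            table_type=dataplex_v1.DataDiscoverySpec.BigQueryPublishingConfig.TableType.EXTERNAL,
        ),
    )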

DataProfileResult

DataProfileResult defines the output of DataProfileScan. Each field of the table has a field-type-specific profile result.

Fields
row_count

int64

The count of rows scanned.

profile

Profile

The profile information per field.

scanned_data

ScannedData

The data scanned for this result.

post_scan_actions_result

PostScanActionsResult

Output only. The result of post scan actions.

PostScanActionsResult

The result of post scan actions of DataProfileScan job.

Fields
bigquery_export_result

BigQueryExportResult

Output only. The result of BigQuery export post scan action.

BigQueryExportResult

The result of BigQuery export post scan action.

Fields
state

State

Output only. Execution state for the BigQuery exporting.

message

string

Output only. Additional information about the BigQuery exporting.

State

Execution state for the exporting.

Enums
STATE_UNSPECIFIED The exporting state is unspecified.
SUCCEEDED The exporting completed successfully.
FAILED The exporting is no longer running due to an error.
SKIPPED The exporting is skipped due to no valid scan result to export (usually caused by a failed scan).

Profile

Contains the name, type, mode, and field-type-specific profile information.

Fields
fields[]

Field

List of fields with structural and profile information for each field.

Field

A field within a table.

Fields
name

string

The name of the field.

type

string

The data type retrieved from the schema of the data source. For instance, for a BigQuery native table, it is the BigQuery Table Schema. For a Dataplex Entity, it is the Entity Schema.

mode

string

The mode of the field. Possible values include:

  • REQUIRED, if it is a required field.
  • NULLABLE, if it is an optional field.
  • REPEATED, if it is a repeated field.
profile

ProfileInfo

Profile information for the corresponding field.

ProfileInfo

The profile information for each field type.

Fields
null_ratio

double

Ratio of rows with null value against total scanned rows.

distinct_ratio

double

Ratio of rows with distinct values against total scanned rows. Not available for complex non-groupable field types, including RECORD, ARRAY, GEOGRAPHY, and JSON, as well as fields with REPEATED mode.

top_n_values[]

TopNValue

The list of top N non-null values, and the frequency and ratio with which they occur in the scanned data. N is 10 or equal to the number of distinct values in the field, whichever is smaller. Not available for complex non-groupable field types, including RECORD, ARRAY, GEOGRAPHY, and JSON, as well as fields with REPEATED mode.

Union field field_info. Structural and profile information for a specific field type. Not available if the mode is REPEATED. field_info can be only one of the following:
string_profile

StringFieldInfo

String type field information.

integer_profile

IntegerFieldInfo

Integer type field information.

double_profile

DoubleFieldInfo

Double type field information.

DoubleFieldInfo

The profile information for a double type field.

Fields
average

double

Average of non-null values in the scanned data. NaN, if the field has a NaN.

standard_deviation

double

Standard deviation of non-null values in the scanned data. NaN, if the field has a NaN.

min

double

Minimum of non-null values in the scanned data. NaN, if the field has a NaN.

quartiles[]

double

A quartile divides the number of data points into four parts, or quarters, of more-or-less equal size. The three main quartiles are: the first quartile (Q1), which splits off the lowest 25% of the data from the highest 75% and is also known as the lower or 25th empirical quartile, as 25% of the data is below this point; the second quartile (Q2), which is the median of the data set, so 50% of the data lies below this point; and the third quartile (Q3), which splits off the highest 25% of the data from the lowest 75% and is known as the upper or 75th empirical quartile, as 75% of the data lies below this point. Here, the quartiles are provided as an ordered list of quartile values for the scanned data, in the order Q1, median, Q3.

max

double

Maximum of non-null values in the scanned data. NaN, if the field has a NaN.
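
For intuition, the ordered quartile list described above corresponds to the three cut points returned by a standard quartile computation; a plain-Python sketch with illustrative values:

    from statistics import quantiles

    values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0, 12.0]

    # quantiles(..., n=4) returns the three cut points [Q1, median, Q3],
    # matching the order used by the quartiles[] field.
    q1, median, q3 = quantiles(values, n=4)
    print(q1, median, q3)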

IntegerFieldInfo

The profile information for an integer type field.

Fields
average

double

Average of non-null values in the scanned data. NaN, if the field has a NaN.

standard_deviation

double

Standard deviation of non-null values in the scanned data. NaN, if the field has a NaN.

min

int64

Minimum of non-null values in the scanned data. NaN, if the field has a NaN.

quartiles[]

int64

A quartile divides the number of data points into four parts, or quarters, of more-or-less equal size. The three main quartiles are: the first quartile (Q1), which splits off the lowest 25% of the data from the highest 75% and is also known as the lower or 25th empirical quartile, as 25% of the data is below this point; the second quartile (Q2), which is the median of the data set, so 50% of the data lies below this point; and the third quartile (Q3), which splits off the highest 25% of the data from the lowest 75% and is known as the upper or 75th empirical quartile, as 75% of the data lies below this point. Here, the quartiles are provided as an ordered list of approximate quartile values for the scanned data, in the order Q1, median, Q3.

max

int64

Maximum of non-null values in the scanned data. NaN, if the field has a NaN.

StringFieldInfo

The profile information for a string type field.

Fields
min_length

int64

Minimum length of non-null values in the scanned data.

max_length

int64

Maximum length of non-null values in the scanned data.

average_length

double

Average length of non-null values in the scanned data.

TopNValue

Top N non-null values in the scanned data.

Fields
value

string

String value of a top N non-null value.

count

int64

Count of the corresponding value in the scanned data.

ratio

double

Ratio of the corresponding value in the field against the total number of rows in the scanned data.

DataProfileSpec

DataProfileScan related setting.

Fields
sampling_percent

float

Optional. The percentage of the records to be selected from the dataset for DataScan.

  • Value can range between 0.0 and 100.0 with up to 3 significant decimal digits.
  • Sampling is not applied if sampling_percent is not specified, 0 or 100.
row_filter

string

Optional. A filter applied to all rows in a single DataScan job. The filter needs to be a valid SQL expression for a WHERE clause in BigQuery standard SQL syntax. Example: col1 >= 0 AND col2 < 10

post_scan_actions

PostScanActions

Optional. Actions to take upon job completion.

include_fields

SelectedFields

Optional. The fields to include in data profile.

If not specified, all fields at the time of profile scan job execution are included, except for ones listed in exclude_fields.

exclude_fields

SelectedFields

Optional. The fields to exclude from data profile.

If specified, the fields will be excluded from data profile, regardless of include_fields value.

PostScanActions

The configuration of post scan actions of DataProfileScan job.

Fields
bigquery_export

BigQueryExport

Optional. If set, results will be exported to the provided BigQuery table.

BigQueryExport

The configuration of BigQuery export post scan action.

Fields
results_table

string

Optional. The BigQuery table to export DataProfileScan results to. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID

SelectedFields

The specification for fields to include or exclude in data profile scan.

Fields
field_names[]

string

Optional. Expected input is a list of fully qualified names of fields as in the schema.

Only top-level field names for nested fields are supported. For instance, if 'x' is of nested field type, listing 'x' is supported but 'x.y.z' is not supported. Here 'y' and 'y.z' are nested fields of 'x'.
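
A hedged sketch of a data profile spec combining these fields, assuming the google-cloud-dataplex Python client library; the filter and field names are illustrative:

    from google.cloud import dataplex_v1

    spec = dataplex_v1.DataProfileSpec(
        sampling_percent=25.0,                 # profile a 25% sample of the records
        row_filter="col1 >= 0 AND col2 < 10",  # BigQuery standard SQL WHERE expression
        exclude_fields=dataplex_v1.DataProfileSpec.SelectedFields(
            field_names=["ssn"]  # top-level field names only, per the note above
        ),
    )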

DataQualityColumnResult

DataQualityColumnResult provides a more detailed, per-column view of the results.

Fields
column

string

Output only. The column specified in the DataQualityRule.

score

float

Output only. The column-level data quality score for this data scan job if and only if the 'column' field is set.

The score ranges between [0, 100] (up to two decimal points).

DataQualityDimension

A dimension captures data quality intent about a defined subset of the rules specified.

Fields
name

string

The dimension name a rule belongs to. Supported dimensions are ["COMPLETENESS", "ACCURACY", "CONSISTENCY", "VALIDITY", "UNIQUENESS", "FRESHNESS", "VOLUME"]

DataQualityDimensionResult

DataQualityDimensionResult provides a more detailed, per-dimension view of the results.

Fields
dimension

DataQualityDimension

Output only. The dimension config specified in the DataQualitySpec, as is.

passed

bool

Whether the dimension passed or failed.

score

float

Output only. The dimension-level data quality score for this data scan job if and only if the 'dimension' field is set.

The score ranges between [0, 100] (up to two decimal points).

DataQualityResult

The output of a DataQualityScan.

Fields
passed

bool

Overall data quality result -- true if all rules passed.

dimensions[]

DataQualityDimensionResult

A list of results at the dimension level.

A dimension will have a corresponding DataQualityDimensionResult if and only if there is at least one rule with the 'dimension' field set to it.

columns[]

DataQualityColumnResult

Output only. A list of results at the column level.

A column will have a corresponding DataQualityColumnResult if and only if there is at least one rule with the 'column' field set to it.

rules[]

DataQualityRuleResult

A list of all the rules in a job, and their results.

row_count

int64

The count of rows processed.

scanned_data

ScannedData

The data scanned for this result.

post_scan_actions_result

PostScanActionsResult

Output only. The result of post scan actions.

score

float

Output only. The overall data quality score.

The score ranges between [0, 100] (up to two decimal points).
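
A sketch of reading these result fields from a completed data quality scan job, assuming the google-cloud-dataplex Python client library and its FULL job view enum; the resource names are placeholders:

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataScanServiceClient()

    request = dataplex_v1.GetDataScanJobRequest(
        name="projects/my-project/locations/us-central1/dataScans/my-scan/jobs/my-job",
        view=dataplex_v1.GetDataScanJobRequest.DataScanJobView.FULL,  # include results
    )
    job = client.get_data_scan_job(request=request)

    result = job.data_quality_result
    print("passed:", result.passed, "score:", result.score)
    for dim in result.dimensions:
        print(dim.dimension.name, dim.passed, dim.score)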

PostScanActionsResult

The result of post scan actions of DataQualityScan job.

Fields
bigquery_export_result

BigQueryExportResult

Output only. The result of BigQuery export post scan action.

BigQueryExportResult

The result of BigQuery export post scan action.

Fields
state

State

Output only. Execution state for the BigQuery exporting.

message

string

Output only. Additional information about the BigQuery exporting.

State

Execution state for the exporting.

Enums
STATE_UNSPECIFIED The exporting state is unspecified.
SUCCEEDED The exporting completed successfully.
FAILED The exporting is no longer running due to an error.
SKIPPED The exporting is skipped due to no valid scan result to export (usually caused by a failed scan).

DataQualityRule

A rule captures data quality intent about a data source.

Fields
column

string

Optional. The unnested column which this rule is evaluated against.

ignore_null

bool

Optional. Rows with null values will automatically fail a rule, unless ignore_null is true. In that case, such null rows are trivially considered passing.

This field is only valid for the following type of rules:

  • RangeExpectation
  • RegexExpectation
  • SetExpectation
  • UniquenessExpectation
dimension

string

Required. The dimension a rule belongs to. Results are also aggregated at the dimension level. Supported dimensions are ["COMPLETENESS", "ACCURACY", "CONSISTENCY", "VALIDITY", "UNIQUENESS", "FRESHNESS", "VOLUME"]

threshold

double

Optional. The minimum ratio of passing_rows / total_rows required to pass this rule, with a range of [0.0, 1.0].

0 indicates the default value (i.e., 1.0).

This field is only valid for row-level type rules.

name

string

Optional. A mutable name for the rule.

  • The name must contain only letters (a-z, A-Z), numbers (0-9), or hyphens (-).
  • The maximum length is 63 characters.
  • Must start with a letter.
  • Must end with a number or a letter.
description

string

Optional. Description of the rule.

  • The maximum length is 1,024 characters.
suspended

bool

Optional. Whether the Rule is active or suspended. Default is false.

Union field rule_type. The rule-specific configuration. rule_type can be only one of the following:
range_expectation

RangeExpectation

Row-level rule which evaluates whether each column value lies within a specified range.

non_null_expectation

NonNullExpectation

Row-level rule which evaluates whether each column value is null.

set_expectation

SetExpectation

Row-level rule which evaluates whether each column value is contained by a specified set.

regex_expectation

RegexExpectation

Row-level rule which evaluates whether each column value matches a specified regex.

uniqueness_expectation

UniquenessExpectation

Row-level rule which evaluates whether each column value is unique.

statistic_range_expectation

StatisticRangeExpectation

Aggregate rule which evaluates whether the column aggregate statistic lies within a specified range.

row_condition_expectation

RowConditionExpectation

Row-level rule which evaluates whether each row in a table passes the specified condition.

table_condition_expectation

TableConditionExpectation

Aggregate rule which evaluates whether the provided expression is true for a table.

sql_assertion

SqlAssertion

Aggregate rule which evaluates the number of rows returned for the provided statement. If any rows are returned, this rule fails.
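
A hedged sketch assembling rules of two of these types into a data quality spec, assuming the google-cloud-dataplex Python client library; the column names, dimensions, and threshold are illustrative:

    from google.cloud import dataplex_v1

    rules = [
        dataplex_v1.DataQualityRule(
            column="order_id",
            dimension="COMPLETENESS",
            non_null_expectation=dataplex_v1.DataQualityRule.NonNullExpectation(),
        ),
        dataplex_v1.DataQualityRule(
            column="price",
            dimension="VALIDITY",
            threshold=0.99,  # at least 99% of rows must pass
            range_expectation=dataplex_v1.DataQualityRule.RangeExpectation(
                min_value="0", strict_min_enabled=False  # values >= 0 pass
            ),
        ),
    ]

    spec = dataplex_v1.DataQualitySpec(rules=rules)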

NonNullExpectation

This type has no fields.

Evaluates whether each column value is null.

RangeExpectation

Evaluates whether each column value lies within a specified range.

Fields
min_value

string

Optional. The minimum column value allowed for a row to pass this validation. At least one of min_value and max_value needs to be provided.

max_value

string

Optional. The maximum column value allowed for a row to pass this validation. At least one of min_value and max_value needs to be provided.

strict_min_enabled

bool

Optional. Whether each value needs to be strictly greater than ('>') the minimum, or if equality is allowed.

Only relevant if a min_value has been defined. Default = false.

strict_max_enabled

bool

Optional. Whether each value needs to be strictly lesser than ('<') the maximum, or if equality is allowed.

Only relevant if a max_value has been defined. Default = false.

RegexExpectation

Evaluates whether each column value matches a specified regex.

Fields
regex

string

Optional. A regular expression the column value is expected to match.

RowConditionExpectation

Evaluates whether each row passes the specified condition.

The SQL expression needs to use BigQuery standard SQL syntax and should produce a boolean value per row as the result.

Example: col1 >= 0 AND col2 < 10

Fields
sql_expression

string

Optional. The SQL expression.

SetExpectation

Evaluates whether each column value is contained by a specified set.

Fields
values[]

string

Optional. Expected values for the column value.

SqlAssertion

A SQL statement that is evaluated to return rows that match an invalid state. If any rows are returned, this rule fails.

The SQL statement must use BigQuery standard SQL syntax, and must not contain any semicolons.

You can use the data reference parameter ${data()} to reference the source table with all of its precondition filters applied. Examples of precondition filters include row filters, incremental data filters, and sampling. For more information, see Data reference parameter.

Example: SELECT * FROM ${data()} WHERE price < 0

Fields
sql_statement

string

Optional. The SQL statement.
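
A short sketch of this rule type, assuming the google-cloud-dataplex Python client library; the price column is illustrative:

    from google.cloud import dataplex_v1

    # Fails if any row has a negative price. ${data()} expands to the
    # source table with precondition filters applied, per the description above.
    rule = dataplex_v1.DataQualityRule(
        dimension="VALIDITY",
        sql_assertion=dataplex_v1.DataQualityRule.SqlAssertion(
            sql_statement="SELECT * FROM ${data()} WHERE price < 0"
        ),
    )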

StatisticRangeExpectation

Evaluates whether the column aggregate statistic lies within a specified range.

Fields
statistic

ColumnStatistic

Optional. The aggregate metric to evaluate.

min_value

string

Optional. The minimum column statistic value allowed for a row to pass this validation.

At least one of min_value and max_value needs to be provided.

max_value

string

Optional. The maximum column statistic value allowed for a row to pass this validation.

At least one of min_value and max_value needs to be provided.

strict_min_enabled

bool

Optional. Whether column statistic needs to be strictly greater than ('>') the minimum, or if equality is allowed.

Only relevant if a min_value has been defined. Default = false.

strict_max_enabled

bool

Optional. Whether column statistic needs to be strictly lesser than ('<') the maximum, or if equality is allowed.

Only relevant if a max_value has been defined. Default = false.

ColumnStatistic

The list of aggregate metrics a rule can be evaluated against.

Enums
STATISTIC_UNDEFINED Unspecified statistic type
MEAN Evaluate the column mean
MIN Evaluate the column min
MAX Evaluate the column max

TableConditionExpectation

Evaluates whether the provided expression is true.

The SQL expression needs to use BigQuery standard SQL syntax and should produce a scalar boolean result.

Example: MIN(col1) >= 0

Fields
sql_expression

string

Optional. The SQL expression.

UniquenessExpectation

This type has no fields.

Evaluates whether the column has duplicates.

DataQualityRuleResult

DataQualityRuleResult provides a more detailed, per-rule view of the results.

Fields
rule

DataQualityRule

The rule specified in the DataQualitySpec, as is.

passed

bool

Whether the rule passed or failed.

evaluated_count

int64

The number of rows a rule was evaluated against.

This field is only valid for row-level type rules.

Evaluated count can be configured to either

  • include all rows (default) - with null rows automatically failing rule evaluation, or
  • exclude null rows from the evaluated_count, by setting ignore_null = true.
passed_count

int64

The number of rows which passed a rule evaluation.

This field is only valid for row-level type rules.

null_count

int64

The number of rows with null values in the specified column.

pass_ratio

double

The ratio of passed_count / evaluated_count.

This field is only valid for row-level type rules.

failing_rows_query

string

The query to find rows that did not pass this rule.

This field is only valid for row-level type rules.

assertion_row_count

int64

Output only. The number of rows returned by the SQL statement in a SQL assertion rule.

This field is only valid for SQL assertion rules.

DataQualityScanRuleResult

Information about the result of a data quality rule for a data quality scan. The monitored resource is 'DataScan'.

Fields
job_id

string

Identifier of the specific data scan job this log entry is for.

data_source

string

The data source of the data scan (e.g. BigQuery table name).

column

string

The column which this rule is evaluated against.

rule_name

string

The name of the data quality rule.

rule_type

RuleType

The type of the data quality rule.

evalution_type

EvaluationType

The evaluation type of the data quality rule.

rule_dimension

string

The dimension of the data quality rule.

threshold_percent

double

The passing threshold ([0.0, 100.0]) of the data quality rule.

result

Result

The result of the data quality rule.

evaluated_row_count

int64

The number of rows evaluated against the data quality rule. This field is only valid for rules of PER_ROW evaluation type.

passed_row_count

int64

The number of rows which passed a rule evaluation. This field is only valid for rules of PER_ROW evaluation type.

null_row_count

int64

The number of rows with null values in the specified column.

assertion_row_count

int64

The number of rows returned by the SQL statement in a SQL assertion rule. This field is only valid for SQL assertion rules.

EvaluationType

The evaluation type of the data quality rule.

Enums
EVALUATION_TYPE_UNSPECIFIED An unspecified evaluation type.
PER_ROW The rule evaluation is done at per row level.
AGGREGATE The rule evaluation is done for an aggregate of rows.

Result

Whether the data quality rule passed or failed.

Enums
RESULT_UNSPECIFIED An unspecified result.
PASSED The data quality rule passed.
FAILED The data quality rule failed.

RuleType

The type of the data quality rule.

Enums
RULE_TYPE_UNSPECIFIED An unspecified rule type.
NON_NULL_EXPECTATION See DataQualityRule.NonNullExpectation.
RANGE_EXPECTATION See DataQualityRule.RangeExpectation.
REGEX_EXPECTATION See DataQualityRule.RegexExpectation.
ROW_CONDITION_EXPECTATION See DataQualityRule.RowConditionExpectation.
SET_EXPECTATION See DataQualityRule.SetExpectation.
STATISTIC_RANGE_EXPECTATION See DataQualityRule.StatisticRangeExpectation.
TABLE_CONDITION_EXPECTATION See DataQualityRule.TableConditionExpectation.
UNIQUENESS_EXPECTATION See DataQualityRule.UniquenessExpectation.
SQL_ASSERTION See DataQualityRule.SqlAssertion.

DataQualitySpec

DataQualityScan related setting.

Fields
rules[]

DataQualityRule

Required. The list of rules to evaluate against a data source. At least one rule is required.

sampling_percent

float

Optional. The percentage of the records to be selected from the dataset for DataScan.

  • Value can range between 0.0 and 100.0 with up to 3 significant decimal digits.
  • Sampling is not applied if sampling_percent is not specified, 0 or 100.
row_filter

string

Optional. A filter applied to all rows in a single DataScan job. The filter needs to be a valid SQL expression for a WHERE clause in BigQuery standard SQL syntax. Example: col1 >= 0 AND col2 < 10

post_scan_actions

PostScanActions

Optional. Actions to take upon job completion.

PostScanActions

The configuration of post scan actions of DataQualityScan.

Fields
bigquery_export

BigQueryExport

Optional. If set, results will be exported to the provided BigQuery table.

notification_report

NotificationReport

Optional. If set, results will be sent to the provided notification recipients upon triggers.

BigQueryExport

The configuration of BigQuery export post scan action.

Fields
results_table

string

Optional. The BigQuery table to export DataQualityScan results to. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID

JobEndTrigger

This type has no fields.

This trigger fires whenever a scan job run ends, regardless of the result.

JobFailureTrigger

This type has no fields.

This trigger fires when the scan job itself fails, regardless of the result.

NotificationReport

The configuration of notification report post scan action.

Fields
recipients

Recipients

Required. The recipients who will receive the notification report.

score_threshold_trigger

ScoreThresholdTrigger

Optional. If set, report will be sent when score threshold is met.

job_failure_trigger

JobFailureTrigger

Optional. If set, report will be sent when a scan job fails.

job_end_trigger

JobEndTrigger

Optional. If set, report will be sent when a scan job ends.

Recipients

The individuals or groups who are designated to receive notifications upon triggers.

Fields
emails[]

string

Optional. The email recipients who will receive the DataQualityScan results report.

ScoreThresholdTrigger

This trigger fires when the data quality (DQ) score in the job result is less than a specified input score.

Fields
score_threshold

float

Optional. The score range is in [0,100].
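
A hedged sketch combining these post scan actions, assuming the google-cloud-dataplex Python client library's nested types; the table path, emails, and threshold are placeholders:

    from google.cloud import dataplex_v1

    actions = dataplex_v1.DataQualitySpec.PostScanActions(
        bigquery_export=dataplex_v1.DataQualitySpec.PostScanActions.BigQueryExport(
            results_table="//bigquery.googleapis.com/projects/my-project/datasets/dq/tables/results"
        ),
        notification_report=dataplex_v1.DataQualitySpec.PostScanActions.NotificationReport(
            recipients=dataplex_v1.DataQualitySpec.PostScanActions.Recipients(
                emails=["data-team@example.com"]
            ),
            score_threshold_trigger=dataplex_v1.DataQualitySpec.PostScanActions.ScoreThresholdTrigger(
                score_threshold=80.0  # notify when the score drops below 80
            ),
        ),
    )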

DataScan

Represents a user-visible job which provides the insights for the related data source.

For example:

  • Data Quality: generates queries based on the rules and runs against the data to get data quality check results.
  • Data Profile: analyzes the data in table(s) and generates insights about the structure, content, and relationships (such as null percent, cardinality, min/max/mean, etc.).
Fields
name

string

Output only. The relative resource name of the scan, of the form: projects/{project}/locations/{location_id}/dataScans/{datascan_id}, where project refers to a project_id or project_number and location_id refers to a GCP region.

uid

string

Output only. System generated globally unique ID for the scan. This ID will be different if the scan is deleted and re-created with the same name.

description

string

Optional. Description of the scan.

  • Must be between 1-1024 characters.
display_name

string

Optional. User friendly display name.

  • Must be between 1-256 characters.
labels

map<string, string>

Optional. User-defined labels for the scan.

state

State

Output only. Current state of the DataScan.

create_time

Timestamp

Output only. The time when the scan was created.

update_time

Timestamp

Output only. The time when the scan was last updated.

data

DataSource

Required. The data source for DataScan.

execution_spec

ExecutionSpec

Optional. DataScan execution settings.

If not specified, the fields in it will use their default values.

execution_status

ExecutionStatus

Output only. Status of the data scan execution.

type

DataScanType

Output only. The type of DataScan.

Union field spec. Data scan related setting. The settings are required and immutable. After you configure the settings for one type of data scan, you can't change the data scan to a different type of data scan. spec can be only one of the following:
data_quality_spec

DataQualitySpec

Settings for a data quality scan.

data_profile_spec

DataProfileSpec

Settings for a data profile scan.

data_discovery_spec

DataDiscoverySpec

Settings for a data discovery scan.

Union field result. The result of the data scan. result can be only one of the following:
data_quality_result

DataQualityResult

Output only. The result of a data quality scan.

data_profile_result

DataProfileResult

Output only. The result of a data profile scan.

data_discovery_result

DataDiscoveryResult

Output only. The result of a data discovery scan.

ExecutionSpec

DataScan execution settings.

Fields
trigger

Trigger

Optional. Spec related to how often and when a scan should be triggered.

If not specified, the default is OnDemand, which means the scan will not run until the user calls RunDataScan API.

Union field incremental. Spec related to incremental scan of the data

When an option is selected for incremental scan, it cannot be unset or changed. If not specified, a data scan will run for all data in the table. incremental can be only one of the following:

field

string

Immutable. The unnested field (of type Date or Timestamp) that contains values which monotonically increase over time.

If not specified, a data scan will run for all data in the table.
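
A hedged sketch of an execution spec with a scheduled trigger and an incremental field, assuming the google-cloud-dataplex Python client library; the cron string and field name are illustrative:

    from google.cloud import dataplex_v1

    execution_spec = dataplex_v1.DataScan.ExecutionSpec(
        trigger=dataplex_v1.Trigger(
            schedule=dataplex_v1.Trigger.Schedule(cron="0 3 * * *")  # daily at 03:00
        ),
        # Incremental scans are keyed on this monotonically increasing field;
        # once set, it cannot be unset or changed.
        field="update_time",
    )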

ExecutionStatus

Status of the data scan execution.

Fields
latest_job_start_time

Timestamp

The time when the latest DataScanJob started.

latest_job_end_time

Timestamp

The time when the latest DataScanJob ended.

latest_job_create_time

Timestamp

Optional. The time when the DataScanJob execution was created.

DataScanEvent

These messages contain information about the execution of a data scan. The monitored resource is 'DataScan'.

Fields
data_source

string

The data source of the data scan

job_id

string

The identifier of the specific data scan job this log entry is for.

create_time

Timestamp

The time when the data scan job was created.

start_time

Timestamp

The time when the data scan job started to run.

end_time

Timestamp

The time when the data scan job finished.

type

ScanType

The type of the data scan.

state

State

The status of the data scan job.

message

string

The message describing the data scan job event.

spec_version

string

A version identifier of the spec which was used to execute this job.

trigger

Trigger

The trigger type of the data scan job.

scope

Scope

The scope of the data scan (e.g. full, incremental).

post_scan_actions_result

PostScanActionsResult

The result of post scan actions.

Union field result. The result of the data scan job. result can be only one of the following:
data_profile

DataProfileResult

Data profile result for data profile type data scan.

data_quality

DataQualityResult

Data quality result for data quality type data scan.

Union field appliedConfigs. The applied configs in the data scan job. appliedConfigs can be only one of the following:
data_profile_configs

DataProfileAppliedConfigs

Applied configs for data profile type data scan.

data_quality_configs

DataQualityAppliedConfigs

Applied configs for data quality type data scan.

DataProfileAppliedConfigs

Applied configs for data profile type data scan job.

Fields
sampling_percent

float

The percentage of the records selected from the dataset for DataScan.

  • Value ranges between 0.0 and 100.0.
  • A value of 0.0 or 100.0 implies that sampling was not applied.
row_filter_applied

bool

Boolean indicating whether a row filter was applied in the DataScan job.

column_filter_applied

bool

Boolean indicating whether a column filter was applied in the DataScan job.

DataProfileResult

Data profile result for data scan job.

Fields
row_count

int64

The count of rows processed in the data scan job.

DataQualityAppliedConfigs

Applied configs for data quality type data scan job.

Fields
sampling_percent

float

The percentage of the records selected from the dataset for DataScan.

  • Value ranges between 0.0 and 100.0.
  • A value of 0.0 or 100.0 implies that sampling was not applied.
row_filter_applied

bool

Boolean indicating whether a row filter was applied in the DataScan job.

DataQualityResult

Data quality result for data scan job.

Fields
row_count

int64

The count of rows processed in the data scan job.

passed

bool

Whether the data quality result passed or not.

dimension_passed

map<string, bool>

The result of each dimension for the data quality result. The key of the map is the name of the dimension. The value is a boolean indicating whether the dimension passed.

score

float

The table-level data quality score for the data scan job.

The data quality score ranges between [0, 100] (up to two decimal points).

dimension_score

map<string, float>

The score of each dimension for data quality result. The key of the map is the name of the dimension. The value is the data quality score for the dimension.

The score ranges between [0, 100] (up to two decimal points).

column_score

map<string, float>

The score of each column scanned in the data scan job. The key of the map is the name of the column. The value is the data quality score for the column.

The score ranges between [0, 100] (up to two decimal points).

PostScanActionsResult

Post scan actions result for data scan job.

Fields
bigquery_export_result

BigQueryExportResult

The result of BigQuery export post scan action.

BigQueryExportResult

The result of BigQuery export post scan action.

Fields
state

State

Execution state for the BigQuery exporting.

message

string

Additional information about the BigQuery exporting.

State

Execution state for the exporting.

Enums
STATE_UNSPECIFIED The exporting state is unspecified.
SUCCEEDED The exporting completed successfully.
FAILED The exporting is no longer running due to an error.
SKIPPED The exporting is skipped due to no valid scan result to export (usually caused by a failed scan).

ScanType

The type of the data scan.

Enums
SCAN_TYPE_UNSPECIFIED An unspecified data scan type.
DATA_PROFILE Data scan for data profile.
DATA_QUALITY Data scan for data quality.

Scope

The scope of job for the data scan.

Enums
SCOPE_UNSPECIFIED An unspecified scope type.
FULL Data scan runs on all of the data.
INCREMENTAL Data scan runs on incremental data.

State

The job state of the data scan.

Enums
STATE_UNSPECIFIED Unspecified job state.
STARTED Data scan job started.
SUCCEEDED Data scan job successfully completed.
FAILED Data scan job was unsuccessful.
CANCELLED Data scan job was cancelled.
CREATED Data scan job was created.

Trigger

The trigger type for the data scan.

Enums
TRIGGER_UNSPECIFIED An unspecified trigger type.
ON_DEMAND Data scan triggers on demand.
SCHEDULE Data scan triggers as per schedule.

DataScanJob

A DataScanJob represents an instance of DataScan execution.

Fields
name

string

Output only. The relative resource name of the DataScanJob, of the form: projects/{project}/locations/{location_id}/dataScans/{datascan_id}/jobs/{job_id}, where project refers to a project_id or project_number and location_id refers to a GCP region.

uid

string

Output only. System generated globally unique ID for the DataScanJob.

create_time

Timestamp

Output only. The time when the DataScanJob was created.

start_time

Timestamp

Output only. The time when the DataScanJob was started.

end_time

Timestamp

Output only. The time when the DataScanJob ended.

state

State

Output only. Execution state for the DataScanJob.

message

string

Output only. Additional information about the current state.

type

DataScanType

Output only. The type of the parent DataScan.

Union field spec. Data scan related setting. spec can be only one of the following:
data_quality_spec

DataQualitySpec

Output only. Settings for a data quality scan.

data_profile_spec

DataProfileSpec

Output only. Settings for a data profile scan.

data_discovery_spec

DataDiscoverySpec

Output only. Settings for a data discovery scan.

Union field result. The result of the data scan. result can be only one of the following:
data_quality_result

DataQualityResult

Output only. The result of a data quality scan.

data_profile_result

DataProfileResult

Output only. The result of a data profile scan.

data_discovery_result

DataDiscoveryResult

Output only. The result of a data discovery scan.

State

Execution state for the DataScanJob.

Enums
STATE_UNSPECIFIED The DataScanJob state is unspecified.
RUNNING The DataScanJob is running.
CANCELING The DataScanJob is canceling.
CANCELLED The DataScanJob cancellation was successful.
SUCCEEDED The DataScanJob completed successfully.
FAILED The DataScanJob is no longer running due to an error.
PENDING The DataScanJob has been created but not started to run yet.

DataScanType

The type of data scan.

Enums
DATA_SCAN_TYPE_UNSPECIFIED The data scan type is unspecified.
DATA_QUALITY Data quality scan.
DATA_PROFILE Data profile scan.
DATA_DISCOVERY Data discovery scan.

DataSource

The data source for DataScan.

Fields
Union field source. The source is required and immutable. Once set, it cannot be changed. source can be only one of the following:
entity

string

Immutable. The Dataplex entity that represents the data source (e.g. BigQuery table) for DataScan, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}.

resource

string

Immutable. The service-qualified full resource name of the cloud resource for a DataScan job to scan against. The field can be a BigQuery table of type "TABLE" for DataProfileScan/DataQualityScan. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID

DataTaxonomy

DataTaxonomy represents a set of hierarchical DataAttributes resources, grouped with a common theme. For example, a 'SensitiveDataTaxonomy' can have attributes to manage PII data. It is defined at the project level.

Fields
name

string

Output only. The relative resource name of the DataTaxonomy, of the form: projects/{project_number}/locations/{location_id}/dataTaxonomies/{data_taxonomy_id}.

uid

string

Output only. System generated globally unique ID for the dataTaxonomy. This ID will be different if the DataTaxonomy is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the DataTaxonomy was created.

update_time

Timestamp

Output only. The time when the DataTaxonomy was last updated.

description

string

Optional. Description of the DataTaxonomy.

display_name

string

Optional. User-friendly display name.

labels

map<string, string>

Optional. User-defined labels for the DataTaxonomy.

attribute_count

int32

Output only. The number of attributes in the DataTaxonomy.

etag

string

This checksum is computed by the server based on the value of other fields, and may be sent on update and delete requests to ensure the client has an up-to-date value before proceeding.

class_count

int32

Output only. The number of classes in the DataTaxonomy.

DeleteAspectTypeRequest

Delete AspectType request.

Fields
name

string

Required. The resource name of the AspectType: projects/{project_number}/locations/{location_id}/aspectTypes/{aspect_type_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.aspectTypes.delete
etag

string

Optional. If the client-provided etag value does not match the current etag value, the DeleteAspectType method returns an ABORTED error response.
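
A common use of this etag is optimistic concurrency control: read the AspectType, then present its current etag on delete so the request fails with ABORTED if the resource changed in the meantime. A minimal sketch with the generated Python client (the resource name is hypothetical):

    from google.cloud import dataplex_v1

    client = dataplex_v1.CatalogServiceClient()
    name = "projects/my-project/locations/us-central1/aspectTypes/my-aspect-type"

    # Read the current etag, then present it on delete.
    aspect_type = client.get_aspect_type(name=name)
    operation = client.delete_aspect_type(
        request=dataplex_v1.DeleteAspectTypeRequest(
            name=name,
            etag=aspect_type.etag,
        )
    )
    operation.result()  # DeleteAspectType is a long-running operation.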

DeleteAssetRequest

Delete asset request.

Fields
name

string

Required. The resource name of the asset: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/assets/{asset_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.assets.delete

DeleteContentRequest

Delete content request.

Fields
name

string

Required. The resource name of the content: projects/{project_id}/locations/{location_id}/lakes/{lake_id}/content/{content_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.content.delete

DeleteDataAttributeBindingRequest

Delete DataAttributeBinding request.

Fields
name

string

Required. The resource name of the DataAttributeBinding: projects/{project_number}/locations/{location_id}/dataAttributeBindings/{data_attribute_binding_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.dataAttributeBindings.delete
etag

string

Required. If the client-provided etag value does not match the current etag value, the DeleteDataAttributeBinding method returns an ABORTED error response. Etags must be used when calling DeleteDataAttributeBinding.

DeleteDataAttributeRequest

Delete DataAttribute request.

Fields
name

string

Required. The resource name of the DataAttribute: projects/{project_number}/locations/{location_id}/dataTaxonomies/{dataTaxonomy}/attributes/{data_attribute_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.dataAttributes.delete
etag

string

Optional. If the client-provided etag value does not match the current etag value, the DeleteDataAttribute method returns an ABORTED error response.

DeleteDataScanRequest

Delete dataScan request.

Fields
name

string

Required. The resource name of the dataScan: projects/{project}/locations/{location_id}/dataScans/{data_scan_id} where project refers to a project_id or project_number and location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.datascans.delete

DeleteDataTaxonomyRequest

Delete DataTaxonomy request.

Fields
name

string

Required. The resource name of the DataTaxonomy: projects/{project_number}/locations/{location_id}/dataTaxonomies/{data_taxonomy_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.dataTaxonomies.delete
etag

string

Optional. If the client-provided etag value does not match the current etag value, the DeleteDataTaxonomy method returns an ABORTED error.

DeleteEntityRequest

Delete a metadata entity request.

Fields
name

string

Required. The resource name of the entity: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.entities.delete
etag

string

Required. The etag associated with the entity, which can be retrieved with a GetEntity request.

DeleteEntryGroupRequest

Delete EntryGroup Request.

Fields
name

string

Required. The resource name of the EntryGroup: projects/{project_number}/locations/{location_id}/entryGroups/{entry_group_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.entryGroups.delete
etag

string

Optional. If the client-provided etag value does not match the current etag value, the DeleteEntryGroup method returns an ABORTED error response.

DeleteEntryRequest

Delete Entry request.

Fields
name

string

Required. The resource name of the Entry: projects/{project}/locations/{location}/entryGroups/{entry_group}/entries/{entry}.

DeleteEntryTypeRequest

Delete EntryType request.

Fields
name

string

Required. The resource name of the EntryType: projects/{project_number}/locations/{location_id}/entryTypes/{entry_type_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.entryTypes.delete
etag

string

Optional. If the client-provided etag value does not match the current etag value, the DeleteEntryType method returns an ABORTED error response.

DeleteEnvironmentRequest

Delete environment request.

Fields
name

string

Required. The resource name of the environment: projects/{project_id}/locations/{location_id}/lakes/{lake_id}/environments/{environment_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.environments.delete

DeleteLakeRequest

Delete lake request.

Fields
name

string

Required. The resource name of the lake: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.lakes.delete

DeletePartitionRequest

Delete metadata partition request.

Fields
name

string

Required. The resource name of the partition, in the format: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}/partitions/{partition_value_path}. The {partition_value_path} segment consists of an ordered sequence of partition values separated by "/". All values must be provided.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.partitions.delete
etag
(deprecated)

string

Optional. The etag associated with the partition.

DeleteTaskRequest

Delete task request.

Fields
name

string

Required. The resource name of the task: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.tasks.delete

DeleteZoneRequest

Delete zone request.

Fields
name

string

Required. The resource name of the zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.zones.delete

DiscoveryEvent

The payload associated with Discovery data processing.

Fields
message

string

The log message.

lake_id

string

The ID of the associated lake.

zone_id

string

The ID of the associated zone.

asset_id

string

The ID of the associated asset.

data_location

string

The data location associated with the event.

datascan_id

string

The ID of the associated datascan for standalone discovery.

type

EventType

The type of the event being logged.

Union field details. Additional details about the event. details can be only one of the following:
config

ConfigDetails

Details about discovery configuration in effect.

entity

EntityDetails

Details about the entity associated with the event.

partition

PartitionDetails

Details about the partition associated with the event.

action

ActionDetails

Details about the action associated with the event.

table

TableDetails

Details about the BigQuery table publishing associated with the event.

ActionDetails

Details about the action.

Fields
type

string

The type of action. For example, IncompatibleDataSchema or InvalidDataFormat.

issue

string

The human readable issue associated with the action.

ConfigDetails

Details about configuration events.

Fields
parameters

map<string, string>

The discovery configuration parameters in effect. The keys are the field paths within DiscoverySpec. For example, includePatterns, excludePatterns, and csvOptions.disableTypeInference.

EntityDetails

Details about the entity.

Fields
entity

string

The name of the entity resource. The name is the fully-qualified resource name.

type

EntityType

The type of the entity resource.

EntityType

The type of the entity.

Enums
ENTITY_TYPE_UNSPECIFIED An unspecified entity type.
TABLE Entities representing structured data.
FILESET Entities representing unstructured data.

EventType

The type of the event.

Enums
EVENT_TYPE_UNSPECIFIED An unspecified event type.
CONFIG An event representing discovery configuration in effect.
ENTITY_CREATED An event representing a metadata entity being created.
ENTITY_UPDATED An event representing a metadata entity being updated.
ENTITY_DELETED An event representing a metadata entity being deleted.
PARTITION_CREATED An event representing a partition being created.
PARTITION_UPDATED An event representing a partition being updated.
PARTITION_DELETED An event representing a partition being deleted.

PartitionDetails

Details about the partition.

Fields
partition

string

The name of the partition resource. The name is the fully-qualified resource name.

entity

string

The name of the containing entity resource. The name is the fully-qualified resource name.

type

EntityType

The type of the containing entity resource.

sampled_data_locations[]

string

The locations of the data items (for example, Cloud Storage objects) sampled for metadata inference.

TableDetails

Details about the published table.

Fields
table

string

The fully-qualified resource name of the table resource.

type

TableType

The type of the table resource.

TableType

The type of the published table.

Enums
TABLE_TYPE_UNSPECIFIED An unspecified table type.
EXTERNAL_TABLE External table type.
BIGLAKE_TABLE BigLake table type.
OBJECT_TABLE Object table type for unstructured data.

Entity

Represents tables and fileset metadata contained within a zone.

Fields
name

string

Output only. The resource name of the entity, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{id}.

display_name

string

Optional. Display name must be shorter than or equal to 256 characters.

description

string

Optional. User-friendly, longer description text. Must be shorter than or equal to 1024 characters.

create_time

Timestamp

Output only. The time when the entity was created.

update_time

Timestamp

Output only. The time when the entity was last updated.

id

string

Required. A user-provided entity ID. It is mutable, and will be used as the published table name. Specifying a new ID in an update entity request will override the existing value. The ID must contain only letters (a-z, A-Z), numbers (0-9), and underscores, and consist of 256 or fewer characters.

etag

string

Optional. The etag associated with the entity, which can be retrieved with a GetEntity request. Required for update and delete requests.

type

Type

Required. Immutable. The type of entity.

asset

string

Required. Immutable. The ID of the asset associated with the storage location containing the entity data. The entity must be within the same zone as the asset.

data_path

string

Required. Immutable. The storage path of the entity data. For Cloud Storage data, this is the fully-qualified path to the entity, such as gs://bucket/path/to/data. For BigQuery data, this is the name of the table resource, such as projects/project_id/datasets/dataset_id/tables/table_id.

data_path_pattern

string

Optional. The set of items within the data path constituting the data in the entity, represented as a glob path. Example: gs://bucket/path/to/data/**/*.csv.

catalog_entry

string

Output only. The name of the associated Data Catalog entry.

system

StorageSystem

Required. Immutable. Identifies the storage system of the entity data.

format

StorageFormat

Required. Identifies the storage format of the entity data. It does not apply to entities with data stored in BigQuery.

compatibility

CompatibilityStatus

Output only. Metadata stores that the entity is compatible with.

access

StorageAccess

Output only. Identifies the access mechanism to the entity. Not user settable.

uid

string

Output only. System generated unique ID for the Entity. This ID will be different if the Entity is deleted and re-created with the same name.

schema

Schema

Required. The description of the data structure and layout. The schema is not included in list responses. It is only included in SCHEMA and FULL entity views of a GetEntity response.

CompatibilityStatus

Provides compatibility information for various metadata stores.

Fields
hive_metastore

Compatibility

Output only. Whether this entity is compatible with Hive Metastore.

bigquery

Compatibility

Output only. Whether this entity is compatible with BigQuery.

Compatibility

Provides compatibility information for a specific metadata store.

Fields
compatible

bool

Output only. Whether the entity is compatible and can be represented in the metadata store.

reason

string

Output only. Provides additional detail if the entity is incompatible with the metadata store.

Type

The type of entity.

Enums
TYPE_UNSPECIFIED Type unspecified.
TABLE Structured and semi-structured data.
FILESET Unstructured data.

Entry

An entry is a representation of a data resource that can be described by various metadata.

Fields
name

string

Identifier. The relative resource name of the entry, in the format projects/{project_id_or_number}/locations/{location_id}/entryGroups/{entry_group_id}/entries/{entry_id}.

entry_type

string

Required. Immutable. The relative resource name of the entry type that was used to create this entry, in the format projects/{project_id_or_number}/locations/{location_id}/entryTypes/{entry_type_id}.

create_time

Timestamp

Output only. The time when the entry was created in Dataplex.

update_time

Timestamp

Output only. The time when the entry was last updated in Dataplex.

aspects

map<string, Aspect>

Optional. The aspects that are attached to the entry. Depending on how the aspect is attached to the entry, the format of the aspect key can be one of the following:

  • If the aspect is attached directly to the entry: {project_id_or_number}.{location_id}.{aspect_type_id}
  • If the aspect is attached to an entry's path: {project_id_or_number}.{location_id}.{aspect_type_id}@{path}
parent_entry

string

Optional. Immutable. The resource name of the parent entry.

fully_qualified_name

string

Optional. A name for the entry that can be referenced by an external system. For more information, see Fully qualified names. The maximum size of the field is 4000 characters.

entry_source

EntrySource

Optional. Information related to the source system of the data resource that is represented by the entry.
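
To make the aspect key formats concrete, the following sketch creates an Entry with one directly attached aspect using the generated Python client (project, location, entry group, entry type, and aspect type IDs are hypothetical, and the aspect data must conform to the aspect type's template):

    from google.cloud import dataplex_v1

    client = dataplex_v1.CatalogServiceClient()

    # Key for an aspect attached directly to the entry:
    # {project_id_or_number}.{location_id}.{aspect_type_id}
    # A path-scoped key would append "@{path}".
    aspect_key = "my-project.us-central1.my-aspect-type"

    entry = dataplex_v1.Entry(
        entry_type=(
            "projects/my-project/locations/us-central1/"
            "entryTypes/my-entry-type"
        ),
        aspects={aspect_key: dataplex_v1.Aspect(data={"owner": "data-team"})},
    )

    created = client.create_entry(
        parent="projects/my-project/locations/us-central1/entryGroups/my-group",
        entry=entry,
        entry_id="my-entry",
    )
    print(created.name)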

EntryGroup

An Entry Group represents a logical grouping of one or more Entries.

Fields
name

string

Output only. The relative resource name of the EntryGroup, in the format projects/{project_id_or_number}/locations/{location_id}/entryGroups/{entry_group_id}.

uid

string

Output only. System generated globally unique ID for the EntryGroup. If you delete and recreate the EntryGroup with the same name, this ID will be different.

create_time

Timestamp

Output only. The time when the EntryGroup was created.

update_time

Timestamp

Output only. The time when the EntryGroup was last updated.

description

string

Optional. Description of the EntryGroup.

display_name

string

Optional. User-friendly display name.

labels

map<string, string>

Optional. User-defined labels for the EntryGroup.

etag

string

This checksum is computed by the service, and might be sent on update and delete requests to ensure the client has an up-to-date value before proceeding.

transfer_status

TransferStatus

Output only. Denotes the transfer status of the Entry Group. It is unspecified for Entry Groups created through the Dataplex API.

EntrySource

Information related to the source system of the data resource that is represented by the entry.

Fields
resource

string

The name of the resource in the source system. Maximum length is 4,000 characters.

system

string

The name of the source system. Maximum length is 64 characters.

platform

string

The platform containing the source system. Maximum length is 64 characters.

display_name

string

A user-friendly display name. Maximum length is 500 characters.

description

string

A description of the data resource. Maximum length is 2,000 characters.

labels

map<string, string>

User-defined labels. The maximum size of keys and values is 128 characters each.

ancestors[]

Ancestor

Immutable. The entries representing the ancestors of the data resource in the source system.

create_time

Timestamp

The time when the resource was created in the source system.

update_time

Timestamp

The time when the resource was last updated in the source system. If the entry exists in the system and its EntrySource has update_time populated, further updates to the EntrySource of the entry must provide incremental updates to its update_time.

location

string

Output only. Location of the resource in the source system. You can search the entry by this location. By default, this should match the location of the entry group containing this entry. A different value allows capturing the source location for data external to Google Cloud.

Ancestor

Information about individual items in the hierarchy that is associated with the data resource.

Fields
name

string

Optional. The name of the ancestor resource.

type

string

Optional. The type of the ancestor resource.

EntryType

Entry Type is a template for creating Entries.

Fields
name

string

Output only. The relative resource name of the EntryType, of the form: projects/{project_number}/locations/{location_id}/entryTypes/{entry_type_id}.

uid

string

Output only. System generated globally unique ID for the EntryType. This ID will be different if the EntryType is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the EntryType was created.

update_time

Timestamp

Output only. The time when the EntryType was last updated.

description

string

Optional. Description of the EntryType.

display_name

string

Optional. User-friendly display name.

labels

map<string, string>

Optional. User-defined labels for the EntryType.

etag

string

Optional. This checksum is computed by the service, and might be sent on update and delete requests to ensure the client has an up-to-date value before proceeding.

type_aliases[]

string

Optional. Indicates the classes this Entry Type belongs to, for example, TABLE, DATABASE, MODEL.

platform

string

Optional. The platform that Entries of this type belong to.

system

string

Optional. The system that Entries of this type belong to. Examples include CloudSQL and MariaDB.

required_aspects[]

AspectInfo

AspectInfo for the entry type.

authorization

Authorization

Immutable. Authorization defined for this type.

AspectInfo

Fields
type

string

Required aspect type for the entry type.

Authorization

Authorization for an Entry Type.

Fields
alternate_use_permission

string

Immutable. The IAM permission grantable on the Entry Group to allow access to instantiate Entries of Dataplex-owned Entry Types. Only settable for Dataplex-owned Types.

EntryView

View for controlling which parts of an entry are to be returned.

Enums
ENTRY_VIEW_UNSPECIFIED Unspecified EntryView. Defaults to FULL.
BASIC Returns entry only, without aspects.
FULL Returns all required aspects as well as the keys of all non-required aspects.
CUSTOM Returns aspects matching custom fields in GetEntryRequest. If the number of aspects exceeds 100, the first 100 will be returned.
ALL Returns all aspects. If the number of aspects exceeds 100, the first 100 will be returned.

Environment

Environment represents a user-visible compute infrastructure for analytics within a lake.

Fields
name

string

Output only. The relative resource name of the environment, of the form: projects/{project_id}/locations/{location_id}/lakes/{lake_id}/environments/{environment_id}.

display_name

string

Optional. User-friendly display name.

uid

string

Output only. System generated globally unique ID for the environment. This ID will be different if the environment is deleted and re-created with the same name.

create_time

Timestamp

Output only. Environment creation time.

update_time

Timestamp

Output only. The time when the environment was last updated.

labels

map<string, string>

Optional. User defined labels for the environment.

description

string

Optional. Description of the environment.

state

State

Output only. Current state of the environment.

infrastructure_spec

InfrastructureSpec

Required. Infrastructure specification for the Environment.

session_spec

SessionSpec

Optional. Configuration for sessions created for this environment.

session_status

SessionStatus

Output only. Status of sessions created for this environment.

endpoints

Endpoints

Output only. URI Endpoints to access sessions associated with the Environment.

Endpoints

URI Endpoints to access sessions associated with the Environment.

Fields
notebooks

string

Output only. URI to serve notebook APIs.

sql

string

Output only. URI to serve SQL APIs.

InfrastructureSpec

Configuration for the underlying infrastructure used to run workloads.

Fields
Union field resources. Hardware config. resources can be only one of the following:
compute

ComputeResources

Optional. Compute resources needed for Analyze interactive workloads.

Union field runtime. Software config. runtime can be only one of the following:
os_image

OsImageRuntime

Required. Software runtime configuration for Analyze interactive workloads.

ComputeResources

Compute resources associated with the analyze interactive workloads.

Fields
disk_size_gb

int32

Optional. Size in GB of the disk. Default is 100 GB.

node_count

int32

Optional. Total number of nodes in the sessions created for this environment.

max_node_count

int32

Optional. Max configurable nodes. If max_node_count > node_count, then auto-scaling is enabled.

OsImageRuntime

Software Runtime Configuration to run Analyze.

Fields
image_version

string

Required. Dataplex Image version.

java_libraries[]

string

Optional. List of Java jars to be included in the runtime environment. Valid input includes Cloud Storage URIs to Jar binaries. For example, gs://bucket-name/my/path/to/file.jar

python_packages[]

string

Optional. A list of Python packages to be installed. Valid formats include Cloud Storage URI to a PIP installable library. For example, gs://bucket-name/my/path/to/lib.tar.gz

properties

map<string, string>

Optional. Spark properties to provide configuration for use in sessions created for this environment. The properties to set on daemon config files. Property keys are specified in prefix:property format. The prefix must be "spark".

SessionSpec

Configuration for sessions created for this environment.

Fields
max_idle_duration

Duration

Optional. The idle time configuration of the session. The session will be auto-terminated at the end of this period.

enable_fast_startup

bool

Optional. If True, sessions are pre-created and available for faster startup, enabling interactive exploration use cases. Defaults to False to avoid additional billed charges. This can only be set to True for the environment whose name is "default", and with default configuration.

SessionStatus

Status of sessions created for this environment.

Fields
active

bool

Output only. Indicates whether sessions created for this environment are currently active.
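
Putting the preceding messages together, a sketch of an Environment that combines compute, runtime, and session settings might look like the following (the lake name, image version, and IDs are hypothetical):

    import datetime

    from google.cloud import dataplex_v1

    env = dataplex_v1.Environment(
        infrastructure_spec=dataplex_v1.Environment.InfrastructureSpec(
            compute=dataplex_v1.Environment.InfrastructureSpec.ComputeResources(
                disk_size_gb=100,
                node_count=1,
                max_node_count=3,  # > node_count, so auto-scaling is enabled
            ),
            os_image=dataplex_v1.Environment.InfrastructureSpec.OsImageRuntime(
                image_version="1.0",  # hypothetical Dataplex image version
            ),
        ),
        session_spec=dataplex_v1.Environment.SessionSpec(
            max_idle_duration=datetime.timedelta(hours=1),
        ),
    )

    operation = dataplex_v1.DataplexServiceClient().create_environment(
        parent="projects/my-project/locations/us-central1/lakes/my-lake",
        environment=env,
        environment_id="default",
    )
    environment = operation.result()  # CreateEnvironment is long-running.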

GenerateDataQualityRulesRequest

Request details for generating data quality rule recommendations.

Fields
name

string

Required. The name must be one of the following:

  • The name of a data scan with at least one successful, completed data profiling job
  • The name of a successful, completed data profiling job (a data scan job where the job type is data profiling)

GenerateDataQualityRulesResponse

Response details for data quality rule recommendations.

Fields
rule[]

DataQualityRule

The data quality rules that Dataplex generates based on the results of a data profiling scan.
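
A minimal sketch of the round trip, assuming a google-cloud-dataplex version that includes GenerateDataQualityRules (the scan name is hypothetical and must point to a data scan or job that satisfies the conditions above):

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataScanServiceClient()

    response = client.generate_data_quality_rules(
        name="projects/my-project/locations/us-central1/dataScans/my-profile-scan"
    )
    for rule in response.rule:
        print(rule)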

GetAspectTypeRequest

Get AspectType request.

Fields
name

string

Required. The resource name of the AspectType: projects/{project_number}/locations/{location_id}/aspectTypes/{aspect_type_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.aspectTypes.get

GetAssetRequest

Get asset request.

Fields
name

string

Required. The resource name of the asset: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/assets/{asset_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.assets.get

GetContentRequest

Get content request.

Fields
name

string

Required. The resource name of the content: projects/{project_id}/locations/{location_id}/lakes/{lake_id}/content/{content_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.content.get
view

ContentView

Optional. Specify content view to make a partial request.

ContentView

Specifies whether the request should return the full or the partial representation.

Enums
CONTENT_VIEW_UNSPECIFIED Content view not specified. The API defaults to the BASIC view.
BASIC Will not return the data_text field.
FULL Returns the complete proto.

GetDataAttributeBindingRequest

Get DataAttributeBinding request.

Fields
name

string

Required. The resource name of the DataAttributeBinding: projects/{project_number}/locations/{location_id}/dataAttributeBindings/{data_attribute_binding_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.dataAttributeBindings.get

GetDataAttributeRequest

Get DataAttribute request.

Fields
name

string

Required. The resource name of the dataAttribute: projects/{project_number}/locations/{location_id}/dataTaxonomies/{dataTaxonomy}/attributes/{data_attribute_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.dataAttributes.get

GetDataScanJobRequest

Get DataScanJob request.

Fields
name

string

Required. The resource name of the DataScanJob: projects/{project}/locations/{location_id}/dataScans/{data_scan_id}/jobs/{data_scan_job_id} where project refers to a project_id or project_number and location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource name:

  • iam.permissions.none
view

DataScanJobView

Optional. Select the DataScanJob view to return. Defaults to BASIC.

DataScanJobView

DataScanJob view options.

Enums
DATA_SCAN_JOB_VIEW_UNSPECIFIED The API will default to the BASIC view.
BASIC Basic view that does not include spec and result.
FULL Include everything.

GetDataScanRequest

Get dataScan request.

Fields
name

string

Required. The resource name of the dataScan: projects/{project}/locations/{location_id}/dataScans/{data_scan_id} where project refers to a project_id or project_number and location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource name:

  • iam.permissions.none
view

DataScanView

Optional. Select the DataScan view to return. Defaults to BASIC.

DataScanView

DataScan view options.

Enums
DATA_SCAN_VIEW_UNSPECIFIED The API will default to the BASIC view.
BASIC Basic view that does not include spec and result.
FULL Include everything.

GetDataTaxonomyRequest

Get DataTaxonomy request.

Fields
name

string

Required. The resource name of the DataTaxonomy: projects/{project_number}/locations/{location_id}/dataTaxonomies/{data_taxonomy_id}

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.dataTaxonomies.get

GetEntityRequest

Get metadata entity request.

Fields
name

string

Required. The resource name of the entity: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.entities.get
view

EntityView

Optional. Used to select the subset of entity information to return. Defaults to BASIC.

EntityView

Entity views for a GetEntity partial result.

Enums
ENTITY_VIEW_UNSPECIFIED The API will default to the BASIC view.
BASIC Minimal view that does not include the schema.
SCHEMA Include basic information and schema.
FULL Include everything. Currently, this is the same as the SCHEMA view.

GetEntryGroupRequest

Get EntryGroup request.

Fields
name

string

Required. The resource name of the EntryGroup: projects/{project_number}/locations/{location_id}/entryGroups/{entry_group_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.entryGroups.get

GetEntryRequest

Get Entry request.

Fields
name

string

Required. The resource name of the Entry: projects/{project}/locations/{location}/entryGroups/{entry_group}/entries/{entry}.

view

EntryView

Optional. View to control which parts of an entry the service should return.

aspect_types[]

string

Optional. Limits the aspects returned to the provided aspect types. This takes effect only when the CUSTOM view is selected.

paths[]

string

Optional. Limits the aspects returned to those associated with the provided paths within the Entry. This takes effect only when the CUSTOM view is selected.
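
For example, a CUSTOM-view read that restricts the returned aspects to one aspect type on one path might look like this (a sketch with the generated Python client; all names are hypothetical):

    from google.cloud import dataplex_v1

    client = dataplex_v1.CatalogServiceClient()

    entry = client.get_entry(
        request=dataplex_v1.GetEntryRequest(
            name=(
                "projects/my-project/locations/us-central1/"
                "entryGroups/my-group/entries/my-entry"
            ),
            view=dataplex_v1.EntryView.CUSTOM,
            # Both filters below are honored only because the view is CUSTOM.
            aspect_types=[
                "projects/my-project/locations/us-central1/"
                "aspectTypes/my-aspect-type"
            ],
            paths=["my_column"],
        )
    )
    for key in entry.aspects:
        print(key)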

GetEntryTypeRequest

Get EntryType request.

Fields
name

string

Required. The resource name of the EntryType: projects/{project_number}/locations/{location_id}/entryTypes/{entry_type_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.entryTypes.get

GetEnvironmentRequest

Get environment request.

Fields
name

string

Required. The resource name of the environment: projects/{project_id}/locations/{location_id}/lakes/{lake_id}/environments/{environment_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.environments.get

GetJobRequest

Get job request.

Fields
name

string

Required. The resource name of the job: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}/jobs/{job_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.tasks.get

GetLakeRequest

Get lake request.

Fields
name

string

Required. The resource name of the lake: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.lakes.get

GetMetadataJobRequest

Get metadata job request.

Fields
name

string

Required. The resource name of the metadata job, in the format projects/{project_id_or_number}/locations/{location_id}/metadataJobs/{metadata_job_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.metadataJobs.get

GetPartitionRequest

Get metadata partition request.

Fields
name

string

Required. The resource name of the partition: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}/partitions/{partition_value_path}. The {partition_value_path} segment consists of an ordered sequence of partition values separated by "/". All values must be provided.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.partitions.get

GetTaskRequest

Get task request.

Fields
name

string

Required. The resource name of the task: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.tasks.get

GetZoneRequest

Get zone request.

Fields
name

string

Required. The resource name of the zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.zones.get

GovernanceEvent

Payload associated with Governance related log events.

Fields
message

string

The log message.

event_type

EventType

The type of the event.

entity

Entity

Entity resource information if the log event is associated with a specific entity.

Entity

Information about the Entity resource that the log event is associated with.

Fields
entity

string

The Entity resource the log event is associated with. Format: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}

entity_type

EntityType

Type of entity.

EntityType

Type of entity.

Enums
ENTITY_TYPE_UNSPECIFIED An unspecified Entity type.
TABLE Table entity type.
FILESET Fileset entity type.

EventType

Type of governance log event.

Enums
EVENT_TYPE_UNSPECIFIED An unspecified event type.
RESOURCE_IAM_POLICY_UPDATE Resource IAM policy update event.
BIGQUERY_TABLE_CREATE BigQuery table create event.
BIGQUERY_TABLE_UPDATE BigQuery table update event.
BIGQUERY_TABLE_DELETE BigQuery table delete event.
BIGQUERY_CONNECTION_CREATE BigQuery connection create event.
BIGQUERY_CONNECTION_UPDATE BigQuery connection update event.
BIGQUERY_CONNECTION_DELETE BigQuery connection delete event.
BIGQUERY_TAXONOMY_CREATE BigQuery taxonomy created.
BIGQUERY_POLICY_TAG_CREATE BigQuery policy tag created.
BIGQUERY_POLICY_TAG_DELETE BigQuery policy tag deleted.
BIGQUERY_POLICY_TAG_SET_IAM_POLICY BigQuery set IAM policy for policy tag.
ACCESS_POLICY_UPDATE Access policy update event.
GOVERNANCE_RULE_MATCHED_RESOURCES Number of resources matched by a particular query.
GOVERNANCE_RULE_SEARCH_LIMIT_EXCEEDS Rule processing exceeds the allowed limit.
GOVERNANCE_RULE_ERRORS Rule processing errors.
GOVERNANCE_RULE_PROCESSING Governance rule processing event.

ImportItem

An object that describes the values that you want to set for an entry and its attached aspects when you import metadata. Used when you run a metadata import job. See CreateMetadataJob.

You provide a collection of import items in a metadata import file. For more information about how to create a metadata import file, see Metadata import file.

Fields
entry

Entry

Information about an entry and its attached aspects.

update_mask

FieldMask

The fields to update, in paths that are relative to the Entry resource. Separate each field with a comma.

In FULL entry sync mode, Dataplex includes the paths of all of the fields for an entry that can be modified, including aspects. This means that Dataplex replaces the existing entry with the entry in the metadata import file. All modifiable fields are updated, regardless of the fields that are listed in the update mask, and regardless of whether a field is present in the entry object.

The update_mask field is ignored when an entry is created or re-created.

Dataplex also determines which entries and aspects to modify by comparing the values and timestamps that you provide in the metadata import file with the values and timestamps that exist in your project. For more information, see Comparison logic.

aspect_keys[]

string

The aspects to modify. Supports the following syntaxes:

  • {aspect_type_reference}: matches aspects that belong to the specified aspect type and are attached directly to the entry.
  • {aspect_type_reference}@{path}: matches aspects that belong to the specified aspect type and path.
  • {aspect_type_reference}@*: matches aspects that belong to the specified aspect type for all paths.

Replace {aspect_type_reference} with a reference to the aspect type, in the format {project_id_or_number}.{location_id}.{aspect_type_id}.

If you leave this field empty, it is treated as specifying exactly those aspects that are present within the specified entry.

In FULL entry sync mode, Dataplex implicitly adds the keys for all of the required aspects of an entry.
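
To make these syntaxes concrete, here is a sketch (all names hypothetical) of a single import item serialized the way it might appear as one line of a metadata import file:

    import json

    item = {
        "entry": {
            "name": (
                "projects/my-project/locations/us-central1/"
                "entryGroups/my-group/entries/my-entry"
            ),
            "entryType": (
                "projects/my-project/locations/us-central1/"
                "entryTypes/my-entry-type"
            ),
            "aspects": {
                # An aspect attached directly to the entry.
                "my-project.us-central1.my-aspect-type": {
                    "data": {"owner": "data-team"}
                }
            },
        },
        "updateMask": "aspects",
        "aspectKeys": [
            # The directly attached aspect...
            "my-project.us-central1.my-aspect-type",
            # ...and the same aspect type on all paths.
            "my-project.us-central1.my-aspect-type@*",
        ],
    }

    print(json.dumps(item))  # one JSON object per line in the import file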

Job

A job represents an instance of a task.

Fields
name

string

Output only. The relative resource name of the job, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}/jobs/{job_id}.

uid

string

Output only. System generated globally unique ID for the job.

start_time

Timestamp

Output only. The time when the job was started.

end_time

Timestamp

Output only. The time when the job ended.

state

State

Output only. Execution state for the job.

retry_count

uint32

Output only. The number of times the job has been retried (excluding the initial attempt).

service

Service

Output only. The underlying service running a job.

service_job

string

Output only. The full resource name for the job run under a particular service.

message

string

Output only. Additional information about the current state.

labels

map<string, string>

Output only. User-defined labels for the task.

trigger

Trigger

Output only. Job execution trigger.

execution_spec

ExecutionSpec

Output only. Spec related to how a task is executed.

Service

Enums
SERVICE_UNSPECIFIED Service used to run the job is unspecified.
DATAPROC Dataproc service is used to run this job.

State

Enums
STATE_UNSPECIFIED The job state is unknown.
RUNNING The job is running.
CANCELLING The job is cancelling.
CANCELLED The job cancellation was successful.
SUCCEEDED The job completed successfully.
FAILED The job is no longer running due to an error.
ABORTED The job was cancelled outside of Dataplex.

Trigger

Job execution trigger.

Enums
TRIGGER_UNSPECIFIED The trigger is unspecified.
TASK_CONFIG The job was triggered by Dataplex based on trigger spec from task definition.
RUN_REQUEST The job was triggered by the explicit call of Task API.

JobEvent

The payload associated with Job logs that contains events describing jobs that have run within a Lake.

Fields
message

string

The log message.

job_id

string

The unique ID identifying the job.

start_time

Timestamp

The time when the job started running.

end_time

Timestamp

The time when the job ended running.

state

State

The job state on completion.

retries

int32

The number of retries.

type

Type

The type of the job.

service

Service

The service used to execute the job.

service_job

string

The reference to the job within the service.

execution_trigger

ExecutionTrigger

Job execution trigger.

ExecutionTrigger

Job Execution trigger.

Enums
EXECUTION_TRIGGER_UNSPECIFIED The job execution trigger is unspecified.
TASK_CONFIG The job was triggered by Dataplex based on trigger spec from task definition.
RUN_REQUEST The job was triggered by the explicit call of Task API.

Service

The service used to execute the job.

Enums
SERVICE_UNSPECIFIED Unspecified service.
DATAPROC Cloud Dataproc.

State

The completion status of the job.

Enums
STATE_UNSPECIFIED Unspecified job state.
SUCCEEDED Job successfully completed.
FAILED Job was unsuccessful.
CANCELLED Job was cancelled by the user.
ABORTED Job was cancelled or aborted via the service executing the job.

Type

The type of the job.

Enums
TYPE_UNSPECIFIED Unspecified job type.
SPARK Spark jobs.
NOTEBOOK Notebook jobs.

Lake

A lake is a centralized repository for managing enterprise data across the organization, distributed across many cloud projects and stored in a variety of storage services such as Google Cloud Storage and BigQuery. The resources attached to a lake are referred to as managed resources. Data within these managed resources can be structured or unstructured. A lake provides data admins with tools to organize, secure, and manage their data at scale, and provides data scientists and data engineers an integrated experience to easily search, discover, analyze, and transform data and associated metadata.

Fields
name

string

Output only. The relative resource name of the lake, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

display_name

string

Optional. User-friendly display name.

uid

string

Output only. System generated globally unique ID for the lake. This ID will be different if the lake is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the lake was created.

update_time

Timestamp

Output only. The time when the lake was last updated.

labels

map<string, string>

Optional. User-defined labels for the lake.

description

string

Optional. Description of the lake.

state

State

Output only. Current state of the lake.

service_account

string

Output only. Service account associated with this lake. This service account must be authorized to access or operate on resources managed by the lake.

metastore

Metastore

Optional. Settings to manage lake and Dataproc Metastore service instance association.

asset_status

AssetStatus

Output only. Aggregated status of the underlying assets of the lake.

metastore_status

MetastoreStatus

Output only. Metastore status of the lake.

Metastore

Settings to manage association of Dataproc Metastore with a lake.

Fields
service

string

Optional. A relative reference to the Dataproc Metastore (https://cloud.google.com/dataproc-metastore/docs) service associated with the lake: projects/{project_id}/locations/{location_id}/services/{service_id}

MetastoreStatus

Status of Lake and Dataproc Metastore service instance association.

Fields
state

State

Current state of association.

message

string

Additional information about the current status.

update_time

Timestamp

Last update time of the metastore status of the lake.

endpoint

string

The URI of the endpoint used to access the Metastore service.

State

Current state of association.

Enums
STATE_UNSPECIFIED Unspecified.
NONE A Metastore service instance is not associated with the lake.
READY A Metastore service instance is attached to the lake.
UPDATING Attach/detach is in progress.
ERROR Attach/detach could not be done due to errors.

ListActionsResponse

List actions response.

Fields
actions[]

Action

Actions under the given parent lake/zone/asset.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

ListAspectTypesRequest

List AspectTypes request.

Fields
parent

string

Required. The resource name of the AspectType location, of the form: projects/{project_number}/locations/{location_id} where location_id refers to a Google Cloud region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.aspectTypes.list
page_size

int32

Optional. Maximum number of AspectTypes to return. The service may return fewer than this value. If unspecified, the service returns at most 10 AspectTypes. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListAspectTypes call. Provide this to retrieve the subsequent page. When paginating, all other parameters you provide to ListAspectTypes must match the call that provided the page token.

filter

string

Optional. Filter request. Filters are case-sensitive. The service supports the following formats:

  • labels.key1 = "value1"
  • labels:key1
  • name = "value"

These restrictions can be conjoined with AND, OR, and NOT conjunctions.

order_by

string

Optional. Orders the result by name or create_time fields. If not specified, the ordering is undefined.

ListAspectTypesResponse

List AspectTypes response.

Fields
aspect_types[]

AspectType

AspectTypes under the given parent location.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable_locations[]

string

Locations that the service couldn't reach.
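
The generated client wraps this request/response pair in a pager that follows next_page_token automatically. A sketch (parent and filter values are hypothetical):

    from google.cloud import dataplex_v1

    client = dataplex_v1.CatalogServiceClient()

    pager = client.list_aspect_types(
        request=dataplex_v1.ListAspectTypesRequest(
            parent="projects/my-project/locations/us-central1",
            page_size=100,
            filter='labels.env = "prod"',
            order_by="name",
        )
    )
    # Iterating the pager yields AspectType messages across all pages.
    for aspect_type in pager:
        print(aspect_type.name)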

ListAssetActionsRequest

List asset actions request.

Fields
parent

string

Required. The resource name of the parent asset: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/assets/{asset_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.assetActions.list
page_size

int32

Optional. Maximum number of actions to return. The service may return fewer than this value. If unspecified, at most 10 actions will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListAssetActions call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListAssetActions must match the call that provided the page token.

ListAssetsRequest

List assets request.

Fields
parent

string

Required. The resource name of the parent zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.assets.list
page_size

int32

Optional. Maximum number of assets to return. The service may return fewer than this value. If unspecified, at most 10 assets will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListAssets call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListAssets must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.

ListAssetsResponse

List assets response.

Fields
assets[]

Asset

Asset under the given parent zone.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

ListContentRequest

List content request. Returns the BASIC Content view.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_id}/locations/{location_id}/lakes/{lake_id}

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.content.list
page_size

int32

Optional. Maximum number of content items to return. The service may return fewer than this value. If unspecified, at most 10 content items will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListContent call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListContent must match the call that provided the page token.

filter

string

Optional. Filter request. Filters are case-sensitive. The following formats are supported:

labels.key1 = "value1" labels:key1 type = "NOTEBOOK" type = "SQL_SCRIPT"

These restrictions can be conjoined with AND, OR, and NOT conjunctions.

ListContentResponse

List content response.

Fields
content[]

Content

Content under the given parent lake.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

ListDataAttributeBindingsRequest

List DataAttributeBindings request.

Fields
parent

string

Required. The resource name of the Location: projects/{project_number}/locations/{location_id}

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.dataAttributeBindings.list
page_size

int32

Optional. Maximum number of DataAttributeBindings to return. The service may return fewer than this value. If unspecified, at most 10 DataAttributeBindings will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListDataAttributeBindings call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDataAttributeBindings must match the call that provided the page token.

filter

string

Optional. Filter request. The following filter formats are supported:

  • Filter by resource: filter=resource:"resource-name"
  • Filter by attribute: filter=attributes:"attribute-name"
  • Filter by attribute in paths list: filter=paths.attributes:"attribute-name"

order_by

string

Optional. Order by fields for the result.

ListDataAttributeBindingsResponse

List DataAttributeBindings response.

Fields
data_attribute_bindings[]

DataAttributeBinding

DataAttributeBindings under the given parent Location.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable_locations[]

string

Locations that could not be reached.

ListDataAttributesRequest

List DataAttributes request.

Fields
parent

string

Required. The resource name of the DataTaxonomy: projects/{project_number}/locations/{location_id}/dataTaxonomies/{data_taxonomy_id}

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.dataAttributes.list
page_size

int32

Optional. Maximum number of DataAttributes to return. The service may return fewer than this value. If unspecified, at most 10 dataAttributes will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListDataAttributes call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDataAttributes must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.

ListDataAttributesResponse

List DataAttributes response.

Fields
data_attributes[]

DataAttribute

DataAttributes under the given parent DataTaxonomy.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable_locations[]

string

Locations that could not be reached.

ListDataScanJobsRequest

List DataScanJobs request.

Fields
parent

string

Required. The resource name of the parent data scan: projects/{project}/locations/{location_id}/dataScans/{data_scan_id} where project refers to a project_id or project_number and location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.datascans.get
page_size

int32

Optional. Maximum number of DataScanJobs to return. The service may return fewer than this value. If unspecified, at most 10 DataScanJobs will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListDataScanJobs call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDataScanJobs must match the call that provided the page token.

filter

string

Optional. An expression for filtering the results of the ListDataScanJobs request.

If unspecified, all datascan jobs will be returned. Multiple filters can be applied (with AND, OR logical operators). Filters are case-sensitive.

Allowed fields are:

  • start_time
  • end_time

start_time and end_time expect RFC-3339 formatted strings (e.g. 2018-10-08T18:30:00-07:00).

For instance, 'start_time > 2018-10-08T00:00:00.123456789Z AND end_time < 2018-10-09T00:00:00.123456789Z' limits results to DataScanJobs between specified start and end times.
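
As a sketch, the same time-range filter applied through the generated Python client (the parent data scan is hypothetical):

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataScanServiceClient()

    jobs = client.list_data_scan_jobs(
        request=dataplex_v1.ListDataScanJobsRequest(
            parent="projects/my-project/locations/us-central1/dataScans/my-scan",
            filter=(
                "start_time > 2018-10-08T00:00:00.123456789Z "
                "AND end_time < 2018-10-09T00:00:00.123456789Z"
            ),
        )
    )
    for job in jobs:  # jobs are returned in the BASIC view
        print(job.name, job.state)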

ListDataScanJobsResponse

List DataScanJobs response.

Fields
data_scan_jobs[]

DataScanJob

DataScanJobs (BASIC view only) under a given dataScan.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

ListDataScansRequest

List dataScans request.

Fields
parent

string

Required. The resource name of the parent location: projects/{project}/locations/{location_id} where project refers to a project_id or project_number and location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.datascans.list
page_size

int32

Optional. Maximum number of dataScans to return. The service may return fewer than this value. If unspecified, at most 500 scans will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListDataScans call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDataScans must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields (name or create_time) for the result. If not specified, the ordering is undefined.

ListDataScansResponse

List dataScans response.

Fields
data_scans[]

DataScan

DataScans (BASIC view only) under the given parent location.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable[]

string

Locations that could not be reached.

ListDataTaxonomiesRequest

List DataTaxonomies request.

Fields
parent

string

Required. The resource name of the DataTaxonomy location, of the form: projects/{project_number}/locations/{location_id} where location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.dataTaxonomies.list
page_size

int32

Optional. Maximum number of DataTaxonomies to return. The service may return fewer than this value. If unspecified, at most 10 DataTaxonomies will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListDataTaxonomies call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDataTaxonomies must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.

ListDataTaxonomiesResponse

List DataTaxonomies response.

Fields
data_taxonomies[]

DataTaxonomy

DataTaxonomies under the given parent location.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable_locations[]

string

Locations that could not be reached.

ListEntitiesRequest

List metadata entities request.

Fields
parent

string

Required. The resource name of the parent zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.entities.list
view

EntityView

Required. Specify the entity view to make a partial list request.

page_size

int32

Optional. Maximum number of entities to return. The service may return fewer than this value. If unspecified, 100 entities will be returned by default. The maximum value is 500; larger values will be truncated to 500.

page_token

string

Optional. Page token received from a previous ListEntities call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListEntities must match the call that provided the page token.

filter

string

Optional. The following filter parameters can be added to the URL to limit the entities returned by the API:

  • Entity ID: ?filter="id=entityID"
  • Asset ID: ?filter="asset=assetID"
  • Data path: ?filter="data_path=gs://my-bucket"
  • Is HIVE compatible: ?filter="hive_compatible=true"
  • Is BigQuery compatible: ?filter="bigquery_compatible=true"
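
For example, listing only table entities whose entity ID matches a filter might look like this with the generated Python client (lake, zone, and project values are hypothetical):

    from google.cloud import dataplex_v1

    client = dataplex_v1.MetadataServiceClient()

    entities = client.list_entities(
        request=dataplex_v1.ListEntitiesRequest(
            parent=(
                "projects/123456789/locations/us-central1/"
                "lakes/my-lake/zones/my-zone"
            ),
            view=dataplex_v1.ListEntitiesRequest.EntityView.TABLES,
            filter="id=my_table",
        )
    )
    for entity in entities:
        print(entity.name)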

EntityView

Entity views.

Enums
ENTITY_VIEW_UNSPECIFIED The default unset value. Return both table and fileset entities if unspecified.
TABLES Only list table entities.
FILESETS Only list fileset entities.

ListEntitiesResponse

List metadata entities response.

Fields
entities[]

Entity

Entities in the specified parent zone.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no remaining results in the list.
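
A minimal sketch of a partial list request with the Python client library (google-cloud-dataplex); the parent and the filter value are placeholders, and view is required as documented above:

    from google.cloud import dataplex_v1

    client = dataplex_v1.MetadataServiceClient()
    request = dataplex_v1.ListEntitiesRequest(
        parent=(
            "projects/123/locations/us-central1"
            "/lakes/my-lake/zones/my-zone"
        ),
        view=dataplex_v1.ListEntitiesRequest.EntityView.TABLES,  # required
        filter="hive_compatible=true",  # optional, see the syntax above
    )
    for entity in client.list_entities(request=request):
        print(entity.id)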

ListEntriesRequest

List Entries request.

Fields
parent

string

Required. The resource name of the parent Entry Group: projects/{project}/locations/{location}/entryGroups/{entry_group}.

page_size

int32

Optional. Number of items to return per page. If there are remaining results, the service returns a next_page_token. If unspecified, the service returns at most 10 Entries. The maximum value is 100; values above 100 will be coerced to 100.

page_token

string

Optional. Page token received from a previous ListEntries call. Provide this to retrieve the subsequent page.

filter

string

Optional. A filter on the entries to return. Filters are case-sensitive. You can filter the request by the following fields:

  • entry_type
  • entry_source.display_name

The comparison operators are =, !=, <, >, <=, >=. The service compares strings according to lexical order.

You can use the logical operators AND, OR, NOT in the filter.

You can use Wildcard "*", but for entry_type you need to provide the full project id or number.

Example filter expressions:

  • "entry_source.display_name=AnExampleDisplayName"
  • "entry_type=projects/example-project/locations/global/entryTypes/example-entry_type"
  • "entry_type=projects/example-project/locations/us/entryTypes/a* OR entry_type=projects/another-project/locations/*"
  • "NOT entry_source.display_name=AnotherExampleDisplayName"

ListEntriesResponse

List Entries response.

Fields
entries[]

Entry

The list of entries under the given parent location.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.
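
A minimal sketch of filtered listing with the Python client library (google-cloud-dataplex); the entry group and filter values are placeholders that follow the example expressions above:

    from google.cloud import dataplex_v1

    client = dataplex_v1.CatalogServiceClient()
    request = dataplex_v1.ListEntriesRequest(
        parent="projects/my-project/locations/us-central1/entryGroups/my-group",
        # Wildcards on entry_type require a full project id or number:
        filter=(
            "entry_type=projects/my-project/locations/us/entryTypes/a* "
            "OR entry_source.display_name=AnExampleDisplayName"
        ),
    )
    for entry in client.list_entries(request=request):
        print(entry.name)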

ListEntryGroupsRequest

List entryGroups request.

Fields
parent

string

Required. The resource name of the entryGroup location, of the form: projects/{project_number}/locations/{location_id} where location_id refers to a Google Cloud region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.entryGroups.list
page_size

int32

Optional. Maximum number of EntryGroups to return. The service may return fewer than this value. If unspecified, the service returns at most 10 EntryGroups. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListEntryGroups call. Provide this to retrieve the subsequent page. When paginating, all other parameters you provide to ListEntryGroups must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.

ListEntryGroupsResponse

List entry groups response.

Fields
entry_groups[]

EntryGroup

Entry groups under the given parent location.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable_locations[]

string

Locations that the service couldn't reach.

ListEntryTypesRequest

List EntryTypes request

Fields
parent

string

Required. The resource name of the EntryType location, of the form: projects/{project_number}/locations/{location_id} where location_id refers to a Google Cloud region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.entryTypes.list
page_size

int32

Optional. Maximum number of EntryTypes to return. The service may return fewer than this value. If unspecified, the service returns at most 10 EntryTypes. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListEntryTypes call. Provide this to retrieve the subsequent page. When paginating, all other parameters you provided to ListEntryTypes must match the call that provided the page token.

filter

string

Optional. Filter request. Filters are case-sensitive. The service supports the following formats:

  • labels.key1 = "value1"
  • labels:key1
  • name = "value"

These restrictions can be combined with AND, OR, and NOT operators.

order_by

string

Optional. Orders the result by name or create_time fields. If not specified, the ordering is undefined.

ListEntryTypesResponse

List EntryTypes response.

Fields
entry_types[]

EntryType

EntryTypes under the given parent location.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable_locations[]

string

Locations that the service couldn't reach.

ListEnvironmentsRequest

List environments request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_id}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.environments.list
page_size

int32

Optional. Maximum number of environments to return. The service may return fewer than this value. If unspecified, at most 10 environments will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListEnvironments call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListEnvironments must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.

ListEnvironmentsResponse

List environments response.

Fields
environments[]

Environment

Environments under the given parent lake.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

ListJobsRequest

List jobs request.

Fields
parent

string

Required. The resource name of the parent task: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.tasks.get
page_size

int32

Optional. Maximum number of jobs to return. The service may return fewer than this value. If unspecified, at most 10 jobs will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListJobs call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListJobs must match the call that provided the page token.

ListJobsResponse

List jobs response.

Fields
jobs[]

Job

Jobs under a given task.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

ListLakeActionsRequest

List lake actions request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.lakeActions.list
page_size

int32

Optional. Maximum number of actions to return. The service may return fewer than this value. If unspecified, at most 10 actions will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListLakeActions call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListLakeActions must match the call that provided the page token.

ListLakesRequest

List lakes request.

Fields
parent

string

Required. The resource name of the lake location, of the form: projects/{project_number}/locations/{location_id} where location_id refers to a GCP region.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.lakes.list
page_size

int32

Optional. Maximum number of Lakes to return. The service may return fewer than this value. If unspecified, at most 10 lakes will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListLakes call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListLakes must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.

ListLakesResponse

List lakes response.

Fields
lakes[]

Lake

Lakes under the given parent location.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable_locations[]

string

Locations that could not be reached.

ListMetadataJobsRequest

List metadata jobs request.

Fields
parent

string

Required. The resource name of the parent location, in the format projects/{project_id_or_number}/locations/{location_id}

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.metadataJobs.list
page_size

int32

Optional. The maximum number of metadata jobs to return. The service might return fewer jobs than this value. If unspecified, at most 10 jobs are returned. The maximum value is 1,000.

page_token

string

Optional. The page token received from a previous ListMetadataJobs call. Provide this token to retrieve the subsequent page of results. When paginating, all other parameters that are provided to the ListMetadataJobs request must match the call that provided the page token.

filter

string

Optional. Filter request. Filters are case-sensitive. The service supports the following formats:

  • labels.key1 = "value1"
  • labels:key1
  • name = "value"

You can combine filters with AND, OR, and NOT operators.

order_by

string

Optional. The field to sort the results by, either name or create_time. If not specified, the ordering is undefined.

ListMetadataJobsResponse

List metadata jobs response.

Fields
metadata_jobs[]

MetadataJob

Metadata jobs under the specified parent location.

next_page_token

string

A token to retrieve the next page of results. If there are no more results in the list, the value is empty.

unreachable_locations[]

string

Locations that the service couldn't reach.

ListPartitionsRequest

List metadata partitions request.

Fields
parent

string

Required. The resource name of the parent entity: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.partitions.list
page_size

int32

Optional. Maximum number of partitions to return. The service may return fewer than this value. If unspecified, 100 partitions will be returned by default. The maximum page size is 500; larger values will be truncated to 500.

page_token

string

Optional. Page token received from a previous ListPartitions call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListPartitions must match the call that provided the page token.

filter

string

Optional. Filter the partitions returned to the caller using a key value pair expression. Supported operators and syntax:

  • logic operators: AND, OR
  • comparison operators: <, >, >=, <=, =, !=
  • LIKE operators: the right-hand side of a LIKE operator supports "." and "*" for wildcard searches, for example value1 LIKE ".*oo.*"
  • parenthetical grouping: ( )

Sample filter expression: ?filter="key1 < value1 OR key2 > value2"

Notes:

  • Keys to the left of operators are case insensitive.
  • Partition results are sorted first by creation time, then by lexicographic order.
  • Up to 20 key value filter pairs are allowed, but due to performance considerations, only the first 10 will be used as a filter.

ListPartitionsResponse

List metadata partitions response.

Fields
partitions[]

Partition

Partitions under the specified parent entity.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no remaining results in the list.
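
A minimal sketch of a filtered partition listing with the Python client library (google-cloud-dataplex); the parent entity and the key names are placeholders:

    from google.cloud import dataplex_v1

    client = dataplex_v1.MetadataServiceClient()
    request = dataplex_v1.ListPartitionsRequest(
        parent=(
            "projects/123/locations/us-central1/lakes/my-lake"
            "/zones/my-zone/entities/my-entity"
        ),
        # Keys to the left of operators are case insensitive; only the
        # first 10 key/value pairs are applied as a filter.
        filter="key1 < value1 OR key2 > value2",
    )
    for partition in client.list_partitions(request=request):
        print(partition.name, list(partition.values))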

ListSessionsRequest

List sessions request.

Fields
parent

string

Required. The resource name of the parent environment: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/environment/{environment_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.environments.get
page_size

int32

Optional. Maximum number of sessions to return. The service may return fewer than this value. If unspecified, at most 10 sessions will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListSessions call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListSessions must match the call that provided the page token.

filter

string

Optional. Filter request. The following mode filter is supported: when the mode is USER, only the sessions belonging to the requester are returned; when the mode is ADMIN, sessions of all users are returned. When no filter is specified, the mode defaults to USER. Note: when the mode is ADMIN, the requester must have the dataplex.environments.listAllSessions permission to list all sessions; without that permission, the request fails.

mode = ADMIN | USER

ListSessionsResponse

List sessions response.

Fields
sessions[]

Session

Sessions under a given environment.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.
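
A minimal sketch of listing sessions in ADMIN mode with the Python client library (google-cloud-dataplex); the parent environment is a placeholder, and as noted above the ADMIN filter requires the dataplex.environments.listAllSessions permission:

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataplexServiceClient()
    request = dataplex_v1.ListSessionsRequest(
        parent=(
            "projects/123/locations/us-central1/lakes/my-lake"
            "/environments/my-env"
        ),
        filter="mode = ADMIN",  # omit to default to USER mode
    )
    for session in client.list_sessions(request=request):
        print(session.user_id, session.state)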

ListTasksRequest

List tasks request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.tasks.list
page_size

int32

Optional. Maximum number of tasks to return. The service may return fewer than this value. If unspecified, at most 10 tasks will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListTasks call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListTasks must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.

ListTasksResponse

List tasks response.

Fields
tasks[]

Task

Tasks under the given parent lake.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable_locations[]

string

Locations that could not be reached.

ListZoneActionsRequest

List zone actions request.

Fields
parent

string

Required. The resource name of the parent zone: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.zoneActions.list
page_size

int32

Optional. Maximum number of actions to return. The service may return fewer than this value. If unspecified, at most 10 actions will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListZoneActions call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListZoneActions must match the call that provided the page token.

ListZonesRequest

List zones request.

Fields
parent

string

Required. The resource name of the parent lake: projects/{project_number}/locations/{location_id}/lakes/{lake_id}.

Authorization requires the following IAM permission on the specified resource parent:

  • dataplex.zones.list
page_size

int32

Optional. Maximum number of zones to return. The service may return fewer than this value. If unspecified, at most 10 zones will be returned. The maximum value is 1000; values above 1000 will be coerced to 1000.

page_token

string

Optional. Page token received from a previous ListZones call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListZones must match the call that provided the page token.

filter

string

Optional. Filter request.

order_by

string

Optional. Order by fields for the result.

ListZonesResponse

List zones response.

Fields
zones[]

Zone

Zones under the given parent lake.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

LookupEntryRequest

Lookup Entry request using permissions in the source system.

Fields
name

string

Required. The project to which the request should be attributed in the following form: projects/{project}/locations/{location}.

view

EntryView

Optional. View to control which parts of an entry the service should return.

aspect_types[]

string

Optional. Limits the aspects returned to the provided aspect types. It only works for CUSTOM view.

paths[]

string

Optional. Limits the aspects returned to those associated with the provided paths within the Entry. It only works for CUSTOM view.

entry

string

Required. The resource name of the Entry: projects/{project}/locations/{location}/entryGroups/{entry_group}/entries/{entry}.
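
A minimal sketch of a lookup with the Python client library (google-cloud-dataplex); all resource names are placeholders, and aspect_types is honored only with the CUSTOM view, per the field description above:

    from google.cloud import dataplex_v1

    client = dataplex_v1.CatalogServiceClient()
    entry = client.lookup_entry(
        request=dataplex_v1.LookupEntryRequest(
            # Project and location the request is attributed to:
            name="projects/my-project/locations/us-central1",
            # Resource name of the Entry to look up:
            entry=(
                "projects/my-project/locations/us-central1"
                "/entryGroups/my-group/entries/my-entry"
            ),
            view=dataplex_v1.EntryView.CUSTOM,
            aspect_types=[
                "projects/my-project/locations/global/aspectTypes/my-aspect-type"
            ],
        )
    )
    print(entry.name)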

MetadataJob

A metadata job resource.

Fields
name

string

Output only. Identifier. The name of the resource that the configuration is applied to, in the format projects/{project_number}/locations/{location_id}/metadataJobs/{metadata_job_id}.

uid

string

Output only. A system-generated, globally unique ID for the metadata job. If the metadata job is deleted and then re-created with the same name, this ID is different.

create_time

Timestamp

Output only. The time when the metadata job was created.

update_time

Timestamp

Output only. The time when the metadata job was updated.

labels

map<string, string>

Optional. User-defined labels.

type

Type

Required. Metadata job type.

status

Status

Output only. Metadata job status.

Union field spec.

spec can be only one of the following:

import_spec

ImportJobSpec

Import job specification.

Union field result.

result can be only one of the following:

import_result

ImportJobResult

Output only. Import job result.

ImportJobResult

Results from a metadata import job.

Fields
deleted_entries

int64

Output only. The total number of entries that were deleted.

updated_entries

int64

Output only. The total number of entries that were updated.

created_entries

int64

Output only. The total number of entries that were created.

unchanged_entries

int64

Output only. The total number of entries that were unchanged.

recreated_entries

int64

Output only. The total number of entries that were recreated.

update_time

Timestamp

Output only. The time when the status was updated.

ImportJobSpec

Job specification for a metadata import job

Fields
source_storage_uri

string

Optional. The URI of a Cloud Storage bucket or folder (beginning with gs:// and ending with /) that contains the metadata import files for this job.

A metadata import file defines the values to set for each of the entries and aspects in a metadata job. For more information about how to create a metadata import file and the file requirements, see Metadata import file.

You can provide multiple metadata import files in the same metadata job. The bucket or folder must contain at least one metadata import file, in JSON Lines format (either .json or .jsonl file extension).

In FULL entry sync mode, don't save the metadata import file in a folder named SOURCE_STORAGE_URI/deletions/.

Caution: If the metadata import file contains no data, all entries and aspects that belong to the job's scope are deleted.

source_create_time

Timestamp

Optional. The time when the process that created the metadata import files began.

scope

ImportJobScope

Required. A boundary on the scope of impact that the metadata import job can have.

entry_sync_mode

SyncMode

Required. The sync mode for entries. Only FULL mode is supported for entries. All entries in the job's scope are modified. If an entry exists in Dataplex but isn't included in the metadata import file, the entry is deleted when you run the metadata job.

aspect_sync_mode

SyncMode

Required. The sync mode for aspects. Only INCREMENTAL mode is supported for aspects. An aspect is modified only if the metadata import file includes a reference to the aspect in the update_mask field and the aspect_keys field.

log_level

LogLevel

Optional. The level of logs to write to Cloud Logging for this job.

Debug-level logs provide highly-detailed information for troubleshooting, but their increased verbosity could incur additional costs that might not be merited for all jobs.

If unspecified, defaults to INFO.

ImportJobScope

A boundary on the scope of impact that the metadata import job can have.

Fields
entry_groups[]

string

Required. The entry group that is in scope for the import job, specified as a relative resource name in the format projects/{project_number_or_id}/locations/{location_id}/entryGroups/{entry_group_id}. Only entries that belong to the specified entry group are affected by the job.

Must contain exactly one element. The entry group and the job must be in the same location.

entry_types[]

string

Required. The entry types that are in scope for the import job, specified as relative resource names in the format projects/{project_number_or_id}/locations/{location_id}/entryTypes/{entry_type_id}. The job modifies only the entries that belong to these entry types.

If the metadata import file attempts to modify an entry whose type isn't included in this list, the import job is halted before modifying any entries or aspects.

The location of an entry type must either match the location of the job, or the entry type must be global.

aspect_types[]

string

Optional. The aspect types that are in scope for the import job, specified as relative resource names in the format projects/{project_number_or_id}/locations/{location_id}/aspectTypes/{aspect_type_id}. The job modifies only the aspects that belong to these aspect types.

If the metadata import file attempts to modify an aspect whose type isn't included in this list, the import job is halted before modifying any entries or aspects.

The location of an aspect type must either match the location of the job, or the aspect type must be global.

LogLevel

The level of logs to write to Cloud Logging for this job.

Enums
LOG_LEVEL_UNSPECIFIED Log level unspecified.
DEBUG

Debug-level logging. Captures detailed logs for each import item. Use debug-level logging to troubleshoot issues with specific import items. For example, use debug-level logging to identify resources that are missing from the job scope, entries or aspects that don't conform to the associated entry type or aspect type, or other misconfigurations with the metadata import file.

Depending on the size of your metadata job and the number of logs that are generated, debug-level logging might incur additional costs.

INFO Info-level logging. Captures logs at the overall job level. Includes aggregate logs about import items, but doesn't specify which import item has an error.

SyncMode

Specifies how the entries and aspects in a metadata job are updated.

Enums
SYNC_MODE_UNSPECIFIED Sync mode unspecified.
FULL All resources in the job's scope are modified. If a resource exists in Dataplex but isn't included in the metadata import file, the resource is deleted when you run the metadata job. Use this mode to perform a full sync of the set of entries in the job scope.
INCREMENTAL Only the entries and aspects that are explicitly included in the metadata import file are modified. Use this mode to modify a subset of resources while leaving unreferenced resources unchanged.
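
Putting the pieces above together, a minimal sketch of creating an import job with the Python client library (google-cloud-dataplex); the bucket and resource names are placeholders, the nested message paths follow the library's usual proto nesting, and the call returns a long-running operation:

    from google.cloud import dataplex_v1

    client = dataplex_v1.CatalogServiceClient()
    job = dataplex_v1.MetadataJob(
        type_=dataplex_v1.MetadataJob.Type.IMPORT,
        import_spec=dataplex_v1.MetadataJob.ImportJobSpec(
            source_storage_uri="gs://my-bucket/import/",  # must end with /
            scope=dataplex_v1.MetadataJob.ImportJobSpec.ImportJobScope(
                entry_groups=[
                    "projects/123/locations/us-central1/entryGroups/my-group"
                ],  # must contain exactly one element
                entry_types=[
                    "projects/123/locations/us-central1/entryTypes/my-type"
                ],
            ),
            # Only FULL is supported for entries, INCREMENTAL for aspects:
            entry_sync_mode=dataplex_v1.MetadataJob.ImportJobSpec.SyncMode.FULL,
            aspect_sync_mode=dataplex_v1.MetadataJob.ImportJobSpec.SyncMode.INCREMENTAL,
        ),
    )
    operation = client.create_metadata_job(
        request=dataplex_v1.CreateMetadataJobRequest(
            parent="projects/123/locations/us-central1",
            metadata_job=job,
        )
    )
    print(operation.result().name)  # blocks until the job resource exists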

Status

Metadata job status.

Fields
state

State

Output only. State of the metadata job.

message

string

Output only. Message relating to the progression of a metadata job.

completion_percent

int32

Output only. Progress tracking.

update_time

Timestamp

Output only. The time when the status was updated.

State

State of a metadata job.

Enums
STATE_UNSPECIFIED State unspecified.
QUEUED The job is queued.
RUNNING The job is running.
CANCELING The job is being canceled.
CANCELED The job is canceled.
SUCCEEDED The job succeeded.
FAILED The job failed.
SUCCEEDED_WITH_ERRORS The job completed with some errors.

Type

Metadata job type.

Enums
TYPE_UNSPECIFIED Unspecified.
IMPORT Import job.

OperationMetadata

Represents the metadata of a long-running operation.

Fields
create_time

Timestamp

Output only. The time the operation was created.

end_time

Timestamp

Output only. The time the operation finished running.

target

string

Output only. Server-defined resource path for the target of the operation.

verb

string

Output only. Name of the verb executed by the operation.

status_message

string

Output only. Human-readable status of the operation, if any.

requested_cancellation

bool

Output only. Identifies whether the user has requested cancellation of the operation. Operations that have been successfully cancelled have an Operation.error value with a google.rpc.Status.code of 1, corresponding to Code.CANCELLED.

api_version

string

Output only. API version used to start the operation.

Partition

Represents partition metadata contained within entity instances.

Fields
name

string

Output only. Partition values used in the HTTP URL must be double encoded. For example, url_encode(url_encode(value)) can be used to encode "US:CA/CA#Sunnyvale" so that the request URL ends with "/partitions/US%253ACA/CA%2523Sunnyvale". The name field in the response retains the encoded format.

values[]

string

Required. Immutable. The set of values representing the partition, which correspond to the partition schema defined in the parent entity.

location

string

Required. Immutable. The location of the entity data within the partition, for example, gs://bucket/path/to/entity/key1=value1/key2=value2. Or projects/<project_id>/datasets/<dataset_id>/tables/<table_id>

etag
(deprecated)

string

Optional. The etag for this partition.
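
The double encoding described for name can be reproduced with the Python standard library; a short sketch using the example values from above ("US:CA" and "CA#Sunnyvale" as the two partition values):

    from urllib.parse import quote

    values = ["US:CA", "CA#Sunnyvale"]
    # Each partition value is URL-encoded twice; "/" separates values.
    suffix = "/".join(quote(quote(v, safe=""), safe="") for v in values)
    print(suffix)  # US%253ACA/CA%2523Sunnyvale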

ResourceAccessSpec

ResourceAccessSpec holds the access control configuration to be enforced on the resources, for example, Cloud Storage bucket, BigQuery dataset, BigQuery table.

Fields
readers[]

string

Optional. The set of principals to be granted the reader role on the resource. The string format follows the pattern used by IAM bindings: user:{email}, serviceAccount:{email}, group:{email}.

writers[]

string

Optional. The set of principals to be granted writer role on the resource.

owners[]

string

Optional. The set of principals to be granted owner role on the resource.

RunDataScanRequest

Run DataScan Request

Fields
name

string

Required. The resource name of the DataScan: projects/{project}/locations/{location_id}/dataScans/{data_scan_id}, where project refers to a project_id or project_number and location_id refers to a GCP region.

Only OnDemand data scans are allowed.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.datascans.run

RunDataScanResponse

Run DataScan Response.

Fields
job

DataScanJob

DataScanJob created by RunDataScan request.
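
A minimal sketch of triggering a scan with the Python client library (google-cloud-dataplex); the scan name is a placeholder and, as noted above, the scan must be an on-demand scan:

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataScanServiceClient()
    response = client.run_data_scan(
        request=dataplex_v1.RunDataScanRequest(
            name="projects/my-project/locations/us-central1/dataScans/my-scan"
        )
    )
    # The response carries the DataScanJob created by the request.
    print(response.job.name)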

RunTaskRequest

Fields
name

string

Required. The resource name of the task: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.tasks.run
labels

map<string, string>

Optional. User-defined labels for the task. If the map is left empty, the task will run with existing labels from task definition. If the map contains an entry with a new key, the same will be added to existing set of labels. If the map contains an entry with an existing label key in task definition, the task will run with new label value for that entry. Clearing an existing label will require label value to be explicitly set to a hyphen "-". The label value cannot be empty.

args

map<string, string>

Optional. Execution spec arguments. If the map is left empty, the task will run with existing execution spec args from task definition. If the map contains an entry with a new key, the same will be added to existing set of args. If the map contains an entry with an existing arg key in task definition, the task will run with new arg value for that entry. Clearing an existing arg will require arg value to be explicitly set to a hyphen "-". The arg value cannot be empty.

RunTaskResponse

Fields
job

Job

Jobs created by RunTask API.
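
A minimal sketch of an on-demand task run with the Python client library (google-cloud-dataplex); the task name and the override values are placeholders, and the merge/clear semantics follow the labels and args field descriptions above:

    from google.cloud import dataplex_v1

    client = dataplex_v1.DataplexServiceClient()
    response = client.run_task(
        request=dataplex_v1.RunTaskRequest(
            name=(
                "projects/123/locations/us-central1/lakes/my-lake"
                "/tasks/my-task"
            ),
            labels={"run-reason": "backfill"},        # merged with the task's labels
            args={"TASK_ARGS": "--date,2024-01-01"},  # set a value to "-" to clear
        )
    )
    print(response.job.name)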

ScannedData

The data scanned during processing (e.g. in incremental DataScan)

Fields
Union field data_range. The range of scanned data. data_range can be only one of the following:
incremental_field

IncrementalField

The range denoted by values of an incremental field

IncrementalField

A data range denoted by a pair of start/end values of a field.

Fields
field

string

The field that contains values which monotonically increase over time (e.g. a timestamp column).

start

string

Value that marks the start of the range.

end

string

Value that marks the end of the range.

Schema

Schema information describing the structure and layout of the data.

Fields
user_managed

bool

Required. Set to true if user-managed or false if managed by Dataplex. The default is false (managed by Dataplex).

  • Set to false to enable Dataplex discovery to update the schema, including new data discovery, schema inference, and schema evolution. Users retain the ability to input and edit the schema. Dataplex treats schema input by the user as though produced by a previous Dataplex discovery operation, and it will evolve the schema and take action based on that treatment.

  • Set to true to fully manage the entity schema. This setting guarantees that Dataplex will not change schema fields.

fields[]

SchemaField

Optional. The sequence of fields describing data in table entities. Note: BigQuery SchemaFields are immutable.

partition_fields[]

PartitionField

Optional. The sequence of fields describing the partition structure in entities. If this field is empty, there are no partitions within the data.

partition_style

PartitionStyle

Optional. The structure of paths containing partition data within the entity.

Mode

Additional qualifiers to define field semantics.

Enums
MODE_UNSPECIFIED Mode unspecified.
REQUIRED The field has required semantics.
NULLABLE The field has optional semantics, and may be null.
REPEATED The field has repeated (0 or more) semantics, and is a list of values.

PartitionField

Represents a key field within the entity's partition structure. You can have up to 20 partition fields, but only the first 10 can be used for filtering, due to performance considerations. Note: Partition fields are immutable.

Fields
name

string

Required. Partition field name must consist of letters, numbers, and underscores only, with a maximum length of 256 characters, and must begin with a letter or underscore.

type

Type

Required. Immutable. The type of field.

PartitionStyle

The structure of paths within the entity, which represent partitions.

Enums
PARTITION_STYLE_UNSPECIFIED PartitionStyle unspecified
HIVE_COMPATIBLE Partitions are hive-compatible. Examples: gs://bucket/path/to/table/dt=2019-10-31/lang=en, gs://bucket/path/to/table/dt=2019-10-31/lang=en/late.

SchemaField

Represents a column field within a table schema.

Fields
name

string

Required. The name of the field. Must contain only letters, numbers and underscores, with a maximum length of 767 characters, and must begin with a letter or underscore.

description

string

Optional. User friendly field description. Must be less than or equal to 1024 characters.

type

Type

Required. The type of field.

mode

Mode

Required. Additional field semantics.

fields[]

SchemaField

Optional. Any nested field for complex types.

Type

Type information for fields in schemas and partition schemas.

Enums
TYPE_UNSPECIFIED SchemaType unspecified.
BOOLEAN Boolean field.
BYTE Single byte numeric field.
INT16 16-bit numeric field.
INT32 32-bit numeric field.
INT64 64-bit numeric field.
FLOAT Floating point numeric field.
DOUBLE Double precision numeric field.
DECIMAL Real value numeric field.
STRING Sequence of characters field.
BINARY Sequence of bytes field.
TIMESTAMP Date and time field.
DATE Date field.
TIME Time field.
RECORD Structured field. Nested fields that define the structure of the map. If all nested fields are nullable, this field represents a union.
NULL Null field that does not have values.
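
A minimal sketch of assembling a user-managed Schema with the Python client library (google-cloud-dataplex); the field names are placeholders and the nested message and enum paths follow the library's usual proto nesting:

    from google.cloud import dataplex_v1

    schema = dataplex_v1.Schema(
        user_managed=True,  # Dataplex will not change schema fields
        fields=[
            dataplex_v1.Schema.SchemaField(
                name="event_id",
                type_=dataplex_v1.Schema.Type.STRING,
                mode=dataplex_v1.Schema.Mode.REQUIRED,
            ),
            dataplex_v1.Schema.SchemaField(
                name="amounts",
                type_=dataplex_v1.Schema.Type.DOUBLE,
                mode=dataplex_v1.Schema.Mode.REPEATED,
            ),
        ],
        partition_fields=[
            dataplex_v1.Schema.PartitionField(
                name="dt", type_=dataplex_v1.Schema.Type.DATE
            )
        ],
        partition_style=dataplex_v1.Schema.PartitionStyle.HIVE_COMPATIBLE,
    )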

SearchEntriesRequest

Fields
name

string

Required. The project to which the request should be attributed in the following form: projects/{project}/locations/{location}.

Authorization requires the following IAM permission on the specified resource name:

  • dataplex.projects.search
query

string

Required. The query against which entries in scope should be matched.

page_size

int32

Optional. Number of results in the search page. If <=0, then defaults to 10. Max limit for page_size is 1000. Throws an invalid argument for page_size > 1000.

page_token

string

Optional. Page token received from a previous SearchEntries call. Provide this to retrieve the subsequent page.

order_by

string

Optional. Specifies the ordering of results.

scope

string

Optional. The scope under which the search should be operating. It must either be organizations/<org_id> or projects/<project_ref>. If it is unspecified, it defaults to the organization where the project provided in name is located.

SearchEntriesResponse

Fields
results[]

SearchEntriesResult

The results matching the search query.

total_size

int32

The estimated total number of matching entries. This number isn't guaranteed to be accurate.

next_page_token

string

Token to retrieve the next page of results, or empty if there are no more results in the list.

unreachable[]

string

Locations that the service couldn't reach. Search results don't include data from these locations.
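
A minimal sketch of a search with the Python client library (google-cloud-dataplex); the project, query, and scope values are placeholders, and the returned pager iterates over SearchEntriesResult messages:

    from google.cloud import dataplex_v1

    client = dataplex_v1.CatalogServiceClient()
    request = dataplex_v1.SearchEntriesRequest(
        name="projects/my-project/locations/global",
        query="orders",               # free-text query
        scope="projects/my-project",  # defaults to the project's organization
        page_size=25,
    )
    for result in client.search_entries(request=request):
        print(result.dataplex_entry.name)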

SearchEntriesResult

A single result of a SearchEntries request.

Fields
linked_resource
(deprecated)

string

Linked resource name.

dataplex_entry

Entry

snippets
(deprecated)

Snippets

Snippets.

Snippets

Snippets for the entry, containing HTML-style highlighting for matched tokens, to be used in the UI.

Fields
dataplex_entry
(deprecated)

Entry

Entry

Session

Represents an active analyze session running for a user.

Fields
name

string

Output only. The relative resource name of the content, of the form: projects/{project_id}/locations/{location_id}/lakes/{lake_id}/environment/{environment_id}/sessions/{session_id}

user_id

string

Output only. Email of user running the session.

create_time

Timestamp

Output only. Session start time.

state

State

Output only. State of Session

SessionEvent

These messages contain information about sessions within an environment. The monitored resource is 'Environment'.

Fields
message

string

The log message.

user_id

string

The information about the user that created the session. It will be the email address of the user.

session_id

string

Unique identifier for the session.

type

EventType

The type of the event.

event_succeeded

bool

The status of the event.

fast_startup_enabled

bool

Whether the session is associated with an environment that has fast startup enabled, and was created before being assigned to a user.

unassigned_duration

Duration

The idle duration of a warm pooled session before it is assigned to user.

Union field detail. Additional information about the Query metadata. detail can be only one of the following:
query

QueryDetail

The execution details of the query.

EventType

The type of the event.

Enums
EVENT_TYPE_UNSPECIFIED An unspecified event type.
START Event when the session is assigned to a user.
STOP Event for stop of a session.
QUERY Query events in the session.
CREATE Event for creation of a cluster. It is not yet assigned to a user. This comes before START in the sequence.

QueryDetail

Execution details of the query.

Fields
query_id

string

The unique Query id identifying the query.

query_text

string

The query text executed.

engine

Engine

Query Execution engine.

duration

Duration

Time taken for execution of the query.

result_size_bytes

int64

The size of results the query produced.

data_processed_bytes

int64

The data processed by the query.

Engine

Query Execution engine.

Enums
ENGINE_UNSPECIFIED An unspecified Engine type.
SPARK_SQL Spark-sql engine is specified in Query.
BIGQUERY BigQuery engine is specified in Query.

State

State of a resource.

Enums
STATE_UNSPECIFIED State is not specified.
ACTIVE Resource is active, i.e., ready to use.
CREATING Resource is under creation.
DELETING Resource is under deletion.
ACTION_REQUIRED Resource is active but has unresolved actions.

StorageAccess

Describes the access mechanism of the data within its storage location.

Fields
read

AccessMode

Output only. Describes the read access mechanism of the data. Not user settable.

AccessMode

Access Mode determines how data stored within the Entity is read.

Enums
ACCESS_MODE_UNSPECIFIED Access mode unspecified.
DIRECT Default. Data is accessed directly using storage APIs.
MANAGED Data is accessed through a managed interface using BigQuery APIs.

StorageFormat

Describes the format of the data within its storage location.

Fields
format

Format

Output only. The data format associated with the stored data, which represents content type values. The value is inferred from the MIME type.

compression_format

CompressionFormat

Optional. The compression type associated with the stored data. If unspecified, the data is uncompressed.

mime_type

string

Required. The MIME type descriptor for the data. Must match the pattern {type}/{subtype}. Supported values:

  • application/x-parquet
  • application/x-avro
  • application/x-orc
  • application/x-tfrecord
  • application/x-parquet+iceberg
  • application/x-avro+iceberg
  • application/x-orc+iceberg
  • application/json
  • application/{subtypes}
  • text/csv
  • text/
  • image/{image subtype}
  • video/{video subtype}
  • audio/{audio subtype}
Union field options. Additional format-specific options. options can be only one of the following:
csv

CsvOptions

Optional. Additional information about CSV formatted data.

json

JsonOptions

Optional. Additional information about JSON formatted data.

iceberg

IcebergOptions

Optional. Additional information about iceberg tables.

CompressionFormat

The specific compressed file format of the data.

Enums
COMPRESSION_FORMAT_UNSPECIFIED CompressionFormat unspecified. Implies uncompressed data.
GZIP GZip compressed set of files.
BZIP2 BZip2 compressed set of files.

CsvOptions

Describes CSV and similar semi-structured data formats.

Fields
encoding

string

Optional. The character encoding of the data. Accepts "US-ASCII", "UTF-8", and "ISO-8859-1". Defaults to UTF-8 if unspecified.

header_rows

int32

Optional. The number of rows to interpret as header rows that should be skipped when reading data rows. Defaults to 0.

delimiter

string

Optional. The delimiter used to separate values. Defaults to ','.

quote

string

Optional. The character used to quote column values. Accepts '"' (double quotation mark) or ''' (single quotation mark). Defaults to '"' (double quotation mark) if unspecified.

Format

The specific file format of the data.

Enums
FORMAT_UNSPECIFIED Format unspecified.
PARQUET Parquet-formatted structured data.
AVRO Avro-formatted structured data.
ORC Orc-formatted structured data.
CSV Csv-formatted semi-structured data.
JSON Json-formatted semi-structured data.
IMAGE Image data formats (such as jpg and png).
AUDIO Audio data formats (such as mp3 and wav).
VIDEO Video data formats (such as mp4 and mpg).
TEXT Textual data formats (such as txt and xml).
TFRECORD TensorFlow record format.
OTHER Data that doesn't match a specific format.
UNKNOWN Data of an unknown format.

IcebergOptions

Describes Iceberg data format.

Fields
metadata_location

string

Optional. The location where the Iceberg metadata is present; it must be within the table path.

JsonOptions

Describes JSON data format.

Fields
encoding

string

Optional. The character encoding of the data. Accepts "US-ASCII", "UTF-8" and "ISO-8859-1". Defaults to UTF-8 if not specified.

StorageSystem

Identifies the cloud system that manages the data storage.

Enums
STORAGE_SYSTEM_UNSPECIFIED Storage system unspecified.
CLOUD_STORAGE The entity data is contained within a Cloud Storage bucket.
BIGQUERY The entity data is contained within a BigQuery dataset.

Task

A task represents a user-visible job.

Fields
name

string

Output only. The relative resource name of the task, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/tasks/{task_id}.

uid

string

Output only. System generated globally unique ID for the task. This ID will be different if the task is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the task was created.

update_time

Timestamp

Output only. The time when the task was last updated.

description

string

Optional. Description of the task.

display_name

string

Optional. User friendly display name.

state

State

Output only. Current state of the task.

labels

map<string, string>

Optional. User-defined labels for the task.

trigger_spec

TriggerSpec

Required. Spec related to how often and when a task should be triggered.

execution_spec

ExecutionSpec

Required. Spec related to how a task is executed.

execution_status

ExecutionStatus

Output only. Status of the latest task executions.

Union field config. Task template specific user-specified config. config can be only one of the following:
spark

SparkTaskConfig

Config related to running custom Spark tasks.

notebook

NotebookTaskConfig

Config related to running scheduled Notebooks.

ExecutionSpec

Execution related settings, like retry and service_account.

Fields
args

map<string, string>

Optional. The arguments to pass to the task. The args can use placeholders of the format ${placeholder} as part of a key/value string. These will be interpolated before passing the args to the driver. Currently supported placeholders: ${task_id} and ${job_time}. To pass positional args, set the key as TASK_ARGS. The value should be a comma-separated string of all the positional arguments. To use a delimiter other than a comma, refer to https://cloud.google.com/sdk/gcloud/reference/topic/escaping. If other keys are present in the args, TASK_ARGS will be passed as the last argument.

service_account

string

Required. Service account to use to execute a task. If not provided, the default Compute service account for the project is used.

project

string

Optional. The project in which jobs are run. By default, the project containing the Lake is used. If a project is provided, the ExecutionSpec.service_account must belong to this project.

max_job_execution_lifetime

Duration

Optional. The maximum duration after which the job execution is expired.

kms_key

string

Optional. The Cloud KMS key to use for encryption, of the form: projects/{project_number}/locations/{location_id}/keyRings/{key-ring-name}/cryptoKeys/{key-name}.

ExecutionStatus

Status of the task execution (e.g. Jobs).

Fields
update_time

Timestamp

Output only. Last update time of the status.

latest_job

Job

Output only. Latest job execution.

InfrastructureSpec

Configuration for the underlying infrastructure used to run workloads.

Fields
Union field resources. Hardware config. resources can be only one of the following:
batch

BatchComputeResources

Compute resources needed for a Task when using Dataproc Serverless.

Union field runtime. Software config. runtime can be only one of the following:
container_image

ContainerImageRuntime

Container Image Runtime Configuration.

Union field network. Networking config. network can be only one of the following:
vpc_network

VpcNetwork

Vpc network.

BatchComputeResources

Batch compute resources associated with the task.

Fields
executors_count

int32

Optional. Total number of job executors. Executor Count should be between 2 and 100. [Default=2]

max_executors_count

int32

Optional. Max configurable executors. If max_executors_count > executors_count, then auto-scaling is enabled. Max Executor Count should be between 2 and 1000. [Default=1000]

ContainerImageRuntime

Container Image Runtime Configuration used with Batch execution.

Fields
image

string

Optional. Container image to use.

java_jars[]

string

Optional. A list of Java JARs to add to the classpath. Valid input includes Cloud Storage URIs to JAR binaries. For example, gs://bucket-name/my/path/to/file.jar

python_packages[]

string

Optional. A list of python packages to be installed. Valid formats include Cloud Storage URI to a PIP installable library. For example, gs://bucket-name/my/path/to/lib.tar.gz

properties

map<string, string>

Optional. Override to common configuration of open source components installed on the Dataproc cluster. The properties to set on daemon config files. Property keys are specified in prefix:property format, for example core:hadoop.tmp.dir. For more information, see Cluster properties.

VpcNetwork

Cloud VPC Network used to run the infrastructure.

Fields
network_tags[]

string

Optional. List of network tags to apply to the job.

Union field network_name. The Cloud VPC network identifier. network_name can be only one of the following:
network

string

Optional. The Cloud VPC network in which the job is run. By default, the Cloud VPC network named Default within the project is used.

sub_network

string

Optional. The Cloud VPC sub-network in which the job is run.

NotebookTaskConfig

Config for running scheduled notebooks.

Fields
notebook

string

Required. Path to input notebook. This can be the Cloud Storage URI of the notebook file or the path to a Notebook Content. The execution args are accessible as environment variables (TASK_key=value).

infrastructure_spec

InfrastructureSpec

Optional. Infrastructure specification for the execution.

file_uris[]

string

Optional. Cloud Storage URIs of files to be placed in the working directory of each executor.

archive_uris[]

string

Optional. Cloud Storage URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.

SparkTaskConfig

User-specified config for running a Spark task.

Fields
file_uris[]

string

Optional. Cloud Storage URIs of files to be placed in the working directory of each executor.

archive_uris[]

string

Optional. Cloud Storage URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.

infrastructure_spec

InfrastructureSpec

Optional. Infrastructure specification for the execution.

Union field driver. Required. The specification of the main method to call to drive the job. Specify either the jar file that contains the main class or the main class name. driver can be only one of the following:
main_jar_file_uri

string

The Cloud Storage URI of the jar file that contains the main class. The execution args are passed in as a sequence of named process arguments (--key=value).

main_class

string

The name of the driver's main class. The jar file that contains the class must be in the default CLASSPATH or specified in jar_file_uris. The execution args are passed in as a sequence of named process arguments (--key=value).

python_script_file

string

The Cloud Storage URI of the main Python file to use as the driver. Must be a .py file. The execution args are passed in as a sequence of named process arguments (--key=value).

sql_script_file

string

A reference to a query file. This should be the Cloud Storage URI of the query file. The execution args are used to declare a set of script variables (set key="value";).

sql_script

string

The query text. The execution args are used to declare a set of script variables (set key="value";).

TriggerSpec

Task scheduling and trigger settings.

Fields
type

Type

Required. Immutable. Trigger type of the user-specified Task.

start_time

Timestamp

Optional. The first run of the task will be after this time. If not specified, the task will run shortly after being submitted if ON_DEMAND and based on the schedule if RECURRING.

disabled

bool

Optional. Prevent the task from executing. This does not cancel already running tasks. It is intended to temporarily disable RECURRING tasks.

max_retries

int32

Optional. Number of retry attempts before aborting. Set to zero to never attempt to retry a failed task.

Union field trigger. Trigger only applies for RECURRING tasks. trigger can be only one of the following:
schedule

string

Optional. Cron schedule (https://en.wikipedia.org/wiki/Cron) for running tasks periodically. To explicitly set a timezone to the cron tab, apply a prefix in the cron tab: "CRON_TZ=${IANA_TIME_ZONE}" or "TZ=${IANA_TIME_ZONE}". The ${IANA_TIME_ZONE} may only be a valid string from IANA time zone database. For example, CRON_TZ=America/New_York 1 * * * *, or TZ=America/New_York 1 * * * *. This field is required for RECURRING tasks.

Type

Determines how often and when the job will run.

Enums
TYPE_UNSPECIFIED Unspecified trigger type.
ON_DEMAND The task runs one-time shortly after Task Creation.
RECURRING The task is scheduled to run periodically.
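
A sketch of how TriggerSpec, ExecutionSpec, and SparkTaskConfig fit together when creating a task with the Python client library (google-cloud-dataplex); all names are placeholders, and the CreateTaskRequest shape is an assumption since that message is documented elsewhere:

    from google.cloud import dataplex_v1

    task = dataplex_v1.Task(
        trigger_spec=dataplex_v1.Task.TriggerSpec(
            type_=dataplex_v1.Task.TriggerSpec.Type.RECURRING,
            schedule="CRON_TZ=America/New_York 0 4 * * *",  # explicit timezone
            max_retries=3,
        ),
        execution_spec=dataplex_v1.Task.ExecutionSpec(
            service_account="my-sa@my-project.iam.gserviceaccount.com",
            args={"TASK_ARGS": "--input,gs://my-bucket/data"},
        ),
        spark=dataplex_v1.Task.SparkTaskConfig(
            main_jar_file_uri="gs://my-bucket/jars/job.jar",
        ),
    )
    client = dataplex_v1.DataplexServiceClient()
    operation = client.create_task(
        request=dataplex_v1.CreateTaskRequest(  # assumed request shape
            parent="projects/123/locations/us-central1/lakes/my-lake",
            task_id="nightly-spark",
            task=task,
        )
    )
    print(operation.result().state)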

TransferStatus

Denotes the transfer status of a resource. It is unspecified for resources created from Dataplex API.

Enums
TRANSFER_STATUS_UNSPECIFIED The default value. It is set for resources that were not subject for migration from Data Catalog service.
TRANSFER_STATUS_MIGRATED Indicates that a resource was migrated from Data Catalog service but hasn't been transferred yet. In particular, the resource cannot be updated from the Dataplex API.
TRANSFER_STATUS_TRANSFERRED Indicates that a resource was transferred from Data Catalog service. The resource can only be updated from Dataplex API.

Trigger

DataScan scheduling and trigger settings.

Fields

Union field mode. DataScan scheduling and trigger settings.

If not specified, the default is onDemand. mode can be only one of the following:

on_demand

OnDemand

The scan runs once via RunDataScan API.

schedule

Schedule

The scan is scheduled to run periodically.

OnDemand

This type has no fields.

The scan runs once via RunDataScan API.

Schedule

The scan is scheduled to run periodically.

Fields
cron

string

Required. Cron schedule for running scans periodically.

To explicitly set a timezone in the cron tab, apply a prefix in the cron tab: "CRON_TZ=${IANA_TIME_ZONE}" or "TZ=${IANA_TIME_ZONE}". The ${IANA_TIME_ZONE} may only be a valid string from IANA time zone database (wikipedia). For example, CRON_TZ=America/New_York 1 * * * *, or TZ=America/New_York 1 * * * *.

This field is required for Schedule scans.

UpdateAspectTypeRequest

Update AspectType Request

Fields
aspect_type

AspectType

Required. AspectType Resource

Authorization requires the following IAM permission on the specified resource aspectType:

  • dataplex.aspectTypes.update
update_mask

FieldMask

Required. Mask of fields to update.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateAssetRequest

Update asset request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

asset

Asset

Required. Update description. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource asset:

  • dataplex.assets.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateContentRequest

Update content request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

content

Content

Required. Update description. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource content:

  • dataplex.content.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateDataAttributeBindingRequest

Update DataAttributeBinding request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

data_attribute_binding

DataAttributeBinding

Required. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource dataAttributeBinding:

  • dataplex.dataAttributeBindings.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateDataAttributeRequest

Update DataAttribute request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

data_attribute

DataAttribute

Required. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource dataAttribute:

  • dataplex.dataAttributes.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateDataScanRequest

Update dataScan request.

Fields
data_scan

DataScan

Required. DataScan resource to be updated.

Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource dataScan:

  • dataplex.datascans.update
update_mask

FieldMask

Required. Mask of fields to update.

validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateDataTaxonomyRequest

Update DataTaxonomy request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

data_taxonomy

DataTaxonomy

Required. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource dataTaxonomy:

  • dataplex.dataTaxonomies.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateEntityRequest

Update a metadata entity request. The existing entity will be fully replaced by the entity in the request. The entity ID is mutable. To modify the ID, use the current entity ID in the request URL and specify the new ID in the request body.

Fields
entity

Entity

Required. Update description.

Authorization requires the following IAM permission on the specified resource entity:

  • dataplex.entities.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateEntryGroupRequest

Update EntryGroup Request.

Fields
entry_group

EntryGroup

Required. EntryGroup Resource.

Authorization requires the following IAM permission on the specified resource entryGroup:

  • dataplex.entryGroups.update
update_mask

FieldMask

Required. Mask of fields to update.

validate_only

bool

Optional. The service validates the request, without performing any mutations. The default is false.

UpdateEntryRequest

Update Entry request.

Fields
entry

Entry

Required. Entry resource.

update_mask

FieldMask

Optional. Mask of fields to update. To update Aspects, the update_mask must contain the value "aspects".

If the update_mask is empty, the service will update all modifiable fields present in the request.

allow_missing

bool

Optional. If set to true and the entry doesn't exist, the service will create it.

delete_missing_aspects

bool

Optional. If set to true and the aspect_keys specify aspect ranges, the service deletes any existing aspects from that range that weren't provided in the request.

aspect_keys[]

string

Optional. The map keys of the Aspects which the service should modify. It supports the following syntaxes:

  • <aspect_type_reference> - matches an aspect of the given type and empty path.
  • <aspect_type_reference>@path - matches an aspect of the given type and specified path. For example, to attach an aspect to a field that is specified by the schema aspect, the path should have the format Schema.<field_name>.
  • <aspect_type_reference>* - matches aspects of the given type for all paths.
  • *@path - matches aspects of all types on the given path.

The service will not remove existing aspects matching the syntax unless delete_missing_aspects is set to true.

If this field is left empty, the service treats it as specifying exactly those Aspects present in the request.
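
Putting the pieces together, a hedged sketch that updates a single aspect on an entry, assuming a hypothetical entry and aspect type; the map key follows the <aspect_type_reference> syntax above:

```python
from google.cloud import dataplex_v1

client = dataplex_v1.CatalogServiceClient()

# Hypothetical entry and aspect type reference for illustration.
entry_name = (
    "projects/my-project/locations/us-central1"
    "/entryGroups/my-group/entries/my-entry"
)
aspect_key = "my-project.us-central1.my-aspect-type"  # <aspect_type_reference>

# aspect_keys limits the edit to this one aspect; update_mask must
# contain "aspects" for aspect changes to be applied at all.
updated = client.update_entry(
    request={
        "entry": {
            "name": entry_name,
            "aspects": {aspect_key: {"data": {"owner": "data-team@example.com"}}},
        },
        "update_mask": {"paths": ["aspects"]},
        "aspect_keys": [aspect_key],
    }
)
print(updated.name)
```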

UpdateEntryTypeRequest

Update EntryType Request.

Fields
entry_type

EntryType

Required. The EntryType resource to update.

Authorization requires the following IAM permission on the specified resource entryType:

  • dataplex.entryTypes.update
update_mask

FieldMask

Required. Mask of fields to update.

validate_only

bool

Optional. The service validates the request without performing any mutations. The default is false.

UpdateEnvironmentRequest

Update environment request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

environment

Environment

Required. The environment to update. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource environment:

  • dataplex.environments.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateLakeRequest

Update lake request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

lake

Lake

Required. The lake to update. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource lake:

  • dataplex.lakes.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateTaskRequest

Update task request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

task

Task

Required. The task to update. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource task:

  • dataplex.tasks.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.

UpdateZoneRequest

Update zone request.

Fields
update_mask

FieldMask

Required. Mask of fields to update.

zone

Zone

Required. The zone to update. Only fields specified in update_mask are updated.

Authorization requires the following IAM permission on the specified resource zone:

  • dataplex.zones.update
validate_only

bool

Optional. Only validate the request, but do not perform mutations. The default is false.
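
UpdateEnvironmentRequest, UpdateLakeRequest, UpdateTaskRequest, and UpdateZoneRequest all share the same shape. A minimal sketch for a lake, assuming hypothetical resource names; the same pattern applies to the other three with their respective update methods:

```python
from google.cloud import dataplex_v1
from google.protobuf import field_mask_pb2

client = dataplex_v1.DataplexServiceClient()

# Hypothetical lake name for illustration.
operation = client.update_lake(
    request={
        "lake": {
            "name": "projects/my-project/locations/us-central1/lakes/my-lake",
            "labels": {"env": "prod"},
        },
        "update_mask": field_mask_pb2.FieldMask(paths=["labels"]),
        "validate_only": False,  # set True for a mutation-free dry run
    }
)
print(operation.result().labels)  # block on the LRO, then read the result
```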

Zone

A zone represents a logical group of related assets within a lake. A zone can be used to map to an organizational structure or to represent stages of data readiness, from raw to curated. It provides management behavior that is shared or inherited by all contained assets.

Fields
name

string

Output only. The relative resource name of the zone, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}.

display_name

string

Optional. User-friendly display name.

uid

string

Output only. System-generated globally unique ID for the zone. This ID will be different if the zone is deleted and re-created with the same name.

create_time

Timestamp

Output only. The time when the zone was created.

update_time

Timestamp

Output only. The time when the zone was last updated.

labels

map<string, string>

Optional. User-defined labels for the zone.

description

string

Optional. Description of the zone.

state

State

Output only. Current state of the zone.

type

Type

Required. Immutable. The type of the zone.

discovery_spec

DiscoverySpec

Optional. Specification of the discovery feature applied to data in this zone.

resource_spec

ResourceSpec

Required. Specification of the resources that are referenced by the assets within this zone.

asset_status

AssetStatus

Output only. Aggregated status of the underlying assets of the zone.

DiscoverySpec

Settings to manage the metadata discovery and publishing in a zone.

Fields
enabled

bool

Required. Whether discovery is enabled.

include_patterns[]

string

Optional. The list of patterns to apply for selecting data to include during discovery, if only a subset of the data should be considered. For Cloud Storage bucket assets, these are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these are interpreted as patterns to match table names.

exclude_patterns[]

string

Optional. The list of patterns to apply for selecting data to exclude during discovery. For Cloud Storage bucket assets, these are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these are interpreted as patterns to match table names.

csv_options

CsvOptions

Optional. Configuration for CSV data.

json_options

JsonOptions

Optional. Configuration for JSON data.

Union field trigger. Determines when discovery is triggered. trigger can be only one of the following:
schedule

string

Optional. Cron schedule (https://en.wikipedia.org/wiki/Cron) for running discovery periodically. Successive discovery runs must be scheduled at least 60 minutes apart. By default, discovery runs every 60 minutes. To explicitly set a timezone for the cron schedule, prefix it with "CRON_TZ=${IANA_TIME_ZONE}" or "TZ=${IANA_TIME_ZONE}", where ${IANA_TIME_ZONE} must be a valid string from the IANA time zone database. For example, CRON_TZ=America/New_York 1 * * * * or TZ=America/New_York 1 * * * *.

CsvOptions

Describes CSV and similar semi-structured data formats.

Fields
header_rows

int32

Optional. The number of rows to interpret as header rows that should be skipped when reading data rows.

delimiter

string

Optional. The delimiter used to separate values. Defaults to ','.

encoding

string

Optional. The character encoding of the data. The default is UTF-8.

disable_type_inference

bool

Optional. Whether to disable the inference of data types for CSV data. If true, all columns are registered as strings.

JsonOptions

Describes the JSON data format.

Fields
encoding

string

Optional. The character encoding of the data. The default is UTF-8.

disable_type_inference

bool

Optional. Whether to disable the inference of data types for JSON data. If true, all columns are registered as their primitive types (string, number, or boolean).
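
Taken together, a sketch of a discovery configuration for a Cloud Storage-backed zone, assuming a hypothetical bucket layout; the field names follow the messages above, and the schedule uses the CRON_TZ prefix described earlier:

```python
from google.cloud import dataplex_v1

# Hypothetical discovery settings for a Cloud Storage-backed zone.
discovery_spec = dataplex_v1.Zone.DiscoverySpec(
    {
        "enabled": True,
        # Glob patterns matched against object names in the bucket.
        "include_patterns": ["raw/**"],
        "exclude_patterns": ["raw/tmp/**"],
        "csv_options": {
            "header_rows": 1,       # skip one header row
            "delimiter": ",",
            "encoding": "UTF-8",
        },
        "json_options": {"encoding": "UTF-8"},
        # trigger (union field): hourly discovery in an explicit timezone.
        "schedule": "CRON_TZ=America/New_York 0 * * * *",
    }
)
print(discovery_spec.schedule)
```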

ResourceSpec

Settings for resources attached as assets within a zone.

Fields
location_type

LocationType

Required. Immutable. The location type of the resources that are allowed to be attached to the assets within this zone.

LocationType

Location type of the resources attached to a zone.

Enums
LOCATION_TYPE_UNSPECIFIED Unspecified location type.
SINGLE_REGION Resources that are associated with a single region.
MULTI_REGION Resources that are associated with a multi-region location.

Type

Type of zone.

Enums
TYPE_UNSPECIFIED Zone type not specified.
RAW A zone that contains data that needs further processing before it is considered generally ready for consumption and analytics workloads.
CURATED A zone that contains data that is considered ready for broader consumption and analytics workloads. Curated structured data stored in Cloud Storage must conform to certain file formats (Parquet, Avro, and ORC) and be organized in a Hive-compatible directory layout.
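
Finally, a minimal sketch that creates a RAW, single-region zone in a hypothetical lake (create_zone returns a long-running operation):

```python
from google.cloud import dataplex_v1

client = dataplex_v1.DataplexServiceClient()

# Hypothetical parent lake and zone ID for illustration.
parent = "projects/my-project/locations/us-central1/lakes/my-lake"
operation = client.create_zone(
    request={
        "parent": parent,
        "zone_id": "landing",
        "zone": {
            "display_name": "Landing zone",
            "type": "RAW",  # raw data, pre-curation
            "resource_spec": {"location_type": "SINGLE_REGION"},
        },
    }
)
print(operation.result().name)  # wait for the LRO and print the zone name
```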