DataDiscoverySpec

Spec for a data discovery scan.

JSON representation
{
  "bigqueryPublishingConfig": {
    object (BigQueryPublishingConfig)
  },

  // Union field resource_config can be only one of the following:
  "storageConfig": {
    object (StorageConfig)
  }
  // End of list of possible types for union field resource_config.
}
Fields
bigqueryPublishingConfig

object (BigQueryPublishingConfig)

Optional. Configuration for metadata publishing.

Union field resource_config. The configurations of the data discovery scan resource. resource_config can be only one of the following:
storageConfig

object (StorageConfig)

Cloud Storage related configurations.
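
As a rough illustration of the shape described above, the following sketch assembles a DataDiscoverySpec body as a plain Python dict and serializes it to JSON. The helper name build_data_discovery_spec and the pattern "sales/*" are hypothetical; only the field names come from this reference.

```python
import json


def build_data_discovery_spec(storage_config=None, bigquery_publishing_config=None):
    """Assemble a DataDiscoverySpec body, omitting fields that are not set."""
    spec = {}
    if bigquery_publishing_config is not None:
        spec["bigqueryPublishingConfig"] = bigquery_publishing_config
    # storageConfig is the only member of the resource_config union listed here.
    if storage_config is not None:
        spec["storageConfig"] = storage_config
    return spec


spec = build_data_discovery_spec(
    storage_config={"includePatterns": ["sales/*"]},  # hypothetical pattern
    bigquery_publishing_config={"tableType": "EXTERNAL"},
)
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Because both top-level fields are optional, the helper leaves unset fields out of the body rather than emitting null values.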

BigQueryPublishingConfig

Describes BigQuery publishing configurations.

JSON representation
{
  "tableType": enum (TableType),
  "connection": string
}
Fields
tableType

enum (TableType)

Optional. Determines whether to publish discovered tables as BigLake external tables or non-BigLake external tables.

connection

string

Optional. The BigQuery connection used to create BigLake tables. Must be in the form projects/{projectId}/locations/{locationId}/connections/{connectionId}.
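
The connection format can be checked on the client before a request is sent. The sketch below is one possible check, assuming each path segment is a non-empty string without slashes; the service may apply stricter rules to individual IDs. The function name and the sample values are hypothetical.

```python
import re

# Assumption: each segment is any non-empty string that contains no "/".
CONNECTION_RE = re.compile(r"projects/[^/]+/locations/[^/]+/connections/[^/]+")


def is_valid_connection_name(name: str) -> bool:
    """Return True if name has the documented three-part resource form."""
    return CONNECTION_RE.fullmatch(name) is not None


print(is_valid_connection_name("projects/my-project/locations/us/connections/my-conn"))
print(is_valid_connection_name("my-conn"))
```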

TableType

Determines how discovered tables are published.

Enums
TABLE_TYPE_UNSPECIFIED

Table type unspecified.

EXTERNAL

Default. Discovered tables are published as BigQuery external tables whose data is accessed using the credentials of the user querying the table.

BIGLAKE

Discovered tables are published as BigLake external tables whose data is accessed using the credentials of the associated BigQuery connection.
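
A client might model these values as an enum and pair them with the connection field. The sketch below does that under one assumption that this reference does not state explicitly: that a BIGLAKE configuration without a connection should be rejected on the client. The helper make_publishing_config and the sample connection name are hypothetical.

```python
from enum import Enum
from typing import Optional


class TableType(str, Enum):
    TABLE_TYPE_UNSPECIFIED = "TABLE_TYPE_UNSPECIFIED"
    EXTERNAL = "EXTERNAL"
    BIGLAKE = "BIGLAKE"


def make_publishing_config(table_type: TableType, connection: Optional[str] = None) -> dict:
    """Build a BigQueryPublishingConfig body as a plain dict."""
    # Assumed client-side rule: BigLake tables read data through a connection,
    # so require one here. The API reference marks the field as optional.
    if table_type is TableType.BIGLAKE and not connection:
        raise ValueError("BIGLAKE publishing needs a BigQuery connection")
    config = {"tableType": table_type.value}
    if connection:
        config["connection"] = connection
    return config


print(make_publishing_config(TableType.EXTERNAL))
print(make_publishing_config(
    TableType.BIGLAKE,
    "projects/my-project/locations/us/connections/my-conn",
))
```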

StorageConfig

Configurations related to Cloud Storage as the data source.

JSON representation
{
  "includePatterns": [
    string
  ],
  "excludePatterns": [
    string
  ],
  "csvOptions": {
    object (CsvOptions)
  },
  "jsonOptions": {
    object (JsonOptions)
  }
}
Fields
includePatterns[]

string

Optional. Defines the data to include during discovery when only a subset of the data should be considered. Provide a list of patterns that identify the data to include. For Cloud Storage bucket assets, these patterns are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these patterns are interpreted as patterns to match table names.

excludePatterns[]

string

Optional. Defines the data to exclude during discovery. Provide a list of patterns that identify the data to exclude. For Cloud Storage bucket assets, these patterns are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these patterns are interpreted as patterns to match table names.

csvOptions

object (CsvOptions)

Optional. Configuration for CSV data.

jsonOptions

object (JsonOptions)

Optional. Configuration for JSON data.
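
To get a feel for how include and exclude patterns might interact, the sketch below filters object names locally with Python's fnmatch module. It rests on several assumptions not confirmed by this reference: an empty include list means everything is included, exclude patterns take precedence over include patterns, and fnmatch semantics (where * also matches "/") approximate the service's glob matching. The function name and the sample object names are hypothetical.

```python
from fnmatch import fnmatchcase


def is_discovered(object_name, include_patterns=(), exclude_patterns=()):
    """Approximate whether an object name would be considered by discovery."""
    # Assumption: with no include patterns, every object is a candidate.
    included = not include_patterns or any(
        fnmatchcase(object_name, pattern) for pattern in include_patterns
    )
    # Assumption: a match on any exclude pattern removes the object.
    excluded = any(fnmatchcase(object_name, pattern) for pattern in exclude_patterns)
    return included and not excluded


for name in ["sales/2024/data.csv", "sales/2024/data.tmp", "logs/app.csv"]:
    print(name, is_discovered(name, ["sales/*"], ["*.tmp"]))
```

A local check like this is useful only as a sanity test of pattern lists; the scan results remain the authority on what was matched.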

CsvOptions

Describes CSV and similar semi-structured data formats.

JSON representation
{
  "headerRows": integer,
  "delimiter": string,
  "encoding": string,
  "typeInferenceDisabled": boolean,
  "quote": string
}
Fields
headerRows

integer

Optional. The number of rows to interpret as header rows that should be skipped when reading data rows.

delimiter

string

Optional. The delimiter that is used to separate values. The default is , (comma).

encoding

string

Optional. The character encoding of the data. The default is UTF-8.

typeInferenceDisabled

boolean

Optional. Whether to disable the inference of data types for CSV data. If true, all columns are registered as strings.

quote

string

Optional. The character used to quote column values. Accepts " (double quotation mark) or ' (single quotation mark). If unspecified, defaults to " (double quotation mark).
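
The effect of these options can be approximated locally with Python's csv module. The sketch below applies the documented defaults for delimiter, encoding, and quote, and assumes a default of 0 for headerRows, which this reference does not state. It illustrates the option semantics only and does not reproduce how the discovery scan parses files. The function name and sample data are hypothetical.

```python
import csv
import io


def read_csv_rows(raw: bytes, options: dict) -> list:
    """Parse raw CSV bytes using a CsvOptions-shaped dict and return data rows."""
    header_rows = options.get("headerRows", 0)  # assumed default
    delimiter = options.get("delimiter", ",")
    encoding = options.get("encoding", "UTF-8")
    quote = options.get("quote", '"')

    text = raw.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter, quotechar=quote)
    rows = list(reader)
    # Header rows are skipped, not returned as data.
    return rows[header_rows:]


raw = b"id|name\n1|'a|b'\n2|c\n"
rows = read_csv_rows(raw, {"headerRows": 1, "delimiter": "|", "quote": "'"})
print(rows)
```

In the sample, the quoted value 'a|b' keeps its embedded delimiter because the quote option is set to the single quotation mark.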

JsonOptions

Describes JSON data format.

JSON representation
{
  "encoding": string,
  "typeInferenceDisabled": boolean
}
Fields
encoding

string

Optional. The character encoding of the data. The default is UTF-8.

typeInferenceDisabled

boolean

Optional. Whether to disable the inference of data types for JSON data. If true, all columns are registered as their primitive types (string, number, or boolean).
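
As a loose illustration of what "primitive types" means for a JSON record, the sketch below maps each top-level value of one JSON object to string, number, or boolean. It does not model the service's inference logic. How null values, arrays, and nested objects are handled is not covered by this reference, so the sketch labels them "other". The function names and the sample record are hypothetical.

```python
import json


def primitive_type(value) -> str:
    """Name the JSON primitive type of a decoded value."""
    # bool must be tested before int/float because bool subclasses int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "other"  # null, arrays, objects: behavior not specified here


def primitive_schema(line: bytes, encoding: str = "UTF-8") -> dict:
    """Map each top-level key of one JSON object to its primitive type name."""
    record = json.loads(line.decode(encoding))
    return {key: primitive_type(value) for key, value in record.items()}


schema = primitive_schema(b'{"id": 1, "name": "a", "active": true}')
print(schema)
```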