DataProfileResult

DataProfileResult defines the output of a DataProfileScan. Each field of the table has a profile result specific to its field type.

JSON representation
{
  "rowCount": string,
  "profile": {
    object (Profile)
  },
  "scannedData": {
    object (ScannedData)
  },
  "postScanActionsResult": {
    object (PostScanActionsResult)
  }
}
Fields
rowCount

string (int64 format)

The count of rows scanned.

profile

object (Profile)

The profile information per field.

scannedData

object (ScannedData)

The data scanned for this result.

postScanActionsResult

object (PostScanActionsResult)

Output only. The result of the post-scan actions.
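As a sketch of consuming this message, the Python below parses a hand-written DataProfileResult body and converts rowCount, which is an int64 serialized as a JSON string, to an int. The JSON payload is invented for illustration (only the field names come from this reference), and the summarize helper is hypothetical.

```python
import json

# Hypothetical response body; only the field names are taken from the reference.
RAW = """
{
  "rowCount": "4",
  "profile": {
    "fields": [
      {"name": "age", "type": "INT64", "mode": "NULLABLE",
       "profile": {"nullRatio": 0.25, "integerProfile": {"min": "18", "max": "65"}}}
    ]
  }
}
"""


def summarize(result: dict) -> dict:
    """Return the row count and a name -> (type, mode) map for every profiled field."""
    # int64 values arrive as JSON strings, so convert them explicitly.
    row_count = int(result.get("rowCount", "0"))
    fields = result.get("profile", {}).get("fields", [])
    return {
        "rowCount": row_count,
        "fields": {f["name"]: (f.get("type"), f.get("mode")) for f in fields},
    }


summary = summarize(json.loads(RAW))
print(summary["rowCount"])  # 4
```

Converting int64 strings at the boundary keeps later arithmetic (for example, comparing TopNValue counts with rowCount) in native integers.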

Profile

Contains the name, type, mode, and field-type-specific profile information of each field.

JSON representation
{
  "fields": [
    {
      object (Field)
    }
  ]
}
Fields
fields[]

object (Field)

List of fields with structural and profile information for each field.

Field

A field within a table.

JSON representation
{
  "name": string,
  "type": string,
  "mode": string,
  "profile": {
    object (ProfileInfo)
  }
}
Fields
name

string

The name of the field.

type

string

The data type retrieved from the schema of the data source. For instance, for a BigQuery native table, it is the BigQuery Table Schema. For a Dataplex Entity, it is the Entity Schema.

mode

string

The mode of the field. Possible values include:

  • REQUIRED, if it is a required field.
  • NULLABLE, if it is an optional field.
  • REPEATED, if it is a repeated field.

profile

object (ProfileInfo)

Profile information for the corresponding field.

ProfileInfo

The profile information for each field type.

JSON representation
{
  "nullRatio": number,
  "distinctRatio": number,
  "topNValues": [
    {
      object (TopNValue)
    }
  ],

  // Union field field_info can be only one of the following:
  "stringProfile": {
    object (StringFieldInfo)
  },
  "integerProfile": {
    object (IntegerFieldInfo)
  },
  "doubleProfile": {
    object (DoubleFieldInfo)
  }
  // End of list of possible types for union field field_info.
}
Fields
nullRatio

number

Ratio of rows with a null value to the total number of scanned rows.

distinctRatio

number

Ratio of distinct values to the total number of scanned rows. Not available for the complex, non-groupable field type RECORD or for fields in REPEATED mode.
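The two ratios can be illustrated on a small in-memory column. This is a sketch under stated assumptions: None stands in for null, and distinctRatio is taken to be the number of distinct non-null values divided by all scanned rows, which is one reading of the definition above rather than a confirmed formula. The helper names are hypothetical.

```python
from typing import Any, Optional, Sequence


def null_ratio(values: Sequence[Optional[Any]]) -> float:
    """Fraction of scanned rows whose value is null (None)."""
    if not values:
        raise ValueError("no rows scanned")
    return sum(v is None for v in values) / len(values)


def distinct_ratio(values: Sequence[Optional[Any]]) -> float:
    """Assumed definition: distinct non-null values divided by all scanned rows."""
    if not values:
        raise ValueError("no rows scanned")
    distinct = {v for v in values if v is not None}
    return len(distinct) / len(values)


column = ["a", "b", "a", None]
print(null_ratio(column))      # 0.25
print(distinct_ratio(column))  # 0.5
```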

topNValues[]

object (TopNValue)

The list of the top N non-null values, together with the count and ratio with which each occurs in the scanned data. N is 10 or the number of distinct values in the field, whichever is smaller. Not available for the complex, non-groupable field type RECORD or for fields in REPEATED mode.
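A minimal sketch of how such a list could be derived, assuming None represents null and that ties are broken by first occurrence (the service's tie-breaking is not documented here). The output mirrors the TopNValue shape: value as a string, count as an int64 string, and ratio against the total number of scanned rows. The top_n_values helper is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Sequence


def top_n_values(values: Sequence[Optional[Any]], limit: int = 10) -> List[Dict[str, Any]]:
    """Return up to `limit` most frequent non-null values in TopNValue-like form."""
    counts = Counter(v for v in values if v is not None)
    total_rows = len(values)  # ratio is taken against all scanned rows, nulls included
    # most_common(limit) yields at most `limit` items, so N = min(limit, distinct count).
    return [
        {"value": str(v), "count": str(c), "ratio": c / total_rows}
        for v, c in counts.most_common(limit)
    ]


print(top_n_values(["x", "y", "x", None]))
# [{'value': 'x', 'count': '2', 'ratio': 0.5}, {'value': 'y', 'count': '1', 'ratio': 0.25}]
```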

Union field field_info. Structural and profile information for a specific field type. Not available if the mode is REPEATED. field_info can be only one of the following:
stringProfile

object (StringFieldInfo)

String type field information.

integerProfile

object (IntegerFieldInfo)

Integer type field information.

doubleProfile

object (DoubleFieldInfo)

Double type field information.
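Because field_info is a union, a client should expect at most one of stringProfile, integerProfile, or doubleProfile on a ProfileInfo object, and none at all for REPEATED fields or types without a type-specific profile. The hypothetical helper below sketches that check on the parsed JSON.

```python
from typing import Optional, Tuple

# Member names of the field_info union, as listed in this reference.
UNION_MEMBERS = ("stringProfile", "integerProfile", "doubleProfile")


def field_info(profile_info: dict) -> Optional[Tuple[str, dict]]:
    """Return (member name, payload) for the populated field_info member, or None."""
    present = [(k, profile_info[k]) for k in UNION_MEMBERS if k in profile_info]
    if len(present) > 1:
        names = [k for k, _ in present]
        raise ValueError(f"field_info union has more than one member set: {names}")
    return present[0] if present else None


print(field_info({"nullRatio": 0.0, "doubleProfile": {"min": 1.5}}))
# ('doubleProfile', {'min': 1.5})
```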

TopNValue

Top N non-null values in the scanned data.

JSON representation
{
  "value": string,
  "count": string,
  "ratio": number
}
Fields
value

string

String representation of one of the top N non-null values.

count

string (int64 format)

Count of the corresponding value in the scanned data.

ratio

number

Ratio of the count of the corresponding value to the total number of rows in the scanned data.

StringFieldInfo

The profile information for a string type field.

JSON representation
{
  "minLength": string,
  "maxLength": string,
  "averageLength": number
}
Fields
minLength

string (int64 format)

Minimum length of non-null values in the scanned data.

maxLength

string (int64 format)

Maximum length of non-null values in the scanned data.

averageLength

number

Average length of non-null values in the scanned data.
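A sketch of the three string statistics, assuming None represents null and that length means Python's len(), which counts Unicode code points; the service may measure length differently (for example, in bytes). The string_field_info helper is hypothetical; minLength and maxLength are emitted as strings to match their int64 encoding.

```python
from typing import Optional, Sequence


def string_field_info(values: Sequence[Optional[str]]) -> dict:
    """Compute StringFieldInfo-like statistics over the non-null values."""
    lengths = [len(v) for v in values if v is not None]
    if not lengths:
        return {}  # hypothetical convention: no statistics when every value is null
    return {
        "minLength": str(min(lengths)),
        "maxLength": str(max(lengths)),
        "averageLength": sum(lengths) / len(lengths),
    }


print(string_field_info(["ab", "abcd", None]))
# {'minLength': '2', 'maxLength': '4', 'averageLength': 3.0}
```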

IntegerFieldInfo

The profile information for an integer type field.

JSON representation
{
  "average": number,
  "standardDeviation": number,
  "min": string,
  "quartiles": [
    string
  ],
  "max": string
}
Fields
average

number

Average of non-null values in the scanned data. NaN if the field contains a NaN.

standardDeviation

number

Standard deviation of non-null values in the scanned data. NaN if the field contains a NaN.

min

string (int64 format)

Minimum of non-null values in the scanned data. NaN if the field contains a NaN.

quartiles[]

string (int64 format)

Quartiles divide the data points into four parts, or quarters, of roughly equal size. The three main quartiles are:

  • The first quartile (Q1) splits off the lowest 25% of the data from the highest 75%. It is also known as the lower or 25th empirical quartile, because 25% of the data lies below this point.
  • The second quartile (Q2) is the median of the data set, so 50% of the data lies below this point.
  • The third quartile (Q3) splits off the highest 25% of the data from the lowest 75%. It is also known as the upper or 75th empirical quartile, because 75% of the data lies below this point.

Here, quartiles is provided as an ordered list of approximate quartile values for the scanned data, in the order Q1, median, Q3.
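For intuition, the ordered [Q1, median, Q3] list can be computed exactly with Python's statistics.quantiles. This is a sketch, not the service's algorithm: the reference says the reported values are approximate (and, for integer fields, encoded as int64 strings), so results on real data may differ. The "inclusive" interpolation method and the quartiles helper name are assumptions.

```python
import statistics
from typing import List, Optional, Sequence


def quartiles(values: Sequence[Optional[float]]) -> List[float]:
    """Return [Q1, median, Q3] of the non-null values."""
    data = [v for v in values if v is not None]
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3 in order.
    # method="inclusive" treats the data as the whole population (an assumption).
    return statistics.quantiles(data, n=4, method="inclusive")


print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9, None]))  # [3.0, 5.0, 7.0]
```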

max

string (int64 format)

Maximum of non-null values in the scanned data. NaN if the field contains a NaN.

DoubleFieldInfo

The profile information for a double type field.

JSON representation
{
  "average": number,
  "standardDeviation": number,
  "min": number,
  "quartiles": [
    number
  ],
  "max": number
}
Fields
average

number

Average of non-null values in the scanned data. NaN if the field contains a NaN.

standardDeviation

number

Standard deviation of non-null values in the scanned data. NaN if the field contains a NaN.

min

number

Minimum of non-null values in the scanned data. NaN if the field contains a NaN.

quartiles[]

number

Quartiles divide the data points into four parts, or quarters, of roughly equal size. The three main quartiles are:

  • The first quartile (Q1) splits off the lowest 25% of the data from the highest 75%. It is also known as the lower or 25th empirical quartile, because 25% of the data lies below this point.
  • The second quartile (Q2) is the median of the data set, so 50% of the data lies below this point.
  • The third quartile (Q3) splits off the highest 25% of the data from the lowest 75%. It is also known as the upper or 75th empirical quartile, because 75% of the data lies below this point.

Here, quartiles is provided as an ordered list of quartile values for the scanned data, in the order Q1, median, Q3.

max

number

Maximum of non-null values in the scanned data. NaN if the field contains a NaN.
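A sketch of the DoubleFieldInfo statistics (quartiles aside), assuming None represents null. Two assumptions are worth flagging: the population standard deviation is used (the service could equally use the sample form with n - 1), and NaN is checked explicitly, because Python's min() and max() return order-dependent results when a NaN is present. The double_field_info helper is hypothetical.

```python
import math
from typing import Optional, Sequence


def double_field_info(values: Sequence[Optional[float]]) -> dict:
    """Compute average, standard deviation, min, and max over non-null values."""
    data = [v for v in values if v is not None]
    if not data:
        return {}
    if any(math.isnan(v) for v in data):
        # Per the reference, each statistic is NaN if the field contains a NaN.
        return {"average": math.nan, "standardDeviation": math.nan,
                "min": math.nan, "max": math.nan}
    mean = sum(data) / len(data)
    # Population standard deviation (divide by n); an assumption, see lead-in.
    std = math.sqrt(sum((v - mean) ** 2 for v in data) / len(data))
    return {"average": mean, "standardDeviation": std, "min": min(data), "max": max(data)}


print(double_field_info([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, None]))
# {'average': 5.0, 'standardDeviation': 2.0, 'min': 2.0, 'max': 9.0}
```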

PostScanActionsResult

The result of the post-scan actions of a DataProfileScan job.

JSON representation
{
  "bigqueryExportResult": {
    object (BigQueryExportResult)
  }
}
Fields
bigqueryExportResult

object (BigQueryExportResult)

Output only. The result of the BigQuery export post-scan action.

BigQueryExportResult

The result of the BigQuery export post-scan action.

JSON representation
{
  "state": enum (State),
  "message": string
}
Fields
state

enum (State)

Output only. Execution state of the BigQuery export.

message

string

Output only. Additional information about the BigQuery export.

State

Execution state of the export.

Enums
STATE_UNSPECIFIED The export state is unspecified.
SUCCEEDED The export completed successfully.
FAILED The export is no longer running due to an error.
SKIPPED The export was skipped because there was no valid scan result to export (usually because the scan failed).
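In the JSON representation, enum values are serialized as their names, so a client can branch on the state string directly. The hypothetical helper below is one way to treat the states: it raises on FAILED (surfacing message), returns True only for SUCCEEDED, and treats SKIPPED, STATE_UNSPECIFIED, or a missing result as "not exported". That policy is an illustrative assumption, not documented behavior.

```python
def export_succeeded(result: dict) -> bool:
    """Inspect DataProfileResult.postScanActionsResult.bigqueryExportResult."""
    export = result.get("postScanActionsResult", {}).get("bigqueryExportResult", {})
    state = export.get("state", "STATE_UNSPECIFIED")
    if state == "FAILED":
        # message carries additional information about the export.
        raise RuntimeError(f"BigQuery export failed: {export.get('message', '')}")
    # SKIPPED and STATE_UNSPECIFIED both mean nothing was exported.
    return state == "SUCCEEDED"


ok = {"postScanActionsResult": {"bigqueryExportResult": {"state": "SUCCEEDED"}}}
print(export_succeeded(ok))  # True
```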