REST Resource: projects.locations.datasets.tableSpecs.columnSpecs

Resource: ColumnSpec

A representation of a column in a relational table. When listing them, column specs are returned in the same order in which they were given on import.

Used by:
* Tables

JSON representation
{
  "name": string,
  "dataType": {
    object (DataType)
  },
  "displayName": string,
  "dataStats": {
    object (DataStats)
  },
  "topCorrelatedColumns": [
    {
      object (CorrelatedColumn)
    }
  ],
  "etag": string
}
Fields
name

string

Output only. The resource name of the column specs. Form:

projects/{project_id}/locations/{locationId}/datasets/{datasetId}/tableSpecs/{tableSpecId}/columnSpecs/{columnSpecId}
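The resource name form above can be split into its components with a small helper. This is a minimal sketch: the function name, the dictionary keys, and the sample IDs are hypothetical, and it assumes that no ID contains a forward slash.

```python
import re

# Pattern mirroring the documented form; each ID is assumed to contain no "/".
_COLUMN_SPEC_NAME = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/datasets/(?P<dataset>[^/]+)"
    r"/tableSpecs/(?P<table_spec>[^/]+)"
    r"/columnSpecs/(?P<column_spec>[^/]+)"
)


def parse_column_spec_name(name: str) -> dict:
    """Split a ColumnSpec resource name into its ID components."""
    match = _COLUMN_SPEC_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a ColumnSpec resource name: {name!r}")
    return match.groupdict()


# Hypothetical IDs, for illustration only.
parts = parse_column_spec_name(
    "projects/my-project/locations/us-central1/datasets/ds1"
    "/tableSpecs/ts1/columnSpecs/cs1"
)
```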

dataType

object (DataType)

The data type of elements stored in the column.

displayName

string

Output only. The name of the column to show in the interface. The name can be up to 100 characters long, can consist only of ASCII Latin letters A-Z and a-z, ASCII digits 0-9, underscores (_), and forward slashes (/), and must start with a letter or a digit.
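The displayName constraints can be checked client-side with a regular expression. The sketch below is only an illustration of the stated rules; the helper name is hypothetical and the service may apply checks that are not described here.

```python
import re

# First character: letter or digit. Remaining 0-99 characters: letters,
# digits, underscores, or forward slashes. Total length is at most 100.
_DISPLAY_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_/]{0,99}")


def is_valid_display_name(name: str) -> bool:
    """Return True if name satisfies the documented displayName rules."""
    return _DISPLAY_NAME.fullmatch(name) is not None
```

For example, `is_valid_display_name("sales/2020_total")` passes, while a name that starts with an underscore or exceeds 100 characters does not.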

dataStats

object (DataStats)

Output only. Stats of the series of values in the column. This field may be stale; see the ancestor's Dataset.tables_dataset_metadata.stats_update_time field for the timestamp at which these stats were last updated.

topCorrelatedColumns[]

object (CorrelatedColumn)

Deprecated.

etag

string

Used to perform consistent read-modify-write updates. If not set, a blind "overwrite" update happens.
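A read-modify-write update carries the etag from the previously read ColumnSpec into the update body. The following is a minimal sketch of building such a body; the helper name and the sample values are hypothetical, and how the request is sent, and how the service reacts to a stale etag, are not covered here.

```python
def build_patch_body(current: dict, new_data_type: dict) -> dict:
    """Build an update body for a ColumnSpec that was read earlier.

    `current` is the ColumnSpec JSON as returned by a previous read.
    Copying its etag requests a consistent update; leaving the etag
    out results in a blind overwrite, as described for the etag field.
    """
    body = {"name": current["name"], "dataType": new_data_type}
    if "etag" in current:
        body["etag"] = current["etag"]
    return body


# Hypothetical values, for illustration only.
fetched = {"name": "example-column-spec-name", "etag": "abc123"}
patch_body = build_patch_body(fetched, {"typeCode": "CATEGORY"})
```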

DataType

Indicates the type of data that can be stored in a structured data entity (e.g. a table).

JSON representation
{
  "typeCode": enum (TypeCode),
  "nullable": boolean,

  // Union field details can be only one of the following:
  "listElementType": {
    object (DataType)
  },
  "structType": {
    object (StructType)
  },
  "timeFormat": string
  // End of list of possible types for union field details.
}
Fields
typeCode

enum (TypeCode)

Required. The TypeCode for this type.

nullable

boolean

If true, this DataType can also be NULL. In CSV files, a NULL value is expressed as an empty string.

Union field details. Details of DataType-s that need additional specification. details can be only one of the following:
listElementType

object (DataType)

If typeCode == ARRAY, then listElementType is the type of the elements.

structType

object (StructType)

If typeCode == STRUCT, then structType provides type information for the struct's fields.

timeFormat

string

If typeCode == TIMESTAMP then timeFormat provides the format in which that time field is expressed. The timeFormat must either be one of:
* UNIX_SECONDS
* UNIX_MILLISECONDS
* UNIX_MICROSECONDS
* UNIX_NANOSECONDS
(for respectively number of seconds, milliseconds, microseconds and nanoseconds since start of the Unix epoch); or be written in strftime syntax. If timeFormat is not set, then the default format as described on the typeCode is used.
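The two kinds of timeFormat values can be handled with one small decoder. This is a sketch under stated assumptions: the function name is hypothetical, it covers only the case where timeFormat is set, it treats any non-UNIX_* value as a format string understood by Python's `datetime.strptime` (whose directives may not match the service's strftime dialect exactly), and it truncates nanosecond values to microsecond precision.

```python
from datetime import datetime, timedelta, timezone

# Number of ticks per second for each UNIX_* format.
_TICKS_PER_SECOND = {
    "UNIX_SECONDS": 1,
    "UNIX_MILLISECONDS": 10**3,
    "UNIX_MICROSECONDS": 10**6,
    "UNIX_NANOSECONDS": 10**9,
}
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_timestamp(value: str, time_format: str) -> datetime:
    """Decode a TIMESTAMP string according to an explicit timeFormat."""
    if time_format in _TICKS_PER_SECOND:
        ticks = int(value)
        # Integer arithmetic avoids float rounding; nanoseconds are truncated.
        micros = ticks * 10**6 // _TICKS_PER_SECOND[time_format]
        return _EPOCH + timedelta(microseconds=micros)
    # Otherwise treat timeFormat as a strftime-style pattern (assumption).
    return datetime.strptime(value, time_format)
```

UNIX_* values come back as timezone-aware UTC datetimes, while pattern-based values are naive unless the pattern itself includes an offset directive.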

StructType

StructType defines the DataType-s of a STRUCT type.

JSON representation
{
  "fields": {
    string: {
      object(DataType)
    },
    ...
  }
}
Fields
fields

map (key: string, value: object (DataType))

Unordered map of struct field names to their data types. Fields cannot be added or removed via Update. Their names and data types are still mutable.

TypeCode

TypeCode is used as a part of DataType.

Enums
TYPE_CODE_UNSPECIFIED Not specified. Should not be used.
FLOAT64 Encoded as number, or the strings "NaN", "Infinity", or "-Infinity".
TIMESTAMP Must be between 0AD and 9999AD. Encoded as string according to timeFormat, or, if that format is not set, then in RFC 3339 date-time format, where time-offset = "Z" (e.g. 1985-04-12T23:20:50.52Z).
STRING Encoded as string.
ARRAY Encoded as list, where the list elements are represented according to listElementType.

STRUCT Encoded as struct, where field values are represented according to structType.
CATEGORY Values of this type are not further understood by AutoML, e.g. AutoML is unable to tell the order of values (as it could with FLOAT64), or is unable to say if one value contains another (as it could with STRING). Encoded as string (bytes should be base64-encoded, as described in RFC 4648, section 4).

DataStats

The data statistics of a series of values that share the same DataType.

JSON representation
{
  "distinctValueCount": string,
  "nullValueCount": string,
  "validValueCount": string,

  // Union field stats can be only one of the following:
  "float64Stats": {
    object (Float64Stats)
  },
  "stringStats": {
    object (StringStats)
  },
  "timestampStats": {
    object (TimestampStats)
  },
  "arrayStats": {
    object (ArrayStats)
  },
  "structStats": {
    object (StructStats)
  },
  "categoryStats": {
    object (CategoryStats)
  }
  // End of list of possible types for union field stats.
}
Fields
distinctValueCount

string (int64 format)

The number of distinct values.

nullValueCount

string (int64 format)

The number of values that are null.

validValueCount

string (int64 format)

The number of values that are valid.

Union field stats. The data statistics specific to a DataType. stats can be only one of the following:
float64Stats

object (Float64Stats)

The statistics for FLOAT64 DataType.

stringStats

object (StringStats)

The statistics for STRING DataType.

timestampStats

object (TimestampStats)

The statistics for TIMESTAMP DataType.

arrayStats

object (ArrayStats)

The statistics for ARRAY DataType.

structStats

object (StructStats)

The statistics for STRUCT DataType.

categoryStats

object (CategoryStats)

The statistics for CATEGORY DataType.
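Because the count fields are int64 values encoded as strings, and at most one member of the stats union is set, a client usually converts the counts and detects which type-specific stats object is present. The sketch below assumes that a count missing from the JSON means zero; the helper names and sample values are hypothetical.

```python
_COUNT_FIELDS = ("distinctValueCount", "nullValueCount", "validValueCount")
_STATS_UNION_FIELDS = (
    "float64Stats",
    "stringStats",
    "timestampStats",
    "arrayStats",
    "structStats",
    "categoryStats",
)


def read_counts(data_stats: dict) -> dict:
    """Convert the string-encoded int64 counts to Python ints.

    A missing count is treated as 0 (an assumption, not documented here).
    """
    return {field: int(data_stats.get(field, "0")) for field in _COUNT_FIELDS}


def stats_kind(data_stats: dict):
    """Return the name of the stats union member that is set, or None."""
    present = [field for field in _STATS_UNION_FIELDS if field in data_stats]
    if len(present) > 1:
        raise ValueError(f"more than one stats member set: {present}")
    return present[0] if present else None


# Hypothetical DataStats payload, for illustration only.
sample = {
    "distinctValueCount": "42",
    "nullValueCount": "3",
    "validValueCount": "997",
    "float64Stats": {"mean": 1.5, "standardDeviation": 0.5},
}
counts = read_counts(sample)
kind = stats_kind(sample)
```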

Float64Stats

The data statistics of a series of FLOAT64 values.

JSON representation
{
  "mean": number,
  "standardDeviation": number,
  "quantiles": [
    number
  ],
  "histogramBuckets": [
    {
      object (HistogramBucket)
    }
  ]
}
Fields
mean

number

The mean of the series.

standardDeviation

number

The standard deviation of the series.

quantiles[]

number

The k-quantile values of the data series of n values, at indices 0 through k. The value at index i is, approximately, the i*n/k-th smallest value in the series; for i = 0 and i = k these are, respectively, the min and max values.
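As an illustration of the indexing rule, the sketch below reads the minimum, the maximum, and (when k is even) the approximate median from a quantiles list. It assumes the list holds k + 1 entries, one for each index from 0 to k; the helper name and sample values are hypothetical.

```python
def summarize_quantiles(quantiles: list) -> dict:
    """Read min, max, and (for even k) the approximate median."""
    k = len(quantiles) - 1
    if k < 1:
        raise ValueError("need at least two quantile values")
    summary = {"min": quantiles[0], "max": quantiles[k]}
    if k % 2 == 0:
        # Index k/2 is approximately the (n/2)-th smallest value.
        summary["median"] = quantiles[k // 2]
    return summary


# Hypothetical 4-quantiles (quartiles), for illustration only.
quartile_summary = summarize_quantiles([1.0, 2.0, 3.5, 7.0, 10.0])
```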

histogramBuckets[]

object (HistogramBucket)

Histogram buckets of the data series, sorted in ascending order by the min value of the bucket. The number of buckets is determined dynamically. The buckets are non-overlapping and completely cover the whole FLOAT64 range, with the min of the first bucket being "-Infinity" and the max of the last one being "Infinity".

HistogramBucket

A bucket of a histogram.

JSON representation
{
  "min": number,
  "max": number,
  "count": string
}
Fields
min

number

The minimum value of the bucket, inclusive.

max

number

The maximum value of the bucket, exclusive unless max = "Infinity", in which case it's inclusive.

count

string (int64 format)

The number of data values that are in the bucket, i.e. are between min and max values.
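The bucket boundary rules (min inclusive, max exclusive unless it is "Infinity") can be sketched as a lookup. The helper names and the sample buckets are hypothetical; the bounds may arrive as JSON numbers or as the strings "-Infinity" and "Infinity", both of which Python's `float()` accepts.

```python
def bucket_contains(bucket: dict, value: float) -> bool:
    """Apply the documented bounds: min inclusive, max exclusive,
    except that a max of Infinity is inclusive."""
    low = float(bucket["min"])
    high = float(bucket["max"])
    if high == float("inf"):
        return low <= value <= high
    return low <= value < high


def find_bucket(buckets: list, value: float):
    """Return the first bucket containing value, or None (e.g. for NaN)."""
    for bucket in buckets:
        if bucket_contains(bucket, value):
            return bucket
    return None


# Hypothetical buckets, for illustration only.
sample_buckets = [
    {"min": "-Infinity", "max": 0, "count": "3"},
    {"min": 0, "max": 10, "count": "5"},
    {"min": 10, "max": "Infinity", "count": "2"},
]
```

A value that sits exactly on a shared boundary, such as 10 here, belongs to the bucket whose min equals it.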

StringStats

The data statistics of a series of STRING values.

JSON representation
{
  "topUnigramStats": [
    {
      object (UnigramStats)
    }
  ]
}
Fields
topUnigramStats[]

object (UnigramStats)

The statistics of the top 20 unigrams, ordered by count.

UnigramStats

The statistics of a unigram.

JSON representation
{
  "value": string,
  "count": string
}
Fields
value

string

The unigram.

count

string (int64 format)

The number of occurrences of this unigram in the series.

TimestampStats

The data statistics of a series of TIMESTAMP values.

JSON representation
{
  "granularStats": {
    string: {
      object(GranularStats)
    },
    ...
  }
}
Fields
granularStats

map (key: string, value: object (GranularStats))

The string key is the pre-defined granularity. Currently supported: hour_of_day, day_of_week, month_of_year. Granularities finer than the granularity of timestamp data are not populated (e.g. if timestamps are at day granularity, then hour_of_day is not populated).

ArrayStats

The data statistics of a series of ARRAY values.

JSON representation
{
  "memberStats": {
    object (DataStats)
  }
}
Fields
memberStats

object (DataStats)

Stats of all the values of all arrays, as if they were a single long series of data. The type depends on the element type of the array.

StructStats

The data statistics of a series of STRUCT values.

JSON representation
{
  "fieldStats": {
    string: {
      object(DataStats)
    },
    ...
  }
}
Fields
fieldStats

map (key: string, value: object (DataStats))

Map from a field name of the struct to data stats aggregated over series of all data in that field across all the structs.

CategoryStats

The data statistics of a series of CATEGORY values.

JSON representation
{
  "topCategoryStats": [
    {
      object (SingleCategoryStats)
    }
  ]
}
Fields
topCategoryStats[]

object (SingleCategoryStats)

The statistics of the top 20 CATEGORY values, ordered by count.

SingleCategoryStats

The statistics of a single CATEGORY value.

JSON representation
{
  "value": string,
  "count": string
}
Fields
value

string

The CATEGORY value.

count

string (int64 format)

The number of occurrences of this value in the series.

CorrelatedColumn

Identifies the table's column, and its correlation with the column this ColumnSpec describes.

JSON representation
{
  "columnSpecId": string,
  "correlationStats": {
    object (CorrelationStats)
  }
}
Fields
columnSpecId

string

The columnSpecId of the correlated column, which belongs to the same table as the in-context column.

correlationStats

object (CorrelationStats)

Correlation between this and the in-context column.

CorrelationStats

Correlation statistics between two series of DataType values. The series may have differing DataType-s, but within a single series the DataType must be the same.

JSON representation
{
  "cramersV": number
}
Fields
cramersV

number

The correlation value using the Cramer's V measure.
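For reference, the textbook definition of Cramér's V for a contingency table of observed counts is sqrt(chi2 / (n * min(r - 1, c - 1))), where chi2 is the chi-squared statistic, n the total count, and r and c the numbers of rows and columns. The sketch below implements that definition only; how the service builds the table from two columns, and whether it applies any corrections, is not stated here, so treat this as an illustration rather than the service's method.

```python
import math


def cramers_v(table: list) -> float:
    """Textbook Cramér's V for a contingency table of observed counts."""
    n = sum(sum(row) for row in table)
    dof = min(len(table) - 1, len(table[0]) - 1)
    if n == 0 or dof == 0:
        return 0.0  # Convention chosen for this sketch: no data, no association.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * dof))
```

The value ranges from 0 (no association in the table) to 1 (each category of one column determines the category of the other).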

Methods

get

Gets a column spec.

list

Lists column specs in a table spec.

patch

Updates a column spec.