REST Resource: projects.locations.datasets.tableSpecs

Resource: TableSpec

A specification of a relational table. The table's schema is represented via its child column specs. It is pre-populated as part of datasets.importData by the schema inference algorithm, whose version is a required parameter of the datasets.importData InputConfig.

Note: While working with a table, at times the schema may be inconsistent with the data in the table (e.g. string in a FLOAT64 column). The consistency validation is done upon creation of a model.

JSON representation
{
  "name": string,
  "timeColumnSpecId": string,
  "rowCount": string,
  "columnCount": string,
  "inputConfigs": [
    {
      object (InputConfig)
    }
  ],
  "etag": string
}
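For illustration, a minimal Python sketch of reading a table spec over REST and inspecting these fields. The v1beta1 endpoint host, the authentication flow, and all identifiers below are assumptions made for the example rather than values taken from this page.

import requests

# Hypothetical identifiers -- replace with your own.
PROJECT_ID = "my-project"
LOCATION_ID = "us-central1"
DATASET_ID = "TBL1234567890"
TABLE_SPEC_ID = "987654321"

# An OAuth 2.0 access token, e.g. from `gcloud auth print-access-token`.
ACCESS_TOKEN = "..."

name = (f"projects/{PROJECT_ID}/locations/{LOCATION_ID}"
        f"/datasets/{DATASET_ID}/tableSpecs/{TABLE_SPEC_ID}")
url = f"https://automl.googleapis.com/v1beta1/{name}"

resp = requests.get(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
resp.raise_for_status()
table_spec = resp.json()

# rowCount and columnCount are int64 values serialized as JSON strings.
print("rows:", int(table_spec.get("rowCount", "0")))
print("columns:", int(table_spec.get("columnCount", "0")))
print("etag:", table_spec.get("etag"))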
Fields
name

string

Output only. The resource name of the table spec. Form:

projects/{project_id}/locations/{locationId}/datasets/{datasetId}/tableSpecs/{tableSpecId}

timeColumnSpecId

string

columnSpecId of the time column. Only used if the parent dataset's mlUseColumnSpecId is not set. Used to split rows into TRAIN, VALIDATE, and TEST sets such that the oldest rows go to the TRAIN set, the newest to TEST, and those in between to VALIDATE. Required type: TIMESTAMP. If neither this column nor ml_use_column is set, then the ML use of all rows will be assigned by AutoML. NOTE: Updates of this field will instantly affect any other users concurrently working with the dataset.

rowCount

string (int64 format)

Output only. The number of rows (i.e. examples) in the table.

columnCount

string (int64 format)

Output only. The number of columns of the table. That is, the number of child ColumnSpecs.

inputConfigs[]

object (InputConfig)

Output only. The input configs through which the data currently residing in the table was imported.

etag

string

Used to perform consistent read-modify-write updates. If not set, a blind "overwrite" update happens.
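A sketch of that read-modify-write pattern: fetch the table spec, then send its etag back with the update so the write fails if someone else changed the resource in the meantime. Setting timeColumnSpecId here, the PATCH verb, and the updateMask field-mask query parameter are assumptions made for the example.

import requests

ACCESS_TOKEN = "..."  # e.g. from `gcloud auth print-access-token`
name = ("projects/my-project/locations/us-central1"
        "/datasets/TBL1234567890/tableSpecs/987654321")
base = "https://automl.googleapis.com/v1beta1"
headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# Read: fetch the current table spec, including its etag.
current = requests.get(f"{base}/{name}", headers=headers)
current.raise_for_status()
etag = current.json()["etag"]

# Modify + write: set timeColumnSpecId and pass the etag back so the
# update is rejected if the table spec was modified concurrently.
body = {
    "name": name,
    "timeColumnSpecId": "1122334455",  # hypothetical column spec id
    "etag": etag,
}
resp = requests.patch(
    f"{base}/{name}",
    headers=headers,
    params={"updateMask": "time_column_spec_id"},  # assumed field-mask syntax
    json=body,
)
resp.raise_for_status()
print(resp.json())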

InputConfig

Input configuration for datasets.importData action.

See Preparing your training data for more information.

You can provide model training data to AutoML Tables in two ways:

  • A BigQuery table. Specify the BigQuery URI in the bigquerySource field of your input configuration. The size of the table cannot exceed 100 GB.

  • Comma-separated values (CSV) files. Store the CSV files in Google Cloud Storage and specify the URIs to the CSV files in the gcsSource field of your input configuration. Each CSV file cannot exceed 10 GB in size, and the total size of all CSV files cannot exceed 100 GB.

The first file specified must have a header containing column names. If the first row of a subsequent file is the same as the header of the first specified file, then that row is also treated as a header. All other rows contain values for the corresponding columns.

If any of the provided CSV files can't be parsed, or if more than a certain percentage of CSV rows cannot be processed, then the operation fails and nothing is imported. Regardless of overall success or failure, the per-row failures, up to a certain count cap, will be listed in Operation.metadata.partial_failures.

AutoML Tables limits the amount of table data that you can import to at least 1,000 and no more than 100,000,000 rows with at least 2 and no more than 1,000 columns. AutoML Tables infers the schema from the data when the data is imported. You can have at most five import requests running in parallel.

The Google Cloud Storage bucket must be Regional, and must reside in the us-central1 region.
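As a rough pre-flight check against the row and column limits above, one might count a local CSV's rows and columns before importing. This is only a sketch under simplifying assumptions: the limits apply to the imported table as a whole, so for multi-file imports the row count would need to be summed across files, and the file name used at the end is hypothetical.

import csv

# Import limits stated above for AutoML Tables.
MIN_ROWS, MAX_ROWS = 1_000, 100_000_000
MIN_COLS, MAX_COLS = 2, 1_000

def check_csv_within_limits(path: str) -> None:
    """Raise ValueError if a local CSV (with a header row) is outside the import limits."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        n_cols = len(header)
        n_rows = sum(1 for _ in reader)  # data rows, header excluded
    if not MIN_COLS <= n_cols <= MAX_COLS:
        raise ValueError(f"{n_cols} columns; must be between {MIN_COLS} and {MAX_COLS}")
    if not MIN_ROWS <= n_rows <= MAX_ROWS:
        raise ValueError(f"{n_rows} rows; must be between {MIN_ROWS} and {MAX_ROWS}")

check_csv_within_limits("training_data.csv")  # hypothetical local file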

JSON representation
{
  "params": {
    string: string,
    ...
  },

  // Union field source can be only one of the following:
  "gcsSource": {
    object (GcsSource)
  },
  "bigquerySource": {
    object (BigQuerySource)
  }
  // End of list of possible types for union field source.
}
Fields
params

map (key: string, value: string)

Additional domain-specific parameters describing the semantics of the imported data; each string must be at most 25000 characters long.

You must supply the following fields:

  • schema_inference_version - (integer) Required. The version of the algorithm used for the initial inference of the schema (the column data types) of the table that you are importing data into. Allowed values: "1".

Union field source. The source of the input. source can be only one of the following:
gcsSource

object (GcsSource)

The Google Cloud Storage location for the input content. For datasets.importData, gcsSource points to CSV files with the structure described above.

bigquerySource

object (BigQuerySource)

The BigQuery location for the input content.
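Putting the pieces together, a hedged Python sketch of calling datasets.importData with an InputConfig. The :importData URL suffix, the inputConfig request field name, and the bucket, dataset, and table identifiers are assumptions made for the example, not values from this page.

import requests

ACCESS_TOKEN = "..."  # e.g. from `gcloud auth print-access-token`
dataset_name = "projects/my-project/locations/us-central1/datasets/TBL1234567890"
url = f"https://automl.googleapis.com/v1beta1/{dataset_name}:importData"

# InputConfig with a CSV source in Cloud Storage; the schema_inference_version
# entry in params is the required parameter described above.
input_config = {
    "params": {"schema_inference_version": "1"},
    "gcsSource": {
        "inputUris": [
            "gs://my-bucket/tables/train_1.csv",
            "gs://my-bucket/tables/train_2.csv",
        ],
    },
}

# Alternatively, import from a BigQuery table instead of CSV files:
# input_config = {
#     "params": {"schema_inference_version": "1"},
#     "bigquerySource": {"inputUri": "bq://my-project.my_dataset.my_table"},
# }

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={"inputConfig": input_config},
)
resp.raise_for_status()
print(resp.json())  # a long-running Operation; poll it for completion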

GcsSource

The Google Cloud Storage location for the input content.

JSON representation
{
  "inputUris": [
    string
  ]
}
Fields
inputUris[]

string

Required. Google Cloud Storage URIs to input files, up to 2000 characters long. Accepted forms:

  • Full object path, e.g. gs://bucket/directory/object.csv
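A small helper, purely illustrative, that checks a URI against the accepted form and length limit stated above before adding it to inputUris.

def validate_gcs_input_uri(uri: str) -> str:
    """Check the accepted form for GcsSource.inputUris: a full gs:// object path, at most 2000 characters."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a gs:// object path: {uri!r}")
    if len(uri) > 2000:
        raise ValueError("URI longer than 2000 characters")
    return uri

validate_gcs_input_uri("gs://bucket/directory/object.csv")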

BigQuerySource

The BigQuery location for the input content.

JSON representation
{
  "inputUri": string
}
Fields
inputUri

string

Required. BigQuery URI to a table, up to 2000 characters long. Accepted forms:

  • BigQuery path, e.g. bq://projectId.bqDatasetId.bqTableId

Methods

get

Gets a table spec.

list

Lists table specs in a dataset.

patch

Updates a table spec.
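As a final sketch, listing the table specs of a dataset over REST. The pageToken/nextPageToken pagination fields follow the usual Google API pattern and, like the endpoint host and identifiers, are assumptions made for this example.

import requests

ACCESS_TOKEN = "..."  # e.g. from `gcloud auth print-access-token`
parent = "projects/my-project/locations/us-central1/datasets/TBL1234567890"
url = f"https://automl.googleapis.com/v1beta1/{parent}/tableSpecs"
headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

page_token = None
while True:
    params = {"pageToken": page_token} if page_token else {}
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    payload = resp.json()
    for spec in payload.get("tableSpecs", []):
        print(spec["name"], spec.get("rowCount"), spec.get("columnCount"))
    page_token = payload.get("nextPageToken")
    if not page_token:
        break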