REST Resource: projects.locations.datasets

Resource: Dataset

A workspace for solving a single, particular machine learning (ML) problem. A workspace contains examples that may be annotated.

JSON representation
{
  "name": string,
  "displayName": string,
  "description": string,
  "exampleCount": integer,
  "createTime": string,
  "etag": string,

  // Union field dataset_metadata can be only one of the following:
  "translationDatasetMetadata": {
    object (TranslationDatasetMetadata)
  },
  "imageClassificationDatasetMetadata": {
    object (ImageClassificationDatasetMetadata)
  },
  "textClassificationDatasetMetadata": {
    object (TextClassificationDatasetMetadata)
  },
  "imageObjectDetectionDatasetMetadata": {
    object (ImageObjectDetectionDatasetMetadata)
  },
  "videoClassificationDatasetMetadata": {
    object (VideoClassificationDatasetMetadata)
  },
  "videoObjectTrackingDatasetMetadata": {
    object (VideoObjectTrackingDatasetMetadata)
  },
  "textExtractionDatasetMetadata": {
    object (TextExtractionDatasetMetadata)
  },
  "textSentimentDatasetMetadata": {
    object (TextSentimentDatasetMetadata)
  },
  "tablesDatasetMetadata": {
    object (TablesDatasetMetadata)
  }
  // End of list of possible types for union field dataset_metadata.
}
Fields
name

string

Output only. The resource name of the dataset. Form: projects/{project_id}/locations/{locationId}/datasets/{datasetId}
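Clients often need to pull the project, location, and dataset IDs back out of a returned `name`. The following is a minimal sketch, assuming each ID segment is a non-empty run of characters other than `/`. The helper names `DatasetName` and `parse_dataset_name` are hypothetical and not part of any AutoML client library.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for projects/{project_id}/locations/{locationId}/datasets/{datasetId};
# assumes every ID segment is a non-empty run of characters other than "/".
_DATASET_NAME = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/datasets/(?P<dataset>[^/]+)"
)


class DatasetName(NamedTuple):
    project_id: str
    location_id: str
    dataset_id: str

    def __str__(self) -> str:
        # Rebuild the full resource name from its three components.
        return (
            f"projects/{self.project_id}/locations/{self.location_id}"
            f"/datasets/{self.dataset_id}"
        )


def parse_dataset_name(name: str) -> DatasetName:
    """Split a dataset resource name into its project, location and dataset IDs."""
    match = _DATASET_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a dataset resource name: {name!r}")
    return DatasetName(match["project"], match["location"], match["dataset"])
```

Round-tripping through `str()` gives back the original name, which makes the parsed form safe to store and rebuild later.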

displayName

string

Required. The name of the dataset to show in the interface. The name can be up to 32 characters long and can consist only of ASCII Latin letters A-Z and a-z, underscores (_), and ASCII digits 0-9.

description

string

User-provided description of the dataset. The description can be up to 25000 characters long.
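The `displayName` and `description` limits above can be checked client-side before calling create or patch. This is a sketch under stated assumptions: `validate_dataset_fields` is a hypothetical helper, an empty `displayName` is treated as invalid because the field is required, and description length is counted in Python characters (code points), which may differ from how the service counts.

```python
import re

# displayName: 1-32 characters drawn only from A-Z, a-z, 0-9 and "_".
# The explicit ASCII ranges avoid \w, which would also accept non-ASCII letters.
_DISPLAY_NAME = re.compile(r"[A-Za-z0-9_]{1,32}")
MAX_DESCRIPTION_LEN = 25000  # counted here as Python characters (an assumption)


def validate_dataset_fields(display_name: str, description: str = "") -> list[str]:
    """Return a list of problems; an empty list means both fields look valid."""
    problems = []
    if _DISPLAY_NAME.fullmatch(display_name) is None:
        problems.append("displayName must be 1-32 characters of A-Z, a-z, 0-9 or '_'")
    if len(description) > MAX_DESCRIPTION_LEN:
        problems.append(f"description exceeds {MAX_DESCRIPTION_LEN} characters")
    return problems
```

For example, `my_dataset_01` passes, while `my-dataset` fails because hyphens are not in the allowed character set.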

exampleCount

integer

Output only. The number of examples in the dataset.

createTime

string (Timestamp format)

Output only. Timestamp when this dataset was created.

A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z".
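Python's `datetime` stores only microseconds, so parsing a nanosecond-precision timestamp such as the example above loses digits unless the fraction is kept separately. The hypothetical helper below is a minimal sketch that accepts only the UTC "Zulu" form described here (a trailing `Z`, no numeric offset) and returns the whole-second `datetime` plus the nanoseconds within that second.

```python
import re
from datetime import datetime, timezone

# UTC "Zulu" form only: whole seconds, then an optional 1-9 digit fraction, then "Z".
_ZULU = re.compile(r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,9}))?Z")


def parse_zulu_timestamp(value: str) -> tuple[datetime, int]:
    """Return (UTC datetime truncated to whole seconds, nanoseconds within the second)."""
    match = _ZULU.fullmatch(value)
    if match is None:
        raise ValueError(f"not an RFC 3339 Zulu timestamp: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    # Right-pad the fraction to 9 digits so ".045" means 45,000,000 ns.
    nanos = int((match.group(7) or "0").ljust(9, "0"))
    dt = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return dt, nanos
```

With the example value `"2014-10-02T15:01:23.045123456Z"`, the nanosecond part comes back as `45123456`.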

etag

string

Used to perform consistent read-modify-write updates. If not set, a blind "overwrite" update happens.
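The `etag` behaviour is a form of optimistic concurrency control: a writer sends back the etag it read, and the write is rejected if someone else has changed the dataset since then. The in-memory store below is a hypothetical simulation of that contract only. It is not the AutoML API or client library, and it ignores details such as update masks.

```python
import uuid
from dataclasses import dataclass, field


class EtagMismatch(Exception):
    """Raised when a write carries an etag that is no longer current."""


@dataclass
class InMemoryDatasetStore:
    """Hypothetical stand-in for the service, used only to illustrate etag semantics."""

    datasets: dict = field(default_factory=dict)  # name -> (dataset dict, current etag)

    def create(self, name: str, data: dict) -> dict:
        self.datasets[name] = (dict(data), uuid.uuid4().hex)
        return self.get(name)

    def get(self, name: str) -> dict:
        data, etag = self.datasets[name]
        return {**data, "etag": etag}

    def patch(self, name: str, update: dict) -> dict:
        current, current_etag = self.datasets[name]
        sent_etag = update.get("etag")
        # With an etag, reject stale writes; without one, do a blind overwrite.
        if sent_etag is not None and sent_etag != current_etag:
            raise EtagMismatch(f"{name}: etag {sent_etag!r} is stale")
        changes = {k: v for k, v in update.items() if k != "etag"}
        # Every successful write issues a fresh etag.
        self.datasets[name] = ({**current, **changes}, uuid.uuid4().hex)
        return self.get(name)
```

If two clients read the same dataset and both write back with the etag they read, the first write succeeds and the second raises `EtagMismatch`, so the second client must re-read before retrying.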

Union field dataset_metadata. Required. The dataset metadata that is specific to the problem type. dataset_metadata can be only one of the following:
translationDatasetMetadata

object (TranslationDatasetMetadata)

Metadata for a dataset used for translation.

imageClassificationDatasetMetadata

object (ImageClassificationDatasetMetadata)

Metadata for a dataset used for image classification.

textClassificationDatasetMetadata

object (TextClassificationDatasetMetadata)

Metadata for a dataset used for text classification.

imageObjectDetectionDatasetMetadata

object (ImageObjectDetectionDatasetMetadata)

Metadata for a dataset used for image object detection.

videoClassificationDatasetMetadata

object (VideoClassificationDatasetMetadata)

Metadata for a dataset used for video classification.

videoObjectTrackingDatasetMetadata

object (VideoObjectTrackingDatasetMetadata)

Metadata for a dataset used for video object tracking.

textExtractionDatasetMetadata

object (TextExtractionDatasetMetadata)

Metadata for a dataset used for text extraction.

textSentimentDatasetMetadata

object (TextSentimentDatasetMetadata)

Metadata for a dataset used for text sentiment.

tablesDatasetMetadata

object (TablesDatasetMetadata)

Metadata for a dataset used for Tables.
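Because `dataset_metadata` is a union, a Dataset JSON object must carry exactly one of the nine metadata fields listed above. The hypothetical helper below sketches a client-side check of that rule and reports which problem type the dataset is for.

```python
# The member names of the dataset_metadata union, as listed in this reference.
DATASET_METADATA_FIELDS = frozenset({
    "translationDatasetMetadata",
    "imageClassificationDatasetMetadata",
    "textClassificationDatasetMetadata",
    "imageObjectDetectionDatasetMetadata",
    "videoClassificationDatasetMetadata",
    "videoObjectTrackingDatasetMetadata",
    "textExtractionDatasetMetadata",
    "textSentimentDatasetMetadata",
    "tablesDatasetMetadata",
})


def dataset_metadata_kind(dataset: dict) -> str:
    """Return the single dataset_metadata field set on a Dataset JSON object."""
    present = [key for key in DATASET_METADATA_FIELDS if key in dataset]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one dataset_metadata field, found {sorted(present)}"
        )
    return present[0]
```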

TranslationDatasetMetadata

Dataset metadata that is specific to translation.

JSON representation
{
  "sourceLanguageCode": string,
  "targetLanguageCode": string
}
Fields
sourceLanguageCode

string

Required. The BCP-47 language code of the source language.

targetLanguageCode

string

Required. The BCP-47 language code of the target language.
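A translation dataset is described by placing `TranslationDatasetMetadata` in the union field of the Dataset body. The hypothetical helper below builds such a body; output-only fields (`name`, `exampleCount`, `createTime`) are omitted, and `"en"` and `"de"` are simply example BCP-47 tags. The helper does not validate the language codes.

```python
import json


def translation_dataset_body(display_name: str, source: str, target: str) -> dict:
    """Build a Dataset JSON body for a translation dataset (hypothetical helper)."""
    return {
        "displayName": display_name,
        "translationDatasetMetadata": {
            "sourceLanguageCode": source,  # BCP-47 tag, e.g. "en"
            "targetLanguageCode": target,  # BCP-47 tag, e.g. "de"
        },
    }


if __name__ == "__main__":
    print(json.dumps(translation_dataset_body("en_de_news", "en", "de"), indent=2))
```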

ImageClassificationDatasetMetadata

Dataset metadata that is specific to image classification.

JSON representation
{
  "classificationType": enum (ClassificationType)
}
Fields
classificationType

enum (ClassificationType)

Required. Type of the classification problem.

TextClassificationDatasetMetadata

Dataset metadata that is specific to text classification.

JSON representation
{
  "classificationType": enum (ClassificationType)
}
Fields
classificationType

enum (ClassificationType)

Required. Type of the classification problem.
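Image and text classification datasets share the same shape: a metadata object with a required `classificationType`. The sketch below assumes the usable `ClassificationType` values are `MULTICLASS` and `MULTILABEL`; the helper name `classification_dataset_body` is hypothetical.

```python
# Assumed ClassificationType values that are valid for a required field.
CLASSIFICATION_TYPES = frozenset({"MULTICLASS", "MULTILABEL"})
CLASSIFICATION_METADATA_FIELDS = frozenset({
    "imageClassificationDatasetMetadata",
    "textClassificationDatasetMetadata",
})


def classification_dataset_body(
    display_name: str, metadata_field: str, classification_type: str
) -> dict:
    """Build a Dataset JSON body for an image or text classification dataset."""
    if metadata_field not in CLASSIFICATION_METADATA_FIELDS:
        raise ValueError(f"unsupported metadata field: {metadata_field!r}")
    if classification_type not in CLASSIFICATION_TYPES:
        raise ValueError(f"unknown classificationType: {classification_type!r}")
    return {
        "displayName": display_name,
        metadata_field: {"classificationType": classification_type},
    }
```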

ImageObjectDetectionDatasetMetadata

Dataset metadata specific to image object detection.

VideoClassificationDatasetMetadata

Dataset metadata specific to video classification. All video classification datasets are treated as multi-label.

VideoObjectTrackingDatasetMetadata

Dataset metadata specific to video object tracking.

TextExtractionDatasetMetadata

Dataset metadata that is specific to text extraction.

TextSentimentDatasetMetadata

Dataset metadata for text sentiment.

JSON representation
{
  "sentimentMax": integer
}
Fields
sentimentMax

integer

Required. A sentiment is expressed as an integer ordinal, where a higher value means a more positive sentiment. The sentiment values used range from 0 to sentimentMax, inclusive on both ends, and every value in that range must be represented in the dataset before a model can be created. sentimentMax must be between 1 and 10, inclusive.
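Both conditions on `sentimentMax` (its own 1 to 10 range, and full coverage of 0 to sentimentMax by the examples) can be checked before attempting model creation. The hypothetical helper below is a minimal sketch that takes the sentiment labels of the dataset's examples and returns any values in the range that no example uses.

```python
from collections.abc import Iterable


def missing_sentiment_values(sentiment_max: int, labels: Iterable[int]) -> list[int]:
    """Return the values in 0..sentiment_max that no example is labelled with.

    Raises ValueError if sentiment_max is outside 1..10 or a label is out of range.
    An empty result means every value in the range is represented.
    """
    if not 1 <= sentiment_max <= 10:
        raise ValueError("sentimentMax must be between 1 and 10 (inclusive)")
    seen = set()
    for label in labels:
        if not 0 <= label <= sentiment_max:
            raise ValueError(f"sentiment {label} is outside 0..{sentiment_max}")
        seen.add(label)
    return [v for v in range(sentiment_max + 1) if v not in seen]
```

For example, with `sentimentMax` set to 4 and labels `[0, 1, 4]`, the values 2 and 3 are missing, so a model could not yet be created.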

TablesDatasetMetadata

Metadata for a dataset used for AutoML Tables.

JSON representation
{
  "primaryTableSpecId": string,
  "targetColumnSpecId": string,
  "weightColumnSpecId": string,
  "mlUseColumnSpecId": string,
  "targetColumnCorrelations": {
    string: {
      object(CorrelationStats)
    },
    ...
  },
  "statsUpdateTime": string
}
Fields
primaryTableSpecId

string

Output only. The tableSpecId of the primary table of this dataset.

targetColumnSpecId

string

columnSpecId of the primary table's column to use as the training and prediction target. This column must be non-nullable and have one of the following data types (otherwise model creation fails):

  • CATEGORY

  • FLOAT64

If the type is CATEGORY, at most 100 unique values may exist in that column across all rows.

NOTE: Updates of this field will instantly affect any other users concurrently working with the dataset.
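The target-column rules can be pre-checked against a column's type and sampled values. This is a simplified sketch: the type is passed as a plain string such as `"CATEGORY"`, and non-nullability is approximated by checking for `None` values, whereas the service checks the column's declared schema. The names `check_target_column` and its constants are hypothetical.

```python
from collections.abc import Sequence
from typing import Any

ALLOWED_TARGET_TYPES = frozenset({"CATEGORY", "FLOAT64"})
MAX_CATEGORY_VALUES = 100


def check_target_column(data_type: str, values: Sequence[Any]) -> None:
    """Raise ValueError if the column could not serve as the target column."""
    if data_type not in ALLOWED_TARGET_TYPES:
        raise ValueError(f"target column type {data_type} is not CATEGORY or FLOAT64")
    # Approximation of "non-nullable": no row may hold a missing value.
    if any(value is None for value in values):
        raise ValueError("target column must be non-nullable")
    if data_type == "CATEGORY":
        distinct = len(set(values))
        if distinct > MAX_CATEGORY_VALUES:
            raise ValueError(
                f"CATEGORY target has {distinct} unique values; at most "
                f"{MAX_CATEGORY_VALUES} are allowed"
            )
```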

weightColumnSpecId

string

columnSpecId of the primary table's column to use as the weight column: the higher the value, the more important the row is during model training. Required type: FLOAT64. Allowed values: 0 to 10000, inclusive on both ends; 0 means the row is ignored for training. If not set, all rows are assumed to have an equal weight of 1. NOTE: Updates of this field will instantly affect any other users concurrently working with the dataset.
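The weight rules translate into a simple per-row computation. In the sketch below, passing `None` stands for "no weight column set"; the helper names are hypothetical, and how the service treats individual missing weights is not covered.

```python
from collections.abc import Sequence
from typing import Optional

MAX_WEIGHT = 10000.0


def training_weights(weight_column: Optional[Sequence[float]], row_count: int) -> list[float]:
    """Return the per-row weights implied by the weight column settings."""
    if weight_column is None:
        # No weight column set: every row gets the default weight of 1.
        return [1.0] * row_count
    if len(weight_column) != row_count:
        raise ValueError("weight column length does not match row count")
    weights = []
    for w in weight_column:
        # The negated range check also rejects NaN, since NaN comparisons are False.
        if not 0 <= w <= MAX_WEIGHT:
            raise ValueError(f"weight {w} is outside 0..{MAX_WEIGHT:g}")
        weights.append(float(w))
    return weights


def rows_used_for_training(weights: Sequence[float]) -> list[int]:
    """Indices of rows that take part in training (a weight of 0 means ignored)."""
    return [i for i, w in enumerate(weights) if w > 0]
```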

mlUseColumnSpecId

string

columnSpecId of the primary table's column that specifies a possible ML use of each row, i.e. the column is used to split the rows into TRAIN, VALIDATE and TEST sets. Required type: STRING. If set, this column must either have all of TRAIN, VALIDATE and TEST among its values, or have only TEST and UNASSIGNED values. In the latter case, AutoML assigns the rows that have the UNASSIGNED value. If a given ML use distribution makes it impossible to create a good model, the call returns an error describing the issue. If neither this columnSpecId nor the primary table's timeColumnSpecId is set, all rows are treated as UNASSIGNED. NOTE: Updates of this field will instantly affect any other users concurrently working with the dataset.
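The two permitted shapes of an ML use column can be told apart by looking at its distinct values. The sketch below reads "have all of TRAIN, VALIDATE, TEST among its values" literally (other values may also appear) and assumes exact, upper-case matching; `ml_use_mode` is a hypothetical helper name.

```python
from collections.abc import Iterable

EXPLICIT_SPLIT = frozenset({"TRAIN", "VALIDATE", "TEST"})
AUTO_ASSIGN_VALUES = frozenset({"TEST", "UNASSIGNED"})


def ml_use_mode(values: Iterable[str]) -> str:
    """Classify an ML use column as 'explicit' or 'auto_assign', or raise ValueError."""
    distinct = set(values)
    if EXPLICIT_SPLIT <= distinct:
        # TRAIN, VALIDATE and TEST all appear: the split is fully specified.
        return "explicit"
    if distinct <= AUTO_ASSIGN_VALUES:
        # Only TEST and UNASSIGNED appear: AutoML assigns the UNASSIGNED rows.
        return "auto_assign"
    raise ValueError(
        f"ML use values {sorted(distinct)} must include TRAIN, VALIDATE and TEST, "
        "or be limited to TEST and UNASSIGNED"
    )
```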

targetColumnCorrelations

map (key: string, value: object (CorrelationStats))

Output only. Correlations between TablesDatasetMetadata.target_column_spec_id and the other columns of the primary table (see primaryTableSpecId). Only set if the target column is set. Maps each other column's spec ID to its CorrelationStats with the target column. This field may be stale; see the statsUpdateTime field for the timestamp at which these stats were last updated.
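A common use of `targetColumnCorrelations` is ranking the other columns by how strongly they relate to the target. This sketch assumes each `CorrelationStats` object carries a `cramersV` number (Cramér's V); entries without it are skipped. The helper `top_correlated` is hypothetical, and the results are only as fresh as `statsUpdateTime`.

```python
def top_correlated(correlations: dict, k: int = 3) -> list[tuple[str, float]]:
    """Rank column spec IDs by Cramér's V with the target column, highest first."""
    scored = [
        (column_spec_id, stats["cramersV"])  # assumes CorrelationStats exposes cramersV
        for column_spec_id, stats in correlations.items()
        if "cramersV" in stats
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```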

statsUpdateTime

string (Timestamp format)

Output only. The most recent timestamp at which the targetColumnCorrelations field and all descendant ColumnSpec.data_stats and ColumnSpec.top_correlated_columns fields were (re-)generated. Changes made to the dataset afterwards are not reflected in these fields' values. The regeneration happens in the background on a best-effort basis.

A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z".

Methods

create

Creates a dataset.

delete

Deletes a dataset and all of its contents.

exportData

Exports a dataset's data to the provided output location.

get

Gets a dataset.

getIamPolicy

Gets the access control policy for a resource.

importData

Imports data into a dataset.

list

Lists datasets in a project.

patch

Updates a dataset.

setIamPolicy

Sets the access control policy on the specified resource.