TabularDataset(
dataset_name: str,
project: Optional[str] = None,
location: Optional[str] = None,
credentials: Optional[google.auth.credentials.Credentials] = None,
)
Managed tabular dataset resource for Vertex AI.
Inheritance
builtins.object > google.cloud.aiplatform.base.VertexAiResourceNoun > builtins.object > google.cloud.aiplatform.base.FutureManager > google.cloud.aiplatform.base.VertexAiResourceNounWithFutureManager > google.cloud.aiplatform.datasets.dataset._Dataset > TabularDataset
Properties
column_names
Retrieves the dataset's column names by extracting them from the Google Cloud Storage or Google BigQuery source.
Type | Description |
RuntimeError | When no valid source is found. |
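As a rough illustration of what extracting column names from a CSV source amounts to, the sketch below reads the header row of CSV text with the standard library. It assumes the first row of the file defines the columns; `header_columns` is a hypothetical helper, not an SDK function, and it works on an in-memory string rather than a Cloud Storage object.

```python
import csv
import io
from typing import List


def header_columns(csv_text: str) -> List[str]:
    """Return the column names from the first row of CSV text.

    Hypothetical helper for illustration; not part of the Vertex AI SDK.
    """
    reader = csv.reader(io.StringIO(csv_text))
    try:
        return next(reader)
    except StopIteration:
        # Echoes the documented RuntimeError for a missing or invalid source.
        raise RuntimeError("No header row found in CSV source.") from None


sample = "age,income,label\n34,52000,1\n"
print(header_columns(sample))
```

On a retrieved dataset object the property is read directly, for example `dataset.column_names`, and the documented `RuntimeError` can be caught around that access.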
Methods
TabularDataset
TabularDataset(
dataset_name: str,
project: Optional[str] = None,
location: Optional[str] = None,
credentials: Optional[google.auth.credentials.Credentials] = None,
)
Retrieves an existing managed dataset given a dataset name or ID.
Name | Type | Description |
dataset_name | str | Required. A fully-qualified dataset resource name or dataset ID. Example: "projects/123/locations/us-central1/datasets/456", or "456" when project and location are initialized or passed. |
project | str | Optional. Project to retrieve the dataset from. If not set, the project set in aiplatform.init is used. |
location | str | Optional. Location to retrieve the dataset from. If not set, the location set in aiplatform.init is used. |
credentials | auth_credentials.Credentials | Custom credentials to use to retrieve this dataset. Overrides credentials set in aiplatform.init. |
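The two accepted forms of `dataset_name` can be sketched as follows. `to_resource_name` and `load_tabular_dataset` are hypothetical helpers written for this illustration, not SDK functions; the SDK resolves bare IDs itself, so the first helper only shows how the pieces fit together. The second helper assumes the `google-cloud-aiplatform` package is installed and credentials are configured.

```python
import re
from typing import Optional

# Matches the fully-qualified form from the parameter description.
_FULL_NAME = re.compile(r"^projects/[^/]+/locations/[^/]+/datasets/[^/]+$")


def to_resource_name(
    dataset_name: str,
    project: Optional[str] = None,
    location: Optional[str] = None,
) -> str:
    """Expand a bare dataset ID into a fully-qualified resource name.

    Hypothetical helper for illustration only.
    """
    if _FULL_NAME.match(dataset_name):
        return dataset_name
    if not (project and location):
        raise ValueError("A bare dataset ID needs both project and location.")
    return f"projects/{project}/locations/{location}/datasets/{dataset_name}"


def load_tabular_dataset(
    dataset_name: str,
    project: Optional[str] = None,
    location: Optional[str] = None,
):
    """Retrieve an existing dataset; needs the SDK and valid credentials."""
    # Imported lazily so to_resource_name works without the SDK installed.
    from google.cloud import aiplatform

    return aiplatform.TabularDataset(
        dataset_name, project=project, location=location
    )


print(to_resource_name("456", project="123", location="us-central1"))
```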
create
create(
display_name: str,
gcs_source: Optional[Union[str, Sequence[str]]] = None,
bq_source: Optional[str] = None,
project: Optional[str] = None,
location: Optional[str] = None,
credentials: Optional[google.auth.credentials.Credentials] = None,
request_metadata: Optional[Sequence[Tuple[str, str]]] = (),
encryption_spec_key_name: Optional[str] = None,
sync: bool = True,
)
Creates a new tabular dataset.
Name | Type | Description |
display_name | str | Required. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters. |
gcs_source | Union[str, Sequence[str]] | Google Cloud Storage URI(s) of the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples: str: "gs://bucket/file.csv"; Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]. |
bq_source | str | BigQuery URI of the input table. Example: "bq://project.dataset.table_name". |
project | str | Project to create this dataset in. Overrides project set in aiplatform.init. |
location | str | Location to create this dataset in. Overrides location set in aiplatform.init. |
credentials | auth_credentials.Credentials | Custom credentials to use to create this dataset. Overrides credentials set in aiplatform.init. |
request_metadata | Sequence[Tuple[str, str]] | Strings which should be sent along with the request as metadata. |
encryption_spec_key_name | Optional[str] | Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the dataset. Has the form: |
sync | bool | Whether to execute this method synchronously. If False, this method is executed in a concurrent Future, and any downstream object is returned immediately and synced when the Future has completed. |
Type | Description |
tabular_dataset (TabularDataset) | Instantiated representation of the managed tabular dataset resource. |
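A minimal sketch of calling `create` is shown below. `check_create_args` and `create_tabular_dataset` are hypothetical wrappers, not SDK functions. The pre-flight checks are assumptions drawn from the parameter descriptions above (a name of at most 128 characters, `gs://` and `bq://` URI prefixes); the non-empty name check is an extra assumption, and the service may enforce different or additional rules.

```python
from typing import Dict, Optional, Sequence, Union

GcsSource = Union[str, Sequence[str]]


def check_create_args(
    display_name: str,
    gcs_source: Optional[GcsSource] = None,
    bq_source: Optional[str] = None,
) -> Dict[str, object]:
    """Validate arguments locally and return them as keyword arguments.

    Hypothetical helper for illustration only.
    """
    # Non-empty is an assumption; the 128-character limit is documented.
    if not display_name or len(display_name) > 128:
        raise ValueError("display_name must be non-empty and at most 128 characters.")
    # Accept either a single URI or a sequence of URIs.
    uris = [gcs_source] if isinstance(gcs_source, str) else list(gcs_source or [])
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"Not a Cloud Storage URI: {uri!r}")
    if bq_source is not None and not bq_source.startswith("bq://"):
        raise ValueError(f"Not a BigQuery URI: {bq_source!r}")
    return {
        "display_name": display_name,
        "gcs_source": gcs_source,
        "bq_source": bq_source,
    }


def create_tabular_dataset(
    display_name: str,
    gcs_source: Optional[GcsSource] = None,
    bq_source: Optional[str] = None,
    sync: bool = True,
):
    """Create the dataset; needs the SDK and valid credentials."""
    # Imported lazily so check_create_args works without the SDK installed.
    from google.cloud import aiplatform

    kwargs = check_create_args(display_name, gcs_source, bq_source)
    return aiplatform.TabularDataset.create(sync=sync, **kwargs)
```

A call such as `create_tabular_dataset("sales", gcs_source="gs://bucket/file.csv")` would block until the dataset exists; passing `sync=False` returns immediately, as described for the `sync` parameter.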
import_data
import_data()
Uploads data to an existing managed dataset.
Name | Type | Description |
gcs_source | Union[str, Sequence[str]] | Required. Google Cloud Storage URI(s) of the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples: str: "gs://bucket/file.csv"; Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]. |
import_schema_uri | str | Required. Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an |
data_item_labels | Dict | Labels that will be applied to newly imported DataItems. If a DataItem identical to the one being imported already exists in the Dataset, these labels are appended to those of the existing one, and if a label with an identical key was imported before, the old label value is overwritten. If two DataItems are identical in the same import data operation, the labels are combined, and if a key collision happens in this case, one of the values is picked randomly. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or pdf bytes). These labels will be overridden by Annotation labels specified inside index file referenced by |
sync | bool | Whether to execute this method synchronously. If False, this method is executed in a concurrent Future, and any downstream object is returned immediately and synced when the Future has completed. |
Type | Description |
dataset (Dataset) | Instantiated representation of the managed dataset resource. |