TabularDataset(
dataset_name: str,
project: Optional[str] = None,
location: Optional[str] = None,
credentials: Optional[google.auth.credentials.Credentials] = None,
)
Managed tabular dataset resource for Vertex AI.
Inheritance
builtins.object > google.cloud.aiplatform.base.VertexAiResourceNoun > builtins.object > google.cloud.aiplatform.base.FutureManager > google.cloud.aiplatform.base.VertexAiResourceNounWithFutureManager > google.cloud.aiplatform.datasets.dataset._Dataset > google.cloud.aiplatform.datasets.column_names_dataset._ColumnNamesDataset > TabularDatasetProperties
column_names
Retrieves the columns for the dataset by extracting them from the Google Cloud Storage or Google BigQuery source.
Raises
Type | Description |
RuntimeError | When no valid source is found. |
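As a local illustration of reading column names from a file source, the sketch below assumes a comma-separated source whose first row holds the column names and parses it with the standard `csv` module. The helper `header_columns` is hypothetical; it does not contact Cloud Storage and is not the SDK's implementation. It raises `RuntimeError` on empty input to mirror the documented failure mode.

```python
import csv
import io
from typing import List


def header_columns(csv_text: str) -> List[str]:
    """Return the column names from the first row of CSV text.

    Hypothetical helper: a local stand-in for reading the header row of a
    CSV source. It assumes the first row contains the column names.
    """
    reader = csv.reader(io.StringIO(csv_text))
    try:
        return next(reader)
    except StopIteration:
        # Empty input has no header row to read.
        raise RuntimeError("No header row found in CSV source.") from None


sample = 'age,income,"city, state"\n34,52000,"Austin, TX"\n'
print(header_columns(sample))  # ['age', 'income', 'city, state']
```

Against a live dataset, the same information is read through the `column_names` property of a `TabularDataset` instance.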
create_time
Time this resource was created.
display_name
Display name of this resource.
encryption_spec
Customer-managed encryption key options for this Vertex AI resource.
If this is set, then all resources created by this Vertex AI resource will be encrypted with the provided encryption key.
gca_resource
The underlying resource proto representation.
labels
User-defined labels containing metadata about this resource.
Read more about labels at https://goo.gl/xmQnxf
metadata_schema_uri
The metadata schema URI of this dataset resource.
name
Name of this resource.
resource_name
Fully qualified resource name.
update_time
Time this resource was last updated.
Methods
TabularDataset
TabularDataset(
dataset_name: str,
project: Optional[str] = None,
location: Optional[str] = None,
credentials: Optional[google.auth.credentials.Credentials] = None,
)
Retrieves an existing managed dataset given a dataset name or ID.
Parameters
Name | Description |
dataset_name (str) | Required. A fully-qualified dataset resource name or dataset ID. Example: "projects/123/locations/us-central1/datasets/456" or "456" when project and location are initialized or passed. |
project (str) | Optional. Project to retrieve the dataset from. If not set, the project set in aiplatform.init is used. |
location (str) | Optional. Location to retrieve the dataset from. If not set, the location set in aiplatform.init is used. |
credentials (auth_credentials.Credentials) | Optional. Custom credentials to use to retrieve this Dataset. Overrides credentials set in aiplatform.init. |
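The two forms in the `dataset_name` description identify the same resource. The sketch below shows how a bare ID expands into the fully-qualified name when project and location are known. Both `dataset_resource_name` and `load_dataset` are hypothetical helpers; only the `aiplatform.TabularDataset(dataset_name=...)` call comes from this page, and running `load_dataset` requires the `google-cloud-aiplatform` package and valid credentials.

```python
from typing import Optional


def dataset_resource_name(
    dataset_id: str, project: Optional[str] = None, location: Optional[str] = None
) -> str:
    """Expand a dataset ID into a fully-qualified resource name.

    Hypothetical helper: an already-qualified name is returned unchanged,
    while a bare ID needs both project and location.
    """
    if dataset_id.startswith("projects/"):
        return dataset_id
    if not (project and location):
        raise ValueError("A bare dataset ID needs both project and location.")
    return f"projects/{project}/locations/{location}/datasets/{dataset_id}"


def load_dataset(dataset_id: str, project: str, location: str):
    """Retrieve an existing dataset (requires the SDK and credentials)."""
    from google.cloud import aiplatform  # imported lazily

    return aiplatform.TabularDataset(
        dataset_name=dataset_resource_name(dataset_id, project, location)
    )


print(dataset_resource_name("456", "123", "us-central1"))
# projects/123/locations/us-central1/datasets/456
```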
create
create(
display_name: Optional[str] = None,
gcs_source: Optional[Union[str, Sequence[str]]] = None,
bq_source: Optional[str] = None,
project: Optional[str] = None,
location: Optional[str] = None,
credentials: Optional[google.auth.credentials.Credentials] = None,
request_metadata: Optional[Sequence[Tuple[str, str]]] = (),
labels: Optional[Dict[str, str]] = None,
encryption_spec_key_name: Optional[str] = None,
sync: bool = True,
create_request_timeout: Optional[float] = None,
)
Creates a new tabular dataset.
Parameters
Name | Description |
display_name (str) | Optional. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters. |
gcs_source (Union[str, Sequence[str]]) | Google Cloud Storage URI(s) of the input file(s). Examples: str: "gs://bucket/file.csv"; Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"] |
bq_source (str) | BigQuery URI of the input table. Example: "bq://project.dataset.table_name" |
project (str) | Project to upload this dataset to. Overrides the project set in aiplatform.init. |
location (str) | Location to upload this dataset to. Overrides the location set in aiplatform.init. |
credentials (auth_credentials.Credentials) | Custom credentials to use to upload this dataset. Overrides credentials set in aiplatform.init. |
request_metadata (Sequence[Tuple[str, str]]) | Strings to send along with the request as metadata. |
labels (Dict[str, str]) | Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (system labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable. |
encryption_spec_key_name (Optional[str]) | Optional. The Cloud KMS resource identifier of the customer-managed encryption key used to protect the dataset. Has the form: |
sync (bool) | Whether to execute this method synchronously. If False, this method is executed in a concurrent Future, and any downstream object is returned immediately and synced when the Future has completed. |
create_request_timeout (float) | Optional. The timeout for the create request in seconds. |
Returns
Type | Description |
tabular_dataset (TabularDataset) | Instantiated representation of the managed tabular dataset resource. |
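`create` takes its data from either `gcs_source` (one URI or a list of URIs) or `bq_source`. The sketch below is a hypothetical pre-flight check, `normalize_sources`, that assumes exactly one of the two must be supplied and that URIs use the `gs://` and `bq://` schemes from the examples above. `create_dataset` is also hypothetical; it forwards the checked arguments to `aiplatform.TabularDataset.create` and needs the SDK and credentials to run.

```python
from typing import List, Optional, Sequence, Tuple, Union


def normalize_sources(
    gcs_source: Optional[Union[str, Sequence[str]]] = None,
    bq_source: Optional[str] = None,
) -> Tuple[str, List[str]]:
    """Validate create()-style source arguments and return (kind, uris).

    Hypothetical check: assumes exactly one of gcs_source or bq_source is
    given, using the gs:// and bq:// schemes from the parameter examples.
    """
    if (gcs_source is None) == (bq_source is None):
        raise ValueError("Provide exactly one of gcs_source or bq_source.")
    if bq_source is not None:
        if not bq_source.startswith("bq://"):
            raise ValueError(f"Expected a bq:// URI, got {bq_source!r}")
        return "bigquery", [bq_source]
    # A single string is treated as a one-element list of URIs.
    uris = [gcs_source] if isinstance(gcs_source, str) else list(gcs_source)
    if not uris or not all(uri.startswith("gs://") for uri in uris):
        raise ValueError(f"Expected one or more gs:// URIs, got {uris!r}")
    return "gcs", uris


def create_dataset(display_name: str, **source):
    """Create the dataset through the SDK (requires credentials)."""
    from google.cloud import aiplatform  # requires google-cloud-aiplatform

    normalize_sources(**source)  # fail fast before any API call
    return aiplatform.TabularDataset.create(display_name=display_name, **source)


print(normalize_sources(gcs_source="gs://bucket/file.csv"))
# ('gcs', ['gs://bucket/file.csv'])
```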
create_from_dataframe
create_from_dataframe(
df_source: pd.DataFrame,
staging_path: str,
bq_schema: Optional[Union[str, google.cloud.bigquery.schema.SchemaField]] = None,
display_name: Optional[str] = None,
project: Optional[str] = None,
location: Optional[str] = None,
credentials: Optional[google.auth.credentials.Credentials] = None,
)
Creates a new tabular dataset from a Pandas DataFrame.
Parameters
Name | Description |
df_source (pd.DataFrame) | Required. Pandas DataFrame containing the source data for ingestion as a TabularDataset. This method uses the data types from the provided DataFrame when creating the dataset. |
staging_path (str) | Required. The BigQuery table to stage the data for Vertex. Because Vertex maintains a reference to this source to create the Vertex Dataset, this BigQuery table should not be deleted. Example: |
bq_schema (Optional[Union[str, bigquery.SchemaField]]) | Optional. If not set, BigQuery autodetects the schema from your DataFrame's column types. If set, BigQuery uses the schema you provide when creating the staging table. For more details, see: https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.job.LoadJobConfig#google_cloud_bigquery_job_LoadJobConfig_schema |
display_name (str) | Optional. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters. |
project (str) | Optional. Project to upload this dataset to. Overrides the project set in aiplatform.init. |
location (str) | Optional. Location to upload this dataset to. Overrides the location set in aiplatform.init. |
credentials (auth_credentials.Credentials) | Optional. Custom credentials to use to upload this dataset. Overrides credentials set in aiplatform.init. |
Returns
Type | Description |
tabular_dataset (TabularDataset) | Instantiated representation of the managed tabular dataset resource. |
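The `staging_path` example above is cut off, so the sketch below assumes it follows the `bq://project.dataset.table` form shown for `bq_source`. The hypothetical helper `split_bq_table_path` validates that shape before anything is uploaded, and the hypothetical `create_from_frame` shows where `aiplatform.TabularDataset.create_from_dataframe` fits in; it needs pandas, the SDK, and credentials. The DataFrame contents and display name are illustrative only.

```python
from typing import Tuple


def split_bq_table_path(staging_path: str) -> Tuple[str, str, str]:
    """Split "bq://project.dataset.table" into (project, dataset, table).

    Hypothetical helper assuming the bq://project.dataset.table form.
    rsplit keeps domain-scoped project IDs such as "example.com:proj",
    which contain a dot themselves, intact.
    """
    prefix = "bq://"
    if not staging_path.startswith(prefix):
        raise ValueError(f"staging_path should start with {prefix!r}")
    parts = staging_path[len(prefix):].rsplit(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError("Expected bq://project.dataset.table")
    return parts[0], parts[1], parts[2]


def create_from_frame(staging_path: str):
    """Upload a small DataFrame (requires pandas, the SDK, and credentials)."""
    import pandas as pd
    from google.cloud import aiplatform

    split_bq_table_path(staging_path)  # validate before staging anything
    df = pd.DataFrame({"age": [34, 51], "income": [52000.0, 61000.0]})
    return aiplatform.TabularDataset.create_from_dataframe(
        df_source=df, staging_path=staging_path, display_name="income-data"
    )


print(split_bq_table_path("bq://my-project.my_dataset.my_table"))
# ('my-project', 'my_dataset', 'my_table')
```

Because Vertex keeps a reference to the staging table, the table named by `staging_path` should outlive the dataset.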
delete
delete(sync: bool = True)
Deletes this Vertex AI resource. WARNING: This deletion is permanent.
Parameters
Name | Description |
sync (bool) | Whether to execute this deletion synchronously. If False, this method is executed in a concurrent Future, and any downstream object is returned immediately and synced when the Future has completed. |
export_data
export_data(output_dir: str)
Exports the dataset's data to an output directory in Google Cloud Storage.
Parameters
Name | Description |
output_dir (str) | Required. The Google Cloud Storage location where the output is written. A new directory is created in the given directory with name: |
Returns
Type | Description |
exported_files (Sequence[str]) | All of the files that are exported in this export operation. |
import_data
import_data()
Uploads data to an existing managed dataset.
Parameters
Name | Description |
gcs_source (Union[str, Sequence[str]]) | Required. Google Cloud Storage URI(s) of the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames. Examples: str: "gs://bucket/file.csv"; Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"] |
import_schema_uri (str) | Required. Points to a YAML file stored on Google Cloud Storage describing the import format. Validation will be done against the schema. The schema is defined as an |
data_item_labels (Dict) | Labels to apply to newly imported DataItems. If a DataItem identical to one being imported already exists in the Dataset, these labels are appended to those of the existing DataItem, and if a label with the same key was imported before, the old label value is overwritten. If two DataItems are identical within the same import operation, their labels are combined, and if a key collision occurs in this case, one of the values is picked at random. Two DataItems are considered identical if their content bytes are identical (e.g. image bytes or PDF bytes). These labels will be overridden by Annotation labels specified inside index file referenced by |
sync (bool) | Whether to execute this method synchronously. If False, this method is executed in a concurrent Future, and any downstream object is returned immediately and synced when the Future has completed. |
import_request_timeout (float) | Optional. The timeout for the import request in seconds. |
Returns
Type | Description |
dataset (Dataset) | Instantiated representation of the managed dataset resource. |
list
list(
filter: Optional[str] = None,
order_by: Optional[str] = None,
project: Optional[str] = None,
location: Optional[str] = None,
credentials: Optional[google.auth.credentials.Credentials] = None,
)
Lists all instances of this Dataset resource.
Example Usage:
aiplatform.TabularDataset.list(filter='labels.my_key="my_value"', order_by="display_name")
Parameters
Name | Description |
filter (str) | Optional. An expression for filtering the results of the request. For field names, both snake_case and camelCase are supported. |
order_by (str) | Optional. A comma-separated list of fields to order by, sorted in ascending order. Use "desc" after a field name for descending. Supported fields: |
project (str) | Optional. Project to retrieve the list from. If not set, the project set in aiplatform.init is used. |
location (str) | Optional. Location to retrieve the list from. If not set, the location set in aiplatform.init is used. |
credentials (auth_credentials.Credentials) | Optional. Custom credentials to use to retrieve the list. Overrides credentials set in aiplatform.init. |
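The Example Usage filters on a single label. The sketch below builds a filter for several labels at once, assuming that clauses of the form `labels.KEY="VALUE"` may be joined with `AND`. `label_filter` and `list_by_labels` are hypothetical helpers; only `aiplatform.TabularDataset.list(filter=..., order_by=...)` comes from this page, and calling it requires the SDK and credentials.

```python
from typing import Dict


def label_filter(labels: Dict[str, str]) -> str:
    """Build a list() filter that requires every given label.

    Hypothetical helper. It assumes clauses shaped like labels.KEY="VALUE"
    can be combined with AND. Keys are sorted so the output is stable.
    """
    return " AND ".join(
        f'labels.{key}="{value}"' for key, value in sorted(labels.items())
    )


def list_by_labels(labels: Dict[str, str]):
    """Run the query through the SDK (requires credentials)."""
    from google.cloud import aiplatform

    return aiplatform.TabularDataset.list(
        filter=label_filter(labels), order_by="display_name"
    )


print(label_filter({"team": "fraud", "env": "prod"}))
# labels.env="prod" AND labels.team="fraud"
```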
to_dict
to_dict()
Returns the resource proto as a dictionary.
update
update(
*,
display_name: Optional[str] = None,
labels: Optional[Dict[str, str]] = None,
description: Optional[str] = None,
update_request_timeout: Optional[float] = None
)
Updates the dataset. Updatable fields:
- display_name
- description
- labels
Parameters
Name | Description |
display_name (str) | Optional. The user-defined name of the Dataset. The name can be up to 128 characters long and can consist of any UTF-8 characters. |
labels (Dict[str, str]) | Optional. Labels with user-defined metadata to organize your Datasets. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. No more than 64 user labels can be associated with one Dataset (system labels are excluded). See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable. |
description (str) | Optional. The description of the Dataset. |
update_request_timeout (float) | Optional. The timeout for the update request in seconds. |
Returns
Type | Description |
dataset (Dataset) | Updated dataset. |
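The label rules above can be checked locally before calling `update` or `create`. The sketch below encodes only the documented constraints: at most 64 labels; keys and values of at most 64 code points made of lowercase (or uncased international) letters, digits, underscores, and dashes; no user keys under the reserved `aiplatform.googleapis.com/` prefix. `label_problems` and `update_labels` are hypothetical helpers, and the server remains the authority on what it accepts.

```python
from typing import Dict, List

RESERVED_PREFIX = "aiplatform.googleapis.com/"


def _valid_token(text: str) -> bool:
    # Lowercase or uncased letters, digits, "_" and "-"; at most 64 code points.
    return len(text) <= 64 and all(
        ch in "_-" or (ch.isalnum() and not ch.isupper()) for ch in text
    )


def label_problems(labels: Dict[str, str]) -> List[str]:
    """Return human-readable problems with a labels dict (empty if none)."""
    problems = []
    if len(labels) > 64:
        problems.append(f"{len(labels)} labels given; at most 64 are allowed")
    for key, value in labels.items():
        if key.startswith(RESERVED_PREFIX):
            problems.append(f"key {key!r} uses the reserved system prefix")
        elif not _valid_token(key):
            problems.append(f"invalid key {key!r}")
        if not _valid_token(value):
            problems.append(f"invalid value {value!r} for key {key!r}")
    return problems


def update_labels(dataset, labels: Dict[str, str]):
    """Validate, then call the documented update(labels=...) on a dataset."""
    problems = label_problems(labels)
    if problems:
        raise ValueError("; ".join(problems))
    return dataset.update(labels=labels)


print(label_problems({"team": "fraud", "Env": "prod"}))
# ["invalid key 'Env'"]
```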
wait
wait()
Helper method that blocks until all futures are complete.
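Several methods above accept `sync`: with `sync=False` the work runs in a concurrent Future, the call returns immediately, and `wait()` blocks until all pending work is done. With the SDK this corresponds to calling, for example, `dataset.delete(sync=False)` and later `dataset.wait()`. The SDK's own machinery is not described on this page, so the sketch below is only an analogy built on `concurrent.futures`; `FutureRunner` and `slow_delete` are hypothetical.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List


class FutureRunner:
    """Toy analogy for the sync=False / wait() pattern; not the SDK's code.

    With sync=False the work is submitted to a background thread and the
    runner is returned at once; wait() blocks until all pending work is done.
    """

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending: List[Future] = []

    def run(self, fn: Callable[[], None], sync: bool = True) -> "FutureRunner":
        if sync:
            fn()
        else:
            self._pending.append(self._executor.submit(fn))
        return self

    def wait(self) -> None:
        for future in self._pending:
            future.result()  # re-raises any exception from the background work
        self._pending.clear()


log: List[str] = []


def slow_delete() -> None:
    """Hypothetical stand-in for a slow API call such as a deletion."""
    time.sleep(0.1)
    log.append("deleted")


runner = FutureRunner().run(slow_delete, sync=False)  # returns immediately
runner.wait()
print(log)  # ['deleted']
```

A single worker thread keeps submitted work in order, which loosely matches the idea that later operations on a resource are synced after earlier ones complete.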