You can use a managed dataset to provide the source data used to train AutoML and custom models on Vertex AI. A managed dataset is required for AutoML and is optional for custom training.
Permissions and access control
When you use data from a Cloud Storage bucket to create a dataset, Vertex AI requires permissions to access the data. Vertex AI uses a special Google-managed service account known as a Service Agent to securely access your data. For more information on the roles required and how the Service Agent works, see Access control with IAM.
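As a rough illustration of what granting that access can look like, the sketch below builds the Service Agent's email address and a Cloud Storage IAM binding for it. The address pattern `service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com` is the usual form for the Vertex AI Service Agent. The choice of `roles/storage.objectViewer` and the helper names are assumptions made for this example; check Access control with IAM for the roles your setup actually needs.

```python
def vertex_ai_service_agent(project_number: str) -> str:
    """Return the email of the Vertex AI Service Agent for a project number."""
    return f"service-{project_number}@gcp-sa-aiplatform.iam.gserviceaccount.com"


def bucket_read_binding(project_number: str) -> dict:
    """Build an IAM policy binding that lets the Service Agent read objects.

    The role is an assumption for illustration; a different or narrower
    role may be appropriate for your bucket.
    """
    return {
        "role": "roles/storage.objectViewer",
        "members": [f"serviceAccount:{vertex_ai_service_agent(project_number)}"],
    }


# Hypothetical project number, used only to show the shape of the result.
binding = bucket_read_binding("123456789012")
print(binding)
```

A binding like this would be added to the bucket's IAM policy, for example through the Google Cloud console. A grant like this typically matters when the bucket is in a different project from the one where you use Vertex AI.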
Create a managed dataset for AutoML models
You can create managed datasets for training AutoML models by using the Google Cloud console or the Vertex AI API. The steps vary slightly depending on your data type and model objective. Start by preparing your training data.
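To show how the data type changes an API request, here is a minimal sketch that assembles the URL and JSON body for a `datasets.create` call. It only builds the request and does not send it. The project ID, region, and display name are placeholders, and the helper name is hypothetical. The metadata schema URIs are the ones Vertex AI publishes for image, tabular, and video datasets.

```python
import json

# Metadata schema URI for each data type covered on this page.
SCHEMA_URIS = {
    "image": "gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml",
    "tabular": "gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml",
    "video": "gs://google-cloud-aiplatform/schema/dataset/metadata/video_1.0.0.yaml",
}


def build_create_dataset_request(project, location, display_name, data_type):
    """Return (url, body) for a Vertex AI datasets.create REST call."""
    if data_type not in SCHEMA_URIS:
        raise ValueError(f"unsupported data type: {data_type!r}")
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/datasets"
    )
    body = {
        "displayName": display_name,
        "metadataSchemaUri": SCHEMA_URIS[data_type],
    }
    return url, body


# Placeholder values; replace them with your own project and region.
url, body = build_create_dataset_request(
    "my-project", "us-central1", "my-image-dataset", "image"
)
print("POST", url)
print(json.dumps(body, indent=2))
```

The request would be sent as an authenticated POST. Importing data into the dataset is a separate step, and its details depend on the model objective.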
Image
Learn how to create a managed dataset for the following types of image AutoML models:
Tabular
Learn how to create a managed dataset for the following types of tabular AutoML models:
Video
Learn how to create a managed dataset for the following types of video AutoML models:
Create a managed dataset for custom-trained models
The steps for creating a managed dataset for custom training are the same regardless of your data type or model objective.
For details, see Use managed datasets.
View managed datasets using Data Catalog
Data Catalog is a fully managed, scalable metadata management service that provides a centralized location to search for datasets across projects and regions.
For details, see Use Data Catalog to search for model and dataset resources overview.