Hello text data: Create a text classification dataset and import documents

Use the Vertex AI console to create a text classification dataset. After the dataset is created, use the CSV file that you copied into your Cloud Storage bucket to import documents into the dataset.

This tutorial has several pages:

Setting up your project and environment.
Creating a text classification dataset.
Training an AutoML text classification model.
Deploying a model to an endpoint and sending a prediction.
Cleaning up your project.

Each page assumes that you have already performed the instructions from the previous pages of the tutorial.

Go to the Vertex AI console.
From the Get started with Vertex AI page, click Create dataset.
Specify details about your dataset.
1. Specify a name for this dataset, such as text_classification_tutorial.
2. In the Select a data type and objective section, click Text and then select Text classification (Single-label).
3. For the Region, select us-central1.
  
  This tutorial uses us-central1, but Vertex AI supports other regions, such as europe-west4.
4. Click Create to create the empty dataset and proceed to the import page.
On the import page, select the Select import files from Cloud Storage option and specify the Cloud Storage location of your CSV file. Tip: Click Browse, select the happiness.csv file in the Select object dialog, and click Select.

For this tutorial, the CSV file is at: gs://${BUCKET}/text/happiness.csv. The bucket for this tutorial is in the same region as the dataset, but you can specify files that are in buckets from any region.
Keep the Default data split.

Vertex AI automatically assigns documents to training, validation, and test sets. For more information, see About data splits for AutoML models.
Click Continue to start the import.

The import process will take a few minutes. When the import completes, you can browse all of the imported documents and their associated labels in the dataset's Browse tab.
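
The console steps above can also be sketched in code. The following is a minimal, non-authoritative sketch that assumes the google-cloud-aiplatform Python package is installed and that its TextDataset class is available in your installed version. The helper names (build_import_request, create_text_dataset, main) and the PROJECT_ID environment variable are hypothetical and used only for illustration; BUCKET, the dataset name, the region, and the CSV path come from this tutorial.

```python
import os


def build_import_request(bucket: str,
                         display_name: str = "text_classification_tutorial") -> dict:
    """Collect the values that the console steps ask for (hypothetical helper)."""
    return {
        "display_name": display_name,
        # Region used in this tutorial; other regions are possible.
        "location": "us-central1",
        # CSV location used in this tutorial.
        "gcs_source": f"gs://{bucket}/text/happiness.csv",
    }


def create_text_dataset(project: str, request: dict):
    """Create the dataset and import the CSV in one call.

    Assumes google-cloud-aiplatform is installed and that application
    default credentials are configured for the given project.
    """
    # Imported lazily so the helper above works without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=request["location"])
    return aiplatform.TextDataset.create(
        display_name=request["display_name"],
        gcs_source=request["gcs_source"],
        # Schema for single-label text classification imports.
        import_schema_uri=(
            aiplatform.schema.dataset.ioformat.text.single_label_classification
        ),
    )


def main() -> None:
    # PROJECT_ID is an assumed variable name; BUCKET matches the tutorial.
    request = build_import_request(os.environ["BUCKET"])
    dataset = create_text_dataset(os.environ["PROJECT_ID"], request)
    print(dataset.resource_name)
```

To try it, set BUCKET and PROJECT_ID in your environment and call main(). As with the console flow, the call returns only after the import finishes, so expect it to take a few minutes.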

What's next

Follow the next page of this tutorial to start an AutoML model training job.

Train an AutoML text classification model