Manage dataset versions

Vertex AI lets you create versions for a dataset. This functionality can be useful for reproducibility, traceability, and dataset lineage management.

You can create versions for image and text datasets. When you create a dataset version, Vertex AI creates a BigQuery dataset if none exists. The BigQuery dataset stores all versions for the associated Vertex AI dataset.

When you restore a version, you override the associated dataset. The dataset is temporarily unavailable for other requests until the restore operation ends.

Create a dataset version

You can use the Vertex AI API to create a dataset version. Follow the steps in the corresponding tab:

REST

Get the dataset's ID

To create a version, you must know the numerical ID of the dataset. If you know the display name of the dataset but not the ID, expand the following section to learn how to get the ID using the API:

Get a Dataset's ID from its display name

Before using any of the request data, make the following replacements:

  • LOCATION: The region where the Dataset is stored. For example, us-central.

  • PROJECT_ID: Your project ID.

  • DATASET_DISPLAY_NAME: The display name of the Dataset.

HTTP method and URL:

GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets?filter=displayName=DATASET_DISPLAY_NAME

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets?filter=displayName=DATASET_DISPLAY_NAME"

PowerShell

Execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets?filter=displayName=DATASET_DISPLAY_NAME" | Select-Object -Expand Content

The following example response has been truncated with ... to emphasize where you can find your Dataset's ID: it is the number that takes the place of DATASET_ID.

{
  "datasets": [
    {
      "name": "projects/PROJECT_NUMBER/locations/LOCATION/datasets/DATASET_ID",
      "displayName": "DATASET_DISPLAY_NAME",
      ...
    }
  ]
}

Alternatively, you can get the dataset's ID from the Google Cloud console: Go to the Vertex AI Datasets page and find the number in the ID column.

Go to the Datasets page

Before using any of the request data, make the following replacements:

  • LOCATION: The region where the dataset version is stored. For example, us-central.

  • PROJECT_ID: Your project ID.

  • DATASET_ID: The numerical ID of the dataset.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/datasetVersions

To send your request, choose one of these options:

curl

Execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d "" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/datasetVersions"

PowerShell

Execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/datasetVersions" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateDatasetVersionOperationMetadata",
    "genericMetadata": {
      "createTime": "2021-02-17T00:54:58.827429Z",
      "updateTime": "2021-02-17T00:54:58.827429Z"
    },
  }
}

Some requests start long-running operations that require time to complete. These requests return an operation name, which you can use to view the operation's status or cancel the operation. Vertex AI provides helper methods to make calls against long-running operations. For more information, see Working with long-running operations.

Restore a dataset version

You can use the Vertex AI API to restore a dataset version. Follow the steps in the corresponding tab:

REST

Get the dataset version's ID

To restore a version, you must know the version's numerical ID. You can list all the dataset versions using the API:

List a Dataset's DatasetVersions

Before using any of the request data, make the following replacements:

  • LOCATION: The region where the dataset version is stored. For example, us-central.

  • PROJECT_ID: Your project ID.

  • DATASET_ID: The numerical ID of the dataset.

HTTP method and URL:

GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/datasetVersions

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/datasetVersions"

PowerShell

Execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/datasetVersions" | Select-Object -Expand Content

The following example response has been truncated with ... to emphasize where you can find your dataset version's ID: it is the number that takes the place of DATASET_VERSION_ID.

{
  "datasetVersions": [
    {
      "name": "projects/PROJECT_NUMBER/locations/LOCATION/datasets/DATASET_ID/datasetVersions/DATASET_VERSION_ID",
      ...
    }
  ]
}

Before using any of the request data, make the following replacements:

  • LOCATION: The region where the dataset version is stored. For example, us-central.

  • PROJECT_ID: Your project ID.

  • DATASET_ID: The numerical ID of the dataset.

  • DATASET_VERSION_ID: The numerical ID of the dataset version.

HTTP method and URL:

GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/datasetVersions/DATASET_VERSION_ID:restore

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/datasetVersions/DATASET_VERSION_ID:restore"

PowerShell

Execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/datasetVersions/DATASET_VERSION_ID:restore" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.RestoreDatasetVersionOperationMetadata",
    "genericMetadata": {
      "createTime": "2021-02-17T00:54:58.827429Z",
      "updateTime": "2021-02-17T00:54:58.827429Z"
    },
  }
}

Some requests start long-running operations that require time to complete. These requests return an operation name, which you can use to view the operation's status or cancel the operation. Vertex AI provides helper methods to make calls against long-running operations. For more information, see Working with long-running operations.

What's next

Read more about working with datasets in Vertex AI.