Export feature values

Export feature values for all entities of a single entity type to a BigQuery table or a Cloud Storage bucket. You can choose to get a snapshot of feature values or a full export. A snapshot returns a single value for each feature, whereas a full export can return multiple values per feature. You can't select particular entity IDs or include multiple entity types when exporting feature values.

Exporting feature values is useful for archiving or for performing ad hoc analysis on your data. For example, you can store regular snapshots of your featurestore to save its state at different points in time. If you need to get feature values for building a training dataset, use batch serving instead.

Snapshot and full export comparison

Both the snapshot and full export options let you query data by specifying a single timestamp (either the start time or the end time) or both timestamps. For snapshots, Vertex AI Feature Store (Legacy) returns the latest feature value within a given time range. In the output, the timestamp associated with each feature value is the snapshot timestamp (not the feature value timestamp).

For full exports, Vertex AI Feature Store (Legacy) returns all feature values within a given time range. In the output, the timestamp associated with each feature value is the feature value timestamp (the timestamp that was specified when the feature value was ingested).

The following summarizes what Vertex AI Feature Store (Legacy) returns based on the option that you choose and the timestamps that you provide.

Snapshot:

  • Start time only (inclusive): Starting with the current time (when the request was received), returns the latest value, looking back until the start time. The snapshot timestamp is set to the current time.
  • End time only (inclusive): Starting with the end time, returns the latest value, looking back to the very first value for each feature. The snapshot timestamp is set to the specified end time.
  • Start and end time (inclusive): Returns the latest value within the specified time range. The snapshot timestamp is set to the specified end time.

Full export:

  • Start time only (inclusive): Returns all values on and after the start time and up to the current time (when the request was sent).
  • End time only (inclusive): Returns all values up to the end time, going all the way back to the very first value for each feature.
  • Start and end time (inclusive): Returns all values within the specified time range.

Null values

For snapshots, if the latest feature value is null at a given timestamp, Vertex AI Feature Store (Legacy) returns the previous non-null feature value. If there are no previous non-null values, Vertex AI Feature Store (Legacy) returns null.

For full exports, if a feature value is null at a given timestamp, Vertex AI Feature Store (Legacy) returns null for that timestamp.

Examples

As an example, assume you had the following values in a featurestore, where the values for Feature_A and Feature_B share the same timestamp:

Entity ID | Feature value timestamp | Feature_A | Feature_B
123       | T1                      | A_T1      | B_T1
123       | T2                      | A_T2      | NULL
123       | T3                      | A_T3      | NULL
123       | T4                      | A_T4      | B_T4
123       | T5                      | NULL      | B_T5

Snapshot

For snapshots, Vertex AI Feature Store (Legacy) returns the following values based on the given timestamp values:

  • If only the start time is set to T3, the snapshot returns the following values:

    Entity ID | Snapshot timestamp | Feature_A | Feature_B
    123       | CURRENT_TIME       | A_T4      | B_T5

  • If only the end time is set to T3, the snapshot returns the following values:

    Entity ID | Snapshot timestamp | Feature_A | Feature_B
    123       | T3                 | A_T3      | B_T1

  • If the start and end times are set to T2 and T3, the snapshot returns the following values:

    Entity ID | Snapshot timestamp | Feature_A | Feature_B
    123       | T3                 | A_T3      | NULL

Full export

For full exports, Vertex AI Feature Store (Legacy) returns the following values based on the given timestamp values:

  • If only the start time is set to T3, the full export returns the following values:

    Entity ID | Feature value timestamp | Feature_A | Feature_B
    123       | T3                      | A_T3      | NULL
    123       | T4                      | A_T4      | B_T4
    123       | T5                      | NULL      | B_T5

  • If only the end time is set to T3, the full export returns the following values:

    Entity ID | Feature value timestamp | Feature_A | Feature_B
    123       | T1                      | A_T1      | B_T1
    123       | T2                      | A_T2      | NULL
    123       | T3                      | A_T3      | NULL

  • If the start and end times are set to T2 and T4, the full export returns the following values:

    Entity ID | Feature value timestamp | Feature_A | Feature_B
    123       | T2                      | A_T2      | NULL
    123       | T3                      | A_T3      | NULL
    123       | T4                      | A_T4      | B_T4

Export feature values

When you export feature values, you choose which features to query and whether to get a snapshot or a full export. The following sections show a sample for each option.

For both options, the output destination must be in the same region as the source featurestore. For example, if your featurestore is in us-central1, then the destination Cloud Storage bucket or BigQuery table must also be in us-central1.

Snapshot

Export the latest feature values for a given time range.

Web UI

Use another method. You cannot export feature values from the Google Cloud console.

REST

To export feature values, send a POST request by using the entityTypes.exportFeatureValues method.

The following sample outputs a BigQuery table, but you can also output to a Cloud Storage bucket. Each output destination might have some prerequisites before you can submit a request. For example, if you specify a table name for the bigqueryDestination field, you must have an existing dataset. These requirements are documented in the API reference.
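
For example, if the destination dataset doesn't exist yet, you can create it in the same region as the featurestore by using the bq command-line tool. The dataset name and region shown here are illustrative:

bq --location=us-central1 mk --dataset PROJECT_ID:DATASET_NAME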

Before using any of the request data, make the following replacements:

  • LOCATION_ID: Region where the featurestore is located. For example, us-central1.
  • PROJECT_ID: Your project ID.
  • FEATURESTORE_ID: ID of the featurestore.
  • ENTITY_TYPE_ID: ID of the entity type.
  • START_TIME and END_TIME: (Optional) If you specify the start time only, returns the latest value starting from the current time (when the request is sent) and looking back until the start time. If you specify the end time only, returns the latest value starting from the end time (inclusive) and looking back to the very first value. If you specify a start time and end time, returns the latest value within the specified time range (inclusive). If you specify neither, returns the latest values for each feature, starting from the current time and looking back to the very first value.
  • DATASET_NAME: Name of the destination BigQuery dataset.
  • TABLE_NAME: Name of the destination BigQuery table.
  • FEATURE_ID: ID of one or more features. Specify a single * (asterisk) to select all features.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID:exportFeatureValues

Request JSON body:

{
  "snapshotExport": {
    "start_time": "START_TIME",
    "snapshot_time": "END_TIME"
  },
  "destination" : {
    "bigqueryDestination": {
      "outputUri": "bq://PROJECT_ID.DATASET_NAME.TABLE_NAME"
    }
  },
  "featureSelector": {
    "idMatcher": {
      "ids": ["FEATURE_ID", ...]
    }
  }
}
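
The START_TIME and END_TIME values are timestamps in RFC 3339 format, such as 2021-12-03T00:00:00Z. For example, the following snapshotExport block (with illustrative dates) returns the latest value for each feature between March 1 and March 8, 2023, and sets the snapshot timestamp to the March 8 end time:

"snapshotExport": {
  "start_time": "2023-03-01T00:00:00Z",
  "snapshot_time": "2023-03-08T00:00:00Z"
}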

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID:exportFeatureValues"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID:exportFeatureValues" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.ExportFeatureValuesOperationMetadata",
    "genericMetadata": {
      "createTime": "2021-12-03T22:55:25.974976Z",
      "updateTime": "2021-12-03T22:55:25.974976Z"
    }
  }
}
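
The response contains the name of a long-running operation. To check whether the export has finished, you can poll that operation; the following is a sketch that assumes the standard Vertex AI long-running operation endpoint and the operation name returned in the response:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION_ID/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID/operations/OPERATION_ID"

When the export completes, the operation response includes "done": true.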

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.BigQueryDestination;
import com.google.cloud.aiplatform.v1.EntityTypeName;
import com.google.cloud.aiplatform.v1.ExportFeatureValuesOperationMetadata;
import com.google.cloud.aiplatform.v1.ExportFeatureValuesRequest;
import com.google.cloud.aiplatform.v1.ExportFeatureValuesRequest.SnapshotExport;
import com.google.cloud.aiplatform.v1.ExportFeatureValuesResponse;
import com.google.cloud.aiplatform.v1.FeatureSelector;
import com.google.cloud.aiplatform.v1.FeatureValueDestination;
import com.google.cloud.aiplatform.v1.FeaturestoreServiceClient;
import com.google.cloud.aiplatform.v1.FeaturestoreServiceSettings;
import com.google.cloud.aiplatform.v1.IdMatcher;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ExportFeatureValuesSnapshotSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String featurestoreId = "YOUR_FEATURESTORE_ID";
    String entityTypeId = "YOUR_ENTITY_TYPE_ID";
    String destinationTableUri = "YOUR_DESTINATION_TABLE_URI";
    List<String> featureSelectorIds = Arrays.asList("title", "genres", "average_rating");
    String location = "us-central1";
    String endpoint = "us-central1-aiplatform.googleapis.com:443";
    int timeout = 300;
    exportFeatureValuesSnapshotSample(
        project,
        featurestoreId,
        entityTypeId,
        destinationTableUri,
        featureSelectorIds,
        location,
        endpoint,
        timeout);
  }

  static void exportFeatureValuesSnapshotSample(
      String project,
      String featurestoreId,
      String entityTypeId,
      String destinationTableUri,
      List<String> featureSelectorIds,
      String location,
      String endpoint,
      int timeout)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    FeaturestoreServiceSettings featurestoreServiceSettings =
        FeaturestoreServiceSettings.newBuilder().setEndpoint(endpoint).build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (FeaturestoreServiceClient featurestoreServiceClient =
        FeaturestoreServiceClient.create(featurestoreServiceSettings)) {

      FeatureSelector featureSelector =
          FeatureSelector.newBuilder()
              .setIdMatcher(IdMatcher.newBuilder().addAllIds(featureSelectorIds).build())
              .build();

      ExportFeatureValuesRequest exportFeatureValuesRequest =
          ExportFeatureValuesRequest.newBuilder()
              .setEntityType(
                  EntityTypeName.of(project, location, featurestoreId, entityTypeId).toString())
              .setDestination(
                  FeatureValueDestination.newBuilder()
                      .setBigqueryDestination(
                          BigQueryDestination.newBuilder().setOutputUri(destinationTableUri)))
              .setFeatureSelector(featureSelector)
              .setSnapshotExport(SnapshotExport.newBuilder())
              .build();

      OperationFuture<ExportFeatureValuesResponse, ExportFeatureValuesOperationMetadata>
          exportFeatureValuesFuture =
              featurestoreServiceClient.exportFeatureValuesAsync(exportFeatureValuesRequest);
      System.out.format(
          "Operation name: %s%n", exportFeatureValuesFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      ExportFeatureValuesResponse exportFeatureValuesResponse =
          exportFeatureValuesFuture.get(timeout, TimeUnit.SECONDS);
      System.out.println("Snapshot Export Feature Values Response");
      System.out.println(exportFeatureValuesResponse);
      featurestoreServiceClient.close();
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const project = 'YOUR_PROJECT_ID';
// const featurestoreId = 'YOUR_FEATURESTORE_ID';
// const entityTypeId = 'YOUR_ENTITY_TYPE_ID';
// const destinationTableUri = 'YOUR_BQ_DESTINATION_TABLE_URI';
// const timestamp = <STARTING_TIMESTAMP_OF_SNAPSHOT_IN_SECONDS>;
// const location = 'YOUR_PROJECT_LOCATION';
// const apiEndpoint = 'YOUR_API_ENDPOINT';
// const timeout = <TIMEOUT_IN_MILLI_SECONDS>;

// Imports the Google Cloud Featurestore Service Client library
const {FeaturestoreServiceClient} = require('@google-cloud/aiplatform').v1;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: apiEndpoint,
};

// Instantiates a client
const featurestoreServiceClient = new FeaturestoreServiceClient(
  clientOptions
);

async function exportFeatureValuesSnapshot() {
  // Configure the entityType resource
  const entityType = `projects/${project}/locations/${location}/featurestores/${featurestoreId}/entityTypes/${entityTypeId}`;

  const destination = {
    bigqueryDestination: {
      // Output to the BigQuery table created earlier
      outputUri: destinationTableUri,
    },
  };

  const featureSelector = {
    idMatcher: {
      ids: ['age', 'gender', 'liked_genres'],
    },
  };

  const snapshotExport = {
    startTime: {
      seconds: Number(timestamp),
    },
  };

  const request = {
    entityType: entityType,
    destination: destination,
    featureSelector: featureSelector,
    snapshotExport: snapshotExport,
  };

  // Export Feature Values Request
  const [operation] = await featurestoreServiceClient.exportFeatureValues(
    request,
    {timeout: Number(timeout)}
  );
  const [response] = await operation.promise();

  console.log('Export feature values snapshot response');
  console.log('Raw response:');
  console.log(JSON.stringify(response, null, 2));
}
exportFeatureValuesSnapshot();

Additional languages

To learn how to install and use the Vertex AI SDK for Python, see Use the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

Full export

Export all feature values within a given time range.

Web UI

Use another method. You cannot export feature values from the Google Cloud console.

REST

To export feature values, send a POST request by using the entityTypes.exportFeatureValues method.

The following sample outputs a BigQuery table, but you can also output to a Cloud Storage bucket. Each output destination might have some prerequisites before you can submit a request. For example, if you specify a table name for the bigqueryDestination field, you must have an existing dataset. These requirements are documented in the API reference.

Before using any of the request data, make the following replacements:

  • LOCATION_ID: Region where the featurestore is located. For example, us-central1.
  • PROJECT_ID: Your project ID.
  • FEATURESTORE_ID: ID of the featurestore.
  • ENTITY_TYPE_ID: ID of the entity type.
  • START_TIME and END_TIME: (Optional) If you specify the start time only, returns all values between the current time (when the request is sent) and the start time (inclusive). If you specify the end time only, returns all values between the end time (inclusive) and the very first value timestamp (for each feature). If you specify a start time and end time, returns all values within the specified time range (inclusive). If you specify neither, returns all values between the current time and the very first value timestamp (for each feature).
  • DATASET_NAME: Name of the destination BigQuery dataset.
  • TABLE_NAME: Name of the destination BigQuery table.
  • FEATURE_ID: ID of one or more features. Specify a single * (asterisk) to select all features.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID:exportFeatureValues

Request JSON body:

{
  "fullExport": {
    "start_time": "START_TIME",
    "end_time": "END_TIME"
  },
  "destination" : {
    "bigqueryDestination": {
      "outputUri": "bq://PROJECT.DATASET_NAME.TABLE_NAME"
    }
  },
  "featureSelector": {
    "idMatcher": {
      "ids": ["FEATURE_ID", ...]
    }
  }
}
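
To export to a Cloud Storage bucket instead, replace the destination block. The following sketch assumes a CSV export and uses placeholder bucket and folder names; check the FeatureValueDestination API reference for the exact fields and supported formats:

"destination": {
  "csvDestination": {
    "gcsDestination": {
      "outputUriPrefix": "gs://BUCKET_NAME/FOLDER_NAME"
    }
  }
}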

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION_ID/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID:exportFeatureValues"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION_ID/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID:exportFeatureValues" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.ExportFeatureValuesOperationMetadata",
    "genericMetadata": {
      "createTime": "2021-12-03T22:55:25.974976Z",
      "updateTime": "2021-12-03T22:55:25.974976Z"
    }
  }
}

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.BigQueryDestination;
import com.google.cloud.aiplatform.v1.EntityTypeName;
import com.google.cloud.aiplatform.v1.ExportFeatureValuesOperationMetadata;
import com.google.cloud.aiplatform.v1.ExportFeatureValuesRequest;
import com.google.cloud.aiplatform.v1.ExportFeatureValuesRequest.FullExport;
import com.google.cloud.aiplatform.v1.ExportFeatureValuesResponse;
import com.google.cloud.aiplatform.v1.FeatureSelector;
import com.google.cloud.aiplatform.v1.FeatureValueDestination;
import com.google.cloud.aiplatform.v1.FeaturestoreServiceClient;
import com.google.cloud.aiplatform.v1.FeaturestoreServiceSettings;
import com.google.cloud.aiplatform.v1.IdMatcher;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ExportFeatureValuesSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String featurestoreId = "YOUR_FEATURESTORE_ID";
    String entityTypeId = "YOUR_ENTITY_TYPE_ID";
    String destinationTableUri = "YOUR_DESTINATION_TABLE_URI";
    List<String> featureSelectorIds = Arrays.asList("title", "genres", "average_rating");
    String location = "us-central1";
    String endpoint = "us-central1-aiplatform.googleapis.com:443";
    int timeout = 300;
    exportFeatureValuesSample(
        project,
        featurestoreId,
        entityTypeId,
        destinationTableUri,
        featureSelectorIds,
        location,
        endpoint,
        timeout);
  }

  static void exportFeatureValuesSample(
      String project,
      String featurestoreId,
      String entityTypeId,
      String destinationTableUri,
      List<String> featureSelectorIds,
      String location,
      String endpoint,
      int timeout)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    FeaturestoreServiceSettings featurestoreServiceSettings =
        FeaturestoreServiceSettings.newBuilder().setEndpoint(endpoint).build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (FeaturestoreServiceClient featurestoreServiceClient =
        FeaturestoreServiceClient.create(featurestoreServiceSettings)) {

      FeatureSelector featureSelector =
          FeatureSelector.newBuilder()
              .setIdMatcher(IdMatcher.newBuilder().addAllIds(featureSelectorIds).build())
              .build();

      ExportFeatureValuesRequest exportFeatureValuesRequest =
          ExportFeatureValuesRequest.newBuilder()
              .setEntityType(
                  EntityTypeName.of(project, location, featurestoreId, entityTypeId).toString())
              .setDestination(
                  FeatureValueDestination.newBuilder()
                      .setBigqueryDestination(
                          BigQueryDestination.newBuilder().setOutputUri(destinationTableUri)))
              .setFeatureSelector(featureSelector)
              .setFullExport(FullExport.newBuilder())
              .build();

      OperationFuture<ExportFeatureValuesResponse, ExportFeatureValuesOperationMetadata>
          exportFeatureValuesFuture =
              featurestoreServiceClient.exportFeatureValuesAsync(exportFeatureValuesRequest);
      System.out.format(
          "Operation name: %s%n", exportFeatureValuesFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      ExportFeatureValuesResponse exportFeatureValuesResponse =
          exportFeatureValuesFuture.get(timeout, TimeUnit.SECONDS);
      System.out.println("Export Feature Values Response");
      System.out.println(exportFeatureValuesResponse);
      featurestoreServiceClient.close();
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const project = 'YOUR_PROJECT_ID';
// const featurestoreId = 'YOUR_FEATURESTORE_ID';
// const entityTypeId = 'YOUR_ENTITY_TYPE_ID';
// const destinationTableUri = 'YOUR_BQ_DESTINATION_TABLE_URI';
// const location = 'YOUR_PROJECT_LOCATION';
// const apiEndpoint = 'YOUR_API_ENDPOINT';
// const timeout = <TIMEOUT_IN_MILLI_SECONDS>;

// Imports the Google Cloud Featurestore Service Client library
const {FeaturestoreServiceClient} = require('@google-cloud/aiplatform').v1;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: apiEndpoint,
};

// Instantiates a client
const featurestoreServiceClient = new FeaturestoreServiceClient(
  clientOptions
);

async function exportFeatureValues() {
  // Configure the entityType resource
  const entityType = `projects/${project}/locations/${location}/featurestores/${featurestoreId}/entityTypes/${entityTypeId}`;

  const destination = {
    bigqueryDestination: {
      // Output to the BigQuery table created earlier
      outputUri: destinationTableUri,
    },
  };

  const featureSelector = {
    idMatcher: {
      ids: ['age', 'gender', 'liked_genres'],
    },
  };

  const request = {
    entityType: entityType,
    destination: destination,
    featureSelector: featureSelector,
    fullExport: {},
  };

  // Export Feature Values Request
  const [operation] = await featurestoreServiceClient.exportFeatureValues(
    request,
    {timeout: Number(timeout)}
  );
  const [response] = await operation.promise();

  console.log('Export feature values response');
  console.log('Raw response:');
  console.log(JSON.stringify(response, null, 2));
}
exportFeatureValues();

Additional languages

To learn how to install and use the Vertex AI SDK for Python, see Use the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

What's next