Prepare data source

Before you can start serving features online using Vertex AI Feature Store, you need to set up your feature data source in BigQuery, as follows:

  1. Create a BigQuery table or view using your feature data. To load feature data into a BigQuery table, you can create a BigQuery dataset, create a table in that dataset, and then load the feature data into the table.

  2. After you load the feature data into the BigQuery table or view, make this data source available to Vertex AI Feature Store for online serving. You can connect the data source to online serving resources, such as online stores and feature view instances, in one of two ways:

    • Register the data source by creating feature groups and features: You can associate feature groups and features with feature view instances in your online store. In this scenario, you can format your data as a time series by including the feature_timestamp column. Vertex AI Feature Store serves only the latest non-null values for each unique entity ID, based on the feature timestamp. For information about how to create feature groups, see Create a feature group. For information about how to create features within a feature group, see Create a feature.

    • Directly serve features from the data source without creating feature groups and features: You can specify the URI of the data source in the feature view. Note that in this scenario, you can't format your data as a time series or include historical data in the BigQuery source. Each row must contain the latest feature values corresponding to a unique ID. Multiple occurrences of the same entity ID in different rows are not supported.

Since Vertex AI Feature Store lets you maintain feature data in BigQuery and serves features from the BigQuery data source, there's no need to import or copy the features to an offline store.
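
The "latest non-null value" behavior described for data sources registered through feature groups can be illustrated with a small in-memory sketch. This is not how Vertex AI Feature Store is implemented, and it doesn't call any Google Cloud API. The function name latest_non_null and the sample columns age and city are hypothetical; only entity_id and feature_timestamp come from the guidelines in this document.

```python
from datetime import datetime, timezone


def latest_non_null(rows, feature_names):
    """Return {entity_id: {feature: value}} with the newest non-null value per feature.

    Illustrative only: mimics the documented serving semantics on plain dicts.
    """
    # Process rows from oldest to newest so later non-null values overwrite earlier ones.
    ordered = sorted(rows, key=lambda row: row["feature_timestamp"])
    served = {}
    for row in ordered:
        values = served.setdefault(
            row["entity_id"], {name: None for name in feature_names}
        )
        for name in feature_names:
            if row.get(name) is not None:
                values[name] = row[name]
    return served


# Hypothetical time-series rows: user_1 has a null city at the latest timestamp.
rows = [
    {"entity_id": "user_1",
     "feature_timestamp": datetime(2024, 1, 1, tzinfo=timezone.utc),
     "age": 30, "city": "Paris"},
    {"entity_id": "user_1",
     "feature_timestamp": datetime(2024, 1, 2, tzinfo=timezone.utc),
     "age": 31, "city": None},
    {"entity_id": "user_2",
     "feature_timestamp": datetime(2024, 1, 1, tzinfo=timezone.utc),
     "age": None, "city": "Oslo"},
]

served = latest_non_null(rows, ["age", "city"])
print(served["user_1"])  # age comes from the newest row, city from the older one
```

In this sketch, user_1 is served the newest age together with the older, non-null city. A data source that is attached directly to a feature view has no such fallback, because each entity ID appears in exactly one row.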

Data source preparation guidelines

Follow these guidelines to understand the schema and constraints while preparing the data source in BigQuery:

  1. The data source must contain the following columns:

    • An entity ID column with string values. The size of each value in this column must be less than 4 KB.

      • If you're registering the data source by creating feature groups, the name of this column must be entity_id. You don't need to specify the entity ID column while associating feature groups during feature view creation.

      • If you're going to specify the data source URI to create the feature view, then you need to specify the name of this column during feature view creation. In this case, it's not mandatory to name this column entity_id.

    • If you register the data source using feature groups and features, include the feature_timestamp column and format the data as a time series. The feature_timestamp column contains values of type timestamp. During online serving, Vertex AI Feature Store serves the latest non-null values of a feature based on this timestamp.

      If you directly associate a BigQuery data source with a feature view, the feature_timestamp column isn't required. In this scenario, you must include only the latest feature values in the data source and Vertex AI Feature Store doesn't look up the timestamp.

    • If you want to enable embedding management in your online store, the data source must contain the following columns:

      • An embedding column containing arrays of type float.

      • Optional: One or more filtering columns of type string or string array.

      • Optional: A crowding column of type int.

  2. Each row in the data source is a complete record of feature values associated with an entity ID. If a feature value is missing in one of the columns, then it's considered a null value. Depending on how you define the feature view, there are two ways in which Vertex AI Feature Store selects the feature values it serves:

    • If the feature view is defined based on feature groups and features, Vertex AI Feature Store serves the latest non-null feature value by using the feature timestamp. For example, if the value of a particular feature corresponding to the latest timestamp is null, then Vertex AI Feature Store serves the most recent non-null value from the historical values of the feature.

    • If the feature view is defined by directly specifying a BigQuery data source, then every row must contain a unique entity ID. In this case, Vertex AI Feature Store serves all the feature values from the associated data source.

  3. Each column of the BigQuery table or view represents a feature. Provide the values for each feature in a separate column. If you're associating the data source with a feature group and features, associate each column with a separate feature.

  4. Supported data types for feature values include bool, int, float, string, timestamp, arrays of these data types, and bytes. Note that during data sync, feature values of type timestamp are converted to int64.

  5. The data source must be located in the same region as the online store instance, or in a multi-region that includes or overlaps with the region for the online store. For example, if the online store is in us-central1, the BigQuery source can be located in us-central1 or US.

  6. Sync the data in a feature view before online serving to ensure that you serve only the latest feature values.
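
Some of the guidelines above can be checked locally before you point a feature view at the data. The following is a minimal sketch that works on rows already read into Python dictionaries. The function check_rows, its parameters, and the sample column names user_id and age are hypothetical and aren't part of any Vertex AI or BigQuery API. It also assumes that the 4 KB limit is measured in UTF-8 bytes with 1 KB = 1024 bytes, which this document doesn't specify.

```python
from datetime import datetime, timezone

# Assumption: "less than 4 KB" is measured in UTF-8 bytes, with 1 KB = 1024 bytes.
MAX_ENTITY_ID_BYTES = 4 * 1024


def check_rows(rows, entity_id_column="entity_id", uses_feature_groups=False):
    """Return a list of problems found in the rows (an empty list means none found)."""
    problems = []
    seen_ids = set()
    for index, row in enumerate(rows):
        entity_id = row.get(entity_id_column)
        if not isinstance(entity_id, str):
            problems.append(f"row {index}: entity ID must be a string")
            continue
        if len(entity_id.encode("utf-8")) >= MAX_ENTITY_ID_BYTES:
            problems.append(f"row {index}: entity ID must be smaller than 4 KB")
        if uses_feature_groups:
            # Time-series layout: repeated entity IDs are fine, a timestamp is needed.
            if not isinstance(row.get("feature_timestamp"), datetime):
                problems.append(f"row {index}: feature_timestamp must be a timestamp")
        elif entity_id in seen_ids:
            # Direct BigQuery source: every row needs a unique entity ID.
            problems.append(f"row {index}: duplicate entity ID {entity_id!r}")
        seen_ids.add(entity_id)
    return problems


# Direct source with a custom entity ID column name (hypothetical data).
direct_rows = [
    {"user_id": "user_1", "age": 31},
    {"user_id": "user_1", "age": 32},
    {"user_id": 7, "age": 40},
]
direct_problems = check_rows(direct_rows, entity_id_column="user_id")

# Feature-group source: the column is named entity_id and rows form a time series.
time_series_rows = [
    {"entity_id": "user_1",
     "feature_timestamp": datetime(2024, 1, 1, tzinfo=timezone.utc), "age": 30},
    {"entity_id": "user_1",
     "feature_timestamp": datetime(2024, 1, 2, tzinfo=timezone.utc), "age": 31},
]
time_series_problems = check_rows(time_series_rows, uses_feature_groups=True)

print(direct_problems)
print(time_series_problems)
```

The sketch covers only the entity ID and timestamp rules. It doesn't check feature data types, embedding columns, or the location of the data source.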

What's next