Data model and resources

The following sections introduce the Vertex AI Feature Store data model and terminology that is used to describe Vertex AI Feature Store resources and components.

Vertex AI Feature Store data model

Vertex AI Feature Store uses a time series data model to store a series of values for features. This model enables Vertex AI Feature Store to maintain feature values as they change over time. Vertex AI Feature Store organizes resources hierarchically in the following order: Featurestore -> EntityType -> Feature. You must create these resources before you can ingest data into Vertex AI Feature Store.

As an example, assume that you have the following sample source data from a BigQuery table. This source data is about movies and their features.

Figure: How source data maps to the Vertex AI Feature Store data model.

Before you can ingest this data into Vertex AI Feature Store, you need to create a featurestore, which is a top-level container for all other resources. In the featurestore, create entity types that group and contain related features. You can then create features that map to features in your source data. The names of the entity type and features can mirror the column header names, but that is not required.

In this example, the movie_id column header can map to an entity type named movie. The average_rating, title, and genre columns are features of the movie entity type. The values in each column map to specific instances of an entity type or a feature, which are called entities and feature values, respectively.

The timestamp column indicates when the feature values were generated. In the featurestore, the timestamps are an attribute of the feature values, not a separate resource type. If all feature values were generated at the same time, you are not required to have a timestamp column. You can specify the timestamp as part of your ingestion request.
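The mapping from source rows to the Featurestore -> EntityType -> Feature hierarchy can be sketched with plain Python structures. This is an illustrative in-memory model only, not the Vertex AI SDK; the class names and sample values are hypothetical, while the resource and column names come from the example above.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of the hierarchy; not the Vertex AI SDK.
@dataclass
class Feature:
    feature_id: str
    value_type: str

@dataclass
class EntityType:
    entity_type_id: str
    features: dict = field(default_factory=dict)

@dataclass
class Featurestore:
    featurestore_id: str
    entity_types: dict = field(default_factory=dict)

# Build the example: the movie_id column maps to an entity type "movie",
# and the remaining columns map to its features.
fs = Featurestore("movie_prediction")
movie = EntityType("movie")
for name, vtype in [("average_rating", "DOUBLE"),
                    ("title", "STRING"),
                    ("genre", "STRING")]:
    movie.features[name] = Feature(name, vtype)
fs.entity_types["movie"] = movie

# One source row (hypothetical values) splits into an entity ID, a set of
# feature values, and the timestamp at which those values were generated.
row = {"movie_id": "movie_01", "average_rating": 4.9,
       "title": "example_title", "genre": "Drama", "timestamp": 100}
entity_id = row["movie_id"]
feature_values = {f: row[f] for f in movie.features}
feature_time = row["timestamp"]
```

Note that the timestamp is kept alongside the feature values rather than modeled as its own resource, mirroring how the featurestore treats timestamps as an attribute of feature values.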

Featurestore

A featurestore is the top-level container for entity types, features, and feature values. Typically, an organization creates one shared featurestore for feature ingestion, serving, and sharing across all teams in the organization. However, sometimes you might choose to create multiple featurestores within the same project to isolate environments. For example, you might have separate featurestores for experimentation, testing, and production.

Entity type

An entity type is a collection of semantically related features. You define your own entity types, based on the concepts that are relevant to your use case. For example, a movie service might have the entity types movie and user, which group related features that correspond to movies or customers.

Entity

An entity is an instance of an entity type. For example, movie_01 and movie_02 are entities of the entity type movie. In a featurestore, each entity must have a unique ID, and the ID must be of type STRING.

Feature

A feature is a measurable property or attribute of an entity type. For example, the movie entity type has features such as average_rating and title that track various properties of movies. Features are associated with entity types. Features must be distinct within a given entity type, but they don't need to be globally unique. For example, if you use title for two different entity types, Vertex AI Feature Store interprets title as two different features. When reading feature values, you provide the feature and its entity type as part of the request.

When you create a feature, you specify its value type, such as BOOL_ARRAY, DOUBLE, DOUBLE_ARRAY, or STRING. This value type determines which value types you can ingest for a particular feature. For more information about the supported value types, see valueType in the API reference.
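The effect of a declared value type can be sketched as a simple ingestion-time check. The helper below is hypothetical (the managed service performs this validation for you); it only illustrates that a value must match the feature's declared type to be ingested.

```python
# Illustrative mapping from a few declared value types to Python checks.
# Hypothetical helper, not part of the Vertex AI SDK.
VALUE_TYPE_CHECKS = {
    "BOOL": lambda v: isinstance(v, bool),
    "DOUBLE": lambda v: isinstance(v, float),
    "STRING": lambda v: isinstance(v, str),
    "DOUBLE_ARRAY": lambda v: (isinstance(v, list)
                               and all(isinstance(x, float) for x in v)),
}

def validate(value_type: str, value) -> bool:
    """Return True if `value` is ingestible for a feature of `value_type`."""
    return VALUE_TYPE_CHECKS[value_type](value)

validate("DOUBLE", 4.4)    # a DOUBLE feature accepts a float
validate("DOUBLE", "4.4")  # but rejects a string, even a numeric-looking one
```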

Feature value

Vertex AI Feature Store captures feature values for a feature at a specific point in time. In other words, you can have multiple values for a given entity and feature. For example, the movie_01 entity can have multiple feature values for the average_rating feature. The value can be 4.4 at one time and 4.8 at some later time. Vertex AI Feature Store associates each feature value with a tuple identifier (entity_id, feature_id, timestamp), which Vertex AI Feature Store uses to look up values at serving time.

Vertex AI Feature Store stores discrete values even though time is continuous. When you request a feature value at time t, the returned value is the latest stored value at or before time t. For example, if Vertex AI Feature Store stores the location of a car at times 100 and 110, the value stored at time 100 is returned for requests at all times from 100 up to 110 (including 105). If you require higher resolution, you can, for example, interpolate the location between stored values or increase the sampling rate of your data.
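The point-in-time lookup rule described above can be sketched with a binary search over the stored timestamps. This is an illustrative helper, not the service's implementation; the car-location values follow the example in the text.

```python
import bisect

def read_at(history, t):
    """Return the latest value stored at or before time t.

    history: list of (timestamp, value) pairs sorted by timestamp.
    Hypothetical helper; the managed service performs this lookup for you.
    """
    timestamps = [ts for ts, _ in history]
    i = bisect.bisect_right(timestamps, t)
    if i == 0:
        return None  # nothing stored at or before t
    return history[i - 1][1]

# Car locations stored at times 100 and 110, as in the example above.
car_location = [(100, "point_a"), (110, "point_b")]
read_at(car_location, 105)  # the time-100 value still applies at 105
read_at(car_location, 110)  # at 110, the newly stored value is returned
```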

Feature ingestion

Feature ingestion is the process of importing feature values computed by your feature engineering jobs into a featurestore. Before you can ingest data, you must define the corresponding entity type and features in the featurestore. Vertex AI Feature Store offers batch ingestion so that you can bulk ingest values into a featurestore. For example, your computed source data might live in locations such as BigQuery or Cloud Storage. You can then ingest data from those sources into a featurestore so that feature values can be served in a uniform format from the central featurestore.
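Conceptually, batch ingestion writes each row of computed source data under the (entity_id, feature_id, timestamp) tuple identifier described earlier. A minimal in-memory sketch, with a hypothetical helper and made-up values (not the SDK's ingestion API):

```python
# Hypothetical in-memory "store": maps
# (entity_id, feature_id, timestamp) -> value, mirroring the tuple
# identifier that Vertex AI Feature Store uses at serving time.
def batch_ingest(store, rows, entity_id_column, feature_columns, time_column):
    for row in rows:
        for feature_id in feature_columns:
            key = (row[entity_id_column], feature_id, row[time_column])
            store[key] = row[feature_id]
    return store

store = {}
rows = [
    {"movie_id": "movie_01", "average_rating": 4.4, "timestamp": 100},
    {"movie_id": "movie_01", "average_rating": 4.8, "timestamp": 110},
]
batch_ingest(store, rows, "movie_id", ["average_rating"], "timestamp")
# Both values survive: they differ in timestamp, so they are distinct
# feature values for the same entity and feature.
```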

For more information, see Batch ingesting feature values.

Feature serving

Feature serving is the process of exporting stored feature values for training or inference. Vertex AI Feature Store offers two methods for serving features: batch and online. Batch serving provides high throughput for serving large volumes of data for offline processing (such as model training or batch predictions). Online serving provides low-latency retrieval of small batches of data for real-time processing (such as online predictions).

For more information, see online or batch serving.

Entity view

When you retrieve values from a featurestore, the service returns an entity view that contains the feature values that you requested. You can think of an entity view as a projection of the features and values that Vertex AI Feature Store returns from an online or batch serving request:

  • For online serving requests, you can get all or a subset of features for a particular entity type.
  • For batch serving requests, you can get all or a subset of features for one or more entity types. For example, if features are distributed across multiple entity types, you can retrieve them together in a single request that you can feed to a machine learning or batch prediction request.
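An entity view is essentially a projection: the subset of features you requested, with their values, for a given entity. The following sketch is illustrative only (hypothetical helper and data, not the serving API):

```python
# Hypothetical projection of an online serving response: from the latest
# stored values for an entity, return all features or a requested subset.
def entity_view(latest_values, entity_id, feature_ids=None):
    features = latest_values[entity_id]
    if feature_ids is None:
        return dict(features)                     # all features
    return {f: features[f] for f in feature_ids}  # requested subset only

latest_values = {
    "movie_01": {"average_rating": 4.8, "title": "example_title",
                 "genre": "Drama"},
}

# Request a subset of features for one entity, as in online serving.
entity_view(latest_values, "movie_01", ["average_rating", "genre"])
```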

Data retention

Vertex AI Feature Store keeps feature values up to the data retention limit. This limit is based on the timestamp associated with each feature value, not on when the value was imported. Vertex AI Feature Store schedules values whose timestamps exceed the limit for deletion.
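The key point is that retention is evaluated against the value's own timestamp, not its import time. A hypothetical sweep over the tuple-identified values (illustrative; the service manages this for you):

```python
# Illustrative retention sweep: keep only values whose timestamp is within
# the retention limit of "now". Hypothetical helper, not the SDK.
def apply_retention(store, now, retention_limit):
    return {
        (entity_id, feature_id, ts): value
        for (entity_id, feature_id, ts), value in store.items()
        if now - ts <= retention_limit
    }

store = {
    ("movie_01", "average_rating", 100): 4.4,
    ("movie_01", "average_rating", 110): 4.8,
}
# With now=200 and a limit of 95 time units, the value stamped 100 is
# past the limit (200 - 100 > 95) and is scheduled for deletion, even if
# it was imported yesterday.
apply_retention(store, now=200, retention_limit=95)
```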

Online and offline storage

Vertex AI Feature Store uses two storage tiers, online storage and offline storage, which are priced differently. Online storage keeps only the value with the latest timestamp for each feature so that online serving requests can be handled efficiently. When you run an ingestion job, you can control whether data is written to the online store by using the API. For example, when you run backfill jobs, you can disable writes to the online store. For more information, see the disableOnlineServing flag in the API reference.
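The online-store behavior can be sketched as a last-write-wins-by-timestamp map, with an opt-out for backfills. The helper and flag name below are illustrative (the flag mirrors disableOnlineServing in the real API, but this is not the SDK):

```python
# Hypothetical online-store update: the online store keeps only the value
# with the latest timestamp per (entity, feature), and an ingestion job
# can skip the online store entirely, as a backfill job would.
def write_online(online_store, entity_id, feature_id, ts, value,
                 disable_online_serving=False):
    if disable_online_serving:
        return  # backfill: write to offline storage only (not modeled here)
    key = (entity_id, feature_id)
    current = online_store.get(key)
    if current is None or ts >= current[0]:
        online_store[key] = (ts, value)  # value with the newest timestamp wins

online = {}
write_online(online, "movie_01", "average_rating", 110, 4.8)
write_online(online, "movie_01", "average_rating", 100, 4.4)  # older: ignored
# online now holds only the time-110 value for average_rating.
```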

Vertex AI Feature Store uses offline storage to retain data until the data reaches the retention limit or until you delete the data. You can control offline storage costs by managing how much data you keep.

You can view how much online and offline storage you are using in the Cloud Console. Check your featurestore's Total offline storage and Total online storage monitoring metrics to see your usage.

Online serving nodes

Each featurestore instance has one or more online serving nodes, which provide the compute resources used to serve feature values with low latency. The number of online serving nodes that you require depends on two factors: the number of online serving requests (queries per second) that the featurestore receives, and the number of ingestion jobs that write to online storage. You can check your featurestore's queries per second and node count by viewing the Queries per second and Node count metrics in the console. You can also view information about your ingestion jobs in the Cloud Console.

What's next