Introduction to Vertex Feature Store

Vertex Feature Store (Feature Store) provides a centralized repository for organizing, storing, and serving ML features. By using a central featurestore, your organization can efficiently share, discover, and reuse ML features at scale, which can increase the velocity with which you develop and deploy new ML applications. Feature Store is a fully managed solution: it manages and scales the underlying infrastructure for you, such as storage and compute resources. As a result, your data scientists can focus on feature computation logic instead of the challenges of deploying features into production.

Feature Store is an integrated part of Vertex AI. You can use Feature Store independently or as part of Vertex AI workflows. For example, you can fetch data from Feature Store to train custom or AutoML models in Vertex AI.

Overview

Use Feature Store to create and manage resources, such as a featurestore. A featurestore is a top-level container for your features and their values. When you set up a featurestore, permitted users can add and share their features without additional engineering support. Users can define features and then ingest (import) feature values from various data sources.
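
As a minimal sketch of this workflow, the following example uses the Vertex AI SDK for Python (google-cloud-aiplatform). The project, featurestore, entity type, feature, and BigQuery table names are hypothetical placeholders; adjust them for your environment and SDK version.

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # hypothetical project

# Create a featurestore: the top-level container for features and their values.
fs = aiplatform.Featurestore.create(
    featurestore_id="movie_predictions",       # hypothetical featurestore ID
    online_store_fixed_node_count=1,           # provisions nodes for online serving
)

# Define an entity type and the features that describe it.
users = fs.create_entity_type(entity_type_id="users")
users.batch_create_features(
    feature_configs={
        "age": {"value_type": "INT64"},
        "liked_genres": {"value_type": "STRING_ARRAY"},
    }
)

# Ingest (import) feature values from a BigQuery table.
users.ingest_from_bq(
    feature_ids=["age", "liked_genres"],
    feature_time="update_time",                        # timestamp column in the source table
    bq_source_uri="bq://my-project.my_dataset.users",  # hypothetical source table
    entity_id_field="user_id",
)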

Any permitted user can search and retrieve values from the featurestore. For example, you can find features and then do a batch export to get training data for ML model creation. You can also retrieve feature values in real time to perform fast online predictions.
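
For example, a batch export to BigQuery with the Python SDK might look like the following sketch; the destination table and read-instances table (which lists the entity IDs and timestamps to look up) are hypothetical:

# Export a training dataset to BigQuery, joining the requested features
# to each entity ID and timestamp listed in the read-instances table.
fs.batch_serve_to_bq(
    bq_destination_output_uri="bq://my-project.my_dataset.training_data",
    serving_feature_ids={"users": ["age", "liked_genres"]},
    read_instances_uri="bq://my-project.my_dataset.read_instances",
)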

Benefits

Before using Feature Store, you might have computed feature values and saved them in various locations, such as tables in BigQuery or files in Cloud Storage. Moreover, you might have built and managed separate solutions for storing and consuming feature values. In contrast, Feature Store provides a unified solution for batch and online storage as well as serving of ML features. The following sections detail the benefits that Feature Store provides.

Share features across your organization

If you produce features in a featurestore, you can quickly share them with others for training or serving tasks. Teams don't need to re-engineer features for different projects or use cases. Also, because you manage and serve features from a central repository, you can maintain consistency across your organization and reduce duplicated effort, particularly for high-value features.

Feature Store provides search and filter capabilities so that others can easily discover and reuse existing features. For each feature, you can view relevant metadata to determine the quality and usage patterns of the feature. For example, you can view the fraction of entities that have a valid value for a feature (also known as feature coverage) and the statistical distribution of feature values.
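
For example, with the Python SDK you can search for features across the featurestores in a project and inspect the matches; the query string below is a hypothetical illustration of the filter syntax:

# Find features by ID and value type, then list the matching resources.
matches = aiplatform.Feature.search(query="feature_id:age AND value_type=INT64")
for feature in matches:
    print(feature.resource_name)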

Managed solution for online serving at scale

Feature Store provides a managed solution for online feature serving (low-latency serving), which is critical for making timely online predictions. You don't need to build and operate low-latency data serving infrastructure; Feature Store does this for you and scales as needed. You write the logic to generate features and offload the task of serving them. This managed serving reduces the friction of building new features, enabling data scientists to do their work without worrying about deployment.
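
Continuing the earlier sketch, an online read with the Python SDK returns the latest values for the requested entities, typically as a DataFrame; the entity IDs here are hypothetical:

# Low-latency read of the latest feature values for specific entities.
online_df = users.read(
    entity_ids=["user_123", "user_456"],     # hypothetical entity IDs
    feature_ids=["age", "liked_genres"],
)
print(online_df)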

Mitigate training-serving skew

Training-serving skew occurs when the feature data distribution that you use in production differs from the feature data distribution that was used to train your model. This skew often results in discrepancies between a model's performance during training and its performance in production. The following examples describe how Feature Store can address potential sources of training-serving skew:

  • Feature Store ensures that a feature value is ingested once into a featurestore and that same value is reused for both training and serving. Without a featurestore, you might have different code paths for generating features between training and serving. So, feature values might differ between training and serving.
  • Feature Store provides point-in-time lookups to fetch historical data for training, as shown in the sketch after this list. With these lookups, you can mitigate data leakage by fetching only the feature values that were available before a prediction and not after.
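
As a sketch of a point-in-time lookup with the Python SDK, each read instance pairs an entity ID with a timestamp, and Feature Store returns the latest feature values at or before that timestamp. The column names follow the entity type, and all values here are hypothetical:

import pandas as pd

# Each row asks: what were this user's feature values as of this timestamp?
# Only values ingested with an earlier or equal feature timestamp are returned,
# which keeps future information out of the training set.
read_instances = pd.DataFrame(
    {
        "users": ["user_123", "user_456"],
        "timestamp": pd.to_datetime(["2021-04-15T08:28:14Z", "2021-04-16T10:10:10Z"]),
    }
)

training_df = fs.batch_serve_to_df(
    serving_feature_ids={"users": ["age", "liked_genres"]},
    read_instances_df=read_instances,
)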

Detecting drift

Feature Store helps you detect significant changes to your feature data distribution over time, also known as drift. Feature Store constantly tracks the distribution of feature values that are ingested into the featurestore. As feature drift increases, you might need to retrain models that are using the affected features. For more information, see Feature monitoring.

Quotas and limits

Feature Store enforces quotas and limits, both to help you manage resources by setting your own usage limits and to protect the community of Google Cloud users by preventing unforeseen spikes in usage. To avoid hitting unplanned constraints, review the Feature Store quotas on the Quotas and limits page. For example, Feature Store sets a quota on the number of online serving nodes and a quota on the number of online serving requests that you can make per minute.

Pricing

Feature Store pricing is based on several factors, such as how much data you store and the number of featurestore online nodes you use. Charges start right after you create a featurestore. For more information, see Feature Store pricing.

What's next