Best practices for Feature Store

The following best practices help you plan and use Vertex AI Feature Store in various scenarios. This guide isn't exhaustive, but it shows how you can use Feature Store in your organization.

Modeling features that jointly describe multiple entities

In some cases, a feature applies to multiple entity types. For example, you might have a calculated value such as clicks per product by a particular user. This feature jointly describes product-user pairs, even though you already have separate entity types for products and for users. In this case, the best practice is to create a separate entity type, such as product-user, to group the shared features. For its entity IDs, concatenate the IDs of the individual entities, for example the product ID and the user ID. The only requirement is that the resulting IDs are strings. Entity types built this way are called composite entity types.
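As a minimal sketch of how composite entity IDs for a product-user entity type could be built, the hypothetical helper below joins the individual IDs into one string. The `composite_entity_id` name and the `_` separator are assumptions for illustration, not part of the Feature Store API; choose a separator that can't appear in your own IDs so that each composite ID maps back to exactly one pair.

```python
# Hypothetical helper: build a composite entity ID (for example, for a
# product-user entity type) by concatenating the individual entity IDs.
# The "_" separator is an assumption; pick one that never occurs in your IDs.
SEPARATOR = "_"


def composite_entity_id(*entity_ids: object) -> str:
    """Join individual entity IDs into a single string entity ID."""
    parts = [str(entity_id) for entity_id in entity_ids]
    for part in parts:
        # Reject IDs containing the separator so composite IDs stay unambiguous.
        if SEPARATOR in part:
            raise ValueError(
                f"entity ID {part!r} contains the separator {SEPARATOR!r}"
            )
    return SEPARATOR.join(parts)


# Example: key "clicks per product by user" by the product and the user.
print(composite_entity_id("product123", "user456"))  # product123_user456
```

Keeping the order of the parts fixed (product first, then user) matters: the same pair must always produce the same entity ID, both when you ingest values and when you read them back.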

For more information, see creating an entity type.

Monitor and tune resources accordingly to optimize batch ingestion

Batch ingestion jobs require workers to process and write data, which can increase the CPU utilization of your featurestore and affect online serving performance. If preserving online serving performance is a priority, start with one worker for every 10 online serving nodes. During ingestion, monitor the CPU usage of the online storage. If CPU usage is lower than expected, increase the number of workers for future batch ingestion jobs to increase throughput. If CPU usage is higher than expected, either increase the number of online serving nodes to add CPU capacity or lower the batch ingestion worker count. Both changes can lower CPU usage.

If you do increase the number of online serving nodes, note that Feature Store takes roughly 15 minutes to reach optimal performance after you make the update.
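The tuning loop described above can be sketched as a small heuristic. This is an illustration under stated assumptions, not a documented algorithm: the guide only says "lower" or "higher than expected," so the CPU thresholds in `CpuTarget`, the one-worker step size, and rounding down with a minimum of one worker are all hypothetical choices.

```python
from dataclasses import dataclass


# Hypothetical CPU band for the online storage. These thresholds are
# illustrative assumptions, not values documented for Feature Store.
@dataclass(frozen=True)
class CpuTarget:
    low: float = 0.5   # below this, there is headroom for more workers
    high: float = 0.8  # above this, ingestion competes with online serving


def initial_worker_count(online_serving_nodes: int) -> int:
    """Start with 1 worker for every 10 online serving nodes (at least 1)."""
    return max(1, online_serving_nodes // 10)


def next_worker_count(current_workers: int, observed_cpu: float,
                      target: CpuTarget = CpuTarget()) -> int:
    """Suggest a worker count for the next batch ingestion job.

    observed_cpu is the online storage CPU utilization (0.0 to 1.0)
    observed during the previous ingestion job.
    """
    if observed_cpu < target.low:
        return current_workers + 1          # headroom: raise throughput
    if observed_cpu > target.high:
        return max(1, current_workers - 1)  # too busy: back off
    return current_workers                  # within the band: keep as is


workers = initial_worker_count(30)          # 3 workers for 30 nodes
workers = next_worker_count(workers, 0.35)  # low CPU, so 4 next time
```

The sketch only adjusts the worker count. The other option from the guide, adding online serving nodes, is a capacity decision made outside this loop, and after such a change you would wait for the roughly 15-minute stabilization period before judging CPU usage again.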

For more information, see updating a featurestore and batch ingesting feature values.

For more information about featurestore monitoring, see Cloud Monitoring metrics.

Use the disableOnlineServing field when backfilling historical data

Backfilling is the process of ingesting historical feature values that don't affect the most recent feature values. Because these values never change what online serving returns, you can set disableOnlineServing to true for a backfill job, which skips any changes to the online store. For more information, see Backfill historical data.
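As a rough sketch, the request body for a backfill import from BigQuery might look like the one built below. It assumes the field names of the REST `ImportFeatureValuesRequest` message (`bigquerySource`, `entityIdField`, `featureTimeField`, `featureSpecs`, `workerCount`, `disableOnlineServing`), with the body sent to the entity type's `importFeatureValues` method; verify these names against the API reference before relying on them. The `backfill_import_body` helper, the table URI, the column names, and the feature ID are hypothetical placeholders.

```python
import json


def backfill_import_body(bq_table_uri: str, feature_ids: list,
                         worker_count: int = 1) -> dict:
    """Build a hypothetical importFeatureValues body for a backfill job."""
    return {
        "bigquerySource": {"inputUri": bq_table_uri},
        "entityIdField": "entity_id",             # hypothetical column name
        "featureTimeField": "feature_timestamp",  # hypothetical column name
        "featureSpecs": [{"id": feature_id} for feature_id in feature_ids],
        "workerCount": worker_count,
        # Backfilled values don't change the most recent values, so skip
        # writing them to the online store.
        "disableOnlineServing": True,
    }


body = backfill_import_body(
    "bq://my-project.my_dataset.clicks_history",  # hypothetical table
    ["clicks_per_product"],                        # hypothetical feature ID
)
print(json.dumps(body, indent=2))
```

Because a backfill with online serving disabled doesn't write to the online store, it may not add CPU load to the online serving nodes in the way described in the batch ingestion section. Treat that as an expectation to confirm in your monitoring rather than a guarantee.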

What's next

Learn Feature Store best practices for implementing custom-trained ML models on Vertex AI.