Best practices for Vertex AI Feature Store (Legacy)

The following best practices will help you plan and use Vertex AI Feature Store (Legacy) in various scenarios. This guide is not intended to be exhaustive.

Model features that jointly describe multiple entities

Some features might apply to multiple entity types. For example, you might have a calculated value that records clicks per product by user. This feature jointly describes product-user pairs.

In this case, the best practice is to create a separate entity type that groups the shared features. For example, you can create an entity type such as product-user to contain features that describe product-user pairs.

For the entity IDs, concatenate the IDs of the individual entities, such as the entity IDs of the individual product and user. The only requirement is that the IDs must be strings. These combined entity types are referred to as composite entity types.
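
For example, here is a minimal sketch of creating a composite entity type and building composite entity IDs with the Vertex AI SDK for Python. The project, region, featurestore name, the `product_user` entity type ID, and the underscore-delimited ID format are placeholder assumptions for illustration; entity type IDs generally allow only letters, numbers, and underscores, so `product_user` stands in for product-user.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

# Hypothetical featurestore and composite entity type names.
featurestore = aiplatform.Featurestore(featurestore_name="my_featurestore")

product_user = featurestore.create_entity_type(
    entity_type_id="product_user",
    description="Features that jointly describe a product-user pair",
)

# Entity IDs are strings; concatenate the individual IDs to form the composite ID.
def composite_entity_id(product_id: str, user_id: str) -> str:
    return f"{product_id}_{user_id}"

print(composite_entity_id("product123", "user456"))  # "product123_user456"
```

Using a consistent delimiter and field order for the concatenated IDs makes it straightforward to reconstruct the same composite IDs at serving time.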

For more information, see creating an entity type.

Use IAM policies to control access across multiple teams

Use IAM roles and policies to set different levels of access for different groups of users. For example, ML researchers, data scientists, DevOps, and site reliability engineers all require access to the same featurestore, but their level of access can differ. DevOps users, for instance, might require permissions to manage a featurestore, but they don't require access to the contents of the featurestore.

You can also restrict access to a particular featurestore or entity type by using resource-level IAM policies.

As an example, imagine that your organization includes the following personas. Because each persona requires a different level of access, each persona is assigned a different predefined IAM role. You can also create and use your own custom roles.

| Persona | Description | Predefined role |
| --- | --- | --- |
| ML researcher or business analyst | Users who only view data on specific entity types. | roles/aiplatform.featurestoreDataViewer (can be granted at the project or resource level) |
| Data scientists or data engineers | Users who work with specific entity type resources. For the resources they own, they can delegate access to other principals. | roles/aiplatform.entityTypeOwner (can be granted at the project or resource level) |
| IT or DevOps | Users who must maintain and tune the performance of specific featurestores but don't require access to the data. | roles/aiplatform.featurestoreInstanceCreator (can be granted at the project or resource level) |
| Automated data import pipeline | Applications that write data to specific entity types. | roles/aiplatform.featurestoreDataWriter (can be granted at the project or resource level) |
| Site reliability engineer | Users who manage particular featurestores or all featurestores in a project. | roles/aiplatform.featurestoreAdmin (can be granted at the project or resource level) |
| Global (any Vertex AI Feature Store (Legacy) user) | Allows users to view and search for existing features. If they find a feature they want to work with, they can request access from the feature owners. For Google Cloud console users, this role is also required to view the Vertex AI Feature Store (Legacy) landing page, import jobs page, and batch serving jobs page. | roles/aiplatform.featurestoreResourceViewer (grant at the project level) |
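
As an illustration, the following sketch grants the data viewer role on a single featurestore at the resource level. It assumes the standard IAM policy methods are available on the `aiplatform_v1` `FeaturestoreServiceClient`; the project, region, featurestore, and principal are placeholders.

```python
from google.cloud import aiplatform_v1
from google.iam.v1 import iam_policy_pb2, policy_pb2

# Hypothetical project, region, and featurestore names.
RESOURCE = "projects/your-project/locations/us-central1/featurestores/my_featurestore"

client = aiplatform_v1.FeaturestoreServiceClient(
    client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
)

# Read the current policy, append a binding, and write the policy back.
policy = client.get_iam_policy(
    request=iam_policy_pb2.GetIamPolicyRequest(resource=RESOURCE)
)
policy.bindings.append(
    policy_pb2.Binding(
        role="roles/aiplatform.featurestoreDataViewer",
        members=["user:analyst@example.com"],  # hypothetical principal
    )
)
client.set_iam_policy(
    request=iam_policy_pb2.SetIamPolicyRequest(resource=RESOURCE, policy=policy)
)
```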

Monitor and tune resources accordingly to optimize batch import

Batch import jobs require workers to process and write data, which can increase the CPU utilization of your featurestore and affect online serving performance. If preserving online serving performance is a priority, start with one worker for every ten online serving nodes. During import, monitor the CPU usage of the online storage. If CPU usage is lower than expected, increase the number of workers for future batch import jobs to increase throughput. If CPU usage is higher than expected, increase the number of online serving nodes to increase CPU capacity or lower the batch import worker count, both of which can lower CPU usage.

If you do increase the number of online serving nodes, note that Vertex AI Feature Store (Legacy) takes roughly 15 minutes to reach optimal performance after you make the update.
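
For example, here is a minimal batch import sketch that sets an explicit worker count with the Vertex AI SDK for Python's `ingest_from_gcs` method. The entity type, feature IDs, timestamp column, and Cloud Storage URI are placeholders.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

# Hypothetical entity type inside an existing featurestore.
entity_type = aiplatform.EntityType(
    entity_type_name="product_user", featurestore_id="my_featurestore"
)

# With 10 online serving nodes, start with 1 import worker and adjust
# based on the online store's observed CPU usage during the import.
entity_type.ingest_from_gcs(
    feature_ids=["clicks_per_product"],
    feature_time="feature_timestamp",  # column that holds the feature timestamps
    gcs_source_uris="gs://your-bucket/feature_values.avro",
    gcs_source_type="avro",
    worker_count=1,
)
```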

For more information, see update a featurestore and batch import feature values.

For more information about featurestore monitoring, see Cloud Monitoring metrics.

Use the disableOnlineServing field when backfilling historical data

Backfilling is the process of importing historical feature values that don't impact the most recent feature values. In this case, you can disable online serving, which skips any changes to the online store. For more information, see Backfill historical data.
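
A minimal backfill sketch follows, assuming the `disable_online_serving` parameter of the Vertex AI SDK's `ingest_from_gcs` method, which corresponds to the `disableOnlineServing` API field; all resource names and URIs are placeholders.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

entity_type = aiplatform.EntityType(
    entity_type_name="product_user", featurestore_id="my_featurestore"
)

# Backfill of historical values only, so skip writes to the online store.
entity_type.ingest_from_gcs(
    feature_ids=["clicks_per_product"],
    feature_time="feature_timestamp",
    gcs_source_uris="gs://your-bucket/historical_feature_values.avro",
    gcs_source_type="avro",
    disable_online_serving=True,  # corresponds to the disableOnlineServing API field
)
```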

Use autoscaling to reduce costs during load fluctuations

If you use Vertex AI Feature Store (Legacy) extensively and encounter frequent load fluctuations in your traffic patterns, use autoscaling to optimize costs. Autoscaling lets Vertex AI Feature Store (Legacy) review traffic patterns and automatically adjust the number of nodes up or down depending on CPU utilization, instead of maintaining a high node count. This option works well for traffic patterns that encounter gradual growth and decline.
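
For example, here is a sketch of enabling autoscaling by updating the featurestore's online serving scaling configuration through the `aiplatform_v1` client. The featurestore resource name and the minimum and maximum node counts are placeholder assumptions.

```python
from google.cloud import aiplatform_v1
from google.protobuf import field_mask_pb2

client = aiplatform_v1.FeaturestoreServiceClient(
    client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
)

# Hypothetical featurestore; the node counts are illustrative only.
featurestore = aiplatform_v1.Featurestore(
    name="projects/your-project/locations/us-central1/featurestores/my_featurestore",
    online_serving_config=aiplatform_v1.Featurestore.OnlineServingConfig(
        scaling=aiplatform_v1.Featurestore.OnlineServingConfig.Scaling(
            min_node_count=2,
            max_node_count=10,
        )
    ),
)

# Update only the scaling configuration; other settings are left unchanged.
operation = client.update_featurestore(
    featurestore=featurestore,
    update_mask=field_mask_pb2.FieldMask(paths=["online_serving_config.scaling"]),
)
operation.result()  # wait for the update to complete
```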

For more information about autoscaling, see Scaling options.

Test the performance of online serving nodes for real-time serving

You can ensure the performance of your featurestore during real-time online serving by testing the performance of your online serving nodes. You can perform these tests based on various benchmarking parameters, such as queries per second (QPS), latency, and the API used (gRPC or REST). Follow these guidelines to test the performance of online serving nodes; a minimal benchmarking sketch follows the list:

  • Run all test clients from the same region, preferably on Compute Engine or Google Kubernetes Engine: This prevents discrepancies due to network latency resulting from hops across regions.

  • Use the gRPC API in the SDK: The gRPC API performs better than the REST API. If you need to use the REST API, enable the HTTP keep-alive option to reuse HTTP connections. Otherwise, each request results in the creation of a new HTTP connection, which increases the latency.

  • Run longer duration tests: Run tests with longer duration (15 minutes or more) and a minimum of 5 QPS to calculate more accurate metrics.

  • Add a "warm-up" period: If you start testing after a period of inactivity, you might observe high latency while connections are reestablished. To account for the initial period of high latency, you can designate this period as a "warm-up period", when the initial data reads are ignored. As an alternative, you can send a low but consistent rate of artificial traffic to the featurestore to keep the connection active.

  • If required, enable autoscaling: If you anticipate gradual growth and decline in your online traffic, enable autoscaling. If you choose autoscaling, Vertex AI automatically changes the number of online serving nodes based on CPU utilization.
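
The following sketch ties these guidelines together: it reads a single feature through the gRPC-based client, discards measurements taken during a warm-up period, holds a steady request rate, and reports latency percentiles. The resource names, entity ID, feature ID, and the specific warm-up and test durations are placeholder assumptions.

```python
import statistics
import time

from google.cloud import aiplatform_v1

# Hypothetical resource names; run this client from the same region as the featurestore.
ENTITY_TYPE = (
    "projects/your-project/locations/us-central1/featurestores/"
    "my_featurestore/entityTypes/product_user"
)

client = aiplatform_v1.FeaturestoreOnlineServingServiceClient(
    client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
)

request = aiplatform_v1.ReadFeatureValuesRequest(
    entity_type=ENTITY_TYPE,
    entity_id="product123_user456",
    feature_selector=aiplatform_v1.FeatureSelector(
        id_matcher=aiplatform_v1.IdMatcher(ids=["clicks_per_product"])
    ),
)

WARMUP_SECONDS = 60      # ignore reads taken while connections warm up
TEST_SECONDS = 15 * 60   # run the test for at least 15 minutes
TARGET_QPS = 5           # keep a minimum of 5 QPS for meaningful metrics

latencies = []
start = time.monotonic()
while time.monotonic() - start < WARMUP_SECONDS + TEST_SECONDS:
    t0 = time.monotonic()
    client.read_feature_values(request=request)  # online read over gRPC
    elapsed = time.monotonic() - t0
    if time.monotonic() - start > WARMUP_SECONDS:
        latencies.append(elapsed)
    # Pace requests to hold a steady rate instead of bursting.
    time.sleep(max(0.0, 1.0 / TARGET_QPS - elapsed))

latencies.sort()
print(f"requests measured: {len(latencies)}")
print(f"p50 latency: {statistics.median(latencies) * 1000:.1f} ms")
print(f"p99 latency: {latencies[int(0.99 * len(latencies))] * 1000:.1f} ms")
```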

For more information about online serving, see Online serving. For more information about online serving nodes, see Online serving nodes.

Specify a start time to optimize offline storage costs during batch serve and batch export

To optimize offline storage costs during batch serving and batch export, you can specify a startTime in your batchReadFeatureValues or exportFeatureValues request. The request then runs a query over only the subset of the available feature data at or after the specified startTime. Otherwise, the request queries the entire available volume of feature data, resulting in high offline storage usage costs.
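
For example, here is a sketch of a batch serving request that limits the offline-store scan with `start_time`, using the `aiplatform_v1` `BatchReadFeatureValuesRequest`. The project, BigQuery URIs, entity type, feature IDs, and start time are placeholders.

```python
import datetime

from google.cloud import aiplatform_v1

client = aiplatform_v1.FeaturestoreServiceClient(
    client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
)

# Hypothetical resource names, BigQuery URIs, and start time.
request = aiplatform_v1.BatchReadFeatureValuesRequest(
    featurestore=(
        "projects/your-project/locations/us-central1/featurestores/my_featurestore"
    ),
    bigquery_read_instances=aiplatform_v1.BigQuerySource(
        input_uri="bq://your-project.your_dataset.read_instances"
    ),
    destination=aiplatform_v1.FeatureValueDestination(
        bigquery_destination=aiplatform_v1.BigQueryDestination(
            output_uri="bq://your-project.your_dataset.served_features"
        )
    ),
    entity_type_specs=[
        aiplatform_v1.BatchReadFeatureValuesRequest.EntityTypeSpec(
            entity_type_id="product_user",
            feature_selector=aiplatform_v1.FeatureSelector(
                id_matcher=aiplatform_v1.IdMatcher(ids=["clicks_per_product"])
            ),
        )
    ],
    # Only feature data at or after this time is scanned in the offline store.
    start_time=datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc),
)

operation = client.batch_read_feature_values(request=request)
operation.result()  # wait for the batch serving job to finish
```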

What's next

Learn Vertex AI Feature Store (Legacy) best practices for implementing custom-trained ML models on Vertex AI.