If you need to retrieve or serve current as well as historical feature data, use offline serving to fetch feature values from BigQuery. For example, you can use offline serving to retrieve the feature values for specific timestamps to train a model.
All feature data, including historical feature data, is maintained in BigQuery, which serves as the offline store for your feature values. To use offline serving, you must first register your BigQuery data source by creating feature groups and features. In addition, offline serving requires that every row containing the same entity ID has a different timestamp. For more information about data source preparation guidelines, see Prepare data source.
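The timestamp requirement can be checked on a sample of rows before you register a data source. The following is a minimal sketch using only the Python standard library. It is not part of the Vertex AI SDK: the helper name `find_duplicate_keys` and the sample rows are hypothetical, and the sketch assumes you have already pulled (entity ID, timestamp) pairs into Python, for instance from a query result.

```python
from collections import Counter
from datetime import datetime


def find_duplicate_keys(rows):
    """Return the (entity_id, timestamp) pairs that occur more than once.

    Hypothetical helper: `rows` is any iterable of (entity_id, timestamp)
    tuples sampled from the source table.
    """
    counts = Counter(rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Illustrative rows only; real data would come from your BigQuery table.
sample_rows = [
    ("user_1", datetime(2024, 5, 1, 12, 0, 0)),
    ("user_1", datetime(2024, 5, 2, 12, 0, 0)),  # same entity, new timestamp: OK
    ("user_2", datetime(2024, 5, 1, 12, 0, 0)),
    ("user_1", datetime(2024, 5, 1, 12, 0, 0)),  # repeats the first row's key
]

duplicates = find_duplicate_keys(sample_rows)
```

An empty result means every entity ID in the sample has distinct timestamps. Any pair that is returned identifies rows to deduplicate first.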
Before you begin
Authenticate to Vertex AI, unless you've done so already.
To use the Python samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:
    gcloud init
- If you're using a local shell, then create local authentication credentials for your user account:
    gcloud auth application-default login
  You don't need to do this if you're using Cloud Shell.
For more information, see Set up authentication for a local development environment.
Fetch historical feature values
Use the following sample to fetch historical feature values for multiple entity IDs and timestamps.
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
import bigframes
import bigframes.pandas
import pandas as pd
from google.cloud import bigquery
from vertexai.resources.preview.feature_store import (Feature, FeatureGroup, offline_store)
from vertexai.resources.preview.feature_store import utils as fs_utils

# Look up the registered feature group and the features to fetch.
fg = FeatureGroup("FEATURE_GROUP_NAME")
f1 = fg.get_feature("FEATURE_NAME_1")
f2 = fg.get_feature("FEATURE_NAME_2")

# Each row pairs one entity ID with the timestamp to look up for it.
entity_df = pd.DataFrame(
    data={
        "ENTITY_ID_COLUMN": [
            "ENTITY_ID_1",
            "ENTITY_ID_2",
        ],
        "timestamp": [
            pd.Timestamp("FEATURE_TIMESTAMP_1"),
            pd.Timestamp("FEATURE_TIMESTAMP_2"),
        ],
    },
)

# Fetch the historical values of both features for every row in entity_df.
offline_store.fetch_historical_feature_values(
    entity_df=entity_df,
    features=[f1, f2],
)
Replace the following:
FEATURE_GROUP_NAME: The name of the existing feature group containing the feature.
FEATURE_NAME_1 and FEATURE_NAME_2: The names of the registered features from which you want to retrieve the feature values.
ENTITY_ID_COLUMN: The name of the column containing the entity IDs. You can specify a column name only if it's registered in the feature group.
ENTITY_ID_1 and ENTITY_ID_2: The entity IDs for which you want to fetch the feature values. If you want to retrieve feature values for the same entity ID at different timestamps, specify the same entity ID corresponding to each timestamp.
FEATURE_TIMESTAMP_1 and FEATURE_TIMESTAMP_2: The timestamps corresponding to the historical feature values you want to retrieve. FEATURE_TIMESTAMP_1 corresponds to ENTITY_ID_1, FEATURE_TIMESTAMP_2 corresponds to ENTITY_ID_2, and so on. Specify the timestamps in the datetime format, for example, 2024-05-01T12:00:00.
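Because each timestamp is matched to the entity ID at the same position, the two lists need the same length, and each timestamp string needs to parse as a datetime. The following is a minimal sketch of a pre-check that uses only the Python standard library. `build_entity_columns` is a hypothetical helper, not an SDK function, and `"entity_id"` is a placeholder for your registered entity ID column name.

```python
from datetime import datetime


def build_entity_columns(entity_id_column, entity_ids, timestamps):
    """Validate the inputs and return column data for the entity DataFrame.

    Hypothetical helper for illustration; not part of the Vertex AI SDK.
    """
    if len(entity_ids) != len(timestamps):
        raise ValueError("each entity ID needs exactly one timestamp")
    # fromisoformat raises ValueError on strings such as "05/01/2024".
    parsed = [datetime.fromisoformat(ts) for ts in timestamps]
    return {entity_id_column: list(entity_ids), "timestamp": parsed}


columns = build_entity_columns(
    "entity_id",  # placeholder column name
    ["ENTITY_ID_1", "ENTITY_ID_1"],  # same entity at two points in time
    ["2024-05-01T12:00:00", "2024-06-01T12:00:00"],
)
```

The returned dictionary has the same shape as the `data=` argument passed to `pd.DataFrame` in the sample above, so it can be used to build `entity_df` once the inputs pass validation.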