Use Vertex AI Feature Store with LlamaIndex on Vertex AI for RAG

This page shows you how to set up Vertex AI Feature Store as the vector database to use with LlamaIndex on Vertex AI for RAG.

LlamaIndex on Vertex AI for RAG uses a built-in vector database powered by Spanner to store and manage vector representations of text documents. The vector database retrieves relevant documents based on their semantic similarity to a given query.

By integrating Vertex AI Feature Store as an additional vector database, LlamaIndex on Vertex AI for RAG can use Vertex AI Feature Store to handle large data volumes with low latency, which helps to improve the performance and scalability of your RAG applications.

Set up a Vertex AI Feature Store

Vertex AI Feature Store, a managed cloud-native service, is an essential component of Vertex AI. It simplifies machine learning (ML) feature management and online serving by letting you manage feature data in a BigQuery table or view. This enables low-latency online feature serving.

For FeatureOnlineStore instances created with optimized online serving, you can use vector similarity search to retrieve semantically similar or related entities, known as approximate nearest neighbors.
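Once the FeatureView described in the following sections has been created and synced, you can issue an approximate nearest neighbor lookup against it directly. The sketch below is a minimal illustration using the SearchNearestEntities RPC from the google-cloud-aiplatform client library; the public endpoint, resource name, and query embedding are placeholders, and in the RAG setup on this page the RAG API performs this kind of retrieval for you.

  # Minimal sketch: query approximate nearest neighbors on a synced FeatureView.
  # PUBLIC_ENDPOINT_DOMAIN_NAME and FEATURE_VIEW_RESOURCE_NAME are placeholders
  # you would look up from your own FeatureOnlineStore and FeatureView.
  from google.cloud import aiplatform_v1

  # Assumes an optimized FeatureOnlineStore that exposes a public endpoint.
  PUBLIC_ENDPOINT_DOMAIN_NAME = "your-online-store-public-endpoint"
  FEATURE_VIEW_RESOURCE_NAME = "your-feature-view-resource-name"

  client = aiplatform_v1.FeatureOnlineStoreServiceClient(
      client_options={"api_endpoint": PUBLIC_ENDPOINT_DOMAIN_NAME}
  )

  # Search with a query embedding; the service returns the closest entities
  # (approximate nearest neighbors) and their distances.
  response = client.search_nearest_entities(
      request=aiplatform_v1.SearchNearestEntitiesRequest(
          feature_view=FEATURE_VIEW_RESOURCE_NAME,
          query=aiplatform_v1.NearestNeighborQuery(
              embedding=aiplatform_v1.NearestNeighborQuery.Embedding(
                  value=[0.1, 0.2, 0.3],  # replace with a real query embedding
              ),
              neighbor_count=10,
          ),
      )
  )
  for neighbor in response.nearest_neighbors.neighbors:
      print(neighbor.entity_id, neighbor.distance)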

The following sections show you how to set up a Vertex AI Feature Store instance for your RAG application.

Create a BigQuery table schema

Use the Google Cloud console to create a BigQuery table schema. It must contain the following fields to serve as the data source.

Field name          Data type   Status
corpus_id           String      Required
file_id             String      Required
chunk_id            String      Required
chunk_data_type     String      Nullable
chunk_data          String      Nullable
file_original_uri   String      Nullable
embeddings          Float       Repeated

This code sample demonstrates how to define the BigQuery table schema.

  # Use this sql query as reference for creating the table
  CREATE TABLE `your-project-id.input_us_central1.rag_source_new` (
    `corpus_id` STRING,
    `file_id` STRING,
    `chunk_id` STRING,
    `chunk_data_type` STRING,
    `chunk_data` STRING,
    `embeddings` ARRAY<FLOAT64>,
    `file_original_uri` STRING
  );
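If you prefer to create the table programmatically instead of through the Google Cloud console, the same DDL statement can be run with the BigQuery client library. This is a minimal sketch; the project, dataset, and table names are placeholders.

  # Minimal sketch: run the CREATE TABLE statement above with the BigQuery
  # client library instead of the Google Cloud console. Project, dataset, and
  # table names are placeholders.
  from google.cloud import bigquery

  client = bigquery.Client(project="your-project-id")

  ddl = """
  CREATE TABLE `your-project-id.input_us_central1.rag_source_new` (
    `corpus_id` STRING,
    `file_id` STRING,
    `chunk_id` STRING,
    `chunk_data_type` STRING,
    `chunk_data` STRING,
    `embeddings` ARRAY<FLOAT64>,
    `file_original_uri` STRING
  );
  """

  # Run the DDL and wait for the job to finish.
  client.query(ddl).result()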

Provision a FeatureOnlineStore instance

To enable online serving of features, use the Vertex AI Feature Store CreateFeatureOnlineStore API to set up a FeatureOnlineStore instance. If you are provisioning a FeatureOnlineStore for the first time, the operation might take approximately five minutes to complete.

  # TODO(developer): Update and uncomment the following lines:
  # PROJECT_ID = "your-project-id"
  #
  # Set feature_online_store_id.
  # Example: "rag_fos_test"
  # FEATURE_ONLINE_STORE_ID="your-feature-online-store-id"

  # Call CreateFeatureOnlineStore to create a FeatureOnlineStore instance
  curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/featureOnlineStores?feature_online_store_id=${FEATURE_ONLINE_STORE_ID}   -d '{
      "optimized": {},
  }'

  # TODO(developer): Update and uncomment the following lines:
  # Get operation_id returned in CreateFeatureOnlineStore
  # OPERATION_ID="your-operation-id"

  # Poll Operation status until done = true in the response
  curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/operations/${OPERATION_ID}
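The same provisioning step can also be done from Python with the Feature Store admin client. This is a minimal sketch, assuming the google-cloud-aiplatform library and the same PROJECT_ID and FEATURE_ONLINE_STORE_ID placeholders as the REST example above.

  # Minimal sketch: provision an optimized FeatureOnlineStore from Python,
  # mirroring the CreateFeatureOnlineStore REST call above.
  from google.cloud import aiplatform_v1

  PROJECT_ID = "your-project-id"
  FEATURE_ONLINE_STORE_ID = "your-feature-online-store-id"

  admin_client = aiplatform_v1.FeatureOnlineStoreAdminServiceClient(
      client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
  )

  # "optimized" selects optimized online serving, which is required for
  # vector similarity search.
  operation = admin_client.create_feature_online_store(
      parent=f"projects/{PROJECT_ID}/locations/us-central1",
      feature_online_store_id=FEATURE_ONLINE_STORE_ID,
      feature_online_store=aiplatform_v1.FeatureOnlineStore(
          optimized=aiplatform_v1.FeatureOnlineStore.Optimized()
      ),
  )

  # Block until provisioning finishes (roughly five minutes on first use).
  print(operation.result(timeout=900))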

Create a FeatureView resource

To connect the BigQuery table, which stores the feature data source, to the FeatureOnlineStore instance, call the CreateFeatureView API to create a FeatureView resource. When you create a FeatureView resource, choose the default distance metric DOT_PRODUCT_DISTANCE, which is defined as the negative of the dot product (a smaller DOT_PRODUCT_DISTANCE indicates higher similarity).
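For intuition, the toy sketch below shows how DOT_PRODUCT_DISTANCE behaves on two made-up embedding vectors: the distance is the negated dot product, so a vector that is more aligned with the query produces a smaller (more negative) distance.

  # Toy illustration of DOT_PRODUCT_DISTANCE: the negative of the dot product,
  # so a smaller value means higher similarity.
  def dot_product_distance(a, b):
      return -sum(x * y for x, y in zip(a, b))

  query = [0.9, 0.1, 0.0]
  similar_chunk = [0.8, 0.2, 0.1]
  unrelated_chunk = [0.0, 0.1, 0.9]

  print(dot_product_distance(query, similar_chunk))    # ~ -0.74 (smaller: more similar)
  print(dot_product_distance(query, unrelated_chunk))  # ~ -0.01 (larger: less similar)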

This code sample demonstrates how to create a FeatureView resource.

  # TODO(developer): Update and uncomment the following lines:
  # Set feature_view_id
  # Example: "feature_view_test"
  # FEATURE_VIEW_ID = "your-feature-view-id"
  #
  # The big_query_uri generated in the above BigQuery table schema creation step
  # The format should be "bq://" + BigQuery table ID
  # Example: "bq://tester.ragtest1.rag_testdata"
  # BIG_QUERY_URI=YOUR_BIG_QUERY_URI

  # Call CreateFeatureView API to create a FeatureView
  curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/featureOnlineStores/${FEATURE_ONLINE_STORE_ID}/featureViews?feature_view_id=${FEATURE_VIEW_ID} \
    -d '{
          "vertex_rag_source": {
            "uri": '\""${BIG_QUERY_URI}"\"'
          }
      }'

  # Call ListFeatureViews API to verify the FeatureView is created successfully
  curl -X GET -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" -H "Content-Type: application/json" https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/featureOnlineStores/${FEATURE_ONLINE_STORE_ID}/featureViews

Upload data and online serving

The RAG API handles the data upload and online serving.

Use Vertex AI Feature Store in LlamaIndex on Vertex AI for RAG

After the Vertex AI Feature Store instance is set up, the following sections show you how to use it as the vector database with your RAG application.

Create a RAG corpus using the Vertex AI Feature Store instance as the vector database

To create a RAG corpus, you must provide the FEATURE_VIEW_RESOURCE_NAME. The RAG corpus is created and automatically associated with the Vertex AI Feature Store instance. The RAG APIs use the generated rag_corpus_id to handle the data upload to the Vertex AI Feature Store instance and to retrieve relevant contexts from the rag_corpus_id.

This code sample demonstrates how to create a RAG corpus using the Vertex AI Feature Store instance as its vector database.

REST

# TODO(developer): Update and uncomment the following lines:
# CORPUS_DISPLAY_NAME = "your-corpus-display-name"
#
# Full feature view resource name
# Format: projects/${PROJECT_ID}/locations/us-central1/featureOnlineStores/${FEATURE_ONLINE_STORE_ID}/featureViews/${FEATURE_VIEW_ID}
# FEATURE_VIEW_RESOURCE_NAME = "your-feature-view-resource-name"

# Call CreateRagCorpus API to create a new RAG corpus
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
 https://us-central1-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/ragCorpora -d '{
      "display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
      "rag_vector_db_config" : {
              "vertex_feature_store": {
                "feature_view_resource_name":'\""${FEATURE_VIEW_RESOURCE_NAME}"\"'
              }
        }
  }'

# Call ListRagCorpora API to verify the RAG corpus is created successfully
curl -sS -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://us-central1-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/ragCorpora"

Python

import vertexai
from vertexai.preview import rag
from vertexai.preview.generative_models import GenerativeModel, Tool
# Set Project
PROJECT_ID = "YOUR_PROJECT_ID"  # @param {type:"string"}
vertexai.init(project=PROJECT_ID, location="us-central1")

# Configure a Google first-party embedding model
embedding_model_config = rag.EmbeddingModelConfig(
    publisher_model="text-embedding-004"
)

# Configure a Vertex AI Feature Store instance for the corpus
FEATURE_VIEW_RESOURCE_NAME = "YOUR_FEATURE_VIEW_RESOURCE_NAME"  # @param {type:"string"}
vector_db = rag.VertexFeatureStore(
    resource_name=FEATURE_VIEW_RESOURCE_NAME,
)

# Name your corpus
DISPLAY_NAME = "YOUR_DISPLAY_NAME"  # @param {type:"string"}

rag_corpus = rag.create_corpus(
    display_name=DISPLAY_NAME, embedding_model_config=embedding_model_config, vector_db=vector_db
)

# Check the corpus just created
rag.list_corpora()

Use the RAG API to import files into the BigQuery table

Use the ImportRagFiles API to import files from Google Cloud Storage or Google Drive into the BigQuery table of the Vertex AI Feature Store instance. The files are embedded and stored in the BigQuery table.

This code sample demonstrates how to import files into the BigQuery table using the RAG API.

REST

# TODO(developer): Update and uncomment the following lines:
# RAG_CORPUS_ID = "your-rag-corpus-id"
#
# Google Cloud Storage bucket/file location.
# For example, "gs://rag-fos-test/"
# GCS_URIS= "your-gcs-uris"

# Call ImportRagFiles API to embed files and store in the BigQuery table
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/ragCorpora/${RAG_CORPUS_ID}/ragFiles:import \
-d '{
  "import_rag_files_config": {
    "gcs_source": {
      "uris": '\""${GCS_URIS}"\"'
    },
    "rag_file_chunking_config": {
      "chunk_size": 512
    }
  }
}'

# Call ListRagFiles API to verify the files are imported successfully
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://us-central1-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/ragCorpora/${RAG_CORPUS_ID}/ragFiles

Python

RAG_CORPUS_RESOURCE = "projects/{PROJECT_ID}/locations/us-central1/ragCorpora/YOUR_RAG_CORPUS_ID"
GS_BUCKET = "YOUR_GS_BUCKET"

response = rag.import_files(
    corpus_name=RAG_CORPUS_RESOURCE,
    paths=[GS_BUCKET],
    chunk_size=512,  # Optional
    chunk_overlap=100,  # Optional
)
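Because Google Drive is also a supported source, this sketch shows the same rag.import_files call with a hypothetical Drive folder URL instead of a Cloud Storage path; the folder typically needs to be shared with the appropriate Vertex AI service account before the import can read it.

# Minimal sketch: import from Google Drive instead of Cloud Storage.
# The folder URL below is a hypothetical placeholder.
DRIVE_FOLDER_URL = "https://drive.google.com/drive/folders/YOUR_FOLDER_ID"

response = rag.import_files(
    corpus_name=RAG_CORPUS_RESOURCE,
    paths=[DRIVE_FOLDER_URL],
    chunk_size=512,  # Optional
    chunk_overlap=100,  # Optional
)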

Run the synchronization process to construct a FeatureOnlineStore index

After the data is uploaded to the BigQuery table, run the synchronization process to make the data available for online serving. You must generate a FeatureOnlineStore index using the FeatureView, and the synchronization process might take 20 minutes to complete.

This code sample demonstrates how to run the synchronization process to construct a FeatureOnlineStore index.

  # Call Feature Store SyncFeatureView API to run the synchronization process
  curl   "https://us-central1-aiplatform.googleapis.com/v1/${FEATURE_VIEW_RESOURCE_NAME}:sync" \
    -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8"

  # TODO(developer): Update and uncomment the following lines:
  # Call the Vertex AI Feature Store GetFeatureViewSync API to check the status of the running synchronization
  # FEATURE_VIEW_SYNC_ID = "your-feature-view-sync-id" returned in SyncFeatureView
  curl   "https://us-central1-aiplatform.googleapis.com/v1/${FEATURE_VIEW_RESOURCE_NAME}/featureViewSyncs/${FEATURE_VIEW_SYNC_ID}" \
    -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8"

Retrieve relevant contexts using the RAG API

After the synchronization process is complete, you can retrieve relevant contexts from the FeatureOnlineStore index by using the RetrieveContexts API.

REST

# TODO(developer): Update and uncomment the following lines:
# RETRIEVAL_QUERY="your-retrieval-query"
#
# Full RAG corpus resource name
# Format:
# "projects/${PROJECT_ID}/locations/us-central1/ragCorpora/${RAG_CORPUS_ID}"
# RAG_CORPUS_RESOURCE="your-rag-corpus-resource"

# Call RetrieveContexts API to retrieve relevant contexts
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://us-central1-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1:retrieveContexts \
  -d '{
    "vertex_rag_store": {
      "rag_resources": {
          "rag_corpus": '\""${RAG_CORPUS_RESOURCE}"\"',
        },
    },
    "query": {
      "text": '\""${RETRIEVAL_QUERY}"\"',
      "similarity_top_k": 10
    }
  }'

Python

RAG_CORPUS_RESOURCE = "projects/{PROJECT_ID}/locations/us-central1/ragCorpora/YOUR_RAG_CORPUS_ID"
RETRIEVAL_QUERY = "YOUR_RETRIEVAL_QUERY"

response = rag.retrieval_query(
    rag_resources=[
        rag.RagResource(
             rag_corpus=RAG_CORPUS_RESOURCE,
             # Optional: supply IDs from `rag.list_files()`.
             # rag_file_ids=["rag-file-1", "rag-file-2", ...],
        )
    ],
    text=RETRIEVAL_QUERY,
    similarity_top_k=10,  # Optional
)
print(response)

Generate content using the Vertex AI Gemini API

Call the Vertex AI GenerateContent API to generate content with Gemini models, and specify RAG_CORPUS_RESOURCE in the request to retrieve data from the FeatureOnlineStore index.

REST

# TODO(developer): Update and uncomment the following lines:
# MODEL_ID=gemini-1.5-flash-001
# GENERATE_CONTENT_PROMPT="your-generate-content-prompt"

# GenerateContent with contexts retrieved from the FeatureOnlineStore index
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json"  https://us-central1-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/${MODEL_ID}:generateContent \
-d '{
  "contents": {
    "role": "user",
    "parts": {
      "text": '\""${GENERATE_CONTENT_PROMPT}"\"'
    }
  },
  "tools": {
    "retrieval": {
      "vertex_rag_store": {
        "rag_resources": {
            "rag_corpus": '\""${RAG_CORPUS_RESOURCE}"\"',
          },
        "similarity_top_k": 8,
      }
    }
  }
}'

Python

RAG_CORPUS_RESOURCE = "projects/{PROJECT_ID}/locations/us-central1/ragCorpora/YOUR_RAG_CORPUS_ID"

rag_resource = rag.RagResource(
    rag_corpus=RAG_CORPUS_RESOURCE,
    # Optional: supply IDs from `rag.list_files()`.
    # rag_file_ids=["rag-file-1", "rag-file-2", ...],
)

rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[rag_resource],  # Reuse the RagResource defined above
            similarity_top_k=10,  # Optional
        ),
    )
)

rag_model = GenerativeModel(
   model_name="gemini-1.5-flash-001", tools=[rag_retrieval_tool]
)

GENERATE_CONTENT_PROMPT = "YOUR_GENERATE_CONTENT_PROMPT"

response = rag_model.generate_content(GENERATE_CONTENT_PROMPT)
print(response.text)

What's next