Import index data from BigQuery

This guide explains how to use the ImportIndex API to import index data from BigQuery into Vector Search, so that you can populate Vector Search indexes directly from BigQuery tables that contain vector embeddings.

Preparing BigQuery data for import

Before importing index data, your BigQuery table must have the following columns:

  • Unique identifiers: This column contains unique identifiers for each data point. It is mapped to the id field in Vector Search.

  • Vector embeddings: This column contains the vector embeddings, represented as a repeated FLOAT field. It is mapped to the embedding field in Vector Search.

Optionally, you can include the following columns:

  • Restricts: These are columns for string and numeric restricts, which let you filter your data during searches.

  • Metadata: These are columns of metadata to be returned with Vector Search index query results.

Preparing Vector Search index for import

Once you've prepared your BigQuery data, ensure the destination Vector Search index:

  • Exists in Vector Search within your project: The index serves as the destination for your imported data and must already be created within your project.

  • Is set to overwrite or append data: During the import process, you have the option to either overwrite the existing data within your Vector Search index or append the data imported from BigQuery. Overwriting replaces the current data points with the imported data. Appending adds the new data to the existing index.

  • Matches dimensionality: The dimensionality of the embeddings stored in your BigQuery data must be identical to the dimensionality configured for your Vector Search index.
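The requirements above (unique identifiers, a repeated embedding field, and matching dimensionality) can be checked on a sample of rows before you start an import. The following is a minimal sketch, assuming the rows have already been read from BigQuery into Python dictionaries. The function name validate_rows and the column names id and embedding are hypothetical and are not part of the Vector Search API.

```python
from typing import Any, Iterable


def validate_rows(rows: Iterable[dict[str, Any]], id_column: str,
                  embedding_column: str, index_dimensions: int) -> list[str]:
    """Return a list of human-readable problems found in the rows.

    Hypothetical helper: it only mirrors the documented requirements
    (unique IDs, repeated embedding field, matching dimensionality).
    """
    problems: list[str] = []
    seen_ids: set[str] = set()
    for position, row in enumerate(rows):
        # Every data point needs an identifier that is unique in the table.
        row_id = row.get(id_column)
        if row_id is None:
            problems.append(f"row {position}: missing {id_column!r}")
        elif str(row_id) in seen_ids:
            problems.append(f"row {position}: duplicate id {row_id!r}")
        else:
            seen_ids.add(str(row_id))

        # The embedding must be a repeated field whose length equals the
        # dimensionality configured on the destination index.
        embedding = row.get(embedding_column)
        if not isinstance(embedding, (list, tuple)):
            problems.append(
                f"row {position}: {embedding_column!r} is not a repeated field")
        elif len(embedding) != index_dimensions:
            problems.append(
                f"row {position}: embedding has {len(embedding)} dimensions, "
                f"index expects {index_dimensions}")
    return problems


# Sample rows with one duplicate ID and one embedding of the wrong length.
sample_rows = [
    {"id": "1", "embedding": [0.1, 0.2]},
    {"id": "1", "embedding": [0.3, 0.4]},
    {"id": "2", "embedding": [0.5]},
]
for problem in validate_rows(sample_rows, "id", "embedding", index_dimensions=2):
    print(problem)
```

An empty result means the sampled rows meet the three requirements. It does not guarantee that the import itself succeeds.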

Specifying the ImportIndexRequest

Before importing data from BigQuery, create an ImportIndexRequest object that specifies the target index, whether to overwrite existing data, and the configuration for connecting to BigQuery. Send this request object to the ImportIndex API.

The following is an example of an ImportIndexRequest in JSON format:

{
  "name": "projects/[PROJECT_ID]/locations/[LOCATION]/indexes/[INDEX_ID]",
  "isCompleteOverwrite": true,
  "config": {
    "bigQuerySourceConfig": {
      "tablePath": "[PROJECT_ID].[DATASET_ID].[TABLE_ID]",
      "datapointFieldMapping": {
        "idColumn": "[ID_COLUMN_NAME]",
        "embeddingColumn": "[EMBEDDING_COLUMN_NAME]",
        "restricts": [
          {
            "namespace": "[RESTRICT_NAMESPACE]",
            "allowColumn": ["[RESTRICT_ALLOW_COLUMN_NAME]"],
            "denyColumn": ["[RESTRICT_DENY_COLUMN_NAME]"]
          }
        ],
        "numericRestricts": [
          {
            "namespace": "[RESTRICT_NAMESPACE]",
            "valueColumn": "[RESTRICT_VALUE_COLUMN_NAME]",
            "valueType": "INT"
          }
        ],
        "metadataColumns": ["METADATA_COLUMN1", "METADATA_COLUMN2", ...]
      }
    }
  }
}

The fields in this request are:

  • name: The full resource name of the Vector Search index where you want to import the data.

  • isCompleteOverwrite: A boolean that indicates whether to overwrite existing data in the index. Set to true to replace existing data.

  • config: Contains the configuration for the BigQuery source.

    • bigQuerySourceConfig: Specifies the details for connecting to your BigQuery table.

      • tablePath: The full path to your BigQuery table in the format [PROJECT_ID].[DATASET_ID].[TABLE_ID].

      • datapointFieldMapping: Maps the columns in your BigQuery table to the fields in Vector Search.

        • idColumn: The name of the column containing unique identifiers.

        • embeddingColumn: The name of the column containing vector embeddings.

        • restricts: (Optional) Specifies string restricts.

          • namespace: The namespace for the restrict.

          • allowColumn: An array of the column names that contain allowed values for the restrict.

          • denyColumn: An array of the column names that contain denied values for the restrict.

        • numericRestricts: (Optional) Specifies numeric restricts.

          • namespace: The namespace for the numeric restrict.

          • valueColumn: The name of the column containing numeric values.

          • valueType: The type of the numeric value, such as INT, FLOAT, or DOUBLE.

        • metadataColumns: (Optional) Metadata fields to include with the feature embedding. These metadata fields can be retrieved from the index search results, but they don't affect the search itself. For example, filtering cannot be performed on metadata fields.
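The request body described above can also be assembled programmatically. The following is a minimal sketch that builds the same JSON structure as a Python dictionary. The helper name build_import_index_request and the project, dataset, table, and column names are hypothetical. Only the JSON field names come from the example request.

```python
import json
from typing import Optional


def build_import_index_request(
    index_name: str,
    table_path: str,
    id_column: str,
    embedding_column: str,
    is_complete_overwrite: bool = False,
    restricts: Optional[list[dict]] = None,
    numeric_restricts: Optional[list[dict]] = None,
    metadata_columns: Optional[list[str]] = None,
) -> dict:
    """Build an ImportIndexRequest body as a plain dictionary (hypothetical helper)."""
    # Required column mappings.
    mapping: dict = {"idColumn": id_column, "embeddingColumn": embedding_column}
    # Optional mappings are added only when provided.
    if restricts:
        mapping["restricts"] = restricts
    if numeric_restricts:
        mapping["numericRestricts"] = numeric_restricts
    if metadata_columns:
        mapping["metadataColumns"] = metadata_columns
    return {
        "name": index_name,
        "isCompleteOverwrite": is_complete_overwrite,
        "config": {
            "bigQuerySourceConfig": {
                "tablePath": table_path,
                "datapointFieldMapping": mapping,
            }
        },
    }


# Placeholder values for illustration only.
request = build_import_index_request(
    index_name="projects/my-project/locations/us-central1/indexes/1234567890",
    table_path="my-project.my_dataset.movies",
    id_column="id",
    embedding_column="embedding",
    is_complete_overwrite=True,
    restricts=[{"namespace": "genre", "allowColumn": ["genre"]}],
    numeric_restricts=[
        {"namespace": "year", "valueColumn": "year", "valueType": "INT"}
    ],
    metadata_columns=["title", "description"],
)
print(json.dumps(request, indent=2))
```

Omitting the optional arguments produces a request that maps only the identifier and embedding columns.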

Executing the import

Once you have created an ImportIndexRequest, send it to the ImportIndex API endpoint. This triggers the import process, which exports data from BigQuery and ingests it into your Vector Search index. ImportIndex returns a long-running operation. You can use the operation ID to monitor the progress of the import operation.
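Monitoring the long-running operation usually amounts to polling it until it reports completion. The following is a minimal sketch of that loop. It assumes the operation is available as a dictionary with the standard long-running operation fields name, done, error, and response. The get_operation callable is a hypothetical stand-in for whatever client call you use to fetch the operation, and the demonstration at the end uses a fake in place of a real API call.

```python
import time
from typing import Any, Callable


def wait_for_operation(
    get_operation: Callable[[str], dict[str, Any]],
    operation_name: str,
    poll_interval_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll an operation until it is done, then return its response.

    get_operation is a hypothetical callable that fetches the current state
    of the operation as a dictionary.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        operation = get_operation(operation_name)
        if operation.get("done"):
            # A finished operation carries either an error or a response.
            if "error" in operation:
                raise RuntimeError(f"import failed: {operation['error']}")
            return operation.get("response", {})
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{operation_name} did not finish in time")
        sleep(poll_interval_seconds)


# Demonstration with a fake operation that finishes on the second poll.
fake_states = iter([
    {"name": "operations/example", "done": False},
    {"name": "operations/example", "done": True, "response": {}},
])
result = wait_for_operation(
    lambda name: next(fake_states),
    "operations/example",
    sleep=lambda seconds: None,  # skip real waiting in the demonstration
)
print("import finished:", result)
```

Passing sleep as a parameter keeps the loop testable without real delays. In real use, leave the default and choose a polling interval suited to the size of the import.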

After the imported data is stored, it resides within your Vector Search index and is indistinguishable from data ingested using other methods. The index can continue to be managed using standard Vector Search APIs.

The following sample shows a query result returned with return_full_datapoint set to true, for an index imported with a BigQuery connector configuration that specifies a genre restrict, a year numeric restrict, and the metadata columns title and description.

nearest_neighbors {
  neighbors {
    datapoint {
      datapoint_id: "4"
      feature_vector: 0.7
      feature_vector: 0.8
      restricts {
        namespace: "genre"
        allow_list: "Drama"
      }
      embedding_metadata {
        title: "A Movie"
        description: "The story of A Movie..."
      }
      crowding_tag {
        crowding_attribute: "0"
      }
      numeric_restricts {
        namespace: "year"
        value_int: 1942
      }
    }
    distance: 0.75
  }