Use data source access control

This page describes how to enforce data source access control for search apps in Vertex AI Search and Conversation.

Access control for your data sources in Vertex AI Search and Conversation limits the data that users can view in your search app's results. Google uses your identity provider to identify the end user performing a search and determine if they have access to the documents that are returned as results.

For example, say that employees at your company search across Confluence documents using your search app. However, you need to make sure they can't view content through the app that they aren't allowed to access. If you have set up a workforce pool in Google Cloud for your organization's identity provider, then you can also specify that workforce pool in Vertex AI Search and Conversation. Now, if an employee uses your app, they get search results only for documents that their account already has access to in Confluence.

About data source access control

Turning on access control is a one-time procedure.

Access control is available for Cloud Storage, BigQuery, Google Drive, and all third-party data sources.

To turn on data source access control for Vertex AI Search and Conversation, you must have your organization's identity provider configured in Google Cloud. The following authentication frameworks are supported:

  • Google Identity: If you use Google Identity, then all user identities and user groups are present and managed through Google Cloud. For more information about Google Identity, see the Google Identity documentation.
  • Third-party identity provider federation: If you use an external identity provider, for example Okta or Azure, then you must set up workforce identity federation in Google Cloud before you can turn on data source access control for Vertex AI Search and Conversation.

Limitations

Access control has the following limitations:

  • 100 readers are allowed per document. Each principal counts as a reader, where a principal can be a group or an individual user.
  • You can select one identity provider per Vertex AI Search-supported location.
  • Access control is honored only for identities and groups that are explicitly defined in your identity provider. Identities or groups that are defined natively within third-party apps are not supported.
  • To set a data source as access-controlled, you must select this setting during data store creation. You can't turn this setting on or off for an existing data store.
  • The Data > Documents tab in the console doesn't show data for access-controlled data sources because this data should only be visible to users that have view access.
  • To preview results in the console for search apps that use third-party access control, you must sign in to the federated console. See Preview results for apps with third-party access control.
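
The reader limit can be checked before ingestion. The following is a minimal sketch, assuming each document is available as a Python dictionary shaped like the acl_info examples on this page. The function names are hypothetical and are not part of any Google Cloud library.

```python
from typing import Any

# Limit stated in the Limitations list: 100 readers per document,
# where every principal (user or group) counts as one reader.
MAX_READERS_PER_DOCUMENT = 100


def count_principals(document: dict[str, Any]) -> int:
    """Count every principal across all readers entries of a document."""
    readers = document.get("acl_info", {}).get("readers", [])
    return sum(len(reader.get("principals", [])) for reader in readers)


def within_reader_limit(document: dict[str, Any]) -> bool:
    """Return True when the document stays within the per-document limit."""
    return count_principals(document) <= MAX_READERS_PER_DOCUMENT


# Hypothetical document with one group and one user as readers.
example = {
    "id": "doc-1",
    "acl_info": {
        "readers": [
            {"principals": [{"group_id": "group_1"}, {"user_id": "user_1"}]}
        ]
    },
}
print(count_principals(example), within_reader_limit(example))
```

Running a check like this over your metadata before import can surface documents that exceed the limit early, rather than during ingestion.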

Before you begin

This procedure assumes you have set up an identity provider in your Google Cloud project.

  • Google Identity: If you use Google Identity, you can proceed to the Connect to your identity provider procedure.
  • Third-party identity provider: Make sure you have set up a workforce identity pool for your third-party identity provider. Ensure you have specified subject and group attribute mappings when setting up the workforce pool. For information about attribute mappings, see Attribute mappings in the IAM documentation. For more information about workforce identity pools, see Manage workforce identity pool providers in the IAM documentation.

Connect to your identity provider

To specify an identity provider for Vertex AI Search and Conversation and turn on data source access control, follow these steps:

  1. In the Google Cloud console, go to the Search and Conversation page.

  2. Go to the Settings > Authentication page.

  3. Click Add identity provider for the location you want to update.

  4. Select your identity provider in the Add identity provider dialog. If you select a third-party identity provider, also select the workforce pool that applies to your data sources.

  5. Click Save changes.

Configure a data source with access control

To apply access control to a data source, use the following steps depending on the kind of data source you're setting up:

Unstructured data from Cloud Storage

When setting up a data store for unstructured data from Cloud Storage, you need to also upload ACL metadata and set the data store as access controlled:

  1. When preparing your data, include ACL information in your metadata using the acl_info field. For example:

    {
      "id": "<your-id>",
      "jsonData": "<JSON string>",
      "content": {
        "mimeType": "<application/pdf or text/html>",
        "uri": "gs://<your-gcs-bucket>/directory/filename.pdf"
      },
      "acl_info": {
        "readers": [
          {
            "principals": [
              { "group_id": "group_1" },
              { "user_id": "user_1" }
            ]
          }
        ]
      }
    }
    

    For more information about unstructured data with metadata, see the Unstructured data section of Prepare data for ingesting.

  2. When following the steps for data store creation in Create a search data store, enable access control in either the console or the API:

    • Console: When creating the data store, select This data store contains access control information.
    • API: When creating the data store, include the flag "aclEnabled": "true" in your JSON payload.
  3. When following the steps for data import in Create a search data store, make sure to do the following:

    • Upload your metadata with ACL information from the same bucket as your unstructured data
    • If using the API, set GcsSource.dataSchema to document
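
The metadata record from step 1 can be generated programmatically. The following is a minimal sketch that builds records with the same fields as the example above and writes them one JSON object per line. The one-object-per-line layout, the helper names, and the bucket path are assumptions for illustration only.

```python
import json
from typing import Any, Iterable


def make_unstructured_record(
    doc_id: str,
    uri: str,
    mime_type: str,
    struct_data: dict[str, Any],
    group_ids: Iterable[str] = (),
    user_ids: Iterable[str] = (),
) -> dict[str, Any]:
    """Build one metadata record with ACL information in acl_info."""
    principals = [{"group_id": g} for g in group_ids]
    principals += [{"user_id": u} for u in user_ids]
    return {
        "id": doc_id,
        # jsonData is a JSON string, so the dictionary is serialized here.
        "jsonData": json.dumps(struct_data),
        "content": {"mimeType": mime_type, "uri": uri},
        "acl_info": {"readers": [{"principals": principals}]},
    }


def to_json_lines(records: Iterable[dict[str, Any]]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


# Hypothetical bucket path and identifiers.
record = make_unstructured_record(
    doc_id="doc-1",
    uri="gs://example-bucket/directory/filename.pdf",
    mime_type="application/pdf",
    struct_data={"title": "Quarterly report"},
    group_ids=["group_1"],
    user_ids=["user_1"],
)
print(to_json_lines([record]))
```

The user and group IDs must match identities defined in your identity provider, as described in the limitations.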

Structured data from Cloud Storage

When setting up a data store for structured data from Cloud Storage, you need to also upload ACL metadata and set the data store as access controlled:

  1. When preparing your data, include ACL information in your metadata using the acl_info field. For example:

    {
       "id": "<your-id>",
       "jsonData": "<JSON string>",
       "acl_info": {
         "readers": [
           {
             "principals": [
               { "group_id": "group_1" },
               { "user_id": "user_1" }
             ]
           }
         ]
       }
     }
    
  2. When following the steps for data store creation in Create a search data store, enable access control in either the console or the API:

    • Console: When creating the data store, select This data store contains access control information.
    • API: When creating the data store, include the flag "aclEnabled": "true" in your JSON payload.
  3. When following the steps for data import in Create a search data store, make sure to do the following:

    • Upload your metadata with ACL information from the same bucket as your structured data
    • If using the API, set GcsSource.dataSchema to document
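
For API users, steps 2 and 3 come down to two request bodies. The following is a minimal sketch that only builds the JSON payloads and does not send them. The aclEnabled flag and the dataSchema value come from the steps above; displayName, gcsSource, inputUris, and the bucket path are assumptions that you should check against the API reference.

```python
import json


def build_data_store_body(display_name: str) -> dict:
    """Payload for data store creation with access control turned on."""
    return {
        "displayName": display_name,  # assumed field name
        "aclEnabled": "true",  # flag as given in the steps above
    }


def build_gcs_import_body(input_uris: list[str]) -> dict:
    """Payload for importing metadata from Cloud Storage."""
    return {
        "gcsSource": {  # assumed JSON name for GcsSource
            "inputUris": input_uris,  # assumed field name
            "dataSchema": "document",  # value required by step 3
        }
    }


# Hypothetical names used only for illustration.
create_body = build_data_store_body("acl-enabled-store")
import_body = build_gcs_import_body(["gs://example-bucket/metadata.jsonl"])
print(json.dumps(create_body, indent=2))
print(json.dumps(import_body, indent=2))
```

Because access control can't be turned on or off for an existing data store, the flag has to be present in the creation payload.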

Unstructured data from BigQuery

When setting up a data store for unstructured data from BigQuery, you need to set the data store as access controlled and provide ACL metadata using a predefined schema for Vertex AI Search:

  1. When preparing your data, specify the following schema. Don't use a custom schema.

    [
      {
        "name": "id",
        "mode": "REQUIRED",
        "type": "STRING",
        "fields": []
      },
      {
        "name": "jsonData",
        "mode": "NULLABLE",
        "type": "STRING",
        "fields": []
      },
      {
        "name": "content",
        "type": "RECORD",
        "mode": "NULLABLE",
        "fields": [
          {
            "name": "mimeType",
            "type": "STRING",
            "mode": "NULLABLE"
          },
          {
            "name": "uri",
            "type": "STRING",
            "mode": "NULLABLE"
          }
        ]
      },
      {
        "name": "acl_info",
        "type": "RECORD",
        "mode": "NULLABLE",
        "fields": [
          {
            "name": "readers",
            "type": "RECORD",
            "mode": "REPEATED",
            "fields": [
              {
                "name": "principals",
                "type": "RECORD",
                "mode": "REPEATED",
                "fields": [
                  {
                    "name": "user_id",
                    "type": "STRING",
                    "mode": "NULLABLE"
                  },
                  {
                    "name": "group_id",
                    "type": "STRING",
                    "mode": "NULLABLE"
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
    
  2. Include your ACL metadata as a column in your BigQuery table.

  3. When following the steps in Create a search data store, enable access control in either the console or using the API:

    • Console: When creating the data store, select This data store contains access control information.
    • API: When creating the data store, include the flag "aclEnabled": "true" in your JSON payload.
  4. When following the steps for data import in Create a search data store, if using the API, set BigQuerySource.dataSchema to document.
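
Because a custom schema isn't allowed, it can help to confirm that a table's acl_info column matches the predefined definition before importing. The following is a minimal sketch, assuming the table schema is available as a list of dictionaries in the same JSON form as the schema above. The helper names are hypothetical.

```python
# The acl_info column as defined in the predefined schema above.
EXPECTED_ACL_FIELD = {
    "name": "acl_info",
    "type": "RECORD",
    "mode": "NULLABLE",
    "fields": [{
        "name": "readers",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [{
            "name": "principals",
            "type": "RECORD",
            "mode": "REPEATED",
            "fields": [
                {"name": "user_id", "type": "STRING", "mode": "NULLABLE"},
                {"name": "group_id", "type": "STRING", "mode": "NULLABLE"},
            ],
        }],
    }],
}


def normalize(field: dict) -> tuple:
    """Reduce a field to (name, type, mode, children), ignoring child order."""
    children = tuple(sorted(normalize(child) for child in field.get("fields", [])))
    return (
        field.get("name", ""),
        field.get("type", ""),
        field.get("mode", ""),
        children,
    )


def has_expected_acl_field(schema: list[dict]) -> bool:
    """Return True if the schema has an acl_info column matching the definition."""
    for field in schema:
        if field.get("name") == "acl_info":
            return normalize(field) == normalize(EXPECTED_ACL_FIELD)
    return False


table_schema = [
    {"name": "id", "mode": "REQUIRED", "type": "STRING", "fields": []},
    {"name": "jsonData", "mode": "NULLABLE", "type": "STRING", "fields": []},
    EXPECTED_ACL_FIELD,
]
print(has_expected_acl_field(table_schema))
```

The comparison ignores the order of nested fields but requires names, types, and modes to match exactly.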

Structured data from BigQuery

When setting up a data store for structured data from BigQuery, you need to set the data store as access controlled and provide ACL metadata using a predefined schema for Vertex AI Search:

  1. When preparing your data, specify the following schema. Don't use a custom schema.

    [
      {
        "name": "id",
        "mode": "REQUIRED",
        "type": "STRING",
        "fields": []
      },
      {
        "name": "jsonData",
        "mode": "NULLABLE",
        "type": "STRING",
        "fields": []
      },
      {
        "name": "acl_info",
        "type": "RECORD",
        "mode": "NULLABLE",
        "fields": [
          {
            "name": "readers",
            "type": "RECORD",
            "mode": "REPEATED",
            "fields": [
              {
                "name": "principals",
                "type": "RECORD",
                "mode": "REPEATED",
                "fields": [
                  {
                    "name": "user_id",
                    "type": "STRING",
                    "mode": "NULLABLE"
                  },
                  {
                    "name": "group_id",
                    "type": "STRING",
                    "mode": "NULLABLE"
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
    
  2. Include your ACL metadata as a column in your BigQuery table.

  3. When following the steps in Create a search data store, enable access control in either the console or using the API:

    • Console: When creating the data store, select This data store contains access control information.
    • API: When creating the data store, include the flag "aclEnabled": "true" in your JSON payload.
  4. When following the steps for data import in Create a search data store, make sure to do the following:

    • If using the console, then when specifying the kind of data you're uploading, select JSONL for structured data with metadata
    • If using the API, set BigQuerySource.dataSchema to document
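
For the API path in step 4, the import request names the BigQuery table and sets the data schema. The following is a minimal sketch that only builds the request body. The dataSchema value comes from the step above; bigquerySource, projectId, datasetId, tableId, and the example identifiers are assumptions that you should check against the API reference.

```python
import json


def build_bigquery_import_body(project_id: str, dataset_id: str, table_id: str) -> dict:
    """Payload for importing documents from a BigQuery table."""
    return {
        "bigquerySource": {  # assumed JSON name for BigQuerySource
            "projectId": project_id,  # assumed field name
            "datasetId": dataset_id,  # assumed field name
            "tableId": table_id,  # assumed field name
            "dataSchema": "document",  # value required by step 4
        }
    }


# Hypothetical identifiers used only for illustration.
bq_import_body = build_bigquery_import_body(
    "example-project", "example_dataset", "documents_with_acl"
)
print(json.dumps(bq_import_body, indent=2))
```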

Preview results for apps with third-party access control

Previewing results in the console for apps with third-party access control requires you to sign in with your organization's credentials. Follow these steps:

  1. In the Google Cloud console, go to the Search and Conversation page.

  2. Click the name of the search app whose results you want to preview.

  3. Go to the Preview page.

  4. Click Preview with federated identity to go to the federated console.

  5. Enter your workforce pool provider and organization's credentials.

  6. Preview results for your app on the Preview page that appears.

    For more information about previewing your search results, see Get search results.

Grant the Discovery Engine Viewer role

To give users the ability to make search calls, grant the Discovery Engine Viewer role to users in your domain or workforce pool.

For more information about granting roles, see the IAM documentation.
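
As an illustration of what the grant looks like in an IAM policy, the following is a minimal sketch that adds a member to the viewer role in a policy represented as a Python dictionary. The role ID roles/discoveryengine.viewer, the member string formats, and the helper names are assumptions; confirm them in the IAM documentation before use. The sketch doesn't call any Google Cloud API.

```python
import copy

# Assumed role ID for the Discovery Engine Viewer role.
VIEWER_ROLE = "roles/discoveryengine.viewer"


def domain_member(domain: str) -> str:
    """Member string for all users in a Google Identity domain (assumed format)."""
    return f"domain:{domain}"


def workforce_pool_member(pool_id: str) -> str:
    """Member string for all identities in a workforce pool (assumed format)."""
    return (
        "principalSet://iam.googleapis.com/locations/global/"
        f"workforcePools/{pool_id}/*"
    )


def add_viewer_binding(policy: dict, member: str) -> dict:
    """Return a copy of the policy with the member added to the viewer role."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == VIEWER_ROLE:
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": VIEWER_ROLE, "members": [member]})
    return updated


# Hypothetical domain and pool ID.
policy = {"bindings": []}
policy = add_viewer_binding(policy, domain_member("example.com"))
policy = add_viewer_binding(policy, workforce_pool_member("example-pool"))
print(policy)
```

The function returns a modified copy rather than changing its input, so the original policy stays available for comparison.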

Authorize the search widget

If you want to deploy a search widget for an access-controlled app, follow these steps:

  1. Grant the Discovery Engine Viewer role to users in your domain or workforce pool who need to make search API calls.

  2. Generate an authorization token to pass to your widget.

  3. Follow the steps in Add a widget with an authorization token to pass the token to your widget.