Update a schema

You can update the schema for any data store that contains data that supports a schema, such as structured data, website data with structured data, or unstructured data with metadata.

You can update the schema in the Google Cloud console or by using the schemas.patch API method. For website data stores, you can update the schema only by using the REST API.

To update the schema, you can add new fields, change the indexable, searchable, and retrievable annotations for a field, or mark a field as a key property, such as title, uri, or description.

Update your schema

You can update your schema in the Google Cloud console or by using the API.

Console

To update a schema in the Google Cloud console, follow these steps:

  1. Review the Requirements and limitations section to check that your schema update is valid.

  2. If you are updating field annotations (setting fields as indexable, retrievable, dynamic facetable, searchable, or completable), review Configure field settings for the limitations and requirements of each annotation type.

  3. Check that you have completed data ingestion. Otherwise, the schema might not be available to edit yet.

  4. In the Google Cloud console, go to the Agent Builder page.

  5. In the navigation menu, click Data Stores.

  6. In the Name column, click the data store with the schema that you want to update.

  7. Click the Schema tab to view the schema for your data.

    This tab might be empty if this is the first time you're editing the fields.

  8. Click the Edit button.

  9. Update your schema:

    • Map key properties: In the Key properties column of your schema, select a key property to map a field to. For example, if a field called details always contains the description of a document, map that field to the key property Description.

    • Update number of dimensions (Advanced): You can update this setting if you are using custom vector embeddings with Vertex AI Search. See Advanced: Use custom embeddings.

    • Update field annotations: To update annotations for a field, select or deselect a field's annotation setting. Available annotations are Retrievable, Indexable, Dynamic Facetable, Searchable, and Completable. Some field settings have limitations. See Configure field settings for descriptions and requirements for each annotation type.

    • Add a new field: Adding new fields to your schema before importing new documents with those fields can shorten the time it takes Vertex AI Agent Builder to reindex your data after import.

      1. Click Add new fields to expand that section.

      2. Click Add node and specify the settings for the new field.

        To indicate an array, set Array to Yes. For example, to add an array of strings, set type to string and Array to Yes.

        For a website data store, all fields that you add are arrays by default.

  10. Click Save to apply your schema changes.

    Changing the schema triggers reindexing. For large data stores, reindexing can take hours.
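The console settings in step 9 correspond to properties in the JSON schema that the REST section uses. The following minimal sketch builds one field definition from those settings. It assumes that an array field is written as "type": "array" with the element type and key property under "items" (as in the categories examples on this page), and that annotations are boolean keys such as retrievable and indexable placed next to the element type. The helper name make_field and those annotation key names are assumptions, not part of the documented API.

```python
import json

# Annotation names this sketch accepts. They mirror the console check boxes;
# writing them as boolean JSON schema keys is an assumption.
ANNOTATIONS = ("retrievable", "indexable", "dynamicFacetable", "searchable", "completable")


def make_field(field_type, array=False, key_property=None, **annotations):
    """Build a JSON schema property definition from console-style settings (hypothetical helper)."""
    unknown = sorted(set(annotations) - set(ANNOTATIONS))
    if unknown:
        raise ValueError(f"unknown annotations: {unknown}")

    # Definition of a single value: its type, optional key property, annotations.
    value = {"type": field_type}
    if key_property is not None:
        value["keyPropertyMapping"] = key_property
    value.update({name: bool(flag) for name, flag in annotations.items()})

    # Array = No: the value definition is the whole field definition.
    if not array:
        return value
    # Array = Yes: wrap the value definition in "items", like the categories field.
    return {"type": "array", "items": value}


if __name__ == "__main__":
    # A string array with the Retrievable annotation, similar to a field you
    # might add to a website data store, where added fields are arrays by default.
    print(json.dumps(make_field("string", array=True, retrievable=True), indent=2))
```

Keeping the key property and annotations on the element definition matches how this page places keyPropertyMapping inside "items" for the categories array.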

REST

To use the API to update your schema, follow these steps:

  1. Review the Requirements and limitations and the Limitation examples (REST only) sections to check that your schema changes are valid.

    To update the schema for website data stores or for unstructured data with metadata, skip to step 5 and call the schemas.patch method.

  2. If you are updating field annotations (setting fields as indexable, retrievable, dynamic facetable, or searchable), review Configure field settings for the limitations and requirements of each annotation type.

  3. If you are editing an auto-detected schema, make sure that you have completed data ingestion. Otherwise, the schema might not be available to edit yet.

  4. Find your data store ID. If you already have your data store ID, skip to the next step.

    1. In the Google Cloud console, go to the Agent Builder page and in the navigation menu, click Data Stores.

    2. Click the name of your data store.

    3. On the Data page for your data store, get the data store ID.

  5. Use the schemas.patch API method to provide your new JSON schema as a JSON object.

    curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/schemas/default_schema" \
    -d '{
      "structSchema": JSON_SCHEMA_OBJECT
    }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store.
    • JSON_SCHEMA_OBJECT: your new JSON schema as a JSON object. For example:

      {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": {
          "title": {
            "type": "string",
            "keyPropertyMapping": "title"
          },
          "categories": {
            "type": "array",
            "items": {
              "type": "string",
              "keyPropertyMapping": "category"
            }
          },
          "uri": {
            "type": "string",
            "keyPropertyMapping": "uri"
          }
        }
      }
  6. Optional: Review the schema by following the procedure View a schema definition.
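If you prefer to script step 5 instead of running curl, the following minimal sketch builds the same PATCH request with the Python standard library. The URL template, headers, and the structSchema body are copied from the curl example; the function name build_schema_patch is hypothetical, and the access token is assumed to come from gcloud auth print-access-token, as in the curl command. The sketch only builds the request and doesn't send it.

```python
import json
import urllib.request

# Endpoint template copied from the curl example (v1beta, global location,
# default_collection, default_schema).
SCHEMA_URL = (
    "https://discoveryengine.googleapis.com/v1beta/projects/{project_id}"
    "/locations/global/collections/default_collection"
    "/dataStores/{data_store_id}/schemas/default_schema"
)


def build_schema_patch(project_id, data_store_id, json_schema, access_token):
    """Build, but do not send, the schemas.patch request from the curl example."""
    # The request body wraps the new JSON schema in "structSchema".
    body = json.dumps({"structSchema": json_schema}).encode("utf-8")
    return urllib.request.Request(
        SCHEMA_URL.format(project_id=project_id, data_store_id=data_store_id),
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

To send the request, pass it to urllib.request.urlopen. The client-library samples on this page treat a schema update as a long-running operation, so expect the response to describe an operation rather than the final schema.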

C#

For more information, see the Vertex AI Agent Builder C# API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedSchemaServiceClientSnippets
{
    /// <summary>Snippet for UpdateSchema</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void UpdateSchemaRequestObject()
    {
        // Create client
        SchemaServiceClient schemaServiceClient = SchemaServiceClient.Create();
        // Initialize request argument(s)
        UpdateSchemaRequest request = new UpdateSchemaRequest
        {
            Schema = new Schema(),
            AllowMissing = false,
        };
        // Make the request
        Operation<Schema, UpdateSchemaMetadata> response = schemaServiceClient.UpdateSchema(request);

        // Poll until the returned long-running operation is complete
        Operation<Schema, UpdateSchemaMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        Schema result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<Schema, UpdateSchemaMetadata> retrievedResponse = schemaServiceClient.PollOnceUpdateSchema(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            Schema retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

For more information, see the Vertex AI Agent Builder Go API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewSchemaClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.UpdateSchemaRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#UpdateSchemaRequest.
	}
	op, err := c.UpdateSchema(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

For more information, see the Vertex AI Agent Builder Java API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.discoveryengine.v1.Schema;
import com.google.cloud.discoveryengine.v1.SchemaServiceClient;
import com.google.cloud.discoveryengine.v1.UpdateSchemaRequest;

public class SyncUpdateSchema {

  public static void main(String[] args) throws Exception {
    syncUpdateSchema();
  }

  public static void syncUpdateSchema() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (SchemaServiceClient schemaServiceClient = SchemaServiceClient.create()) {
      UpdateSchemaRequest request =
          UpdateSchemaRequest.newBuilder()
              .setSchema(Schema.newBuilder().build())
              .setAllowMissing(true)
              .build();
      Schema response = schemaServiceClient.updateSchemaAsync(request).get();
    }
  }
}

Python

For more information, see the Vertex AI Agent Builder Python API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
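import json

from google.cloud import discoveryengine_v1


def sample_update_schema():
    # Create a client
    client = discoveryengine_v1.SchemaServiceClient()

    # Full resource name of the schema to update. Replace PROJECT_ID and
    # DATA_STORE_ID with your values.
    schema_name = (
        "projects/PROJECT_ID/locations/global/collections/default_collection/"
        "dataStores/DATA_STORE_ID/schemas/default_schema"
    )
    # The new schema as a JSON string. It must keep every existing field and
    # field type (see Requirements and limitations).
    new_schema = json.dumps({
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": {"title": {"type": "string", "keyPropertyMapping": "title"}},
    })

    # Initialize request argument(s)
    request = discoveryengine_v1.UpdateSchemaRequest(
        schema=discoveryengine_v1.Schema(name=schema_name, json_schema=new_schema),
    )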

    # Make the request
    operation = client.update_schema(request=request)

    print("Waiting for operation to complete...")

    response = operation.result()

    # Handle the response
    print(response)

Ruby

For more information, see the Vertex AI Agent Builder Ruby API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

require "google/cloud/discovery_engine/v1"

##
# Snippet for the update_schema call in the SchemaService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::SchemaService::Client#update_schema.
#
def update_schema
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::SchemaService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::UpdateSchemaRequest.new

  # Call the update_schema method.
  result = client.update_schema request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Requirements and limitations

When updating a schema, be sure that the new schema is backward compatible with the schema you are updating. To update a schema with a new schema that is not backward compatible, you need to delete all the documents in the data store, delete the schema, and create a new schema.

Updating a schema triggers reindexing of all documents, which can take time and incur additional costs:

  • Time. Reindexing a large data store can take hours or days.

  • Expense. Reindexing can incur costs, depending on the parser. For example, reindexing a data store that uses the OCR parser or the layout parser incurs costs. For more information, see Document AI feature pricing.

Schema updates don't support the following:

  • Changing a field type. A schema update doesn't support changing the type of a field. For example, a field of type integer can't be changed to type string.
  • Removing a field. Once defined, a field can't be removed. You can continue to add new fields, but you can't remove an existing field.
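Before you call schemas.patch, you can compare the current and new JSON schemas locally. The following minimal sketch checks only the two rules listed above, removed fields and changed types, and walks into nested object and array fields. The function name find_incompatible_changes is hypothetical, and the check is not the service's own validation, which might reject other changes too.

```python
def find_incompatible_changes(old_schema, new_schema, prefix=""):
    """List removed fields and changed field types between two JSON schemas (hypothetical helper)."""
    problems = []
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    for name, old_field in old_props.items():
        path = f"{prefix}{name}"
        if name not in new_props:
            # Rule: a field can't be removed once defined.
            problems.append(f"{path}: field removed")
        else:
            problems.extend(_compare_field(path, old_field, new_props[name]))
    # Fields that exist only in new_schema are additions, which are supported.
    return problems


def _compare_field(path, old_field, new_field):
    old_type, new_type = old_field.get("type"), new_field.get("type")
    if old_type != new_type:
        # Rule: a field's type can't change.
        return [f"{path}: type changed from {old_type} to {new_type}"]
    if old_type == "array":
        # Compare the element definitions of array fields.
        return _compare_field(f"{path}[]", old_field.get("items", {}), new_field.get("items", {}))
    if old_type == "object":
        # Recurse into nested objects.
        return find_incompatible_changes(old_field, new_field, prefix=f"{path}.")
    return []
```

An empty list means the sketch found no removed fields or type changes; it doesn't guarantee that the service accepts the update.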

Limitation examples (REST only)

This section shows examples of valid and invalid types of schema updates. These examples use the following example JSON schema:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "title": {
      "type": "string"
    },
    "description": {
      "type": "string",
      "keyPropertyMapping": "description"
    },
    "categories": {
      "type": "array",
      "items": {
        "type": "string",
        "keyPropertyMapping": "category"
      }
    }
  }
}

Examples of supported updates

The following updates to the example schema are supported.

  • Adding a field. In this example, the field properties.uri has been added to the schema.

    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "type": "object",
      "properties": {
        "title": {
          "type": "string"
        },
        "description": {
          "type": "string",
          "keyPropertyMapping": "description"
        },
        "uri": { // Added field. This is supported.
          "type": "string",
          "keyPropertyMapping": "uri"
        },
        "categories": {
          "type": "array",
          "items": {
            "type": "string",
            "keyPropertyMapping": "category"
          }
        }
      }
    }
    
  • Adding or removing key property annotations for title, description, or uri. In this example, keyPropertyMapping has been added to the title field.

    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "type": "object",
      "properties": {
        "title": {
          "type": "string",
          "keyPropertyMapping": "title" // Added "keyPropertyMapping". This is supported.
        },
        "description": {
          "type": "string",
          "keyPropertyMapping": "description"
        },
        "categories": {
          "type": "array",
          "items": {
            "type": "string",
            "keyPropertyMapping": "category"
          }
        }
      }
    }
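Both supported updates can be applied to a copy of the current schema, for example one you retrieved by following View a schema definition. The following minimal sketch assumes the schema is already a Python dict. The helper names with_added_field and with_key_property are hypothetical; the first refuses to redefine an existing field, because redefining a field could change its type, which isn't supported.

```python
import copy


def with_added_field(schema, name, definition):
    """Return a copy of schema with one new top-level field (hypothetical helper)."""
    if name in schema.get("properties", {}):
        # Overwriting an existing field could change its type, which isn't supported.
        raise ValueError(f"field {name!r} already exists")
    updated = copy.deepcopy(schema)
    updated.setdefault("properties", {})[name] = definition
    return updated


def with_key_property(schema, name, key_property=None):
    """Return a copy of schema with a field's keyPropertyMapping set, or removed if None."""
    updated = copy.deepcopy(schema)
    field = updated["properties"][name]  # Raises KeyError if the field doesn't exist.
    if key_property is None:
        field.pop("keyPropertyMapping", None)
    else:
        field["keyPropertyMapping"] = key_property
    return updated
```

Working on deep copies leaves the original schema untouched, so you can still compare it with the updated version before sending the update.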
    

Examples of invalid schema updates

The following updates to the example schema aren't supported.

  • Changing a field type. In this example, the title field's type has been changed from string to number. This is not supported.

      {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": {
          "title": {
            "type": "number" // Changed from string. Not allowed.
          },
          "description": {
            "type": "string",
            "keyPropertyMapping": "description"
          },
          "categories": {
            "type": "array",
            "items": {
              "type": "string",
              "keyPropertyMapping": "category"
            }
          }
        }
      }
    
  • Removing a field. In this example, the title field has been removed. This is not supported.

      {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": {
          // "title" is removed. Not allowed.
          "description": {
            "type": "string",
            "keyPropertyMapping": "description"
          },
          "uri": {
            "type": "string",
            "keyPropertyMapping": "uri"
          },
          "categories": {
            "type": "array",
            "items": {
              "type": "string",
              "keyPropertyMapping": "category"
            }
          }
        }
      }
    

What's next