Set up Memory Bank

Before you work with Vertex AI Agent Engine Memory Bank, you must set up your environment. Note that although Memory Bank is part of Agent Engine, you don't need to deploy your code to Agent Engine Runtime to use Memory Bank.

Set up your Google Cloud project

Every project can be identified in two ways: the project number or the project ID. The project number is assigned automatically by Google Cloud when the project is created, whereas the project ID is chosen by whoever creates the project. To set up a project:

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI API.

    Enable the API


Get the required roles

To get the permissions that you need to use Vertex AI Agent Engine, ask your administrator to grant you the Vertex AI User (roles/aiplatform.user) IAM role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

If you're making requests to Memory Bank from an agent deployed on Google Kubernetes Engine or Cloud Run, make sure that your service account has the necessary permissions. The Reasoning Engine Service Agent already has the necessary permissions to read and write memories, so outbound requests from Agent Engine Runtime should already have permission to access Memory Bank.

Set up your environment

This section assumes that you have set up a Python development environment, or are using a runtime with a Python development environment (such as Colab).

Install libraries

Install the Vertex AI SDK:

  pip install "google-cloud-aiplatform>=1.104.0"

Authentication

Authentication instructions depend on whether you're using Vertex AI in express mode:

  • If you're not using Vertex AI in express mode, follow the instructions at Authenticate to Vertex AI.

  • If you're using Vertex AI in express mode, set up authentication by setting the API key in the environment:

      import os

      os.environ["GOOGLE_API_KEY"] = "API_KEY"
    

Set up a Vertex AI SDK client

Run the following code to set up a Vertex AI SDK client:

import vertexai

client = vertexai.Client(
    project="PROJECT_ID",
    location="LOCATION",
)

where

  • PROJECT_ID is your project ID.
  • LOCATION is one of the supported regions for Memory Bank.

Configure your Agent Engine instance for Memory Bank

To get started with Memory Bank, you first need an Agent Engine instance.

You can do one of the following:

Use an existing instance

If you want to use an existing Agent Engine instance without modifying it, run the following to retrieve the instance for use with Memory Bank:

agent_engine = client.agent_engines.get(name="AGENT_ENGINE_NAME")

Replace the following:

  • AGENT_ENGINE_NAME: The name of the Agent Engine. It should be in the format projects/.../locations/.../reasoningEngines/.... See the supported regions for Memory Bank.

You can use the instance in any environment, including Google Kubernetes Engine and Cloud Run. To get started, you need the Agent Engine name that identifies the Memory Bank and sufficient permission to call Memory Bank.

Create or update an instance

Create

Memory Bank is enabled by default when you create an Agent Engine instance. Creating a new Agent Engine without a Runtime should only take a few seconds.

  agent_engine = client.agent_engines.create()

You can also override Agent Engine's defaults when creating an Agent Engine instance to make the following modifications:

  • Set the configuration for how Memory Bank generates and manages memories.

  • Deploy your agent to Agent Engine Runtime.

    agent_engine = client.agent_engines.create(
          # Optional. Set this argument if you want to deploy to Agent Engine Runtime.
          agent_engine=...,
          # Optional. Set this argument if you want to change the Memory Bank configuration.
          config=...
    )
    

    New instances are empty until you create or generate memories.

    You need the name of your Agent Engine to read or write memories:

    agent_engine_name = agent_engine.api_resource.name
    

Update

You can update an existing Agent Engine instance while preserving the memories already stored in it. For example, you can change the Memory Bank configuration or deploy your agent to Agent Engine Runtime.

  agent_engine = client.agent_engines.update(
        # If you have an existing AgentEngine, you can access the name using `agent_engine.api_resource.name`.
        name="AGENT_ENGINE_NAME",
        # Optional. Set this argument if you want to deploy to Agent Engine Runtime.
        agent_engine=...,
        # Optional. Set this argument if you want to change the Memory Bank configuration.
        config=...
  )

Replace the following:

  • AGENT_ENGINE_NAME: The name of the Agent Engine. It should be in the format projects/.../locations/.../reasoningEngines/.... See the supported regions for Memory Bank.

Set your Memory Bank configuration

You can configure your Memory Bank to customize how memories are generated and managed. If the configuration is not provided, then Memory Bank uses the default settings for each type of configuration.

The Memory Bank configuration is set when creating or updating your Agent Engine instance:

client.agent_engines.create(
      ...,
      config={
            "context_spec": {
                  "memory_bank_config": memory_bank_config
            }
      }
)

# Alternatively, update an existing Agent Engine's Memory Bank config.
agent_engine = client.agent_engines.update(
      name=agent_engine.api_resource.name,
      config={
          "context_spec": {
                "memory_bank_config": memory_bank_config
          }
      }
)

You can configure the following settings for your instance:

Customization configuration

If you want to customize how memories are extracted from your source data, you can configure the memory extraction behavior when setting up your Agent Engine instance. There are two levers that you can use for customization:

  • Configuring memory topics: Define the type of information that Memory Bank should consider meaningful to persist. Only information that fits one of these memory topics will be persisted by Memory Bank.
  • Providing few-shot examples: Demonstrate expected behavior for memory extraction to Memory Bank.

You can optionally configure different behavior for different scope levels. For example, the topics that are meaningful for session-level memories may not be meaningful for user-level memories (which span multiple sessions). To configure behavior for a certain subset of memories, set the scope keys of the customization configuration; only GenerateMemories requests that include exactly those scope keys use that configuration. You can also configure default behavior (applying to all sets of scope keys) by omitting the scope_keys field. The default configuration applies to all requests whose scope keys don't exactly match those of another customization configuration.

For example, user_level_config would apply only to GenerateMemories requests that use exactly the scope key user_id (that is, scope={"user_id": "123"} with no additional keys). default_config would apply to all other requests:

Dictionary


user_level_config = {
  "scope_keys": ["user_id"],
  "memory_topics": [...],
  "generate_memories_examples": [...]
}

default_config = {
  "memory_topics": [...],
  "generate_memories_examples": [...]
}

config = {
  "customization_configs": [
    user_level_config,
    default_config
  ]
}

Class-based

from vertexai.types import MemoryBankCustomizationConfig as CustomizationConfig

user_level_config = CustomizationConfig(
  scope_keys=["user_id"],
  memory_topics=[...],
  generate_memories_examples=[...]
)

default_config = CustomizationConfig(
  memory_topics=[...],
  generate_memories_examples=[...]
)
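
The exact-match selection rule described above can be sketched in plain Python. This is illustrative only: pick_config is a hypothetical helper, not an SDK function, and Memory Bank applies this matching server-side.

```python
def pick_config(request_scope, configs):
    """Return the customization config whose scope_keys exactly match the
    request's scope keys, falling back to a config with no scope_keys."""
    keys = set(request_scope)
    default = None
    for cfg in configs:
        scope_keys = cfg.get("scope_keys")
        if scope_keys is None:
            default = cfg  # a config without scope_keys acts as the default
        elif set(scope_keys) == keys:
            return cfg     # exact match on the full set of scope keys
    return default

user_level_config = {"scope_keys": ["user_id"], "memory_topics": []}
default_config = {"memory_topics": []}
configs = [user_level_config, default_config]

# Exactly {"user_id"} -> the user-level config applies.
print(pick_config({"user_id": "123"}, configs) is user_level_config)  # True
# An extra scope key -> no exact match, so the default config applies.
print(pick_config({"user_id": "123", "session_id": "s1"}, configs) is default_config)  # True
```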
Configuring memory topics

"Memory topics" identify what information Memory Bank considers to be meaningful and should thus be persisted as generated memories. Memory Bank supports two types of memory topics:

  • Managed topics: Label and instructions are defined by Memory Bank. You only need to provide the name of the managed topic. For example,

    Dictionary

    memory_topic = {
      "managed_memory_topic": {
        "managed_topic_enum": "USER_PERSONAL_INFO"
      }
    }
    

    Class-based

    from vertexai.types import ManagedTopicEnum
    from vertexai.types import MemoryBankCustomizationConfigMemoryTopic as MemoryTopic
    from vertexai.types import MemoryBankCustomizationConfigMemoryTopicManagedMemoryTopic as ManagedMemoryTopic
    
    memory_topic = MemoryTopic(
        managed_memory_topic=ManagedMemoryTopic(
            managed_topic_enum=ManagedTopicEnum.USER_PERSONAL_INFO
        )
    )
    

    The following managed topics are supported by Memory Bank:

    • Personal information (USER_PERSONAL_INFO): Significant personal information about the user, like names, relationships, hobbies, and important dates. For example, "I work at Google" or "My wedding anniversary is on December 31".
    • User preferences (USER_PREFERENCES): Stated or implied likes, dislikes, preferred styles, or patterns. For example, "I prefer the middle seat."
    • Key conversation events and task outcomes (KEY_CONVERSATION_DETAILS): Important milestones or conclusions within the dialogue. For example, "I booked plane tickets for a round trip between JFK and SFO. I leave on June 1, 2025 and return on June 7, 2025."
    • Explicit remember / forget instructions (EXPLICIT_INSTRUCTIONS): Information that the user explicitly asks the agent to remember or forget. For example, if the user says "Remember that I primarily use Python," Memory Bank generates a memory such as "I primarily use Python."
  • Custom topics: Label and instructions are defined by you when setting up your Memory Bank instance. They will be used in the prompt for Memory Bank's extraction step. For example,

    Dictionary

    memory_topic = {
      "custom_memory_topic": {
        "label": "business_feedback",
        "description": """Specific user feedback about their experience at
    the coffee shop. This includes opinions on drinks, food, pastries, ambiance,
    staff friendliness, service speed, cleanliness, and any suggestions for
    improvement."""
      }
    }
    

    Class-based

    from vertexai.types import MemoryBankCustomizationConfigMemoryTopic as MemoryTopic
    from vertexai.types import MemoryBankCustomizationConfigMemoryTopicCustomMemoryTopic as CustomMemoryTopic
    
    memory_topic = MemoryTopic(
      custom_memory_topic=CustomMemoryTopic(
        label="business_feedback",
        description="""Specific user feedback about their experience at
    the coffee shop. This includes opinions on drinks, food, pastries, ambiance,
    staff friendliness, service speed, cleanliness, and any suggestions for
    improvement."""
      )
    )
    

    When using custom topics, it's recommended to also provide few-shot examples demonstrating how memories should be extracted from your conversation.

With customization, you can use any combination of memory topics. For example, you can use a subset of the available managed memory topics:

Dictionary

{
  "memory_topics": [
    { "managed_memory_topic": { "managed_topic_enum": "USER_PERSONAL_INFO" } },
    { "managed_memory_topic": { "managed_topic_enum": "USER_PREFERENCES" } }
  ]
}

Class-based

from vertexai.types import MemoryBankCustomizationConfig as CustomizationConfig
from vertexai.types import MemoryBankCustomizationConfigMemoryTopic as MemoryTopic
from vertexai.types import MemoryBankCustomizationConfigMemoryTopicManagedMemoryTopic as ManagedMemoryTopic
from vertexai.types import ManagedTopicEnum

CustomizationConfig(
  memory_topics=[
      MemoryTopic(
          managed_memory_topic=ManagedMemoryTopic(
              managed_topic_enum=ManagedTopicEnum.USER_PERSONAL_INFO)
      ),
      MemoryTopic(
          managed_memory_topic=ManagedMemoryTopic(
              managed_topic_enum=ManagedTopicEnum.USER_PREFERENCES)
      ),
  ]
)

You can also use a combination of managed and custom topics (or only use custom topics):

Dictionary

{
  "memory_topics": [
    { "managed_memory_topic": { "managed_topic_enum": "USER_PERSONAL_INFO" } },
    {
      "custom_memory_topic": {
        "label": "business_feedback",
        "description": """Specific user feedback about their experience at
the coffee shop. This includes opinions on drinks, food, pastries, ambiance,
staff friendliness, service speed, cleanliness, and any suggestions for
improvement."""
      }
    }
  ]
}

Class-based

from vertexai.types import MemoryBankCustomizationConfig as CustomizationConfig
from vertexai.types import MemoryBankCustomizationConfigMemoryTopic as MemoryTopic
from vertexai.types import MemoryBankCustomizationConfigMemoryTopicCustomMemoryTopic as CustomMemoryTopic
from vertexai.types import MemoryBankCustomizationConfigMemoryTopicManagedMemoryTopic as ManagedMemoryTopic
from vertexai.types import ManagedTopicEnum

CustomizationConfig(
  memory_topics=[
      MemoryTopic(
          managed_memory_topic=ManagedMemoryTopic(
              managed_topic_enum=ManagedTopicEnum.USER_PERSONAL_INFO)
      ),
      MemoryTopic(
          custom_memory_topic=CustomMemoryTopic(
              label="business_feedback",
              description="""Specific user feedback about their experience at
the coffee shop. This includes opinions on drinks, food, pastries, ambiance,
staff friendliness, service speed, cleanliness, and any suggestions for
improvement."""
          )
      )
  ]
)
Few-shot examples

Few-shot examples allow you to demonstrate expected memory extraction behavior to Memory Bank. For example, you can provide a sample input conversation and the memories that are expected to be extracted from that conversation.

We recommend always using few-shots with custom topics so that Memory Bank can learn the intended behavior. Few-shots are optional when using managed topics since Memory Bank defines examples for each topic. Demonstrate conversations that are not expected to result in memories by providing an empty generated_memories list.

For example, you can provide few-shot examples that demonstrate how to extract feedback about your business from customer messages:

Dictionary

example = {
    "conversationSource": {
      "events": [
        {
          "content": {
            "role": "model",
            "parts": [{ "text": "Welcome back to The Daily Grind! We'd love to hear your feedback on your visit." }] }
        },
        {
          "content": {
            "role": "user",
            "parts": [{ "text": "Hey. The drip coffee was a bit lukewarm today, which was a bummer. Also, the music was way too loud, I could barely hear my friend." }] }
        }
      ]
    },
    "generatedMemories": [
      {
        "fact": "The user reported that the drip coffee was lukewarm."
      },
      {
        "fact": "The user felt the music in the shop was too loud."
      }
    ]
}

Class-based

from google.genai.types import Content, Part
from vertexai.types import MemoryBankCustomizationConfigGenerateMemoriesExample as GenerateMemoriesExample
from vertexai.types import MemoryBankCustomizationConfigGenerateMemoriesExampleConversationSource as ConversationSource
from vertexai.types import MemoryBankCustomizationConfigGenerateMemoriesExampleConversationSourceEvent as ConversationSourceEvent
from vertexai.types import MemoryBankCustomizationConfigGenerateMemoriesExampleGeneratedMemory as ExampleGeneratedMemory

example = GenerateMemoriesExample(
    conversation_source=ConversationSource(
        events=[
            ConversationSourceEvent(
                content=Content(
                    role="model",
                    parts=[Part(text="Welcome back to The Daily Grind! We'd love to hear your feedback on your visit.")]
                )
            ),
            ConversationSourceEvent(
                content=Content(
                    role="user",
                    parts=[Part(text="Hey. The drip coffee was a bit lukewarm today, which was a bummer. Also, the music was way too loud, I could barely hear my friend.")]
                )
            )
        ]
    ),
    generated_memories=[
        ExampleGeneratedMemory(
            fact="The user reported that the drip coffee was lukewarm."
        ),
        ExampleGeneratedMemory(
            fact="The user felt the music in the shop was too loud."
        )
    ]
)

You can also provide examples of conversations that shouldn't result in any generated memories by providing an empty list for the expected output (generated_memories):

Dictionary

example = {
    "conversationSource": {
        "events": [
          {
              "content": {
                  "role": "model",
                  "parts": [{ "text": "Good morning! What can I get for you at The Daily Grind?" }] }
          },
          {
              "content": {
                  "role": "user",
                  "parts": [{ "text": "Thanks for the coffee." }] }
          }
        ]
    },
    "generatedMemories": []
}

Class-based

from google.genai.types import Content, Part
from vertexai.types import MemoryBankCustomizationConfigGenerateMemoriesExample as GenerateMemoriesExample
from vertexai.types import MemoryBankCustomizationConfigGenerateMemoriesExampleConversationSource as ConversationSource
from vertexai.types import MemoryBankCustomizationConfigGenerateMemoriesExampleConversationSourceEvent as ConversationSourceEvent

example = GenerateMemoriesExample(
    conversation_source=ConversationSource(
        events=[
            ConversationSourceEvent(
                content=Content(
                    role="model",
                    parts=[Part(text="Good morning! What can I get for you at The Daily Grind?")]
                )
            ),
            ConversationSourceEvent(
                content=Content(
                    role="user",
                    parts=[Part(text="Thanks for the coffee.")]
                )
            )
        ]
    ),
    generated_memories=[]
)
Similarity search configuration

The similarity search configuration controls which embedding model is used by your instance for similarity search. Similarity search is used for identifying which memories should be candidates for consolidation and for similarity search-based memory retrieval. If this configuration is not provided, Memory Bank uses text-embedding-005 as the default model.

If you expect user conversations to be in non-English languages, use a model that supports multiple languages, such as gemini-embedding-001 or text-multilingual-embedding-002, to improve retrieval quality.

Dictionary

memory_bank_config = {
    "similarity_search_config": {
        "embedding_model": "EMBEDDING_MODEL",
    }
}

Class-based

from vertexai.types import ReasoningEngineContextSpecMemoryBankConfig as MemoryBankConfig
from vertexai.types import ReasoningEngineContextSpecMemoryBankConfigSimilaritySearchConfig as SimilaritySearchConfig

memory_bank_config = MemoryBankConfig(
    similarity_search_config=SimilaritySearchConfig(
        embedding_model="EMBEDDING_MODEL"
    )
)

Replace the following:

  • EMBEDDING_MODEL: The Google text embedding model to use for similarity search, in the format projects/{project}/locations/{location}/publishers/google/models/{model}.
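
As a quick sketch of the resource name format (the project, region, and model values below are placeholders; substitute your own), the fully qualified model name can be assembled like this:

```python
# Placeholder values; replace with your own project, region, and model.
project = "my-project"
location = "us-central1"
model = "text-embedding-005"

embedding_model = f"projects/{project}/locations/{location}/publishers/google/models/{model}"
print(embedding_model)
# projects/my-project/locations/us-central1/publishers/google/models/text-embedding-005
```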
Generation configuration

The generation configuration controls which LLM is used for generating memories, including extracting memories and consolidating new memories with existing memories.

Memory Bank uses gemini-2.5-flash as the default model.

Dictionary

memory_bank_config = {
      "generation_config": {
            "model": "LLM_MODEL",
      }
}

Class-based

from vertexai.types import ReasoningEngineContextSpecMemoryBankConfig as MemoryBankConfig
from vertexai.types import ReasoningEngineContextSpecMemoryBankConfigGenerationConfig as GenerationConfig

memory_bank_config = MemoryBankConfig(
    generation_config=GenerationConfig(
      model="LLM_MODEL"
    )
)

Replace the following:

  • LLM_MODEL: The Google LLM model to use for extracting and consolidating memories, in the format projects/{project}/locations/{location}/publishers/google/models/{model}.
Time to live (TTL) configuration

The TTL configuration controls how Memory Bank should dynamically set memories' expiration time. After their expiration time elapses, memories won't be available for retrieval and will be deleted.

If the configuration is not provided, expiration time won't be dynamically set for created or updated memories, so memories won't expire unless their expiration time is manually set.

There are two options for the TTL configuration:

  • Default TTL: The TTL will be applied to all operations that create or update a memory, including UpdateMemory, CreateMemory, and GenerateMemories.

    Dictionary

    memory_bank_config = {
        "ttl_config": {
            "default_ttl": f"TTLs"
        }
    }
    

    Class-based

    from vertexai.types import ReasoningEngineContextSpecMemoryBankConfig as MemoryBankConfig
    from vertexai.types import ReasoningEngineContextSpecMemoryBankConfigTtlConfig as TtlConfig
    
    memory_bank_config = MemoryBankConfig(
        ttl_config=TtlConfig(
            default_ttl=f"TTLs"
        )
    )
    

    Replace the following:

    • TTL: The duration in seconds for the TTL. For updated memories, the newly calculated expiration time (now + TTL) will overwrite the Memory's previous expiration time.
  • Granular (per-operation) TTL: The TTL is calculated based on which operation created or updated the Memory. If not set for a given operation, then the operation won't update the Memory's expiration time.

    Dictionary

    memory_bank_config = {
        "ttl_config": {
            "granular_ttl": {
                "create_ttl": f"CREATE_TTLs",
                "generate_created_ttl": f"GENERATE_CREATED_TTLs",
                "generate_updated_ttl": f"GENERATE_UPDATED_TTLs"
            }
        }
    }
    

    Class-based

    from vertexai.types import ReasoningEngineContextSpecMemoryBankConfig as MemoryBankConfig
    from vertexai.types import ReasoningEngineContextSpecMemoryBankConfigTtlConfig as TtlConfig
    from vertexai.types import ReasoningEngineContextSpecMemoryBankConfigTtlConfigGranularTtlConfig as GranularTtlConfig
    
    memory_bank_config = MemoryBankConfig(
        ttl_config=TtlConfig(
            granular_ttl_config=GranularTtlConfig(
                create_ttl=f"CREATE_TTLs",
                generate_created_ttl=f"GENERATE_CREATED_TTLs",
                generate_updated_ttl=f"GENERATE_UPDATED_TTLs",
            )
        )
    )
    

    Replace the following:

    • CREATE_TTL: The duration in seconds for the TTL for memories created using CreateMemory.
    • GENERATE_CREATED_TTL: The duration in seconds for the TTL for memories created using GenerateMemories.
    • GENERATE_UPDATED_TTL: The duration in seconds for the TTL for memories updated using GenerateMemories. The newly calculated expiration time (now + TTL) will overwrite the Memory's previous expiration time.
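
The TTL placeholders above take a duration string in seconds (for example, "2592000s" for 30 days). As a sketch of the arithmetic, assuming the "now + TTL" rule described above (the helper names are illustrative, not SDK functions):

```python
from datetime import datetime, timedelta, timezone

def ttl_string(days: int) -> str:
    # The TTL fields take a duration string in seconds, e.g. "2592000s".
    return f"{days * 86400}s"

def expiration(ttl: str, now: datetime) -> datetime:
    # Memory Bank sets the expiration time to now + TTL on create or update.
    return now + timedelta(seconds=int(ttl.rstrip("s")))

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(ttl_string(30))               # 2592000s
print(expiration("2592000s", now))  # 2025-07-01 00:00:00+00:00
```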

Deploy your agent with memory to Agent Engine

Although Memory Bank can be used in any runtime, you can also use Memory Bank with Agent Engine Runtime to read and write memories from your deployed agent.

To deploy an agent with Memory Bank on Vertex AI Agent Engine Runtime, first set up your environment for Agent Engine runtime. Then, prepare your agent to be deployed on Agent Engine Runtime with memory integration. Your deployed agent should make calls to read and write memories as needed.

AdkApp

If you're using the Agent Engine Agent Development Kit template, the agent uses the VertexAiMemoryBankService by default when deployed to Agent Engine Runtime. This means that the ADK Memory tools read memories from Memory Bank.

from google.adk.agents import Agent
from vertexai.preview.reasoning_engines import AdkApp

# Develop an agent using the ADK template.
agent = Agent(...)

adk_app = AdkApp(
      agent=agent,
      ...
)

# Deploy the agent to Agent Engine Runtime.
agent_engine = client.agent_engines.create(
      agent_engine=adk_app,
      config={
            "staging_bucket": "STAGING_BUCKET",
            "requirements": ["google-cloud-aiplatform[agent_engines,adk]"],
            # Optional.
            **context_spec
      }
)

# Update an existing Agent Engine to add or modify the Runtime.
agent_engine = client.agent_engines.update(
      name=agent_engine.api_resource.name,
      agent_engine=adk_app,
      config={
            "staging_bucket": "STAGING_BUCKET",
            "requirements": ["google-cloud-aiplatform[agent_engines,adk]"],
            # Optional.
            **context_spec
      }
)

Replace the following:

  • STAGING_BUCKET: Your Cloud Storage bucket to use for staging your Agent Engine Runtime.

For more information about using Memory Bank with ADK, refer to the Quickstart with Agent Development Kit.

Custom agent

You can use Memory Bank with your custom agent deployed on Agent Engine Runtime. In this case, your agent should orchestrate calls to Memory Bank to trigger memory generation and memory retrieval calls.

If you want to use the same Agent Engine instance for both Memory Bank and the Agent Engine Runtime, you can read the environment variables GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION, and GOOGLE_CLOUD_AGENT_ENGINE_ID to infer the Agent Engine name from the environment:

project = os.environ.get("GOOGLE_CLOUD_PROJECT")
location = os.environ.get("GOOGLE_CLOUD_LOCATION")
agent_engine_id = os.environ.get("GOOGLE_CLOUD_AGENT_ENGINE_ID")

agent_engine_name = f"projects/{project}/locations/{location}/reasoningEngines/{agent_engine_id}"

What's next