Use a LangChain agent

Before you begin

This tutorial assumes that you have read and followed the instructions in:

Get an instance of an agent

To query a LangchainAgent, you first need to create a new instance or get an existing one.

To get the LangchainAgent corresponding to a specific resource ID:

Vertex AI SDK for Python

Run the following code:

import vertexai

client = vertexai.Client(  # For service interactions via client.agent_engines
    project="PROJECT_ID",
    location="LOCATION",
)

agent = client.agent_engines.get(name="projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID")

print(agent)

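where PROJECT_ID, LOCATION, and RESOURCE_ID are placeholders for your Google Cloud project ID, the region of the deployed agent, and the resource ID of the deployed agent, respectively.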

Python requests library

Run the following code:

from google import auth as google_auth
from google.auth.transport import requests as google_requests
import requests

def get_identity_token():
    credentials, _ = google_auth.default()
    auth_request = google_requests.Request()
    credentials.refresh(auth_request)
    return credentials.token

response = requests.get(
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID",
    headers={
        "Content-Type": "application/json; charset=utf-8",
        "Authorization": f"Bearer {get_identity_token()}",
    },
)

REST API

curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID
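
All three approaches above address the agent through the same resource path. As a small convenience, that path and the matching REST URL can be assembled from their parts. The helper names below (reasoning_engine_name, reasoning_engine_url) are hypothetical and not part of the Vertex AI SDK, and the values passed to them are placeholders.

```python
def reasoning_engine_name(project_id: str, location: str, resource_id: str) -> str:
    """Build the resource name passed to client.agent_engines.get(name=...)."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/reasoningEngines/{resource_id}"
    )


def reasoning_engine_url(project_id: str, location: str, resource_id: str) -> str:
    """Build the v1 REST URL used by the requests and curl examples."""
    name = reasoning_engine_name(project_id, location, resource_id)
    return f"https://{location}-aiplatform.googleapis.com/v1/{name}"


# Placeholder values for illustration only.
print(reasoning_engine_name("my-project", "us-central1", "1234567890"))
print(reasoning_engine_url("my-project", "us-central1", "1234567890"))
```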

When using the Vertex AI SDK for Python, the agent object corresponds to an AgentEngine class that contains the following:

  • an agent.api_resource with information about the deployed agent. You can also call agent.operation_schemas() to return the list of operations that the agent supports. See Supported operations for details.
  • an agent.api_client that allows for synchronous service interactions
  • an agent.async_api_client that allows for asynchronous service interactions

The rest of this section assumes that you have an AgentEngine instance named agent.
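
To see which operations a retrieved agent exposes, you can inspect the result of agent.operation_schemas(). The sketch below assumes each schema is a dict with a "name" key and, optionally, an "api_mode" key. Treat those key names as assumptions and check them against the output of your own agent. The helper summarize_operations and the sample list are hypothetical.

```python
def summarize_operations(schemas):
    """Map each operation name to its API mode (None when the key is absent)."""
    return {schema["name"]: schema.get("api_mode") for schema in schemas}


# Hypothetical sample standing in for the result of agent.operation_schemas().
sample_schemas = [
    {"name": "query", "api_mode": ""},
    {"name": "stream_query", "api_mode": "stream"},
]
print(summarize_operations(sample_schemas))
```

With a retrieved agent, the call would be summarize_operations(agent.operation_schemas()).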

Supported operations

The following operations are supported:

  • query: for getting a response to a query synchronously.
  • stream_query: for streaming a response to a query.

Both the query and stream_query methods accept the same arguments:

  • input: the messages to be sent to the agent.
  • config: the configuration (if applicable) for the context of the query.
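
To illustrate the difference between the two operations: query returns a single response, whereas stream_query is iterated to receive the response piece by piece. The sketch below uses a hypothetical stand-in class (EchoAgent) so that it runs without a deployed agent. The shape of the values it returns is invented for illustration and does not reflect the output of a real agent.

```python
class EchoAgent:
    """Hypothetical stand-in exposing the two operations named above."""

    def query(self, *, input, config=None):
        # A single, complete response.
        return {"output": f"echo: {input}"}

    def stream_query(self, *, input, config=None):
        # A generator that yields the response one piece at a time.
        for word in str(input).split():
            yield {"chunk": word}


def collect_stream(agent, prompt, config=None):
    """Iterate over stream_query and gather every chunk into a list."""
    return list(agent.stream_query(input=prompt, config=config))


agent_stub = EchoAgent()
print(agent_stub.query(input="hello world"))
print(collect_stream(agent_stub, "hello world"))
```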

Query the agent

The command:

agent.query(input="What is the exchange rate from US dollars to SEK today?")

is equivalent to the following (in full form):

agent.query(input={
    "input": [ # The input is represented as a list of messages (each message as a dict)
        {
            # The role (e.g. "system", "user", "assistant", "tool")
            "role": "user",
            # The type (e.g. "text", "tool_use", "image_url", "media")
            "type": "text",
            # The rest of the message (this varies based on the type)
            "text": "What is the exchange rate from US dollars to SEK today?",
        },
    ]
})

Roles are used to help the model distinguish between different types of messages when responding. When the role is omitted in the input, it defaults to "user".

Role        Description
system      Used to tell the chat model how to behave and provide additional context. Not supported by all chat model providers.
user        Represents input from a user interacting with the model, usually in the form of text or other interactive input.
assistant   Represents a response from the model, which can include text or a request to invoke tools.
tool        A message used to pass the results of a tool invocation back to the model after external data or processing has been retrieved.

The type of the message also determines how the rest of the message is interpreted (see Query the agent with multi-modal content).
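
The correspondence between the short and full forms can be written out as a helper. This is a sketch of the message shape described above, not the SDK's own conversion logic, and to_full_input is a hypothetical name.

```python
def to_full_input(text: str, role: str = "user") -> dict:
    """Expand a plain string into the full message-list form."""
    return {
        "input": [
            {"role": role, "type": "text", "text": text},
        ]
    }


payload = to_full_input("What is the exchange rate from US dollars to SEK today?")
print(payload)
# With a retrieved agent: agent.query(input=payload)
```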

Query the agent with multi-modal content

We will use the following agent (which forwards the input to the model and does not use any tools) to illustrate how to pass in multimodal inputs to an agent:

from vertexai import agent_engines

agent = agent_engines.LangchainAgent(
    model="gemini-2.0-flash",
    runnable_builder=lambda model, **kwargs: model,
)

Multimodal messages are represented through content blocks that specify a type and corresponding data. In general, for multimodal content, you would specify the type to be "media", the file_uri to point to a Cloud Storage URI, and the mime_type for interpreting the file.

Image

agent.query(input={"input": [
    {"type": "text", "text": "Describe the attached media in 5 words!"},
    {"type": "media", "mime_type": "image/jpeg", "file_uri": "gs://cloud-samples-data/generative-ai/image/cricket.jpeg"},
]})

Video

agent.query(input={"input": [
    {"type": "text", "text": "Describe the attached media in 5 words!"},
    {"type": "media", "mime_type": "video/mp4", "file_uri": "gs://cloud-samples-data/generative-ai/video/pixel8.mp4"},
]})

Audio

agent.query(input={"input": [
    {"type": "text", "text": "Describe the attached media in 5 words!"},
    {"type": "media", "mime_type": "audio/mp3", "file_uri": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3"},
]})
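
The three queries above share one shape: a text block followed by a media block. A minimal sketch of helpers that build that shape is shown below. The function names (text_block, media_block, multimodal_input) are hypothetical, and the MIME type is passed explicitly rather than guessed from the file name.

```python
def text_block(text: str) -> dict:
    """Content block carrying plain text."""
    return {"type": "text", "text": text}


def media_block(file_uri: str, mime_type: str) -> dict:
    """Content block pointing at a file, with its MIME type."""
    return {"type": "media", "mime_type": mime_type, "file_uri": file_uri}


def multimodal_input(prompt: str, file_uri: str, mime_type: str) -> dict:
    """Full-form input: a text prompt followed by one media attachment."""
    return {"input": [text_block(prompt), media_block(file_uri, mime_type)]}


image_query = multimodal_input(
    "Describe the attached media in 5 words!",
    "gs://cloud-samples-data/generative-ai/image/cricket.jpeg",
    "image/jpeg",
)
print(image_query)
# With the agent defined above: agent.query(input=image_query)
```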

For the list of MIME types supported by Gemini, visit the documentation on:

Query the agent with a runnable configuration

When querying the agent, you can also specify a config for the agent (which follows the schema of a RunnableConfig). Two common scenarios are:

  • Default configuration parameters, such as run_id, tags, and metadata.
  • Custom configuration parameters (via configurable), such as session_id.

As an example:

import uuid

run_id = uuid.uuid4()  # Generate an ID for tracking the run later.

response = agent.query(
    input="What is the exchange rate from US dollars to Swedish currency?",
    config={  # Specify the RunnableConfig here.
        "run_id": run_id,                              # Optional.
        "tags": ["config-tag"],                        # Optional.
        "metadata": {"config-key": "config-value"},    # Optional.
        "configurable": {"session_id": "SESSION_ID"}   # Optional.
    },
)

print(response)
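
Because every key in the config is optional, it can be convenient to assemble the dict from only the values you actually have. The sketch below does that for the four keys used in the example above. build_config is a hypothetical helper, not part of the SDK or of LangChain.

```python
import uuid


def build_config(run_id=None, tags=None, metadata=None, session_id=None):
    """Assemble a RunnableConfig-style dict, skipping options that are unset."""
    config = {}
    if run_id is not None:
        config["run_id"] = run_id
    if tags:
        config["tags"] = list(tags)
    if metadata:
        config["metadata"] = dict(metadata)
    if session_id is not None:
        # Custom parameters are nested under "configurable".
        config["configurable"] = {"session_id": session_id}
    return config


config = build_config(
    run_id=uuid.uuid4(),
    tags=["config-tag"],
    session_id="SESSION_ID",
)
print(config)
# With a retrieved agent: agent.query(input="...", config=config)
```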

What's next