LangChain on Vertex AI

LangChain on Vertex AI (Preview) lets you use the LangChain open source library to build custom Generative AI applications and use Vertex AI for models, tools and deployment. With LangChain on Vertex AI (Preview), you can do the following:

  • Select the large language model (LLM) that you want to work with.
  • Define tools to access external APIs.
  • Structure the interface between the user and the system components in an orchestration framework.
  • Deploy the framework to a managed runtime.

Benefits

  • Customizable: By using LangChain's standardized interfaces, LangChain on Vertex AI can be adapted to build different kinds of applications. You can customize the logic of your application and incorporate any framework, providing a high degree of flexibility.
  • Simplifies deployment: LangChain on Vertex AI uses the same APIs as LangChain to interact with LLMs and build applications. It simplifies and speeds up deployment with Vertex AI LLMs because the Reasoning Engine runtime supports single-click deployment to generate a compliant API based on your library.
  • Integration with Vertex AI ecosystems: Reasoning Engine for LangChain on Vertex AI uses Vertex AI's infrastructure and prebuilt containers to help you deploy your LLM application. You can use the Vertex AI API to integrate with Gemini models, Function Calling, and Extensions.
  • Secure, private, and scalable: You can use a single SDK call instead of managing the development process on your own. The Reasoning Engine managed runtime frees you from tasks such as application server development, container creation, and configuration of authentication, IAM, and scaling. Vertex AI handles autoscaling, regional expansion, and container vulnerabilities.

Use cases

To learn about LangChain on Vertex AI with end-to-end examples, see the following resources:

Use case: Build generative AI applications by connecting to public APIs

  • Convert between currencies. Create a function that connects to a currency exchange app, allowing the model to provide accurate answers to queries such as "What's the exchange rate for euros to dollars today?" Link: Vertex AI SDK for Python notebook - Intro to Building and Deploying an Agent with Reasoning Engine
  • Design a community solar project. Identify potential locations, look up relevant government offices and suppliers, and review satellite images and solar potential of regions and buildings to find the optimal location to install your solar panels. Link: Vertex AI SDK for Python notebook - Building and Deploying a Google Maps API Agent with Vertex AI Reasoning Engine

Use case: Build generative AI applications by connecting to databases

  • Integration with AlloyDB and Cloud SQL for PostgreSQL. Links: (1) Blog post - Announcing LangChain on Vertex AI for AlloyDB and Cloud SQL for PostgreSQL; (2) Vertex AI SDK for Python notebook - Deploying a RAG Application with Cloud SQL for PostgreSQL to LangChain on Vertex AI; (3) Vertex AI SDK for Python notebook - Deploying a RAG Application with AlloyDB to LangChain on Vertex AI
  • Query and understand structured datastores using natural language. Link: Vertex AI SDK for Python notebook - Building a Conversational Search Agent with Vertex AI Reasoning Engine and RAG on Vertex AI Search
  • Query and understand graph databases using natural language. Link: Blog post - GenAI GraphRAG and AI agents using Vertex AI Reasoning Engine with LangChain and Neo4j
  • Query and understand vector stores using natural language. Link: Blog post - Simplify GenAI RAG with MongoDB Atlas and Vertex AI Reasoning Engine

Use case: Build generative AI applications with OSS frameworks

  • Build and deploy agents using the OneTwo open-source framework. Link: Blog post - OneTwo and Vertex AI Reasoning Engine: exploring advanced AI agent development on Google Cloud
  • Build and deploy agents using the LangGraph open-source framework. Link: Vertex AI SDK for Python notebook - Building and Deploying a LangGraph Application with Vertex AI Reasoning Engine

Use case: Debugging and optimizing generative AI applications

  • Build and trace agents using OpenTelemetry and Cloud Trace. Link: Vertex AI SDK for Python notebook - Debugging and Optimizing Agents: A Guide to Tracing in Vertex AI Reasoning Engine
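
The currency conversion use case starts from an ordinary function that the model can call as a tool. The following is a minimal sketch, assuming hard-coded sample rates in place of a real currency exchange API. The function name get_exchange_rate, its parameters, and the _SAMPLE_RATES table are hypothetical and are not taken from the linked notebook.

```python
# Hypothetical, hard-coded rates standing in for a real currency exchange API.
# The numbers are illustrative only and are not live market data.
_SAMPLE_RATES = {("EUR", "USD"): 1.10, ("USD", "EUR"): 0.90}


def get_exchange_rate(currency_from: str = "USD", currency_to: str = "EUR") -> dict:
    """Return the exchange rate between two currencies.

    Args:
        currency_from: Three-letter code of the source currency.
        currency_to: Three-letter code of the target currency.
    """
    base, target = currency_from.upper(), currency_to.upper()
    if base == target:
        return {"base": base, "target": target, "rate": 1.0}
    if (base, target) not in _SAMPLE_RATES:
        raise ValueError(f"No sample rate for {base} -> {target}")
    return {"base": base, "target": target, "rate": _SAMPLE_RATES[(base, target)]}
```

Type hints and a descriptive docstring matter in a function like this, because a tool-calling model typically relies on the function's name, parameters, and description to decide when and how to call it.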

System components

Building and deploying a custom generative AI application using OSS LangChain and Vertex AI consists of four components:

Component Description
LLM

When you submit a query to your custom application, the LLM processes the query and provides a response.

You can choose to define a set of tools that communicate with external APIs and provide them to the model. While processing a query, the model delegates certain tasks to the tools. This can involve one or more calls to foundation or fine-tuned models.

To learn more, see Model versions and lifecycle.

Tool

You can choose to define a set of tools that communicate with external APIs (for example, a database) and provide them to the model. While processing a query, the model can delegate certain tasks to the tools.

Deployment through Vertex AI's managed runtime is optimized for tools based on Gemini Function Calling, but it also supports LangChain Tool/Function Calling. To learn more about Gemini Function Calling, see Function calling.

Orchestration framework

LangChain on Vertex AI lets you use the LangChain orchestration framework in Vertex AI. Use LangChain to decide how deterministic your application should be.

If you already use LangChain, you can use your existing LangChain code to deploy your application on Vertex AI. Otherwise, you can create your own application code and structure it in an orchestration framework that leverages Vertex AI's LangChain templates.

To learn more, see Develop an application.

Managed runtime

LangChain on Vertex AI lets you deploy your application to a Reasoning Engine managed runtime. This runtime is a Vertex AI service that has all the benefits of Vertex AI integration: security, privacy, observability, and scalability. You can productionize and scale your application with an API call, quickly turning locally-tested prototypes into enterprise-ready deployments.

To learn more, see Deploy an application.

There are many different ways to prototype and build custom Generative AI applications that use agentic capabilities by layering tools and custom functions on top of models like Gemini. When it's time to move your application to production, you need to consider how to deploy and manage your agent and its underlying components.

The components of LangChain on Vertex AI are meant to let you focus on and customize the aspects of agent capabilities that you care about most, such as custom functions, agent behavior, and model parameters, while Google takes care of deployment, scaling, packaging, and versions. If you work at a lower level in the stack, you might need to manage more than you want to. If you work at a higher level in the stack, you might not have as much developer control as you'd like.

System flow at runtime

When the user makes a query, the defined agent formats it into a prompt for the LLM. The LLM processes the prompt and determines whether it wants to use any of the tools.

If the LLM chooses to use a tool, it generates a FunctionCall with the name and parameters that the tool should be called with. The agent invokes the tool with the FunctionCall and provides the results from the tool back to the LLM. If the LLM chooses not to use any tools, it generates content that is relayed by the agent back to the user.

The following diagram illustrates the system flow at runtime:

[Figure: System flow at runtime]
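
The runtime flow described in this section can be sketched in plain Python. This is a simplified illustration, not the Vertex AI SDK: stub_llm stands in for a real model, and the FunctionCall dataclass, lookup_rate tool, and run_agent helper are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Union


@dataclass
class FunctionCall:
    """The model's request to run a tool: a tool name plus its arguments."""
    name: str
    args: dict


def lookup_rate(currency_from: str, currency_to: str) -> float:
    """Hypothetical tool that knows a single hard-coded sample rate."""
    return {("EUR", "USD"): 1.10}[(currency_from, currency_to)]


def stub_llm(prompt: str, tool_result: Optional[float] = None) -> Union[FunctionCall, str]:
    """Stand-in for a real LLM: requests a tool for rate questions, else answers."""
    if tool_result is not None:
        return f"The rate is {tool_result}."
    if "exchange rate" in prompt.lower():
        return FunctionCall("lookup_rate", {"currency_from": "EUR", "currency_to": "USD"})
    return "I can answer that without a tool."


def run_agent(query: str, tools: Dict[str, Callable]) -> str:
    prompt = f"User question: {query}"        # the agent formats the query as a prompt
    reply = stub_llm(prompt)                  # the LLM decides whether to use a tool
    if isinstance(reply, FunctionCall):
        result = tools[reply.name](**reply.args)      # the agent invokes the tool
        reply = stub_llm(prompt, tool_result=result)  # the result goes back to the LLM
    return reply                              # the content is relayed to the user


answer = run_agent("What's the exchange rate for euros to dollars?",
                   {"lookup_rate": lookup_rate})
```

This sketch handles at most one tool call per query. A real agent would usually repeat the decide-and-invoke step until the model returns content instead of a FunctionCall.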

Create and deploy a generative AI application

The workflow for building a generative AI application is:

1. Set up the environment: Set up your Google project and install the latest version of the Vertex AI SDK for Python.
2. Develop an application: Develop a LangChain application that can be deployed on Reasoning Engine.
3. Deploy the application: Deploy the application on Reasoning Engine.
4. Use the application: Query Reasoning Engine for a response.
5. Manage the deployed application: Manage and delete applications that you have deployed to Reasoning Engine.
6. (Optional) Customize an application template: Customize a template for new applications.

The steps are illustrated by the following diagram:

[Figure: Create and deploy a generative AI application]
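
Before deploying, it can help to exercise the application locally. The sketch below assumes the application is a plain Python class with a setup step and a query method. The class name SimpleApp and the method names set_up and query are assumptions made for illustration, and the class echoes its input instead of calling a model.

```python
from typing import Callable, Dict, Optional


class SimpleApp:
    """Hypothetical application shape; the method names are assumptions."""

    def __init__(self, model_name: str, tools: Optional[Dict[str, Callable]] = None):
        # Keep the constructor cheap: store configuration only.
        self.model_name = model_name
        self.tools = tools or {}
        self.ready = False

    def set_up(self) -> None:
        # A real application would create model clients here.
        # This sketch only records that setup has run.
        self.ready = True

    def query(self, question: str) -> dict:
        if not self.ready:
            raise RuntimeError("Call set_up() before query().")
        # Echo the question instead of calling a model.
        return {"input": question, "output": f"[{self.model_name}] echo: {question}"}


app = SimpleApp(model_name="example-model")  # "example-model" is a placeholder name
app.set_up()
response = app.query("Hello")
```

Keeping setup separate from construction lets the object be configured in one place and initialized later, for example after it has been shipped to the environment where it will run.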

Pricing

The pricing structure is based on vCPU hours and GiB hours used during request processing, container startup, and container shutdown. This means that you will be charged for both the compute (vCPU) and memory resources consumed by your workloads.
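
As a rough illustration of that structure, the charge can be estimated as vCPU hours times a vCPU rate plus GiB hours times a memory rate. The function name estimate_cost and the rates below are placeholders, not actual Vertex AI prices; check the pricing page for current values.

```python
def estimate_cost(vcpu_hours: float, gib_hours: float,
                  vcpu_rate: float, gib_rate: float) -> float:
    """Estimate a charge as vCPU hours x vCPU rate + GiB hours x GiB rate."""
    if min(vcpu_hours, gib_hours, vcpu_rate, gib_rate) < 0:
        raise ValueError("Hours and rates must be non-negative.")
    return vcpu_hours * vcpu_rate + gib_hours * gib_rate


# Placeholder rates per hour, NOT real Vertex AI prices.
example = estimate_cost(vcpu_hours=10, gib_hours=20, vcpu_rate=0.10, gib_rate=0.01)
```

With these placeholder numbers, compute contributes 10 x 0.10 and memory contributes 20 x 0.01 to the total.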

We recommend that you delete unused resources to avoid incurring unwanted costs.

What's next