Build gen AI apps quickly with LangChain VectorStore in Cloud SQL for PostgreSQL
Bala Narasimhan
Group Product Manager, Google Cloud
Kurtis Van Gent
Staff Software Engineer, Google Cloud Databases
We recently announced a suite of LangChain packages for the Google Cloud database portfolio. Each package will have up to three LangChain integrations:
-
Vector stores to enable semantic search for our databases that support vectors
-
Document loaders for loading and saving documents to/from your database
-
Chat Message Memory to enable chains to recall previous conversations
In this blog, we deep dive into the benefits of the VectorStore from our Cloud SQL for PostgreSQL LangChain package, and see how it helps make generative AI application development easy, secure, and flexible.
Security
The Cloud SQL for PostgreSQL LangChain packages come embedded with the Cloud SQL Python connector, which makes connecting securely to your database easy. Developers get the following benefits out of the box:
-
IAM authorization: Uses IAM permissions to control who or what can connect to your Cloud SQL instances
-
Convenience: Removes the requirement to manage SSL certificates, configure firewall rules, or enable authorized networks
-
IAM database authentication: Provides support for Cloud SQL's automatic IAM database authentication feature
Ease of use
Connect with just instance name
Now, you no longer need to construct a connection string with your IP address or pass in a myriad of arguments to connect to your PostgreSQL instance. Instead, the instance name alone will suffice as shown below:
Connection pooling by default
Connection management is an important part of scaling PostgreSQL. Cloud SQL for PostgreSQL’s LangChain packages come automatically configured with an SQLAlchemy connection pool. Our package supports custom configurations as well as reusing the pool in other parts of your application:
Schema flexibility
While the existing langchain-postgres
package offers a VectorStore, it only supports a limited and fixed schema. It uses two tables with fixed names and schemas for all vector stores initialized in a database. Any schema changes require dropping and recreating the table, losing the previous data. Nor does it currently support indexing, and it can only be used for KNN.
Table per collection
In contrast, Cloud SQL for PostgreSQL’s LangChain packages use a different table for each collection of vectors, meaning that schemas can vary as shown below.
Support indexing
The Cloud SQL for PostgreSQL LangChain packages supports ANN to speed up vector search. Below are simple code snippets to create, refresh and drop indexes using the package.
Create index
Re-index
Drop indexes
Custom schemas
The Cloud SQL for PostgreSQL LangChain packages allow you to use different schemas, which means you can both reuse any existing table, as well as more easily migrate from other implementations (such as langchain-postgres package).
Use preexisting table
When initializing a PostgresVectorStore, you can optionally specify the names of the columns that you store things like content, ids, or other metadata fields from your LangChain document. This allows you to leverage existing tables, or tables made from other integrations (such as langchain-postgres).
Extract metadata to column
Specifying metadata columns causes the integration to pull that field from the document metadata and store it in its own, properly typed column.
Filter on metadata
Having a metadata field stored in a column allows you to leverage PostgreSQL’s value as a relational database, filtering efficiently.
In summary, it’s very easy to get started with Cloud SQL for PostgreSQL as a vector database, and our native LangChain packages make gen AI development more flexible and powerful.
You can play around with the Cloud SQL for PostgreSQL VectorStore with this VectorStore Notebook, or you can check out the package on GitHub, including filing bugs or other feedback by opening an issue.