What is pgvector?

pgvector is an extension for PostgreSQL (also called Postgres) that simplifies working with vectors—enabling you to store, search, and index them directly in your relational database.

With pgvector, adding advanced capabilities like similarity search to your applications and AI agents can be both straightforward and scalable, without having to move data around or re-architect your application to accommodate the new vector data type.

Key takeaways

pgvector is an open source extension for PostgreSQL that helps you to store, index, and search high-dimensional vectors directly within your existing PostgreSQL database. pgvector is known for supporting:

Similarity search: Comparing semantic patterns in data, rather than using keyword matching
AI applications: AI agent operations and search applications, including recommendation engines, chatbots, natural language processing, and anomaly detection

What is a vector?

A vector represents data numerically in a way that captures its key characteristics, mapping it into a virtual mathematical space. In this space, similar items—like words, images, or objects—are positioned close together.

For example, consider the words “coat” and “jacket.” Traditional keyword-based searches would not connect these two words as similar, because their letters are quite different. An e-commerce system that wants to unite these keywords would need to do so manually. However, the vector representations of these two would be very close because they share similar meanings—delivering more accurate search results for users and saving time for developers.

Similarly, if you take two different pictures of cats, then pixel by pixel they might be vastly different. However, their vector embeddings would place them very close together in the mathematical space, just as a human would easily identify both of these as images of cats.

To make this work, an embedding model transforms raw data—such as images or text—into vector embeddings. pgvector stores these embeddings in your database. When a user submits a query, that input is also converted into a vector. pgvector then calculates the distance between the query vector and stored vectors to efficiently identify the "nearest neighbors" with the highest similarity scores.
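The lookup described above can be sketched in a few lines of Python. This is a brute-force illustration of the idea, not how pgvector is implemented internally. The three-dimensional embeddings, the item names, and the query vector are all invented for the example; real embedding models typically produce vectors with hundreds or thousands of dimensions.

```python
import math

# Toy 3-dimensional embeddings (made-up values for illustration only).
STORED = {
    "coat": [0.90, 0.80, 0.10],
    "jacket": [0.85, 0.75, 0.15],
    "banana": [0.10, 0.20, 0.95],
}


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, stored, k=2):
    """Return the names of the k stored items closest to the query vector."""
    ranked = sorted(stored, key=lambda name: euclidean_distance(query, stored[name]))
    return ranked[:k]


# A query vector that an embedding model might produce for "parka" (assumed).
query_vector = [0.88, 0.79, 0.12]
print(nearest_neighbors(query_vector, STORED))  # ['coat', 'jacket']
```

Scanning every stored vector like this becomes slow as the table grows, which is the problem that vector indexes are meant to address.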

Curious about different types of nearest neighbor searches? Check out our guide to generative AI app development.

pgvector FAQs

What is the difference between PostgreSQL and pgvector?

PostgreSQL is a robust, open source relational database management system designed to handle structured data using tables, rows, and columns.

pgvector is an extension that runs inside PostgreSQL. It adds “vector,” a new data type, to the database, allowing storage and processing of vector embeddings alongside your standard operational data.

Do I need a separate database to use pgvector?

No, pgvector is an extension that integrates directly into your existing PostgreSQL database. This allows you to add advanced AI and search capabilities without managing new or separate infrastructure.

How pgvector powers AI applications with similarity search

To support today's AI-driven features, you need the ability to store and manage vector embeddings.

PostgreSQL can be powerful on its own, but because its data is rigidly structured into tables, rows, and columns, its query capability is largely limited to keyword and pattern matching.

In the world of AI, complex data like text, images, and audio is encoded as vector representations. These encodings enable AI models to grasp the context and semantic relationships within your data, forming the backbone of features like intelligent search, recommendations, and gen AI.

The pgvector extension brings semantic search to PostgreSQL, using vector embeddings to find results based on a query's meaning rather than only the keyword matches a conventional SQL query would return. This process, known as similarity search, makes it straightforward to add advanced search capabilities directly into your applications without needing to re-architect or move data to a separate vector database.

Want to learn more about vector embeddings? Check out our guide to generative AI app development.

pgvector in PostgreSQL for AI and search applications

With its ability to handle high-dimensional vectors, pgvector supports a range of advanced applications.

Similarity search

Keyword matching in traditional relational databases often fails to identify meaningful connections in data. Similarity search compares vector proximity using metrics like Euclidean distance and cosine distance to find deeper patterns, critical for applications like image recognition and semantic search, where results are ranked by meaning. In e-commerce, for example, similarity search enables product recommendations by analyzing user behavior and finding related items.
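To make the two metrics concrete, here is a minimal Python sketch of how each one is calculated. The function names and sample vectors are chosen for illustration only. In pgvector itself, these comparisons are exposed as SQL operators: `<->` for Euclidean distance and `<=>` for cosine distance.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


v = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]  # same direction as v, twice the length

print(l2_distance(v, scaled))      # non-zero: the points are far apart
print(cosine_distance(v, scaled))  # approximately 0: same direction
```

Which metric fits best usually depends on how the embedding model was trained. Cosine distance is a common choice when only the direction of a vector carries meaning.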

AI agents, conversational search, and real-time chat

Vector-based natural language processing allows AI agents to understand context, leading to more personalized conversations and more accurate responses. Multilingual support extends their usefulness in virtual assistants and customer service platforms.

Classification, deduplication, and anomaly detection

pgvector supports these AI workflows by storing and querying vector embeddings. Items whose vectors sit close together can be classified alike or flagged as duplicates, while a vector that sits far from all others points to an unusual pattern. By analyzing vector proximity, pgvector helps detect anomalies in real time for fraud prevention, network security, or quality control.
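As a rough sketch of proximity-based anomaly detection, the Python below flags a vector when even its nearest known neighbor is farther away than a chosen threshold. The function names, sample embeddings, and threshold are assumptions made for this example; a production system would tune the threshold against real data.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_anomaly(candidate, known_vectors, threshold):
    """Flag a vector whose nearest known neighbor is farther than threshold."""
    if not known_vectors:
        raise ValueError("need at least one known vector to compare against")
    nearest = min(l2_distance(candidate, v) for v in known_vectors)
    return nearest > threshold


# Made-up embeddings of "normal" transactions (illustrative values only).
normal = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.2]]

print(is_anomaly([1.05, 1.0], normal, threshold=0.5))  # close to the cluster
print(is_anomaly([5.0, 5.0], normal, threshold=0.5))   # far from every point
```

Reversing the comparison with a very small threshold gives a simple near-duplicate check, which is the same idea applied to deduplication.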

Customer service

Sentiment analysis built on vector embeddings gauges the tone and intent of a message, so you can route negative comments to the right team for faster, more tailored resolutions.

Benefits of using pgvector for similarity search

By leveraging PostgreSQL’s scalability, transaction support, and robust reliability, pgvector efficiently manages high-dimensional datasets. Additionally, its use of familiar SQL syntax makes it accessible to existing teams, eliminating the need for additional tools or infrastructure dedicated to vector indexing and search.

Easily integrates into existing PostgreSQL-based apps.

Builds on PostgreSQL’s scalability to handle growing datasets.

Offers customizable features like distance metrics and indexing.

Inherits PostgreSQL’s trusted security and reliability.

Allows you to seamlessly query across structured and unstructured data.

Provides a developer-friendly solution for working with large-scale, high-dimensional data.

For a single database that excels in both traditional SQL queries and modern vector search, consider AlloyDB for PostgreSQL. AlloyDB uses the ScaNN (Scalable Nearest Neighbor) vector similarity search algorithm developed by Google, delivering significantly higher performance than other cloud-based PostgreSQL services for transactional and analytical workloads within large databases.

Learn how AlloyDB performs simultaneous search on structured and unstructured data.

How to enable and use pgvector in Google Cloud databases

Cloud SQL and AlloyDB for PostgreSQL support pgvector, allowing you to store and query vector embeddings using standard SQL commands.

1. Connect to your instance

Use your preferred PostgreSQL client (such as psql, pgAdmin, or the Google Cloud console) to connect to your Cloud SQL or AlloyDB instance.

2. Enable the pgvector extension

Run the following SQL command to enable the extension on your database. You only need to do this once per database.

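As a minimal sketch of this step from application code, the Python below issues pgvector's `CREATE EXTENSION IF NOT EXISTS vector;` statement through a database connection. It assumes a DB-API 2.0 style connection, such as one created by the psycopg driver; the driver choice and the helper name `enable_pgvector` are not part of this guide. You can also run the SQL statement on its own in any PostgreSQL client.

```python
# pgvector's extension name inside PostgreSQL is "vector".
ENABLE_PGVECTOR_SQL = "CREATE EXTENSION IF NOT EXISTS vector;"


def enable_pgvector(conn):
    """Enable pgvector on the database behind a DB-API 2.0 connection.

    `conn` is assumed to come from a PostgreSQL driver such as psycopg;
    any object that offers cursor() and commit() will work.
    """
    cur = conn.cursor()
    try:
        cur.execute(ENABLE_PGVECTOR_SQL)
    finally:
        cur.close()
    conn.commit()
```

Because of `IF NOT EXISTS`, calling the helper again on a database where the extension is already enabled is harmless.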

3. Create a table with a vector column

Create a new table (or alter an existing one) to include a column for vector data. Specify the number of dimensions for the vector column, which should match the output size of your embedding model. For example, to create a table for storing 3-dimensional embeddings:

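One way to sketch this step is a small Python helper that builds the `CREATE TABLE` statement. The column type `vector(3)` is pgvector's syntax for a 3-dimensional vector. The table name `items`, the column names `id` and `embedding`, and the helper itself are hypothetical choices made for this example.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_table_sql(table, dimensions):
    """Build a CREATE TABLE statement with a fixed-size vector column."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if isinstance(dimensions, bool) or not isinstance(dimensions, int) or dimensions < 1:
        raise ValueError("dimensions must be a positive integer")
    return (
        f"CREATE TABLE {table} ("
        "id bigserial PRIMARY KEY, "
        f"embedding vector({dimensions})"
        ");"
    )


# "items" is a hypothetical table name; 3 matches the example in the text.
print(create_table_sql("items", 3))
```

The helper validates the table name and the dimension count because identifiers and type modifiers cannot be passed as query parameters and have to be written into the SQL text itself.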

4. Insert vector data

You can insert vector embeddings just like standard data. A vector is written as a comma-separated list of numbers enclosed in square brackets, such as '[1,2,3]'.

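The following Python sketch shows one way to turn a list of numbers into that bracketed text form and pair it with a parameterized `INSERT`. The `%s` placeholder style assumes a driver such as psycopg. The table name `items`, the column name `embedding`, and the helper `to_vector_literal` are hypothetical.

```python
def to_vector_literal(values):
    """Format numbers as pgvector's text form, e.g. [1.0,2.0,3.0]."""
    if not values:
        raise ValueError("a vector needs at least one dimension")
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Placeholders use the %s style that psycopg expects (an assumption about
# the driver); "items" and "embedding" are hypothetical names.
INSERT_SQL = "INSERT INTO items (embedding) VALUES (%s::vector);"

rows = [[1, 2, 3], [4, 5, 6]]
params = [(to_vector_literal(r),) for r in rows]
print(params)

# With a real connection, the rows could then be sent with something like:
#     cursor.executemany(INSERT_SQL, params)
```

Passing the vector as a parameter, rather than pasting it into the SQL string, keeps the statement safe from injection and lets the same statement be reused for every row.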

5. Query using similarity search

You can now query your data to find the nearest neighbors. The <-> operator calculates Euclidean distance (L2 distance), which is commonly used to find the most similar items.

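A nearest-neighbor query with the `<->` operator might be built as in the Python sketch below. The ordering clause `ORDER BY embedding <-> ... LIMIT k` follows pgvector's documented query pattern. The table and column names, the `%s` placeholder style (which assumes a psycopg-like driver), and the helper name are assumptions made for this example.

```python
def build_nearest_query(query_vector, k=5):
    """Return (sql, params) for a k-nearest-neighbor search by L2 distance."""
    if isinstance(k, bool) or not isinstance(k, int) or k < 1:
        raise ValueError("k must be a positive integer")
    if not query_vector:
        raise ValueError("the query vector must not be empty")
    literal = "[" + ",".join(str(float(v)) for v in query_vector) + "]"
    sql = (
        "SELECT id, embedding "
        "FROM items "
        "ORDER BY embedding <-> %s::vector "  # smallest distance first
        "LIMIT %s;"
    )
    return sql, (literal, k)


sql, params = build_nearest_query([3, 1, 2], k=5)
print(sql)
print(params)
```

Sorting by distance in ascending order and limiting the result to `k` rows is what turns a distance calculation into a "top k most similar" search.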

6. Add an index for performance

For larger datasets, adding an index can significantly speed up search performance. HNSW, which pgvector provides, and ScaNN, which AlloyDB offers, are commonly used index types. Here’s an HNSW example:

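As a sketch, the Python helper below builds a `CREATE INDEX ... USING hnsw` statement in pgvector's syntax. The operator class has to match the distance operator your queries use, so the helper maps each operator to pgvector's corresponding operator class. The table name `items`, the column name `embedding`, and the helper name are hypothetical.

```python
# Operator classes that pgvector provides for its distance operators.
OPCLASS_FOR_OPERATOR = {
    "<->": "vector_l2_ops",      # Euclidean (L2) distance
    "<=>": "vector_cosine_ops",  # cosine distance
    "<#>": "vector_ip_ops",      # negative inner product
}


def create_hnsw_index_sql(operator="<->"):
    """Build a CREATE INDEX statement for an HNSW index on items.embedding."""
    try:
        opclass = OPCLASS_FOR_OPERATOR[operator]
    except KeyError:
        raise ValueError(f"unsupported distance operator: {operator!r}") from None
    return f"CREATE INDEX ON items USING hnsw (embedding {opclass});"


print(create_hnsw_index_sql("<->"))
```

An index built with `vector_l2_ops` only speeds up queries that order by `<->`. If you search by cosine distance instead, build the index with `vector_cosine_ops`.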
