What is pgvector?

pgvector is an extension for PostgreSQL (also called Postgres) that simplifies working with vectors—enabling you to store, search, and index them directly in your relational database.

With pgvector, you can add capabilities like similarity search to your applications and AI agents in a way that is both straightforward and scalable. You don't need to move data to a separate system or change your application architecture to accommodate the new vector data type.

Key takeaways

pgvector is an open source extension for PostgreSQL that lets you store, index, and search high-dimensional vectors directly within your existing PostgreSQL database. It is commonly used for:

  • Similarity search: Finding related items by comparing their semantic meaning rather than matching keywords
  • AI applications: AI agent operations and search applications, including recommendation engines, chatbots, natural language processing, and anomaly detection

What is a vector?

A vector is a list of numbers that represents data in a way that captures its key characteristics, placing it as a point in a mathematical space. In this space, similar items (such as related words, images, or objects) are positioned close together.

For example, consider the words “coat” and “jacket.” A traditional keyword-based search would not connect these two words, because their letters are quite different. An e-commerce system that wants to treat them as related would have to link them manually. Their vector representations, however, would be very close because the words have similar meanings. This delivers more accurate search results for users and saves developers time.

Similarly, two different photos of cats may be very different pixel by pixel. Their vector embeddings, however, place them close together in the mathematical space, just as a human easily recognizes both as pictures of cats.

Image: cat pictures generated with Gemini

To make this work, an embedding model transforms raw data, such as images or text, into vector embeddings, and pgvector stores those embeddings in your database. When a user submits a query, the same model converts that input into a vector. pgvector then calculates the distance between the query vector and the stored vectors. It returns the "nearest neighbors": the stored items with the smallest distance, and therefore the highest similarity, to the query.
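
Here is a toy sketch of the nearest neighbor step with no embedding model involved. The 3-dimensional vectors below are made-up stand-ins for real embeddings, which usually have hundreds of dimensions. They are held in a common table expression instead of a table. The query ranks them by cosine distance, computed by pgvector's <=> operator, against a made-up query vector.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Made-up "embeddings" for three words; coat and jacket point in similar directions.
WITH words(label, embedding) AS (
  VALUES ('coat',   '[0.9,0.1,0.0]'::vector),
         ('jacket', '[0.8,0.2,0.1]'::vector),
         ('banana', '[0.0,0.1,0.9]'::vector)
)
-- Rank by cosine distance to a made-up query vector; smaller means more similar.
SELECT label, embedding <=> '[0.85,0.15,0.05]' AS distance
FROM words
ORDER BY distance
LIMIT 2;
```

This returns coat and jacket as the two nearest neighbors. Banana's vector points in a different direction, so it falls outside LIMIT 2.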

Curious about different types of nearest neighbor searches? Check out our guide to generative AI app development.

pgvector FAQs

PostgreSQL is a robust, open source relational database management system designed to handle structured data using tables, rows, and columns.

pgvector is an extension that runs inside PostgreSQL. It adds “vector,” a new data type, to the database, allowing storage and processing of vector embeddings alongside your standard operational data.

pgvector is not a separate database. It is an extension that integrates directly into your existing PostgreSQL database, so you can add advanced AI and search capabilities without managing new or separate infrastructure.

pgvector in PostgreSQL for AI and search applications

With its ability to handle high-dimensional vectors, pgvector supports a range of advanced applications.

Keyword matching in traditional relational databases often fails to identify meaningful connections in data. Similarity search instead measures how close vectors are, using metrics such as Euclidean distance and cosine distance, to surface deeper patterns. This is critical for applications like image recognition and semantic search, where results are ranked by meaning. In e-commerce, for example, similarity search enables product recommendations by analyzing user behavior and finding related items.
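
pgvector exposes these metrics as SQL operators. The sketch below compares one pair of vectors, where [2,4,6] is [1,2,3] scaled by two. Cosine distance treats them as identical because they point in the same direction, while Euclidean (L2) distance does not.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- The same pair of vectors under three pgvector distance operators.
SELECT
  '[1,2,3]'::vector <-> '[2,4,6]'::vector AS l2_distance,       -- sqrt(14), about 3.742
  '[1,2,3]'::vector <=> '[2,4,6]'::vector AS cosine_distance,   -- 0: same direction
  '[1,2,3]'::vector <#> '[2,4,6]'::vector AS neg_inner_product; -- -(1*2 + 2*4 + 3*6) = -28
```

For vectors normalized to length 1, cosine distance and inner product produce the same ranking, so the choice usually follows your embedding model's documentation.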

Vector-based natural language processing helps AI agents understand context, leading to more personalized conversations and more accurate responses. With multilingual embedding models, the same approach can support virtual assistants and customer service platforms across languages.

Vector embeddings can also help identify unusual patterns in data. Records that behave normally tend to cluster together, so a new record whose embedding is far from its nearest neighbors may be an anomaly. By storing and querying embeddings with pgvector, you can run this kind of proximity check in your database. This helps detect anomalies in real time for fraud prevention, network security, or quality control.
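
One simple way to apply vector proximity to anomaly detection is to flag any record whose nearest neighbor is unusually far away. The sketch below illustrates that technique; it is not a built-in pgvector feature. The 2-dimensional embeddings, the events name, and the threshold of 1.0 are all made up.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Made-up 2-dimensional embeddings of hypothetical events; id 4 sits far from the rest.
WITH events(id, embedding) AS (
  VALUES (1, '[1.0,1.0]'::vector),
         (2, '[1.1,0.9]'::vector),
         (3, '[0.9,1.2]'::vector),
         (4, '[5.0,5.0]'::vector)
)
-- For each event, find the L2 distance to its single nearest other event...
SELECT e.id, nn.distance
FROM events e
CROSS JOIN LATERAL (
  SELECT e.embedding <-> o.embedding AS distance
  FROM events o
  WHERE o.id <> e.id
  ORDER BY distance
  LIMIT 1
) nn
-- ...and flag events whose nearest neighbor is beyond a hypothetical threshold.
WHERE nn.distance > 1.0
ORDER BY e.id;
```

Only id 4 is returned. Its closest neighbor (id 3) is about 5.59 away, while every other event has a neighbor within 0.25. In practice, the threshold would be tuned on your own data.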

Sentiment analysis identifies the emotional tone of a message, so you can route negative comments for faster action and tailored resolutions.

For a single database that excels at both traditional SQL queries and modern vector search, consider AlloyDB for PostgreSQL. AlloyDB uses ScaNN (Scalable Nearest Neighbors), a vector similarity search algorithm developed by Google. It delivers significantly higher performance than other cloud-based PostgreSQL services for transactional and analytical workloads on large databases.

Learn how AlloyDB performs simultaneous search on structured and unstructured data.

How to enable and use pgvector in Google Cloud databases

Cloud SQL and AlloyDB for PostgreSQL support pgvector, allowing you to store and query vector embeddings using standard SQL commands.

1. Connect to your instance

Use your preferred PostgreSQL client (such as psql, pgAdmin, or the Google Cloud console) to connect to your Cloud SQL or AlloyDB instance.

2. Enable the pgvector extension

Enable the extension on your database with the CREATE EXTENSION command; pgvector is registered under the name vector. You only need to do this once per database.
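
A minimal form of the command, using IF NOT EXISTS so that running it again is harmless:

```sql
-- Enable pgvector in the current database; the extension's name is "vector".
-- IF NOT EXISTS turns a repeat run into a no-op instead of an error.
CREATE EXTENSION IF NOT EXISTS vector;
```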

3. Create a table with a vector column

Create a new table, or alter an existing one, to include a column of type vector. Declare the column with the number of dimensions your embedding model produces. For example, a vector(3) column stores 3-dimensional embeddings.
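
As a sketch, the statements below create a table for 3-dimensional embeddings. The table and column names (items, id, embedding) are hypothetical placeholders. A real embedding model typically produces hundreds or thousands of dimensions, so you would size the vector column to match your model's output.

```sql
-- Make sure pgvector is enabled (a no-op if step 2 already ran).
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table: vector(3) fixes the dimension, so every value
-- stored in this column must contain exactly 3 numbers.
CREATE TABLE IF NOT EXISTS items (
  id bigserial PRIMARY KEY,
  embedding vector(3)
);

-- To extend an existing table instead (hypothetical table name):
-- ALTER TABLE products ADD COLUMN embedding vector(3);
```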

4. Insert vector data

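You can insert vector embeddings just like standard data. Vector values are written as comma-separated numbers enclosed in square brackets, such as '[1,2,3]'. Each value must have exactly the number of dimensions declared for the column.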
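
Here is a sketch of inserting two toy embeddings into the hypothetical items table. The vectors are written as text literals that PostgreSQL casts to the column's vector type. The first two statements only repeat the setup from steps 2 and 3 so the snippet runs on its own.

```sql
-- Setup from steps 2-3; both statements are no-ops if those steps already ran.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3));

-- Each vector is a bracketed, comma-separated list with exactly 3 elements.
INSERT INTO items (embedding) VALUES
  ('[1,2,3]'),
  ('[4,5,6]');

-- This would fail, because the column is declared as vector(3):
-- INSERT INTO items (embedding) VALUES ('[1,2]');
```

In an application, the literal would come from your embedding model's output rather than being typed by hand.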

5. Query using similarity search

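You can now query your data to find the nearest neighbors. Ordering results by the <-> operator, which calculates Euclidean (L2) distance, and adding a LIMIT returns the closest, and therefore most similar, items. pgvector also provides <=> for cosine distance and <#> for negative inner product.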
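
Here is a sketch of a nearest neighbor query against the hypothetical items table, using the made-up query vector [3,1,2]. The pattern that returns the k closest rows is ORDER BY on the distance expression plus LIMIT.

```sql
-- Setup from steps 2-3; both statements are no-ops if those steps already ran.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3));

-- Return up to 5 rows ranked by Euclidean (L2) distance to the query vector,
-- smallest distance (most similar) first.
SELECT id, embedding, embedding <-> '[3,1,2]' AS distance
FROM items
ORDER BY embedding <-> '[3,1,2]'
LIMIT 5;
```

Suppose the table holds the vectors [1,2,3] and [4,5,6]. Then [1,2,3] ranks first (distance √6 ≈ 2.45), ahead of [4,5,6] (√33 ≈ 5.74). To rank by cosine distance instead, replace <-> with <=> in both places.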

6. Add an index for performance

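For larger datasets, an approximate nearest neighbor index can significantly speed up searches, trading some recall for speed. Common options are pgvector's HNSW index and, in AlloyDB, the ScaNN index. The index's operator class must match the distance operator your queries use.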
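
Here is a sketch of an HNSW index on the hypothetical items table. The operator class vector_l2_ops matches the <-> (L2 distance) operator; an index built with vector_cosine_ops would serve <=> queries instead. The index name is a hypothetical placeholder.

```sql
-- Setup from steps 2-3; both statements are no-ops if those steps already ran.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3));

-- Approximate nearest neighbor index for L2-distance queries (ORDER BY embedding <-> ...).
CREATE INDEX IF NOT EXISTS items_embedding_hnsw_idx
  ON items USING hnsw (embedding vector_l2_ops);
```

pgvector's HNSW build parameters m and ef_construction default to 16 and 64 and can be set in a WITH clause. At query time, the hnsw.ef_search setting (default 40) trades speed for recall.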

Google Cloud