Build a generative AI application on Google Cloud
Google Cloud offers a range of products and tools for building generative AI applications with enterprise-grade scaling, security, and observability.
Use this page to learn the stages of developing a generative AI application, choose the best products and tools for your use case, and access the documentation you need to get started.
Learn the fundamentals of generative AI development
When to use generative AI or traditional AI
Overview of developing a generative AI application
Choose infrastructure for your generative AI application
Learn which products, frameworks, and tools are the best match for building your generative AI application. Common components in a generative AI application hosted on Google Cloud include:
- Application hosting: Compute to host your application. Your application can use Google Cloud's client libraries and SDKs to talk to different Cloud products.
- Model hosting: Scalable and secure hosting for a generative model.
- Model: Generative model for text, chat, images, code, embeddings, and multimodal.
- Grounding solution: Anchor model output to verifiable, updated sources of information.
- Database: Store your application's data. You might reuse your existing database as your grounding solution by augmenting prompts with SQL query results, or by storing your data as vector embeddings using an extension like pgvector.
- Storage: Store files such as images, videos, or static web frontends. You might also use storage for the raw grounding data (e.g., PDFs) that you later convert into embeddings and store in a vector database.
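The prompt-augmentation pattern mentioned for the database component can be sketched in a few lines. Everything here is illustrative: `fetch_related_docs` stands in for a real lookup (a SQL query against Cloud SQL, or a similarity search in a vector database), and the prompt wording is an assumption, not a specific Google Cloud API.

```python
def fetch_related_docs(question: str) -> list[str]:
    # Placeholder for a real lookup, e.g. a SQL query against Cloud SQL
    # or a similarity search against a vector database.
    corpus = {
        "returns": "Items can be returned within 30 days of purchase.",
        "shipping": "Standard shipping takes 3-5 business days.",
    }
    return [text for key, text in corpus.items() if key in question.lower()]

def build_grounded_prompt(question: str) -> str:
    # Augment the user's question with retrieved context so the model
    # answers from verifiable data instead of its training set alone.
    context = "\n".join(fetch_related_docs(question))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt("What is your returns policy?")
```

The resulting `prompt` string is what you would send to a hosted model; the retrieval step is the only part that changes when you swap databases.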
The sections below walk through each of those components, helping you choose which Google Cloud products to try.
Application hosting infrastructure
Get started with:
Model hosting infrastructure
Get started with:
Model
Get started with:
- Gemini
- Codey
- Imagen
- text-embedding
- Vertex AI Model Garden (open source models)
- HuggingFace Model Hub (open source models)
Grounding
To ensure informed and accurate model responses, you may want to ground your generative AI application in real-time data. Retrieving that data and adding it to the model prompt is called retrieval-augmented generation (RAG).
You can implement grounding with your own data in a vector database, an optimal format for operations like similarity search. Google Cloud offers multiple vector database solutions for different use cases.
Note: You can also ground with traditional (non-vector) databases by querying an existing database like Cloud SQL or Firestore and using the result in your model prompt.
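At its core, the similarity search a vector database performs is a nearest-neighbor ranking of embeddings. The sketch below shows that operation with tiny hand-written vectors; in practice a product like Vector Search handles this at scale, and you would generate real embeddings with an embeddings model such as text-embedding.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy document embeddings; real embeddings have hundreds of dimensions.
documents = {
    "doc_pricing": [0.9, 0.1, 0.0],
    "doc_support": [0.1, 0.8, 0.2],
    "doc_returns": [0.0, 0.2, 0.9],
}

def nearest(query_embedding: list[float], k: int = 1) -> list[str]:
    # Rank documents by similarity to the query embedding.
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(query_embedding, documents[d]),
        reverse=True,
    )
    return ranked[:k]

top = nearest([0.85, 0.15, 0.05])  # closest to doc_pricing
```

The documents returned by `nearest` are what you would splice into the model prompt as grounding context.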
Get started with:
- Vertex AI Agent Builder (formerly Enterprise Search, Gen AI App Builder, Discovery Engine)
- Vector Search (formerly Matching Engine)
- AlloyDB for PostgreSQL
- Cloud SQL
- BigQuery
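For the PostgreSQL-based options above (AlloyDB, Cloud SQL), grounding queries typically use the pgvector extension's `<->` distance operator. The sketch below only builds a parameterized SQL string; the table and column names are hypothetical, and connecting to and querying a live database is left out.

```python
def similarity_query(table: str = "documents", k: int = 3) -> str:
    # Hypothetical pgvector query: rank rows by distance between the
    # stored `embedding` column and a query embedding, keeping the top k.
    # `<->` is pgvector's distance operator.
    return (
        f"SELECT content FROM {table} "
        f"ORDER BY embedding <-> %(query_embedding)s "
        f"LIMIT {k}"
    )

sql = similarity_query()
```

Passing the query embedding as a bound parameter (`%(query_embedding)s`) rather than interpolating it keeps the statement safe for a driver like psycopg2.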
Grounding with APIs
Vertex AI Extensions (Private Preview)
Langchain Components
Grounding in Vertex AI
Start building
Set up your development environment
Install the Google Cloud CLI
Install the Cloud Code extension in your IDE
Set up authentication
Set up LangChain
Design prompts and evaluate models
Introduction to Prompt Design
Vertex AI Studio
Generative AI Prompt Samples
Ideation with Generative Models on Vertex AI
Model Evaluation in Vertex AI
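A common prompt-design technique covered by the resources above is few-shot prompting: pairing an instruction with worked examples so the model infers the desired output format. The task and examples below are invented for illustration; any text model on Vertex AI could be given the resulting prompt.

```python
# Hand-written examples that demonstrate the desired output format.
EXAMPLES = [
    ("I love this phone, the battery lasts forever.", "positive"),
    ("The screen cracked after two days.", "negative"),
]

def few_shot_prompt(review: str) -> str:
    # Instruction, then example pairs, then the new input with an open
    # "Sentiment:" slot for the model to complete.
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in EXAMPLES:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {review}")
    lines.append("Sentiment:")
    return "\n".join(lines)

prompt = few_shot_prompt("Great value for the price.")
```

Tools like Vertex AI Studio let you iterate on exactly this kind of template interactively before committing it to code.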
Code samples
Web chatbot: Answer questions about the Google Store
Learn to build a web-based question-answering chatbot, using Vertex AI Agent Builder and Firebase.
Chat app with Eventarc and Vertex AI
Learn to build a simple Python Flask application that calls a pre-trained foundation model in Vertex AI.
Generate a marketing campaign with Gemini
Build a web app to generate marketing campaign ideas, using Gemini on Vertex AI, Cloud Run, and Streamlit.
Question-answering app with "The Practitioner's Guide to MLOps"
Learn how to use Vertex AI Search and LangChain to ground model prompts to a verifiable knowledge source (Google Cloud whitepaper).
Weather API Request Helper: Function calling with Gemini
Learn how to implement function-calling, the process of using an LLM to populate a request body that you can then send to an external API.
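The key point of function calling is that the model never calls the API itself: it returns the name of a declared function plus arguments, and your code turns that into a real request. In this sketch both the weather function and the model's JSON reply are invented stand-ins for the structured output a model like Gemini would return.

```python
import json

# Stand-in for the structured function call a model would return after
# being given a declaration for a hypothetical `get_weather` function.
model_reply = json.dumps({
    "function_call": {
        "name": "get_weather",
        "args": {"city": "Boston", "unit": "celsius"},
    }
})

def to_request(reply_json: str) -> tuple[str, dict]:
    """Convert the model's function call into an HTTP request spec."""
    call = json.loads(reply_json)["function_call"]
    if call["name"] != "get_weather":
        raise ValueError(f"unknown function: {call['name']}")
    # Hypothetical weather API endpoint; the model's args become the
    # query parameters of the outgoing request.
    return "https://weather.example.com/v1/current", call["args"]

url, params = to_request(model_reply)
```

Your application would then send the request, and typically feed the API's response back to the model so it can phrase the final answer.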
Architecture guidance and jump start solutions
Infrastructure for a RAG-capable generative AI application using Vertex AI
Use this reference architecture to design the infrastructure to run a generative AI application with retrieval-augmented generation (RAG) using Vertex AI and AlloyDB for PostgreSQL.
Infrastructure for a RAG-capable generative AI application using GKE
Use this reference architecture to design the infrastructure to run a generative AI application with retrieval-augmented generation (RAG) using GKE, Cloud SQL, and open source tools like Ray, Hugging Face, and LangChain.
Design storage for AI and ML workloads in Google Cloud
This document provides design guidance on how to use and integrate the variety of storage options offered by Google Cloud for key AI and ML workloads.
Jump Start Solution: Document Summarization
Deploy a one-click sample application to summarize long documents with Vertex AI.
Jump Start Solution: Generative AI RAG with Cloud SQL
Deploy a one-click sample application that uses vector embeddings stored in Cloud SQL to improve the accuracy of responses from a chat application.
Jump Start Solution: Generative AI Knowledge Base
Deploy a one-click sample application that extracts question-and-answer pairs from a set of documents, along with a pipeline that triggers the application when a document is uploaded.