AI use cases on Cloud Run

Whether you're building agents, running inference models, or integrating with various AI services, Cloud Run provides the scalability, flexibility, and ease of use needed to bring your AI innovations to life.

This page highlights common use cases for building, deploying, and hosting AI workloads on Cloud Run.

Why use Cloud Run for AI workloads?

Cloud Run offers several advantages for ensuring your AI applications are scalable, flexible, and manageable. Some highlights include:

  • Flexible container support: Package your app and its dependencies in a container, or use any supported language, library, or framework. Learn more about Cloud Run's Container runtime contract.
  • HTTP endpoint: When you deploy a Cloud Run service, you get a secure HTTPS URL endpoint out of the box. Cloud Run supports streaming responses through HTTP chunked transfer encoding, HTTP/2, and WebSockets (see the streaming sketch after this list).
  • GPU support: Accelerate your AI models by configuring Cloud Run resources with GPUs. Cloud Run services with GPUs enabled can scale down to zero for cost savings when not in use.
  • Automatic or manual scaling: By default, Cloud Run automatically scales your service based on demand, even down to zero, so you pay only for what you use. This makes it well suited to unpredictable AI workloads. You can also configure manual scaling based on your traffic and CPU utilization needs.
  • Integrated ecosystem: Seamlessly connect to other Google Cloud services, such as Vertex AI, BigQuery, Cloud SQL, Memorystore, Pub/Sub, AlloyDB for PostgreSQL, Cloud CDN, Secret Manager, and custom domains to build comprehensive end-to-end AI pipelines. Google Cloud Observability also provides built-in monitoring and logging tools to understand application performance and troubleshoot issues effectively.
  • Enterprise ready: Cloud Run offers direct VPC connectivity, granular security, and networking controls.
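For example, here's a minimal sketch of a streaming Cloud Run service, assuming a Python container using Flask (any supported language or framework works). Returning a generator makes Flask send the response with chunked transfer encoding, so clients receive output incrementally, which suits token-by-token LLM responses.

```python
import os
import time

from flask import Flask, Response

app = Flask(__name__)

@app.route("/stream")
def stream():
    def generate():
        # Stand-in for incremental output, e.g. tokens from an LLM.
        for token in ["Hello", ", ", "streaming ", "world", "!"]:
            yield token
            time.sleep(0.1)
    # No Content-Length, so Flask streams the chunks as they are produced.
    return Response(generate(), mimetype="text/plain")

if __name__ == "__main__":
    # Cloud Run injects the PORT environment variable (8080 by default).
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```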

Key AI use cases

Here are some ways you can use Cloud Run to power your AI applications:

Host AI agents and bots

Cloud Run is an ideal platform for hosting the backend logic for AI agents, chatbots, and virtual assistants. These agents can orchestrate calls to AI models like Gemini on Vertex AI, manage state, and integrate with various tools and APIs.

  • Microservices for agents: Deploy individual agent capabilities as separate Cloud Run services (a minimal sketch follows this list). See Host AI agents to learn more.
  • Agent2Agent (A2A) communication: Build collaborative agent systems using the A2A protocol. See Host A2A agents to learn more.
  • Model Context Protocol (MCP) servers: Implement MCP servers to provide standardized context to LLMs from your tools and data sources. See Host MCP servers to learn more.
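As a minimal sketch of the first pattern, the following service exposes a single chat endpoint that forwards prompts to Gemini on Vertex AI through the google-genai SDK. The GOOGLE_CLOUD_PROJECT environment variable, region, and model name are placeholder assumptions, and conversation state and error handling are omitted.

```python
import os

from flask import Flask, jsonify, request
from google import genai

app = Flask(__name__)

# Authenticates with the Cloud Run service's runtime service account.
client = genai.Client(
    vertexai=True,
    project=os.environ["GOOGLE_CLOUD_PROJECT"],  # assumed to be set
    location="us-central1",  # placeholder region
)

@app.route("/chat", methods=["POST"])
def chat():
    prompt = request.get_json()["prompt"]
    response = client.models.generate_content(
        model="gemini-2.0-flash",  # placeholder model name
        contents=prompt,
    )
    return jsonify({"reply": response.text})
```

A real agent would typically layer tool calling, retries, and per-session state (for example in Memorystore or Firestore) on top of this skeleton.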

Serve AI/ML models for inference

Deploy your trained machine learning models as scalable HTTP endpoints.

  • Real-time inference: Serve predictions from models built with frameworks like TensorFlow, PyTorch, and scikit-learn, or from open models like Gemma. See Run Gemma 3 on Cloud Run for an example.
  • GPU acceleration: Use NVIDIA GPUs to accelerate inference for more demanding models. See Configure GPU for services to learn more.
  • Integrate with Vertex AI: Serve models trained or deployed on Vertex AI, using Cloud Run as a scalable frontend.
  • Decouple large model files from your container: The Cloud Storage FUSE adapter lets you mount a Cloud Storage bucket as a local directory inside your Cloud Run container, so model weights don't have to be baked into the image (see the sketch after this list).
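For example, here's a minimal sketch of an inference service that reads a scikit-learn model from a Cloud Storage volume mounted at /models. The mount path, file name, and request shape are assumptions that must match your service and bucket configuration.

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)

# Loaded once at startup; the file lives in the mounted bucket,
# not in the container image.
model = joblib.load("/models/model.joblib")  # assumed mount path

@app.route("/predict", methods=["POST"])
def predict():
    # Expects e.g. {"instances": [[5.1, 3.5, 1.4, 0.2]]}
    features = request.get_json()["instances"]
    predictions = model.predict(features)
    return jsonify({"predictions": predictions.tolist()})
```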

Build Retrieval-Augmented Generation (RAG) systems

Build RAG applications by connecting Cloud Run services to your data sources.

  • Vector databases: Connect to vector databases hosted on Cloud SQL (with pgvector), AlloyDB for PostgreSQL, Memorystore for Redis, or other specialized vector stores to retrieve relevant context for your LLMs (a minimal retrieval sketch follows this list). See an infrastructure example of using Cloud Run to host a RAG-capable generative AI application and data processing using Vertex AI and Vector Search.
  • Data access: Fetch data from Cloud Storage, BigQuery, Firestore, or other APIs to enrich prompts.
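As a minimal sketch of the retrieval step, assume a PostgreSQL database (for example on Cloud SQL) with the pgvector extension and a hypothetical documents table holding content and embedding columns. A Cloud Run service could then fetch the nearest neighbors of a query embedding like this:

```python
import os

import psycopg  # psycopg 3

def retrieve_context(query_embedding: list[float], k: int = 5) -> list[str]:
    # pgvector accepts vectors as '[x,y,...]' literals; <=> is cosine distance.
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    # DATABASE_URL is an assumed environment variable holding the DSN.
    with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
        rows = conn.execute(
            "SELECT content FROM documents "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (vector_literal, k),
        ).fetchall()
    return [content for (content,) in rows]
```

The retrieved passages would then be appended to the prompt before calling the LLM.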

Host AI-powered APIs and backends

Create APIs and microservices that embed AI capabilities.

  • Smart APIs: Develop APIs that use LLMs for natural language understanding, sentiment analysis, translation, summarization, and so forth.
  • Automated workflows: Build services that trigger AI-driven actions based on events or requests, as in the sketch below.
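For example, here's a minimal sketch of an event-triggered service: a Pub/Sub push subscription POSTs messages to the endpoint below, which decodes the standard push payload; the processing helper is hypothetical.

```python
import base64

from flask import Flask, request

app = Flask(__name__)

@app.route("/pubsub", methods=["POST"])
def handle_event():
    envelope = request.get_json()
    # Standard Pub/Sub push format: {"message": {"data": "<base64>", ...}}
    data = base64.b64decode(envelope["message"]["data"]).decode("utf-8")
    # process_with_model(data)  # hypothetical AI-driven action
    return ("", 204)  # any 2xx status acknowledges the message
```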

Prototype and experiment with ideas

Rapidly iterate on AI ideas.

  • Rapid deployment: Quickly move prototypes from environments like Vertex AI Studio, Google AI Studio, or Jupyter notebooks to scalable deployments on Cloud Run with minimal configuration.
  • Traffic splitting: Use Cloud Run's traffic splitting feature to A/B test different models, prompts, or configurations, and use Google Cloud Observability metrics such as latency, error rate, and cost to measure the results (see the sketch below).
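As a sketch, the same split you can make with gcloud run services update-traffic can be applied programmatically with the Cloud Run Admin API client library (google-cloud-run); the project, service, and revision names below are placeholders.

```python
from google.cloud import run_v2

client = run_v2.ServicesClient()
name = "projects/PROJECT_ID/locations/us-central1/services/my-service"

service = client.get_service(name=name)
# Send 90% of traffic to the current revision and 10% to the candidate.
service.traffic = [
    run_v2.TrafficTarget(
        type_=run_v2.TrafficTargetAllocationType.TRAFFIC_TARGET_ALLOCATION_TYPE_REVISION,
        revision="my-service-00001-abc",  # placeholder revision name
        percent=90,
    ),
    run_v2.TrafficTarget(
        type_=run_v2.TrafficTargetAllocationType.TRAFFIC_TARGET_ALLOCATION_TYPE_REVISION,
        revision="my-service-00002-def",  # placeholder revision name
        percent=10,
    ),
]
client.update_service(service=service).result()  # waits for the rollout
```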

What's next

Depending on your familiarity with AI concepts and your AI use case, explore the Cloud Run AI resources.