Build browser automation tools or run a full desktop operating system (OS) in your Cloud Run container so that AI agents can browse the web, extract information, and automate actions through mouse clicks and keyboard inputs.
Build browser tools on Cloud Run
To build a browser tool on Cloud Run, use one of the following approaches:
- A headless browser for efficient, large-scale tasks
- A full desktop OS for complex scenarios that require human-computer interaction
To let your AI agent navigate the web, install Chromium in your Cloud Run container and grant the agent the permissions it needs to access Chromium. Cloud Run has built-in streaming support, so you can stream browser data back to the agent or to the end user.
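The following is a minimal sketch of one way to launch the installed Chromium so that an agent can drive it. The Python helper builds a headless command line and starts it as a background process. The binary path, the debugging port (9222), and the function names are assumptions for illustration. `--no-sandbox` is commonly needed in containers where Chromium's sandbox can't start, but it weakens isolation, so use it only if your image requires it.

```python
import subprocess
from typing import List

# Assumption: many Debian-based images install Chromium at this path.
# Check where your base image actually puts the binary.
CHROMIUM_PATH = "/usr/bin/chromium"


def chromium_command(debug_port: int = 9222) -> List[str]:
    """Build a command line that runs Chromium headless inside a container."""
    return [
        CHROMIUM_PATH,
        "--headless=new",           # run without a display server
        "--no-sandbox",             # assumption: the container can't start Chromium's sandbox
        "--disable-dev-shm-usage",  # avoid relying on a small /dev/shm
        "--disable-gpu",            # no GPU is assumed in the container
        f"--remote-debugging-port={debug_port}",  # lets an agent connect over CDP
        "about:blank",
    ]


def start_chromium(debug_port: int = 9222) -> subprocess.Popen:
    """Start Chromium in the background; the caller is responsible for stopping it."""
    return subprocess.Popen(chromium_command(debug_port))
```

After Chromium starts, a client can read `http://localhost:9222/json/version` to discover the DevTools WebSocket URL.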
Headless Chrome
Automate common browser tasks programmatically with headless Chrome. You can use headless Chrome for the following use cases:
- Large-scale web scraping and data extraction
- Form submissions
- UI testing
- PDF and screenshot generation for web pages
Implement headless Chrome with either of the following:
High-level API libraries such as Puppeteer or Playwright: use these libraries to control the browser. For example, you can instruct the browser to visit a website, extract its content, and pass that content to an AI model for summarization or structured data extraction.
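The following sketch shows that flow with Playwright's Python API. It opens a page in headless Chromium, extracts the visible text, and hands a prompt to a model. `call_model` is a hypothetical placeholder for whatever client you use to call your AI model, and the 4,000-character limit is arbitrary. The sketch assumes Playwright and its Chromium build are installed (`pip install playwright`, then `playwright install chromium`).

```python
from typing import Callable


def build_prompt(page_text: str, max_chars: int = 4000) -> str:
    """Collapse whitespace and trim the page text to an (arbitrary) input budget."""
    text = " ".join(page_text.split())
    return "Summarize the following web page:\n\n" + text[:max_chars]


def summarize_page(url: str, call_model: Callable[[str], str]) -> str:
    """Visit `url` in headless Chromium, extract its visible text, and ask a model to summarize it.

    `call_model` is hypothetical: any function that sends a prompt to a model and returns its reply.
    """
    # Imported here so build_prompt can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded")
            page_text = page.inner_text("body")  # rendered, visible text of the page
        finally:
            browser.close()
    return call_model(build_prompt(page_text))
```

To ask for structured data instead of a summary, change the prompt that `build_prompt` produces, for example to request JSON with specific fields.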
Chrome DevTools Protocol (CDP): a stable API that Chrome DevTools uses, which exposes browser features programmatically. The agent controls actions such as mouse clicks and retrieves the results as text, or as pixel data in the form of a screenshot.
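The following sketch shows what an agent sends over CDP. It builds the JSON messages for a left click, a text extraction, and a screenshot, and it decodes the screenshot response. It only constructs and parses messages. To send them, you need a WebSocket connection to a page's `webSocketDebuggerUrl`, which is listed at `http://localhost:9222/json` when Chromium runs with `--remote-debugging-port=9222`; this sketch omits that connection. The helper names are hypothetical. The method and parameter names come from the CDP `Input`, `Runtime`, and `Page` domains.

```python
import base64
import itertools
import json
from typing import Any, Dict, List

_ids = itertools.count(1)  # CDP matches each response to its request by "id"


def cdp_command(method: str, **params: Any) -> str:
    """Serialize one CDP command as the JSON text sent over the DevTools WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def click(x: float, y: float) -> List[str]:
    """A left click is a press followed by a release at the same coordinates."""
    return [
        cdp_command("Input.dispatchMouseEvent", type=kind, x=x, y=y,
                    button="left", clickCount=1)
        for kind in ("mousePressed", "mouseReleased")
    ]


def page_text() -> str:
    """Evaluate JavaScript in the page; the text comes back in result.result.value."""
    return cdp_command("Runtime.evaluate", expression="document.body.innerText",
                       returnByValue=True)


def screenshot() -> str:
    """Request a PNG screenshot of the current viewport."""
    return cdp_command("Page.captureScreenshot", format="png")


def decode_screenshot(response_text: str) -> bytes:
    """Page.captureScreenshot returns the image as base64 in result.data."""
    response: Dict[str, Any] = json.loads(response_text)
    return base64.b64decode(response["result"]["data"])
```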
Desktop OS with virtual network computing (VNC) streaming
Run a full desktop OS in your Cloud Run container for complex workflows, such as the following:
- Automate file uploads or downloads
- Interact with browser extensions or other desktop applications
- Test complex user journeys that involve drag-and-drop and other intricate mouse movements
This approach lets you run a full desktop OS on Cloud Run and stream the results back over WebSockets.
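The following is a minimal sketch of the processes behind this setup. It assumes an image with `Xvfb`, `x11vnc`, and `websockify` installed. Xvfb provides a virtual screen, x11vnc serves that screen over VNC, and websockify bridges the VNC connection to WebSockets on the port that Cloud Run routes requests to (the `PORT` environment variable). The display number, screen size, VNC port, and function names are assumptions. The sketch starts the processes in order without readiness checks. A real container would typically wait for each process to become ready or use a process supervisor.

```python
import os
import subprocess
from typing import List

DISPLAY = ":99"  # hypothetical display number for the virtual screen


def desktop_commands(width: int = 1280, height: int = 800,
                     vnc_port: int = 5900, ws_port: int = 8080) -> List[List[str]]:
    """Command lines for the virtual screen, the VNC server, and the WebSocket bridge."""
    return [
        # Virtual X server: the screen the desktop and Chromium render into.
        ["Xvfb", DISPLAY, "-screen", "0", f"{width}x{height}x24"],
        # VNC server that exports that screen; -forever keeps it running after a client disconnects.
        ["x11vnc", "-display", DISPLAY, "-forever", "-shared", "-rfbport", str(vnc_port)],
        # Bridge from WebSocket clients on ws_port to the VNC server's TCP port.
        ["websockify", str(ws_port), f"localhost:{vnc_port}"],
    ]


def start_desktop() -> List[subprocess.Popen]:
    """Start all three processes; no readiness checks (see the note above)."""
    port = int(os.environ.get("PORT", "8080"))  # Cloud Run sets PORT for the ingress container
    env = {**os.environ, "DISPLAY": DISPLAY}
    return [subprocess.Popen(cmd, env=env) for cmd in desktop_commands(ws_port=port)]
```

A WebSocket-capable VNC client, such as a browser-based viewer, can then connect to the Cloud Run service URL to watch or control the desktop.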
If you install the standard Chromium browser on this desktop, the agent can interact with the OS as a human would and then retrieve what the desktop displays as pixel data, such as a screenshot.
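The following sketch shows how an agent might perform human-like actions on that desktop and read back the pixels. `xdotool` synthesizes mouse and keyboard input, and ImageMagick's `import` captures the whole screen as PNG bytes that you can pass to a model. The sketch assumes both tools are installed and that the virtual display is `:99`. The function names are hypothetical.

```python
import os
import subprocess
from typing import List

DISPLAY = ":99"  # must match the virtual display the desktop runs on (assumption)


def click_cmd(x: int, y: int) -> List[str]:
    """Move the pointer and press/release mouse button 1."""
    return ["xdotool", "mousemove", str(x), str(y), "click", "1"]


def drag_cmd(x1: int, y1: int, x2: int, y2: int) -> List[str]:
    """Drag and drop: press at the start point, move while holding, release at the end point."""
    return ["xdotool", "mousemove", str(x1), str(y1), "mousedown", "1",
            "mousemove", str(x2), str(y2), "mouseup", "1"]


def type_cmd(text: str) -> List[str]:
    """Type text as individual keystrokes, 50 ms apart."""
    return ["xdotool", "type", "--delay", "50", text]


def screenshot_cmd() -> List[str]:
    """Capture the root window; "png:-" makes ImageMagick write PNG bytes to stdout."""
    return ["import", "-window", "root", "png:-"]


def run(cmd: List[str]) -> bytes:
    """Run one command against the virtual display and return its stdout."""
    env = {**os.environ, "DISPLAY": DISPLAY}
    return subprocess.run(cmd, env=env, check=True, capture_output=True).stdout
```

A typical agent loop alternates between acting and observing. It runs an action such as `run(click_cmd(640, 400))`, then calls `run(screenshot_cmd())` and sends the PNG bytes to the model, which decides the next step.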