OTTO

A catalog that listens: OTTO built a voice shopping AI across 19 million products

Results on Google Cloud
  • Cut voice response latency from ~9 seconds to under 2 seconds

  • First EMEA retailer with native voice shopping live in production

  • Voice assistant available across ~19 million products

  • Scaled expert product advisory from 5 to 50 categories—and growing

  • Custom orchestration layer keeps conversations fluent and controlled

OTTO became Europe's first retailer with a native voice shopping assistant, turning expert advice into a real conversation.

Why search wasn't the final answer

OTTO is Germany's second-largest online retailer, offering 19 million products from 6,100 marketplace partners. Its history runs through every major wave of retail change, from the thousand-page printed catalog to one of Europe's earliest ecommerce platforms. Adapting has long been part of OTTO's DNA. As generative AI matured into a reliable foundation, the question for OTTO's AI team was no longer whether to build conversational shopping, but how to make it actually feel like a conversation.

Search has always been a powerful tool, but it answers in lists. A customer typing "washing machine" gets thousands of results and starts filtering. A customer who wants advice—"What's a quiet machine for a single-person household with a dog?"—needs something search alone cannot deliver: a back-and-forth dialogue, follow-up questions, and a recommendation that explains itself. That gap is where the project began.

In November 2024, the team set out to build a true advisory experience and decided early that voice would be the goal. A text chatbot would form the foundation, but the more natural interface for genuine consultation was spoken conversation. Google had been a development partner since earlier experiments in 2023. With Gemini's emerging capabilities and direct support from Google's forward-deployed engineering team, OTTO chose to build the new assistant on the Gemini Enterprise Agent Platform.

AI is fundamentally changing digital shopping. With our new AI assistant, developed together with Google, we are creating, for the first time, a natural, dialogue-driven shopping experience that feels like a real conversation—just like in a specialty store.

Dr. Boris Ewenstein

CEO, OTTO

The technical bet was significant. When the team started, Gemini's native audio model was not yet production-ready. OTTO chose to start anyway, betting that the model would mature in step with the build.


Building a voice that knows the aisles

The first instinct was the obvious one: take the working text assistant and bolt on a text-to-speech layer. The team built it. In remote user tests, customers used it briefly—and never said "wow." Voice that simply reads text aloud feels mechanical, with too much latency between turns. Customers respond by clipping their own speech: "Trousers", not "I need a new pair of trousers, and by the way…". The conversational flow that makes advisory shopping work simply doesn't form.

We had a functional cascading approach that worked. But it never created that ‘wow!’ moment in user tests. We wanted customers to be genuinely surprised by the experience. That only started happening once we built on Gemini Live.

Volker Carlguth

Senior Product Manager AI Focus, OTTO

The breakthrough came in November 2025, when Gemini Live had matured enough for production. OTTO rebuilt the voice path on the Live API with native audio. Instead of cascading text into speech, two models now run in parallel: the text assistant continues to drive guardrails, recommendations, and search, while the live audio model handles the conversation in real time, coordinating turn by turn. Chirp 3 handles German speech recognition, a capability that improved measurably as Google iterated during the build. End-to-end latency dropped from 8–9 seconds in early prototypes to under two.
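To make the parallel arrangement concrete, here is a minimal sketch of two cooperating loops, one standing in for the live audio model and one for the text assistant. It is an illustration under stated assumptions, not OTTO's implementation and not the Gemini Live API: the function names, the queue-based hand-off, the filler phrase, and the keyword guardrail are all hypothetical placeholders.

```python
import asyncio

FILLER = "One moment, let me check."
DECLINE = "Sorry, I can't give health advice."


async def text_assistant(requests, results):
    """Hypothetical stand-in for the text assistant (guardrails, search)."""
    while True:
        utterance = await requests.get()
        if utterance is None:  # sentinel: the conversation has ended
            return
        await asyncio.sleep(0.05)  # simulated search and guardrail latency
        # Placeholder guardrail: a real system would not rely on a keyword.
        if "medication" in utterance.lower():
            await results.put(DECLINE)
        else:
            await results.put(f"Recommendation for: {utterance}")


async def live_audio(turns, requests, results, spoken):
    """Hypothetical stand-in for the audio model that owns the conversation."""
    for utterance in turns:
        await requests.put(utterance)       # hand the turn to the text assistant
        spoken.append(FILLER)               # keep talking while it works
        spoken.append(await results.get())  # then voice the grounded answer
    await requests.put(None)


async def converse(turns):
    """Run both loops concurrently and return everything that was spoken."""
    requests, results = asyncio.Queue(), asyncio.Queue()
    spoken = []
    await asyncio.gather(
        text_assistant(requests, results),
        live_audio(turns, requests, results, spoken),
    )
    return spoken


if __name__ == "__main__":
    turns = ["a quiet washing machine", "which medication helps my dog"]
    for line in asyncio.run(converse(turns)):
        print(line)
```

The point of the sketch is the division of labor: the audio loop never waits silently, while everything that needs grounding or a guardrail decision still passes through the text assistant before it is spoken.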

Keeping that fluency under control required something new. OTTO's engineers in Hamburg, with hands-on support from Google's product and engineering teams, built an internal orchestration layer based on Petri Nets—a formal model from theoretical computer science used to reason about parallel processes. It lets the team prove that critical guardrails (for example, never offering health advice) hold even when the conversation is mid-flow. A two-phase hybrid search—lexical and vector retrieval, followed by a Gemini 2.5 Flash validation pass over the top results—ensures that subjective requests like "a very quiet washing machine" return precise matches the customer can trust.
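The idea of proving a guardrail over parallel processes can be illustrated with a toy Petri net. The sketch below is not OTTO's orchestration layer; the places, transitions, and helper names (`fire`, `find_violation`, `SAFE_NET`, `BROKEN_NET`) are hypothetical. It models a reply draft and a guardrail check running in parallel, then explores every reachable marking to confirm that the assistant is never speaking while the check is still pending or has failed.

```python
from collections import deque

# Each transition maps to (tokens consumed, tokens produced) per place.
SAFE_NET = {
    "user_turn":   ({"idle": 1}, {"draft_ready": 1, "checking": 1}),  # fork
    "check_pass":  ({"checking": 1}, {"approved": 1}),
    "check_fail":  ({"checking": 1}, {"blocked": 1}),
    "speak":       ({"draft_ready": 1, "approved": 1}, {"speaking": 1}),  # join
    "decline":     ({"draft_ready": 1, "blocked": 1}, {"declining": 1}),
    "end_speak":   ({"speaking": 1}, {"idle": 1}),
    "end_decline": ({"declining": 1}, {"idle": 1}),
}
# Same net plus a shortcut that speaks without waiting for approval.
BROKEN_NET = {**SAFE_NET, "speak_early": ({"draft_ready": 1}, {"speaking": 1})}
INITIAL = {"idle": 1}


def fire(marking, consumed, produced):
    """Return the marking after firing, or None if the transition is not enabled."""
    if any(marking.get(place, 0) < n for place, n in consumed.items()):
        return None
    new = dict(marking)
    for place, n in consumed.items():
        new[place] -= n
    for place, n in produced.items():
        new[place] = new.get(place, 0) + n
    return {place: n for place, n in new.items() if n > 0}


def is_unsafe(marking):
    """Guardrail: never speak while the check is pending or has failed."""
    speaking = marking.get("speaking", 0) > 0
    unchecked = marking.get("checking", 0) > 0 or marking.get("blocked", 0) > 0
    return speaking and unchecked


def find_violation(initial, net, is_bad, limit=10_000):
    """Breadth-first search over reachable markings.

    Returns the first bad marking found, or None if the whole state space
    was explored without finding one.
    """
    if is_bad(initial):
        return dict(initial)
    start = frozenset(initial.items())
    seen, queue = {start}, deque([start])
    while queue:
        current = dict(queue.popleft())
        for consumed, produced in net.values():
            nxt = fire(current, consumed, produced)
            if nxt is None:
                continue
            key = frozenset(nxt.items())
            if key in seen:
                continue
            if is_bad(nxt):
                return nxt
            if len(seen) >= limit:
                raise RuntimeError("state space exceeds limit; no verdict")
            seen.add(key)
            queue.append(key)
    return None


if __name__ == "__main__":
    print(find_violation(INITIAL, SAFE_NET, is_unsafe))
    print(find_violation(INITIAL, BROKEN_NET, is_unsafe))
```

For the safe net the search exhausts the state space and returns `None`, which is the sense in which the guardrail is proven for this model. For the broken net it returns a marking where `speaking` and `checking` hold tokens at the same time, pinpointing the state in which the guardrail fails.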

Early signals from a listening catalog

The voice assistant entered open beta in early May 2026. Production data is only days old at the time of writing, but early customer behavior already validates the central design bet. In voice mode, customers tell the assistant more than they would ever type. In one test session, a customer describes a broken lamp in passing ("a wire is sticking out") and the model surfaces the right replacement product. Voice draws out context that text never captures.

The advisory pattern itself scales. OTTO began with five product base classes hand-crafted alongside category experts, then used Gemini to extend the same structure to fifty more, with the next wave covering the full catalog now in progress. Soon, the system will integrate user-generated product reviews into the advisory step, giving the assistant access to the language real customers use about real products.

In parallel, OTTO operates a second AI assistant for customer service—handling orders, deliveries, and returns—built in-house under stricter data-privacy requirements. Running the two tracks separately brought speed; the long-term direction is unification. An internal OneBot initiative is already underway to bring shopping and service into a single conversational experience, eventually agent-to-agent.

We started building before the model was ready, betting that Google would deliver—and Google delivered. The pace of improvement was the most impressive part of the partnership.

Volker Carlguth

Senior Product Manager AI Focus, OTTO

The further horizon is more ambitious. OTTO is exploring agentic capabilities for routine purchases (replacement orders, recurring needs) and multimodal extensions such as virtually placing furniture in a customer's own room—never on autopilot, always with the customer in control. For a company that has navigated every wave of retail change since the printed catalog, conversation is simply the next interface.


One of Germany's largest online retailers. From 1949 catalogs to AI-powered shopping, OTTO has reinvented retail at every turn.

Industry: Retail

Location: Germany

Products: Gemini Enterprise Agent Platform, Gemini Live API (Native Audio), Gemini 2.5 Flash, Gemini 3 Pro, Chirp 3

Google Cloud