Cartesia

Cartesia builds the world's fastest voice AI platform for developers

Google Cloud results
  • Unlocked <90-millisecond latency for voice AI

  • Powered millions of daily conversations with 99.99% uptime

  • Ranked #1 in naturalness by human preference evaluators

Cartesia built the world's fastest ultra-realistic voice AI on Google Cloud, delivering sub-90-millisecond latency and 99.99% uptime to more than 50,000 companies. The platform is transforming how the world speaks, listens, and does business across millions of conversations every day.

Building the world's fastest, ultra-realistic voice AI model with Google Cloud

While the global voice AI market is projected to surpass $40 billion by 2032, models built on traditional architectures have hit a performance ceiling. Cartesia is breaking through with a next-generation approach to speech synthesis and analysis. 

Cartesia researchers pioneered State Space Models (SSMs), an architecture that achieves what traditional transformers cannot: memory usage that stays constant during generation and generation time that grows only linearly with sequence length. This architectural breakthrough powers Cartesia's ability to generate expressive, natural voices faster than the blink of an eye. By eliminating these efficiency bottlenecks, SSMs enable the sub-90-millisecond performance necessary for truly fluid, real-time human interaction.
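To make the constant-memory, linear-time claim concrete, the following is a minimal Python sketch of a generic discrete linear state space recurrence. It is a textbook illustration under simplifying assumptions, not Cartesia's Sonic architecture: the function name `ssm_stream`, the fixed toy matrices, and the absence of learned or input-dependent parameters are all choices made here for clarity.

```python
from typing import Iterable, Iterator, List

Matrix = List[List[float]]


def ssm_stream(A: Matrix, B: List[float], C: List[float], D: float,
               inputs: Iterable[float]) -> Iterator[float]:
    """Run a discrete linear state space model over a 1-D input stream.

    x_t = A x_{t-1} + B u_t
    y_t = C . x_t + D u_t

    Only the fixed-size state vector x is carried from step to step, so
    memory stays constant however long the stream runs, and total work
    grows linearly with the number of steps.
    """
    n = len(B)
    state = [0.0] * n  # the model's entire running memory
    for u in inputs:
        # Advance the state: x_t = A x_{t-1} + B u_t
        state = [
            sum(A[i][j] * state[j] for j in range(n)) + B[i] * u
            for i in range(n)
        ]
        # Read out y_t = C . x_t + D u_t and emit it immediately (streaming)
        yield sum(C[i] * state[i] for i in range(n)) + D * u


if __name__ == "__main__":
    # Hypothetical toy parameters; a real SSM layer learns these from data.
    A = [[0.5, 0.0], [0.0, 0.25]]
    B = [1.0, 1.0]
    C = [1.0, 1.0]
    # An impulse followed by silence decays geometrically.
    print(list(ssm_stream(A, B, C, 0.0, [1.0, 0.0, 0.0])))
    # prints [2.0, 0.75, 0.3125]
```

Each step touches only an n-element state regardless of how many samples came before, which is the contrast the paragraph draws with transformers, whose key-value cache grows with every generated token.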

"State Space Models allow us to deliver the expressive, natural voice capabilities that enterprises demand, while still being performant on a global scale," explains Arjun Desai, co-founder of Cartesia.

For Cartesia, technical efficiency is a means to a much more human end. "Voice AI is more than just converting text to speech or speech to text," Desai says. "At Cartesia, we're focused on creating human-centered, natural interactions that feel real in tone, intent, and context."

At the core of Cartesia's mission is the belief that innovation in model architecture can deliver real-time human intelligence. To transform how we interact with the world around us, AI cannot just be smart; it must also be instantaneous and intuitive. This ethos drives Cartesia to innovate at the deepest levels of model design.

Cartesia's models are already powering state-of-the-art voice agent applications across industries such as financial services and healthcare, serving millions of speech generation requests a day. Scaling these capabilities and ensuring ultra-fast, reliable performance for global users required robust AI infrastructure. That's why the team partnered with Google Cloud.

Cartesia Text-to-Speech interface with Pedro voice selected

Building the Voice AI of tomorrow with Google Cloud

Cartesia is built for developers who require code-first flexibility and global scale. To ensure these frontier models are accessible to developers anywhere in the world, Cartesia partners with Google Cloud for its global infrastructure and scalability. With tools like Vertex AI and Google Kubernetes Engine (GKE), Cartesia can deploy its solutions across multiple data centers simultaneously, bringing high-fidelity voice models closer to customers around the globe.

Today, Cartesia's industry-leading model, Sonic 3, supports more than 40 languages with full control over emotional expressiveness and accent localization. Cartesia works closely with companies around the world, including through on-premises deployments. Customers can also use Cartesia's models in the company's end-to-end Voice Agent product, available on Google Cloud Marketplace, which lets developers define Voice Agents with code-first flexibility. These frontier advancements are defining what's possible in voice AI.

As a pioneer of State Space Models for voice AI, Cartesia found Google Cloud infrastructure to be a seamless complement for accelerating research, development, and distribution. "When combined with the robust infrastructure and edge capabilities of Google Cloud, we can bring our models closer to customers wherever they are," Desai says.

GKE helps handle orchestration and auto-scaling, ensuring voice AI models are ready to meet global demand in real time, while Cloud Storage and Persistent Disk block storage help Cartesia manage and secure production data for deploying voice AI. 

"With this flexibility, we can scale our technology, partnerships, and customer base globally while delivering consistent, real-time, and exceptionally high-quality results," Desai says.

Cartesia Voice Agent configuration for sonic-converse agent

Redefining real-time interaction and driving enterprise growth

Combining its ultra-fast speech models with the scalability of Google Cloud has enabled Cartesia to achieve outsized outcomes and drive radical business transformation for its customers. In the healthcare sector, providers using Cartesia have improved patient engagement and cut wait times by more than 89%. In customer service, enterprises are achieving 95% containment rates, automating complex workflows without losing the human touch. One customer reported that its end users are 4x more likely to stay on a call after switching to Cartesia's voices from their previous voice platform.

"From the moment someone speaks or listens to voice AI, every millisecond counts in keeping the conversation flowing naturally," Desai says. "Thanks to Google Cloud, we can deliver ultra-realistic voice experiences that deliver on our customers' expectations."

Cartesia recently launched Line, an end-to-end agent platform that lets customers easily access and deploy cutting-edge voice agents. As the company continues to scale rapidly, the team draws on Google Cloud resources such as Gemini for sophisticated reasoning capabilities. The partnership also extends beyond technical infrastructure: Cartesia works closely with Google teams and takes advantage of programs like Google for Startups, which provides dedicated support for growing businesses.

"Aside from leading tools and resources, Google Cloud provides us with true partnership at every stage," says Aaron Melgar, Cartesia's head of business development. "The Google Cloud team has leaned in to help bring our state-of-the-art models to mutual customers around the world."

Cartesia is building the fundamental layer for how humans and AI communicate. The team continuously improves the quality, accuracy, and naturalness of its voices to redefine the limits of digital interaction and of what is possible with voice AI experiences. "We're building the future of real-time, multimodal intelligence," says Desai. "With Google Cloud, we're one step closer to making voice AI a seamless and transformative part of every business workflow."

Cartesia founders

Cartesia is the world's fastest and most expressive voice AI platform. Built for developers and enterprises, its technology is transforming how the world speaks, listens, and does business across millions of conversations every day.

Industry: Startup, Technology

Location: United States

Products: Google Cloud, Cloud Storage, Google Kubernetes Engine (GKE), Model Garden on Vertex AI, Persistent Disk, Vertex AI