
A dozen startups using generative media to revolutionize how anyone can create content

August 13, 2025
Jess Jinkins

Startup Center of Excellence

Alexandra Williams

Director, U.S. Startups, Google Cloud

For organizations trying to figure out how to use gen-media models for creative production, these startups are innovating on every step of the process.

For content creation, the gap between a great idea and a final, polished product is often filled with laborious tasks: scripting, storyboarding, creating custom visuals, editing, developing promotional copy, and more. This work can be creatively rewarding but also time-consuming and difficult (just ask anyone who’s ever had writer’s block).

Savvy creators always find a way to achieve their vision, and now they’re making fewer sacrifices thanks to generative AI. Artists, designers, writers, production teams, and really anyone who creates content are increasingly turning to generative media models to enhance their creative process and speed up execution.

Many organizations are eager to start down this path, too, though they may not know the best place to begin, or they may have concerns about disrupting the creative process. We wanted to showcase a dozen trailblazing startups that are using Google Cloud's latest generative AI media models to accelerate and enhance the content creation process across all modalities: our most advanced video generation model, Veo 3 (now generally available); our highest-quality image generation model, Imagen 4; and our latest foundation models, Gemini 2.5 Pro and Flash.

These new AI-powered tools from leading startups are closing the gap between creative vision and finished content, enabling creators to produce high-quality work faster than ever before and focus on what they do best: creating.

Let’s dive in and see if their fresh approach doesn’t spark something in our own work.

Next-generation video creation

HeyGen is an AI-powered video generation platform that makes creating, translating, and personalizing high-quality videos as easy as typing in a doc. HeyGen's core product leverages Gemini 2.5 Pro, Flash, and Flash-Lite to streamline content creation. With one prompt, HeyGen automates video planning, intelligently analyzes user-generated footage, and optimizes content through advanced visual and audio processing.

Nim.video is an AI-first platform for instant short-form video generation from a single prompt. As a multimodal platform, Nim integrated and benchmarked the best generative models, including Veo 3 and Veo 3 Fast, for text-to-video synthesis. Nim runs these on Vertex AI, which allows the team to scale experiments and orchestrate additional services like speech recognition and text-to-speech (TTS). Nim users especially love Veo 3 for its native audio generation, which enables richer storytelling and minimizes post-production. These capabilities help the Nim team prototype faster and create compelling demos for new customer acquisition.

Hedra is an end-to-end marketing creation platform designed to generate high-quality content at scale. Hedra Studio combines their proprietary omnimodal models with other leading models like Veo and Imagen, enabling users to produce polished marketing content for any use case. And Hedra’s Live Avatars use Gemini by default to deliver dynamic, real-time interactive video experiences.

Potrero Labs has launched Jams, an AI-first video social network empowering authentic self-expression. Their platform simplifies video creation, allowing users to record short videos and let Jams enhance them. Jams offers a simple UI with a variety of models under the hood, including Gemini 2.5 Pro for script creation, multimodal Gemini for video analysis, and Veo 3 for backgrounds, b-roll, and audio.

Visla is an all-in-one AI-powered video platform where businesses and content creators make professional videos in minutes. Powered by Google’s Imagen 4, Gemini 2.5 Flash Image (aka Nano Banana), and Veo 3, together with Visla’s AI Video Agent and Avatars, the platform adapts visuals and narration. The AI Video Agent automates creation by analyzing user content and generating polished videos for education, training, and marketing.

AI for audio and music creation

Producer.ai (formerly Riffusion) trains generative music models and builds products that empower anyone to create the music they imagine. "The Producer" music collaboration agent helps users create original, studio-quality songs from text, audio, or visual prompts. Gemini on Vertex AI assists with prompt augmentation and data pipelines, while Vertex AI APIs offer access to advanced multimodal models for experimentation.

Koolio.ai helps creators effortlessly produce high-quality podcasts and audio content. Koolio.ai integrates cutting-edge models like Gemini, Lyria, and Veo to power features such as AI-generated dialogue, accurate transcription, intelligent sound effects and music selection, audio enhancement, and more, streamlining the entire audio creation workflow from concept to final production.

AI storytelling platforms & studios

Cartwheel is a generative animation platform that helps users tell stories faster and more creatively. Their tool helps animate characters for videos, 3D, games, films, ads, or social media. Content is highly customizable: Gemini Flash supports prompt writing for character creation, Imagen creates reference images for 3D character development, and Veo 3 provides input control for video-to-animation so that artists can edit the output.

ComfyUI is a modular, open-source engine for visual AI. It solves implementation challenges by allowing creators to rapidly prototype and automate media generation with pre-set models and more than 20,000 extensions. It integrates Gemini 2.5 and Veo 3 for multimodal creation, giving users pixel-level control and cloud-scale capabilities. ComfyUI's custom extensions are hosted on an open API and registry that run on Google Cloud.

Synthesia, the leading AI video enterprise platform, is where businesses large and small create instructional videos for employee training, customer support, sales enablement or product marketing. The company is using Veo 3 to contextually adapt visuals to the content delivered by its hyper-realistic AI avatars and voices.

Alson AI believes that sometimes the most powerful stories go untold. Alson AI lets anyone turn memory, culture, and imagination into illustrated books and animations. No publishers. No barriers. What once took months and thousands of dollars now takes 20 minutes and costs less than dinner. Powered by Veo, Alson AI is making storytelling fast, fair, and finally accessible to everyone.

Velin.ai is a platform that creates content for small businesses. Its AI agent explains the content and the strategic decisions behind it, and also acts as a unified content workspace. Gemini 2.5 drafts everything from scripts to social campaigns, while Imagen 4 and Veo 3 generate aligned visuals and video clips, ensuring a consistent brand narrative across all content.

The content creation economy is being transformed by this new ecosystem of AI-powered tools. By handling the heavy lifting of content production, platforms built with Google's Gemini 2.5, Veo 3, and Imagen 4 are empowering creators to focus on innovation and storytelling.

Ready to see how these models can accelerate your own creative application or workflow? Learn about Veo 3 prompting best practices and test the models in Vertex AI for your use case today.

The team would like to acknowledge Googlers Akanksha Bhusari, Samantha Rodriguez, Lara Norman, Ben Seipel, Simon Brief, and Tomas Moreno for their contributions to this work.
