HubX

How HubX cut AI model load time by 30x with Hyperdisk ML

Google Cloud Results
  • 30x faster pod startup times (from ~30 minutes to <1 minute)

  • 20% savings in compute costs by reducing idle GPU time

  • Improved UX with responses delivered in seconds, not minutes

  • 35% faster responses and 45% lower cost per image

  • ~40 GB base models reliably served in high-concurrency workloads

HubX sped up AI apps and cut GPU costs, giving users instant results and the business room to scale.

Making model latency a thing of the past

HubX develops AI-powered mobile and web apps that let users transform selfies, remix their voices into songs, or animate static images. The company operates 20 to 30 non-gaming apps, many of them personalized, where users upload their own media and expect almost-instant results.

Delivering that kind of user experience hasn't always been easy. Behind the scenes, it required a lot of infrastructure heavy lifting. Their base models were huge — up to 40 GB with complex text encoders — and every time demand spiked, so did the need for speed. Hundreds of pods might need to spin up all at once, each loading the same model into GPU memory, and pod initialization could stretch well beyond acceptable thresholds. During those windows, HubX risked losing user traffic and racking up compute costs just waiting for things to load. 

They didn’t need a massive re-architecture. They needed a faster, smarter storage layer that could keep up with their speed and scale goals without adding complexity. HubX had already been working with Google Cloud since early 2022, modernizing their infrastructure to support scalable, high-throughput workloads. “We’d been steadily evolving our stack with Google Cloud for a few years,” says Sr. ML Engineer Muhammed Pektaş. “So when we hit I/O bottlenecks, Hyperdisk ML felt like a natural next step.” In 2025, HubX adopted Hyperdisk ML, which is purpose-built for AI workloads and optimized for highly concurrent, read-heavy access. And as part of Google Cloud’s AI Hypercomputer, it offered native integration with Google Kubernetes Engine (GKE) and didn’t require any application logic changes. It was time to make the switch.

We’d been steadily evolving our stack with Google Cloud for a few years. So when we hit I/O bottlenecks, Hyperdisk ML felt like a natural next step.

Muhammed Pektaş

Sr. ML Engineer, HubX

Swapping storage, saving time

From the beginning, the move to Hyperdisk ML was about minimizing disruption and maximizing gains. HubX was already running their AI stack on AI Hypercomputer with GKE, A2 VMs, and Trillium TPUs. Their data architecture relied on Cloud Storage buckets for user-specific inputs and outputs, and they were serving large static base models through a standard file system approach. As demand grew and workloads became more read-intensive, Hyperdisk ML offered a better fit for that scale. They moved the base models into Cloud Storage and used a simple utility to hydrate them into Hyperdisk ML volumes. These volumes were then mounted read-only across hundreds of pods.
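In GKE terms, that pattern maps to a Hyperdisk ML StorageClass plus a ReadOnlyMany volume that many pods attach to at once. The manifests below are a minimal sketch of that setup, not HubX’s actual configuration: all names, sizes, and images are illustrative, and the sketch assumes the base model has already been hydrated onto the volume before the read-only mounts happen.

```yaml
# Minimal sketch: serve one large base model read-only to many pods via Hyperdisk ML.
# Assumes the GKE Persistent Disk CSI driver; all names and sizes are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hyperdisk-ml
provisioner: pd.csi.storage.gke.io
parameters:
  type: hyperdisk-ml               # Hyperdisk ML disk type, optimized for concurrent reads
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: base-model-pvc             # hypothetical name
spec:
  storageClassName: hyperdisk-ml
  accessModes:
    - ReadOnlyMany                 # lets hundreds of pods read the same volume
  resources:
    requests:
      storage: 100Gi               # headroom for a ~40 GB model
---
apiVersion: v1
kind: Pod                          # in practice this would be a Deployment template
metadata:
  name: inference-pod
spec:
  containers:
    - name: model-server
      image: example.com/model-server:latest   # placeholder image
      volumeMounts:
        - name: base-model
          mountPath: /models       # hydrated model files appear here
          readOnly: true
  volumes:
    - name: base-model
      persistentVolumeClaim:
        claimName: base-model-pvc
        readOnly: true
```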

Newer tooling like the GKE Volume Populator streamlined setup, evolving what started as a manual process into a near push-button workflow.

Hüsnü Sebik

ML Engineer, HubX

Hyperdisk ML’s ability to support thousands of concurrent readers made it uniquely suited to HubX’s use case. It kept pod startup times fast during peak traffic surges, when latency mattered most. Even better, says ML Engineer Hüsnü Sebik, “Newer tooling like the GKE Volume Populator streamlined setup, evolving what started as a manual process into a near push-button workflow.” Because Hyperdisk ML is tightly integrated with GKE, HubX was able to scale performance elastically, without needing to compromise on simplicity or speed. That combination of performance, scalability, and ease was exactly what they’d been searching for.
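For reference, the Volume Populator flow looks roughly like this: a GCPDataSource resource points at a Cloud Storage bucket, and a PersistentVolumeClaim references it via dataSourceRef so GKE pre-loads the data into the Hyperdisk ML volume before pods mount it. This is a sketch based on the preview API as described in Google’s documentation; the API version and field names are assumptions to verify against current docs, and all names are placeholders.

```yaml
# Sketch of the GKE Volume Populator flow (preview feature).
# API group/version and field names are assumptions; check current GKE docs.
apiVersion: datalayer.gke.io/v1
kind: GCPDataSource
metadata:
  name: base-model-source          # hypothetical name
spec:
  cloudStorage:
    serviceAccountName: model-loader-ksa    # Kubernetes SA with read access to the bucket
    bucketName: example-model-bucket        # placeholder Cloud Storage bucket
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: base-model-pvc
spec:
  dataSourceRef:                   # asks GKE to populate the volume from the source above
    apiGroup: datalayer.gke.io
    kind: GCPDataSource
    name: base-model-source
  storageClassName: hyperdisk-ml
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 100Gi
```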

HubX also used Trillium TPUs to support compute-heavy tasks like text-to-image generation using FLUX. These sixth-generation TPUs offered the memory and parallelism necessary to handle large AI models efficiently, working in tandem with Hyperdisk ML to keep inference workflows responsive and scalable.

Today, HubX uses Hyperdisk ML primarily to serve large models, but also to accelerate user-specific training workloads. Because most personalization involves just 10–20 user images or a small voice clip, model inputs remain light. But getting the 40 GB base model into memory quickly? That’s where Hyperdisk ML continues to shine.

Ready when the traffic hits

The impact was immediate. Instead of taking 30 minutes to initialize 50–60 pods, HubX brought that down to under a minute, a 30x improvement. “Users get results faster, and we’re not paying for GPUs to sit idle,” says Pektaş. “Hyperdisk ML delivered wins on both speed and cost.” Now, during peak traffic windows, the app scales smoothly. Users get real-time results, and the backend keeps pace without breaking a sweat.

HubX also saw measurable performance improvements from Trillium TPUs. In a recent production run, they generated four images in just seven seconds, a 35% improvement in response latency and a 45% drop in cost per image compared to their previous system. “The results are amazing. We’re excited to bring these improvements to the millions of users around the world who use our apps,” says Deniz Tuna, Head of Development.

HubX achieved an 18% cost reduction in their training pipeline thanks to faster model loading, plus an additional 2% in savings from areas such as new revision deployments.

The results are amazing. We’re excited to bring these improvements to the millions of users around the world who use our apps.

Deniz Tuna

Head of Development, HubX

They’ve simplified infrastructure and gained the ability to scale quickly without touching their codebase. And there’s more room to grow. “We’re still operating below peak Hyperdisk ML performance, which is reassuring,” says Sebik. “It means we can keep scaling without worry.” They’re also thinking ahead. As model sizes grow and app demand rises, HubX is exploring how to scale Hyperdisk ML with more volume attachments, greater I/O throughput, and more frequent data updates to support additional training workflows.

For a company built on speed and personalization, Hyperdisk ML gave HubX the confidence and the capacity to scale both.

Users get results faster, and we’re not paying for GPUs to sit idle. Hyperdisk ML delivered wins on both speed and cost.

Muhammed Pektaş

Sr. ML Engineer, HubX

HubX develops AI-powered mobile apps with autonomous in-house studios and central departments. 

Industry: Technology  

Location: Turkey 

Products: Hyperdisk ML, AI Hypercomputer, Google Kubernetes Engine (GKE), A2 VMs, Cloud Storage, GKE Volume Populator
