
2X price-performance and 10X throughput: Isima’s ecommerce experiment with Z3 VMs

July 9, 2024

Aisha Wang

Product Manager, Z3

Storage-dense workloads such as horizontal, scale-out databases and log analytics need high SSD density and consistent performance. To preserve data in the event of an outage, they also need predictable maintenance. At Google Cloud Next '24, we announced the general availability of the Z3 virtual machine series, our first storage-optimized VM family. Z3 delivers an industry-leading 6M 100% random-read IOPS and 6M write IOPS, and offers dense storage configurations of up to 409 GiB of SSD per vCPU on next-generation Local SSD hardware.

Isima, a Silicon Valley startup and ecommerce analytics cloud, was among the first to try it. Their platform bi(OS) provides serverless infrastructure for real-time retail and ecommerce data and AI applications. It features a scale-out SQL-friendly database and zero-code capabilities to onboard, process, and operate data for real-time data integration, feature stores, data science, cataloging, observability, DataOps, and business intelligence.

In this blog, we summarize Isima’s tests and findings when comparing Z3 to general-purpose N2 VMs. Spoiler alert: We're talking 2X better price-performance, 10X more throughput, and a whole lot more.

The test

As a Google Cloud partner, Isima got early access to Z3, and tested it on a series of demanding, real-world ecommerce workloads, including microservice calls, ad hoc analytics, visualization queries and more, all firing simultaneously:

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_hcarbEs.max-2000x2000.png

Fig.1 - bi(OS) on Z3 ecommerce use cases

To push Z3 to its limits and mirror real-world, high-availability deployments, Isima deployed bi(OS) on three z3-highmem-88 instances spread across multiple zones, with each instance split into five Docker containers. Each container was allocated 16 vCPUs, 128 GB of RAM, and two 3 TB SSDs. Because a container of that size matches the vCPU and memory shape of an n2-highmem-16, Isima could compare Z3 directly with tests they had previously run on n2-highmem-16 instances.
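To make the sizing concrete, here is a minimal sketch of the arithmetic behind that layout. The per-container figures come from the setup described above. The VM totals are assumptions: 88 vCPUs follows from the machine-type name, while 704 GB of memory and 36 TB of Local SSD are assumed values for z3-highmem-88 that this post does not state. The `Resources` class and its helper methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    vcpus: int
    ram_gb: int
    ssd_tb: int

    def times(self, count: int) -> "Resources":
        """Total resources for `count` identical units."""
        return Resources(self.vcpus * count, self.ram_gb * count, self.ssd_tb * count)

    def minus(self, other: "Resources") -> "Resources":
        """Resources left over after subtracting `other`."""
        return Resources(
            self.vcpus - other.vcpus,
            self.ram_gb - other.ram_gb,
            self.ssd_tb - other.ssd_tb,
        )


# ASSUMPTION: z3-highmem-88 shape. Only the vCPU count is implied by the name;
# the memory and Local SSD totals are not given in the post.
VM = Resources(vcpus=88, ram_gb=704, ssd_tb=36)

# From the post: 16 vCPUs, 128 GB RAM, and two 3 TB SSDs per container.
CONTAINER = Resources(vcpus=16, ram_gb=128, ssd_tb=2 * 3)
CONTAINERS_PER_VM = 5
VM_COUNT = 3

used_per_vm = CONTAINER.times(CONTAINERS_PER_VM)
spare_per_vm = VM.minus(used_per_vm)
cluster = used_per_vm.times(VM_COUNT)

print("allocated per VM:", used_per_vm)
print("headroom per VM: ", spare_per_vm)
print("containers in cluster:", CONTAINERS_PER_VM * VM_COUNT)
print("allocated in cluster: ", cluster)
```

Under these assumptions, the five containers use 80 of the 88 vCPUs on each VM, which leaves a small amount of headroom for the host OS and the container runtime.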

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_txZCQTN.max-2000x2000.png

Fig.2 - bi(OS) on Z3 Test Setup

To emulate excessive load and a variety of worst case scenarios, Isima tested:

Demand spikes: They hit the system with a brief peak load that saturated system resources to ~70%, then maintained a relentless 75% of that peak for a full 72 hours — all while demanding (and achieving) 99.999% reliability.
Select queries: To avoid unintentional caching effects from the OS or bi(OS), Isima's select queries intentionally went back in time (e.g., querying data inserted 30 minutes earlier). This ensured that reads came from Local SSD and not from RAM, which is crucial when evaluating Z3, and gave them confidence that the performance results would hold up under real-world workloads.
Multiple deployment scenarios: Both single-tenant and multi-tenant configurations were tested, validating Z3's ability to handle diverse real-world deployments.
Simulated maintenance events: Isima even factored in planned maintenance using Docker restarts, showing Z3's ability to handle disruptions without breaking a sweat.
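The load profile and the cache-avoiding reads above can be sketched in a few lines. This is an illustrative sketch and not Isima's test harness: the function names are hypothetical, the 1,000 ops/sec peak is a made-up input, and "99.999% reliability" is interpreted here as a request success rate (at most 1 failure per 100,000 requests), which is an assumption.

```python
from datetime import datetime, timedelta, timezone

SUSTAINED_FRACTION = 0.75          # from the post: 75% of peak load
SOAK_HOURS = 72                    # from the post: sustained for 72 hours
MAX_FAILURES_PER_100K = 1          # ASSUMPTION: 99.999% read as a success rate


def sustained_rate(peak_ops_per_sec: float) -> float:
    """Request rate to hold during the soak phase."""
    return peak_ops_per_sec * SUSTAINED_FRACTION


def soak_requests(peak_ops_per_sec: float) -> int:
    """Total requests issued over the whole soak phase."""
    return int(sustained_rate(peak_ops_per_sec) * SOAK_HOURS * 3600)


def allowed_failures(total_requests: int) -> int:
    """Failure budget at five nines, using integer math to avoid float drift."""
    return total_requests * MAX_FAILURES_PER_100K // 100_000


def cold_read_window(now: datetime,
                     lag: timedelta = timedelta(minutes=30),
                     width: timedelta = timedelta(minutes=1)):
    """Time range ending `lag` before `now`, so reads target older data
    that is less likely to still be cached in RAM."""
    end = now - lag
    return end - width, end


# Hypothetical peak rate, chosen only to make the numbers easy to follow.
peak = 1000.0
total = soak_requests(peak)
print("sustained ops/sec:", sustained_rate(peak))
print("requests over soak:", total)
print("failure budget:", allowed_failures(total))

start, end = cold_read_window(datetime(2024, 7, 9, 12, 0, tzinfo=timezone.utc))
print("query rows inserted between", start.isoformat(), "and", end.isoformat())
```

A real harness would pass the window bounds as parameters to its select queries. The 30-minute lag mirrors the example in the text, and the 1-minute window width is an arbitrary choice.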

The verdict

Throughput: bi(OS) on Z3 handled ~2X+ more throughput than the tests performed last year using n2-highmem-16, with 2X better price-performance.

DB request	n2-highmem-16 (~ops/sec)	One 16-vCPU Docker container on z3-highmem-88 (~ops/sec)	Improvement	Use cases
inserts (single row/query)	3342	6010	1.8X	Onboarding of click-stream data
selects (single row/query)	300	613	2X	Feature store reads for personalization, ATP, etc.
upserts (single row/query)	612	1500	2.45X	Updating ML Scores
selects (multiple rows/query)	1600	4110	2.56X	Bulk reads (as part of ETL)
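The improvement column can be reproduced from the two throughput columns. The sketch below recomputes each ratio from the approximate ops/sec figures in the table. The `improvement` helper and the `ROWS` list are illustrative names, not part of any Isima or Google Cloud tooling.

```python
# (request type, n2-highmem-16 ops/sec, Z3 16-vCPU container ops/sec, reported improvement)
ROWS = [
    ("inserts (single row/query)", 3342, 6010, "1.8X"),
    ("selects (single row/query)", 300, 613, "2X"),
    ("upserts (single row/query)", 612, 1500, "2.45X"),
    ("selects (multiple rows/query)", 1600, 4110, "2.56X"),
]


def improvement(n2_ops: float, z3_ops: float) -> float:
    """Throughput ratio of the Z3 container over the N2 baseline."""
    return z3_ops / n2_ops


for name, n2_ops, z3_ops, reported in ROWS:
    ratio = improvement(n2_ops, z3_ops)
    print(f"{name}: {ratio:.2f}X computed, {reported} reported")
```

The computed ratios agree with the reported ones to within rounding. The last row works out to about 2.57X when rounded and 2.56X when truncated, which is expected given that the inputs are themselves approximate.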

NVMe drive latencies: Write latencies were ~6X better, while read latencies were unchanged.

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_1.max-1500x1500.png

Drive variance: Over the course of 72 hours, each drive on each z3-highmem-88 VM reported a variance of ±0.02 ms in both read and write latencies.

https://storage.googleapis.com/gweb-cloudblog-publish/images/4_VfSvFvQ.max-1600x1600.png

We are thrilled to see these results for the new Z3 instances, and are confident that they will unlock the power of many more workloads. This isn't just about benchmarks; it's about real-world workloads. You can learn more about Z3 in our detailed documentation. And if you're looking for an agile, performant ecommerce platform, bi(OS) is available on the Google Cloud Marketplace with a free 30-day trial.
