Google Cloud

Cloud SQL Second Generation performance and feature deep dive

August 16, 2016

Brett Hesterberg

Product Manager, Google Cloud Platform

Five years ago, we launched the First Generation of Google Cloud SQL and have helped thousands of companies build applications on top of it.

In that time, Google Cloud Platform’s innovations on Persistent Disk dramatically increased IOPS for Google Compute Engine, so we built Second Generation on Persistent Disk, allowing us to offer a far more performant MySQL solution at a fraction of the cost. Cloud SQL Second Generation now runs 7X faster and has 20X more storage capacity than its predecessor — with lower costs, higher scalability, automated backups that can restore your database from any point in time and 99.95% availability, anywhere in the world. This way you can focus on your application, not your IT solution.

Cloud SQL Second Generation performance gains are dramatic: up to 10TB of data, 20,000 IOPS, and 104GB of RAM per instance.

Cloud SQL Second Generation vs. the competition

So we know Cloud SQL Second Generation is a major advance from First Generation. But how does it compare with database services from Amazon Web Services?

Test: We used sysbench to simulate the same workload on three different services: Cloud SQL Second Generation, Amazon RDS for MySQL and Amazon Aurora.
Result: Cloud SQL Second Generation outperformed RDS for MySQL and performed better than Aurora when active thread count is low, as is typical for many web applications.

https://storage.googleapis.com/gweb-cloudblog-publish/images/cloud-sql-2nd-53hu0.max-700x700.PNG

Cloud SQL sustains higher TPS (transactions per second) per thread than RDS for MySQL. It outperforms Aurora in configurations of up to 16 threads.

Details
The workload compares multi-zone (highly available) instances of Cloud SQL Second Generation, Amazon RDS for MySQL and Amazon Aurora running the latest offered MySQL version. The replication technology used by these three services differs significantly, and has a big impact on performance and latency. Cloud SQL Second Generation uses MySQL’s semi-synchronous replication, RDS for MySQL uses block-level synchronous replication and Aurora uses a proprietary replication technology.

To determine throughput, a Sysbench OLTP workload was generated from a MySQL client in the same zone as the primary database instance. The workload is a set of step load tests that double the number of threads (connections) with each run. The data set used is five times larger than total memory of the database instance to ensure that reads go to disk.

Transaction per second (TPS) results show that Cloud SQL and Aurora are faster than RDS for MySQL. Cloud SQL’s TPS is higher than Aurora at up to 16 threads. At 32 threads, variance and the potential for replication lag increase, causing Aurora’s peak TPS to exceed Cloud SQL’s at higher thread counts. The workload illustrates the differences in replication technology between the three services. Aurora exhibits minimal performance variance and consistent replication lag. Cloud SQL emphasizes performance, allowing for replication lag, which can increase failover times, but without putting data at risk.

https://storage.googleapis.com/gweb-cloudblog-publish/images/cloud-sql-2nd-2oiml.max-700x700.PNG

Latency
We measured average end-to-end latency with a single client thread (i.e., “pure” latency measurement).

https://storage.googleapis.com/gweb-cloudblog-publish/images/cloud-sql-2nd-1dv2u.max-600x600.PNG

The latency comparison changes as additional threads are added. Cloud SQL exhibits lower latency than RDS for MySQL across all tests. Compared to Aurora, Cloud SQL’s latency is lower until 32 or more threads are used to generate load.

https://storage.googleapis.com/gweb-cloudblog-publish/images/cloud-sql-2nd-3ivro.max-700x700.PNG

Running the benchmark

Environment configuration and sysbench parameters for our testing.

We used the following environment configuration and sysbench parameters for our testing.

Test instances:

Google Cloud SQL v2, db-n1-highmem-16 (16 CPU, 104 GB RAM), MySQL 5.7.11, 1000 GB PD SSD + Failover Replica
Amazon RDS Multi-AZ, db.r3.4xlarge (16 CPU, 122 GB RAM), MySQL 5.7.11, 1000 GB SSD, 10k Provisioned IOPS + Multi-AZ Replica
Amazon RDS Aurora, db.r3.4xlarge (16 CPU, 122 GB RAM), MySQL 5.6 (newest) + Replica

Test overview:
Sysbench runs were 100 tables of 20M rows each, for a total of 2B rows. In order to ensure that the data set didn't fit in memory, it was set to a multiple of the ~100 GB memory per instance, allowing sufficient space for binary logs used for replication. With 100x20M rows, the data set size as loaded was ~500 GB. Each step run was 30 minutes with a one minute "cool down" period in between, producing one report line per second of the runtime.

Load the data:

Ubuntu setup
sudo apt-get update
sudo apt-get install \
  git automake autoconf libtool make gcc \
  libmysqlclient-dev mysql-client-5.6
git clone https://github.com/akopytov/sysbench.git
./autogen.sh 
./configure
make -j8
Test variables
export test_system=<test name>
export mysql_host=<mysql host>
export mysql_user=<mysql user>
export mysql_password=<mysql password>
export test_path=~/oltp_${test_system}_1
export test_name=01_baseline
Prepare test data
sysbench/sysbench \
  --mysql-host=${mysql_host} \
  --mysql-user=${mysql_user} \
  --mysql-password=${mysql_password} \
  --mysql-db="sbtest" \
  --test=sysbench/tests/db/parallel_prepare.lua \
  --oltp_tables_count=100 \
  --oltp-table-size=20000000 \
  --rand-init=on \
  --num-threads=16 \
  run
Run the benchmark:
mkdir -p ${test_path}
for threads in 1 2 4 8 16 32 64 128 256 512 1024
do 
sysbench/sysbench \
  --mysql-host=${mysql_host} \
  --mysql-user=${mysql_user} \
  --mysql-password=${mysql_password} \
  --mysql-db="sbtest" \
  --db-ps-mode=disable \
  --rand-init=on \
  --test=sysbench/tests/db/oltp.lua \
  --oltp-read-only=off \
  --oltp_tables_count=100 \
  --oltp-table-size=20000000 \
  --oltp-dist-type=uniform \
  --percentile=99 \
  --report-interval=1 \
  --max-requests=0 \
  --max-time=1800 \
  --num-threads=${threads} \
  run
Format the results:
Capture results in CSV format
grep "^\[" ${test_path}/${test_name}_*.out \
  | cut -d] -f2 \
  | sed -e 's/[a-z ]*://g' -e 's/ms//' -e 's/(99%)//' -e 's/[ ]//g' \
  > ${test_path}/${test_name}_all.csv
Plot the results in R
status <- NULL # or e.g. "[DRAFT]"
config <- "Amazon RDS (MySQL Multi-AZ, Aurora) vs. Google Cloud SQL Second Generation\nsysbench 0.5, 100 x 20M rows (2B rows total), 30 minutes per step"
steps <- c(1, 2, 4, 8, 16, 32, 64, 128, 256, 512)
time_per_step <- 1800
output_path <- "~/oltp_results/"
test_name <- "01_baseline"
results <- data.frame(
  stringsAsFactors = FALSE,
  row.names = c(
    "amazon_rds_multi_az",
    "amazon_rds_aurora",
    "google_cloud_sql"
  ),
  file = c(
    "~/amazon_rds_multi_az_1/01_baseline_all.csv",
    "~/amazon_rds_aurora_1/01_baseline_all.csv",
    "~/google_cloud_sql_1/01_baseline_all.csv"
  ),
  name = c(
    "Amazon RDS MySQL Multi-AZ",
    "Amazon RDS Aurora",
    "Google Cloud SQL 2nd Gen."
  ),
  color = c(
    "darkgreen",
    "red",
    "blue"
  )
)
results$data <- lapply(results$file, read.csv, header=FALSE, sep=",", col.names=c("threads", "tps", "reads", "writes", "latency", "errors", "reconnects"))
# TPS
pdf(paste(output_path, test_name, "_tps.pdf", sep=""), width=12, height=8)
plot(0, 0,
  pch=".", col="white", xaxt="n", ylim=c(0,2000), xlim=c(0,length(steps)),
  main=paste(status, "Transaction Rate by Concurrent Sysbench Threads", status, "\n\n"),
  xlab="Concurrent Sysbench Threads",
  ylab="Transaction Rate (tps)"
)
for(result in rownames(results)) {
  tps <- as.data.frame(results[result,]$data)$tps
  points(1:length(tps) / time_per_step, tps, pch=".", col=results[result,]$color, xaxt="n", new=FALSE)
}
title(main=paste("\n\n", config, sep=""), font.main=3, cex.main=0.7)
axis(1, 0:(length(steps)-1), steps)
legend("topleft", results$name, bg="white", col=results$color, pch=15, horiz=FALSE)
dev.off()
# Latency
pdf(paste(output_path, test_name, "_latency.pdf", sep=""), width=12, height=8)
plot(0, 0,
  pch=".", col="white", xaxt="n", ylim=c(0,2000), xlim=c(0,length(steps)),
  main=paste(status, "Latency by Concurrent Sysbench Threads", status, "\n\n"),
  xlab="Concurrent Sysbench Threads",
  ylab="Latency (ms)"
)
for(result in rownames(results)) {
  latency <- as.data.frame(results[result,]$data)$latency
  points(1:length(latency) / time_per_step, latency, pch=".", col=results[result,]$color, xaxt="n", new=FALSE)
}
title(main=paste("\n\n", config, sep=""), font.main=3, cex.main=0.7)
axis(1, 0:(length(steps)-1), steps)
legend("topleft", results$name, bg="white", col=results$color, pch=15, horiz=FALSE)
dev.off()
# TPS per Thread
pdf(paste(output_path, test_name, "_tps_per_thread.pdf", sep=""), width=12, height=8)
plot(0, 0,
  pch=".", col="white", xaxt="n", ylim=c(0,60), xlim=c(0,length(steps)),
  main=paste(status, "Transaction Rate per Thread by Concurrent Sysbench Threads", status, "\n\n"),
  xlab="Concurrent Sysbench Threads",
  ylab="Transactions per thread (tps/thread)"
)
for(result in rownames(results)) {
  tps <- as.data.frame(results[result,]$data)$tps
  threads <- as.data.frame(results[result,]$data)$threads
  points(1:length(tps) / time_per_step, tps / threads, pch=".", col=results[result,]$color, xaxt="n", new=FALSE)
}
title(main=paste("\n\n", config, sep=""), font.main=3, cex.main=0.7)
axis(1, 0:(length(steps)-1), steps)
legend("topleft", results$name, bg="white", col=results$color, pch=15, horiz=FALSE)
dev.off()

Cloud SQL Second Generation features

But performance is only half the story. We believe a fully managed service should be as convenient as it is powerful. So we added new features to help you easily store, protect and manage your data.

Store and protect data

Flexible backups: Schedule automatic daily backups or run them on-demand. Backups are designed not to affect performance.
Precise recovery: Recover your instance to a specific point in time using point-in-time recovery.
Easy clones: Clone your instance so you can test changes on a copy before introducing them to your production environment. Clones are exact copies of your databases, but they're completely independent from the source. Cloud SQL offers a streamlined cloning workflow.
Automatic storage increase: Enable automatic storage increase and Cloud SQL will add storage capacity whenever you approach your limit.

Connect and Manage

Open standards: We embrace the MySQL wire protocol, the standard connection protocol for MySQL databases, so you can access your database from nearly any application, running anywhere.
Secure connections: Our new Cloud SQL Proxy creates a local socket and uses OAuth to help establish a secure connection with your application or MySQL tool. This makes secure connections easier for both dynamic and static IP addresses. For dynamic IP addresses, such as a developer’s laptop, you can help secure connectivity using service accounts, rather than modifying your firewall settings. For static IP addresses, you no longer have to set up SSL.

https://storage.googleapis.com/gweb-cloudblog-publish/images/cloud-sql-2nd-4rfz9.max-700x700.PNG

We’re obviously very proud of Cloud SQL, but don’t just take our word for it. Here’s what a couple of customers have had to say about Cloud SQL Second Generation:

As a SaaS Company, we manage hundreds of instances for our customers. Cloud SQL is a major component of our stack and when we beta tested Cloud SQL, we were able to see fantastic performance over our large volume customers. We immediately migrated a few of our major customers as we saw 7x performance improvements of their queries.

– Rajesh Manickadas, Director of Engineering, Orangescape

Tweet this quote

As a mobile application company, data management is essential to delivering the best product for our clients. Google Cloud SQL enables us to manage databases that grow at rates such as 120 - 150 million data points every month. In fact, for one of our clients, a $6B Telecommunications Provider, their database adds ~15 GB of data every month. At peak time, we hit around 400 write operations/second and yet our API calls average return time is still under 73ms.

– Andrea Michaud, Head of Client Services, www.TeamTracking.us

Tweet this quote

Next steps

What’s next for Cloud SQL? You can look forward to continued Persistent Disk performance improvements, added virtual networking enhancements and streamlined migration tools to help First Generation users upgrade to Second Generation.

Until then, we urge you to sign up for a $300 credit to try Cloud SQL and the rest of GCP. Start with inexpensive micro instances for testing and development. When you’re ready, you can easily scale them up to serve performance-intensive applications.

You can also take advantage of our partner ecosystem to help you get started. To streamline data transfer, reach out to Talend, Attunity, Dbvisit and xPlenty. For help with visualizing analytics data, try Tableau, Looker, YellowFin and Bime by Zendesk. If you need to manage and monitor databases, ScaleArc and WebYog good bets, while Pythian and Percona are at the ready if you simply need extra support.