Network Performance Decoded: Much ado about headers, data and bitrates
Sumit Singh
Senior Product Manager
Rick Jones
Network Software Engineer
We are happy to drop the third installment of our Network Performance Decoded whitepaper series, where we dive into network performance topics and benchmarking best practices that often come up as you troubleshoot, deploy, scale, or architect your cloud-based workloads. We started this series last year to provide helpful tips that not only help you make the most of your network, but also help you avoid costly mistakes that can drastically impact your application performance. Check out our last two installments — tuning TCP and UDP bulk flow performance, and network performance limiters.
In this installment, we provide an overview of three recent whitepapers — one on TCP retransmissions, another on the impact of headers and MTUs on data transfer performance, and finally, one on using netperf to measure packets per second performance.
1. Make it snappy: Tuning TCP retransmission behaviour
The A Brief Look at Tuning TCP Retransmission Behaviour whitepaper is all about how to make your online applications feel snappier by tweaking two Linux TCP settings, net.ipv4.tcp_thin_linear_timeouts and net.ipv4.tcp_rto_min_us (or rto_min). Think of it as fine-tuning your application’s response times and how quickly your application recovers when there's a hiccup in the network.
For all the gory details, you’ll need to read the paper, but here’s the lowdown on what you’ll learn:
- Faster recovery is possible: By playing with these settings (see the sketch after this list), especially making rto_min smaller, you can drastically cut down on how long your TCP connections just sit there doing nothing after a brief network interruption. This means your apps respond faster, and users have a smoother experience.
- Newer kernels are your friend: If you're running a newer Linux kernel (like 6.11 or later), you can go even lower with rto_min (down to 5 milliseconds!). This is because these newer kernels have smarter ways of handling things, leading to even quicker recovery.
- Protective ReRoute takes resiliency to the next level: For those on Google Cloud, tuning net.ipv4.tcp_rto_min_us can actually help Google Cloud's Protective ReRoute (PRR) mechanism kick in sooner, making your applications more resilient to network issues.
- Not just for occasional outages: Even for random, isolated packet loss, these tweaks can make a difference. If you have a target for how quickly your app should respond, you can use these settings to ensure TCP retransmits data well before that deadline.
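If you want to experiment with these tunables, here is a minimal, hypothetical sketch of applying them via /proc/sys. The values shown are examples rather than recommendations (see the whitepaper for guidance), net.ipv4.tcp_rto_min_us only exists on newer kernels (6.11 and later), and writing these files requires root.

```python
#!/usr/bin/env python3
"""Illustrative sketch (not from the whitepaper): apply the two TCP
retransmission tunables discussed above by writing them via /proc/sys.
Values are examples only; net.ipv4.tcp_rto_min_us requires kernel 6.11+
and writing these files requires root."""

from pathlib import Path

# sysctl name -> example value
TUNABLES = {
    # Use linear (rather than exponential) retransmission backoff for thin streams.
    "net.ipv4.tcp_thin_linear_timeouts": "1",
    # Minimum retransmission timeout in microseconds: 5 ms, per the 6.11+ note above.
    "net.ipv4.tcp_rto_min_us": "5000",
}

def set_sysctl(name: str, value: str) -> None:
    """Write one sysctl through /proc/sys, reporting the old and new values."""
    path = Path("/proc/sys") / name.replace(".", "/")
    if not path.exists():
        print(f"{name}: not available on this kernel, skipping")
        return
    old = path.read_text().strip()
    path.write_text(value)
    print(f"{name}: {old} -> {value}")

if __name__ == "__main__":
    for name, value in TUNABLES.items():
        set_sysctl(name, value)
```

The equivalent one-liners are sysctl -w net.ipv4.tcp_thin_linear_timeouts=1 and sysctl -w net.ipv4.tcp_rto_min_us=5000. Whichever route you take, measure before and after: the right floor for rto_min depends on your network's round-trip times.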
2. Beyond network link-rate
Consider more than just "link-rate" when thinking about network performance! In our Headers and Data and Bitrates whitepaper, we discuss how the true speed of data transfer is shaped by:
- Headers: Think of these as necessary packaging that reduces the actual data sent per packet.
- Maximum Transmission Units (MTUs): These dictate maximum packet size. Larger MTUs mean more data per packet, making your data transfers more efficient.
In cloud environments, a VM's outbound data limit (egress cap) isn't always the same as the physical network's speed. While sometimes close, extra cloud-specific headers can still impact your final throughput. Optimize your MTU settings to get the most out of your cloud network. In a nutshell, it's not just about the advertised speed, but how effectively your data travels!
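To make the header math concrete, here is a quick back-of-the-envelope sketch. It assumes plain IPv4 and TCP headers with no options and ignores link-layer framing and any cloud encapsulation, so real goodput will land a bit lower; the MTU values are just common examples.

```python
#!/usr/bin/env python3
"""Back-of-the-envelope sketch (assumed header sizes, not from the whitepaper):
how much of each packet is application payload at a given MTU, and what that
implies for goodput at a given bitrate. Ignores link-layer framing and any
cloud-specific encapsulation, both of which shave off a bit more."""

IPV4_HEADER = 20  # bytes, no options
TCP_HEADER = 20   # bytes, no options (timestamps and SACK would add more)

def payload_fraction(mtu: int) -> float:
    """Fraction of an MTU-sized packet that is application payload."""
    return (mtu - IPV4_HEADER - TCP_HEADER) / mtu

if __name__ == "__main__":
    link_gbps = 10  # example egress cap / link rate
    for mtu in (1460, 1500, 8896):  # example MTU values; 8896 is a typical jumbo-frame setting
        frac = payload_fraction(mtu)
        print(f"MTU {mtu:>4}: {frac:6.2%} payload -> "
              f"~{link_gbps * frac:.2f} Gbps of goodput on a {link_gbps} Gbps cap")
```

The percentage differences look small at first glance, but at tens of gigabits per second they add up, and larger MTUs also mean fewer packets (and fewer per-packet costs) for the same amount of data.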
3. How many transactions can you handle?
In Measuring Aggregate Packets Per Second with netperf, you’ll learn how to use netperf to figure out how many transactions (and thus packets) per second your network can handle, which is super useful for systems that aren't just pushing huge files around. Go beyond just measuring bulk transfers and learn a way to measure the packets-per-second rates that can gate the performance of your request/response applications.
Here's what you'll learn:
- Beating skew error: Ever noticed weird results when running a bunch of netperf tests at once? That's "skew error," and this whitepaper describes using "demo mode" to fix it, giving you way more accurate overall performance numbers.
- Sizing up your test: Get practical tips on how many "load generators" (the machines sending the traffic) and how many concurrent streams you need to get reliable results. Basically, you want enough power to truly challenge your system.
- Why UDP burst mode is your friend: It explains why using "burst-mode UDP/RR" is the secret sauce for measuring packets per second. TCP, as smart as it is, can sometimes hide the true packet rate because it tries to be too efficient.
- Full-spectrum testing and analysis: The whitepaper walks you through different test types you can run with the runemomniaggdemo.sh script, giving you an effective means to measure how many network transactions per second the instance under test can achieve (a rough sketch of the idea follows this list). This might help you infer aspects of the rest of your network that influence this benchmark. Plus, it shows you how to crunch the numbers and even get some sweet graphs to visualize your findings.
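For a feel of what the script automates, here is a rough, hypothetical sketch that launches several burst-mode UDP_RR netperf instances in parallel and sums their transaction rates. The hostname is made up, and it assumes netperf was built with burst-mode support (the test-specific -b option) and that the omni "throughput" output selector is available in your netperf version. Note that this naive launch-and-sum approach still suffers from the skew error that the whitepaper's demo-mode methodology is designed to avoid, so treat it as an approximation, not a substitute for runemomniaggdemo.sh.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: run several burst-mode UDP_RR netperf instances in
parallel and sum their transaction rates to estimate aggregate packets per
second. Assumes netperf built with burst-mode support (-b) and a version
whose UDP_RR test accepts omni output selectors (-o). The whitepaper's
runemomniaggdemo.sh script is the canonical tool; this only approximates it."""

import subprocess
from concurrent.futures import ThreadPoolExecutor

TARGET = "netserver.example.internal"   # hypothetical host running netserver
STREAMS = 8                             # concurrent UDP_RR streams
DURATION = 30                           # seconds per test
BURST = 32                              # transactions kept in flight (-b)

def run_one(stream_id: int) -> float:
    """Run a single burst-mode UDP_RR test and return its transactions/sec."""
    cmd = [
        "netperf", "-H", TARGET, "-t", "UDP_RR",
        "-l", str(DURATION), "-P", "0",      # -P 0 suppresses test banners
        "--", "-b", str(BURST),              # keep BURST transactions in flight
        "-o", "throughput",                  # omni selector: print only throughput
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    # Output details vary across netperf versions, so parse defensively:
    # take the last line whose final comma-separated field is numeric.
    for line in reversed(out.strip().splitlines()):
        try:
            return float(line.strip().split(",")[-1])
        except ValueError:
            continue
    raise RuntimeError(f"could not parse netperf output for stream {stream_id}")

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=STREAMS) as pool:
        rates = list(pool.map(run_one, range(STREAMS)))
    total = sum(rates)
    # Each UDP_RR transaction is one request packet plus one response packet,
    # so packets per second is roughly twice the aggregate transaction rate.
    print(f"aggregate transactions/s: {total:,.0f}  (~{2 * total:,.0f} packets/s)")
```

In practice you would pin down warm-up and measurement windows with demo mode the way runemomniaggdemo.sh does; the point of the sketch is simply that aggregate packets per second comes from summing many small request/response streams, not from a single bulk transfer.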
Stay tuned
With these resources, our goal is to foster an open, collaborative community for network benchmarking and troubleshooting. While our examples may be drawn from Google Cloud, the underlying principles are universally applicable, no matter where your workloads operate. You can access all our whitepapers — past, present, and future — on our webpage. Be sure to check back for more!