Lift and shift: Lessons for video media applications
Industry Solutions Architect for Media and Entertainment, Google Cloud
Solution Architect, Automation, Vizrt Americas
When you move software applications from on-premises hardware to the cloud, they often "just work," but that is never guaranteed. This is especially true for hardware-intensive applications. This blog post examines what happened when a media company took software for real-time video broadcasts into the cloud. We'll share how we, Google Cloud, collaborated with media software provider Vizrt to meet the demanding requirements of an eSports broadcaster. Together, we delivered a solution that not only met but exceeded the expected performance of a cloud-based deployment.
Video broadcasting in the cloud
Video broadcasting is a hardware-intensive workflow. Because broadcasts must process and store data streams in near real time, they stress the GPU, memory, disk, and CPU. Performance requirements also climb quickly as producers add more video streams to the mix, as they did in this case, an eSports broadcast.
Vizrt’s customer wanted to expand its broadcast production by doubling the number of camera feeds from 8 to 16, for a more compelling and elaborate production.
At the heart of the eSports broadcaster’s production was Viz Vectar Plus, Vizrt’s software-based 4K switcher. While the client wanted to move more of its production into the cloud, Viz Vectar Plus was originally designed for on-premises hardware. So when they tried a straightforward "lift and shift" to the cloud, it surprised no one that the software didn't run as well as it did on premises. They turned to Google Cloud and Vizrt to make it run the way they needed it to.
Troubleshooting lift and shift
Initially, we suspected that the issue could be in the design of the cloud deployment, i.e., the configuration of the VM hosting the software. So the focus of our troubleshooting was to find a cloud configuration that 1) ran the software to spec and 2) did so optimally in terms of cost, robustness, and performance. Furthermore, because this was a video media application, we wanted to ensure that all components met their performance specifications, particularly throughput, IOPS, and network bandwidth. Only after we validated the cloud deployment would we ask Vizrt to investigate the code itself. We would:
Set up a test environment.
Benchmark the environment.
Test various configurations.
Validate that the vendor software was optimally using the configuration.
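The four steps above amount to a simple search: run the same benchmark against each candidate configuration and keep the cheapest one that meets spec. The sketch below is a hypothetical illustration, not the tooling used in this engagement; `run_benchmark`, the `Config` fields, and any cost figures are assumptions standing in for a real test environment.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Config:
    name: str
    vcpus: int
    disk_type: str
    disk_gb: int
    hourly_cost: float  # hypothetical cost figure, used only for ranking


@dataclass(frozen=True)
class Result:
    config: Config
    streams: int  # concurrent streams the workload sustained in the benchmark


def pick_config(
    candidates: Iterable[Config],
    run_benchmark: Callable[[Config], int],
    target_streams: int,
) -> Optional[Result]:
    """Benchmark every candidate and return the cheapest one that meets the target.

    Returns None when no configuration meets spec, which signals that the
    bottleneck is likely in the application rather than the infrastructure.
    """
    passing = []
    for cfg in candidates:
        # Hypothetical benchmark hook: provision, run the load, count streams.
        streams = run_benchmark(cfg)
        if streams >= target_streams:
            passing.append(Result(cfg, streams))
    if not passing:
        return None
    # Cheapest first; on equal cost, prefer the one with more headroom.
    return min(passing, key=lambda r: (r.config.hourly_cost, -r.streams))
```

A `None` result corresponds to the situation described later in this post: no VM configuration reached 16 streams, so the next step was optimizing the software itself.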
This high-level methodology is straightforward. However, we approached the details in a particular order, because we were optimizing for broadcast video. We homed in on the optimal cloud configuration by testing the following elements, prioritized in order of expected impact:
VM type: The VM type largely dictates the available memory and CPU configurations. However, because this was a GPU workflow, we had to pick N1 VMs. At the time of writing, N1 is the only VM type that GPUs can be attached to, and GPUs are practically a requirement for broadcast video.
Disk type: Video broadcasts require high I/O speeds to handle high-quality video streams. We switched from an HDD to a much faster SSD.
CPU size: We increased the VM's CPU cores in increments from 16 up to 32. More cores did increase performance, but not to the level we needed.
SSD size: We increased the SSD size, which also raises the disk's IOPS and throughput, enabling more simultaneous recordings. Again, this only partially worked.
Disk count: We noticed read/write problems when the application was reading and writing on a single drive. There are two typical ways to approach this: 1) separating read and write tasks across two disks and 2) striping the data streams across disks. Both approaches improved performance, but only marginally.
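To illustrate what striping means here, the sketch below writes a byte stream round-robin across several files in fixed-size chunks (the RAID 0 layout) and reads it back in the same order. In practice striping is usually done at the block layer, for example with Linux software RAID 0, rather than in application code; the paths, which would point at mounts on different disks, and the chunk size are hypothetical.

```python
from typing import List


def stripe_write(data: bytes, paths: List[str], chunk_size: int = 1 << 20) -> None:
    """Write `data` round-robin across `paths` in fixed-size chunks (RAID 0 style)."""
    files = [open(p, "wb") for p in paths]
    try:
        for offset in range(0, len(data), chunk_size):
            chunk_index = offset // chunk_size
            # Chunk i goes to disk i mod N, so consecutive chunks hit different disks.
            files[chunk_index % len(files)].write(data[offset:offset + chunk_size])
    finally:
        for f in files:
            f.close()


def stripe_read(paths: List[str], total_len: int, chunk_size: int = 1 << 20) -> bytes:
    """Reassemble data written by stripe_write by visiting the files in the same order."""
    files = [open(p, "rb") for p in paths]
    out = bytearray()
    try:
        chunk_index = 0
        while len(out) < total_len:
            want = min(chunk_size, total_len - len(out))
            piece = files[chunk_index % len(files)].read(want)
            if not piece:
                raise IOError("stripe set is shorter than total_len")
            out += piece
            chunk_index += 1
    finally:
        for f in files:
            f.close()
    return bytes(out)
```

Because consecutive chunks land on different disks, a sequential write or read is spread over all of them, which is why striping can raise aggregate throughput beyond what a single disk provides.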
After our testing, we arrived at the following optimal configuration:
VM: n1-standard-32 instance with a 500 GB boot drive
GPU: 1 T4 GPU
SSD: one 1 TB SSD persistent disk*
* We later determined that two SSDs, one for reads and one for writes, would perform better
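For reference, the configuration above could be expressed as a `gcloud compute instances create` command. The Python helper below only assembles the argument list; the instance name, zone, and disk names are hypothetical, and boot image, network, and other flags a real deployment needs are deliberately omitted. The optional second disk reflects the two-SSD layout from the footnote.

```python
import shlex
from typing import List


def create_instance_cmd(name: str, zone: str, second_ssd: bool = False) -> List[str]:
    """Build a gcloud command for the tested configuration (image/network flags omitted)."""
    cmd = [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        "--machine-type=n1-standard-32",
        "--accelerator=type=nvidia-tesla-t4,count=1",
        # GPU instances cannot live-migrate, so host maintenance must stop them.
        "--maintenance-policy=TERMINATE",
        "--boot-disk-size=500GB",
        # Hypothetical disk name; 1 TB SSD persistent disk for media data.
        f"--create-disk=name={name}-data-1,size=1TB,type=pd-ssd",
    ]
    if second_ssd:
        # Second SSD so that reads and writes can go to separate disks.
        cmd.append(f"--create-disk=name={name}-data-2,size=1TB,type=pd-ssd")
    return cmd


print(shlex.join(create_instance_cmd("vectar-01", "us-central1-a", second_ssd=True)))
```

Building the command as a list rather than a single string avoids shell-quoting problems if it is later passed to `subprocess.run`.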
This configuration produced between 6 and 12 streams. That was roughly on par with the 8 streams of the on-premises setup, but well short of the customer's target of 16. So we needed Vizrt to take the ball from here and optimize the software itself.
Optimizing media applications for cloud
We provided Vizrt our recommended configuration, performance notes, and the following best practices that are generally applicable to cloud-based video workloads:
Separate read and write operations to two different disks to enable higher performance for both operations.
Attach a second 1 TB SSD persistent disk to the VM instance to increase performance.
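The reason a larger or additional SSD helps is that persistent disk performance limits scale with provisioned capacity, up to per-VM caps. The sketch below estimates aggregate limits from total attached capacity. The default per-GB rates are assumptions based on published pd-ssd figures (roughly 30 IOPS and 0.48 MB/s per GB); the per-VM caps are hypothetical inputs, and per-disk maximums are not modeled, so check the current Compute Engine documentation before relying on any number.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DiskLimits:
    iops: float
    throughput_mbps: float  # MB/s


def estimate_limits(
    disk_sizes_gb: List[int],
    iops_per_gb: float = 30.0,   # assumed pd-ssd rate; verify against current docs
    mbps_per_gb: float = 0.48,   # assumed pd-ssd rate; verify against current docs
    vm_iops_cap: float = float("inf"),        # hypothetical per-VM IOPS cap
    vm_throughput_cap: float = float("inf"),  # hypothetical per-VM MB/s cap
) -> DiskLimits:
    """Scale limits by total attached capacity, then clamp to the per-VM caps."""
    total_gb = sum(disk_sizes_gb)
    return DiskLimits(
        iops=min(total_gb * iops_per_gb, vm_iops_cap),
        throughput_mbps=min(total_gb * mbps_per_gb, vm_throughput_cap),
    )


one_disk = estimate_limits([1000])
two_disks = estimate_limits([1000, 1000])
print(one_disk)
print(two_disks)
```

Under this model, a second 1 TB disk doubles the available IOPS and throughput as long as the VM's caps are not reached. Splitting reads and writes across the two disks also keeps playback reads from queuing behind recording writes on the same device.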
With this information, Vizrt engineers worked their magic, delivering daily patches for us to test; iterating daily, the team arrived at a solution quickly. Not only did they meet the broadcaster’s request for 16 feeds, they went even further, to 44. That is more than a 5x improvement over the original 8 feeds, achieved by optimizing for the cloud!
Teamwork in troubleshooting
Because of the specialized nature of media and entertainment, workflows that span multiple companies are common, as specialized applications hand their work off from one to the next.
“By working in partnership with Google Cloud we managed to build a system that can scale in ways that probably none of us thought would be possible. This allowed Viz Vectar Plus to run fully in the cloud using NDI and opened up amazing possibilities for making shows,” said Dr. Andrew Cross, President R&D, Vizrt Group. “We ended up with great feedback from the customer, who were appreciative of how Google Cloud and Vizrt collaborated on a solution.”
The results speak for themselves: a satisfied customer and more than a 5x improvement in results. That's what we call a good game.