During the Google I/O keynote, Urs Hölzle, Senior Vice President of Infrastructure at Google, introduced Google Compute Engine, an infrastructure-as-a-service product that lets businesses and researchers tap into the scalability, power, and efficiency of Google’s data centers using virtual machines.
The demo he showed was a joint effort between Google and the Institute for Systems Biology (ISB). For the first demo, we ported ISB’s genomics research application to run on Google Compute Engine’s Linux virtual machines using a total of 10,000 cores (1,250 8-core VMs). The port required little effort because Google Compute Engine offers an environment similar to ISB’s own cluster.
The demo during the keynote showed a visualization tool that displays a circular representation of a human genome. It allows researchers to visually explore associations between factors such as gene expression, patient attributes, and mutations - a tool that will ultimately help find better ways to cure cancer. The primary computation that the Google Compute Engine cluster performs is the RF-ACE code, a sophisticated machine learning algorithm that learns associations between genomic features, using an input matrix provided by ISB.
When running on the 10,000 cores on Google Compute Engine, a single set of associations can be computed in seconds rather than the ten minutes it takes on ISB’s own cluster of nearly 1,000 cores. The entire computation can be completed in an hour, as opposed to 15 hours.
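As a rough back-of-the-envelope check on these figures, the short Python sketch below derives the core count, the ratio of cores, and the overall speedup. It uses only the rounded numbers quoted in this post (ISB’s cluster is "nearly" 1,000 cores, so the ratios are approximate), and the function names are ours, chosen purely for illustration.

```python
# Rounded figures quoted in the post; treat all results as approximate.
VM_COUNT = 1_250
CORES_PER_VM = 8
ISB_CORES = 1_000          # "nearly 1000 cores", so this is an upper bound
ISB_HOURS = 15             # full computation on ISB's cluster
GCE_HOURS = 1              # full computation on Google Compute Engine


def total_cores(vm_count: int, cores_per_vm: int) -> int:
    """Total cores across a fleet of identically sized VMs."""
    return vm_count * cores_per_vm


def ratio(numerator: float, denominator: float) -> float:
    """Plain ratio, used for both the core ratio and the speedup."""
    return numerator / denominator


gce_cores = total_cores(VM_COUNT, CORES_PER_VM)
core_ratio = ratio(gce_cores, ISB_CORES)      # how many times more cores
speedup = ratio(ISB_HOURS, GCE_HOURS)         # how many times faster overall

print(f"cores on Compute Engine: {gce_cores}")
print(f"core ratio: {core_ratio:.0f}x, wall-clock speedup: {speedup:.0f}x")
```

Because the inputs are rounded and the two clusters use different hardware, these ratios are only a coarse comparison and not a benchmark.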
Urs then went even further and scaled the application to run on 600,000 cores across Google’s global data centers. This uses Compute Engine’s Exacycle technology, for which Google donated 1 billion compute hours last year to promote innovations that can be enabled with this kind of compute power. To use 600,000 cores, the RF-ACE code was modified so that the computation could be split into 600,000 parallel units - around 30,000 targets with 20 permutations each. Exacycle is best suited to this kind of embarrassingly parallel computing problem, which involves a very high number of independent work units, a high CPU-to-I/O ratio, and no inter-process communication.
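To make the decomposition concrete, here is a minimal Python sketch of how a job might be split into independent (target, permutation) work units. It is not the actual RF-ACE or Exacycle code: `work_units`, `count_units`, and `run_unit` are hypothetical names, and `run_unit` is a placeholder for whatever a single RF-ACE run would really do.

```python
from itertools import islice, product

NUM_TARGETS = 30_000       # "around 30,000 targets" in the post
NUM_PERMUTATIONS = 20      # "20 permutations each"


def work_units(num_targets: int, num_permutations: int):
    """Lazily yield one (target_index, permutation_index) pair per unit."""
    return product(range(num_targets), range(num_permutations))


def count_units(num_targets: int, num_permutations: int) -> int:
    """Number of independent units produced by work_units."""
    return num_targets * num_permutations


def run_unit(unit: tuple) -> dict:
    """Hypothetical stand-in for one RF-ACE run.

    It depends only on its own arguments and shares no state with other
    units, which is what makes the problem embarrassingly parallel.
    """
    target, permutation = unit
    return {"target": target, "permutation": permutation}


if __name__ == "__main__":
    print(count_units(NUM_TARGETS, NUM_PERMUTATIONS))
    # Peek at the first few units without materializing the full list.
    for unit in islice(work_units(NUM_TARGETS, NUM_PERMUTATIONS), 3):
        print(run_unit(unit))
```

Since no unit reads another unit’s output, a scheduler is free to hand each pair to any available core in any order, which is the property that lets this kind of workload spread across hundreds of thousands of cores.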
Computing at this scale has the power to fundamentally change our ability to solve some of the most challenging scientific and business problems. This is what the cloud can bring you, and we would love to help you realize your dream. Let us know if you would like to give Google Compute Engine a spin for tens of cores or tens of thousands of cores, or if you have an interesting challenge that can use hundreds of thousands of cores. Also, check out cloud.google.com to learn more about all the other cool products we offer.