Google Cloud enables the National Cancer Institute's Institute for Systems Biology-Cancer Gateway in the Cloud to support breast cancer research with fast and secure data sharing
Joe Miles
Managing Director, Global HCLS Industry Solutions, Google Cloud
Research organizations today face the challenge of sifting through siloed datasets, then analyzing and sharing this data with the global research community, all while staying secure and compliant with a range of national and international standards.
It is precisely these constraints that led the U.S. National Cancer Institute (NCI) to create Cloud Resources, components of the NCI Cancer Research Data Commons that allow scientists to analyze cancer datasets in a cloud environment rather than downloading the data and processing it on custom hardware. One of these resources is the Institute for Systems Biology-Cancer Gateway in the Cloud (ISB-CGC).
ISB-CGC relies on Google Cloud to securely host terabytes of genomic and proteomic data, and provide flexible and scalable analytics tools that can be integrated into research models. Complex computations that traditionally required days to complete are now executed in just minutes or hours. And ISB-CGC can now deliver open data, compute, and analytics resources to the global research community.
Enabling faster time-to-discovery
Speed and scale can make all the difference when it comes to potentially life-saving research. Take breast cancer, for example. It's the world's most prevalent cancer, and according to the World Health Organization, more than two million women were diagnosed with it in 2020 alone. With so many women affected, each with unique biological features and a personal path through the disease, breast cancer research is particularly data intensive. Processing this data on-premises is too slow and too expensive, and that burden ultimately falls on patients.
By working with Google, ISB-CGC has not only made data more useful to cancer researchers around the world, but has also fundamentally changed how cancer investigators conduct research. BigQuery, Google Cloud's highly scalable multicloud data warehouse, underpins the cloud-based platform that connects researchers to a wide collection of cancer datasets, as well as to the analytical and computational infrastructure needed to analyze that data quickly.
“We are spreading the message of the cost-effectiveness of the cloud,” said Dr. Kawther Abdilleh, lead bioinformatics scientist at General Dynamics Information Technology, a partner of ISB. “With Google Cloud’s BigQuery, we’ve successfully demonstrated that researchers can inexpensively analyze large amounts of data, and do so faster than ever before.”
Integrating diverse tools and datasets
Traditionally, researchers have downloaded source data and performed analysis locally on their personal machines using programming languages like R and Python. As the volume and complexity of cancer data have grown, this approach has become unsustainable. With Google Cloud services such as Notebooks and the BigQuery application programming interfaces (APIs), researchers can now apply their preferred methods to data on the ISB-CGC platform directly in the cloud, without needing to download it.
For example, in their September 2020 paper on data integration and analysis in the cloud, Dr. Abdilleh and Dr. Boris Aguilar, a senior research scientist at ISB, demonstrated how cloud-based data analysis can be used to identify novel biological associations between clinical and molecular features of breast cancer.
“Google’s AI platform, for example, allows us to easily create notebooks to use R or Python in combination with BigQuery or machine learning to perform large-scale statistical analysis of genomic data, all in the cloud,” Aguilar wrote. “This type of analysis is particularly effective when the data is large and heterogeneous, which is the case for cancer-related data.”
Drs. Abdilleh and Aguilar developed a set of BigQuery user-defined functions (UDFs) that perform statistical tests, giving a more holistic picture of breast cancer. Running these tests directly on the massive data stored in BigQuery, rather than in an on-premises program at a later stage of the analysis workflow, saved a significant amount of time. In fact, with UDFs in BigQuery, analyses that typically required supercomputers and days of computation were completed in minutes.
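To illustrate why running a statistical test where the data lives can save so much time, here is a minimal Python sketch. It is not the authors' UDFs; it assumes a Pearson correlation as a stand-in test and shows how that test reduces to a handful of aggregates (a count, sums, sums of squares, and a sum of cross-products) that a SQL engine could compute over the full table in one pass. Only those six numbers then need to leave the warehouse. The class and function names (`PearsonAggregates`, `pearson_r`, `t_statistic`, `aggregate`) and the toy "shards" are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PearsonAggregates:
    """Hypothetical container for Pearson sufficient statistics.

    Each field mirrors an aggregate a SQL engine could compute in one pass,
    e.g. COUNT(*), SUM(x), SUM(x*x), SUM(x*y).
    """
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    syy: float = 0.0
    sxy: float = 0.0

    def add(self, x: float, y: float) -> None:
        """Fold one (x, y) observation into the running sums."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def merge(self, other: "PearsonAggregates") -> "PearsonAggregates":
        """Combine partial sums from two shards; sums are associative."""
        return PearsonAggregates(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.syy + other.syy,
            self.sxy + other.sxy,
        )


def aggregate(pairs) -> PearsonAggregates:
    """Single pass over (x, y) pairs, as a warehouse worker might do."""
    acc = PearsonAggregates()
    for x, y in pairs:
        acc.add(x, y)
    return acc


def pearson_r(a: PearsonAggregates) -> float:
    """Pearson correlation computed only from the aggregates.

    This textbook formula can lose precision when values are large relative
    to their spread; production code would use a numerically stable update.
    """
    num = a.n * a.sxy - a.sx * a.sy
    den = math.sqrt((a.n * a.sxx - a.sx ** 2) * (a.n * a.syy - a.sy ** 2))
    return num / den


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    # Toy data split across two hypothetical shards.
    shard_a = [(1.0, 2.0), (2.0, 4.0)]
    shard_b = [(3.0, 5.0), (4.0, 4.0), (5.0, 5.0)]
    combined = aggregate(shard_a).merge(aggregate(shard_b))
    r = pearson_r(combined)
    print(f"n={combined.n} r={r:.4f} t={t_statistic(r, combined.n):.4f}")
```

Because the partial sums can be merged in any order, many workers can each summarize their slice of the table in parallel, which is one reason this style of computation scales well inside a warehouse. For this particular statistic, BigQuery also provides a built-in `CORR` aggregate function.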
Drs. Abdilleh and Aguilar have now made their UDFs available for use by the broader research community via BigQuery, opening doors for fellow breast cancer researchers to build on this progress and make strides in their life-saving work.
Global access to critical cancer data
With so many lives and families affected by cancer, and researchers worldwide diligently seeking answers, accelerating and improving how cancer research is conducted is critical. ISB-CGC's success using Google Cloud as the foundation of its infrastructure and data cloud strategy has given the cancer research community real-time, secure access to data that plays a significant role in the early detection of cancer. Read the case study for more detail on how Google Cloud is supporting breast cancer research.