American Cancer Society: Analyzing breast cancer images faster and better with machine learning

About American Cancer Society

The American Cancer Society is on a mission to free the world from cancer by funding and conducting research, sharing expert information, supporting patients, and spreading the word about prevention.

Industries: Life Sciences
Location: United States

About Slalom

Based in Seattle, Slalom is a Google Cloud Premier Partner and a Specialization Partner in Data Analytics. Slalom is one of a select few Google Cloud Partners in North America that hold over 100 Google Cloud certifications.

To identify novel patterns in digital pathology images, the American Cancer Society partnered with Slalom and used Cloud ML Engine on Google Cloud Platform to improve the timeliness and accuracy of the analysis.

Google Cloud Results

  • Identifies patterns in digital images of breast cancer tissues to potentially improve patient outcomes
  • Enhances quality and accuracy of image analysis by removing human limitations, fatigue, and bias
  • Protects valuable tissue samples by backing up image data to the cloud
  • Provides a reliable and scalable platform for future image analysis

12x faster image analysis with machine learning

Cancer is the second most common cause of death in the United States, accounting for nearly one in four deaths. Among women, breast cancer is the most commonly diagnosed type of cancer and the second leading cause of cancer death in the United States. If detected early, breast cancer is one of the most survivable cancers: the five- and ten-year relative survival rates for women with invasive breast cancer are 90 percent and 83 percent, respectively. However, some molecular subtypes of breast cancer have a poor prognosis and there is limited understanding of these subtypes.

Since 1992, the American Cancer Society has conducted the Cancer Prevention Study-II (CPS-II) Nutrition cohort, a prospective study of more than 188,000 American men and women. CPS-II provides valuable information for researchers to explore how factors such as height, weight, demographic characteristics, personal and family history, use of medicines and vitamins, occupational exposures, dietary habits, alcohol and tobacco use, and reproductive history can affect cancer etiology and prognosis.

The CPS-II Nutrition cohort provides a rich resource for researchers studying cancer. Mia M. Gaudet, PhD, is Scientific Director of Epidemiology Research at the American Cancer Society, and her research is focused on breast cancer. For approximately 1,700 CPS-II participants diagnosed with breast cancer, she was able to obtain medical records and surgical tissue samples, giving her valuable data to help answer pressing questions: What lifestyle, medical, and genetic factors are related to molecular subtypes of breast cancer? Do different features in the breast cancer tissue translate to a better survival rate?

Mining high-resolution image data

Initially, Dr. Gaudet faced technical challenges in analyzing the high-resolution tissue images because of their uncompressed, proprietary format. She was also concerned that even if the images could be converted into a usable format, a team of highly trained pathologists would be required to spot novel patterns in the data. Even if such a team were available and affordable, it would take years to analyze all the images, and the results would inevitably be subject to human fatigue and bias. Some patterns might not even be detectable by humans, potentially decreasing the value of the study.

To solve these problems, Dr. Gaudet partnered with Slalom, a Google Cloud Premier Partner with experience applying machine learning (ML) to digital images. Slalom recommended converting the images to TIFF format and then running ML models on Google Cloud Platform (GCP). The models use an unsupervised form of deep learning, in which algorithms learn from unlabeled images, measure the accuracy of their own predictions, and adjust without an engineer stepping in. Recognizing that the study has the potential to advance medical science, Google signed on to help fund the project.

"Slalom worked with us to determine that Google Cloud Platform would be the best place to store and analyze our digital pathology images," says Dr. Gaudet. "They were patient, gracious, and brilliant throughout the process."

Adds Michelle Yi, Solution Principal at Slalom: "Google Cloud brings a lot of advantages for image analysis with its strong AI and ML capabilities, including accuracy, scale, expertise, ease of use, and data security. Also, Google is committed to open source technologies, so the American Cancer Society can always use the latest tools and technologies."

Building an end-to-end ML pipeline

Consistent preprocessing was critical. All 1,700 images needed to be converted in the same way, with colors normalized: reducing color variance standardized how colors are interpreted across images. Every image was also broken into evenly sized tiles, both to distribute the workload and to optimize the data structure required to train the models. To this end, Slalom built an end-to-end machine learning pipeline on GCP, including preprocessing, feature engineering, and clustering. Slalom used Cloud Machine Learning Engine (Cloud ML Engine) for model training and batch prediction, Cloud Storage to store the images, and Compute Engine to orchestrate image conversion and to start training and prediction jobs on Cloud ML Engine in the correct sequence.
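
The two preprocessing steps described above, color normalization and tiling, can be sketched in a few lines. This is a minimal illustration and not Slalom's implementation: the source does not say which normalization method or tile size was used, so the per-channel mean-and-spread matching, the function names `normalize_colors` and `split_into_tiles`, the target values, and the white edge padding are all assumptions. The sketch uses only the Python standard library and treats an image as nested lists of RGB tuples.

```python
from statistics import mean, pstdev


def normalize_colors(image, target_mean=128.0, target_std=40.0):
    """Shift and scale each RGB channel to a shared target mean and spread.

    Assumed method: simple per-channel standardization, chosen for
    illustration only.
    """
    flat = [px for row in image for px in row]
    stats = []
    for c in range(3):
        values = [px[c] for px in flat]
        # Fall back to 1.0 so a constant channel does not divide by zero.
        stats.append((mean(values), pstdev(values) or 1.0))

    def fix(px):
        # Rescale each channel, then clip to the valid 0-255 range.
        return tuple(
            min(255.0, max(0.0, (px[c] - m) / s * target_std + target_mean))
            for c, (m, s) in enumerate(stats)
        )

    return [[fix(px) for px in row] for row in image]


def split_into_tiles(image, tile_size, pad=(255.0, 255.0, 255.0)):
    """Cut an image into tile_size x tile_size tiles keyed by (row, col).

    Tiles that run past the image edge are padded (white is an assumption)
    so that every tile has the same size.
    """
    height, width = len(image), len(image[0])
    tiles = {}
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            tiles[(top // tile_size, left // tile_size)] = [
                [
                    image[y][x] if y < height and x < width else pad
                    for x in range(left, left + tile_size)
                ]
                for y in range(top, top + tile_size)
            ]
    return tiles
```

Because every tile has the same shape and is keyed by its grid position, tiles can be processed independently and their results mapped back to a location in the original image.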

After initial conversion and preprocessing, Slalom created an autoencoder model, using Keras with a TensorFlow backend for prototyping. It then used distributed training on Cloud ML Engine to convert the images into feature vectors that represent patterns in the images as sequences of numbers. The features were then clustered with TensorFlow, once again on Cloud ML Engine. The result is a set of cluster assignments, one for each tile in an image, that the American Cancer Society plans to use in follow-up analyses.
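
The clustering step can be illustrated with a small sketch. The source says only that the features were clustered with TensorFlow and does not name the algorithm, so k-means (Lloyd's algorithm) is an assumption here, written in plain Python rather than TensorFlow to stay self-contained. The function name `kmeans`, its parameters, and the toy feature vectors are hypothetical; in the pipeline described above, the vectors would come from the autoencoder.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Assign each feature vector to one of k clusters (assumed: k-means)."""
    rng = random.Random(seed)
    # Start from k distinct input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Toy stand-ins for per-tile feature vectors (hypothetical values).
tile_features = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
assignments, centroids = kmeans(tile_features, k=2)
```

The returned `assignments` list holds one cluster label per input vector, which mirrors the "one for each tile" output described above.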

Using Cloud ML Engine allowed Slalom to achieve fast time to market for a task that might not even have been possible with local infrastructure. Image conversion alone would have been exceedingly difficult and time-consuming for 1,700 images with file sizes up to 10 GB, not to mention deep learning at scale.

"Cloud ML Engine allows us to achieve supercomputing scale to solve customers' problems without wasting time on operational tasks like infrastructure provisioning and neural network tuning, reducing model training time to just a few hours," says Jake Evans, Consultant at Slalom. "It really set us up for success."

Faster, more consistent image analysis

The Slalom team completed the entire project on GCP in just three months, first applying deep learning to a sample set and then scaling and distributing models across the full set of images. In addition to faster analysis, the American Cancer Society benefits from a higher level of consistency and objectivity that only machines can provide.

"If we hadn't taken a machine learning approach, it would have taken us three years instead of three months to analyze 1,700 tissue samples, even with a team of dedicated pathologists," says Dr. Gaudet. "And because people get tired and bring their own bias to any analysis, we're also achieving better consistency and quality."

Exploring new treatment options

The analysis found what Dr. Gaudet was hoping for: potentially impactful patterns in the cancer tissue images, even at lower magnifications. By analyzing these patterns, she hopes to relate the data to breast cancer survival and risk factors associated with the tissue patterns.

"By leveraging Cloud ML Engine to analyze cancer images, we're gaining more understanding of the complexity of breast tumor tissue and how known risk factors lead to certain patterns," she says. "Our results might provide clinicians with more information to enable additional research that could translate to different treatment options."

A platform for future research

The American Cancer Society is now equipped with processes and a cloud infrastructure that will be reusable on similar projects, providing a foundation for future work. It is now in the data collection phase for CPS-3, a study that will build on the knowledge gained in CPS-II to help researchers make greater strides against breast cancer. Google Cloud will provide a reliable and scalable platform for future image analysis, with the additional benefit of protecting valuable tissue samples and data in the cloud.

"The ability to perform image analysis via deep learning for epidemiologic breast cancer studies opens a new frontier of research," says Dr. Gaudet. "Applying digital image analysis to human pathology may reveal new insights into the biology of breast cancer, and Google Cloud makes it easier. We're excited about what we'll find."
