Pathology digitization and the fight against cancer
Dr. Karen DeSalvo
Chief Health Officer, Google
Everyone who has experienced a loved one’s cancer diagnosis knows how important it is to help patients get the best treatment, faster. Cancer is a leading cause of death, killing nearly 10 million people globally each year. But if caught early, and treated appropriately, many people can recover. We’re proud to have worked in this area for many years, applying our technology and tools to fight cancer, in partnership with leading public and private sector organizations.
In 2018, we published a peer-reviewed paper in the Archives of Pathology & Laboratory Medicine about how we applied deep learning to improve breast cancer diagnostic accuracy, based on gigapixel-sized pathology slides of lymph nodes from de-identified patients. In 2019, we published a paper in Nature Medicine about using advances in 3D volumetric modeling and artificial intelligence (AI) to better predict lung cancer. And, in 2021, we announced a new clinical research study exploring whether AI models can reduce the time to diagnosis for breast cancer patients, narrowing the assessment gap and improving the patient experience.
This is hard, important work. Because cancer is complex and diverse, building accurate AI models requires large, diverse, and high-quality datasets, such as images of de-identified pathology slides. Digitized pathology slides can play a critical role in building machine learning (ML) and AI models, making it possible for researchers to test and compare the power of different diagnostic approaches. More importantly, they allow researchers to build models that reflect diverse patient populations and include multiple types of cancer.
Google’s work on government pathology datasets
We’re honored to have partnered with Naval Medical Center San Diego (NMCSD), Marine Corps Base Camp Pendleton, and U.S. Navy Medicine Readiness and Training Command Guam (U.S. NMRTC Guam) to manually clean, catalog, and scan pathology slides, effectively digitizing them to make them more helpful in supporting cancer research. All of this was done through a CRADA, or Cooperative Research and Development Agreement, a common framework for research collaboration between government agencies and companies.
Google paid for digitizing these slides and making them available for research, and the CRADA for NMCSD (and other Department of Defense facilities) provided the government joint ownership of any inventions made collaboratively. Once the data was digitized, NMCSD had the option of paying cloud storage fees to Google (or any other cloud provider). In other words, the government and the research community at large were (and still are) the primary beneficiaries of this critical effort.
It’s also important to note that all of our work with NMCSD, and any project involving data in our cloud, is subject to strict controls to maintain patient privacy and security. We work with de-identified patient data and use strong encryption. Our customers own their data, and we cannot, and do not, use it for any purpose other than those they have explicitly agreed to. (These principles are outlined in our Privacy Resource Center.)
Google’s involvement with the Joint Pathology Center biorepository
Established in 1862, the Armed Forces Institute of Pathology (AFIP) has amassed the world’s largest collection of human pathologic specimens. Samples from the AFIP were instrumental in solving many public health challenges, like sequencing the genome of the 1918 influenza virus. In 2005, AFIP’s biorepository was transferred to the newly created Joint Pathology Center (JPC). During the transition, the DoD asked the U.S. Institute of Medicine (now called the National Academy of Medicine) to provide advice on operating the biorepository. The resulting study recommended digitizing the archives immediately to ensure that the repository did not continue to degrade and could be made accessible to researchers for years to come.
In March 2020, the Defense Innovation Board, a group of industry experts that serves the Secretary of Defense, wrote a report outlining the urgency of creating a fully digitized repository of slides for the JPC and making recommendations on how to achieve this goal. Two other organizations—the DoD Joint AI Center and the Defense Innovation Unit—then released a solicitation seeking industry help. They canvassed the country, using an open buying process, to identify and contract with vendors who could build algorithms and digitize slides. Google was ultimately selected to develop algorithms to detect four types of cancer and digitize at scale, using the proven techniques we had developed previously with NMCSD and NMRTC Guam.
The JPC, however, pursued its own process. This started with the organization putting forth an open call to vendors who could assist with the digitization. But then, in August 2020, JPC went forward with a non-competitive, non-publicly-sourced contract. We’re obviously disappointed we weren’t given the opportunity to compete for this important project, but we continue to this day to offer our help and collaboration for this and other digitization initiatives.
Helping billions of people live healthier lives
We remain optimistic that if the repository can be digitized, leveraged to its full potential, and used by researchers and clinicians, it would advance diagnosis and treatment for thousands of illnesses (including cancer), saving American lives, including those of our service members. And, if done quickly and effectively, digitizing the repository would be a highly effective use of taxpayer dollars, providing enormous downstream benefits for the U.S. healthcare system. We stand ready to assist.
Google supports a number of projects designed to help people live healthier lives. We are proud of the work we have done in partnership with many public and private organizations, to democratize access to early diagnostics and treatments for service members, their families, and communities around the world.