Friedrich Schiller University Jena: How Deep Learning is helping to understand life at the molecular level
About Friedrich Schiller University Jena
Friedrich Schiller University Jena is a public research university in Jena, Germany. The university was established in 1558 and is one of the oldest universities in Germany. The research group for cheminformatics and computational metabolomics, headed by Professor Christoph Steinbeck, is based at the university and is dedicated to natural products research.
With Cloud TPUs, researchers at the University of Jena built open-source tools to convert images of chemical structures into a machine-readable format, helping scientists rediscover natural products.
Google Cloud results
- Accelerates training with Cloud TPUs, reducing training times from months to days
- Enables the use of extremely large training sets, improving output accuracy
- Supports research innovation with free access to clusters of Cloud TPUs through the TPU Research Cloud
Models trained with more than 400 million images
Chemical ecology helps us understand exactly how living organisms such as plants, insects, fungi and microbes use chemicals to communicate with one another. Studying chemicals that occur in nature can also help us to fight disease, as up to half of the approved drugs from the past 30 years have been derived from natural substances.
To search for and categorize new naturally occurring chemical substances, scientists need access to research that has already been undertaken. In the past, this has been difficult to do. "Within the field of chemical information management, access to open databases and research has been very fragmented," says Professor Christoph Steinbeck, Professor for Analytical Chemistry, Cheminformatics and Chemometrics, Friedrich Schiller University Jena.
The mission of Prof. Steinbeck and his colleagues is to develop digital tools that encourage greater collaboration, knowledge-sharing, and transparency. This opens up new ways of working with many potential practical applications, from the discovery of life-changing new drugs to agriculture and biotechnology.
"The tools we have developed for imaging chemical structures are the best system of their kind that exist today," says Prof. Steinbeck. "Having better open access resources for chemistry will help everyone who benefits from natural products, including the pharmaceutical industry at one end of the chain, and patients taking the medication at the other."
Inspired by DeepMind's AlphaGo project, which leverages cloud computing and deep learning to tackle the complex and ancient game of Go, Prof. Steinbeck and his colleagues in the Steinbeck research group have developed DECIMER. DECIMER is a web application that detects images of chemical structures within documents, then segments and processes those structures, making them available as computer-readable representations. Using STOUT, another open-source tool, researchers can then automatically assign the correct IUPAC (International Union of Pure and Applied Chemistry) name to existing compounds.
To develop these tools, the Steinbeck research group initially used GPUs within its existing infrastructure to train deep learning algorithms to recognize and model chemical structures. But they soon realized that they needed more computational power to speed up the training. By reaching out to Google Cloud and becoming part of the TPU Research Cloud program, the team was able to accelerate its research and develop its open-source tools faster.
"We're working to make Cheminformatics more open and developing datasets and technologies for public use, to help science grow faster," explains Dr. Kohulan Rajan, a postdoctoral researcher at Friedrich Schiller University Jena. "Working with Google Cloud enables us to both leverage more computational power, and build on innovations such as novel neural network models."
Scaling up to training sets of 100 million images and more
DECIMER began its life as the doctoral research project of Dr. Rajan, who is now undertaking postdoctoral research as part of the Steinbeck Group team. He began his research in 2017, and by 2019 it had become clear that the project required more computational power for the next phase of development. "At a certain point, we needed to scale up to use training set sizes of 100 million images or more," says Dr. Rajan. "Even having invested in specialized GPUs, it would have taken half a year or longer to complete the project using only GPUs."
The department was already working with resources on Google Cloud, and appreciated the intuitive nature of the tools. "We were using Google Cloud for regular research computations, as I found the console very straightforward to work with," says Prof. Steinbeck. Having evaluated the viability of running the project using other cloud providers, the team decided that Google Cloud offered the best resources and availability, as well as the greatest cost-efficiency.
Reducing training times from months to days
Between late 2019 and mid-2020, the team moved all its scripts and code over to TPUs on Google Cloud, with support from the Google Cloud team. "At the time, TPUs were still in Beta stage, so it was great to have support from the experts," recalls Dr. Rajan. Since then, the team has switched to increasingly powerful TPUs as they have become available.
Initially, Dr. Rajan had been training using 15 million images on a single GPU. Switching to the v3-32 TPU type enabled him to train the models 16 times faster, meaning a model that previously would have taken several months to train could be trained in one or two days. He has now trained a model using 400 million images in two months, which would previously have taken more than a year.
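As a rough illustration of the arithmetic behind these figures, the sketch below converts a TPU training time back into an equivalent single-GPU time. It assumes that the reported 16-fold speedup stays constant for the larger 400-million-image run, which the article does not state. The function and constant names are hypothetical and chosen only for this example.

```python
def equivalent_baseline_time(accelerated_time: float, speedup: float) -> float:
    """Time the same job would take on the slower setup.

    Assumes a constant speedup factor, which is a simplification.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return accelerated_time * speedup


# Figures quoted in the article.
SPEEDUP_V3_32 = 16   # "16 times faster" on a v3-32 TPU
TPU_RUN_MONTHS = 2   # 400-million-image model trained in two months

# Assumption: the 16x factor also applies to the larger run.
gpu_months = equivalent_baseline_time(TPU_RUN_MONTHS, SPEEDUP_V3_32)
print(f"Estimated single-GPU time: {gpu_months} months (~{gpu_months / 12:.1f} years)")
```

Under that assumption, the estimate comes to 32 months, which is consistent with the statement that the run would previously have taken more than a year.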
Being able to feed the models with hundreds of millions of images has led to improved accuracy, meaning the DECIMER segmentation tool is now able to recognize and segment chemical structure depictions within printed literature with more than 90% accuracy, while the DECIMER image transformer is up to 96% accurate.
"The ability to employ extremely large data training volumes revolutionized what we were able to achieve," says Prof. Achim Zielesny, Professor of Chemistry, Chemoinformatics and Bioinformatics at the Westphalian University of Applied Sciences in Recklinghausen, Germany. "Thanks to TPUs and Dr Rajan's hard work, we are perhaps five years ahead of where we expected to be with the project."
Delivering tools that can help scientists find new drug candidates
The tools provided by the Steinbeck group are enabling scientists to work with chemical structures in new and innovative ways, such as identifying existing structures based on hand-drawn molecules. The information gained from these tools can also be fed into open access databases such as COCONUT, another of the group's projects, which is one of the biggest annotated databases of natural products available free of charge. Taken together, the goal is to transform chemical information management for the benefit of all.
"Working with Google Cloud enables us to work right at the cutting-edge, with early access to new tools and project support through free access to clusters of TPU devices," says Prof. Zielesny. "The plan now is to explore new areas of science where equally large datasets can open up new opportunities."