Innoplexus: relying on GCP to capture and analyse life sciences data

About Innoplexus

Founded in 2011 and headquartered in Germany with offices in India, Innoplexus AG provides artificial intelligence (AI) systems that produce intelligence and insights for businesses such as life science companies.

Industries: Technology
Location: India

Leveraging Google Cloud Platform, Innoplexus has built a platform that enables faster decision making for the pre-clinical, clinical, regulatory and commercial stages of drug development.

Google Cloud Results

  • Reduced costs by 80% relative to a hosted physical infrastructure and 20% relative to another cloud service.
  • Increased capacity to crawl public life sciences content from 1,000 pages per second to 20,000 pages per second.
  • Accelerated training of information extraction models by a factor of 20.
  • Supported an eightfold increase in scalability.

Founded in 2011 and headquartered in Germany, with offices in India, Innoplexus AG offers Data as a Service (DaaS) and Continuous Analytics as a Service (CAaS) products. These products leverage Artificial Intelligence (AI) and advanced analytics to significantly reduce drug development time, from synthesis to approval.

Reduced costs by 80% and increased content capture and processing capability by a factor of 20

Innoplexus provides a platform called iPlexus that gives global life sciences and pharmaceutical organisations access to relevant data, real-time intelligence and intuitive insights. These insights span the pre-clinical, clinical, regulatory and commercial stages of drug development.

iPlexus automates the collection, curation, aggregation and analysis of billions of data points from thousands of data sources, using machine learning, network analysis, ontologies, computer vision and entity normalisation.

“When we started looking at market opportunities, we realised that many businesses were not aware of the volume of data available in the public domain, or were spending heavily on manually developed and manually curated products,” says Gaurav Tripathi, Co-founder and Chief Technology Officer, Innoplexus. “They needed access to an automated system that could manage data volumes at scale and apply AI to capture relevant information.”

Over two years, Tripathi and his fellow co-founder, Innoplexus Chief Executive Officer Dr. Gunjan Bhardwaj, experimented with product development and consulted with C-level executives across a range of industries. By the end of the second year, the pair’s activities had sparked interest from leaders in the pharmaceutical industry, and they decided to focus on the life sciences sector.

“That’s when we started developing the concept that is now iPlexus and looking at the structured and unstructured data that was available,” Tripathi says. “Rather than build on existing work, we had to invest time and effort in inventing automated ways of curating life sciences data from sources such as publications and publication abstracts, PDFs and web pages.” This process involved connecting all the concepts and categories in life sciences content and developing systems and ontologies to automate those connections.

“One important point about data from life sciences is that in many cases it’s very dense,” Tripathi says. “For example, a single sentence in an extract or publication may be backgrounded by 10 papers and 20 years of research. Furthermore, that sentence does not stand alone within the publication or extract itself and has different meanings to people working in different areas in life sciences.”

Innoplexus started working with a traditional hosting provider, but as data volumes exploded from the 10TB used during initial experiments to 200TB or more, the business had to look for a more scalable platform. “Another key requirement was the platform was ‘always on’ and flexible enough that we could vary the resources we were using based on the type of task we were undertaking,” Tripathi says. Furthermore, the platform had to help Innoplexus reduce the cost and time of ‘crawling’ public life science information sources.

After experimenting with public cloud services, Innoplexus opted to deploy iPlexus on Google Cloud Platform (GCP). “The types of services Google offers with GCP, such as the Google BigQuery analytics data warehouse, are not available elsewhere,” Tripathi says. Google BigQuery’s ease of use enabled Tripathi’s team to start working with the product in just one day, and the team also reported that it found the interfaces to GCP products simple and intuitive. Innoplexus moved to GCP in 2016 and now runs 90% of its workflow and data load on the Google platform.

iPlexus now crawls, aggregates, analyses and visualises data in a range of formats and structures from multiple regions and providers. All ‘crawled’ data passes through an information extraction pipeline that converts semi-structured and unstructured sources into a structured format, using natural language processing, computer vision, machine learning and deep learning tools. All the information extraction models are implemented using TensorFlow and the Keras open source neural network library, and are trained using Google Cloud Machine Learning Engine before being deployed. Google Cloud Dataflow is used for batch processing, and Google Kubernetes Engine runs iPlexus applications in containers.
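
The extraction step described above turns free text into structured records. As a rough illustration only, the Python sketch below swaps Innoplexus’ TensorFlow/Keras models for a much simpler technique, dictionary-based entity normalisation: it looks up surface forms from a tiny, hypothetical ontology (`ONTOLOGY`, with made-up concept IDs such as `DRUG:0001`) and maps each mention to a canonical concept. The names `Mention`, `extract_mentions` and `to_record` are likewise hypothetical; nothing here reflects the actual iPlexus code.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical miniature ontology: surface forms (synonyms) mapped to a
# made-up canonical concept ID and a concept type. Real ontologies are far larger.
ONTOLOGY = {
    "acetylsalicylic acid": ("DRUG:0001", "drug"),
    "aspirin": ("DRUG:0001", "drug"),
    "myocardial infarction": ("DISEASE:0001", "disease"),
    "heart attack": ("DISEASE:0001", "disease"),
}


@dataclass(frozen=True)
class Mention:
    text: str          # the span as it appears in the sentence
    concept_id: str    # normalised (canonical) concept ID
    concept_type: str
    start: int
    end: int


def extract_mentions(sentence: str) -> list[Mention]:
    """Find ontology terms in a sentence and normalise them to concept IDs."""
    lowered = sentence.lower()
    mentions: list[Mention] = []
    # Try longer surface forms first so multi-word terms win over substrings.
    for term in sorted(ONTOLOGY, key=len, reverse=True):
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            start, end = match.span()
            # Skip spans that overlap a longer term already matched.
            if any(m.start < end and start < m.end for m in mentions):
                continue
            concept_id, concept_type = ONTOLOGY[term]
            mentions.append(
                Mention(sentence[start:end], concept_id, concept_type, start, end)
            )
    return sorted(mentions, key=lambda m: m.start)


def to_record(sentence: str) -> dict:
    """Turn one unstructured sentence into a structured record."""
    mentions = extract_mentions(sentence)
    return {
        "sentence": sentence,
        "concepts": sorted({m.concept_id for m in mentions}),
        "mentions": [(m.text, m.concept_id) for m in mentions],
    }


if __name__ == "__main__":
    print(to_record("Aspirin lowers the risk of a second heart attack."))
```

Because “aspirin” and “acetylsalicylic acid” map to the same ID, both surface forms collapse into one concept in the `concepts` field. A production system would replace the lookup with trained models and a much larger ontology; the sketch only shows the shape of the transformation, text in and normalised concept IDs out.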

“Google helped us enormously during the migration process by reviewing and making suggestions that would optimise our architecture,” Tripathi says. “For example, we wanted to minimise the costs related to data movement and results. The Google team explained how Google Cloud Bigtable could help us do that.”

A 20-fold increase in pages crawled per second

GCP has met all of Innoplexus’ key business and technical requirements for iPlexus. “Myself and my team are very pleased with Google’s reliability,” Tripathi says. “We have also created an architecture that scaled easily and enabled us to increase the number of pages crawled per second from 1,000 to 20,000.” Furthermore, Google Cloud Machine Learning Engine has reduced the time needed to ‘train’ models to extract the right information by a factor of 20 compared to similar services.
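
To put the crawl rates in perspective, here is a back-of-the-envelope conversion to daily volume. It assumes, purely for illustration, that each rate is sustained for a full 24-hour day (86,400 seconds); the case study does not say how long peak rates are held.

```latex
\begin{aligned}
1{,}000~\tfrac{\text{pages}}{\text{s}} \times 86{,}400~\tfrac{\text{s}}{\text{day}} &= 8.64 \times 10^{7}~\tfrac{\text{pages}}{\text{day}} \\
20{,}000~\tfrac{\text{pages}}{\text{s}} \times 86{,}400~\tfrac{\text{s}}{\text{day}} &= 1.728 \times 10^{9}~\tfrac{\text{pages}}{\text{day}}
\end{aligned}
```

Under that assumption, the higher rate corresponds to roughly 1.7 billion pages a day, against about 86 million before.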

Costs reduced by 80%

Using GCP has enabled Innoplexus to reduce its costs by 80% compared with hosted physical infrastructure, and by 20% compared with other cloud providers. “Google BigQuery alone enables us to process a terabyte of data in seconds at a cost up to a hundred orders of magnitude lower than performing the same operation on regular instances in the cloud,” Tripathi says.
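
The two cost figures can be related to each other. Write C_phys, C_GCP and C_other for the cost of the same workload on hosted physical infrastructure, on GCP and on the other cloud provider; treating both percentages as applying to the same workload is an assumption made for illustration.

```latex
\begin{aligned}
C_{\text{GCP}} &= (1 - 0.80)\,C_{\text{phys}} = 0.20\,C_{\text{phys}} \\
C_{\text{GCP}} &= (1 - 0.20)\,C_{\text{other}}
  \;\Rightarrow\; C_{\text{other}} = \frac{0.20\,C_{\text{phys}}}{0.80} = 0.25\,C_{\text{phys}}
\end{aligned}
```

Under that assumption, the other cloud service was already 75% cheaper than physical infrastructure, and GCP took a further 20% off that figure.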

The iPlexus platform running on GCP now generates continuous intelligence and insights across discovery, clinical development and the regulatory and commercial stages of drug development. It includes modules for competitive intelligence, clinical intelligence, regulatory intelligence, and gene and intervention landscapes. Researchers can minimise the risk of missing information that may impact their work.

Innoplexus’ use of Google products extends beyond GCP to G Suite. “It is mandatory for all our team members to retain all their documents on Google Drive. If they experience a hardware malfunction, no data is lost. We can simply give them a new notebook and they can get on with it,” Tripathi says. The company also runs Google Hangouts groups for its workforce as a whole, as well as for individual teams. “This enables us to build a ‘one big happy family’ culture at Innoplexus and enables our teams to collaborate effectively,” he adds.

In future, Tripathi and his colleagues see GCP as being extremely helpful in making Innoplexus’ products more efficient and cost effective. “Google has been pushing hard into deep learning and making powerful tools and technologies available on GCP,” says Dr. Sven Niedner, Chief Operations Officer, Innoplexus. “We really appreciate the stability and scalability of the GCP platform—as a fast-growing startup, we can scale our platform up and down in minutes without any worries.”
