InsideDNA powers on-demand analysis and method-sharing service with Google Cloud Platform
InsideDNA founders Anna Kostikova and Andrey Khmelevskiy aimed to build a service that would make genome science more reproducible, provide reliable access to bioinformatics tools for biologists, and facilitate collaboration across labs and institutes. They built their service on Google Cloud Platform for the best performance and most cost-effective infrastructure.
Building a platform for better science
Computational biologist Anna Kostikova had a problem: her blog on phylogenetics and comparative methods was generating too many follow-up queries from biologists who wanted to use her scripts and analysis pipelines but lacked the IT infrastructure to run them. Andrey Khmelevskiy, a computer scientist and engineer, suggested building a platform that would provide on-demand access to fully functional scripts and data published on the blog.
That concept led to InsideDNA, an effort to create an online environment that would help scientists collaborate, analyze data and make their results more transparent. “The more we worked on our idea, the more we found out these are problems that every bioinformatician or biologist experiences,” Khmelevskiy says. Launched in 2015, the service began with its analysis offering: a portfolio of more than 1,000 open-source bioinformatics tools that users could access and instantly run via web form or command line.
The other two parts of InsideDNA are services for tool publishing and method sharing. The tool publishing service allows bioinformaticians to quickly share new algorithms (tools, scripts and software), and it’s what the founders used to publish hundreds of tools already included on the site.
The sharing service enables users to publish an analysis as an executable bundle under a permanent web URL. The bundle contains a tool, its settings and the datasets it runs on. When accessed via its URL, the bundle lets a user instantly re-run the entire analysis. The sharing service aims to ensure that anyone can access and repeat a published analysis consistently, regardless of operating system, tool version, or infrastructure access. Kostikova says this will help to address the ongoing challenge of reproducible research and allow scientists to step through an analysis exactly as it was run at the time of publication. “It’s so painful for researchers to try to repeat what has been done, with data scattered everywhere, poorly documented method sections in a research article, and limited computational power,” she says.
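The idea of an executable bundle can be illustrated with a minimal Python sketch. This is not InsideDNA's implementation, which the article does not describe. The class name AnalysisBundle, its fields, the content-hash identifier and the example.org base URL are all assumptions made for illustration. The sketch shows one plausible way to tie a permanent URL to the exact tool, settings and datasets of an analysis, so that any change to the inputs yields a different URL.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnalysisBundle:
    """Hypothetical model of a shareable analysis: a tool, its settings, its datasets."""
    tool: str                       # tool name (illustrative)
    tool_version: str               # pinned version, so re-runs use the same code
    settings: dict = field(default_factory=dict)   # tool parameters
    dataset_checksums: tuple = ()   # checksums identifying the input datasets

    def bundle_id(self) -> str:
        # Serialize the contents canonically (sorted keys, sorted datasets)
        # so that identical analyses always produce the same identifier.
        payload = json.dumps(
            {
                "tool": self.tool,
                "tool_version": self.tool_version,
                "settings": self.settings,
                "datasets": sorted(self.dataset_checksums),
            },
            sort_keys=True,
            separators=(",", ":"),
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]

    def permanent_url(self, base: str = "https://example.org/bundles") -> str:
        # The base URL is a placeholder, not a real InsideDNA endpoint.
        return f"{base}/{self.bundle_id()}"


# Two descriptions of the same analysis, with settings given in a different order.
a = AnalysisBundle("example-aligner", "1.0", {"threads": 4, "mode": "fast"}, ("abc", "def"))
b = AnalysisBundle("example-aligner", "1.0", {"mode": "fast", "threads": 4}, ("def", "abc"))
# The same analysis with a different tool version.
c = AnalysisBundle("example-aligner", "1.1", {"threads": 4, "mode": "fast"}, ("abc", "def"))

print(a.permanent_url())
print(a.bundle_id() == b.bundle_id())  # same contents, same identifier
print(a.bundle_id() == c.bundle_id())  # different version, different identifier
```

Deriving the identifier from the bundle's contents is only one design option. A service could equally assign opaque identifiers and store the bundle contents immutably behind them. Either way, the URL keeps pointing at the analysis exactly as it was published.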
As the vision for InsideDNA grew, so did the resources it needed to thrive. Kostikova and Khmelevskiy realized the only way to achieve their goal was to build the service on a cloud computing foundation for ultimate scalability.
‘The heart of our service’
The duo pitched their idea to Google and were given the opportunity to test the InsideDNA prototype on its cloud environment. The Google team also gave them technical support along the way. When Kostikova and Khmelevskiy evaluated other options, such as Microsoft’s Azure and Amazon Web Services, they found that GCP offered the best performance at the lowest cost.
“We wouldn’t be able to do this without cloud computing,” Khmelevskiy adds. “The way we manage this content, the cloud is the heart of our service.”
InsideDNA has been well received by the community, even in its first few months of operation. By its six-month mark, more than 1,000 people had used the analysis offering, and some had already published or submitted scientific papers that include a permanent URL. For example, together with the research group of Nadir Alvarez at the University of Lausanne, InsideDNA has published a novel reduced-representation genome sequencing method in PLoS One, in an article featuring its permanent URLs. In addition, 5,000 users subscribe to the company’s bioinformatics tutorial service. And among the tools included in the InsideDNA repository are several built on the Google Genomics API.
The founders say that scientists switch to InsideDNA because of GCP’s simplicity. “We have a number of users who switched from powerful clusters in their university to our service just because it’s quicker and more scalable,” Khmelevskiy says. For others who never had access to heavy-duty local compute resources before, InsideDNA and its link to GCP provide an affordable route to speedy, high-powered computational infrastructure that can be run whenever needed.
Converting skeptics with cloud scalability
The real proof of InsideDNA’s success is in winning converts from former skeptics, says Khmelevskiy, who notes that telling people the service is built on GCP has been important in convincing them that the concept is practical. “A number of people said this wasn’t even possible,” he adds. “Now those people are asking for extra credits to use our service.”
Cost is a major consideration among scientists, especially for those starting new labs. With the affordability of cloud computing, Kostikova says, researchers realize that they can get a lot more computing time and processing power for less money and hassle than it would take to buy and maintain their own server or cluster.
Beyond the benefits its founders initially envisioned, InsideDNA has been a boon to team leaders, who can use it to organize and track research projects. They can easily add new collaborators to any project and monitor analyses, member activities and cloud usage. Within projects, the platform lets members share data, tools and analysis pipelines, reducing data duplication and promoting collaboration in geographically distributed teams.