Desktop Genetics: Handling huge data with scaling on Google Cloud Platform

By applying machine learning to new genetics technology, London-based startup Desktop Genetics has become a frontrunner in bioinformatics software development. In genomic analysis, a single cell typically generates around a terabyte of raw data, so Desktop Genetics has to handle very high CPU loads. And because every experiment is unique, data traffic on the company's servers can vary enormously from one project to the next. Desktop Genetics copes with this variability by scaling at speed on Google Cloud Platform, so that its DESKGEN platform and analytical tools can handle even the most data-intensive lab work.

“Science is fundamentally unpredictable, so it's very difficult to forecast how many servers you're going to need six months from now. Being able to tap into a pool is truly valuable. Google Cloud Platform allows us to expand rapidly in order to meet demand.” - Riley Doyle, CEO and Technical Lead, Desktop Genetics

Meeting unpredictable demand with maximum efficiency

The revolutionary CRISPR-Cas9 gene-editing technique has made genetic engineering fast and affordable, but it also produces massive amounts of data and requires major processing capacity. Desktop Genetics specialises in handling and analysing this information for laboratories. Maintaining a bank of high-spec servers is inefficient when they sit idle, yet the same servers can be insufficient for the CPU demands of major experiments. Desktop Genetics therefore has to match processing capacity to unpredictable and extreme loads, and to do so with maximum efficiency.

Google Cloud Platform meets this need. Google Compute Engine can spin up VMs in seconds, so capacity can be activated and deactivated according to need, even at large scale. This is a major change from the company’s dedicated private servers, which took a day to spin up and 30 days to stop. Google Cloud Storage buckets further improve efficiency by saving machines and standard reference files, keeping automatic backups, and allowing servers to share data rather than run as autonomous nodes, which reduces the need to replicate functionality. The company can guarantee security and privacy to third parties with confidential buckets and Google Cloud’s granular access controls, auditing and HIPAA compliance. And the development team can easily switch between each other’s servers without needing to provision SSH keys every time, which improves teamwork.

“We might only have one request an hour on a server but that one request is so data intensive that it’s like a tsunami in terms of CPU load. We had physical servers that were grossly over spec just sitting around most of the time, wasting resources. Google Cloud Platform is fantastic because we can dynamically turn servers on and off and resize them.” - Riley Doyle, CEO and Technical Lead, Desktop Genetics

Working fully at the genomic scale

With Google Cloud Platform, Desktop Genetics can immediately take on large-scale bioinformatics projects that previously demanded bespoke solutions. And with GCP’s multizone architecture, the company can set up demo servers and present its services to global clients without connection issues. The company is now also using GCP billing oversight to attribute CPU resources to particular products and further optimise expenditure.

“What we're able to achieve with GCP is incredible. Nothing else scales like it, and you need this capability to work with genomic data. You won’t be able to develop personalised gene therapies and cell therapies to cure cancer and other diseases without it. GCP has really allowed us to make the jump to working fully at the genomic scale.” - Riley Doyle, CEO and Technical Lead, Desktop Genetics