RAD Lab AlphaFold module for researchers

January 12, 2023

Mona Mona

AI/ML Specialist

Mukul Gupta

Solutions Developer

Understanding protein geometries is essential to understanding a protein’s function, and thus essential to a range of disruptive research projects, from creating more sustainable materials to developing more effective treatments for diseases. For this reason, the “protein-folding problem” ranked for decades among biological science’s grand challenges, until Google’s DeepMind developed AlphaFold.

Widely hailed as a breakthrough in biological research and a leap in the development of vaccines and synthetic materials, AlphaFold is an AI model developed by DeepMind for predicting the 3D structure of a protein based on its 1D amino acid sequence. DeepMind shared the AlphaFold algorithm, code, and methodology with the scientific community as open source in 2021, and AlphaFold’s Protein Structure Database now includes nearly all proteins known to science.

In this article, we’ll explore AlphaFold on RAD Lab, which provides researchers a quick and easy way to spin up Google Cloud environments, called projects, that have the Vertex AI APIs enabled and an AlphaFold container notebook deployed on Vertex AI Workbench.

Why AlphaFold on Google Cloud?

The license for AlphaFold allows for commercial use, letting researchers and biopharma companies leverage DeepMind’s neural network model weights for use cases such as building drug discovery pipelines, determining experimental structures, and even generating tomograms of whole cells and performing molecular replacement. Since its release, over 750,000 researchers and biologists in over 190 countries have used the AlphaFold Protein Structure Database to view millions of structures.

However, to run AlphaFold, you need to download over 2.2 TB of database files and store them on the server’s disk. CPU, memory, and GPU requirements must also be met, which can make it cumbersome to build and configure a cloud server for AlphaFold and keep it running for long periods of time. That’s why we’ve built services that make it easy to run AlphaFold on Google Cloud, helping organizations and researchers meet the system’s infrastructure demands. To this end, we published a blog post in 2022 to help customers quickly get AlphaFold up and running. It covers running AlphaFold in a Vertex AI Workbench notebook provisioned with powerful GPU accelerators to enable fast predictions for any given sequence.
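
As a rough pre-flight check for the storage requirement above, the following Python sketch tests whether a disk has room for the database files. It reads "over 2.2 TB" as at least 2.2 decimal terabytes, which is an assumption; the function names and the default path are illustrative and are not part of AlphaFold or RAD Lab.

```python
import shutil

# "Over 2.2 TB" from the text, read here as 2.2 decimal terabytes (assumption).
REQUIRED_BYTES = 22 * 10**11


def has_room_for_databases(free_bytes: int, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the free space covers the assumed database footprint."""
    return free_bytes >= required_bytes


def check_path(path: str = ".") -> bool:
    """Check the disk that holds `path` using the standard library."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    ok = has_room_for_databases(usage.free)
    print(f"{path}: {usage.free / 10**12:.2f} TB free, enough for databases: {ok}")
    return ok
```

Calling check_path("/") on the target VM gives a quick yes or no before starting a multi-hour download. The CPU, memory, and GPU requirements still need to be checked separately.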

Biopharma companies and researchers who want to get started often face the following challenges:

They want to use Google Cloud but don’t know where or how to proceed
They want to be able to replicate and share their work with collaborators and the community but are not sure how

To address these needs, we created a RAD Lab module for AlphaFold. Let's first understand what RAD Lab entails.

What is RAD Lab?

RAD Lab is a solution designed to make research with Google Cloud faster and easier through automation. Leveraging infrastructure-as-code resources to automate deployments and built on best practices for managing Google Cloud infrastructure, RAD Lab consists of modules designed to meet research and organizational needs, from building an innovation lab to prototyping applications to designing modern DevOps processes. Some illustrative use cases include:

A patent agency accelerating its review process by creating a cloud sandbox environment that lets technology teams iterate faster through new applications.
A research agency developing an environment for scientific collaboration, with scientists leveraging the best of Google Cloud for discovery.
A systems integrator accelerating progress across strategic priorities, leveraging the flexibility of RAD Lab to share technology across departments and teams.

All RAD Lab modules are open source and accessible via Google Cloud’s GitHub. To use RAD Lab, sign up for a Google Cloud account using Gmail or use your organization’s billing ID, then open the GitHub repository and deploy a project. RAD Lab spins up the containers, notebooks, storage, and anything else you need for your specific workflow. RAD Lab can be deployed with a subscription to simplify billing.

Using RAD Lab to accelerate research with AlphaFold

AlphaFold on RAD Lab provides researchers a quick, easy way to spin up Google Cloud environments called “projects.” These projects have the Vertex AI APIs enabled and an AlphaFold container notebook deployed on Vertex AI Workbench for quick and easy access.

The module does the following:

Creates a Google Cloud project for researchers.
Enables Vertex AI APIs.
Deploys the AlphaFold container as a notebook in Vertex AI Workbench, using a customized Docker image in Artifact Registry, with preinstalled packages for launching a notebook instance in Vertex AI Workbench and prerequisites for running AlphaFold.

Under the hood, this RAD Lab module utilizes our Data Science module and deploys the AlphaFold configurations over Vertex AI Workbench.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_RAD_Lab_AlphaFold_v1.max-900x900.jpg

As the architecture diagram above shows, two personas are associated with RAD Lab:

RAD Lab admins set up the GCP environments and the RAD Lab AlphaFold module.
RAD Lab users have access to the RAD Lab AlphaFold environment created by the admin.

Once the module is deployed, researchers (the RAD Lab users) have a GCP project created by the admin. They go to the GCP project and navigate to Vertex AI Workbench to start experimenting with the AlphaFold module. They also have access to BigQuery and Cloud Storage to store and transform their sequencing data in a scalable manner.

AlphaFold Deployment via RAD Lab

Click here for a video of the complete setup.

How to set up the RAD Lab environment

1. Navigate to the Public RAD Lab Repo.

2. Make sure all the IAM role and policy prerequisites are satisfied.

3. Click Open in Google Cloud Shell.

The public repo is cloned into your GCP Cloud Shell.

https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_RAD_Lab_AlphaFold.gif

How to install RAD Lab Launcher Prerequisites

4. Navigate to the radlab-launcher folder.

5. Install the libraries by running python3 installer_prereq.py.

https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_RAD_Lab_AlphaFold.gif
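
If you prefer to script steps 4 and 5 rather than type them, a minimal Python sketch could look like the following. It assumes the radlab-launcher folder sits directly under the root of the cloned repo; the helper names and the dry-run flag are hypothetical and are not part of RAD Lab. The only commands it issues are the ones named in this post.

```python
import subprocess
from pathlib import Path


def launcher_command(script: str) -> list[str]:
    """Build the command for a launcher script, e.g. installer_prereq.py."""
    return ["python3", script]


def run_in_launcher(repo_root: Path, script: str, dry_run: bool = True) -> list[str]:
    """Run a script from the radlab-launcher folder, or only print it when dry_run is set."""
    # Assumption: radlab-launcher is a direct child of the cloned repo root.
    launcher_dir = repo_root / "radlab-launcher"
    cmd = launcher_command(script)
    if dry_run:
        print(f"[dry run] cd {launcher_dir} && {' '.join(cmd)}")
        return cmd
    if not (launcher_dir / script).is_file():
        raise FileNotFoundError(f"{script} not found in {launcher_dir}")
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(cmd, cwd=launcher_dir, check=True)
    return cmd
```

The dry run is the default so that nothing is installed by accident; pass dry_run=False, with repo_root pointing at your clone, to actually run installer_prereq.py.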

How to trigger guided setup for RAD Lab Module deployment

6. Trigger the launcher by running: python3 radlab.py.

7. Follow the guided setup and select/set:

User identity (which has all required IAM roles) to spin up the module
GCP project (RAD Lab Management project)
RAD Lab module you would like to deploy
Action you want to perform for the corresponding RAD Lab module
(Select/Create) GCS Bucket to store Terraform states for deployment
Org ID
Folder ID
Billing Account ID

8. Save the Outputs of the deployment (like newly created RAD Lab Project-ID).
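
Because the guided setup in step 7 asks for several IDs, it can help to collect and sanity-check them beforehand. The Python sketch below is one way to do that. The field names, the example values, and the format checks (numeric Org and Folder IDs, a billing account ID made of three six-character hexadecimal groups) are assumptions for illustration; the launcher itself remains the authority on what it accepts.

```python
import re
from dataclasses import dataclass, fields

# Assumed billing account ID shape: XXXXXX-XXXXXX-XXXXXX (hexadecimal groups).
BILLING_ID = re.compile(r"[0-9A-Fa-f]{6}-[0-9A-Fa-f]{6}-[0-9A-Fa-f]{6}")


@dataclass
class LauncherInputs:
    """Values the guided setup asks for (field names are hypothetical)."""
    user_identity: str
    management_project: str
    module: str
    action: str
    state_bucket: str
    org_id: str
    folder_id: str
    billing_account_id: str


def problems(inputs: LauncherInputs) -> list[str]:
    """Return the names of fields that are empty or fail the assumed format checks."""
    issues = [f.name for f in fields(inputs) if not getattr(inputs, f.name).strip()]
    for name in ("org_id", "folder_id"):
        value = getattr(inputs, name).strip()
        if value and not value.isdigit():
            issues.append(name)
    billing = inputs.billing_account_id.strip()
    if billing and not BILLING_ID.fullmatch(billing):
        issues.append("billing_account_id")
    return issues


# Placeholder values only; replace them with your own before running the launcher.
example = LauncherInputs(
    user_identity="researcher@example.com",
    management_project="radlab-mgmt-project",
    module="alpha_fold",
    action="create",
    state_bucket="radlab-terraform-state",
    org_id="123456789012",
    folder_id="987654321098",
    billing_account_id="0A1B2C-3D4E5F-678901",
)
print(problems(example))
```

An empty list means every value passed these checks, which only shows the values are well-formed, not that the identity actually holds the required IAM roles.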

https://storage.googleapis.com/gweb-cloudblog-publish/original_images/4_RAD_Lab_AlphaFold.gif

How you will interact with the project deployed

9. Navigate to https://console.cloud.google.com/ and select/set the RAD Lab Project-ID to access the RAD Lab project.

10. Based on the module you deployed, navigate to the specific resources.

https://storage.googleapis.com/gweb-cloudblog-publish/original_images/5_RAD_Lab_AlphaFold.gif

Protein Folding Example Run

To get started and explore the AlphaFold model in this notebook on Vertex AI Workbench, we input a protein sequence and get a predicted 3D protein structure as output.

We are going to show you how to fold the ubiquitin-like protein ISG15, which plays a key role in the innate immune response to viral infection either via its conjugation to a target protein (ISGylation) or via its action as a free or unconjugated protein.

1. Open AlphaFold.ipynb in the workbench and run the cells that download the AlphaFold data and configure GPU acceleration.

2. In the Making a prediction section, input the protein sequence below and run the cells. If you enter only a single sequence, the monomer model will be used. If you enter multiple sequences, the multimer model will be used. Here we use a single sequence, the ISG15 protein, and fold it with the monomer model. Copy and paste the sequence below into your notebook cell and run the cell.

# Input sequences (type: str)

sequence_1 = "MGWDLTVKMLAGNEFQVSLSSSMSVSELKAQITQKIGVHAFQQRLAVHPSGVALQDRVPLASQGLGPGSTVLLVVDKCDEPLSILVRNNKGRSSTYEVRLTQTVAHLKQQVSGLEGVQDDLFWLTFEGKPLEDQLPLGEYGLKPLSTVFMNLRLRGGGTEPGGRS"
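
Before running the prediction cells, you may want to confirm that a pasted sequence is clean. The sketch below strips whitespace, flags any character outside the 20 standard amino acid letters, and mirrors the monomer or multimer rule described in step 2. The helper names are hypothetical and the notebook may perform its own validation; this is only a pre-check, not the notebook's code.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Remove all whitespace (including line breaks) and uppercase the sequence."""
    return "".join(raw.split()).upper()


def invalid_residues(sequence: str) -> set[str]:
    """Return the characters that are not standard amino acid letters."""
    return set(sequence) - STANDARD_AMINO_ACIDS


def model_for(sequences: list[str]) -> str:
    """Mirror the rule in the text: one sequence -> monomer, several -> multimer."""
    non_empty = [s for s in sequences if s]
    if not non_empty:
        raise ValueError("at least one non-empty sequence is required")
    return "monomer" if len(non_empty) == 1 else "multimer"
```

For the example above, invalid_residues(clean_sequence(sequence_1)) should come back empty if the sequence was pasted intact, and model_for([sequence_1]) selects the monomer case.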

3. Now, in the notebook, run the cells under “Search against genetic database.” You will see output like the following:

https://storage.googleapis.com/gweb-cloudblog-publish/images/6_RAD_Lab_AlphaFold_v1.max-1600x1600.jpg

4. Next, run the cells in the “Run AlphaFold” section of the notebook. Once they have executed, a zip archive, "prediction.zip", containing the prediction is saved on the VM and is available for download to your computer from the sidebar. Below is the predicted 3D structure of the ISG15 protein.

https://storage.googleapis.com/gweb-cloudblog-publish/images/7_RAD_Lab_AlphaFold_v1.max-700x700.jpg
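
Once you have downloaded "prediction.zip", a few lines of Python are enough to see what it contains before unpacking it. This sketch uses only the standard library and makes no assumption about the file names inside the archive; the function name is illustrative.

```python
import zipfile
from pathlib import Path
from typing import BinaryIO, Union


def list_prediction_files(archive: Union[str, Path, BinaryIO]) -> list[tuple[str, int]]:
    """Return (file name, uncompressed size in bytes) for each file in the archive."""
    with zipfile.ZipFile(archive) as zf:
        return [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if not info.is_dir()  # skip directory entries
        ]
```

For example, after downloading the archive to your working directory, list_prediction_files("prediction.zip") returns the names and sizes of the files it holds.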

Take the next steps

In this blog post, we covered how to use the RAD Lab solution to quickly set up the AlphaFold module for researchers. The next generation of RAD Lab includes RAD Lab UI, which provides a modern interface for less technical users to deploy Google Cloud resources.

Here is our GitHub repo. We welcome your feedback and contributions. Thank you to Michelle Holko for her contributions to this blog.
