Accelerate medical research with PubMed data now available in BigQuery
Willis Zhang
Healthcare Architect, Google Public Sector
Stone Jiang
Field Solutions Architect, Google Public Sector
Medical professionals regularly consult PubMed, a database maintained by the National Library of Medicine (NLM) at the National Institutes of Health (NIH) that contains over 35 million biomedical articles and is growing by 1.5 million articles annually. However, traditional keyword searches may miss crucial connections. A pediatric oncologist treating a rare leukemia mutation might never find a relevant case study from another country simply because it uses slightly different terms. Pharmaceutical companies spend years manually reviewing literature for drug repurposing opportunities. Clinical trial matching remains a manual, time-consuming process that delays patient access to potentially life-saving treatments.
At Google Cloud, we're addressing this challenge by making PubMed data available as a BigQuery public dataset with vector search capabilities from Vertex AI (both BigQuery and Vertex AI Vector Search are FedRAMP High authorized), enabling semantic search of medical concepts beyond simple keyword matching.
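To show how this differs from keyword matching, here is a minimal Python sketch that compares word overlap with cosine similarity between embedding vectors. The three-dimensional vectors are made-up toy values chosen for illustration. Real text embeddings have hundreds of dimensions and come from an embedding model, so treat this as an aid to intuition, not a description of how BigQuery computes results internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_overlap(query: str, document: str) -> int:
    """Count distinct lowercase words that appear in both strings (a crude keyword match)."""
    return len(set(query.lower().split()) & set(document.lower().split()))


query_text = "childhood leukemia"
related_text = "pediatric ALL case report"

# Toy "embeddings": made-up values, not output from any model.
query_vec = [0.9, 0.1, 0.0]
related_vec = [0.8, 0.2, 0.1]
unrelated_vec = [0.0, 0.1, 0.9]

print("keyword overlap:", keyword_overlap(query_text, related_text))
print("cosine(query, related):", round(cosine_similarity(query_vec, related_vec), 3))
print("cosine(query, unrelated):", round(cosine_similarity(query_vec, unrelated_vec), 3))
```

The query and the related text share no words, so a keyword match scores zero, while their toy embeddings point in nearly the same direction. BigQuery's `COSINE` distance is 1 minus this similarity, so smaller distances mean closer matches.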
Transforming oncological literature reviews with BigQuery
The Princess Máxima Center for Pediatric Oncology in the Netherlands developed Capricorn, a system that combines PubMed data in BigQuery with Gemini models to revolutionize their international Leukemia Tumor Board (iLTB). By consolidating medical literature analysis into BigQuery, they can now provide comprehensive literature reviews in minutes rather than hours.


Dr. Uri Ilan, pediatric oncologist at The Princess Máxima Center for Pediatric Oncology, explains: "The power to provide reports summarizing all relevant information in the literature to tumor boards is remarkable. We're now working with partners to develop similar systems for Neuro-Oncology and Neuroblastoma."
Other health systems, such as Rutgers Health, are now building on this foundation. Dr. David J. Foran, Chief Informatics Officer, who leads the Rutgers team, notes: “Rutgers Cancer Institute is poised to integrate Capricorn with our tumor boards to enable us to access these AI-powered oncology tools and evaluate the performance of the underlying algorithms for personalizing treatment for cancer patients. We will also assess its potential use in our global health initiatives in Botswana.”
Combining PubMed with BigQuery unlocks new healthcare capabilities
Healthcare organizations are discovering other transformative use cases by combining PubMed's comprehensive medical literature with BigQuery's analytics and AI capabilities:
Clinical decision support: Leading pediatric hospitals may use semantic search to match leukemia patients with relevant clinical trials based on complete genetic profiles. Continued collaboration and knowledge sharing through these tools will be essential for advancing precision medicine.
Drug discovery and repurposing: Pharmaceutical companies analyze literature patterns to identify existing drugs that could treat new conditions. Research indicates that repurposed drugs cost 80% less and reach the market 70% faster. Health economist Federico Felizzi highlights that machine learning also helps refine the economic assessments needed to validate these candidates. Analyzing citation patterns between drugs can identify thousands of repurposing candidates, turning a years-long manual process into automated discovery.
Public services: Government health agencies such as the FDA can use this repository to synthesize evidence on drug efficacy and food safety. That evidence can help streamline approvals and accelerate new therapies while maintaining a safe domestic food and drug supply, and can help agencies proactively identify trends in food safety and nutrition to combat chronic disease.
Rare disease diagnosis: Medical institutions can now match complex symptom patterns against global case reports, potentially reducing diagnostic time from years to months for the 300 million people affected by rare diseases.
Insurance and payer intelligence: Payers and health technology assessment bodies in the US and Europe, such as Germany's G-BA, the UK's NICE, and France's HAS, use comprehensive literature analysis to make evidence-based coverage decisions. Literature reviews form the foundation for health technology assessments and reimbursement decisions across US and European markets.
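The drug-repurposing use case depends on spotting patterns across many articles. A deliberately simple way to illustrate that idea is co-mention counting: tallying how often a drug and a condition appear in the same title or abstract. This is a swapped-in technique for illustration, not the citation-pattern analysis described in the drug discovery item, and the drug list, condition list, and article titles below are invented. A real pipeline would normalize terms (for example with MeSH vocabularies), apply statistical scoring, and filter out known indications.

```python
from collections import Counter
from itertools import product


def co_mention_counts(
    texts: list[str], drugs: list[str], conditions: list[str]
) -> Counter:
    """Count how many texts mention each (drug, condition) pair together.

    Matching is naive case-insensitive substring search with no synonym handling.
    """
    counts: Counter = Counter()
    for text in texts:
        lowered = text.lower()
        found_drugs = sorted(d for d in drugs if d.lower() in lowered)
        found_conditions = sorted(c for c in conditions if c.lower() in lowered)
        # Each text adds one count for every (drug, condition) pair it mentions.
        counts.update(product(found_drugs, found_conditions))
    return counts


# Invented example titles, not real search results.
titles = [
    "Metformin and colorectal cancer outcomes: a cohort study",
    "Metformin dosing in type 2 diabetes",
    "Colorectal cancer incidence among metformin users",
]
drugs = ["metformin", "aspirin"]
conditions = ["colorectal cancer", "type 2 diabetes"]

for (drug, condition), n in co_mention_counts(titles, drugs, conditions).most_common():
    print(f"{drug} + {condition}: {n} article(s)")
```

Pairs that co-occur often, but involve a drug not yet used for that condition, would be starting points for expert review rather than conclusions.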
Getting started with PubMed data in BigQuery
Healthcare organizations can immediately begin leveraging PubMed data in BigQuery. Here's a simple example of how to perform semantic search on medical literature.
First, enable the Vertex AI API in your Google Cloud project. Then, in BigQuery, create a remote model for generating text embeddings.
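One way to script this step is to build the BigQuery ML `CREATE MODEL` statement in Python and submit it from a notebook or pipeline. This is a minimal sketch under stated assumptions: it targets the Vertex AI `text-embedding-005` endpoint, assumes you have already created a BigQuery Cloud resource connection whose service account is allowed to call Vertex AI, and uses hypothetical project, dataset, model, and connection names that you would replace with your own.

```python
def create_embedding_model_sql(
    model_path: str,
    connection_id: str,
    endpoint: str = "text-embedding-005",
) -> str:
    """Return a BigQuery ML statement that registers a Vertex AI embedding model as a remote model.

    model_path:    `project.dataset.model_name` for the new BigQuery model.
    connection_id: `project.location.connection_name` of a Cloud resource connection.
    endpoint:      Vertex AI embedding model name (an assumed default here).
    """
    return (
        f"CREATE OR REPLACE MODEL `{model_path}`\n"
        f"REMOTE WITH CONNECTION `{connection_id}`\n"
        f"OPTIONS (ENDPOINT = '{endpoint}');"
    )


# Hypothetical names: replace with your own project, dataset, and connection.
sql = create_embedding_model_sql(
    model_path="my-project.pubmed_demo.text_embedding_model",
    connection_id="my-project.us.vertex-ai-connection",
)
print(sql)
```

The printed statement can be run in the BigQuery console or, if the `google-cloud-bigquery` package is installed, submitted with `bigquery.Client().query(sql).result()`.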
Now you’re ready to directly query PubMed Central articles! Here’s an example query:
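As an illustration, here is a sketch of a semantic search query assembled in Python. It uses BigQuery's `ML.GENERATE_EMBEDDING` to embed a natural-language question with the embedding model created in the previous step, then `VECTOR_SEARCH` to find the nearest articles. It assumes the PubMed articles table exposes a precomputed embedding column, here called `ml_generate_embedding_result`, produced by the same embedding model as the question. The table path, column names, and question are hypothetical placeholders, so check the dataset's actual schema before running it.

```python
def semantic_search_sql(
    question: str,
    model_path: str,
    articles_table: str,
    result_columns: list[str],
    embedding_column: str = "ml_generate_embedding_result",
    top_k: int = 10,
) -> str:
    """Return a GoogleSQL query that embeds `question` and finds the `top_k` closest articles."""
    # Escape backslashes first, then single quotes, so the question is a valid string literal.
    literal = question.replace("\\", "\\\\").replace("'", "\\'")
    select_list = ", ".join(f"base.{col}" for col in result_columns)
    return f"""SELECT {select_list}, distance
FROM VECTOR_SEARCH(
  TABLE `{articles_table}`,
  '{embedding_column}',
  (
    SELECT ml_generate_embedding_result AS {embedding_column}
    FROM ML.GENERATE_EMBEDDING(
      MODEL `{model_path}`,
      (SELECT '{literal}' AS content)
    )
  ),
  top_k => {top_k},
  distance_type => 'COSINE'
)
ORDER BY distance"""


# Hypothetical table, column, and model names: check the real PubMed dataset schema.
query_sql = semantic_search_sql(
    question="What is known about Hodgkin's lymphoma relapse in children?",
    model_path="my-project.pubmed_demo.text_embedding_model",
    articles_table="my-project.pubmed_demo.articles",
    result_columns=["pmid", "title"],
    top_k=5,
)
print(query_sql)
```

Setting `distance_type => 'COSINE'` explicitly avoids relying on the function's default metric, and the question embedding must come from the same model as the stored article embeddings for the distances to be meaningful. Without a vector index on the table, `VECTOR_SEARCH` performs a brute-force search.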
The results will look something like this:


What’s next
Complete end-to-end examples of agentic literature review are available at github.com/google/pubmed-rag. These include methods for customizing ranking criteria with a points system and normalizing results using Scimago Journal Impact scores.
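The repository's ranking code is not reproduced here, but the general idea of combining search relevance with journal-level quality can be sketched. This hypothetical Python example min-max normalizes vector distance, a journal score (for instance a Scimago-style metric you would supply yourself), and publication year, then combines them with adjustable weights. The weights, fields, and sample articles are illustrative assumptions, not values taken from github.com/google/pubmed-rag.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    distance: float       # vector distance from the search; lower means closer
    journal_score: float  # journal-level metric supplied by you (assumed field)
    year: int


def min_max(values: list[float]) -> list[float]:
    """Scale values to [0, 1]; if all values are equal, return zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rerank(
    hits: list[Hit],
    w_relevance: float = 0.6,
    w_journal: float = 0.3,
    w_recency: float = 0.1,
) -> list[Hit]:
    """Order hits by a weighted sum of normalized relevance, journal score, and recency."""
    # Invert normalized distance so that closer matches score higher.
    relevance = [1.0 - d for d in min_max([h.distance for h in hits])]
    journal = min_max([h.journal_score for h in hits])
    recency = min_max([float(h.year) for h in hits])
    scores = [
        w_relevance * r + w_journal * j + w_recency * y
        for r, j, y in zip(relevance, journal, recency)
    ]
    ranked = sorted(zip(scores, hits), key=lambda pair: pair[0], reverse=True)
    return [hit for _, hit in ranked]


# Made-up hits for illustration only.
hits = [
    Hit("Article A", distance=0.10, journal_score=1.0, year=2015),
    Hit("Article B", distance=0.20, journal_score=5.0, year=2024),
    Hit("Article C", distance=0.40, journal_score=0.5, year=2020),
]
ranked = rerank(hits)
for hit in ranked:
    print(hit.title)
```

With these weights, Article B outranks Article A even though A is slightly closer in vector space, because B scores higher on journal score and recency. Min-max scaling only compares articles within a single result set, so scores are not comparable across different searches.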


Major organizations worldwide are already seeing transformative results. Gaurav Trivedi, Staff Engineer at Suki AI, notes, "This announcement signals a new era where datasets evolve from static files into building blocks for AI-ready applications."
By making PubMed data available in BigQuery with vector search capabilities, we're democratizing access to medical knowledge. A physician in a rural community now has the same ability to find relevant research as specialists at major academic medical centers. This aligns with NIH's mission to "seek knowledge and apply it to enhance health, lengthen life, and reduce illness and disability."
Medical researchers and healthcare organizations can now leverage semantic search across full-text PubMed Central articles using BigQuery and Vertex AI to accelerate drug discovery, improve clinical decisions, and advance precision medicine. To get started with PubMed data in BigQuery, visit cloud.google.com/datasets.
Catch the highlights from our recent Google Public Sector Summit where we shared how Google Cloud’s AI and data technologies can help advance your mission.



