Charting the cosmos: How Google Cloud is helping the NSF–DOE Vera C. Rubin Observatory map the universe

Google Cloud Results
  • Manages a 500-petabyte dataset—the largest astronomical dataset in history—made available to scientists worldwide using Google Cloud

  • Ensures reliable, real-time sky alerts in under 60 seconds using an on-premises Kubernetes cluster at SLAC

By providing the exploration and analysis platform for a 500-petabyte dataset – the largest astronomical dataset in history – Google Cloud helps Rubin Observatory chart the cosmos and democratize science for 30,000 researchers worldwide.

The NSF–DOE Vera C. Rubin Observatory, funded by the U.S. National Science Foundation (NSF) and the U.S. Department of Energy's Office of Science (DOE), is a testament to modern “Big Science.” Rubin Observatory is operated cooperatively by NSF NOIRLab and DOE's SLAC National Accelerator Laboratory. Its singular mission is to conduct the Legacy Survey of Space and Time (LSST), an ambitious 10-year project to create what has been described as the “biggest, most data-rich movie ever made” of the night sky.

This endeavor will systematically survey the entire available southern sky every few nights, creating a time-lapse movie of the cosmos that will reveal objects that change or move on timescales from seconds to years. The survey is designed to address four of the most pressing questions in modern astrophysics: probing the nature of dark energy and dark matter; taking an inventory of our solar system; exploring the transient optical sky; and mapping the Milky Way.

Rubin Observatory

A data challenge of galactic proportions

NSF–DOE Rubin Observatory’s grand vision is powered by a 3.2-gigapixel camera the size of a car – the largest digital camera in the world – which SLAC National Accelerator Laboratory led a two-decade effort to design and construct. To put it in perspective, displaying a single full-resolution image at native resolution would require a wall of approximately 378 4K Ultra HD TVs. The camera generates a staggering 10 terabytes of data every night, which, together with derived science data products, will culminate in a 500-petabyte dataset after ten years (equivalent to 100 million DVD-quality movies).

When the project was first conceived over two decades ago, its data management plans were based on an era of on-premises High-Performance Computing (HPC); the initial processing requirement of 150 TFLOPS was equivalent to the world's most powerful supercomputer in 2004. As technology evolved, the project recognized that a strategic pivot was necessary to make scientific discovery with Rubin data accessible to scientists worldwide. The original model was insufficient for the mission’s core promise of providing equitable access to an estimated 10,000 researchers worldwide. A modern, cloud-native solution was required so that scientists wouldn’t have to download petabytes of data locally – a prohibitively expensive and time-consuming barrier that would have limited participation to only the most well-resourced institutions.

Building a “supercomputer in a browser” on Google Cloud

The answer to Rubin Observatory’s data challenge is the Rubin Science Platform, a pioneering research portal that brings the analysis directly to the data. This user-friendly platform is built on Google Cloud and serves as the primary interface for scientists worldwide. The back-end data repository powering the Rubin Science Platform is the US Data Facility, hosted at SLAC National Accelerator Laboratory, which serves as the primary data and computing center for Rubin Observatory. This hybrid architecture effectively provides a “supercomputer in a browser,” empowering a global community and removing the most significant historical barrier to entry in big-data science: the requirement for access to a local supercomputer.

Researchers can log into the Rubin Science Platform from a standard laptop and use familiar tools like Jupyter Notebooks and custom Application Programming Interfaces (APIs) to interact with the data. The immense, on-demand computational power of Google Compute Engine allows thousands of scientists to run complex analyses simultaneously without managing physical hardware. This service provides the raw, scalable processing power necessary to analyze the vast and complex astronomical data coming from deep space.
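As a hedged illustration of this kind of notebook-driven access (not the platform's documented workflow), a researcher might build an ADQL cone-search query of the sort submitted through a Virtual Observatory Table Access Protocol (TAP) API. The table name, column names, and endpoint below are illustrative assumptions, not the platform's actual schema:

```python
# Sketch: constructing an ADQL cone search such as a science-platform
# notebook might submit through a TAP (Table Access Protocol) service.
# The table/column names here are illustrative placeholders.

def cone_search_adql(table, ra_deg, dec_deg, radius_deg, limit=100):
    """Return an ADQL query selecting objects within a circle on the sky."""
    return (
        f"SELECT TOP {limit} objectId, ra, dec "
        f"FROM {table} "
        f"WHERE CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1"
    )

# A small patch of southern sky, 0.05-degree radius
query = cone_search_adql("example_catalog.Object", 62.0, -37.0, 0.05)
print(query)

# In a live notebook session this query would typically be submitted via
# a TAP client such as pyvo (requires credentials and network access):
#   import pyvo
#   service = pyvo.dal.TAPService("https://<platform-host>/api/tap")
#   results = service.search(query)
```

The point of the sketch is that the query, not the data, travels: the few lines of ADQL run next to the 500-petabyte dataset, and only the result table returns to the researcher's laptop.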

Previously, astronomers wanting to analyze a massive dataset would have to download enormous portions of it to powerful local HPC centers. This model was often cost-prohibitive and logistically impossible for most, limiting participation in cutting-edge science to researchers at a handful of elite institutions.

The Rubin operations team uses Google Kubernetes Engine to efficiently deploy, manage, and scale the applications that make up the Rubin Science Platform. In a strategic hybrid approach, Google Cloud infrastructure supports essentially all user-driven science analysis – representing approximately 10% of the observatory’s total compute spend. This dedicated cloud capacity empowers scientists to not only analyze the primary dataset but also create and share their own value-added data products.

From global access to an AI-powered future

The true impact of this work is the democratization of discovery. A graduate student or researcher at a small university now has the same level of access as their counterparts at major national labs. This fosters a more broadly collaborative scientific ecosystem, empowering researchers to get answers faster and ask new, more ambitious questions.

These ambitious questions are at the heart of the observatory’s mission, structured around the four most pressing questions in modern astrophysics that this powerful new observatory is uniquely positioned to answer. Every aspect of the observatory – from the camera’s design to the survey’s strategy and the Google Cloud data platform – was built to achieve these four goals:

  • Probing Dark Energy and Dark Matter: Studying the mysterious forces that make up 95% of the cosmos.
  • Taking an Inventory of the Solar System: Discovering millions of new asteroids and creating a critical tool for planetary defense.
  • Exploring the Transient Optical Sky: Capturing dynamic events like supernovae in near-real time.
  • Mapping the Milky Way: Charting billions of stars to understand the formation and structure of our own galaxy.

For instance, to support the Transient Optical Sky mission, the SLAC-based Kubernetes system will capture events like a supernova and issue alerts to the global astronomical community in under 60 seconds, enabling rapid follow-up observations by other telescopes. To achieve the goals of the Solar System Inventory, it will enable the discovery of millions of new asteroids, creating a critical tool for planetary defense. The full set of underlying data products behind these transient and moving-object discoveries will be accessible through the Google Cloud-based platform shortly after the initial alerts.

This new paradigm empowers a global community that already includes over 2,800 active members. These researchers have organized into eight official Science Collaborations – specialized international teams from institutions of all sizes – allowing them to accelerate their research and tackle novel scientific challenges.

Rubin Observatory’s work with Google Cloud is also a foundation for the future. As the dataset grows, the observatory is investigating the use of BigQuery to manage and query the massive catalogs of celestial objects and plans to leverage Vertex AI to develop natural language interfaces, promising to unlock new, AI-driven scientific insights for decades to come.
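To make the BigQuery direction concrete, here is a hedged sketch of how a catalog of celestial objects might be queried with the official google-cloud-bigquery client. The project, dataset, table, and column names are invented for illustration; the source does not describe Rubin's actual schema:

```python
# Sketch only: querying a hypothetical BigQuery table of celestial objects.
# The identifier and schema below are placeholders, not Rubin's real catalog.

TABLE = "my-project.rubin_catalogs.objects"  # hypothetical table

# Standard SQL selecting the brightest objects in a small patch of sky
sql = f"""
SELECT objectId, ra, dec, g_mag
FROM `{TABLE}`
WHERE dec BETWEEN -40 AND -35
  AND ra  BETWEEN  60 AND  65
ORDER BY g_mag
LIMIT 10
"""
print(sql)

# Executing it would use the official client (requires GCP credentials):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   for row in client.query(sql).result():
#       print(row.objectId, row.g_mag)
```

A serverless engine like BigQuery fits this access pattern well: the catalog stays in place, and each researcher's query scans only the columns and sky region it touches.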

The NSF–DOE Vera C. Rubin Observatory is an astronomical observatory located in Chile whose primary mission is to carry out the ten-year Legacy Survey of Space and Time (LSST) to create a detailed, time-lapse record of the southern hemisphere sky. The observatory is expected to make significant contributions to our understanding of the universe, including the nature of dark matter and dark energy, the formation of the Milky Way, and the inventory of our solar system. It’s named after astronomer Vera Rubin, who provided the first convincing evidence for the existence of dark matter.


The U.S. Department of Energy's SLAC National Accelerator Laboratory is a pioneering research center featuring a two-mile-long particle accelerator. The lab is renowned for its history of Nobel Prize-winning discoveries in particle physics and now operates the Linac Coherent Light Source (LCLS), a revolutionary X-ray laser capable of capturing atomic and molecular processes in real-time.


NSF NOIRLab, the U.S. National Science Foundation center for ground-based optical-infrared astronomy, operates the International Gemini Observatory (a facility of NSF, NRC–Canada, ANID–Chile, MCTIC–Brazil, MINCyT–Argentina, and KASI–Republic of Korea), NSF Kitt Peak National Observatory (KPNO), NSF Cerro Tololo Inter-American Observatory (CTIO), the Community Science and Data Center (CSDC), and NSF–DOE Vera C. Rubin Observatory (in cooperation with DOE’s SLAC National Accelerator Laboratory). It is managed by the Association of Universities for Research in Astronomy (AURA) under a cooperative agreement with NSF and is headquartered in Tucson, Arizona.

Industry: Federal Government

Location: United States

Products: BigQuery, Vertex AI Search, Google Kubernetes Engine, Compute Engine
