Helping researchers at CERN to analyze vast amounts of data and uncover the secrets of our universe
About CERN
The European Organization for Nuclear Research (CERN) uses the world’s most complex scientific instruments, including the Large Hadron Collider, to study subatomic particles and advance the boundaries of human knowledge by delving into the smallest building blocks of nature. Founded in 1954, CERN was one of Europe’s first joint ventures and now has 23 member states.
CERN analyzes petabytes of data per year, including from experiments on the world’s largest particle accelerator. A joint project has shown how it’s possible to burst this infrastructure into Google Cloud.
Google Cloud results
- Sped up terabyte-size workloads by reading data at 200 GB per second with Cloud Storage
- Scaled compute power automatically, as needed, with Google Kubernetes Engine
- Used the public cloud for the public good by making more data openly available to researchers, scientists, and educators
Researchers analyze 70 TB of Higgs boson data in minutes
Straddling the border between France and Switzerland, thousands of researchers are using some of the biggest, most complex scientific instruments in the world to examine the smallest particles in our universe. The European Organization for Nuclear Research (CERN), based in Geneva, is one of the world’s largest and most respected research centers, funded by 23 member states. “Our mission is to uncover the secrets of the fundamental building blocks of nature,” says Ricardo Rocha, Computing Engineer at CERN. “We’re looking at some of the biggest questions in science about dark matter, for instance, or about what the universe looked like moments after the Big Bang.”
CERN is a laboratory for particle physics, but its discoveries have had a powerful impact on many areas of everyday life. From transformations in medical scanning technology and advancements in the aerospace industry to art restoration, CERN has pioneered a host of modern-day achievements. It was also at CERN in 1989 that a scientist named Tim Berners-Lee conceived and developed the World Wide Web.
However, CERN is most famous for being the home of the Large Hadron Collider (LHC), the world’s largest particle accelerator. A hundred meters underground in a 27 km tunnel, the LHC made headlines around the world in 2012 when data from its collisions was used to prove the existence of the Higgs boson particle, a mystery that physicists had been trying to solve since 1964, when it was first proposed. The proof of its existence, thanks to the ATLAS and CMS experiments on the LHC, led to Peter Higgs and François Englert being awarded the Nobel Prize in Physics in 2013.
“The resource-intensive tasks we’re exploring, such as machine learning, often require specialized hardware. Google Cloud opens up new options for our IT infrastructure to access GPUs and other types of accelerator hardware.”
—Ricardo Rocha, Computing Engineer, CERN

The experiments on the LHC continue to generate massive volumes of data: around 60 petabytes per year. With upgrades to the LHC and more sensitive instruments, the compute and storage capacity of the existing IT infrastructure is being stretched to its limits. This is despite the CERN Data Centre already sitting at the heart of a distributed global computing system, called the Worldwide LHC Computing Grid, that provides a million computer cores and an exabyte of storage. As well as increasing its on-site infrastructure, CERN is exploring the possibility of boosting its resources with the public cloud. Google is a member of CERN openlab, the laboratory’s public-private partnership for R&D related to computing technologies. A joint project has been running since 2019 to explore the potential of Google Cloud.
“The resource-intensive tasks we’re exploring, such as machine learning, often require specialized hardware,” says Ricardo. “Google Cloud opens up new options for our IT infrastructure to access GPUs and other types of accelerator hardware.”
Extreme scale, extreme variability
The most high-profile and the most resource-intensive projects at CERN are the ones based around the LHC. Two particle beams travel at close to the speed of light, causing around a billion particle collisions every second. Inside the 27 km tunnel, sensors collect as much data as possible from the collisions. “The experiments generate around a petabyte of data every second, but we don’t have the capacity to analyze all of that, and most of it is noise anyway. Complex filtering systems slow down the capture rate to tens of gigabytes per second. Even then, we keep adding a huge amount of data every year,” Ricardo says. “We store more than 300 petabytes at the CERN Data Centre.”
It’s also important at CERN to account for the cyclical nature of scientific work. Researchers continually ask new questions, which leads to new explorations and discoveries, which in turn raise new questions, and the cycle repeats. Researchers need to analyze the captured data from experiments to make sense of it and generate new insights, but they don’t always work at a steady pace over the years. Some projects ramp up and down with conferences, so the demand on infrastructure can vary from month to month.
Scaling up with Google Kubernetes Engine
CERN has already redesigned some of its architecture to run on Kubernetes, the open source container orchestration system originally developed at Google. Container-based systems such as Kubernetes focus on the portability of services, allowing them to be deployed more quickly and scaled more easily than on traditional architectures. This made it relatively simple to incorporate Google Cloud for these investigations.
The core of the cloud deployment being evaluated through this project is Google Kubernetes Engine (GKE). One of the main features of GKE is its ability to scale clusters to very large sizes very quickly. GKE automates much of the cluster administration, reducing overhead and autoscaling clusters up and down as needed. Meanwhile, the addition of Preemptible VMs to GKE provides flexibility in the way that workloads are handled. Preemptible instances are short lived, which makes them ideal for small, easily repeatable tasks.
“Preemptible instances in Google Kubernetes Engine help us work so much faster,” says Ricardo. “They’re very quick jobs, taking just minutes or hours, and they’re very easy to repeat if needed. They’re up to five times more cost-effective for us, compared to normal instances.”
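To make the pattern concrete, here is a minimal sketch of how a short, repeatable analysis task could be submitted to a GKE cluster as a Kubernetes Job pinned to a preemptible node pool, using the Kubernetes Python client. The job name, container image, and entry point are hypothetical placeholders rather than CERN’s actual workloads; the node label shown is the one GKE applies to preemptible nodes.

```python
# A minimal sketch (not CERN's production setup): submit a short-lived,
# repeatable analysis task as a Kubernetes Job that runs on preemptible nodes.
# Job name, image, and command are hypothetical placeholders.
from kubernetes import client, config

def submit_preemptible_job(job_name: str, image: str) -> None:
    # Use the kubectl context already pointing at the GKE cluster.
    config.load_kube_config()

    container = client.V1Container(
        name=job_name,
        image=image,
        command=["python", "run_analysis.py"],  # hypothetical entry point
    )
    pod_spec = client.V1PodSpec(
        restart_policy="Never",
        containers=[container],
        # GKE labels preemptible nodes with cloud.google.com/gke-preemptible=true,
        # so this selector keeps the short, cheap jobs on that node pool.
        node_selector={"cloud.google.com/gke-preemptible": "true"},
    )
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=job_name),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(spec=pod_spec),
            backoff_limit=4,  # retry if a preemption interrupts the pod
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)

if __name__ == "__main__":
    submit_preemptible_job("higgs-analysis-chunk", "gcr.io/example-project/analysis:latest")
```

Because each job is small and easy to rerun, losing a preemptible node mid-task costs little: the Job controller simply reschedules the pod.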
Another key part of the Google Cloud deployment investigated through the CERN openlab project has been Cloud Storage. “The performance of the network and the storage in Google Cloud opens up new possibilities for some of our workflows,” says Ricardo. “This is something that we’re keen to explore further.”
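As an illustration of how that storage throughput is typically consumed, the sketch below reads a dataset from Cloud Storage with many concurrent workers using the google-cloud-storage Python client. The bucket name and object prefix are hypothetical placeholders; aggregate figures like the 200 GB per second mentioned above depend on the number of readers and the machines they run on.

```python
# A minimal sketch (bucket and prefix are hypothetical placeholders): read a
# dataset from Cloud Storage with many concurrent workers, which is how
# aggregate read throughput is usually scaled up.
from concurrent.futures import ThreadPoolExecutor
from google.cloud import storage

def fetch_dataset(bucket_name: str, prefix: str, workers: int = 32) -> int:
    client = storage.Client()
    blobs = list(client.list_blobs(bucket_name, prefix=prefix))

    def read_blob(blob):
        # Each worker streams one object; total throughput grows with the
        # number of readers until network or storage limits are reached.
        return len(blob.download_as_bytes())

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(read_blob, blobs))

if __name__ == "__main__":
    total_bytes = fetch_dataset("example-higgs-dataset", "atlas/2012/")
    print(f"Read {total_bytes / 1e9:.1f} GB")
```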
“In 2012, the analysis of the full dataset took more than 24 hours. When we reran the full analysis on Google Cloud, it took just over five minutes. Through Google Cloud, we were able to access 25,000 cores very quickly, run the analysis, and then shut everything down again.”
—Ricardo Rocha, Computing Engineer, CERN

Using the public cloud for the public good
To test the new Google Cloud infrastructure, members of the project team at CERN reran the analysis that led to the 2012 Nobel-prize winning discovery of the Higgs boson particle. “In 2012, the analysis of the full dataset took more than 24 hours,” says Ricardo. “When we reran the full analysis on Google Cloud, it took just over five minutes. Through Google Cloud, we were able to access 25,000 cores very quickly, run the analysis, and then shut everything down again.”
Public cloud resources can boost CERN’s on-premises resources and speed up its analyses, running huge workloads at speed. For the physicists and engineers working in the laboratory, this can translate into time savings, which is especially important during times of great demand, such as before conferences.
“Everything boils down to improving the number of events we can process per second,” says Ricardo. “Google Cloud has the potential to help us throw extra resources at our workloads, resulting in more analysis and faster results for our researchers.”
Ricardo and his colleagues working on the investigation found that Google Cloud could potentially free computer engineers at CERN to dedicate more of their time to developing better code, handling larger workloads, and building new tools for researchers. CERN is committed to the open source community. By building on an open source tool like Kubernetes, the organization can release its own tools under open source licenses, alongside open data, and be confident that researchers elsewhere can replicate its analyses, or create their own, using the same tools that CERN used in the public cloud.
"We’ve discovered the Higgs boson, but more work is needed to understand it. As we explore new frontiers of high-energy physics, public cloud, such as Google Cloud, could play a role in helping us ensure that we have the right tools to keep learning about the nature of the universe."
—Ricardo Rocha, Computing Engineer, CERN

A strong foundation for future discoveries
CERN is currently exploring machine learning for a variety of use cases. “There’s a lot of interest in using machine learning to spot anomalies in our internal operations or to simulate results at speed when we want to verify results, but the most exciting uses are where we can find new ways to benefit the research,” explains Ricardo. “We can train the models to find patterns in the data that we wouldn’t even know to look for. That can have knock-on effects not just on the scientific models but on the applications of those models and the people who use them.”
Cutting-edge tasks such as model training require cutting-edge tools. To accelerate this work, CERN has been experimenting with Google Cloud GPUs and the Cloud TPU, or Tensor Processing Unit, a customized chip for training and running machine learning models at speed.
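As a rough illustration of what running on a Cloud TPU involves, the sketch below connects a TensorFlow training job to a TPU using TPUStrategy, so that a model built inside the strategy scope is replicated across the TPU cores. The TPU name and the toy model are hypothetical placeholders, not CERN’s actual training setup.

```python
# A minimal sketch (TPU name and toy model are hypothetical placeholders):
# connect a TensorFlow training job to a Cloud TPU with TPUStrategy.
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="example-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Anything created inside the strategy scope is replicated across TPU cores.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
```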
CERN is currently working to upgrade the LHC into an even more powerful particle accelerator. The successor, named the High Luminosity Large Hadron Collider (HL-LHC), is set to come online in 2026 and will increase the amount of data the organization handles by several factors of ten. “We are investigating ways to rapidly boost capacity when required, for a variety of different workloads,” says Ricardo. "We’ve discovered the Higgs boson, but more work is needed to understand it. As we explore new frontiers of high-energy physics, public cloud has the potential to boost CERN’s resources and therefore play a role in helping us ensure that we have the right tools to keep learning about the nature of the universe."