Nobl9's Reliability AI, Powered by Google
Steve McGhee
Reliability Advocate, Google
Brian Singer
Chief Product Officer, Nobl9
Customers who want to leverage AI technology in Google Cloud to define and understand SLOs can now do so through Vertex AI, thanks to Nobl9 and the new tool they developed, SLOgpt.ai.
Google Site Reliability Engineering (SRE) helped to popularize Service Level Objectives (SLOs) by releasing several SRE Books. Since then, the SRE approach to understanding the health of modern distributed systems has helped many teams modernize their operations. By defining personalized SLOs, Google Cloud customers can advance their technology and grow their customer base without sacrificing reliability or burning out their teams by chasing false positives. SLOs help teams understand which problems directly harm the customer experience, which is a tremendous insight. However, adopting SLOs can be difficult: reliability requirements constantly change with customer expectations, and the inherent tradeoffs in setting goals are counterintuitive to the uninitiated. Identifying the right metrics and developing processes for using SLOs effectively is not a plug-and-play exercise; it also involves changing the way you run the software services that support your business.
Could AI technologies offer a better way? In particular, could large language models (LLMs), which have been shown to be useful, simple interfaces to potentially large sets of structured and unstructured data, help? Built on Google Cloud, Nobl9's new LLM-based product, SLOgpt.ai, is designed to answer these questions, providing users with expert-level interpretation of signals through a natural-language interface.
What if you could use the power of LLMs to simplify your understanding of SLOs?
Understanding the reliability of a large organization is difficult, even with the right tooling and data at hand. Tracking every product or subsystem can be exhausting. Speaking of tooling, 72% of companies use six or more monitoring and observability tools. That's a lot! But as shown in the latest DORA report, reliability is an important part of software delivery and is required in order to achieve high organizational outcomes.
When an organization has multiple teams building and operating many products, each of which might be composed of many different services, the number of SLOs can be staggering.
Decision-makers know they can't just, say, average them all or get "one number" to represent such a complex system. However, they also shouldn't have to become experts in their teams' various monitoring and reporting systems. Instead, they can leverage AI to provide a human-language interface to this growing set of reliability data.
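To see why a single averaged number can mislead, consider a minimal sketch with made-up figures. The service names, attainment percentages, and the shared 99.9% target are all assumptions for illustration, not data from Nobl9 or Google Cloud.

```python
# Hypothetical attainment figures (percent of good requests); not real data.
TARGET = 99.9

# Nine healthy services plus one that is clearly missing its target.
attainment = {f"service-{i}": 99.99 for i in range(1, 10)}
attainment["checkout"] = 99.2


def average_attainment(values):
    """Naive 'one number' summary: the plain mean across services."""
    return sum(values.values()) / len(values)


def missing_target(values, target):
    """Names of services whose attainment is below the target."""
    return sorted(name for name, pct in values.items() if pct < target)


print(f"average: {average_attainment(attainment):.3f}%")
print("missing target:", missing_target(attainment, TARGET))
```

In this example the average comes out at roughly 99.91%, which sits above the target even though the checkout service is well below it. That is the kind of detail a "one number" summary hides.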
Our partner Nobl9 has developed a reliability platform based on SLOs. Last year, during SLOconf, they introduced an AI interface called SLOgpt.ai, which lets anybody upload an arbitrary graph image and then outputs an SLO based on the plotted line. It also lets people ask questions about the resulting SLO, its Error Budget, and what should be done next.
Today, they are introducing the Nobl9 Reliability Center, a new way to interact not only with those example SLOs but also with SLOs based on real services and monitoring data. For example, you could set up a service in GKE, monitor it with Google Cloud Monitoring, and define an SLO in Nobl9 against that data source to see whether the service is meeting a 99.9% availability goal.
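As a rough illustration of what a 99.9% availability goal implies, the sketch below works through the standard error budget arithmetic. The function names, the 30-day window, and the request counts are assumptions chosen for the example; they are not part of the Nobl9 or Google Cloud Monitoring APIs.

```python
def error_budget_minutes(target: float, window_days: int = 30) -> float:
    """Minutes of complete unavailability a target allows in the window."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be a fraction between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - target) * window_minutes


def budget_remaining(target: float, good: int, total: int) -> float:
    """Fraction of the error budget left for a request-based SLO.

    1.0 means the budget is untouched; 0.0 or less means it is spent.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be a fraction between 0 and 1")
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = (1.0 - target) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


# A 99.9% target over 30 days allows about 43.2 minutes of downtime.
print(f"{error_budget_minutes(0.999):.1f} minutes")

# Hypothetical traffic: 500 failures out of 1,000,000 requests spends
# about half of the 1,000 failures the target allows.
print(f"{budget_remaining(0.999, good=999_500, total=1_000_000):.2f}")
```

Framing the goal as a budget is what makes questions like "is my service reliable enough right now?" answerable: the service is within its objective as long as some budget remains.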
Customers of Nobl9 can now interact directly with their reliability report and ask questions like:
- What is my worst SLO?
- Which SLOs are out of Error Budget?
- Which SLOs are the newest?
- Is my service reliable enough right now?
This functionality is delivered through the power of Google Cloud's Vertex AI. Customers can run their applications anywhere (for example, on Cloud Run), track their SLOs using Nobl9 (which runs on Google Cloud), and have the results analyzed by SLOgpt.
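To make the example questions above concrete, here is a minimal sketch of what each one means over a small set of SLO records. It does not show how SLOgpt works internally or how Nobl9 stores its data; the `SloStatus` record, its fields, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SloStatus:
    """Hypothetical SLO summary record; not a Nobl9 data model."""
    name: str
    budget_remaining: float  # 1.0 = untouched, 0.0 or less = exhausted
    created: date


def worst_slo(slos):
    """'What is my worst SLO?' -> the one with the least budget left."""
    return min(slos, key=lambda s: s.budget_remaining)


def out_of_budget(slos):
    """'Which SLOs are out of Error Budget?' -> budget fully spent."""
    return [s for s in slos if s.budget_remaining <= 0]


def newest(slos, n=1):
    """'Which SLOs are the newest?' -> most recently created first."""
    return sorted(slos, key=lambda s: s.created, reverse=True)[:n]


# Made-up sample data for illustration only.
slos = [
    SloStatus("api-availability", 0.62, date(2023, 1, 10)),
    SloStatus("checkout-latency", -0.15, date(2023, 3, 2)),
    SloStatus("search-freshness", 0.08, date(2023, 5, 21)),
]

print("worst:", worst_slo(slos).name)
print("out of budget:", [s.name for s in out_of_budget(slos)])
print("newest:", [s.name for s in newest(slos)])
```

A natural-language interface spares decision-makers from writing or even knowing about queries like these, which matters most when the data is spread across many monitoring systems.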
Our customers and partners like Nobl9 are leveraging Google's tech to provide reliable services to their customers around the world.
Learn more about SLOs and SLOgpt
- SLOgpt.ai
- Learn more about SLOs: servicelevelobjectives.com
- Learn how Google SRE uses SLOs: The SRE book - Implementing SLOs
- Visit Nobl9
- Check out Google Cloud's Vertex AI and PaLM 2
- Learn more about Google Cloud’s generative AI partner ecosystem