Success through culture: why embracing failure encourages better software delivery

October 27, 2023
James Pashutinski

Digital Transformation Consultant

Things break. That’s life. When things don’t go as planned, it’s what happens next that’s important.

Internal research by Google, and by Google’s DevOps Research and Assessment (DORA) organization, shows that teams that encourage a culture of trust — one that allows for questioning, risk-taking and mistakes — perform better. The way an organization responds to opportunity is a big part of its culture. And for software delivery and overall team effectiveness, equally important is how an organization responds to failure.

By adopting specific behaviors and ways of working that encourage resilience, we can increase our teams’ effectiveness and achieve better organizational performance.

How do we know what drives effective software teams and organizational cultures?

At Google, we not only make a lot of technology, we also study how technology gets made.

DORA is an academically and statistically rigorous research program that seeks to answer the questions: “How does technology help organizations succeed, and how do we get better at software delivery and operations?”

Internal research projects across hundreds of Google teams, such as Project Aristotle, have also allowed us to study the drivers of highly effective teams.

In this blog series, we distill years of this Google research into five dimensions that you can apply to drive success within your own organization:

  1. Resilience (the focus for this blog)
  2. Communication
  3. Collaboration
  4. Innovation
  5. Empowerment

Let’s jump in, and consider what resilience is, how it improves performance, and how your team can get more of it.

Resilience: congratulations on the failure, now let’s investigate.

For the purposes of this blog, when we talk about resilience we are referring to cultural resilience. In this context, we define resilience as the ability of teams to take smart risks, share failures openly and continuously improve based on feedback. Teams that exhibit resilience are demonstrably more successful than teams that don’t. This idea that a culture with resilient characteristics can drive desirable organizational outcomes isn’t new. Sociologist Dr. Ron Westrum’s study of how culture influences team behavior when things go wrong identified three distinct types of organizational culture. Cultures in which failure led to inquiry, rather than justice or scapegoating, were found to be more performance-oriented. Westrum referred to these as “generative” cultures.

This research has been reinforced by our DORA findings since the first State of DevOps Report was published in 2014. Our 2023 Accelerate State of DevOps Report demonstrates that the presence of a generative culture continues to predict higher software delivery and organizational performance. We believe this is because, at its core, DevOps is fundamentally about people and the ways those people work. And people drive culture.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Graphic_2.max-1700x1700.png

Source: DORA 2023 Accelerate State of DevOps Report

Take, for example, security development practices. Our research found that organizations with high-trust, resilient cultures are 1.6x more likely to have above-average adoption of emerging security practices than those without such cultures. We believe these generative traits, including aspects of resilience, may lead to a more desirable security posture due to their influence on teams’ ways of working. For example, generative organizations may be more likely to actively minimize the inconvenience or risk associated with reporting security issues by fostering an atmosphere of “blamelessness,” among other things. The bottom line is, if you want to improve your organization’s security posture (and beyond), consider evaluating your team’s culture first.

We can further break resilience down into two additional mindsets:

  1. Launching and iterating: getting started, gathering feedback and continuously improving
  2. Psychological safety: a shared belief that a team is safe for interpersonal risk-taking

Launching and iterating: perfect is the enemy of good.

Would you be comfortable sharing an idea with your leadership if it were only 20% formulated?

Part of resilience is gathering input and continuously improving. Our research shows that teams who adopt a mindset of continuous improvement perform better. This includes starting quickly, adapting to changing circumstances, and experimenting.

For example, in the context of software delivery, DORA research supports the philosophy of continuous delivery so that software is always in a releasable state. Maintaining this “golden” state requires creating mechanisms for fast feedback and rapidly recovering from failures. We’ve found that teams that prioritize these feedback mechanisms have better software delivery performance. Our research has also found that working in small batches improves the way teams receive and use such feedback, as well as the ability to recover from failure, among other things.
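To make “recovering from failures” something a team can look at together, here is a minimal sketch of two simple measures: the share of deployments that failed, and the average time it took to restore service. This is an illustration under stated assumptions, not a method taken from the research above. The `Deployment` record shape, the function names and the sample data are all hypothetical, and the sketch assumes a failure begins at the moment of deployment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once service was restored


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Fraction of deployments that caused a failure in production."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


def mean_time_to_restore(deployments: list[Deployment]) -> Optional[timedelta]:
    """Average time from a failed deployment to restoration.

    Assumption: the failure starts when the deployment goes out.
    Returns None when no failed deployment has been restored yet.
    """
    durations = [
        d.restored_at - d.deployed_at
        for d in deployments
        if d.failed and d.restored_at is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Made-up sample data, for illustration only.
sample = [
    Deployment(datetime(2023, 10, 2, 9, 0)),
    Deployment(datetime(2023, 10, 3, 14, 0), failed=True,
               restored_at=datetime(2023, 10, 3, 14, 30)),
    Deployment(datetime(2023, 10, 4, 11, 0)),
    Deployment(datetime(2023, 10, 5, 16, 0), failed=True,
               restored_at=datetime(2023, 10, 5, 17, 30)),
]

print(f"Change failure rate: {change_failure_rate(sample):.0%}")
print(f"Mean time to restore: {mean_time_to_restore(sample)}")
```

Numbers like these are most useful as a prompt for inquiry (what made the second recovery slower?) rather than as a scorecard for individuals.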

Launching and iterating is not only about improving the software that you ship. It’s also about a team’s more general ability to self-assess, pivot, and adopt new ways of working when it makes sense based on the data. Inevitably, this experimentation will include both successes and failures. In each case, teams stand to learn valuable lessons.

Psychological safety: celebrating failure as success

Would you be comfortable openly failing on your team?

Extensive research inside Google found that psychological safety provides a critical foundation for highly effective teams. In general, our research demonstrates that who is on a team matters less than how team members interact when it comes to predicting team effectiveness.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Graphic_3.max-700x700.png

In order of importance, Google researchers found these five variables were what mattered most when it came to team effectiveness. Source: Google re:Work Guide: Understand team effectiveness

Project Aristotle examined hundreds of Google teams to answer the question “what makes a team effective?” Statistical analysis of the resulting data revealed that the most important team dynamic is psychological safety: an environment where taking smart risks is encouraged, and where members trust that teammates will not embarrass or punish one another for ideas, questions or mistakes. Further DORA analysis found that these practices also benefit teams outside of Google, uncovering that a culture of psychological safety is broadly predictive of better software delivery performance, organizational performance and productivity.

It’s important to remember that culture flows downstream from leadership. DORA research shows that effective leadership has a measurable, significant impact on software delivery outcomes. If we want to foster a blameless, psychologically safe environment, leaders must provide their teams with the necessary trust, voice, and opportunities to experiment and fail.

How can you practice being resilient?

Adopting a mindset of continuous improvement can help you achieve better organizational performance. Likewise, embracing psychological safety within your organization may help your teams work more effectively. This is what we mean by using resilience to drive success through culture.

So, what does resilience look like when it is applied practically in our behaviors and reinforced through our daily work?

We can continuously improve by launching early, defining success metrics, gathering input (including through crowdsourcing), and taking what we learn to heart, both to improve our products and the way we work. This ability can be underpinned by technical practices such as continuous integration, automated testing, continuous delivery and monitoring, to name a few. These practices provide the foundation and guardrails that allow for safe, rapid iteration and reliability.

We can also normalize failure by conducting both “premortems” (anticipating the myriad ways an idea may fail) and “blameless postmortems” (candid conversations about times when things haven’t gone according to plan and what could be done to improve, without assigning blame). For example, we’ve found that teams that use reliability practices, including blameless postmortems, report higher productivity and job satisfaction, and lower levels of burnout, than their counterparts who use more traditional operations approaches. We suspect this is because, among other things, a sustained fear of making mistakes can lead to poor well-being.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Graphic_4.png.max-1900x1900.png

Blameless postmortems help prevent issues from recurring, help avoid multiplying complexity, and allow you to learn from mistakes and those of others.

These ways of working are exemplified by our latest Google Cloud DevOps Award winners. These organizations have demonstrated how they are implementing these and other practices to drive organizational success and elite performance. For example, consider how one company leveraged cross-functional teams to remove bottlenecks, address blockers, and improve communication — the focus of our next blog in this series.

In the meantime, be prepared for failure as you experiment with new ways of working, including new approaches to software delivery, operations and beyond. And ask yourself, how will you react next time something goes wrong? To learn more, take the DevOps Quick Check and read the latest State of DevOps Report, both at dora.dev.
