Chaos Engineering Adoption
Chaos engineering tests the resilience of complex systems, but how many tech leaders have adopted it in their organizations? Read on to find out.
One-minute insights:
- Increasing system complexity is a common reason to adopt chaos engineering
- Most respondents were satisfied with the blast radius of their chaos engineering experimentation, though some were dissatisfied with the vulnerabilities uncovered
- Most use real-world or live-environment testing and most commonly introduce failures at the system access level
- Respondents called out improving MTTR as a top benefit of chaos engineering, while fear of causing disruptions is a common challenge
- Respondents prefer that software engineers have experience in chaos engineering
Many engineering leaders are deploying chaos engineering as a way to manage increasing system complexity
Of those whose organizations haven’t deployed chaos engineering, one-third (33%) are in the process of doing so. A further 56% feel their organization should deploy chaos engineering, even though it has no plans to do so.
Increasing system complexity (68%) was the most common reason for adopting chaos engineering. Half (50%) of respondents cited lack of preparedness seen during a system failure, and 49% attributed chaos engineering adoption to unclear technical debt.
Chaos engineering is crucial for infrastructure reliability and resilience. It simulates failure to measure system robustness, helping build systems that can manage chaotic events. Proper monitoring and alerting, understanding system boundaries and defining a recovery process are all vital elements.
Would love to be able to start to incorporate it with our current Agile methodology.
Most tech leaders are satisfied with the blast radius of their chaos engineering experimentation, but some are dissatisfied with the vulnerabilities uncovered
Among those whose organizations have deployed chaos engineering, 63% were satisfied with their experiment blast radius. 16% were dissatisfied with the vulnerabilities they uncovered during their deployment.
There is a lot of apprehension in unintended disruption—we’ve been using a ‘warm’ instance to work out the kinks.
Having a separate environment to start testing chaos engineering makes the process significantly easier. Once you move to production, kickstart the chaos testing at off-peak times!
Most use real-world or live-environment testing and introduce failures at the system access level
Almost three-quarters (72%) of respondents use real-world or live-environment testing during chaos engineering. 63% intentionally introduce realistic bugs.
Respondents most commonly introduce system failures at the level of system access (57%), application (54%), API (53%) and virtual machines (52%).
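As a rough illustration of introducing failures at the API level while keeping the blast radius small, the sketch below wraps a handler so that a chosen fraction of calls fail on purpose. It is not drawn from the survey: `get_profile`, `InjectedFault`, `with_fault_injection` and `run_experiment` are hypothetical names, and treating blast radius as a simple fraction of affected calls is an assumption made for the example.

```python
import random


class InjectedFault(Exception):
    """Raised in place of a real response when a fault is injected."""


def with_fault_injection(func, blast_radius, rng):
    """Wrap func so that roughly `blast_radius` of calls fail on purpose.

    blast_radius is assumed here to be a fraction between 0 and 1.
    """
    if not 0.0 <= blast_radius <= 1.0:
        raise ValueError("blast_radius must be between 0 and 1")

    def wrapper(*args, **kwargs):
        # rng.random() returns a float in [0, 1), so 0.0 never injects
        # a fault and 1.0 always does.
        if rng.random() < blast_radius:
            raise InjectedFault("simulated API failure")
        return func(*args, **kwargs)

    return wrapper


def get_profile(user_id):
    # Hypothetical API handler standing in for a real endpoint.
    return {"user_id": user_id, "status": "ok"}


def run_experiment(calls, blast_radius, seed=0):
    """Call the wrapped handler `calls` times and count injected failures."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    chaotic = with_fault_injection(get_profile, blast_radius, rng)
    failures = 0
    for user_id in range(calls):
        try:
            chaotic(user_id)
        except InjectedFault:
            failures += 1
    return failures
```

Starting with a small fraction, for example `run_experiment(1000, 0.05)`, and raising it only once the system copes is one way to act on the fear of disruption that respondents describe.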
Chaos engineering makes sense for your highly available and core customer-facing systems. Since it introduces additional cost, it only makes sense once you have certain scale and system maturity.
It is important to address the most significant weaknesses proactively, before they affect our customers in production. We need a way to manage the chaos inherent in these systems, take advantage of increasing flexibility and velocity, and have confidence in our production deployments despite the complexity that they represent. Hence we have enforced this now and are seeing how consistently we can remediate failure.
Improving MTTR is a top benefit of chaos engineering, while fear of causing disruptions is a core challenge
Half (50%) of respondents said that improving MTTR is one of the main benefits of chaos engineering. Other top benefits included uncovering system weaknesses (46%), improving team culture (45%) and improving failure detection (44%).
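For readers who want to track the MTTR benefit, here is a minimal sketch of how the metric might be computed. It assumes MTTR means mean time to recovery and that each incident is recorded as a (detected, recovered) timestamp pair; the function name and sample timestamps are illustrative, not survey data.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average recovery duration over (detected_at, recovered_at) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = timedelta()
    for detected_at, recovered_at in incidents:
        if recovered_at < detected_at:
            raise ValueError("recovery precedes detection")
        total += recovered_at - detected_at
    return total / len(incidents)


# Illustrative incidents: one resolved in 30 minutes, one in 60 minutes.
sample_incidents = [
    (datetime(2023, 1, 10, 9, 0), datetime(2023, 1, 10, 9, 30)),
    (datetime(2023, 2, 3, 14, 0), datetime(2023, 2, 3, 15, 0)),
]
print(mean_time_to_recovery(sample_incidents))  # 0:45:00
```

Comparing this figure before and after a series of chaos experiments is one way to check whether the practice is delivering the improvement respondents report.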
62% of respondents cited the fear of causing disruptions as one of the main barriers to chaos engineering. Lacking an understanding of system steady state (49%) and skill gaps (49%) are also key challenges.
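One way to make "steady state" concrete is to compare a metric observed during an experiment against a baseline taken beforehand. The sketch below assumes a single numeric metric (such as requests per second) and a 10% tolerance; both choices, and the function name, are illustrative assumptions rather than anything the survey prescribes.

```python
from statistics import mean


def within_steady_state(baseline, observed, tolerance=0.10):
    """Return True if the observed mean stays within `tolerance` of the baseline mean.

    The deviation is relative, so tolerance=0.10 allows a 10% drift.
    """
    baseline_mean = mean(baseline)
    if baseline_mean == 0:
        raise ValueError("baseline mean must be non-zero")
    deviation = abs(mean(observed) - baseline_mean) / abs(baseline_mean)
    return deviation <= tolerance


# Illustrative samples: the baseline averages 100.
baseline_rps = [100, 102, 98]
print(within_steady_state(baseline_rps, [105, 107, 103]))  # True (5% drift)
print(within_steady_state(baseline_rps, [130, 125, 135]))  # False (30% drift)
```

Agreeing on the metric and the tolerance before any failure is injected gives a team a shared definition of "normal" to test against.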
There is a steep learning curve and initial fear, but the potential improvements and results are worth the investment.
Chaos engineering is a complex principle which is very difficult for organizations to embrace completely. It feels very risky and challenging and potentially embarrassing. Organizations instead need to embrace this as a way to solve their problems, not as a way to embarrass people or create new problems.
Software engineers may need experience in chaos engineering, and most respondents think it will impact software development
60% of respondents think that chaos engineering will have a role to play in software engineering, while 20% see it becoming fundamental.
Chaos engineering is one of those once-in-a-generation frameworks that will revolutionize how software engineering should work.
My final thought on chaos engineering is that it is an essential practice for modern organizations, as it allows them to anticipate and plan for potential system failures. It is really important to ensure the chaos engineering process is structured correctly and that engineers are given the support and resources needed to execute the tests effectively.