๐๐ต๐ฎ๐ผ๐ ๐๐ป๐ด๐ถ๐ป๐ฒ๐ฒ๐ฟ๐ถ๐ป๐ด ๐๐๐ค๐
Frequently Asked Questions
According to,
I gave you the basics from what I knew. Since then, Iโve looked around the web to fill in the gaps, and these FAQs are the resultโstuff Iโve learned and questions Iโve seen floating around. Hereโs what I found!
๐๐ฟ๐ฒ๐พ๐๐ฒ๐ป๐๐น๐ ๐๐๐ธ๐ฒ๐ฑ ๐ค๐๐ฒ๐๐๐ถ๐ผ๐ป๐
๐๐ ๐๐ต๐ฎ๐ผ๐ ๐๐ป๐ด๐ถ๐ป๐ฒ๐ฒ๐ฟ๐ถ๐ป๐ด ๐๐ฒ๐๐ข๐ฝ๐?
DevOps teams are increasingly turning towards chaos engineering to build more resilient software or systems to solve issues before they cause incidents.
๐ช๐ต๐ผโ๐ ๐๐ต๐ฒ ๐ฏ๐ฒ๐๐ ๐๐ฒ๐ ๐ผ๐ณ ๐ฝ๐ฒ๐ผ๐ฝ๐น๐ฒ ๐๐ผ ๐๐๐ฎ๐ฟ๐ ๐น๐ผ๐ผ๐ธ๐ถ๐ป๐ด ๐ถ๐ป๐๐ผ ๐๐ต๐ฎ๐ผ๐ ๐๐ป๐ด๐ถ๐ป๐ฒ๐ฒ๐ฟ๐ถ๐ป๐ด ๐ถ๐ป ๐ฎ ๐๐ฒ๐ฎ๐บ?
The easiest way to kick off chaos engineering is to launch a program and pick a championโsomeone with solid software engineering skills and a love for chaos, whether theyโre a new hire or not. Everything else can be figured out along the way!
๐ช๐ต๐ฎ๐ ๐ถ๐ ๐ฎ๐ป ๐๐ ๐ฎ๐บ๐ฝ๐น๐ฒ ๐ผ๐ณ ๐๐ต๐ฎ๐ผ๐ ๐๐ป๐ด๐ถ๐ป๐ฒ๐ฒ๐ฟ๐ถ๐ป๐ด?
Chaos engineering is the practice of intentionally injecting faults into a system to test its resilience. A great example of chaos engineering is randomly terminating a critical microservice in a distributed system to evaluate how the system adapts and maintains functionality under unexpected failure conditions.
๐ช๐ต๐ ๐๐ผ ๐ช๐ฒ ๐ฃ๐ฟ๐ฎ๐ฐ๐๐ถ๐ฐ๐ฒ ๐๐ต๐ฎ๐ผ๐ ๐๐ป๐ด๐ถ๐ป๐ฒ๐ฒ๐ฟ๐ถ๐ป๐ด?
The goal is to identify potential failure points and correct them before they cause an actual outage or other disruption. By proactively testing how a system responds under stress, we can identify and fix failures before they end up in the news.
๐ช๐ต๐ฎ๐โ๐ ๐ฌ๐ผ๐๐ฟ ๐๐ฑ๐๐ถ๐ฐ๐ฒ ๐ณ๐ผ๐ฟ ๐๐ผ๐บ๐ฝ๐ฎ๐ป๐ถ๐ฒ๐ ๐๐ต๐ฎ๐ ๐ฆ๐ฎ๐, โ๐ช๐ฒ ๐๐ฟ๐ฒ ๐ก๐ผ๐ ๐ฅ๐ฒ๐ฎ๐ฑ๐ ๐๐ผ ๐ฅ๐๐ป ๐๐ต๐ฎ๐ผ๐ ๐๐ ๐ฝ๐ฒ๐ฟ๐ถ๐บ๐ฒ๐ป๐๐ ๐ถ๐ป ๐ฃ๐ฟ๐ผ๐ฑ๐๐ฐ๐๐ถ๐ผ๐ปโ?
When companies tell me theyโre not ready for Chaos Experiments in production, it often stems from a fear of uncertaintyโthey donโt know what will happen in production because theyโve never done Chaos Experiments in any environment. Just like with code, starting Chaos Engineering in a development environment, then moving to staging will prepare you for what will happen in production and allow you to take less risk. And thereโs a lot that you can learn about your applications by testing in development and staging.
But ultimately you need to graduate to production.
๐ช๐ต๐ฎ๐ ๐ถ๐ ๐ฎ ๐๐ต๐ฎ๐ผ๐ ๐๐ฎ๐?
A Chaos Day is an event that runs over one or more days where teams can explore how their service responds to failures safely. During a Chaos Day, teams design and run controlled experiments in pre-production or production environments.
๐ช๐ต๐ ๐ฑ๐ผ ๐๐ฒ ๐ผ๐ฟ๐ด๐ฎ๐ป๐ถ๐๐ฒ ๐๐ต๐ฎ๐ผ๐ ๐๐ฎ๐๐?
Chaos Days provide a focal event for team to practice chaos engineering. They are especially useful to teams that might be less familiar with this discipline.
๐ช๐ต๐ฎ๐ ๐ฎ๐ฟ๐ฒ ๐๐ผ๐บ๐ฒ ๐๐ผ๐ฝ ๐๐ถ๐ฝ๐ ๐ณ๐ผ๐ฟ ๐๐ต๐ฎ๐ผ๐ ๐๐ฎ๐๐?
Start small, with one or two teams and a few experiments, not tens of teams and tens of experiments. This allows to adapt and learn how to run a Chaos Day in specific context, before scaling out to multiple teams and many experiments.
"Chaos Engineering: Because breaking stuff on purpose is the best way to prove it wonโt break by accident."
๐๐
๐ฝ๐น๐ผ๐ฟ๐ฒ & ๐๐ผ๐ป๐ป๐ฒ๐ฐ๐
Kolton Andrus, Ask Me Anything about Chaos Engineering