Azure Chaos Studio – Chaos Engineering in the Cloud

If you’re looking to test how your application on Azure holds up under failure, Azure Chaos Studio is a tool you’ll want to check out. In this blog post, we’ll give an overview of what Azure Chaos Studio is and some of its key features, walk through how to get started, and discuss the benefits of stress testing your application with it.

Overview of Azure Chaos Studio and Its Core Features.

Azure Chaos Studio is a managed Azure service for chaos engineering. It helps developers stress test their applications by deliberately injecting faults, such as pod failures or network latency, into Azure resources. You design an experiment, run it against the resources you have onboarded, and watch how the application behaves, so you can find weaknesses before a real outage does.

Key Features of Azure Chaos Studio.

Azure Chaos Studio provides a number of features that are designed to help developers stress test their applications on Azure. These features include:

– A web-based experiment designer in the Azure portal that makes it easy to set up and configure experiments

– A library of pre-built faults that can be injected into an application’s resources

– Azure Monitor metrics for tracking application performance during an experiment

– An experiment run history that helps identify where an application did not hold up

Getting started in Azure

Navigate to portal.azure.com and type “Chaos” in the search bar.

Select Chaos Studio. The Experiments and Targets management areas are part of the studio, and that is where we will jump in.

Next, we have to onboard resources.

You can onboard resources from multiple subscriptions; the resources you onboard are known as your targets.

Then navigate to Enable Targets.

I’ve provisioned an AKS cluster to test the service. Navigate to the “Manage Action” area for it.

The blade will pop out and list the service-direct capabilities.

You can also see what is available as agent-based capabilities; this part is greyed out right now.

Next, we will navigate to the Experiments page to create an experiment against the targets that were onboarded.

Go to Create and work through the wizard.

We are most interested in the experiment designer portion, which will look familiar if you have spent time in the Logic Apps Designer UI.

We are going to convert a Chaos Mesh YAML configuration to JSON to define the fault; visit this reference:

https://chaos-mesh.org/docs/simulate-pod-chaos-on-kubernetes/#create-experiments-using-yaml-configuration-files

Using the docs, we can see the default YAML provided. We take the action portion of the spec and customize it.

Converted to JSON, this becomes our jsonSpec. Then go to target resources and select the cluster.
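As a rough sketch of that conversion, the snippet below starts from a PodChaos spec that has already been converted to minified JSON and escapes it so it can sit inside the JSON definition of an experiment as the jsonSpec parameter value. The pod-failure action, the mode, and the default namespace are placeholder assumptions modeled on the Chaos Mesh docs example, not the exact values used in this experiment.

```shell
# Assumed example: the contents under `spec:` of a Chaos Mesh PodChaos YAML,
# converted to minified JSON. Action, mode and namespace are placeholders.
json_spec='{"action":"pod-failure","mode":"all","selector":{"namespaces":["default"]}}'

# Escape the double quotes so the spec can sit inside a JSON string value.
escaped_spec=$(printf '%s' "$json_spec" | sed 's/"/\\"/g')

# Print the key/value pair in the shape of an experiment parameter.
printf '{"key":"jsonSpec","value":"%s"}\n' "$escaped_spec"
```

The escaping step matters only when the spec is embedded in a larger JSON document; when typing into the designer's jsonSpec field, the unescaped `json_spec` value is most likely what you want.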

Second Step

Then let’s add a delay as shown. You can shorten this if you have a specific SLI you’re trying to achieve.

In another tab, set up Chaos Mesh on the AKS cluster by logging into your cluster.

We need to give our experiment permission to the AKS cluster.

Navigate back to the IAM portion of the AKS cluster.

Hit Add Role Assignment, then assign the Azure Kubernetes Service Cluster Admin Role to the experiment that we are creating.

Assign it to the name we gave the experiment.
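The same role assignment can also be made from the Azure CLI. This is a minimal sketch, assuming you have already looked up the object ID of the experiment's managed identity and the resource ID of the AKS cluster; the function name and both arguments are hypothetical placeholders.

```shell
# Hypothetical helper: grant a Chaos Studio experiment's managed identity
# the cluster admin role on an AKS cluster.
#   $1 = object (principal) ID of the experiment's identity  (placeholder)
#   $2 = full resource ID of the AKS cluster                 (placeholder)
grant_experiment_access() {
  az role assignment create \
    --role "Azure Kubernetes Service Cluster Admin Role" \
    --assignee-object-id "$1" \
    --assignee-principal-type ServicePrincipal \
    --scope "$2"
}

# Example call (values are placeholders, not real IDs):
# grant_experiment_access "<experiment-principal-id>" "<aks-cluster-resource-id>"
```

Scoping the assignment to the single cluster keeps the experiment's permissions as narrow as the portal steps above.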

Once you review and assign and the operation finishes, let’s navigate back to Chaos Studio > Experiments, where we will see the selections below.

This will prompt a banner message.

So, let’s wait about 10 minutes for the delay and then view the results.

If we go to Metrics under Monitoring on the resource and add our AKS cluster with the “restartingContainerCount(preview)” metric, we can see our experiment’s actions.

Next, we look at Active Pod Count.

We then look at the impact of the experiment by drilling down on the number of pods in the ready state over the period of time.

The lowest point is 27 pods, and shortly after that we start to recover to 30, so we can start to see how the cluster is self-healing.
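To put that dip in perspective, the arithmetic can be sketched in shell. This assumes 30 ready pods is the steady-state baseline, which the recovery back to 30 suggests; the function name is hypothetical.

```shell
# Hypothetical helper: percentage of baseline pods that stayed ready.
# Uses integer division, so the result is rounded down.
#   $1 = lowest ready-pod count seen during the experiment
#   $2 = steady-state ready-pod count (assumed baseline)
ready_percent() {
  echo $(( $1 * 100 / $2 ))
}

lowest=27     # lowest point observed in the chart
baseline=30   # assumed steady state
echo "Ready pods at the lowest point: $(ready_percent "$lowest" "$baseline")%"
```

With these numbers, 90% of the pods stayed ready at the worst moment of the experiment, which is the kind of figure you can compare against an SLI.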

To add Chaos Mesh to the cluster directly, follow the steps below.

If you want to add Chaos Mesh via Helm on the cluster management plane, grab your AKS credentials first.
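Grabbing the credentials is usually done with the Azure CLI. This is a minimal sketch, assuming the Azure CLI is installed and you are already logged in; the function name, resource group, and cluster name are hypothetical placeholders.

```shell
# Hypothetical helper: merge the AKS cluster's credentials into your kubeconfig
# so that kubectl and helm talk to that cluster.
#   $1 = resource group name (placeholder)
#   $2 = AKS cluster name    (placeholder)
fetch_aks_credentials() {
  az aks get-credentials --resource-group "$1" --name "$2"
}

# Example call (placeholder names):
# fetch_aks_credentials "my-resource-group" "my-aks-cluster"
```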

Run the following code:

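# Add the Chaos Mesh chart repository and refresh the local index
helm repo add chaos-mesh https://charts.chaos-mesh.org
helm repo update
# Create a dedicated namespace and install Chaos Mesh for the containerd runtime
kubectl create ns chaos-testing
helm install chaos-mesh chaos-mesh/chaos-mesh --namespace=chaos-testing --set chaosDaemon.runtime=containerd --set chaosDaemon.socketPath=/run/containerd/containerd.sock
# List the Chaos Mesh pods
kubectl get po -n chaos-testing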

Verify that the Chaos Mesh pods are running in the chaos-testing namespace.

Benefits of Stress Testing Your Application with Azure Chaos Studio.

One of the main benefits of using Azure Chaos Studio to stress test your application is that it can help you identify potential weaknesses in your application. By running your application while faults are being injected, you can see how it responds and identify any areas where it breaks down. This information can be used to improve the overall stability of your application.

Determining Application Resilience.

Another benefit of stress testing with Azure Chaos Studio is that it can help you determine the resilience of your application. By subjecting your application to extreme conditions, you can see how well it holds up and identify any areas where it needs improvement. This information can be used to make your application more resilient in the face of future challenges.

Improving Overall Performance.

Finally, stress testing with Azure Chaos Studio can also help you improve the overall performance of your application. By identifying bottlenecks and other issues, you can make changes to improve efficiency and speed up your application. This can result in a better user experience for those who use your software.

When this tool initially came out, I was actually surprised, because it was the first time I had heard of the concept in a hosted form, at least from a cloud service provider.

Conclusion

As we’ve seen, Azure Chaos Studio is a powerful tool that can help you stress test your application on Azure. By running experiments and analyzing the results, you can identify potential weaknesses, determine resilience, and improve performance. So if you’re looking to get the most out of your application on Azure, don’t hesitate to give Azure Chaos Studio a try. It is a first-party, native service that you can use as you need it, and Azure Monitor is pretty robust for observing the results, but like anything, measure the cost of the service as you adopt it.