Fault Injection
Fault Injection using SMI in Linkerd 🔗
Application failure injection is a form of chaos engineering in which we artificially increase the error rate of certain services in a microservice application to see what impact that has on the system as a whole. Traditionally, you would need to add some kind of failure injection library into your service code to do this. Thankfully, the service mesh gives us a way to inject application failures without needing to modify or rebuild our services at all.
Using SMI Traffic Split API to inject errors 🔗
We can easily inject application failures by using the Traffic Split API of the Service Mesh Interface (SMI). This allows us to do failure injection in a way that is implementation agnostic and works across service meshes.
We will do this by first deploying a new service that only returns error responses. We will use a simple NGINX service configured to return only HTTP 500 responses.
We will then create a TrafficSplit resource that directs the service mesh to send a percentage of a service's traffic to the error service instead. For example, by sending 20% of a service's traffic to the error service, we inject an artificial 20% error rate into that service.
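To make the arithmetic concrete, the overall error rate seen by callers is the weighted average of each backend's error rate. A quick sketch (the weights and error rates below are illustrative, matching the 80/20 split used in this guide):

```python
def expected_error_rate(backends):
    """backends: list of (weight, error_rate) pairs.

    Each backend receives weight_i / total_weight of the traffic,
    so the observed error rate is the weight-proportional average.
    """
    total = sum(weight for weight, _ in backends)
    return sum((weight / total) * rate for weight, rate in backends)

# 800m of traffic to the healthy service (0% errors),
# 200m to the error service (100% errors):
print(expected_error_rate([(800, 0.0), (200, 1.0)]))  # → 0.2
```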
Deploy Linkerd Books Application 🔗
We will be deploying the Linkerd Books application for this part of the demo.
Use Meshery to deploy the Books application:
- In Meshery, navigate to the Linkerd adapter’s management page from the left nav menu.
- On the Linkerd adapter’s management page, please enter default in the Namespace field.
- Then, click the (+) icon on the Sample Application card and select Books Application from the list.
Alternatively, you can deploy the sample application and inject Linkerd into it using the Linkerd CLI:
linkerd inject https://run.linkerd.io/booksapp.yml | kubectl apply -f -
One of the services in this application has already been configured with an artificial error rate. Let's remove that error rate first:
kubectl edit deploy/authors
Remove the following lines:
- name: FAILURE_RATE
value: "0.5"
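If you would rather not edit the manifest interactively, the same change can be made with `kubectl set env`, which removes an environment variable when its name is suffixed with `-`:

```shell
# Remove the FAILURE_RATE environment variable from the authors deployment
# (the trailing "-" tells kubectl to delete the variable).
kubectl set env deploy/authors FAILURE_RATE-
```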
Now, if you check linkerd stat, the success rate should be 100%:
linkerd stat deploy
Create the errored service 🔗
Now we will create our error service: an NGINX instance pre-configured to respond only with the HTTP 500 status code.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: error-injector
  labels:
    app: error-injector
spec:
  selector:
    matchLabels:
      app: error-injector
  replicas: 1
  template:
    metadata:
      labels:
        app: error-injector
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
          name: nginx
          protocol: TCP
        volumeMounts:
        - name: nginx-config
          mountPath: /etc/nginx/nginx.conf
          subPath: nginx.conf
      volumes:
      - name: nginx-config
        configMap:
          name: error-injector-config
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: error-injector
  name: error-injector
spec:
  clusterIP: None
  ports:
  - name: service
    port: 7002
    protocol: TCP
    targetPort: nginx
  selector:
    app: error-injector
  type: ClusterIP
---
apiVersion: v1
data:
  nginx.conf: |
    events {
      worker_connections 1024;
    }
    http {
      server {
        location / {
          return 500;
        }
      }
    }
kind: ConfigMap
metadata:
  name: error-injector-config
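Assuming you have saved the manifests above to a file named error-injector.yaml (the filename here is arbitrary), deploy them and confirm the pod comes up:

```shell
# Apply the Deployment, Service, and ConfigMap for the error service
kubectl apply -f error-injector.yaml

# Verify the error-injector pod is running
kubectl get pods -l app=error-injector
```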
After deploying the errored service, we will create a TrafficSplit resource that directs 20% of the books service's traffic to the error service.
apiVersion: split.smi-spec.io/v1alpha3
kind: TrafficSplit
metadata:
  name: fault-inject
spec:
  service: books
  backends:
  - service: books
    weight: 800m
  - service: error-injector
    weight: 200m
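Assuming the TrafficSplit above is saved as fault-inject.yaml (again, the filename is arbitrary), apply it and check that it was created:

```shell
# Apply the TrafficSplit that sends 20% of books traffic to error-injector
kubectl apply -f fault-inject.yaml

# Confirm the TrafficSplit exists
kubectl get trafficsplit fault-inject
```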
You can now see a 20% error rate for calls from webapp to books:
linkerd routes deploy/webapp --to service/books
You can also see the errors in your web browser:
kubectl port-forward deploy/webapp 7000 &
open http://localhost:7000
If you refresh the page a few times, you will see an Internal Server Error.
Cleanup 🔗
kubectl delete trafficsplit/fault-inject
kubectl delete deploy/error-injector svc/error-injector cm/error-injector-config
Remove the Books application from the Meshery dashboard by clicking the trash icon on the Sample Application card on the Linkerd adapter's management page.