Fault Injection
Fault Injection using SMI in Linkerd 🔗
Application failure injection is a form of chaos engineering in which we artificially increase the error rate of certain services in a microservice application to see what impact that has on the system as a whole. Traditionally, you would need to add some kind of failure injection library into your service code to do this. Thankfully, the service mesh gives us a way to inject application failures without needing to modify or rebuild our services at all.
Using SMI Traffic Split API to inject errors 🔗
We can easily inject application failures by using the Traffic Split API of the Service Mesh Interface (SMI). This allows us to do failure injection in a way that is implementation agnostic and works across service meshes.
We will do this by first deploying a new service that only returns error responses. We will use a simple NGINX service configured to return only HTTP 500 responses.
We will then create a TrafficSplit resource that directs the service mesh to send a percentage of a service's traffic to the error service instead. For example, by sending 20% of a service's traffic to the error service, we inject an artificial 20% error rate into that service.
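To make the arithmetic concrete, the overall error rate seen by callers is the weighted average of each backend's error rate. A quick sketch (the weights and error rates below are illustrative, matching the 80/20 split used in this guide):

```python
def expected_error_rate(backends):
    """backends: list of (weight, error_rate) pairs.

    Each backend receives weight_i / total_weight of the traffic,
    so the observed error rate is the weight-proportional average.
    """
    total = sum(weight for weight, _ in backends)
    return sum((weight / total) * rate for weight, rate in backends)

# 800m of traffic to the healthy service (0% errors),
# 200m to the error service (100% errors):
print(expected_error_rate([(800, 0.0), (200, 1.0)]))  # → 0.2
```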
Deploy Linkerd Books Application 🔗
We will be deploying the Linkerd Books application for this part of the demo.
Use Meshery to deploy the Books application:
- In Meshery, navigate to the Linkerd adapter’s management page from the left nav menu.
- On the Linkerd adapter’s management page, please enter default in the Namespace field.
- Then, click the (+) icon on the Sample Application card and select Books Application from the list.
Alternatively, you can deploy the sample application and inject Linkerd into it using the Linkerd CLI:
linkerd inject https://run.linkerd.io/booksapp.yml | kubectl apply -f -
One of the services in this application has already been configured with an artificial error rate. Let's remove that error rate first:
kubectl edit deploy/authors
Remove the following lines:
- name: FAILURE_RATE
value: "0.5"
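If you would rather not edit the manifest interactively, the same change can be made with `kubectl set env`, which removes an environment variable when its name is suffixed with `-`:

```shell
# Remove the FAILURE_RATE environment variable from the authors deployment
# (the trailing "-" tells kubectl to delete the variable).
kubectl set env deploy/authors FAILURE_RATE-
```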
Now, if you check linkerd stat, the success rate should be 100%:
linkerd stat deploy
Create the errored service 🔗
Now we will create our error service: an NGINX instance pre-configured to respond only with the HTTP 500 status code.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: error-injector
  labels:
    app: error-injector
spec:
  selector:
    matchLabels:
      app: error-injector
  replicas: 1
  template:
    metadata:
      labels:
        app: error-injector
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
          name: nginx
          protocol: TCP
        volumeMounts:
        - name: nginx-config
          mountPath: /etc/nginx/nginx.conf
          subPath: nginx.conf
      volumes:
      - name: nginx-config
        configMap:
          name: error-injector-config
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: error-injector
  name: error-injector
spec:
  clusterIP: None
  ports:
  - name: service
    port: 7002
    protocol: TCP
    targetPort: nginx
  selector:
    app: error-injector
  type: ClusterIP
---
apiVersion: v1
data:
  nginx.conf: |
    events {
      worker_connections 1024;
    }
    http {
      server {
        location / {
          return 500;
        }
      }
    }
kind: ConfigMap
metadata:
  name: error-injector-config
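Assuming you have saved the manifests above to a file named error-injector.yaml (the filename here is arbitrary), deploy them and confirm the pod comes up:

```shell
# Apply the Deployment, Service, and ConfigMap for the error service
kubectl apply -f error-injector.yaml

# Verify the error-injector pod is running
kubectl get pods -l app=error-injector
```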
After deploying the errored service, we will create a TrafficSplit resource that directs 20% of the books service's traffic to the error service.
apiVersion: split.smi-spec.io/v1alpha3
kind: TrafficSplit
metadata:
  name: fault-inject
spec:
  service: books
  backends:
  - service: books
    weight: 800m
  - service: error-injector
    weight: 200m
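Assuming the TrafficSplit above is saved as fault-inject.yaml (again, the filename is arbitrary), apply it and check that it was created:

```shell
# Apply the TrafficSplit that sends 20% of books traffic to error-injector
kubectl apply -f fault-inject.yaml

# Confirm the TrafficSplit exists
kubectl get trafficsplit fault-inject
```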
You can now see a 20% error rate for calls from webapp to books:
linkerd routes deploy/webapp --to service/books
You can also see the errors in your web browser:
kubectl port-forward deploy/webapp 7000 &
open http://localhost:7000
If you refresh the page a few times, you will see an Internal Server Error.
Cleanup 🔗
kubectl delete trafficsplit/fault-inject
kubectl delete deploy/error-injector svc/error-injector cm/error-injector-config
Remove the Books application from the Meshery dashboard by clicking the trash icon on the Sample Application card on the Linkerd adapter's management page.