Pod IO stress
Pod I/O stress is a Kubernetes pod-level chaos fault that stresses the disk of the application pod by increasing the number of input and output requests. Continuous, heavy I/O on the disk degrades the read and write performance of the microservices that use it. Scratch space consumed on a node may also leave too little space for new containers to be scheduled. Injecting this fault helps verify how resilient the application is to these conditions.

Use cases
Pod IO stress:
- Aims to verify the resilience of applications that share the disk resource for ephemeral (or persistent) storage.
- Simulates slower disk operations by the application.
- Simulates noisy neighbour problems by hogging the disk bandwidth.
- Verifies the disk performance on increasing I/O threads and varying I/O block sizes.
- Checks how the application functions under high disk latency conditions and when I/O traffic is very high.
- Checks how the application functions under large I/O blocks, and when other services monopolize the I/O disks.
Permissions required
Below is a sample Kubernetes role that defines the permissions required to execute the fault.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: hce
  name: pod-io-stress
# Fault scope: Cluster (also supports "Namespaced" mode)
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "get", "list", "patch", "update"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["replicasets", "daemonsets"]
    verbs: ["get", "list"]
  - apiGroups: ["litmuschaos.io"]
    resources: ["chaosengines", "chaosexperiments", "chaosresults"]
    verbs: ["create", "delete", "get", "list", "patch", "update"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "delete", "get", "list", "deletecollection"]
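A role only takes effect once it is bound to the identity that runs the fault. The following is a minimal sketch of such a binding, assuming the role above is applied as a standard Kubernetes Role named pod-io-stress in the hce namespace. The service account name litmus-admin is borrowed from the ChaosEngine examples on this page, and its presence in the hce namespace is an assumption; substitute the account used in your setup.

```yaml
# Hypothetical binding: subject name and namespace are assumptions
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-io-stress
  namespace: hce
subjects:
  # service account that executes the fault (assumed name)
  - kind: ServiceAccount
    name: litmus-admin
    namespace: hce
roleRef:
  # must match the kind and name of the role defined above
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-io-stress
```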
Prerequisites
- Kubernetes > 1.16
- The application pods should be in the running state before and after injecting chaos.
Optional tunables
| Tunable | Description | Notes | 
|---|---|---|
| FILESYSTEM_UTILIZATION_PERCENTAGE | Specifies the size as a percentage of free space on the file system. | Default: 10 %. For more information, go to file system utilization percentage | 
| FILESYSTEM_UTILIZATION_BYTES | Specifies the size in gigabytes (GB). | FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES are mutually exclusive. If both values are provided, FILESYSTEM_UTILIZATION_PERCENTAGE takes priority. For more information, go to file system utilization bytes |
| NUMBER_OF_WORKERS | Number of IO workers involved in IO disk stress. | Default: 4. For more information, go to workers for stress | 
| TOTAL_CHAOS_DURATION | Duration for which to insert chaos (in seconds). | Default: 120 s. For more information, go to duration of the chaos | 
| NODE_LABEL | Node label used to filter the target node if the TARGET_NODE environment variable is not set. | It is mutually exclusive with the TARGET_NODE environment variable. If both are provided, the fault uses TARGET_NODE. For more information, go to node label. |
| VOLUME_MOUNT_PATH | Fill the given volume mount path. | For more information, go to mount path | 
| TARGET_PODS | Comma-separated list of application pod names subject to pod IO stress. | If not provided, the fault selects target pods randomly based on provided appLabels. For more information, go to target specific pods | 
| PODS_AFFECTED_PERC | Percentage of total pods to target. Provide numeric values. | Default: 0 (corresponds to 1 replica). For more information, go to pod affected percentage | 
| LIB_IMAGE | Image used to inject chaos. | Default: harness/chaos-go-runner:main-latest. For more information, go to image used by the helper pod. | 
| CONTAINER_RUNTIME | Container runtime interface for the cluster. | Default: containerd. Supports docker, containerd and crio. For more information, go to container runtime | 
| SOCKET_PATH | Path of the containerd, crio, or docker socket file. | Default: /run/containerd/containerd.sock. For more information, go to socket path |
| RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30 s. For more information, go to ramp time | 
| SEQUENCE | Sequence of chaos execution for multiple target pods. | Default: parallel. Supports serial and parallel. For more information, go to sequence of chaos execution | 
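Several of the tunables in the table can be set together in one ChaosEngine. The sketch below reuses the nginx deployment and litmus-admin service account from the other examples on this page and sets PODS_AFFECTED_PERC, SEQUENCE, and RAMP_TIME. The values are illustrative assumptions, not recommendations.

```yaml
# Illustrative values only: adjust them to your own application
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-io-stress
      spec:
        components:
          env:
            # percentage of the matching pods to target
            - name: PODS_AFFECTED_PERC
              value: "50"
            # stress the selected pods one after another
            - name: SEQUENCE
              value: "serial"
            # wait 30 seconds before and after injecting chaos
            - name: RAMP_TIME
              value: "30"
            # duration of the chaos in seconds
            - name: TOTAL_CHAOS_DURATION
              value: "60"
```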
File system utilization percentage
Percentage of the free space on the pod's file system to stress. Tune it by using the FILESYSTEM_UTILIZATION_PERCENTAGE environment variable.
The following YAML snippet illustrates the use of this environment variable:
# stress the i/o of the targeted pod with FILESYSTEM_UTILIZATION_PERCENTAGE of total free space
# it is mutually exclusive with the FILESYSTEM_UTILIZATION_BYTES.
# if both are provided then it will use FILESYSTEM_UTILIZATION_PERCENTAGE for stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-io-stress
      spec:
        components:
          env:
            # percentage of free file system space to be stressed
            - name: FILESYSTEM_UTILIZATION_PERCENTAGE
              value: "10" # in percent
            # duration of the chaos in seconds
            - name: TOTAL_CHAOS_DURATION
              value: "60"
File system utilization bytes
Amount of file system space to stress, in gigabytes (GB). Tune it by using the FILESYSTEM_UTILIZATION_BYTES environment variable.
The FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES environment variables are mutually exclusive. If both values are provided, FILESYSTEM_UTILIZATION_PERCENTAGE takes priority.
The following YAML snippet illustrates the use of this environment variable:
# stress the i/o of the targeted pod with given FILESYSTEM_UTILIZATION_BYTES
# it is mutually exclusive with the FILESYSTEM_UTILIZATION_PERCENTAGE.
# if both are provided then it will use FILESYSTEM_UTILIZATION_PERCENTAGE for stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-io-stress
      spec:
        components:
          env:
            # size of io to be stressed
            - name: FILESYSTEM_UTILIZATION_BYTES
              value: "1" #in GB
            - name: TOTAL_CHAOS_DURATION
              value: "60"
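To make the precedence rule concrete, the sketch below sets both variables in the same ChaosEngine. According to the rule stated above, the fault would use the percentage value and ignore the bytes value. The specific numbers are illustrative assumptions.

```yaml
# Both variables set: FILESYSTEM_UTILIZATION_PERCENTAGE takes priority
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-io-stress
      spec:
        components:
          env:
            # used for the stress: 20 percent of the free space
            - name: FILESYSTEM_UTILIZATION_PERCENTAGE
              value: "20"
            # ignored because the percentage is also provided
            - name: FILESYSTEM_UTILIZATION_BYTES
              value: "1" # in GB
            - name: TOTAL_CHAOS_DURATION
              value: "60"
```

In practice, set only one of the two variables so that the intended size is unambiguous.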
Container runtime and socket path
Use the CONTAINER_RUNTIME and SOCKET_PATH environment variables to set the container runtime and socket file path, respectively.
- CONTAINER_RUNTIME: Supports the docker, containerd, and crio runtimes. The default value is containerd.
- SOCKET_PATH: Contains the path of the containerd socket file by default (/run/containerd/containerd.sock). For docker, specify the path as /var/run/docker.sock. For crio, specify the path as /var/run/crio/crio.sock.
The following YAML snippet illustrates the use of these environment variables:
## provide the container runtime and socket file path
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-io-stress
      spec:
        components:
          env:
            # runtime for the container
            # supports docker, containerd, crio
            - name: CONTAINER_RUNTIME
              value: "containerd"
            # path of the socket file
            - name: SOCKET_PATH
              value: "/run/containerd/containerd.sock"
            - name: TOTAL_CHAOS_DURATION
              value: "60"
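For a cluster that runs a different runtime, both variables change together. The following sketch shows the docker variant, using the socket path listed above for docker. It assumes the same nginx deployment and service account as the other examples on this page.

```yaml
## docker variant: runtime and socket path must match each other
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-io-stress
      spec:
        components:
          env:
            # runtime for the container
            - name: CONTAINER_RUNTIME
              value: "docker"
            # path of the docker socket file
            - name: SOCKET_PATH
              value: "/var/run/docker.sock"
            - name: TOTAL_CHAOS_DURATION
              value: "60"
```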
Mount path
Volume mount path that is to be filled. Tune it by using the VOLUME_MOUNT_PATH environment variable.
The following YAML snippet illustrates the use of this environment variable:
# provide the volume mount path, which needs to be filled
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-io-stress
      spec:
        components:
          env:
            # path that needs to be stressed/filled
            - name: VOLUME_MOUNT_PATH
              value: "/some-dir-in-container"
            - name: TOTAL_CHAOS_DURATION
              value: "60"
Workers for stress
Number of workers for the stress. Tune it by using the NUMBER_OF_WORKERS environment variable.
The following YAML snippet illustrates the use of this environment variable:
# number of workers for the stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-io-stress
      spec:
        components:
          env:
            # number of io workers
            - name: NUMBER_OF_WORKERS
              value: "4"
            - name: TOTAL_CHAOS_DURATION
              value: "60"
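The use cases mention checking disk performance as the number of I/O threads grows. One way to explore that is to raise the worker count while pointing the stress at a specific mount path, as in the sketch below. The worker count, size, and path are illustrative assumptions, and the sketch assumes the fault accepts these tunables in combination.

```yaml
# Illustrative combination: more workers against one mount path
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-io-stress
      spec:
        components:
          env:
            # double the default of 4 io workers
            - name: NUMBER_OF_WORKERS
              value: "8"
            # size of io to be stressed
            - name: FILESYSTEM_UTILIZATION_BYTES
              value: "1" # in GB
            # placeholder path: replace with a real mount path in the container
            - name: VOLUME_MOUNT_PATH
              value: "/some-dir-in-container"
            - name: TOTAL_CHAOS_DURATION
              value: "60"
```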