Pod CPU hog exec
Pod CPU hog exec is a Kubernetes pod-level chaos fault that consumes excess CPU resources of the application container. This fault applies stress on the target pods by simulating a lack of CPU for processes running on the Kubernetes application. This degrades the performance of the application.

Use cases
CPU hog exec:
- Simulates conditions where the application pods experience CPU spikes due to expected (or undesired) processes thereby testing the behaviour of application stack.
- Verifies metrics-based horizontal pod autoscaling as well as vertical autoscale, that is, demand based CPU addition.
- Facilitates scalability of nodes based on the growth beyond budgeted pods.
- Verifies the autopilot functionality of cloud managed clusters.
- Verifies multi-tenant load issues, that is, when the load increases on one container, this does not cause downtime in other containers.
Permissions required
Below is a sample Kubernetes role that defines the permissions required to execute the fault.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: hce
  name: pod-cpu-hog-exec
spec:
  definition:
    scope: Cluster # Supports "Namespaced" mode too
permissions:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "get", "list", "patch", "update"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list", "watch"]
    - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["get", "list", "create"]
  - apiGroups: [""]
    resources: ["deployments, statefulsets"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["replicasets, daemonsets"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["chaosEngines", "chaosExperiments", "chaosResults"]
    verbs: ["create", "delete", "get", "list", "patch", "update"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "delete", "get", "list", "deletecollection"]
Prerequisites
- Kubernetes > 1.16 is required to execute this fault.
- The application pods should be in the running state before and after injecting chaos.
Optional tunables
| Tunable | Description | Notes | 
|---|---|---|
| CPU_CORES | Number of CPU cores subject to CPU stress. | Default: 1. For more information, go to CPU cores | 
| TOTAL_CHAOS_DURATION | Duration for which to insert chaos (in seconds). | Default: 60 s. For more information, go to duration of the chaos | 
| NODE_LABEL | Node label used to filter the target node if TARGET_NODEenvironment variable is not set. | It is mutually exclusive with the TARGET_NODEenvironment variable. If both are provided, the fault usesTARGET_NODE. For more information, go to node label. | 
| TARGET_PODS | Comma-separated list of application pod names subject to pod CPU hog. | If not provided, the fault selects the target pods randomly based on the provided appLabels. For more information, go to target specific pods | 
| TARGET_CONTAINER | Name of the target container under stress. | If this value is not provided, the fault selects the first container of the target pod. For more information, go to target specific container | 
| PODS_AFFECTED_PERC | Percentage of total pods to target. Provide numeric values. | Default: 0 (corresponds to 1 replica). For more information, go to pods affected percentage | 
| CHAOS_INJECT_COMMAND | Command to inject CPU chaos. | Default: md5sum /dev/zero. For more information, go to chaos inject command | 
| CHAOS_KILL_COMMAND | Command to kill the chaos process. | Defaults to kill (find /proc -name exe -lname '*/md5sum' 2>&1 | grep -v 'Permission denied' | awk -F/ '{print (NF-1)}'). Another useful one that generally works (in case the default doesn't) iskill -9 (ps afx | grep \"[md5sum] /dev/zero\" | awk '{print 1}' | tr '\n' ' '). In case neither works, please check whether the target pod's base image offers a shell. If yes, identify appropriate shell command to kill the chaos process. For more information, go to chaos kill command | 
| RAMP_TIME | Period to wait before injecting chaos (in seconds). | For example, 30 s. For more information, go to ramp time | 
| LIB_IMAGE | Image used to inject chaos. | Default: harness/chaos-go-runner:main-latest. For more information, go to image used by the helper pod. | 
| SEQUENCE | Sequence of chaos execution for multiple target pods. | Default: parallel. Supports serial and parallel. For more information, go to sequence of chaos execution | 
CPU cores
Number of CPU cores to target. Tune it by using the CPU_CORE environment variable.
The following YAML snippet illustrates the use of this environment variable:
# CPU cores for the stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-cpu-hog-exec
      spec:
        components:
          env:
            # CPU cores for stress
            - name: CPU_CORES
              value: "1"
            - name: TOTAL_CHAOS_DURATION
              value: "60"
Chaos inject and kill commands
The CHAOS_INJECT_COMMAND and CHAOS_KILL_COMMAND environment variables to set the chaos inject and chaos kill commands, respectively.
Default value for CHAOS_INJECT_COMMAND is "md5sum /dev/zero" and for CHAOS_KILL_COMMAND, it is "kill (find /proc -name exe -lname '\*/md5sum' 2>&1 | grep -v 'Permission denied' | awk -F/ '\{print (NF-1)}')"
The following YAML snippet illustrates the use of these environment variables:
# provide the chaos kill, used to kill the chaos process
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-cpu-hog-exec
      spec:
        components:
          env:
            # command to create the md5sum process to stress the cpu
            - name: CHAOS_INJECT_COMMAND
              value: "md5sum /dev/zero"
            # command to kill the md5sum process
            # alternative command: "kill -9 $(ps afx | grep \"[md5sum] /dev/zero\" | awk '{print$1}' | tr '\n' ' ')"
            - name: CHAOS_KILL_COMMAND
              value: "kill $(find /proc -name exe -lname '*/md5sum' 2>&1 | grep -v 'Permission denied' | awk -F/ '{print $(NF-1)}')"
            - name: TOTAL_CHAOS_DURATION
              value: "60"