Kube-monkey is an implementation of Netflix’s Chaos Monkey built specifically for Kubernetes (k8s) clusters. It works by randomly deleting pods in the cluster, which encourages and validates the development of failure-resilient services.
According to the project’s README on GitHub, kube-monkey runs at a pre-configured hour on weekdays and builds a schedule of deployments that will each experience a pod death at a random point later that same day.
Additionally, kube-monkey operates on an opt-in model, meaning it will only schedule terminations for k8s apps that have explicitly opted in to having their pods terminated. An app opts in by setting the following labels:
- kube-monkey/enabled: Set to “enabled” to opt-in
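- kube-monkey/mtbf: Mean time between failures, in days (for example, “3” means a pod is killed roughly every third weekday)
- kube-monkey/identifier: A unique identifier for the k8s app, used to match pods to the app they belong to
- kube-monkey/kill-mode: How many pods to kill; the default behavior is to kill one pod, and other modes override that default
- kube-monkey/kill-value: The value used by the selected kill-mode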
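To make the label set concrete, here is a minimal Python sketch that builds the opt-in labels as a plain dictionary, the way they might appear under a Deployment’s metadata. The helper names (`kube_monkey_labels`, `is_opted_in`) and the app name `checkout-api` are hypothetical, and the `fixed` kill-mode value is taken from the upstream README as an assumption rather than from this article.

```python
def kube_monkey_labels(identifier, mtbf_days, kill_mode="fixed", kill_value=1):
    """Build the kube-monkey opt-in labels for one app.

    Kubernetes label values are strings, so numbers are converted.
    The "fixed" kill-mode is an assumption based on the upstream README.
    """
    if mtbf_days < 1:
        raise ValueError("mtbf_days must be at least 1")
    return {
        "kube-monkey/enabled": "enabled",
        "kube-monkey/mtbf": str(mtbf_days),
        "kube-monkey/identifier": identifier,
        "kube-monkey/kill-mode": kill_mode,
        "kube-monkey/kill-value": str(kill_value),
    }


def is_opted_in(labels):
    """An app is opted in only when the enabled label is exactly "enabled"."""
    return labels.get("kube-monkey/enabled") == "enabled"


# Hypothetical app name, used only for illustration.
labels = kube_monkey_labels("checkout-api", mtbf_days=3)
for key, value in labels.items():
    print(f"{key}: {value}")
```

An app without the `kube-monkey/enabled` label, or with any other value for it, is simply skipped in this sketch, which mirrors the opt-in model described above.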
Scheduling occurs only once a day, on weekdays. During scheduling, kube-monkey generates a list of eligible k8s apps, flips a biased coin for each one to determine whether one of its pods should be killed, and, for each app selected as a victim, picks the random time at which the pod will be killed.
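The scheduling pass described above can be sketched in a few lines of Python. This is an illustration, not kube-monkey’s actual code: it assumes the biased coin lands on “kill” with probability 1/mtbf, and it assumes a 10:00 to 16:00 kill window. All function names and the sample apps are hypothetical.

```python
import random
from datetime import datetime, timedelta


def should_kill_today(mtbf_days, rng):
    # Biased coin: assumed to come up "kill" with probability 1/mtbf.
    return rng.random() < 1.0 / mtbf_days


def random_kill_time(day, start_hour, end_hour, rng):
    # Pick a uniformly random second inside the [start_hour, end_hour) window.
    window_start = day.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    window_seconds = (end_hour - start_hour) * 3600
    return window_start + timedelta(seconds=rng.randrange(window_seconds))


def build_schedule(apps, day, rng, start_hour=10, end_hour=16):
    """Return a time-ordered list of (kill_time, app_name) for one day.

    `apps` maps an app name to its mtbf in days. The default window
    hours are assumptions for this sketch.
    """
    if day.weekday() >= 5:  # Saturday or Sunday: nothing is scheduled
        return []
    schedule = []
    for name, mtbf_days in apps.items():
        if should_kill_today(mtbf_days, rng):
            schedule.append((random_kill_time(day, start_hour, end_hour, rng), name))
    return sorted(schedule)


rng = random.Random(42)  # seeded so repeated runs produce the same schedule
apps = {"checkout-api": 1, "search": 3, "billing": 5}  # hypothetical apps
schedule = build_schedule(apps, datetime(2024, 1, 8), rng)  # a Monday
for when, name in schedule:
    print(f"{when:%H:%M:%S}  {name}")
```

With an mtbf of 1 the coin always comes up “kill”, so `checkout-api` appears in every weekday schedule, while an app with an mtbf of 5 is selected on roughly one weekday in five.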
At the time of termination, kube-monkey checks whether the k8s app is still eligible, checks whether the app has updated its kill-mode and kill-value, and then terminates the selected pods.
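The termination-time checks can be sketched as follows. This is a simplified illustration under stated assumptions: only the default behavior (one pod) and two kill-mode values from the upstream README, `fixed` and `kill-all`, are modeled, and the labels are passed in as a dictionary instead of being re-read from the cluster. The function name `select_victims` and the pod names are hypothetical.

```python
import random


def select_victims(labels, pods, rng):
    """Choose which pods to terminate, using the app's current labels.

    Labels are re-checked at termination time, so an app that opted out
    (or changed kill-mode / kill-value) after scheduling is handled correctly.
    """
    # Step 1: the app must still be opted in.
    if labels.get("kube-monkey/enabled") != "enabled" or not pods:
        return []
    # Step 2: read the current kill-mode and kill-value.
    mode = labels.get("kube-monkey/kill-mode")
    if mode == "kill-all":  # assumed mode name: every pod is a victim
        return list(pods)
    if mode == "fixed":  # assumed mode name: kill-value is a pod count
        wanted = int(labels.get("kube-monkey/kill-value", "1"))
        count = max(0, min(wanted, len(pods)))
        return rng.sample(pods, count)
    # Step 3: default behavior, one randomly chosen pod.
    return [rng.choice(pods)]


rng = random.Random(7)
pods = ["web-1", "web-2", "web-3"]  # hypothetical pod names
enabled = {"kube-monkey/enabled": "enabled"}
print(select_victims(enabled, pods, rng))
print(select_victims({**enabled, "kube-monkey/kill-mode": "fixed",
                      "kube-monkey/kill-value": "2"}, pods, rng))
```

Passing the labels in explicitly keeps the sketch testable without a cluster; a real implementation would fetch them from the Kubernetes API just before acting.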