+++ This bug was initially created as a clone of Bug #1858008 +++

Description of problem:
KubePodCrashLooping is alerting with critical severity. Under current best practices it should be at warning level instead, since it is a cause-based alert rather than a symptom-based alert.

Steps to Reproduce:
1. Cause a crashloop.

Actual results:
`severity: critical`

Expected results:
`severity: warning`

Additional info:
This has already been fixed upstream [1] and needs to be implemented in cluster monitoring.

[1] https://github.com/kubernetes-monitoring/kubernetes-mixin/commit/050dedeba07b0ebd782beebef63f6c0168713ff3
In 4.6.0-0.nightly-2020-09-12-230035, the range window in the KubePodCrashLooping expr is 5m:

    expr: |
      rate(kube_pod_container_status_restarts_total{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kube-state-metrics"}[5m]) * 60 * 5 > 0

In 4.5.0-0.nightly-2020-09-12-063044, the range window is 15m. I think it would be better to change it to 5m:

    expr: |
      rate(kube_pod_container_status_restarts_total{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kube-state-metrics"}[15m]) * 60 * 5 > 0
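To illustrate why the window length matters, here is a minimal sketch of the arithmetic in these expressions: `rate()` yields a per-second increase, and multiplying by `60 * 5` turns it into restarts per 5 minutes. It assumes a simplified `rate()` (first-to-last sample difference over the window) and deliberately ignores the counter-reset handling and edge extrapolation that real Prometheus performs; `simple_rate` and `restarts_per_5m` are hypothetical helper names.

```python
def simple_rate(samples, window_s, now_s):
    """Simplified PromQL rate(): per-second increase of a counter.

    samples: list of (timestamp_s, counter_value), sorted by timestamp.
    Assumption: no counter resets and no extrapolation to window edges.
    """
    # Keep only the samples that fall inside the (now - window, now] range.
    in_window = [(t, v) for t, v in samples if now_s - window_s < t <= now_s]
    if len(in_window) < 2:
        return None  # a rate needs at least two samples
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


def restarts_per_5m(samples, window_s, now_s):
    """Mirror of `rate(...[window]) * 60 * 5` from the alert expression."""
    r = simple_rate(samples, window_s, now_s)
    return None if r is None else r * 60 * 5


# Hypothetical data: scraped every 30s, one container restart at t=600s.
samples = [(t, 0 if t < 600 else 1) for t in range(0, 1801, 30)]
print(restarts_per_5m(samples, 300, 1200))  # 5m window: restart already out of range
print(restarts_per_5m(samples, 900, 1200))  # 15m window: restart still counted
```

In this simplified model, ten minutes after a single restart the 5m-window expression is back to 0 (so `> 0` is false), while the 15m-window expression is still positive. A longer window keeps the expression true for longer after each restart.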
4.5.0-0.nightly-2020-09-12-063044:

    - alert: KubePodCrashLooping
      annotations:
        message: Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is restarting {{ printf "%.2f" $value }} times / 5 minutes.
      expr: |
        rate(kube_pod_container_status_restarts_total{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kube-state-metrics"}[15m]) * 60 * 5 > 0
      for: 15m
      labels:
        severity: warning
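The `for: 15m` clause in this rule means the alert is not sent as soon as the expression becomes true. Below is a minimal sketch of that gating, assuming the usual Prometheus rule states (inactive, pending, firing): an alert is pending while the expression holds, fires once it has held continuously for the `for` duration, and resets if the expression stops holding. `alert_states` is a hypothetical helper, and the sketch ignores Alertmanager routing and resend behaviour.

```python
def alert_states(evaluations, for_s):
    """Return the alert state after each rule evaluation.

    evaluations: list of (timestamp_s, expr_is_true) in evaluation order.
    for_s: the rule's `for` duration in seconds (900 for `for: 15m`).
    """
    states = []
    active_since = None  # time the expression first became true in this run
    for t, is_true in evaluations:
        if not is_true:
            # Expression no longer returns a result: alert resets.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = t
        # Fire only after the expression has held for the full duration.
        states.append("firing" if t - active_since >= for_s else "pending")
    return states


# Hypothetical run: rule evaluated every 60s, expression true throughout.
states = alert_states([(t, True) for t in range(0, 1200, 60)], for_s=900)
print(states[14], states[15])  # last pending evaluation, first firing one
```

With `severity: warning` in `labels`, the fired alert carries that label, which is what the expected results in this bug ask for.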
The fix is in 4.5.0-0.nightly-2020-10-31-200727. Since we had already verified it with the unmerged PR, moving this to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5.18 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4425