Description of problem:

The PodDisruptionBudgetAtLimit[1] alert looks at:

  kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy

With maxUnavailable (and minAvailable using a percentage), expectedPods is equal to the number of replicas:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/disruption/disruption.go#L626

With an integer minAvailable, expectedPods is equal to the actual current number of pods:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/disruption/disruption.go#L642

If you have, for example, a DC with 3 replicas and maxUnavailable = 2, desiredHealthy will be 1. The PDB is at its limit, but expected (3) will never equal desired healthy (1), so the alert will never fire. Example:

  spec:
    maxUnavailable: 2
  status:
    currentHealthy: 1
    desiredHealthy: 1
    disruptionsAllowed: 0
    expectedPods: 3
    observedGeneration: 1

In the first two cases (maxUnavailable, and minAvailable as a percentage), expectedPods can never equal desiredHealthy, so you would never get a PodDisruptionBudgetAtLimit alert for etcd-quorum-guard.

The same applies to the critical alert PodDisruptionBudgetLimit:

  kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy

With maxUnavailable (and minAvailable using a percentage), expected can never be less than desired healthy.

Is the alert wrong, or should all areas of the PDB code set expectedPods to the actual number of running pods? To fix the alerts, we should compare current healthy (kube_poddisruptionbudget_status_current_healthy) to desired healthy.

[1] https://github.com/openshift/cluster-kube-controller-manager-operator/blob/master/manifests/0000_90_kube-controller-manager-operator_05_alerts.yaml#L22

Version-Release number of selected component (if applicable):
4.6

How reproducible:

Steps to Reproduce:
1. Cordon a master.
2. Delete an etcd-quorum-guard pod.
3. No alerts fire.

Actual results:
No alerts

Expected results:
Alerts

Additional info:
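For illustration, a minimal sketch of what the corrected rules could look like as PrometheusRule entries. The expressions follow the suggestion above; the `for` durations and the warning severity on the first rule are assumptions, not the shipped change:

  - alert: PodDisruptionBudgetAtLimit
    # Sketch (assumed duration and severity): current == desired means the
    # budget is exactly at its limit and no further disruptions are allowed.
    expr: kube_poddisruptionbudget_status_current_healthy == kube_poddisruptionbudget_status_desired_healthy
    for: 15m
    labels:
      severity: warning
  - alert: PodDisruptionBudgetLimit
    # Sketch (assumed duration): current < desired means the budget is
    # already violated.
    expr: kube_poddisruptionbudget_status_current_healthy < kube_poddisruptionbudget_status_desired_healthy
    for: 15m
    labels:
      severity: critical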
This caused a regression in upgrade jobs: the alert assumes that all master nodes must upgrade within 15 minutes. Instead, this alert should use a more sophisticated expression:

  count_over_time((kube_poddisruptionbudget_status_current_healthy < kube_poddisruptionbudget_status_desired_healthy)[15m:10s]) > 0

to ensure that the PDB was not violated for more than 10 seconds within a 15-minute window.
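A sketch of how that subquery could be wired into the rule; the placement and severity here are assumptions carried over from the existing critical alert:

  - alert: PodDisruptionBudgetLimit
    # Sketch: the [15m:10s] subquery re-evaluates the inner comparison every
    # 10 seconds over the last 15 minutes; count_over_time then counts how
    # many of those samples showed the PDB below its desired healthy count.
    expr: count_over_time((kube_poddisruptionbudget_status_current_healthy < kube_poddisruptionbudget_status_desired_healthy)[15m:10s]) > 0
    labels:
      severity: critical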
A better idea: check the `cluster_version` metric; if `type` is `updating`, then the alert should not fire.
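A sketch of that approach in PromQL, assuming the CVO exposes a `cluster_version{type="updating"}` series only while an upgrade is in progress:

  # Sketch: `unless on()` drops all results from the left-hand comparison
  # whenever any series matches on the right, i.e. while an update is running.
  (kube_poddisruptionbudget_status_current_healthy < kube_poddisruptionbudget_status_desired_healthy)
    unless on() cluster_version{type="updating"}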
Not firing the alert during upgrades would be an issue as well. That is how we found the problem with this alert: a customer had some bad PDBs that caused the MCP rollout to hang for hours on the 4.6.25 upgrade before someone noticed. Only then did we realize the alerts were broken.

Matt
Can see the alert now with the latest payload:

[root@localhost ~]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-11-024306   True        False         9m30s   Cluster version is 4.8.0-0.nightly-2021-06-11-024306

Steps:

1) Cordon one of the nodes:

[root@localhost ~]# oc adm cordon yinzhou-bug-pkv6w-master-0.c.openshift-qe.internal
node/yinzhou-bug-pkv6w-master-0.c.openshift-qe.internal cordoned
[root@localhost ~]# oc get node
NAME                                                 STATUS                     ROLES    AGE   VERSION
yinzhou-bug-pkv6w-master-0.c.openshift-qe.internal   Ready,SchedulingDisabled   master   50m   v1.21.0-rc.0+a5ec692

2) Delete one of the etcd-quorum-guard pods:

[root@localhost ~]# oc delete po etcd-quorum-guard-b8668f655-28c4x -n openshift-etcd
pod "etcd-quorum-guard-b8668f655-28c4x" deleted
[root@localhost ~]# oc get po
NAME                                READY   STATUS    RESTARTS   AGE
etcd-quorum-guard-b8668f655-5z524   1/1     Running   0          49m
etcd-quorum-guard-b8668f655-ck6ps   0/1     Pending   0          14s

3) Wait for some time, then check the alerts:

[root@localhost ~]# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
[root@localhost ~]# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/alerts' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4278    0  4278    0     0  97227      0 --:--:-- --:--:-- --:--:-- 97227
{
  "status": "success",
  "data": {
    "alerts": [
      {
        "labels": {
          "alertname": "KubePodNotReady",
          "namespace": "openshift-etcd",
          "pod": "etcd-quorum-guard-b8668f655-ck6ps",
          "severity": "warning"
        },
        "annotations": {
          "description": "Pod openshift-etcd/etcd-quorum-guard-b8668f655-ck6ps has been in a non-ready state for longer than 15 minutes.",
          "summary": "Pod has been in a non-ready state for more than 15 minutes."
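As an aside, a filter along these lines can pull out just the PDB alerts from that API response (illustrative only, not part of the original verification):

[root@localhost ~]# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname | startswith("PodDisruptionBudget"))'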
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 120 days.