Bug 1954790 - KCM Alert PodDisruptionBudget At and Limit do not alert with maxUnavailable or MinAvailable by percentage [NEEDINFO]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.6
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: ravig
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks: 1968532 1968555
 
Reported: 2021-04-28 18:59 UTC by Matthew Robson
Modified: 2021-09-13 16:27 UTC
8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:04:34 UTC
Target Upstream Version:
mrobson: needinfo? (rgudimet)




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-kube-controller-manager-operator pull 527 0 None Merged Bug 1954790: Use appropriate metric for PDB alerts 2022-08-01 11:48:28 UTC
Github openshift cluster-kube-controller-manager-operator pull 534 0 None Merged Bug 1954790: pdb: Increase PDBAtLimit timeout 2022-08-01 11:48:29 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:05:00 UTC

Description Matthew Robson 2021-04-28 18:59:09 UTC
Description of problem:

The PodDisruptionBudgetAtLimit[1] alert looks at: kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy

With maxUnavailable (and with minAvailable expressed as a percentage), expectedPods is equal to the number of replicas:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/disruption/disruption.go#L626

With an integer minAvailable, expectedPods is equal to the actual current number of pods:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/disruption/disruption.go#L642

If you have, for example, a DC with 3 replicas and maxUnavailable = 2, desiredHealthy will be 1. The PDB is at its limit, but expected (3) will never equal desired healthy (1), so the alert will never fire.

Ex:
spec:
  maxUnavailable: 2
status:
  currentHealthy: 1
  desiredHealthy: 1
  disruptionsAllowed: 0
  expectedPods: 3
  observedGeneration: 1

In the first two cases, expectedPods can never equal desiredHealthy, so you would never get a PodDisruptionBudgetAtLimit alert for etcd-quorum-guard.
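To make the arithmetic concrete, here is a minimal Python sketch (a simplification for illustration, not the real disruption controller code; the helper name is hypothetical) of how the status fields relate for an integer maxUnavailable PDB:

```python
# Hypothetical simplification of the disruption controller's math for an
# integer maxUnavailable PDB: expectedPods is the replica count of the
# owning controller, and desiredHealthy = expectedPods - maxUnavailable.
def pdb_status(replicas: int, max_unavailable: int, current_healthy: int) -> dict:
    expected_pods = replicas                           # scale of the owning controller
    desired_healthy = expected_pods - max_unavailable  # minimum pods the PDB protects
    disruptions_allowed = max(0, current_healthy - desired_healthy)
    return {
        "expectedPods": expected_pods,
        "desiredHealthy": desired_healthy,
        "currentHealthy": current_healthy,
        "disruptionsAllowed": disruptions_allowed,
    }

# The example above: a DC with 3 replicas, maxUnavailable = 2, 1 healthy pod.
status = pdb_status(replicas=3, max_unavailable=2, current_healthy=1)
# expectedPods (3) can never equal desiredHealthy (1), so an alert comparing
# those two fields never fires, even though disruptionsAllowed is already 0.
```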

The same applies to the critical alert PodDisruptionBudgetLimit: kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy

expectedPods will never be less than desiredHealthy with maxUnavailable (or with minAvailable expressed as a percentage), so the critical alert cannot fire either.

Is the alert wrong, or should all areas of the PDB code set expectedPods to the actual number of running pods?

To fix the alerts, we should compare current healthy (kube_poddisruptionbudget_status_current_healthy) to desired healthy.

[1] https://github.com/openshift/cluster-kube-controller-manager-operator/blob/master/manifests/0000_90_kube-controller-manager-operator_05_alerts.yaml#L22
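The suggested fix maps to alert expressions along these lines (a sketch only; the exact rules merged for this bug may differ):

```
# PodDisruptionBudgetAtLimit (warning): healthy pods are at the minimum the PDB allows
kube_poddisruptionbudget_status_current_healthy == kube_poddisruptionbudget_status_desired_healthy

# PodDisruptionBudgetLimit (critical): fewer healthy pods than the PDB requires
kube_poddisruptionbudget_status_current_healthy < kube_poddisruptionbudget_status_desired_healthy
```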


Version-Release number of selected component (if applicable):
4.6

How reproducible:


Steps to Reproduce:
1. cordon a master
2. delete an etcd quorum guard pod
3. no alerts

Actual results:
No alerts

Expected results:
Alerts

Additional info:

Comment 4 Vadim Rutkovsky 2021-06-09 13:41:22 UTC
This caused a regression in upgrade jobs: the alert assumes that all master nodes upgrade within 15 minutes.

Instead this alert should use a more sophisticated expression:

count_over_time((kube_poddisruptionbudget_status_current_healthy < kube_poddisruptionbudget_status_desired_healthy)[15m:10s]) > 0

This fires only if the PDB was violated for at least one 10-second sample within the 15-minute window, so violations shorter than the sampling interval are tolerated.

Comment 5 Vadim Rutkovsky 2021-06-09 13:54:18 UTC
A better idea: check the `cluster_version` metric; if `type` is `updating`, the alert should not fire.
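One way that suppression might look in PromQL (a sketch only, assuming the `cluster_version` metric exposes a series with `type="updating"` only while an upgrade is in progress):

```
kube_poddisruptionbudget_status_current_healthy < kube_poddisruptionbudget_status_desired_healthy
  unless on() count(cluster_version{type="updating"}) > 0
```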

Comment 6 Matthew Robson 2021-06-09 14:20:51 UTC
Not firing the alert during upgrades would be an issue as well. That is how we found the issue with the alert.

Customer had some bad PDBs that caused the MCP rollout to hang for hours on the 4.6.25 upgrade before anyone noticed. Then we realized the alerts were broken.

Matt

Comment 8 zhou ying 2021-06-11 09:31:44 UTC
Can see the alert now with the latest payload:
[root@localhost ~]# oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-11-024306   True        False         9m30s   Cluster version is 4.8.0-0.nightly-2021-06-11-024306


steps:
1) cordon one of the node:
[root@localhost ~]# oc adm cordon yinzhou-bug-pkv6w-master-0.c.openshift-qe.internal
node/yinzhou-bug-pkv6w-master-0.c.openshift-qe.internal cordoned
[root@localhost ~]# oc get node
NAME                                                       STATUS                     ROLES    AGE   VERSION
yinzhou-bug-pkv6w-master-0.c.openshift-qe.internal         Ready,SchedulingDisabled   master   50m   v1.21.0-rc.0+a5ec692


2) Delete one of the etcd pod:
[root@localhost ~]# oc delete po etcd-quorum-guard-b8668f655-28c4x -n openshift-etcd
pod "etcd-quorum-guard-b8668f655-28c4x" deleted
[root@localhost ~]# oc get po 
NAME                                                                   READY   STATUS      RESTARTS   AGE
etcd-quorum-guard-b8668f655-5z524                                      1/1     Running     0          49m
etcd-quorum-guard-b8668f655-ck6ps                                      0/1     Pending     0          14s


3) Wait for some time, then check the alert:

[root@localhost ~]# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
[root@localhost ~]# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/alerts' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4278    0  4278    0     0  97227      0 --:--:-- --:--:-- --:--:-- 97227
{
  "status": "success",
  "data": {
    "alerts": [
      {
        "labels": {
          "alertname": "KubePodNotReady",
          "namespace": "openshift-etcd",
          "pod": "etcd-quorum-guard-b8668f655-ck6ps",
          "severity": "warning"
        },
        "annotations": {
          "description": "Pod openshift-etcd/etcd-quorum-guard-b8668f655-ck6ps has been in a non-ready state for longer than 15 minutes.",
          "summary": "Pod has been in a non-ready state for more than 15 minutes."

Comment 11 errata-xmlrpc 2021-07-27 23:04:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

