Bug 1954790

Summary:	KCM Alert PodDisruptionBudget At and Limit do not alert with maxUnavailable or MinAvailable by percentage
Product:	OpenShift Container Platform	Reporter:	Matthew Robson <mrobson>
Component:	kube-controller-manager	Assignee:	ravig <rgudimet>
Status:	CLOSED ERRATA	QA Contact:	zhou ying <yinzhou>
Severity:	high	Docs Contact:
Priority:	high
Version:	4.6	CC:	aos-bugs, dhellmann, knarra, maszulik, mfojtik, rgudimet, steven.barre, vrutkovs, wking
Target Milestone:	---
Target Release:	4.8.0
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-07-27 23:04:34 UTC	Type:	Bug
Bug Blocks:	1968532, 1968555

Description Matthew Robson 2021-04-28 18:59:09 UTC

Description of problem:

The PodDisruptionBudgetAtLimit[1] alert looks at: kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy

With maxUnavailable (and MinAvailable using a percentage), expectedPods is equal to the number of replicas:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/disruption/disruption.go#L626

With a MinAvailable int, expectedPods is equal to the actual current number of pods:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/disruption/disruption.go#L642

If you have, for example, a DC with 3 replicas and maxUnavailable = 2, desiredHealthy will be 1. With only one healthy pod the PDB is at its limit, but expectedPods (3) will never equal desiredHealthy (1), so the alert will never fire.

Ex:
spec:
  maxUnavailable: 2
status:
  currentHealthy: 1
  desiredHealthy: 1
  disruptionsAllowed: 0
  expectedPods: 3
  observedGeneration: 1
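
The three cases above can be sketched as a small model. This is a simplified illustration of the behavior described in this report, not the controller's actual code; the function names `scaled_value` and `pdb_counts` are hypothetical, and rounding percentages up is an assumption about how the controller scales them.

```python
from typing import Optional, Tuple, Union

IntOrPercent = Union[int, str]  # e.g. 2 or "50%"


def scaled_value(value: IntOrPercent, total: int) -> int:
    """Resolve an int-or-percent value against total (assumption: percentages round up)."""
    if isinstance(value, str):
        pct = int(value.rstrip("%"))
        return (pct * total + 99) // 100  # integer ceiling of pct% of total
    return value


def pdb_counts(
    replicas: int,
    running_pods: int,
    max_unavailable: Optional[IntOrPercent] = None,
    min_available: Optional[IntOrPercent] = None,
) -> Tuple[int, int]:
    """Return (expectedPods, desiredHealthy) for the cases described in the report."""
    if max_unavailable is not None:
        # maxUnavailable: expectedPods is the controller's replica count.
        expected = replicas
        desired = max(expected - scaled_value(max_unavailable, expected), 0)
    elif isinstance(min_available, str):
        # minAvailable as a percentage: expectedPods is also the replica count.
        expected = replicas
        desired = scaled_value(min_available, expected)
    else:
        # minAvailable as an int: expectedPods is the number of pods that exist now.
        expected = running_pods
        desired = int(min_available)
    return expected, desired


# The example from this report: 3 replicas, maxUnavailable = 2.
print(pdb_counts(replicas=3, running_pods=2, max_unavailable=2))
```

Only in the integer minAvailable case does expectedPods track the live pod count, which is why it is the only case where the original comparison can match.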

In the first two cases, expectedPods can never equal desiredHealthy, so you would never get a PodDisruptionBudgetAtLimit alert for etcd-quorum-guard.

The same applies to the critical alert PodDisruptionBudgetLimit: kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy

With maxUnavailable (or MinAvailable using a percentage), expected pods will never be less than desired healthy.

Is the alert wrong or should all areas of the PDB code set expected to actual running pods?

To fix the alerts, we should compare current healthy (kube_poddisruptionbudget_status_current_healthy) to desired healthy.
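
As a rough illustration of the difference, the sketch below evaluates the existing comparisons and the proposed ones against the example status shown earlier. The class and function names are hypothetical; only the three status fields and the comparison operators come from this report.

```python
from dataclasses import dataclass


@dataclass
class PDBStatus:
    current_healthy: int
    desired_healthy: int
    expected_pods: int


def at_limit_existing(s: PDBStatus) -> bool:
    # Existing PodDisruptionBudgetAtLimit comparison: expected == desired.
    return s.expected_pods == s.desired_healthy


def at_limit_proposed(s: PDBStatus) -> bool:
    # Proposed comparison: current == desired.
    return s.current_healthy == s.desired_healthy


def limit_existing(s: PDBStatus) -> bool:
    # Existing PodDisruptionBudgetLimit comparison: expected < desired.
    return s.expected_pods < s.desired_healthy


def limit_proposed(s: PDBStatus) -> bool:
    # Proposed comparison: current < desired.
    return s.current_healthy < s.desired_healthy


# Status from the maxUnavailable: 2 example in this report.
example = PDBStatus(current_healthy=1, desired_healthy=1, expected_pods=3)
print(at_limit_existing(example), at_limit_proposed(example))
```

With the example status, the existing comparison stays false while the proposed one is true, which matches the PDB actually being at its limit.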

[1] https://github.com/openshift/cluster-kube-controller-manager-operator/blob/master/manifests/0000_90_kube-controller-manager-operator_05_alerts.yaml#L22


Version-Release number of selected component (if applicable):
4.6

How reproducible:


Steps to Reproduce:
1. Cordon a master
2. Delete an etcd quorum guard pod
3. No alerts fire

Actual results:
No alerts

Expected results:
Alerts

Additional info:

Comment 4 Vadim Rutkovsky 2021-06-09 13:41:22 UTC

This caused a regression in upgrade jobs - it assumes that all master nodes must upgrade within 15 mins.

Instead, this alert should use a more sophisticated expression:

count_over_time((kube_poddisruptionbudget_status_current_healthy < kube_poddisruptionbudget_status_desired_healthy)[15m:10s]) > 0

This ensures that the PDB was not violated for more than 10 seconds within a 15-minute window.
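
A minimal sketch of what the expression above computes, assuming the subquery yields one (current_healthy, desired_healthy) pair per 10-second step over the 15-minute window. The helper names and the sample data are hypothetical, and this models the expression as written rather than any particular alerting rule that shipped.

```python
from typing import List, Tuple

Sample = Tuple[int, int]  # (current_healthy, desired_healthy) at one 10s step


def violating_steps(samples: List[Sample]) -> int:
    """Count steps where current_healthy < desired_healthy (the filtered subquery)."""
    return sum(1 for current, desired in samples if current < desired)


def alert_condition(samples: List[Sample]) -> bool:
    """Model of count_over_time(...) > 0: true if any step in the window violated the PDB."""
    return violating_steps(samples) > 0


# Hypothetical window: 90 steps of 10s, all healthy except one step.
healthy_window = [(3, 2)] * 90
one_blip = healthy_window[:40] + [(1, 2)] + healthy_window[41:]

print(alert_condition(healthy_window), alert_condition(one_blip))
```

Under this reading, a single violating step anywhere in the window is enough to satisfy the condition, so the expression remembers a brief violation for up to 15 minutes.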

Comment 5 Vadim Rutkovsky 2021-06-09 13:54:18 UTC

A better idea: check the `cluster_version` metric; if `type` is `updating`, the alert should not fire.

Comment 6 Matthew Robson 2021-06-09 14:20:51 UTC

Not firing the alert during upgrades would be an issue as well. That is how we found the issue with the alert.

A customer had some bad PDBs that caused the MCP rollout to hang for hours on the 4.6.25 upgrade before someone noticed. Then we realized the alerts were broken.

Matt

Comment 8 zhou ying 2021-06-11 09:31:44 UTC

Can see the alert now with the latest payload:
[root@localhost ~]# oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-11-024306   True        False         9m30s   Cluster version is 4.8.0-0.nightly-2021-06-11-024306


Steps:
1) Cordon one of the nodes:
[root@localhost ~]# oc adm cordon yinzhou-bug-pkv6w-master-0.c.openshift-qe.internal
node/yinzhou-bug-pkv6w-master-0.c.openshift-qe.internal cordoned
[root@localhost ~]# oc get node
NAME                                                       STATUS                     ROLES    AGE   VERSION
yinzhou-bug-pkv6w-master-0.c.openshift-qe.internal         Ready,SchedulingDisabled   master   50m   v1.21.0-rc.0+a5ec692


2) Delete one of the etcd-quorum-guard pods:
[root@localhost ~]# oc delete po etcd-quorum-guard-b8668f655-28c4x -n openshift-etcd
pod "etcd-quorum-guard-b8668f655-28c4x" deleted
[root@localhost ~]# oc get po 
NAME                                                                   READY   STATUS      RESTARTS   AGE
etcd-quorum-guard-b8668f655-5z524                                      1/1     Running     0          49m
etcd-quorum-guard-b8668f655-ck6ps                                      0/1     Pending     0          14s


3) Wait for some time, then check the alerts:

[root@localhost ~]# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
[root@localhost ~]# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/alerts' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4278    0  4278    0     0  97227      0 --:--:-- --:--:-- --:--:-- 97227
{
  "status": "success",
  "data": {
    "alerts": [
      {
        "labels": {
          "alertname": "KubePodNotReady",
          "namespace": "openshift-etcd",
          "pod": "etcd-quorum-guard-b8668f655-ck6ps",
          "severity": "warning"
        },
        "annotations": {
          "description": "Pod openshift-etcd/etcd-quorum-guard-b8668f655-ck6ps has been in a non-ready state for longer than 15 minutes.",
          "summary": "Pod has been in a non-ready state for more than 15 minutes."

Comment 11 errata-xmlrpc 2021-07-27 23:04:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 13 Red Hat Bugzilla 2023-09-18 00:26:11 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days