Bug 2057957

Summary: PodDisruptionBudgetAtLimit alert fired in SNO cluster
Product: OpenShift Container Platform Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Networking Assignee: Omer Tuchfeld <otuchfel>
Networking sub component: ovn-kubernetes QA Contact: Ross Brattain <rbrattai>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: bpickard, hongyli, otuchfel, rbrattai
Version: 4.9   
Target Milestone: ---   
Target Release: 4.9.z   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2022-04-08 09:56:15 UTC Type: ---
Bug Depends On: 2057961    
Bug Blocks: 2068895    

Description OpenShift BugZilla Robot 2022-02-24 08:50:49 UTC
+++ This bug was initially created as a clone of Bug #2037721 +++

Created attachment 1849250
PodDisruptionBudgetAtLimit alert fired for openshift-ovn-kubernetes in SNO cluster

Description of problem:
On an SNO cluster upgraded from 4.9.13 to 4.10.0-0.nightly-2022-01-05-181126, the PodDisruptionBudgetAtLimit alert fired for openshift-ovn-kubernetes.
Alert rule detail:
**************************
        - alert: PodDisruptionBudgetAtLimit
          annotations:
            description: The pod disruption budget is at minimum disruptions allowed level.
              The number of current healthy pods is equal to desired healthy pods.
            summary: The pod disruption budget is preventing further disruption to pods.
          expr: |
            max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_current_healthy == kube_poddisruptionbudget_status_desired_healthy)
          for: 60m
          labels:
            severity: warning
**************************
# oc -n openshift-ovn-kubernetes get pdb
NAME                    MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
ovn-raft-quorum-guard   1               N/A               0                     12h

# oc -n openshift-ovn-kubernetes get pod
NAME                   READY   STATUS    RESTARTS   AGE
ovnkube-master-flshh   6/6     Running   6          166m
ovnkube-node-g7nhr     5/5     Running   5          167m

# oc -n openshift-ovn-kubernetes get pdb ovn-raft-quorum-guard -oyaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: "2022-01-06T00:06:56Z"
  generation: 1
  name: ovn-raft-quorum-guard
  namespace: openshift-ovn-kubernetes
  ownerReferences:
  - apiVersion: operator.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Network
    name: cluster
    uid: 3f61ab17-7664-406f-916b-1ec447e0595c
  resourceVersion: "187278"
  uid: 87c91ff4-ed0a-42ba-a392-c6fdbfd3174f
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: ovnkube-master
status:
  conditions:
  - lastTransitionTime: "2022-01-06T09:28:45Z"
    message: ""
    observedGeneration: 1
    reason: InsufficientPods
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 1
  desiredHealthy: 1
  disruptionsAllowed: 0
  expectedPods: 1
  observedGeneration: 1


kube_poddisruptionbudget_status_current_healthy = 1 and kube_poddisruptionbudget_status_desired_healthy = 1.
Because the two values are equal, the alert expression matches, and the alert fires once the match has held for 60m.
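
The comparison can be reproduced offline. The following is a minimal Python sketch, not the alerting implementation: it takes the status fields from the ovn-raft-quorum-guard YAML shown earlier and applies the same equality test that the alert expression uses. The helper name pdb_at_limit is hypothetical.

```python
# Sketch only: mirrors the equality test in the alert expression
#   kube_poddisruptionbudget_status_current_healthy
#     == kube_poddisruptionbudget_status_desired_healthy
# pdb_at_limit is a hypothetical helper, not part of any OpenShift tool.


def pdb_at_limit(status: dict) -> bool:
    """Return True when currentHealthy equals desiredHealthy."""
    return status["currentHealthy"] == status["desiredHealthy"]


# Status values copied from the ovn-raft-quorum-guard PDB in this report.
ovn_status = {
    "currentHealthy": 1,
    "desiredHealthy": 1,
    "disruptionsAllowed": 0,
    "expectedPods": 1,
}

print(pdb_at_limit(ovn_status))  # True
```

The real rule additionally requires the condition to hold for 60m before the alert fires; the sketch checks a single point in time only.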

No PodDisruptionBudgetAtLimit alert fired for the other PDBs:
# oc get pdb -A
NAMESPACE                              NAME                          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
openshift-apiserver                    openshift-apiserver-pdb       N/A             1                 1                     12h
openshift-cluster-storage-operator     csi-snapshot-controller-pdb   N/A             1                 1                     12h
openshift-cluster-storage-operator     csi-snapshot-webhook-pdb      N/A             1                 1                     12h
openshift-image-registry               image-registry                0               N/A               1                     11h
openshift-oauth-apiserver              oauth-apiserver-pdb           N/A             1                 1                     12h
openshift-operator-lifecycle-manager   packageserver-pdb             N/A             1                 1                     12h
openshift-ovn-kubernetes               ovn-raft-quorum-guard         1               N/A               0                     12h
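
Why only ovn-raft-quorum-guard sits at its limit can be sketched with a little arithmetic. The Python below approximates how the desired healthy count and the allowed disruptions are derived from integer (non-percentage) minAvailable and maxUnavailable values. It is an illustration based on the documented PDB semantics, not the controller's code. The function names are hypothetical, and the sketch assumes each PDB selects exactly one healthy pod, which this report confirms only for ovn-raft-quorum-guard.

```python
from typing import Optional


def desired_healthy(expected_pods: int,
                    min_available: Optional[int] = None,
                    max_unavailable: Optional[int] = None) -> int:
    """Approximate desiredHealthy for integer PDB specs (assumption)."""
    if max_unavailable is not None:
        # maxUnavailable: all expected pods minus the permitted outages.
        return max(0, expected_pods - max_unavailable)
    if min_available is not None:
        # minAvailable: the value is used directly.
        return min_available
    raise ValueError("set min_available or max_unavailable")


def allowed_disruptions(current_healthy: int, desired: int) -> int:
    """Disruptions allowed is the healthy surplus, never negative."""
    return max(0, current_healthy - desired)


# Spec values taken from the table above; one healthy pod is assumed for each.
specs = {
    "ovn-raft-quorum-guard": {"min_available": 1},
    "openshift-apiserver-pdb": {"max_unavailable": 1},
    "image-registry": {"min_available": 0},
}

for name, spec in specs.items():
    desired = desired_healthy(1, **spec)
    allowed = allowed_disruptions(1, desired)
    at_limit = (1 == desired)  # same test as the alert expression
    print(name, desired, allowed, at_limit)
```

Under these assumptions only the minAvailable: 1 budget yields a desired healthy count equal to the current healthy count, which matches the ALLOWED DISRUPTIONS column in the table.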

The PDB in openshift-ovn-kubernetes needs to be changed so that it does not sit permanently at its limit on a single-node cluster.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-01-05-181126

How reproducible:
always

Steps to Reproduce:
1. On an SNO cluster, log in to the console as an admin user, go to "Observe -> Alerting", and check whether a PodDisruptionBudgetAtLimit alert has fired for openshift-ovn-kubernetes.

Actual results:


Expected results:


Additional info:

--- Additional comment from bpickard on 2022-02-15 16:38:15 UTC ---

*** Bug 1974183 has been marked as a duplicate of this bug. ***

Comment 1 Omer Tuchfeld 2022-02-28 15:00:38 UTC
*** Bug 2058515 has been marked as a duplicate of this bug. ***

Comment 2 Ross Brattain 2022-03-24 18:48:34 UTC
Tested on 4.9.0-0.ci.test-2022-03-24-165325-ci-ln-0hccmpt-latest:

# oc get  --all-namespaces PodDisruptionBudget
NAMESPACE                              NAME                                MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
openshift-apiserver                    openshift-apiserver-pdb             N/A             1                 1                     85m
openshift-cluster-csi-drivers          aws-ebs-csi-driver-controller-pdb   N/A             1                 1                     85m
openshift-cluster-storage-operator     csi-snapshot-controller-pdb         N/A             1                 1                     85m
openshift-cluster-storage-operator     csi-snapshot-webhook-pdb            N/A             1                 1                     85m
openshift-image-registry               image-registry                      0               N/A               1                     80m
openshift-oauth-apiserver              oauth-apiserver-pdb                 N/A             1                 1                     85m
openshift-operator-lifecycle-manager   packageserver-pdb                   N/A             1                 1                     90m

# oc get -n openshift-ovn-kubernetes PodDisruptionBudget
No resources found in openshift-ovn-kubernetes namespace.
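
A check like this can also be scripted. The Python sketch below parses the text output of the PDB listing and returns the budgets that allow zero disruptions. It assumes columns are separated by two or more spaces, as in the output shown in this report. The function name zero_disruption_pdbs is hypothetical, and the sample rows are copied from the listing in the original description, not from this build.

```python
import re
from typing import List, Tuple


def zero_disruption_pdbs(table: str) -> List[Tuple[str, str]]:
    """Return (namespace, name) for rows whose ALLOWED DISRUPTIONS is 0."""
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    # Header names such as "MIN AVAILABLE" contain one space, so split
    # on runs of two or more spaces instead of on any whitespace.
    header = re.split(r"\s{2,}", lines[0].strip())
    ns_i = header.index("NAMESPACE")
    name_i = header.index("NAME")
    allowed_i = header.index("ALLOWED DISRUPTIONS")
    found = []
    for ln in lines[1:]:
        cols = re.split(r"\s{2,}", ln.strip())
        if cols[allowed_i] == "0":
            found.append((cols[ns_i], cols[name_i]))
    return found


# Rows copied from the listing in the original description (before the fix).
sample = """\
NAMESPACE                              NAME                          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
openshift-apiserver                    openshift-apiserver-pdb       N/A             1                 1                     12h
openshift-image-registry               image-registry                0               N/A               1                     11h
openshift-ovn-kubernetes               ovn-raft-quorum-guard         1               N/A               0                     12h
"""

print(zero_disruption_pdbs(sample))
```

Run against the listing from this build, where every row shows 1 allowed disruption, the same function would return an empty list.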

Comment 4 Ross Brattain 2022-03-28 02:28:45 UTC
Verified.

Comment 7 errata-xmlrpc 2022-04-08 09:56:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.9.27 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1158