Bug 2037721

Summary: PodDisruptionBudgetAtLimit alert fired in SNO cluster
Product: OpenShift Container Platform Reporter: Junqi Zhao <juzhao>
Component: Networking Assignee: Omer Tuchfeld <otuchfel>
Networking sub component: ovn-kubernetes QA Contact: Ross Brattain <rbrattai>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: anusaxen, bpickard, otuchfel, rbrattai
Version: 4.10   
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: A PodDisruptionBudget meant to ensure OVN raft quorum was created even on single-node OVNKubernetes-based clusters.
Consequence: The PodDisruptionBudgetAtLimit alert is raised on such clusters, where it is unhelpful because single-node clusters have no quorum expectation in the first place.
Fix: The Cluster Network Operator no longer creates the openshift-ovn-kubernetes/ovn-raft-quorum-guard PodDisruptionBudget on single-node clusters.
Result: The unhelpful PodDisruptionBudgetAtLimit alert is no longer raised on OVNKubernetes-based single-node clusters.
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 10:41:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Bug Depends On:    
Bug Blocks: 2057961    
Attachments:
Description Flags
PodDisruptionBudgetAtLimit alert fired for openshift-ovn-kubernetes in SNO cluster none

Description Junqi Zhao 2022-01-06 12:31:28 UTC
Created attachment 1849250
PodDisruptionBudgetAtLimit alert fired for openshift-ovn-kubernetes in SNO cluster

Description of problem:
On an SNO (single-node OpenShift) cluster upgraded from 4.9.13 to 4.10.0-0.nightly-2022-01-05-181126, the PodDisruptionBudgetAtLimit alert fired for the openshift-ovn-kubernetes namespace.
Alert rule detail:
**************************
        - alert: PodDisruptionBudgetAtLimit
          annotations:
            description: The pod disruption budget is at minimum disruptions allowed level.
              The number of current healthy pods is equal to desired healthy pods.
            summary: The pod disruption budget is preventing further disruption to pods.
          expr: |
            max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_current_healthy == kube_poddisruptionbudget_status_desired_healthy)
          for: 60m
          labels:
            severity: warning
**************************
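
As a rough illustration of how this rule behaves, the sketch below models the equality comparison and the `for: 60m` hold period in Python. It is a simplification and not Prometheus's actual evaluator: the function name `alert_state` and the sample format (minute, current healthy, desired healthy) are hypothetical and exist only for this example.

```python
from typing import Iterable, Tuple

FOR_MINUTES = 60  # mirrors "for: 60m" in the rule above


def alert_state(samples: Iterable[Tuple[int, int, int]]) -> str:
    """Return "inactive", "pending" or "firing" after the last sample.

    Each sample is (minute, current_healthy, desired_healthy).
    Hypothetical simplification of alert evaluation for one PDB.
    """
    active_since = None
    state = "inactive"
    for minute, current, desired in samples:
        if current == desired:
            # The expression matches; remember when it first matched.
            if active_since is None:
                active_since = minute
            held = minute - active_since
            state = "firing" if held >= FOR_MINUTES else "pending"
        else:
            # Any non-matching sample resets the hold period.
            active_since = None
            state = "inactive"
    return state


# SNO case from this report: 1 healthy pod and 1 desired, continuously.
sno_samples = [(m, 1, 1) for m in range(0, 121, 10)]
print(alert_state(sno_samples))  # prints "firing"
```

On a single-node cluster the condition never stops matching, so under this model the alert stays in the firing state once the first 60 minutes have passed.
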
# oc -n openshift-ovn-kubernetes get pdb
NAME                    MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
ovn-raft-quorum-guard   1               N/A               0                     12h

# oc -n openshift-ovn-kubernetes get pod
NAME                   READY   STATUS    RESTARTS   AGE
ovnkube-master-flshh   6/6     Running   6          166m
ovnkube-node-g7nhr     5/5     Running   5          167m

# oc -n openshift-ovn-kubernetes get pdb ovn-raft-quorum-guard -oyaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: "2022-01-06T00:06:56Z"
  generation: 1
  name: ovn-raft-quorum-guard
  namespace: openshift-ovn-kubernetes
  ownerReferences:
  - apiVersion: operator.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Network
    name: cluster
    uid: 3f61ab17-7664-406f-916b-1ec447e0595c
  resourceVersion: "187278"
  uid: 87c91ff4-ed0a-42ba-a392-c6fdbfd3174f
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: ovnkube-master
status:
  conditions:
  - lastTransitionTime: "2022-01-06T09:28:45Z"
    message: ""
    observedGeneration: 1
    reason: InsufficientPods
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 1
  desiredHealthy: 1
  disruptionsAllowed: 0
  expectedPods: 1
  observedGeneration: 1


kube_poddisruptionbudget_status_current_healthy = 1 and kube_poddisruptionbudget_status_desired_healthy = 1.
Because the two values are equal, the alert expression matches, and once that has held for 60m the alert fires.
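
The status values in the PDB YAML above can be reproduced with a little arithmetic. The following is a minimal sketch that assumes an integer `minAvailable`, so that `desiredHealthy` equals `minAvailable` and `disruptionsAllowed` is the surplus of healthy pods floored at zero. The helper name `pdb_status`, the `atLimit` key, and the three-pod comparison case are hypothetical and not taken from this report.

```python
def pdb_status(min_available: int, current_healthy: int) -> dict:
    """Simplified PDB status for an integer minAvailable.

    Assumption: desiredHealthy == minAvailable, and disruptionsAllowed
    is the number of healthy pods above that minimum (never negative).
    """
    desired_healthy = min_available
    disruptions_allowed = max(0, current_healthy - desired_healthy)
    return {
        "currentHealthy": current_healthy,
        "desiredHealthy": desired_healthy,
        "disruptionsAllowed": disruptions_allowed,
        # Same comparison as the alert expression.
        "atLimit": current_healthy == desired_healthy,
    }


# SNO case from this report: one ovnkube-master pod, minAvailable: 1.
sno = pdb_status(min_available=1, current_healthy=1)
print(sno)

# Hypothetical multi-pod case for comparison: one pod of headroom.
multi = pdb_status(min_available=2, current_healthy=3)
print(multi)
```

With a single pod and `minAvailable: 1` there is never any headroom, so the budget is permanently at its limit regardless of cluster health.
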

No PodDisruptionBudgetAtLimit alert fired for the other PDBs:
# oc get pdb -A
NAMESPACE                              NAME                          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
openshift-apiserver                    openshift-apiserver-pdb       N/A             1                 1                     12h
openshift-cluster-storage-operator     csi-snapshot-controller-pdb   N/A             1                 1                     12h
openshift-cluster-storage-operator     csi-snapshot-webhook-pdb      N/A             1                 1                     12h
openshift-image-registry               image-registry                0               N/A               1                     11h
openshift-oauth-apiserver              oauth-apiserver-pdb           N/A             1                 1                     12h
openshift-operator-lifecycle-manager   packageserver-pdb             N/A             1                 1                     12h
openshift-ovn-kubernetes               ovn-raft-quorum-guard         1               N/A               0                     12h

The PDB in openshift-ovn-kubernetes needs to be enhanced to account for SNO clusters.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-01-05-181126

How reproducible:
always

Steps to Reproduce:
1. On an SNO cluster, log in to the console as an admin user, go to "Observe -> Alerting", and check whether a PodDisruptionBudgetAtLimit alert has fired for openshift-ovn-kubernetes.

Actual results:


Expected results:


Additional info:

Comment 1 Ben Pickard 2022-02-15 16:38:15 UTC
*** Bug 1974183 has been marked as a duplicate of this bug. ***

Comment 3 Anurag saxena 2022-02-24 21:33:19 UTC
@

Comment 5 Ross Brattain 2022-02-24 23:45:50 UTC
Verified on 4.11.0-0.nightly-2022-02-24-173451

$ oc get -n openshift-ovn-kubernetes PodDisruptionBudget
No resources found in openshift-ovn-kubernetes namespace.

$ oc get --all-namespaces PodDisruptionBudget
NAMESPACE                              NAME                               MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
openshift-cluster-csi-drivers          gcp-pd-csi-driver-controller-pdb   N/A             1                 1                     94m
openshift-image-registry               image-registry                     0               N/A               1                     85m
openshift-operator-lifecycle-manager   packageserver-pdb                  N/A             1                 1                     102m
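
The verified behaviour (no PDB in openshift-ovn-kubernetes on a single-node cluster) can be pictured as a simple topology check. The sketch below is illustrative only: the Cluster Network Operator is not written in Python, and this report does not say how it detects a single-node cluster. Keying the decision on a control plane topology string, and the names `should_create_quorum_guard` and `rendered_pdbs`, are assumptions made for this example.

```python
from typing import List

# Assumed topology values; the report itself does not name them.
SINGLE_REPLICA = "SingleReplica"
HIGHLY_AVAILABLE = "HighlyAvailable"

QUORUM_GUARD = "openshift-ovn-kubernetes/ovn-raft-quorum-guard"


def should_create_quorum_guard(control_plane_topology: str) -> bool:
    """Hypothetical check: skip the raft quorum PDB on single-node clusters."""
    return control_plane_topology != SINGLE_REPLICA


def rendered_pdbs(control_plane_topology: str) -> List[str]:
    """Return the PDB names this sketch would render for a topology."""
    if should_create_quorum_guard(control_plane_topology):
        return [QUORUM_GUARD]
    return []


print(rendered_pdbs(SINGLE_REPLICA))    # prints []
print(rendered_pdbs(HIGHLY_AVAILABLE))
```

Skipping the object entirely, rather than lowering `minAvailable`, matches the Doc Text's reasoning that a single-node cluster has no quorum to protect.
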

Comment 8 errata-xmlrpc 2022-08-10 10:41:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069