Bug 2037721 - PodDisruptionBudgetAtLimit alert fired in SNO cluster
Summary: PodDisruptionBudgetAtLimit alert fired in SNO cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Omer Tuchfeld
QA Contact: Ross Brattain
URL:
Whiteboard:
Duplicates: 1974183
Depends On:
Blocks: 2057961
Reported: 2022-01-06 12:31 UTC by Junqi Zhao
Modified: 2022-08-10 10:42 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A PodDisruptionBudget intended to ensure OVN raft quorum was created even on single-node OVNKubernetes-based clusters.
Consequence: The PodDisruptionBudgetAtLimit alert was raised on such clusters, where it is unhelpful because a single-node cluster has no quorum expectation in the first place.
Fix: The Cluster Network Operator no longer creates the openshift-ovn-kubernetes/ovn-raft-quorum-guard PodDisruptionBudget on single-node clusters.
Result: The unhelpful PodDisruptionBudgetAtLimit alert is no longer raised on OVNKubernetes-based single-node clusters.
Clone Of:
Environment:
Last Closed: 2022-08-10 10:41:42 UTC
Target Upstream Version:
Embargoed:


Attachments
PodDisruptionBudgetAtLimit alert fired for openshift-ovn-kubernetes in SNO cluster (113.09 KB, image/png)
2022-01-06 12:31 UTC, Junqi Zhao


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-network-operator pull 1307 | 0 | None | open | Bug 2037721: Do not apply OVN-Kubernetes `PodDisruptionBudget` on single-node clusters | 2022-02-10 15:29:11 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 10:42:09 UTC

Description Junqi Zhao 2022-01-06 12:31:28 UTC
Created attachment 1849250
PodDisruptionBudgetAtLimit alert fired for openshift-ovn-kubernetes in SNO cluster

Description of problem:
On an SNO (single-node OpenShift) cluster upgraded from 4.9.13 to 4.10.0-0.nightly-2022-01-05-181126, the PodDisruptionBudgetAtLimit alert fired for the openshift-ovn-kubernetes namespace.
Alert rule detail:
**************************
        - alert: PodDisruptionBudgetAtLimit
          annotations:
            description: The pod disruption budget is at minimum disruptions allowed level.
              The number of current healthy pods is equal to desired healthy pods.
            summary: The pod disruption budget is preventing further disruption to pods.
          expr: |
            max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_current_healthy == kube_poddisruptionbudget_status_desired_healthy)
          for: 60m
          labels:
            severity: warning
**************************
# oc -n openshift-ovn-kubernetes get pdb
NAME                    MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
ovn-raft-quorum-guard   1               N/A               0                     12h

# oc -n openshift-ovn-kubernetes get pod
NAME                   READY   STATUS    RESTARTS   AGE
ovnkube-master-flshh   6/6     Running   6          166m
ovnkube-node-g7nhr     5/5     Running   5          167m

# oc -n openshift-ovn-kubernetes get pdb ovn-raft-quorum-guard -oyaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: "2022-01-06T00:06:56Z"
  generation: 1
  name: ovn-raft-quorum-guard
  namespace: openshift-ovn-kubernetes
  ownerReferences:
  - apiVersion: operator.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Network
    name: cluster
    uid: 3f61ab17-7664-406f-916b-1ec447e0595c
  resourceVersion: "187278"
  uid: 87c91ff4-ed0a-42ba-a392-c6fdbfd3174f
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: ovnkube-master
status:
  conditions:
  - lastTransitionTime: "2022-01-06T09:28:45Z"
    message: ""
    observedGeneration: 1
    reason: InsufficientPods
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 1
  desiredHealthy: 1
  disruptionsAllowed: 0
  expectedPods: 1
  observedGeneration: 1


Here kube_poddisruptionbudget_status_current_healthy = 1 and kube_poddisruptionbudget_status_desired_healthy = 1.
Because the two metrics are equal, the alert expression matches, and the alert fires once the condition has held for 60 minutes (the rule's "for: 60m").
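The reasoning above can be sketched in a few lines of Python. This is only a simplified model of how a PDB with an integer minAvailable ends up at the limit, not the Kubernetes controller or the Prometheus rule evaluator. The names PdbStatus, status_for_min_available and at_limit are hypothetical, the three-pod case is an invented contrast rather than a real cluster configuration, and the 60-minute "for" duration is not modelled.

```python
from dataclasses import dataclass


@dataclass
class PdbStatus:
    current_healthy: int
    desired_healthy: int
    disruptions_allowed: int


def status_for_min_available(min_available: int, healthy_pods: int) -> PdbStatus:
    """Simplified PDB status for an integer minAvailable (assumption: no percentages)."""
    desired = min_available
    # Disruptions are allowed only for healthy pods above the desired count.
    allowed = max(0, healthy_pods - desired)
    return PdbStatus(healthy_pods, desired, allowed)


def at_limit(status: PdbStatus) -> bool:
    """Mirrors the alert expression: current_healthy == desired_healthy."""
    return status.current_healthy == status.desired_healthy


# Values from this report: minAvailable 1, one healthy ovnkube-master pod.
sno = status_for_min_available(min_available=1, healthy_pods=1)
# Hypothetical contrast case: same minAvailable, three healthy pods.
three_pods = status_for_min_available(min_available=1, healthy_pods=3)

print(sno, at_limit(sno))
print(three_pods, at_limit(three_pods))
```

With a single matching pod and minAvailable 1, the budget can never allow a disruption, so the condition holds permanently on SNO.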

No PodDisruptionBudgetAtLimit alert fired for the other PDBs:
# oc get pdb -A
NAMESPACE                              NAME                          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
openshift-apiserver                    openshift-apiserver-pdb       N/A             1                 1                     12h
openshift-cluster-storage-operator     csi-snapshot-controller-pdb   N/A             1                 1                     12h
openshift-cluster-storage-operator     csi-snapshot-webhook-pdb      N/A             1                 1                     12h
openshift-image-registry               image-registry                0               N/A               1                     11h
openshift-oauth-apiserver              oauth-apiserver-pdb           N/A             1                 1                     12h
openshift-operator-lifecycle-manager   packageserver-pdb             N/A             1                 1                     12h
openshift-ovn-kubernetes               ovn-raft-quorum-guard         1               N/A               0                     12h

The PDB in openshift-ovn-kubernetes needs to be adjusted so that it does not trigger this alert on SNO clusters.
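A minimal sketch of one possible approach, in line with the fix recorded in this bug's Doc Text (do not create the PDB on single-node clusters). This is not the Cluster Network Operator's actual code. The function names and manifest list are hypothetical, and detecting a single-node cluster through the Infrastructure resource's status.controlPlaneTopology value "SingleReplica" is an assumption about how the check could be made.

```python
# Assumption: single-node clusters report this control plane topology.
SINGLE_REPLICA = "SingleReplica"


def should_create_quorum_guard(control_plane_topology: str) -> bool:
    """A raft quorum guard only makes sense when there is more than one master."""
    return control_plane_topology != SINGLE_REPLICA


def objects_to_render(control_plane_topology: str) -> list:
    """Hypothetical list of OVN-Kubernetes objects the operator would apply."""
    objects = ["ovnkube-master", "ovnkube-node"]
    if should_create_quorum_guard(control_plane_topology):
        objects.append("ovn-raft-quorum-guard")
    return objects


print(objects_to_render("SingleReplica"))
print(objects_to_render("HighlyAvailable"))
```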

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-01-05-181126

How reproducible:
always

Steps to Reproduce:
1. On an SNO cluster, log in to the console as an admin user, go to "Observe -> Alerting", and check whether a PodDisruptionBudgetAtLimit alert has fired for openshift-ovn-kubernetes.

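Actual results:
The PodDisruptionBudgetAtLimit alert fires for openshift-ovn-kubernetes.

Expected results:
No PodDisruptionBudgetAtLimit alert fires for openshift-ovn-kubernetes on an SNO cluster.
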
Additional info:

Comment 1 Ben Pickard 2022-02-15 16:38:15 UTC
*** Bug 1974183 has been marked as a duplicate of this bug. ***

Comment 3 Anurag saxena 2022-02-24 21:33:19 UTC

Comment 5 Ross Brattain 2022-02-24 23:45:50 UTC
Verified on 4.11.0-0.nightly-2022-02-24-173451

$ oc get -n openshift-ovn-kubernetes PodDisruptionBudget
No resources found in openshift-ovn-kubernetes namespace.

$ oc get  --all-namespaces PodDisruptionBudget
NAMESPACE                              NAME                               MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
openshift-cluster-csi-drivers          gcp-pd-csi-driver-controller-pdb   N/A             1                 1                     94m
openshift-image-registry               image-registry                     0               N/A               1                     85m
openshift-operator-lifecycle-manager   packageserver-pdb                  N/A             1                 1                     102m
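
The same check could be scripted. The following is a rough sketch, assuming the input is the JSON printed by "oc get pdb --all-namespaces -o json"; the function name pdbs_at_limit is hypothetical. The sample data is illustrative: the first item uses the status values from the original report, and the second is filled in with values consistent with the image-registry row. Only the instantaneous condition is checked, not the alert's 60-minute duration.

```python
import json


def pdbs_at_limit(pdb_list: dict) -> list:
    """Return namespace/name of every PDB whose currentHealthy equals desiredHealthy."""
    matches = []
    for item in pdb_list.get("items", []):
        status = item.get("status", {})
        current = status.get("currentHealthy")
        desired = status.get("desiredHealthy")
        # Skip PDBs whose status has not been populated yet.
        if current is not None and current == desired:
            meta = item["metadata"]
            matches.append(f'{meta["namespace"]}/{meta["name"]}')
    return matches


# Illustrative input shaped like `oc get pdb -A -o json` output.
sample = json.loads("""
{
  "kind": "List",
  "items": [
    {"metadata": {"namespace": "openshift-ovn-kubernetes", "name": "ovn-raft-quorum-guard"},
     "status": {"currentHealthy": 1, "desiredHealthy": 1, "disruptionsAllowed": 0}},
    {"metadata": {"namespace": "openshift-image-registry", "name": "image-registry"},
     "status": {"currentHealthy": 1, "desiredHealthy": 0, "disruptionsAllowed": 1}}
  ]
}
""")

print(pdbs_at_limit(sample))  # ['openshift-ovn-kubernetes/ovn-raft-quorum-guard']
```

On a fixed SNO cluster, where the ovn-raft-quorum-guard PDB no longer exists, the returned list would not include it.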

Comment 8 errata-xmlrpc 2022-08-10 10:41:42 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

