Cause:
A PodDisruptionBudget that guards OVN raft quorum was created even on single-node OVNKubernetes-based clusters.
Consequence:
The PodDisruptionBudgetAtLimit alert is raised on such clusters, where it is unhelpful because a single-node cluster has no quorum expectation in the first place.
Fix:
The Cluster Network Operator no longer creates the openshift-ovn-kubernetes/ovn-raft-quorum-guard PodDisruptionBudget on single-node clusters.
Result:
The unhelpful PodDisruptionBudgetAtLimit alert is no longer raised on OVNKubernetes-based single-node clusters.
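The Fix above boils down to a replica-count check. The following Python sketch is hypothetical: the function names, and the idea of keying the decision on the number of ovnkube-master replicas, are illustrative assumptions rather than the Cluster Network Operator's actual code. It shows why a single-member raft cluster leaves a quorum guard nothing to protect: the majority of one member is that member, so it tolerates zero failures.

```python
def raft_majority(members: int) -> int:
    """Smallest number of voters that forms a raft majority."""
    return members // 2 + 1


def raft_fault_tolerance(members: int) -> int:
    """How many voters can be down while a majority still survives."""
    return members - raft_majority(members)


def should_create_quorum_guard_pdb(control_plane_replicas: int) -> bool:
    # Hypothetical rule mirroring the Fix: only guard quorum when more than one
    # ovnkube-master exists, i.e. when there is a raft quorum to protect.
    return control_plane_replicas > 1
```

With one master, raft_majority(1) is 1 and raft_fault_tolerance(1) is 0, so any PDB protecting it can never allow a disruption; with three masters the majority is 2 and one member may be disrupted.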
Created attachment 1849250: PodDisruptionBudgetAtLimit alert fired for openshift-ovn-kubernetes in SNO cluster
Description of problem:
On an SNO cluster upgraded from 4.9.13 to 4.10.0-0.nightly-2022-01-05-181126, the PodDisruptionBudgetAtLimit alert fired for the openshift-ovn-kubernetes namespace.
Alert rule detail:
**************************
- alert: PodDisruptionBudgetAtLimit
  annotations:
    description: The pod disruption budget is at minimum disruptions allowed level.
      The number of current healthy pods is equal to desired healthy pods.
    summary: The pod disruption budget is preventing further disruption to pods.
  expr: |
    max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_current_healthy == kube_poddisruptionbudget_status_desired_healthy)
  for: 60m
  labels:
    severity: warning
**************************
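To make the rule's semantics concrete, here is a simplified Python model of it. This is an assumption-laden sketch, not how Prometheus is implemented: the expression is true whenever current healthy equals desired healthy, and the alert only fires once that has held without interruption for the "for: 60m" window. The input samples are hypothetical (minute, current_healthy, desired_healthy) tuples.

```python
from typing import Optional, Sequence, Tuple

FOR_MINUTES = 60  # the rule's "for: 60m"


def pdb_at_limit(current_healthy: int, desired_healthy: int) -> bool:
    """Mirror of the PromQL comparison current_healthy == desired_healthy."""
    return current_healthy == desired_healthy


def alert_firing(samples: Sequence[Tuple[int, int, int]],
                 for_minutes: int = FOR_MINUTES) -> bool:
    """Decide whether the alert is firing at the last sample.

    samples: (minute, current_healthy, desired_healthy) tuples sorted by minute.
    The condition must hold at every sample in an unbroken run lasting at
    least for_minutes; any sample where it is false resets the run.
    """
    active_since: Optional[int] = None
    for minute, current, desired in samples:
        if pdb_at_limit(current, desired):
            if active_since is None:
                active_since = minute  # alert becomes "pending"
        else:
            active_since = None  # condition broke; start over
    if active_since is None:
        return False
    return samples[-1][0] - active_since >= for_minutes
```

On SNO, a PDB with minAvailable 1 over a single healthy pod makes the comparison true at every evaluation, so the model (like the real alert) is guaranteed to fire once an hour has passed.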
# oc -n openshift-ovn-kubernetes get pdb
NAME                    MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
ovn-raft-quorum-guard   1               N/A               0                     12h
# oc -n openshift-ovn-kubernetes get pod
NAME                   READY   STATUS    RESTARTS   AGE
ovnkube-master-flshh   6/6     Running   6          166m
ovnkube-node-g7nhr     5/5     Running   5          167m
# oc -n openshift-ovn-kubernetes get pdb ovn-raft-quorum-guard -oyaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: "2022-01-06T00:06:56Z"
  generation: 1
  name: ovn-raft-quorum-guard
  namespace: openshift-ovn-kubernetes
  ownerReferences:
  - apiVersion: operator.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Network
    name: cluster
    uid: 3f61ab17-7664-406f-916b-1ec447e0595c
  resourceVersion: "187278"
  uid: 87c91ff4-ed0a-42ba-a392-c6fdbfd3174f
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: ovnkube-master
status:
  conditions:
  - lastTransitionTime: "2022-01-06T09:28:45Z"
    message: ""
    observedGeneration: 1
    reason: InsufficientPods
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 1
  desiredHealthy: 1
  disruptionsAllowed: 0
  expectedPods: 1
  observedGeneration: 1
kube_poddisruptionbudget_status_current_healthy = 1 and kube_poddisruptionbudget_status_desired_healthy = 1.
Because current_healthy equals desired_healthy, the alert expression is true for this PDB; it stays true, so the alert fires once the 60m "for" window has elapsed.
No PodDisruptionBudgetAtLimit alert fired for any other PDB:
# oc get pdb -A
NAMESPACE                              NAME                          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
openshift-apiserver                    openshift-apiserver-pdb       N/A             1                 1                     12h
openshift-cluster-storage-operator     csi-snapshot-controller-pdb   N/A             1                 1                     12h
openshift-cluster-storage-operator     csi-snapshot-webhook-pdb      N/A             1                 1                     12h
openshift-image-registry               image-registry                0               N/A               1                     11h
openshift-oauth-apiserver              oauth-apiserver-pdb           N/A             1                 1                     12h
openshift-operator-lifecycle-manager   packageserver-pdb             N/A             1                 1                     12h
openshift-ovn-kubernetes               ovn-raft-quorum-guard         1               N/A               0                     12h
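The ovn-raft-quorum-guard PDB in openshift-ovn-kubernetes needs to be improved: it is the only PDB above that allows 0 disruptions, which keeps the alert firing on SNO.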
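The ALLOWED DISRUPTIONS column follows from simple arithmetic on each PDB's spec. The Python sketch below is a simplified, integer-only model of how Kubernetes derives desiredHealthy and disruptionsAllowed; it ignores percentage values and other controller details, so treat it as an approximation. The one-pod counts used for the non-OVN PDBs are assumptions for illustration.

```python
from typing import Optional, Tuple


def pdb_status(expected_pods: int, current_healthy: int,
               min_available: Optional[int] = None,
               max_unavailable: Optional[int] = None) -> Tuple[int, int]:
    """Return (desired_healthy, disruptions_allowed) for an integer PDB spec.

    Simplified model: percentage values are not supported.
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        # minAvailable is used directly as the desired healthy count.
        desired = min_available
    else:
        # maxUnavailable is subtracted from the expected pod count, floored at 0.
        desired = max(0, expected_pods - max_unavailable)
    allowed = current_healthy - desired
    # No disruptions when the PDB matches no pods or is at/below its target.
    if expected_pods <= 0 or allowed < 0:
        allowed = 0
    return desired, allowed
```

For ovn-raft-quorum-guard (minAvailable 1, one healthy ovnkube-master) this gives desiredHealthy 1 and 0 allowed disruptions, exactly the equality the alert watches; a maxUnavailable 1 PDB over one pod gives desiredHealthy 0, so its current and desired counts never match.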
Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-01-05-181126
How reproducible:
always
Steps to Reproduce:
1. On an SNO cluster, log in to the console as an admin user, go to "Observe -> Alerting", and check whether a PodDisruptionBudgetAtLimit alert has fired for openshift-ovn-kubernetes.
Actual results:
Expected results:
Additional info:
Verified on 4.11.0-0.nightly-2022-02-24-173451
$ oc get -n openshift-ovn-kubernetes PodDisruptionBudget
No resources found in openshift-ovn-kubernetes namespace.
$ oc get --all-namespaces PodDisruptionBudget
NAMESPACE                              NAME                               MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
openshift-cluster-csi-drivers          gcp-pd-csi-driver-controller-pdb   N/A             1                 1                     94m
openshift-image-registry               image-registry                     0               N/A               1                     85m
openshift-operator-lifecycle-manager   packageserver-pdb                  N/A             1                 1                     102m
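The verification can also be scripted. A minimal Python sketch, assuming a logged-in oc client and that "oc get ... -o json" returns a List whose items carry metadata.namespace; the helper names are hypothetical.

```python
import json
import subprocess
from typing import Set


def pdb_namespaces(pdb_list_json: str) -> Set[str]:
    """Namespaces that contain at least one PodDisruptionBudget."""
    data = json.loads(pdb_list_json)
    return {item["metadata"]["namespace"] for item in data.get("items", [])}


def fetch_pdb_list_json() -> str:
    # Requires a logged-in oc client; same query as above, in JSON form.
    result = subprocess.run(
        ["oc", "get", "poddisruptionbudget", "--all-namespaces", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a fixed SNO cluster, "openshift-ovn-kubernetes" should not appear in pdb_namespaces(fetch_pdb_list_json()).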
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:5069