Bug 1806640
| Summary: | PodDisruptionBudgetAtLimit and PodDisruptionBudgetLimit alerts may trigger evaluation errors | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Simon Pasquier <spasquie> |
| Component: | kube-controller-manager | Assignee: | Maciej Szulik <maszulik> |
| Status: | CLOSED ERRATA | QA Contact: | zhou ying <yinzhou> |
| Severity: | low | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.3.z | CC: | aos-bugs, lcosic, mfojtik, nmoraiti |
| Target Milestone: | --- | | |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-07-13 17:20:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
*** Bug 1810947 has been marked as a duplicate of this bug. ***

This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing severity from "medium" to "low". If you have further information on the current state of the bug, please update it, otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

The issue still exists. Since the fix is trivial and described in the ticket, I've sent a PR.

(In reply to Simon Pasquier from comment #3)
> The issue still exists. Since the fix is trivial and described in the ticket, I've sent a PR.

Thanks! Moving this back to backlog.

Can't reproduce the issue now with payload 4.5.0-0.nightly-2020-05-18-012833:
```
[root@dhcp-140-138 ~]# oc get deployment
NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
cluster-monitoring-operator   0/0     0            0           7h55m
grafana                       1/1     1            1           7h34m
kube-state-metrics            2/2     2            2           7h44m
[root@dhcp-140-138 ~]# oc get deployment -n openshift-cluster-version
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
cluster-version-operator   0/0     0            0           8h
```
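With kube-state-metrics scaled to two replicas, every PDB metric is scraped twice (the series differ only in their `instance`, `pod`, and `endpoint` labels). A quick way to confirm those duplicate series from the Prometheus UI — a minimal sketch, assuming the default kube-state-metrics label set — is a count query:

```
# Returns 2 per pod disruption budget while both kube-state-metrics replicas are scraped.
count by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_desired_healthy)
```

Despite the duplicate series, both rules now evaluate cleanly on this payload: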
```
alert: PodDisruptionBudgetAtLimit
expr: max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy)
for: 15m
labels:
  severity: warning
annotations:
  message: The pod disruption budget is preventing further disruption to pods because it is at the minimum allowed level.
```

Evaluation state: OK (last evaluated 19.057s ago, took 345.8µs).
```
alert: PodDisruptionBudgetLimit
expr: max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy)
for: 15m
labels:
  severity: critical
annotations:
  message: The pod disruption budget is below the minimum number allowed pods.
```
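A note on why the `max by(...)` form above avoids the evaluation error: with no `on(...)` clause, the comparison matches on the full label set (including `instance` and `pod`), so each kube-state-metrics replica's series pairs one-to-one with its own counterpart, and the outer aggregation then collapses the per-replica duplicates. A sketch of the evaluation, assuming two replicas (label values taken from the error message quoted in the description below):

```
# Step 1: ==/< with no on(...) clause matches on ALL labels, so each replica's
# series pairs with itself (one-to-one, no many-to-many error). Two results
# remain, one per replica:
#   {instance="10.131.0.3:8443",  pod="kube-state-metrics-777f6bf798-kq7tj", ...}
#   {instance="10.129.2.11:8443", pod="kube-state-metrics-777f6bf798-bzmnt", ...}
# Step 2: max by(namespace, poddisruptionbudget) keeps a single series per PDB.
max by(namespace, poddisruptionbudget) (
  kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy
)
```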
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

*** Bug 1940392 has been marked as a duplicate of this bug. ***
Description of problem:
PodDisruptionBudgetAtLimit and PodDisruptionBudgetLimit may trigger evaluation errors.

Version-Release number of selected component (if applicable):
4.3.0

How reproducible:
Always

Steps to Reproduce:
1. Scale down the CVO deployment from 1 to 0.
2. Scale down the CMO deployment from 1 to 0 (openshift-monitoring namespace).
3. Scale the kube-state-metrics deployment from 1 to 2 (openshift-monitoring namespace).
4. Open the Prometheus UI (linked from the Monitoring section of the OpenShift console).
5. Click the Status > Rules link and look for "PodDisruptionBudgetAtLimit" and "PodDisruptionBudgetLimit".

Actual results:
Both alert evaluations fail with the same error:

```
found duplicate series for the match group {namespace="openshift-machine-config-operator", poddisruptionbudget="etcd-quorum-guard", service="kube-state-metrics"} on the right hand-side of the operation: [{__name__="kube_poddisruptionbudget_status_desired_healthy", endpoint="https-main", instance="10.131.0.3:8443", job="kube-state-metrics", namespace="openshift-machine-config-operator", pod="kube-state-metrics-777f6bf798-kq7tj", poddisruptionbudget="etcd-quorum-guard", service="kube-state-metrics"}, {__name__="kube_poddisruptionbudget_status_desired_healthy", endpoint="https-main", instance="10.129.2.11:8443", job="kube-state-metrics", namespace="openshift-machine-config-operator", pod="kube-state-metrics-777f6bf798-bzmnt", poddisruptionbudget="etcd-quorum-guard", service="kube-state-metrics"}];many-to-many matching not allowed: matching labels must be unique on one side
```

Expected results:
No alert evaluation errors.

Additional info:
The 'on (namespace, poddisruptionbudget, service)' stanza could be omitted. For instance, PodDisruptionBudgetLimit can be rewritten as:

```
kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy
```

If you really want to drop the kube-state-metrics labels, the expression can be wrapped in the max aggregator. For instance:

```
max by(namespace, poddisruptionbudget, service) (kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy)
```
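For context, the pre-fix expression is not quoted in this report, but the match-group labels in the error and the suggestion above imply it used an explicit `on(...)` clause — a reconstruction, so treat the exact form as an assumption:

```
# Presumed pre-fix PodDisruptionBudgetLimit expression (reconstructed, not verbatim).
# With two kube-state-metrics replicas, each side has two series per match group
# {namespace, poddisruptionbudget, service}, so the match is many-to-many and
# evaluation fails with the error quoted above.
kube_poddisruptionbudget_status_expected_pods
  < on(namespace, poddisruptionbudget, service)
kube_poddisruptionbudget_status_desired_healthy
```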