Description of problem:
The PodDisruptionBudgetAtLimit and PodDisruptionBudgetLimit alerts may trigger evaluation errors.

Version-Release number of selected component (if applicable):
4.3.0

How reproducible:
Always

Steps to Reproduce:
1. Scale down the CVO deployment from 1 to 0.
2. Scale down the CMO deployment from 1 to 0 (openshift-monitoring namespace).
3. Scale the kube-state-metrics deployment from 1 to 2 (openshift-monitoring namespace).
4. Open the Prometheus UI (linked from the Monitoring section of the OpenShift console).
5. Click the Status > Rules link and look for "PodDisruptionBudgetAtLimit" and "PodDisruptionBudgetLimit".

Actual results:
Both alert evaluations fail with the same error:

found duplicate series for the match group {namespace="openshift-machine-config-operator", poddisruptionbudget="etcd-quorum-guard", service="kube-state-metrics"} on the right hand-side of the operation: [{__name__="kube_poddisruptionbudget_status_desired_healthy", endpoint="https-main", instance="10.131.0.3:8443", job="kube-state-metrics", namespace="openshift-machine-config-operator", pod="kube-state-metrics-777f6bf798-kq7tj", poddisruptionbudget="etcd-quorum-guard", service="kube-state-metrics"}, {__name__="kube_poddisruptionbudget_status_desired_healthy", endpoint="https-main", instance="10.129.2.11:8443", job="kube-state-metrics", namespace="openshift-machine-config-operator", pod="kube-state-metrics-777f6bf798-bzmnt", poddisruptionbudget="etcd-quorum-guard", service="kube-state-metrics"}]; many-to-many matching not allowed: matching labels must be unique on one side

Expected results:
No alert evaluation error.

Additional info:
The 'on (namespace, poddisruptionbudget, service)' stanza could be omitted. For instance, PodDisruptionBudgetLimit can be rewritten as:

  kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy

If you really want to drop the kube-state-metrics labels, the expression can be wrapped in the max aggregator. For instance:

  max by(namespace, poddisruptionbudget, service) (kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy)
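To make the failure mode concrete: when kube-state-metrics runs with two replicas, both export the same PDB metrics, so two series share the same values for the labels in the 'on (namespace, poddisruptionbudget, service)' clause and the one-to-one vector match no longer has a unique right-hand side. The following is a minimal illustrative sketch (not Prometheus source code; the helper names are made up, and the label sets are taken from the error message above):

```python
# Sketch of why the alert expression fails with two kube-state-metrics
# replicas: the match group derived from the 'on' labels is duplicated.

def match_group(series, on_labels):
    """Project a series' labels onto the labels listed in the 'on' clause."""
    return tuple(sorted((k, series[k]) for k in on_labels))

def find_duplicate_group(right_side_series, on_labels):
    """Mimic the uniqueness check on the right-hand side of a one-to-one
    vector match; return the duplicated match group, or None."""
    seen = set()
    for s in right_side_series:
        group = match_group(s, on_labels)
        if group in seen:
            return group  # -> "many-to-many matching not allowed"
        seen.add(group)
    return None

# Two replicas export the same PDB metric; they differ only in labels
# (pod, instance) that are NOT part of the 'on' clause.
right = [
    {"namespace": "openshift-machine-config-operator",
     "poddisruptionbudget": "etcd-quorum-guard",
     "service": "kube-state-metrics",
     "pod": "kube-state-metrics-777f6bf798-kq7tj"},
    {"namespace": "openshift-machine-config-operator",
     "poddisruptionbudget": "etcd-quorum-guard",
     "service": "kube-state-metrics",
     "pod": "kube-state-metrics-777f6bf798-bzmnt"},
]

on_labels = ("namespace", "poddisruptionbudget", "service")
print(find_duplicate_group(right, on_labels) is not None)  # True
```

With a single replica (or after aggregating with max by(...), which collapses the duplicate series into one), the right-hand side is unique and the check passes.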
*** Bug 1810947 has been marked as a duplicate of this bug. ***
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing severity from "medium" to "low". If you have further information on the current state of the bug, please update it, otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.
The issue still exists. Since the fix is trivial and described in the ticket, I've sent a PR.
(In reply to Simon Pasquier from comment #3) > The issue still exists. Since the fix is trivial and described in the > ticket, I've sent a PR. Thanks! Moving this back to backlog.
Can't reproduce the issue now with payload 4.5.0-0.nightly-2020-05-18-012833:

[root@dhcp-140-138 ~]# oc get deployment
NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
cluster-monitoring-operator   0/0     0            0           7h55m
grafana                       1/1     1            1           7h34m
kube-state-metrics            2/2     2            2           7h44m

[root@dhcp-140-138 ~]# oc get deployment -n openshift-cluster-version
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
cluster-version-operator   0/0     0            0           8h

The rules now use the max aggregation and evaluate without errors:

alert: PodDisruptionBudgetAtLimit
expr: max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy)
for: 15m
labels:
  severity: warning
annotations:
  message: The pod disruption budget is preventing further disruption to pods because it is at the minimum allowed level.
OK 19.057s ago 345.8us

alert: PodDisruptionBudgetLimit
expr: max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy)
for: 15m
labels:
  severity: critical
annotations:
  message: The pod disruption budget is below the minimum number allowed pods.
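For reference, rules like the ones verified above would typically be delivered as a PrometheusRule object. This is only a sketch: the metadata name, namespace, and group name below are assumptions for illustration, not the actual manifest shipped by the cluster-monitoring-operator.

```yaml
# Illustrative sketch of packaging the fixed alerts as a PrometheusRule.
# metadata.name and the group name are assumed, not taken from the cluster.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pdb-alerts            # assumed name
  namespace: openshift-monitoring
spec:
  groups:
  - name: pdb.rules           # assumed group name
    rules:
    - alert: PodDisruptionBudgetAtLimit
      expr: max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy)
      for: 15m
      labels:
        severity: warning
      annotations:
        message: The pod disruption budget is preventing further disruption to pods because it is at the minimum allowed level.
    - alert: PodDisruptionBudgetLimit
      expr: max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy)
      for: 15m
      labels:
        severity: critical
      annotations:
        message: The pod disruption budget is below the minimum number allowed pods.
```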
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409
*** Bug 1940392 has been marked as a duplicate of this bug. ***