Bug 1806640 - PodDisruptionBudgetAtLimit and PodDisruptionBudgetLimit alerts may trigger evaluation errors
Summary: PodDisruptionBudgetAtLimit and PodDisruptionBudgetLimit alerts may trigger evaluation errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 4.5.0
Assignee: Maciej Szulik
QA Contact: zhou ying
URL:
Whiteboard:
Duplicates: 1810947 1940392
Depends On:
Blocks:
 
Reported: 2020-02-24 16:42 UTC by Simon Pasquier
Modified: 2021-03-22 08:53 UTC
CC: 4 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:20:43 UTC
Target Upstream Version:
Embargoed:




Links
GitHub: openshift/cluster-kube-controller-manager-operator pull 414 (closed) - Bug 1806640: fix potential errors in Prometheus alerts (last updated 2021-01-09 09:53:34 UTC)
Red Hat Product Errata: RHBA-2020:2409 (last updated 2020-07-13 17:21:02 UTC)

Description Simon Pasquier 2020-02-24 16:42:24 UTC
Description of problem:
PodDisruptionBudgetAtLimit and PodDisruptionBudgetLimit may trigger evaluation errors.

Version-Release number of selected component (if applicable):
4.3.0

How reproducible:
Always

Steps to Reproduce:
1. Scale down CVO deployment from 1 to 0.
2. Scale down CMO deployment from 1 to 0 (openshift-monitoring namespace).
3. Scale kube-state-metrics deployment from 1 to 2 (openshift-monitoring namespace).
4. Open the Prometheus UI (linked from the Monitoring section of the OpenShift console).
5. Click the Status > Rules link and look for "PodDisruptionBudgetAtLimit" and "PodDisruptionBudgetLimit".
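
The same scaling can be done from the CLI; a rough equivalent of steps 1-3 (namespaces taken from the steps above, sketch only):

# Stop the cluster-version-operator so it does not revert the next change
oc -n openshift-cluster-version scale deployment/cluster-version-operator --replicas=0
# Stop the cluster-monitoring-operator so it does not revert the kube-state-metrics change
oc -n openshift-monitoring scale deployment/cluster-monitoring-operator --replicas=0
# Run two kube-state-metrics replicas to produce duplicate series per PDB
oc -n openshift-monitoring scale deployment/kube-state-metrics --replicas=2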

Actual results:
The alert evaluations fail with the following errors:

found duplicate series for the match group {namespace="openshift-machine-config-operator", poddisruptionbudget="etcd-quorum-guard", service="kube-state-metrics"} on the right hand-side of the operation: [{__name__="kube_poddisruptionbudget_status_desired_healthy", endpoint="https-main", instance="10.131.0.3:8443", job="kube-state-metrics", namespace="openshift-machine-config-operator", pod="kube-state-metrics-777f6bf798-kq7tj", poddisruptionbudget="etcd-quorum-guard", service="kube-state-metrics"}, {__name__="kube_poddisruptionbudget_status_desired_healthy", endpoint="https-main", instance="10.129.2.11:8443", job="kube-state-metrics", namespace="openshift-machine-config-operator", pod="kube-state-metrics-777f6bf798-bzmnt", poddisruptionbudget="etcd-quorum-guard", service="kube-state-metrics"}];many-to-many matching not allowed: matching labels must be unique on one side

found duplicate series for the match group {namespace="openshift-machine-config-operator", poddisruptionbudget="etcd-quorum-guard", service="kube-state-metrics"} on the right hand-side of the operation: [{__name__="kube_poddisruptionbudget_status_desired_healthy", endpoint="https-main", instance="10.131.0.3:8443", job="kube-state-metrics", namespace="openshift-machine-config-operator", pod="kube-state-metrics-777f6bf798-kq7tj", poddisruptionbudget="etcd-quorum-guard", service="kube-state-metrics"}, {__name__="kube_poddisruptionbudget_status_desired_healthy", endpoint="https-main", instance="10.129.2.11:8443", job="kube-state-metrics", namespace="openshift-machine-config-operator", pod="kube-state-metrics-777f6bf798-bzmnt", poddisruptionbudget="etcd-quorum-guard", service="kube-state-metrics"}];many-to-many matching not allowed: matching labels must be unique on one side

Expected results:
No alert evaluation error.

Additional info:
The 'on (namespace, poddisruptionbudget, service)' stanza could be omitted. For instance, PodDisruptionBudgetLimit can be rewritten:

kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy

If you really want to drop the kube-state-metrics labels, the expression can be wrapped in the max aggregator. For instance:

max by(namespace, poddisruptionbudget, service) (kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy)
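
For context, the failing expressions presumably join the two metrics one-to-one with 'on (namespace, poddisruptionbudget, service)'; with two kube-state-metrics replicas each PDB produces two series per metric, so the match becomes many-to-many. A rough way to confirm the duplicate series from the CLI (route name and token-based auth are assumptions, not taken from this report):

# Hypothetical check: with two kube-state-metrics replicas this should return two
# results per PDB that differ only in the instance/pod labels.
TOKEN=$(oc whoami -t)
PROM_HOST=$(oc -n openshift-monitoring get route prometheus-k8s -o jsonpath='{.spec.host}')
curl -skG "https://${PROM_HOST}/api/v1/query" \
  -H "Authorization: Bearer ${TOKEN}" \
  --data-urlencode 'query=kube_poddisruptionbudget_status_desired_healthy{poddisruptionbudget="etcd-quorum-guard"}'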

Comment 1 Simon Pasquier 2020-03-10 13:48:23 UTC
*** Bug 1810947 has been marked as a duplicate of this bug. ***

Comment 2 Michal Fojtik 2020-05-12 10:32:23 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

As such, we're marking this bug as "LifecycleStale" and decreasing severity from "medium" to "low".

If you have further information on the current state of the bug, please update it, otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

Comment 3 Simon Pasquier 2020-05-12 12:08:58 UTC
The issue still exists. Since the fix is trivial and described in the ticket, I've sent a PR.

Comment 4 Michal Fojtik 2020-05-14 11:51:18 UTC
(In reply to Simon Pasquier from comment #3)
> The issue still exists. Since the fix is trivial and described in the
> ticket, I've sent a PR.

Thanks! Moving this back to backlog.

Comment 7 zhou ying 2020-05-18 13:47:41 UTC
Can't reproduce the issue now with payload: 4.5.0-0.nightly-2020-05-18-012833

[root@dhcp-140-138 ~]# oc get deployment
NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
cluster-monitoring-operator   0/0     0            0           7h55m
grafana                       1/1     1            1           7h34m
kube-state-metrics            2/2     2            2           7h44m


[root@dhcp-140-138 ~]# oc get deployment -n openshift-cluster-version
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
cluster-version-operator   0/0     0            0           8h


alert: PodDisruptionBudgetAtLimit
expr: max
  by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods
  == kube_poddisruptionbudget_status_desired_healthy)
for: 15m
labels:
  severity: warning
annotations:
  message: The pod disruption budget is preventing further disruption to pods because
    it is at the minimum allowed level.
State: OK, last evaluation: 19.057s ago, evaluation time: 345.8us
alert: PodDisruptionBudgetLimit
expr: max
  by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods
  < kube_poddisruptionbudget_status_desired_healthy)
for: 15m
labels:
  severity: critical
annotations:
  message: The pod disruption budget is below the minimum number allowed pods.
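
Rule health can also be checked from the CLI via the Prometheus rules API; a sketch, assuming the prometheus-k8s route, token auth, and jq availability (none of which are stated in this report):

# Any remaining evaluation problem would show up in the health/lastError fields.
TOKEN=$(oc whoami -t)
PROM_HOST=$(oc -n openshift-monitoring get route prometheus-k8s -o jsonpath='{.spec.host}')
curl -sk "https://${PROM_HOST}/api/v1/rules" -H "Authorization: Bearer ${TOKEN}" \
  | jq '.data.groups[].rules[] | select(.name | startswith("PodDisruptionBudget")) | {name, health, lastError}'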

Comment 9 errata-xmlrpc 2020-07-13 17:20:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

Comment 10 Jan Chaloupka 2021-03-22 08:53:57 UTC
*** Bug 1940392 has been marked as a duplicate of this bug. ***

