Bug 2010352 - OpenShift Alerting Rules Style-Guide Compliance
Summary: OpenShift Alerting Rules Style-Guide Compliance
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Filip Krepinsky
QA Contact: zhou ying
URL:
Whiteboard:
Duplicates: 1992537
Depends On:
Blocks:
 
Reported: 2021-10-04 13:52 UTC by Brad Ison
Modified: 2022-03-10 16:16 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:16:32 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-kube-controller-manager-operator pull 573 | 0 | None | open | Bug 2010352: add summary, description and namespace to prometheus alerts | 2021-11-16 19:55:51 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:16:43 UTC

Description Brad Ison 2021-10-04 13:52:02 UTC
Hello,

The OpenShift Monitoring Team has published a set of guidelines for
writing alerting rules in OpenShift, including a basic style guide.
You can find these here:

  https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md
  https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide

A subset of these are now being enforced in OpenShift End-to-End
tests [1], with temporary exceptions for existing non-compliant rules.

This component was found to have the following issues:

* Alerts without summary and/or description annotations:

  - KubeControllerManagerDown
  - PodDisruptionBudgetAtLimit
  - PodDisruptionBudgetLimit

Alerts MUST include summary and description annotations.

Think of summary as the first line of a commit message, or an email
subject line. It should be brief but informative. The description is
the longer, more detailed explanation of the alert.

The enhancement document linked above has examples of alerts with
these annotations.
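
As a rough illustration of the requirement above, the check can be sketched in Python. This is not the actual End-to-End test: the function name `missing_annotations`, the `bare_rule` sample, and the dict layout (mirroring the `annotations` mapping of a Prometheus rule) are assumptions made for this example.

```python
REQUIRED_ANNOTATIONS = ("summary", "description")


def missing_annotations(rule):
    """Return the required annotations that are absent or blank in a rule dict."""
    annotations = rule.get("annotations") or {}
    return [
        name
        for name in REQUIRED_ANNOTATIONS
        if not str(annotations.get(name, "")).strip()
    ]


# Hypothetical rule with no annotations, for illustration only.
bare_rule = {"alert": "ExampleAlert", "labels": {"severity": "warning"}}
print(missing_annotations(bare_rule))  # prints ['summary', 'description']
```

An empty list means the rule carries both annotations; blank values are treated as missing.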


* Alerts found to not include a namespace label:

  - KubeControllerManagerDown

Alerts SHOULD include a namespace label indicating the alert's source.

This requirement originally comes from our SRE team, as they use the
namespace label as the first means of routing alerts. Many alerts
already include a namespace label as a result of the PromQL
expressions used; others may require a static label.

Example of a change to PromQL to include a namespace label:

  https://github.com/openshift/cluster-monitoring-operator/commit/52d1f05#diff-9024dcef0fd244c0267c46858da24fbd1f45633515fafae0f98781b20805ff1dL22-R22

Example of adding a static namespace label:

  https://github.com/openshift/cluster-monitoring-operator/commit/52d1f05#diff-352702e71122d34a1be04c0588356cd8cb8a10df547f1c3c39fec18fa75b1593R304
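
The static-label approach can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `has_namespace_label` and `with_static_namespace`, the sample rule, and the value `example-namespace` are hypothetical and are not taken from the linked commits. The check is meant to be applied to the label set of a fired alert, which combines labels produced by the PromQL expression with the rule's static labels.

```python
def has_namespace_label(alert_labels):
    """Return True if a label set carries a non-empty namespace label."""
    return bool(str(alert_labels.get("namespace", "")).strip())


def with_static_namespace(rule, namespace):
    """Return a copy of the rule with a static namespace label added.

    An existing namespace label is kept unchanged.
    """
    labels = dict(rule.get("labels") or {})
    labels.setdefault("namespace", namespace)
    return {**rule, "labels": labels}


# Hypothetical rule and namespace value, for illustration only.
rule = {"alert": "ExampleAlert", "labels": {"severity": "critical"}}
patched = with_static_namespace(rule, "example-namespace")
print(has_namespace_label(rule["labels"]), has_namespace_label(patched["labels"]))
# prints: False True
```

The original rule dict is left untouched, so the same helper can be run over a list of rules without side effects.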

If you have questions about how best to modify your alerting rules
to include a namespace label, please reach out to the OpenShift
Monitoring Team in the #forum-monitoring channel on Slack, or on our
mailing list: team-monitoring

Thank you!

Repo: openshift/cluster-kube-controller-manager-operator

[1]: https://github.com/openshift/origin/commit/097e7a6

Comment 1 Filip Krepinsky 2021-11-16 19:59:32 UTC
PR is up. Thanks for the explanations.

Comment 2 Filip Krepinsky 2021-11-16 20:01:35 UTC
*** Bug 1992537 has been marked as a duplicate of this bug. ***

Comment 5 zhou ying 2021-11-23 05:11:15 UTC
Confirmed with the latest OCP; the issue has been fixed:

[root@localhost ~]# oc get  clusterversion 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-11-22-195410   True        False         130m    Cluster version is 4.10.0-0.nightly-2021-11-22-195410

name: PodDisruptionBudgetAtLimit
expr: max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_current_healthy == kube_poddisruptionbudget_status_desired_healthy)
for: 1h
labels:
  severity: warning
annotations:
  description: The pod disruption budget is at minimum disruptions allowed level. The number of current healthy pods is equal to desired healthy pods.
  summary: The pod disruption budget is preventing further disruption to pods.

Comment 8 errata-xmlrpc 2022-03-10 16:16:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

