Bug 2010366

Summary: OpenShift Alerting Rules Style-Guide Compliance
Product: OpenShift Container Platform
Reporter: Brad Ison <brad.ison>
Component: Logging
Assignee: Jeff Cantrill <jcantril>
Status: CLOSED DEFERRED
QA Contact: Anping Li <anli>
Severity: low
Priority: unspecified
Version: 4.10
CC: aos-bugs
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: logging-exploration, logging-core
Last Closed: 2021-10-12 13:37:25 UTC
Type: Bug

Description Brad Ison 2021-10-04 14:01:36 UTC
Hello,

The OpenShift Monitoring Team has published a set of guidelines for
writing alerting rules in OpenShift, including a basic style guide.
You can find these here:

  https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md
  https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide

A subset of these are now being enforced in OpenShift End-to-End
tests [1], with temporary exceptions for existing non-compliant rules.

This component was found to have the following issues:

* Alerts found without a namespace label:

  - ElasticsearchDiskSpaceRunningLow
  - ElasticsearchNodeDiskWatermarkReached
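As a rough illustration of what such a check looks at (this is NOT the actual End-to-End test in openshift/origin), the sketch below scans a response shaped like the body of a Prometheus HTTP API `GET /api/v1/alerts` request and reports alert names whose label set has no `namespace`. Because it inspects evaluated alerts, it sees labels produced by the PromQL expression and static rule labels alike; the catch is that it only covers alerts that are currently pending or firing. The sample payload and the alert name `ExampleCompliantAlert` are made up for illustration.

```python
import json


def alerts_missing_namespace(api_response: str) -> list[str]:
    """Return sorted, de-duplicated alert names that lack a namespace label.

    `api_response` is assumed to be the JSON body of a Prometheus
    GET /api/v1/alerts request: {"status": ..., "data": {"alerts": [...]}}.
    """
    body = json.loads(api_response)
    missing = set()
    for alert in body.get("data", {}).get("alerts", []):
        labels = alert.get("labels", {})
        # An empty value counts as missing: Prometheus treats a label with
        # an empty value the same as a label that does not exist.
        if not labels.get("namespace"):
            missing.add(labels.get("alertname", "<unnamed>"))
    return sorted(missing)


# Hypothetical payload; label values are invented for illustration.
sample = json.dumps({
    "status": "success",
    "data": {"alerts": [
        {"labels": {"alertname": "ElasticsearchDiskSpaceRunningLow"},
         "state": "firing"},
        {"labels": {"alertname": "ExampleCompliantAlert",
                    "namespace": "openshift-logging"},
         "state": "firing"},
    ]},
})

print(alerts_missing_namespace(sample))  # ['ElasticsearchDiskSpaceRunningLow']
```

A static check over the rule files alone could not see labels that come from the expression, which is why this sketch works on evaluated alerts instead.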

Alerts SHOULD include a namespace label indicating the alert's source.

This requirement originally comes from our SRE team, as they use the
namespace label as the first means of routing alerts. Many alerts
already include a namespace label as a result of the PromQL
expressions they use; others may require a static label.
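To see why the expression matters: a PromQL aggregation such as `sum by (...)` keeps only the labels listed in its `by` clause, so a `namespace` label on the input series is lost unless it is listed there. The Python sketch below mimics that grouping on a few made-up series. It illustrates the label semantics only and is not a PromQL engine; the `sum_by` helper and the series labels are hypothetical.

```python
from collections import defaultdict


def sum_by(samples, by):
    """Mimic PromQL `sum by (<by>) (...)`: group samples on the `by`
    labels only and add up their values. Every other label is dropped."""
    groups = defaultdict(float)
    for labels, value in samples:
        key = tuple((name, labels[name]) for name in by if name in labels)
        groups[key] += value
    return dict(groups)


# Hypothetical input series, e.g. per-pod disk usage in bytes.
samples = [
    ({"namespace": "openshift-logging", "node": "node-a", "pod": "es-1"}, 10.0),
    ({"namespace": "openshift-logging", "node": "node-a", "pod": "es-2"}, 5.0),
]

# namespace is gone from the result: only `node` survives the aggregation.
print(sum_by(samples, ["node"]))
# namespace survives because it is listed in the `by` clause.
print(sum_by(samples, ["namespace", "node"]))
```

The first call yields `{(('node', 'node-a'),): 15.0}` and the second keeps `('namespace', 'openshift-logging')` in its group key, which is the kind of difference an expression change can make for the resulting alert's labels.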

Example of a change to PromQL to include a namespace label:

  https://github.com/openshift/cluster-monitoring-operator/commit/52d1f05#diff-9024dcef0fd244c0267c46858da24fbd1f45633515fafae0f98781b20805ff1dL22-R22

Example of adding a static namespace label:

  https://github.com/openshift/cluster-monitoring-operator/commit/52d1f05#diff-352702e71122d34a1be04c0588356cd8cb8a10df547f1c3c39fec18fa75b1593R304
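Where the expression cannot carry the label, a static label on the rule does the job. The sketch below works on a Python dict mirroring the `spec.groups[].rules[]` layout of a PrometheusRule object and sets `labels.namespace` on every alerting rule that lacks one, leaving recording rules (those with `record` instead of `alert`) untouched. The `add_static_namespace` helper, the rule contents, and the `openshift-logging` value are all placeholders; use whichever namespace the alert actually originates from. Static rule labels in Prometheus overwrite labels produced by the expression, so this only makes sense for alerts whose PromQL does not already yield a namespace.

```python
import copy


def add_static_namespace(rule_spec: dict, namespace: str) -> dict:
    """Return a copy of a PrometheusRule-like spec in which every alerting
    rule without a static `namespace` label gets one."""
    spec = copy.deepcopy(rule_spec)  # don't mutate the caller's dict
    for group in spec.get("groups", []):
        for rule in group.get("rules", []):
            if "alert" not in rule:
                continue  # recording rules are left untouched
            # Keep an existing static namespace; otherwise add ours.
            rule.setdefault("labels", {}).setdefault("namespace", namespace)
    return spec


# Placeholder rules: the alert name matches one listed above, but the
# expression and labels are invented and are NOT the operator's real rule.
spec = {"groups": [{"name": "example.rules", "rules": [
    {"alert": "ElasticsearchDiskSpaceRunningLow",
     "expr": "example_disk_free_ratio < 0.1",
     "labels": {"severity": "critical"}},
    {"record": "example:disk_free:ratio", "expr": "example_disk_free_ratio"},
]}]}

fixed = add_static_namespace(spec, "openshift-logging")
print(fixed["groups"][0]["rules"][0]["labels"])
# {'severity': 'critical', 'namespace': 'openshift-logging'}
```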

If you have questions about how best to modify your alerting rules
to include a namespace label, please reach out to the OpenShift
Monitoring Team in the #forum-monitoring channel on Slack, or on our
mailing list: team-monitoring

Thank you!

Repo: openshift/elasticsearch-operator

[1]: https://github.com/openshift/origin/commit/097e7a6

Comment 1 Jeff Cantrill 2021-10-12 13:37:25 UTC
Closing in favor of https://issues.redhat.com/browse/LOG-1822, since OpenShift Logging work for releases after OCP 4.6 is tracked in JIRA.