1992536 – all the alert rules' annotations "summary" and "description" should comply with the OpenShift alerting guidelines

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	kube-scheduler
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	medium
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Ross Peoples
QA Contact:	RamaKasturi
Docs Contact:
URL:
Whiteboard:	LifecycleStale
Depends On:	2010354
Blocks:

Reported:	2021-08-11 09:37 UTC by hongyan li
Modified:	2022-08-03 12:52 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-08-03 12:52:56 UTC
Target Upstream Version:
Embargoed:
Flags:	hongyli: needinfo-

Description hongyan li 2021-08-11 09:37:45 UTC

Description of problem:
All alert rules' "summary" and "description" annotations should comply with the OpenShift alerting guidelines.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-07-175228

How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:
$ oc get prometheusrules -n openshift-kube-scheduler-operator -oyaml
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: PrometheusRule
  metadata:
    annotations:
      exclude.release.openshift.io/internal-openshift-hosted: "true"
      include.release.openshift.io/self-managed-high-availability: "true"
      include.release.openshift.io/single-node-developer: "true"
    creationTimestamp: "2021-08-10T23:12:04Z"
    generation: 1
    name: kube-scheduler-operator
    namespace: openshift-kube-scheduler-operator
    ownerReferences:
    - apiVersion: config.openshift.io/v1
      kind: ClusterVersion
      name: version
      uid: 9fc7b5b6-6c23-4335-be07-ecfe1b9a142f
    resourceVersion: "1798"
    uid: 5b4fb182-09ca-4606-98d2-cd2db004e218
  spec:
    groups:
    - name: cluster-version
      rules:
      - alert: KubeSchedulerDown
        annotations:
          message: KubeScheduler has disappeared from Prometheus target discovery.
        expr: |
          absent(up{job="scheduler"} == 1)
        for: 15m
        labels:
          severity: critical
    - name: scheduler-legacy-policy-deprecated
      rules:
      - alert: SchedulerLegacyPolicySet
        annotations:
          message: The scheduler is currently configured to use a legacy scheduler
            policy API. Use of the policy API is deprecated and removed in 4.10.
        expr: |
          cluster_legacy_scheduler_policy > 0
        for: 60m
        labels:
          severity: warning
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""



Expected results:
All alert rules have "summary" and "description" annotations.

Additional info:
The "summary" and "description" annotations should comply with the OpenShift alerting guidelines [1].

[1] https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#documentation-required
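
For illustration, the two rules from the actual results might look roughly like this once the single "message" annotation is split into "summary" and "description". This is only a sketch: the "summary" wording for SchedulerLegacyPolicySet is an assumption, the expressions, durations, and severities are copied unchanged from the output above, and no runbook_url is shown because this report does not reference one for these alerts.

```yaml
# Sketch only: annotation wording is illustrative, not taken from the operator repository.
- alert: KubeSchedulerDown
  annotations:
    summary: Target disappeared from Prometheus target discovery.
    description: KubeScheduler has disappeared from Prometheus target discovery.
  expr: |
    absent(up{job="scheduler"} == 1)
  for: 15m
  labels:
    severity: critical
- alert: SchedulerLegacyPolicySet
  annotations:
    # Assumed wording for the short summary line.
    summary: Scheduler is configured to use a legacy scheduler policy API.
    description: The scheduler is currently configured to use a legacy scheduler
      policy API. Use of the policy API is deprecated and removed in 4.10.
  expr: |
    cluster_legacy_scheduler_policy > 0
  for: 60m
  labels:
    severity: warning
```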

Comment 1 Maciej Szulik 2021-08-19 12:03:14 UTC

Ross, please sync with Mike about those changes; he knows the code in https://github.com/openshift/cluster-kube-scheduler-operator/
While at it, also check whether the alerts in https://github.com/openshift/cluster-kube-controller-manager-operator/ follow these rules.

Comment 2 Michal Fojtik 2021-10-04 00:30:07 UTC

This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 3 hongyan li 2021-10-08 01:44:02 UTC

Just checked: the alert rules' "summary" and "description" annotations still do not comply with the OpenShift alerting guidelines. They should look like this:

```
- alert: KubeAPIDown
  annotations:
    summary: Target disappeared from Prometheus target discovery.
    description: KubeAPI has disappeared from Prometheus target discovery.
    runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/KubeAPIDown.md
  expr: 

```

Comment 4 Jan Chaloupka 2022-08-03 12:52:07 UTC

KCM was addressed as well in https://bugzilla.redhat.com/show_bug.cgi?id=2010352

Comment 5 Jan Chaloupka 2022-08-03 12:52:56 UTC

Only critical fixes are backported to 4.9.
