Bug 1992536

Summary: all the alert rules' annotations "summary" and "description" should comply with the OpenShift alerting guidelines
Product: OpenShift Container Platform Reporter: hongyan li <hongyli>
Component: kube-scheduler Assignee: Ross Peoples <rpeoples>
Status: CLOSED WONTFIX QA Contact: RamaKasturi <knarra>
Severity: medium Docs Contact:
Priority: low    
Version: 4.9 CC: aos-bugs, jchaloup, mfojtik
Target Milestone: --- Flags: hongyli: needinfo-
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: LifecycleStale
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-03 12:52:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2010354    
Bug Blocks:    

Description hongyan li 2021-08-11 09:37:45 UTC
Description of problem:
All alert rules' "summary" and "description" annotations should comply with the OpenShift alerting guidelines. The rules below only set a "message" annotation.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-07-175228

How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:
$ oc get prometheusrules -n openshift-kube-scheduler-operator -oyaml
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: PrometheusRule
  metadata:
    annotations:
      exclude.release.openshift.io/internal-openshift-hosted: "true"
      include.release.openshift.io/self-managed-high-availability: "true"
      include.release.openshift.io/single-node-developer: "true"
    creationTimestamp: "2021-08-10T23:12:04Z"
    generation: 1
    name: kube-scheduler-operator
    namespace: openshift-kube-scheduler-operator
    ownerReferences:
    - apiVersion: config.openshift.io/v1
      kind: ClusterVersion
      name: version
      uid: 9fc7b5b6-6c23-4335-be07-ecfe1b9a142f
    resourceVersion: "1798"
    uid: 5b4fb182-09ca-4606-98d2-cd2db004e218
  spec:
    groups:
    - name: cluster-version
      rules:
      - alert: KubeSchedulerDown
        annotations:
          message: KubeScheduler has disappeared from Prometheus target discovery.
        expr: |
          absent(up{job="scheduler"} == 1)
        for: 15m
        labels:
          severity: critical
    - name: scheduler-legacy-policy-deprecated
      rules:
      - alert: SchedulerLegacyPolicySet
        annotations:
          message: The scheduler is currently configured to use a legacy scheduler
            policy API. Use of the policy API is deprecated and removed in 4.10.
        expr: |
          cluster_legacy_scheduler_policy > 0
        for: 60m
        labels:
          severity: warning
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""



Expected results:
alert rules have annotations "summary" and "description"

Additional info:
The "summary" and "description" annotations should comply with the OpenShift alerting guidelines [1].

[1] https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#documentation-required
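
For illustration, a minimal sketch of how the two rules above might look once migrated. The `description` values reuse the existing `message` text, and `expr`, `for`, and `labels` are copied unchanged from the output above. The `summary` wording is an assumption, not text taken from the operator repository, and `runbook_url` is left out because no runbook for these alerts is referenced in this report.

```yaml
- alert: KubeSchedulerDown
  annotations:
    # Assumed wording: a short, generic one-line summary.
    summary: Target disappeared from Prometheus target discovery.
    # Reuses the text of the existing "message" annotation.
    description: KubeScheduler has disappeared from Prometheus target discovery.
  expr: |
    absent(up{job="scheduler"} == 1)
  for: 15m
  labels:
    severity: critical
- alert: SchedulerLegacyPolicySet
  annotations:
    # Assumed wording, not taken from the operator repository.
    summary: The scheduler is configured with a deprecated legacy policy API.
    # Reuses the text of the existing "message" annotation.
    description: The scheduler is currently configured to use a legacy scheduler
      policy API. Use of the policy API is deprecated and removed in 4.10.
  expr: |
    cluster_legacy_scheduler_policy > 0
  for: 60m
  labels:
    severity: warning
```

In this sketch the "message" annotation is replaced rather than kept alongside the new ones. Whether anything still consumes "message" would need to be checked before dropping it.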

Comment 1 Maciej Szulik 2021-08-19 12:03:14 UTC
Ross, please sync with Mike about those changes; he knows the code in https://github.com/openshift/cluster-kube-scheduler-operator/
While at it, also check whether the alerts in https://github.com/openshift/cluster-kube-controller-manager-operator/ follow these rules.

Comment 2 Michal Fojtik 2021-10-04 00:30:07 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 3 hongyan li 2021-10-08 01:44:02 UTC
Just checked: the alert rules' "summary" and "description" annotations still do not comply with the OpenShift alerting guidelines. They should look like this:

```
- alert: KubeAPIDown
  annotations:
    summary: Target disappeared from Prometheus target discovery.
    description: KubeAPI has disappeared from Prometheus target discovery.
    runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/KubeAPIDown.md
  expr: 

```

Comment 4 Jan Chaloupka 2022-08-03 12:52:07 UTC
KCM was addressed as well in https://bugzilla.redhat.com/show_bug.cgi?id=2010352

Comment 5 Jan Chaloupka 2022-08-03 12:52:56 UTC
Only critical fixes are backported to 4.9.