Bug 1992555 - all the alert rules' annotations "summary" and "description" should comply with the OpenShift alerting guidelines
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Luigi Mario Zuccarelli
QA Contact: jechen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-08-11 10:04 UTC by hongyan li
Modified: 2022-08-04 22:39 UTC
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:45:51 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID                                       Private  Priority  Status  Summary  Last Updated
Github                  openshift cluster-dns-operator pull 288  0        None      None    None     2021-08-17 11:42:50 UTC
Red Hat Product Errata  RHSA-2021:3759                           0        None      None    None     2021-10-18 17:46:09 UTC

Description hongyan li 2021-08-11 10:04:21 UTC
Description of problem:
All alert rules' annotations "summary" and "description" should comply with the OpenShift alerting guidelines. The alert rules in the openshift-dns-operator namespace currently define only a "message" annotation.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-07-175228

How reproducible:
always

Steps to Reproduce:
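1. List the DNS alert rules:
   $ oc get prometheusrules -n openshift-dns-operator -oyaml
2. Inspect the annotations of each alert rule.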

Actual results:
$ oc get prometheusrules -n openshift-dns-operator -oyaml
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: PrometheusRule
  metadata:
    annotations:
      include.release.openshift.io/ibm-cloud-managed: "true"
      include.release.openshift.io/self-managed-high-availability: "true"
      include.release.openshift.io/single-node-developer: "true"
    creationTimestamp: "2021-08-10T23:12:04Z"
    generation: 1
    labels:
      role: alert-rules
    name: dns
    namespace: openshift-dns-operator
    ownerReferences:
    - apiVersion: config.openshift.io/v1
      kind: ClusterVersion
      name: version
      uid: 9fc7b5b6-6c23-4335-be07-ecfe1b9a142f
    resourceVersion: "1797"
    uid: 8ae3dbc8-4839-4be2-a7e6-543fbc64f26a
  spec:
    groups:
    - name: openshift-dns.rules
      rules:
      - alert: CoreDNSPanicking
        annotations:
          message: '{{ $value }} CoreDNS panics observed on {{ $labels.instance }}'
        expr: increase(coredns_panics_total[10m]) > 0
        for: 5m
        labels:
          severity: warning
      - alert: CoreDNSHealthCheckSlow
        annotations:
          message: CoreDNS Health Checks are slowing down (instance {{ $labels.instance
            }})
        expr: histogram_quantile(.95, sum(rate(coredns_health_request_duration_seconds_bucket[5m]))
          by (instance, le)) > 10
        for: 5m
        labels:
          severity: warning
      - alert: CoreDNSErrorsHigh
        annotations:
          message: CoreDNS is returning SERVFAIL for {{ $value | humanizePercentage
            }} of requests.
        expr: |
          (sum(rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m]))
            /
          sum(rate(coredns_dns_responses_total[5m])))
          > 0.01
        for: 5m
        labels:
          severity: warning
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""


Expected results:
Every alert rule has "summary" and "description" annotations.

Additional info:
The "summary" and "description" annotations should comply with the OpenShift alerting guidelines [1].

[1] https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#documentation-required
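To check the expected result without reading the YAML by eye, a minimal Python sketch can walk the JSON form of the same listing (`oc get prometheusrules -n openshift-dns-operator -o json`) and report every alert rule that lacks either annotation. The function name `missing_annotations` is hypothetical, and treating an empty annotation value as missing is an assumption of this sketch, not a rule quoted from the guidelines.

```python
# Annotations the OpenShift alerting guidelines expect on every alert rule.
REQUIRED_ANNOTATIONS = ("summary", "description")


def missing_annotations(rule_list):
    """Return (alert name, [missing annotation names]) for each non-compliant alert rule.

    `rule_list` is the parsed JSON output of
    `oc get prometheusrules -n <namespace> -o json`, i.e. a v1 List of
    PrometheusRule objects. Hypothetical helper, for illustration only.
    """
    problems = []
    for item in rule_list.get("items", []):
        for group in item.get("spec", {}).get("groups", []):
            for rule in group.get("rules", []):
                if "alert" not in rule:
                    continue  # recording rules carry no alert annotations
                annotations = rule.get("annotations") or {}
                # Assumption: an empty string counts as a missing annotation.
                missing = [name for name in REQUIRED_ANNOTATIONS if not annotations.get(name)]
                if missing:
                    problems.append((rule["alert"], missing))
    return problems
```

Feeding it the JSON equivalent of the "Actual results" above would flag all three rules, since each defines only "message"; an empty result means every alert rule carries both annotations.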

Comment 2 jechen 2021-08-20 14:34:07 UTC
Verified in 4.9.0-0.nightly-2021-08-20-074005

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-08-20-074005   True        False         9m12s   Cluster version is 4.9.0-0.nightly-2021-08-20-074005


$ oc get prometheusrules -n openshift-dns-operator -oyaml
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: PrometheusRule
  metadata:
    annotations:
      include.release.openshift.io/ibm-cloud-managed: "true"
      include.release.openshift.io/self-managed-high-availability: "true"
      include.release.openshift.io/single-node-developer: "true"
    creationTimestamp: "2021-08-20T13:54:51Z"
    generation: 1
    labels:
      role: alert-rules
    name: dns
    namespace: openshift-dns-operator
    ownerReferences:
    - apiVersion: config.openshift.io/v1
      kind: ClusterVersion
      name: version
      uid: 11507a0e-dc7b-4870-9791-9d54090536f9
    resourceVersion: "1716"
    uid: 45426d29-bf82-41c8-b9ae-d38b1c016da6
  spec:
    groups:
    - name: openshift-dns.rules
      rules:
      - alert: CoreDNSPanicking
        annotations:
          # verified fix by https://github.com/openshift/cluster-dns-operator/pull/288/files
          description: '{{ $value }} CoreDNS panics observed on {{ $labels.instance
            }}'
          summary: CoreDNS panic
        expr: increase(coredns_panics_total[10m]) > 0
        for: 5m
        labels:
          severity: warning
      - alert: CoreDNSHealthCheckSlow
        annotations:
          # verified fix by https://github.com/openshift/cluster-dns-operator/pull/288/files
          description: CoreDNS Health Checks are slowing down (instance {{ $labels.instance
            }})
          summary: CoreDNS health checks
        expr: histogram_quantile(.95, sum(rate(coredns_health_request_duration_seconds_bucket[5m]))
          by (instance, le)) > 10
        for: 5m
        labels:
          severity: warning
      - alert: CoreDNSErrorsHigh
        annotations:
          # verified fix by https://github.com/openshift/cluster-dns-operator/pull/288/files
          description: CoreDNS is returning SERVFAIL for {{ $value | humanizePercentage
            }} of requests.
          summary: CoreDNS serverfail
        expr: |
          (sum(rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m]))
            /
          sum(rate(coredns_dns_responses_total[5m])))
          > 0.01
        for: 5m
        labels:
          severity: warning
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
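The CoreDNSErrorsHigh description renders the SERVFAIL ratio that the rule's expression computes. The Python sketch below mirrors that arithmetic under simplifying assumptions: it uses counter increases over one 5m window in place of `rate()` (ignoring Prometheus's per-series extrapolation), it does not model the `for: 5m` hold period, and `humanize_percentage` assumes Prometheus's `humanizePercentage` formats `value * 100` with four significant digits. All function names are hypothetical.

```python
def servfail_ratio(servfail_increase, total_increase):
    """Fraction of DNS responses that were SERVFAIL within one window.

    Approximates sum(rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m]))
    / sum(rate(coredns_dns_responses_total[5m])): both rates share the same
    window, so their ratio is roughly the ratio of counter increases
    (assumption: Prometheus's extrapolation is ignored).
    """
    if total_increase == 0:
        return None  # PromQL yields NaN here, and "NaN > 0.01" drops the series
    return servfail_increase / total_increase


def errors_high(ratio, threshold=0.01):
    """True when the CoreDNSErrorsHigh condition holds for a single evaluation."""
    return ratio is not None and ratio > threshold


def humanize_percentage(value):
    """Assumed analogue of Prometheus's humanizePercentage: value * 100, 4 significant digits."""
    return f"{value * 100:.4g}%"


def render_description(ratio):
    """Render the rule's description annotation for a given ratio."""
    return f"CoreDNS is returning SERVFAIL for {humanize_percentage(ratio)} of requests."
```

For example, 30 SERVFAIL responses out of 1200 in the window gives a ratio of 0.025, which exceeds the 0.01 threshold and renders as "CoreDNS is returning SERVFAIL for 2.5% of requests."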

Comment 5 errata-xmlrpc 2021-10-18 17:45:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

