Description of problem:
All alert rules' "summary" and "description" annotations should comply with the OpenShift alerting guidelines.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-07-175228

How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:
$ oc get prometheusrules -n openshift-sdn -oyaml
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: PrometheusRule
  metadata:
    annotations:
      networkoperator.openshift.io/ignore-errors: ""
    creationTimestamp: "2021-08-10T23:12:52Z"
    generation: 1
    labels:
      prometheus: k8s
      role: alert-rules
    name: networking-rules
    namespace: openshift-sdn
    ownerReferences:
    - apiVersion: operator.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Network
      name: cluster
      uid: f3f79f33-0ad6-4115-bb39-3dbd18324808
    resourceVersion: "2834"
    uid: 92537d98-cb35-4f74-9291-e1b6f3952277
  spec:
    groups:
    - name: cluster-network-operator-sdn.rules
      rules:
      - alert: NodeWithoutSDNPod
        annotations:
          message: |
            All nodes should be running an sdn pod, {{ $labels.node }} is not.
        expr: |
          (kube_node_info unless on(node) topk by (node) (1, kube_pod_info{namespace="openshift-sdn", pod=~"sdn.*"})) > 0
        for: 10m
        labels:
          severity: warning
      - alert: NodeProxyApplySlow
        annotations:
          message: SDN pod {{ $labels.pod }} on node {{ $labels.node }} is taking
            too long, on average, to apply kubernetes service rules to iptables.
        expr: "histogram_quantile(.95, kubeproxy_sync_proxy_rules_duration_seconds_bucket)
          \n* on(namespace, pod) group_right topk by (namespace, pod) (1, kube_pod_info{namespace=\"openshift-sdn\",
          \ pod=~\"sdn-[^-]*\"}) > 15\n"
        labels:
          severity: warning
      - alert: ClusterProxyApplySlow
        annotations:
          message: The cluster is taking too long, on average, to apply kubernetes
            service rules to iptables.
        expr: |
          histogram_quantile(0.95, sum(rate(kubeproxy_sync_proxy_rules_duration_seconds_bucket[5m])) by (le)) > 10
        labels:
          severity: warning
      - alert: NodeProxyApplyStale
        annotations:
          message: SDN pod {{ $labels.pod }} on node {{ $labels.node }} has stale
            kubernetes service rules in iptables.
        expr: |
          (kubeproxy_sync_proxy_rules_last_queued_timestamp_seconds - kubeproxy_sync_proxy_rules_last_timestamp_seconds)
            * on(namespace, pod) group_right() topk by (namespace, pod) (1, kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"})
            > 30
        for: 5m
        labels:
          severity: warning
      - alert: SDNPodNotReady
        annotations:
          message: SDN pod {{ $labels.pod }} on node {{ $labels.node }} is not ready.
        expr: |
          kube_pod_status_ready{namespace='openshift-sdn', condition='true'} == 0
        for: 10m
        labels:
          severity: warning
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Expected results:
alert rules should have "summary" and "description" annotations

Additional info:
the "summary" and "description" annotations should comply with the OpenShift alerting guidelines [1]

[1] https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#documentation-required
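For context, the guidelines [1] require each alert to carry a "summary" annotation (a short headline) and a "description" annotation (enough detail for a responder to act on). A minimal sketch of NodeWithoutSDNPod rewritten that way is below; the annotation wording is an illustrative assumption, not the actual fix:

      - alert: NodeWithoutSDNPod
        annotations:
          # Hypothetical wording, for illustration only.
          summary: All nodes should be running an sdn pod, {{ $labels.node }} is not.
          description: |
            Node {{ $labels.node }} has no running sdn pod in the openshift-sdn
            namespace; pod networking on that node may be degraded or broken.
        expr: |
          (kube_node_info unless on(node) topk by (node) (1, kube_pod_info{namespace="openshift-sdn", pod=~"sdn.*"})) > 0
        for: 10m
        labels:
          severity: warning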
The following rules have the same issue:

$ oc get prometheusrules -n openshift-ingress-operator -oyaml
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: PrometheusRule
  metadata:
    annotations:
      include.release.openshift.io/ibm-cloud-managed: "true"
      include.release.openshift.io/self-managed-high-availability: "true"
      include.release.openshift.io/single-node-developer: "true"
    creationTimestamp: "2021-08-10T23:12:03Z"
    generation: 1
    labels:
      role: alert-rules
    name: ingress-operator
    namespace: openshift-ingress-operator
    ownerReferences:
    - apiVersion: config.openshift.io/v1
      kind: ClusterVersion
      name: version
      uid: 9fc7b5b6-6c23-4335-be07-ecfe1b9a142f
    resourceVersion: "1790"
    uid: 0efb31c0-440b-408b-aee6-aba2fa472459
  spec:
    groups:
    - name: openshift-ingress.rules
      rules:
      - alert: HAProxyReloadFail
        annotations:
          message: HAProxy reloads are failing on {{ $labels.pod }}. Router is not
            respecting recently created or modified routes
        expr: template_router_reload_failure == 1
        for: 5m
        labels:
          severity: warning
      - alert: HAProxyDown
        annotations:
          message: HAProxy metrics are reporting that HAProxy is down on pod {{
            $labels.namespace }} / {{ $labels.pod }}
        expr: haproxy_up == 0
        for: 5m
        labels:
          severity: critical
      - alert: IngressControllerDegraded
        annotations:
          message: |
            The {{ $labels.namespace }}/{{ $labels.name }} ingresscontroller is degraded: {{ $labels.reason }}.
        expr: ingress_controller_conditions{condition="Degraded"} == 1
        for: 5m
        labels:
          severity: warning
      - alert: IngressControllerUnavailable
        annotations:
          message: |
            The {{ $labels.namespace }}/{{ $labels.name }} ingresscontroller is unavailable: {{ $labels.reason }}.
        expr: ingress_controller_conditions{condition="Available"} == 0
        for: 5m
        labels:
          severity: warning
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
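Per the same guidelines, these rules would need their "message" annotation replaced with "summary" and "description". A sketch for HAProxyDown, with assumed wording shown only to illustrate the expected shape:

      - alert: HAProxyDown
        annotations:
          # Hypothetical wording, for illustration only.
          summary: HAProxy is down on pod {{ $labels.namespace }}/{{ $labels.pod }}.
          description: HAProxy metrics are reporting that HAProxy is down on pod
            {{ $labels.namespace }}/{{ $labels.pod }}, so this router is not serving routes.
        expr: haproxy_up == 0
        for: 5m
        labels:
          severity: critical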
Checked on the version below; the prometheusrules for openshift-sdn and openshift-ovn now include the "summary" annotation.

lilia@liliadeMacBook-Pro mytest % oc get prometheusrules -n openshift-sdn -oyaml
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: PrometheusRule
  metadata:
    annotations:
      networkoperator.openshift.io/ignore-errors: ""
    creationTimestamp: "2021-08-18T10:18:25Z"
    generation: 1
    labels:
      prometheus: k8s
      role: alert-rules
    managedFields:
    - apiVersion: monitoring.coreos.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:networkoperator.openshift.io/ignore-errors: {}
          f:labels:
            .: {}
            f:prometheus: {}
            f:role: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"923b3b1a-ad7a-48b7-84f2-cf96afa79aaa"}: {}
        f:spec:
          .: {}
          f:groups: {}
      manager: cluster-network-operator
      operation: Update
      time: "2021-08-18T10:18:25Z"
    name: networking-rules
    namespace: openshift-sdn
    ownerReferences:
    - apiVersion: operator.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Network
      name: cluster
      uid: 923b3b1a-ad7a-48b7-84f2-cf96afa79aaa
    resourceVersion: "2771"
    uid: 0d3e5958-76ee-4b19-9f1c-bb3e4e3fb81c
  spec:
    groups:
    - name: cluster-network-operator-sdn.rules
      rules:
      - alert: NodeWithoutSDNPod
        annotations:
          summary: All nodes should be running an sdn pod, {{ $labels.node }} is
            not.
        expr: |
          (kube_node_info unless on(node) topk by (node) (1, kube_pod_info{namespace="openshift-sdn", pod=~"sdn.*"})) > 0
        for: 10m
        labels:
          severity: warning
      - alert: NodeProxyApplySlow
        annotations:
          summary: SDN pod {{ $labels.pod }} on node {{ $labels.node }} is taking
            too long, on average, to apply kubernetes service rules to iptables.
        expr: "histogram_quantile(.95, kubeproxy_sync_proxy_rules_duration_seconds_bucket)
          \n* on(namespace, pod) group_right topk by (namespace, pod) (1, kube_pod_info{namespace=\"openshift-sdn\",
          pod=~\"sdn-[^-]*\"}) > 15\n"
        labels:
          severity: warning
      - alert: ClusterProxyApplySlow
        annotations:
          summary: The cluster is taking too long, on average, to apply kubernetes
            service rules to iptables.
        expr: |
          histogram_quantile(0.95, sum(rate(kubeproxy_sync_proxy_rules_duration_seconds_bucket[5m])) by (le)) > 10
        labels:
          severity: warning
      - alert: NodeProxyApplyStale
        annotations:
          summary: SDN pod {{ $labels.pod }} on node {{ $labels.node }} has stale
            kubernetes service rules in iptables.
        expr: |
          (kubeproxy_sync_proxy_rules_last_queued_timestamp_seconds - kubeproxy_sync_proxy_rules_last_timestamp_seconds)
            * on(namespace, pod) group_right() topk by (namespace, pod) (1, kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"})
            > 30
        for: 5m
        labels:
          severity: warning
      - alert: SDNPodNotReady
        annotations:
          summary: SDN pod {{ $labels.pod }} on node {{ $labels.node }} is not ready.
        expr: |
          kube_pod_status_ready{namespace='openshift-sdn', condition='true'} == 0
        for: 10m
        labels:
          severity: warning
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
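A quicker way to spot-check the annotations than reading the full YAML is a jsonpath query (a sketch; the resource name networking-rules is taken from the listing above):

$ oc get prometheusrule networking-rules -n openshift-sdn \
    -o jsonpath='{range .spec.groups[*].rules[*]}{.alert}{": "}{.annotations.summary}{"\n"}{end}'

This prints one "AlertName: summary" line per rule, so a rule that is still missing the annotation shows up with an empty value.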
Adding the version used for verification:

lilia@liliadeMacBook-Pro mytest % oc version
Client Version: 4.7.5
Server Version: 4.9.0-0.nightly-2021-08-17-122812
Kubernetes Version: v1.22.0-rc.0+3dfed96
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759