Hello,

The OpenShift Monitoring Team has published a set of guidelines for writing alerting rules in OpenShift, including a basic style guide. You can find these here:

https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md
https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide

A subset of these is now being enforced in OpenShift End-to-End tests [1], with temporary exceptions for existing non-compliant rules.

This component was found to have the following issues:

* Alerts without summary and/or description annotations:
  - MachineApproverMaxPendingCSRsReached

Alerts MUST include summary and description annotations. Think of summary as the first line of a commit message, or an email subject line. It should be brief but informative. The description is the longer, more detailed explanation of the alert. The enhancement document linked above has examples of alerts with these annotations.

Thank you!

Repo: openshift/cluster-machine-approver

[1]: https://github.com/openshift/origin/commit/097e7a6
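For anyone who wants to check rules locally before relying on the e2e run, here is a minimal Python sketch of the summary/description requirement described above. It is not the actual test from openshift/origin; the function name `missing_annotations` and the sample input are hypothetical, and the input is assumed to be the already-parsed `spec.groups` list of a PrometheusRule.

```python
# Hypothetical helper, not the openshift/origin e2e test.
# Assumes `groups` is the parsed `spec.groups` list of a PrometheusRule.
REQUIRED_ANNOTATIONS = ("summary", "description")


def missing_annotations(groups):
    """Map each non-compliant alert name to the required annotations it lacks."""
    problems = {}
    for group in groups:
        for rule in group.get("rules", []):
            name = rule.get("alert")
            if name is None:
                continue  # recording rules have no "alert" key
            annotations = rule.get("annotations") or {}
            missing = [
                key
                for key in REQUIRED_ANNOTATIONS
                # Treat absent, null, and whitespace-only values as missing.
                if not str(annotations.get(key) or "").strip()
            ]
            if missing:
                problems[name] = missing
    return problems


if __name__ == "__main__":
    # Illustrative input only: an alert rule that carries no annotations.
    sample_groups = [
        {
            "name": "cluster-machine-approver.rules",
            "rules": [
                {
                    "alert": "MachineApproverMaxPendingCSRsReached",
                    "expr": "mapi_current_pending_csr > mapi_max_pending_csr",
                    "for": "5m",
                    "labels": {"severity": "warning"},
                }
            ],
        }
    ]
    print(missing_annotations(sample_groups))
    # prints {'MachineApproverMaxPendingCSRsReached': ['summary', 'description']}
```

An empty result means every alert in the given groups carries both annotations. The sketch only checks presence, not the style guide's wording recommendations.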
Set up a cluster using cluster-bot with https://github.com/openshift/cluster-machine-approver/pull/138.
Verified that the MachineApproverMaxPendingCSRsReached alert now has summary and description annotations.

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.ci.test-2021-11-22-061614-ci-ln-w81ycxb-latest   True        False         81s     Cluster version is 4.10.0-0.ci.test-2021-11-22-061614-ci-ln-w81ycxb-latest
liuhuali@Lius-MacBook-Pro huali-test % oc get prometheusrule machineapprover-rules -n openshift-cluster-machine-approver -o yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
    exclude.release.openshift.io/internal-openshift-hosted: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
  creationTimestamp: "2021-11-22T06:24:00Z"
  generation: 1
  labels:
    prometheus: k8s
    role: alert-rules
  name: machineapprover-rules
  namespace: openshift-cluster-machine-approver
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: 822e2653-acf9-4662-9520-9012064ab66f
  resourceVersion: "1782"
  uid: eb426fcf-c5ff-47a1-8ec5-71a4ec4b2118
spec:
  groups:
  - name: cluster-machine-approver.rules
    rules:
    - alert: MachineApproverMaxPendingCSRsReached
      annotations:
        description: |
          The number of pending CertificateSigningRequests has exceeded the maximum threshold (current number of machine + 100). Check the pending CSRs to determine which machines need approval, also check that the nodelink controller is running in the openshift-machine-api namespace.
        summary: max pending CSRs threshold reached.
      expr: |
        mapi_current_pending_csr > mapi_max_pending_csr
      for: 5m
      labels:
        severity: warning
liuhuali@Lius-MacBook-Pro huali-test %
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056