Created attachment 1806820 [details]
Critical Alert Table

Description of problem:
After reviewing the critical alerts in OCP, we found 21 alerts that need adjustment:

- Recommend changing Critical to Warning: 13
  - KubePersistentVolumeErrors
  - PrometheusBadConfig
  - PrometheusRemoteStorageFailures
  - PrometheusRuleFailures
  - AlertmanagerMembersInconsistent
  - AlertmanagerClusterFailedToSendAlerts
  - AlertmanagerConfigInconsistent
  - AlertmanagerClusterDown
  - KubeStateMetricsListErrors
  - KubeStateMetricsWatchErrors
  - ThanosRuleSenderIsFailingAlerts
  - ThanosRuleHighRuleEvaluationFailures
  - ThanosNoRuleEvaluations
- Recommend removing alert: 2
  - PrometheusErrorSendingAlertsToAnyAlertmanager
  - AlertmanagerClusterCrashlooping
- Recommend changing Critical to Info: 1
  - PrometheusRemoteWriteBehind
- Threshold Tweaks: 5
  - KubePersistentVolumeFillingUp
  - KubeletDown
  - NodeFilesystemFilesFillingUp
  - NodeFilesystemSpaceFillingUp
  - PrometheusRemoteStorageFailures

Please refer to this table for details (the proposed modifications are in column F, "Comments"):
https://docs.google.com/spreadsheets/d/10rL3loHz6a8lBfKsU2W9TVZSrSqndrnVmkzDeA3Z2kI/edit?usp=sharing
The table can also be found in the attachment.

Version-Release number of selected component (if applicable):
4.9

How reproducible:
N/A

Steps to Reproduce:
N/A

Actual results:
N/A

Expected results:
N/A

Additional info:
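For anyone cross-checking the list above against the spreadsheet, the proposed changes can be captured as plain data and tallied. This is only a minimal sketch: the alert names are transcribed from this report, while the category keys (`critical_to_warning`, `remove`, `critical_to_info`, `threshold_tweak`) and the `summarize` helper are hypothetical names chosen for illustration, not anything defined by cluster-monitoring-operator.

```python
from collections import Counter

# Proposed changes, transcribed from the list in this report.
# The category keys are illustrative labels, not operator identifiers.
PROPOSED_CHANGES = {
    "critical_to_warning": [
        "KubePersistentVolumeErrors",
        "PrometheusBadConfig",
        "PrometheusRemoteStorageFailures",
        "PrometheusRuleFailures",
        "AlertmanagerMembersInconsistent",
        "AlertmanagerClusterFailedToSendAlerts",
        "AlertmanagerConfigInconsistent",
        "AlertmanagerClusterDown",
        "KubeStateMetricsListErrors",
        "KubeStateMetricsWatchErrors",
        "ThanosRuleSenderIsFailingAlerts",
        "ThanosRuleHighRuleEvaluationFailures",
        "ThanosNoRuleEvaluations",
    ],
    "remove": [
        "PrometheusErrorSendingAlertsToAnyAlertmanager",
        "AlertmanagerClusterCrashlooping",
    ],
    "critical_to_info": [
        "PrometheusRemoteWriteBehind",
    ],
    "threshold_tweak": [
        "KubePersistentVolumeFillingUp",
        "KubeletDown",
        "NodeFilesystemFilesFillingUp",
        "NodeFilesystemSpaceFillingUp",
        "PrometheusRemoteStorageFailures",
    ],
}


def summarize(changes):
    """Return per-category counts and the alerts listed in more than one category."""
    counts = {category: len(alerts) for category, alerts in changes.items()}
    seen = Counter(alert for alerts in changes.values() for alert in alerts)
    duplicates = sorted(name for name, n in seen.items() if n > 1)
    return counts, duplicates


counts, duplicates = summarize(PROPOSED_CHANGES)
print(counts)
print("total entries:", sum(counts.values()))
print("listed in more than one category:", duplicates)
```

A tally like this makes it easy to confirm that the per-category counts match the numbers stated in the description and to spot any alert that appears under more than one recommendation.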
PR created: https://github.com/openshift/cluster-monitoring-operator/pull/1310
*** Bug 1986983 has been marked as a duplicate of this bug. ***
The Thanos-related alerts still need a fix. Setting the status to "assigned" for now.
Fix in progress: https://github.com/openshift/cluster-monitoring-operator/pull/1317
Tested with payload 4.9.0-0.nightly-2021-08-22-070405. All alert rules are consistent with the doc: https://docs.google.com/spreadsheets/d/10rL3loHz6a8lBfKsU2W9TVZSrSqndrnVmkzDeA3Z2kI/edit?usp=sharing
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759