Description of problem:
The alerting rule for AlertmanagerReceiversNotConfigured has a bug. The alerting expression is `cluster:alertmanager_routing_enabled:max == 0`, where `cluster:alertmanager_routing_enabled:max` is the recording rule `clamp_max(sum(alertmanager_notifications_total), 1)`.

`alertmanager_notifications_total` is simply the number of notifications an alertmanager instance has sent since it started, so in a newly started alertmanager instance it is 0. Thus, until alertmanager has sent its first notification, AlertmanagerReceiversNotConfigured will fire. Hilariously enough, AlertmanagerReceiversNotConfigured firing and sending a notification increases alertmanager_notifications_total, which resolves the alert.

The end result is that every time a customer upgrades or rolls out a new MachineConfig to workers (i.e. anything that causes all the alertmanager instances to restart), they will get this alert.

Version-Release number of selected component (if applicable):
4.6.6

How reproducible:
Always

Steps to Reproduce:
1. Configure alertmanager receivers
2. oc scale statefulset alertmanager-main --replicas=0
3. CVO will override and scale alertmanager back up
4. After 10m, AlertmanagerReceiversNotConfigured will fire

Actual results:
AlertmanagerReceiversNotConfigured fires when receivers are configured

Expected results:
AlertmanagerReceiversNotConfigured should not fire when receivers are configured

Additional info:
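The fire-then-self-resolve cycle described in this report can be sketched as a small simulation. This is only an illustrative model, not the Prometheus rule evaluator: `simulate`, `routing_enabled`, the replica count of 3, and the 10-minute hold (taken from "After 10m" in the reproduction steps) are assumptions made for the example.

```python
def clamp_max(value, ceiling):
    """Scalar stand-in for PromQL clamp_max: cap value at ceiling."""
    return min(value, ceiling)


def routing_enabled(notifications_total_per_instance):
    # Mirrors clamp_max(sum(alertmanager_notifications_total), 1).
    return clamp_max(sum(notifications_total_per_instance), 1)


def simulate(minutes, for_minutes=10, replicas=3):
    """Return (minute, alert_state) pairs following a restart of all instances.

    for_minutes and replicas are hypothetical values chosen for illustration.
    """
    counters = [0] * replicas  # every instance restarted, so counters reset to 0
    pending_since = None
    timeline = []
    for minute in range(minutes):
        if routing_enabled(counters) == 0:
            if pending_since is None:
                pending_since = minute
            if minute - pending_since >= for_minutes:
                state = "firing"
                # Delivering this very alert counts as a notification.
                counters[0] += 1
            else:
                state = "pending"
        else:
            pending_since = None
            state = "inactive"
        timeline.append((minute, state))
    return timeline


if __name__ == "__main__":
    for minute, state in simulate(13):
        print(minute, state)
```

In this model the alert stays pending while the summed counter is 0, fires once the hold period elapses, and goes inactive on the next evaluation because its own notification bumped the counter.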
Tested with 4.7.0-0.nightly-2020-12-09-112139, following the steps in comment 0; the AlertmanagerReceiversNotConfigured alert did not fire again. The recording rule is now:

- expr: clamp_max(sum(alertmanager_integrations),1)
  record: cluster:alertmanager_routing_enabled:max
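To illustrate why switching the recording rule from the notification counter to `alertmanager_integrations` avoids the false alert, here is a minimal comparison sketch. It assumes `alertmanager_integrations` reports the number of configured integrations and is therefore non-zero immediately after a restart; the sample values and the helper names `old_rule` and `new_rule` are hypothetical.

```python
def clamp_max(value, ceiling):
    """Scalar stand-in for PromQL clamp_max: cap value at ceiling."""
    return min(value, ceiling)


def old_rule(instances):
    # clamp_max(sum(alertmanager_notifications_total), 1)
    return clamp_max(sum(i["alertmanager_notifications_total"] for i in instances), 1)


def new_rule(instances):
    # clamp_max(sum(alertmanager_integrations), 1)
    return clamp_max(sum(i["alertmanager_integrations"] for i in instances), 1)


# Hypothetical scrape right after all instances restarted, receivers configured.
restarted_with_receivers = [
    {"alertmanager_notifications_total": 0, "alertmanager_integrations": 2},
    {"alertmanager_notifications_total": 0, "alertmanager_integrations": 2},
]

# Hypothetical scrape of a cluster where no receivers were ever configured.
no_receivers = [
    {"alertmanager_notifications_total": 0, "alertmanager_integrations": 0},
]

if __name__ == "__main__":
    # The alert condition is `cluster:alertmanager_routing_enabled:max == 0`.
    print("old rule, after restart:", old_rule(restarted_with_receivers) == 0)
    print("new rule, after restart:", new_rule(restarted_with_receivers) == 0)
    print("new rule, no receivers: ", new_rule(no_receivers) == 0)
```

Under these assumptions the old rule satisfies the alert condition after a restart even though receivers exist, while the new rule satisfies it only when no integrations are configured.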
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633
*** Bug 1965406 has been marked as a duplicate of this bug. ***