Description of problem:
The alerting rule responsible for checking whether MAO is up doesn't fire when MAO is down.

Version-Release number of selected component (if applicable):
OpenShift 4.2.0-0.ci-2019-08-22-144620

How reproducible:
Start a cluster and scale MAO to 0.

Steps to Reproduce:
1. Scale CVO to 0 or add an exception for MAO
2. Scale MAO and MAC to 0
3. Check the Prometheus UI to see if the alert is firing

Actual results:
MachineMAOMetricsDown is not firing

Expected results:
MachineMAOMetricsDown is firing

Additional info:
Prometheus can't evaluate an alert on a metric that doesn't exist. When MAO is down, the `mapi_mao_collector_up` series disappears, so an expression like `mapi_mao_collector_up == 0` returns no result and the alert never fires. To check whether a target is down you need to use the `absent()` function. Changing the alerting rule from `mapi_mao_collector_up == 0` to `absent(up{job="machine-api-operator"} == 1)` should solve this issue. This change also makes the `mapi_mao_collector_up` metric unnecessary.
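For illustration, the proposed rule could look roughly like the sketch below. Only the alert name and the expression come from this report; the group name, the `for` duration, the severity label, and the annotation text are assumptions and would need to match whatever the operator's rule manifest already uses. `absent(up{job="machine-api-operator"} == 1)` returns a result both when the scrape fails (`up` is 0) and when the target is gone entirely (no `up` series at all), which is the case reproduced here by scaling MAO to 0.

```yaml
groups:
  - name: machine-api-operator-down        # hypothetical group name
    rules:
      - alert: MachineMAOMetricsDown
        # Fires when no healthy (up == 1) series exists for the job,
        # including the case where the series is missing altogether.
        expr: absent(up{job="machine-api-operator"} == 1)
        for: 5m                            # assumed duration, not from the report
        labels:
          severity: critical               # assumed severity
        annotations:
          message: "machine-api-operator metrics target is down"  # assumed text
```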
Created attachment 1611034
screenshot

Verified in 4.2.0-0.nightly-2019-09-02-172410: the alert fires when machine-api-operator is down.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922