Bug 1744752
| Summary: | MachineMAOMetricsDown is not firing when MAO is down | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Pawel Krupa <pkrupa> | ||||
| Component: | Cloud Compute | Assignee: | Alberto <agarcial> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Jianwei Hou <jhou> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 4.2.0 | CC: | agarcial | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 4.2.0 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | |||
| Doc Text: | | Story Points: | --- | |||
| Clone Of: | | Environment: | | |||
| Last Closed: | 2019-10-16 06:37:16 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | | Category: | --- | |||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Created attachment 1611034 (screenshot)
Verified in 4.2.0-0.nightly-2019-09-02-172410: the alert fires when machine-api-operator is down.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922
Description of problem:
The alerting rule responsible for checking whether MAO is up does not fire when MAO is down.

Version-Release number of selected component (if applicable):
OpenShift 4.2.0-0.ci-2019-08-22-144620

How reproducible:
Start a cluster and scale MAO to 0.

Steps to Reproduce:
1. Scale CVO to 0, or add an exception (override) for MAO.
2. Scale MAO and MAC to 0.
3. Check the Prometheus UI to see whether the alert is firing.

Actual results:
MachineMAOMetricsDown is not firing.

Expected results:
MachineMAOMetricsDown is firing.

Additional info:
Prometheus cannot alert on a metric that has no series. When `mapi_mao_collector_up` is missing, the expression `mapi_mao_collector_up == 0` returns an empty result, so it is never true and the alert never fires. To check whether a target is down, use the `absent()` function. Changing the alerting rule from `mapi_mao_collector_up == 0` to `absent(up{job="machine-api-operator"} == 1)` should solve this issue. This change also makes the `mapi_mao_collector_up` metric unnecessary.
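The difference between the two expressions can be illustrated with a small Python model of PromQL instant vectors. This is a hypothetical sketch, not Prometheus code: vectors are plain dicts, and `series`, `filter_eq`, `absent`, `old_rule` and `new_rule` are illustrative names. It ignores the `for:` duration and label handling of real alerting rules, and it assumes that scaling MAO to 0 removes the scrape target, so neither `up` nor `mapi_mao_collector_up` has any series left.

```python
from typing import Dict, FrozenSet, Tuple

# Toy model (an assumption, not Prometheus code): an instant vector maps a
# label set to one sample value. An empty dict means "no series at all".
Labels = FrozenSet[Tuple[str, str]]
Vector = Dict[Labels, float]


def series(value: float, **labels: str) -> Vector:
    """Build a one-series instant vector."""
    return {frozenset(labels.items()): value}


def filter_eq(vec: Vector, value: float) -> Vector:
    """PromQL `vec == value` (no `bool` modifier): keeps only matching samples."""
    return {ls: v for ls, v in vec.items() if v == value}


def absent(vec: Vector) -> Vector:
    """PromQL `absent(vec)`: one sample of value 1 if vec is empty, else nothing.

    Real Prometheus may also attach labels taken from equality matchers in
    the argument's selector; that detail is left out here.
    """
    return {frozenset(): 1.0} if not vec else {}


def is_firing(result: Vector) -> bool:
    """An alerting rule produces an alert for each series in its result
    (the `for:` duration is ignored in this model)."""
    return bool(result)


def old_rule(collector_up: Vector) -> bool:
    # mapi_mao_collector_up == 0
    return is_firing(filter_eq(collector_up, 0))


def new_rule(up: Vector) -> bool:
    # absent(up{job="machine-api-operator"} == 1)
    return is_firing(absent(filter_eq(up, 1)))


JOB = "machine-api-operator"

# MAO running: both metrics are scraped with value 1.
healthy_collector = series(1, job=JOB)
healthy_up = series(1, job=JOB)

# MAO scaled to 0 (assumed): the target disappears, so neither series exists.
gone: Vector = {}

# Target still discovered but the scrape fails: up is 0.
failing_up = series(0, job=JOB)

print("healthy:", old_rule(healthy_collector), new_rule(healthy_up))
print("scaled to 0:", old_rule(gone), new_rule(gone))
print("scrape failing:", new_rule(failing_up))
```

With MAO scaled to 0, the comparison has no samples to filter, so the old rule returns an empty result and never alerts. `absent()` turns that empty result into a single sample, so the new rule fires both when the target is gone and when its scrape fails.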