Bug 1744752 - MachineMAOMetricsDown is not firing when MAO is down
Summary: MachineMAOMetricsDown is not firing when MAO is down
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Alberto
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-22 19:22 UTC by Pawel Krupa
Modified: 2019-10-16 06:37 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:37:16 UTC
Target Upstream Version:
Embargoed:


Attachments
screenshot (135.45 KB, image/png)
2019-09-03 07:30 UTC, Jianwei Hou


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-api-operator pull 388 | 0 | None | closed | Bug 1744752: Add a new alert rule for reporting the Machine-api Operator failure | 2020-10-22 02:08:28 UTC
Red Hat Product Errata | RHBA-2019:2922 | 0 | None | None | None | 2019-10-16 06:37:27 UTC

Description Pawel Krupa 2019-08-22 19:22:55 UTC
Description of problem:

The alerting rule responsible for checking whether MAO is up does not fire when MAO is down.


Version-Release number of selected component (if applicable): 
OpenShift 4.2.0-0.ci-2019-08-22-144620


How reproducible:

Start a cluster and scale MAO to 0


Steps to Reproduce:
1. Scale CVO to 0 or add an exception for MAO
2. Scale MAO and MAC to 0
3. Check the Prometheus UI to see whether the alert is firing

Actual results:

MachineMAOMetricsDown is not firing

Expected results:

MachineMAOMetricsDown is firing



Additional info:

Prometheus cannot alert on a metric that has no samples. When MAO is down, `mapi_mao_collector_up` is not scraped at all, so an expression like `mapi_mao_collector_up == 0` returns an empty result and the alert never fires. To check whether a target is down, use the `absent()` function.

Changing the alerting rule from `mapi_mao_collector_up == 0` to `absent(up{job="machine-api-operator"} == 1)` should solve this issue. With this change, the `mapi_mao_collector_up` metric also becomes unnecessary.
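
For illustration, the proposed expression could be wired into a rule roughly as follows. This is only a sketch in plain Prometheus rule-file format, not the rule that was eventually merged: the group name, the `for` duration, the severity label and the annotation text are all assumptions, and only the `expr` comes from this report.

```yaml
# Sketch only: everything except the expr line is an assumption.
groups:
  - name: machine-api-operator-down   # hypothetical group name
    rules:
      - alert: MachineMAOMetricsDown
        # up{...} == 1 keeps only targets that were scraped successfully.
        # absent() returns a single series with value 1 when that result
        # is empty, so the alert can fire even when no samples exist.
        expr: absent(up{job="machine-api-operator"} == 1)
        for: 5m                       # assumed duration
        labels:
          severity: critical          # assumed severity
        annotations:
          message: machine-api-operator metrics target is down   # assumed text
```

Because the expression is built on the `up` series that Prometheus generates for each scrape target, it does not depend on any metric exported by MAO itself.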

Comment 2 Jianwei Hou 2019-09-03 07:30:02 UTC
Created attachment 1611034
screenshot

Verified in 4.2.0-0.nightly-2019-09-02-172410: the alert fires when machine-api-operator is down.

Comment 4 errata-xmlrpc 2019-10-16 06:37:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

