Bug 1775494 - [IPI Baremetal]: Alerts are constantly firing: "machine is in phase"
Summary: [IPI Baremetal]: Alerts are constantly firing: "machine is in phase"
Keywords:
Status: CLOSED DUPLICATE of bug 1801238
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Steven Hardy
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-11-22 05:52 UTC by Udi Kalifon
Modified: 2020-06-04 11:41 UTC (History)
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-04 11:41:06 UTC
Target Upstream Version:
Embargoed:


Attachments
Machine is in phase (223.59 KB, image/png)
2019-11-29 15:46 UTC, Udi Kalifon

Description Udi Kalifon 2019-11-22 05:52:07 UTC
Description of problem:
In the dashboards page you see a lot of alerts like:

MachineWithNoRunningPhase
machine ostest-worker-0-whd5k is in phase


Version-Release number of selected component (if applicable):
4.3.0-0.ci-2019-11-19-095016


How reproducible:
100%


Steps to Reproduce:
1. Look in the dashboards page


Actual results:
Lots of "machine is in phase" errors. What is a phase???


Expected results:
No errors should appear under normal operation.

Comment 1 Udi Kalifon 2019-11-29 15:46:49 UTC
Created attachment 1640689
Machine is in phase

Attached a screenshot of the dashboards with this error on it.

Additional info:
On the Machines page, all bare metal machines have a blank phase, as well as a blank region and availability zone. Phase may be something specific to AWS instances, and the alert may be firing because the phase is unknown (which would also explain why the message just says "in phase" without naming a phase).

Comment 2 Jiri Tomasek 2019-11-29 15:50:49 UTC
IIUC, these alerts are coming from Alertmanager. My guess is that the code that triggers the alert does not expect the phase to be empty (which is the case for bare metal host-based machines) and treats the empty phase as a problem to report.

Comment 3 Steven Hardy 2019-12-13 13:28:23 UTC
jtomasek - what component should this be assigned to so we can fix that behavior in the alert manager?

Comment 4 Jiri Tomasek 2019-12-13 14:35:10 UTC
I managed to find the rule name and its definition. It is defined in the machine-api-operator Prometheus alerting rules [1]. Either the rule needs to be updated so that it does not trigger an alert when no phase is defined, or maybe the phase is required and should be added to CAPBM (IIUC)?


[1] https://github.com/openshift/machine-api-operator/blob/57f529071966836be0cbb1bebc531f191e843691/install/0000_90_machine-api-operator_04_alertrules.yaml#L23
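The behavior suspected in this comment can be illustrated with a minimal sketch. It assumes the rule selects every machine whose phase label is anything other than "Running" and builds the message from the labels. The function names, the message template, and the sample labels are hypothetical and are not copied from the machine-api-operator rule file.

```python
from typing import Dict, List


def is_firing(labels: Dict[str, str]) -> bool:
    # Assumption: the rule behaves like a "phase != Running" matcher.
    # An empty or missing phase is not "Running", so it matches as well.
    return labels.get("phase", "") != "Running"


def alert_message(labels: Dict[str, str]) -> str:
    # Hypothetical message template; with an empty phase the text
    # simply ends after the word "phase".
    name = labels.get("name", "")
    phase = labels.get("phase", "")
    return f"machine {name} is in phase {phase}".rstrip()


def firing_alerts(machines: List[Dict[str, str]]) -> List[str]:
    # Collect one message per machine that the matcher selects.
    return [alert_message(m) for m in machines if is_firing(m)]
```

Under these assumptions, a bare metal machine with a blank phase always matches, which would be consistent with both the constant firing and the truncated "is in phase" message reported in the description.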

Comment 5 Alberto 2020-02-17 16:41:39 UTC
The machine phase is set by the core machine controller. Maybe the baremetal actuator is using an old version of the controller.

Comment 8 Stephen Cuppett 2020-04-15 17:21:15 UTC
Setting target release to current development version (4.5) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate.

Comment 9 Eduardo Minguez 2020-05-13 11:53:54 UTC
Just in case, OCP 4.4:

NAMESPACE               NAME                                PHASE   TYPE   REGION   ZONE   AGE
openshift-machine-api   ocp-edge-cluster-0-master-0                                        21h
openshift-machine-api   ocp-edge-cluster-0-master-1                                        21h
openshift-machine-api   ocp-edge-cluster-0-master-2                                        21h
openshift-machine-api   ocp-edge-cluster-0-worker-0-gm2km                                  20h
openshift-machine-api   ocp-edge-cluster-0-worker-0-vxnt8                                  20h

vs OCP 4.5 (4.5.0-0.nightly-2020-05-12-035058)

NAMESPACE               NAME              PHASE          TYPE   REGION   ZONE   AGE
openshift-machine-api   ostest-master-0   Provisioning                          86m
openshift-machine-api   ostest-master-1   Provisioning                          86m
openshift-machine-api   ostest-master-2   Provisioning                          86m
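To check a listing like the two above for machines with a blank phase, something along these lines could be used. This is only a sketch: it assumes the text is a column-aligned table whose values start at the same offset as their header, and the function names are hypothetical. Splitting on whitespace alone would not work here, because empty columns would shift the later values to the left.

```python
import re
from typing import Dict, List


def parse_machine_table(text: str) -> List[Dict[str, str]]:
    # Drop blank lines; the first remaining line is the header.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0]
    # Record each column name with the offset where it starts.
    columns = [(m.group(), m.start()) for m in re.finditer(r"\S+", header)]
    rows = []
    for line in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(columns):
            # A column runs up to the start of the next one; the last
            # column runs to the end of the line.
            end = columns[i + 1][1] if i + 1 < len(columns) else None
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows


def machines_without_phase(text: str) -> List[str]:
    # Names of machines whose PHASE column is empty.
    return [r["NAME"] for r in parse_machine_table(text) if not r.get("PHASE")]
```

Applied to the 4.4 listing, this would be expected to return all five machine names, and an empty list for the 4.5 listing, provided the column alignment is preserved when the text is copied.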

Comment 13 Stephen Benjamin 2020-06-04 11:41:06 UTC
Phase is now populated on 4.5; it shows "provisioned as node." This was fixed by making the node/machine link in BZ1801238.

*** This bug has been marked as a duplicate of bug 1801238 ***

