Description of problem: In the dashboards page you see a lot of alerts like: MachineWithNoRunningPhase machine ostest-worker-0-whd5k is in phase Version-Release number of selected component (if applicable): 4.3.0-0.ci-2019-11-19-095016 How reproducible: 100% Steps to Reproduce: 1. Look in the dashboards page Actual results: Lots of "machine is in phase" errors. What is a phase??? Expected results: No errors should appear under normal operation.
Created attachment 1640689 [details] Machine is in phase Attached a screenshot of the dashboards with this error on it. Additional info: In the Machines page, all bare metal machines have a blank phase, as well as a blank region and availability zone. It could be that phase is something related to AWS instances, and the error is firing because the phase is something unknown (that also would explain why it just says "in phase" without saying what phase).
IIUC These alerts are coming from alert manager. My guess is that the code which triggers the alert does not expect the phase to be empty (which is the case for Bare Metal host based machines) and treats the empty phase as a problem which it reports on.
jtomasek - what component should this be assigned to so we can fix that behavior in the alert manager?
I managed to find the rule name and its definition. It is defined in machine-api-operator prometheus alerting rules [1]. The rule either needs to be updated to not to trigger an alert when there is no phase defined, or maybe the phase is a required thing and should be added to CAPBM (IIUC)? [1] https://github.com/openshift/machine-api-operator/blob/57f529071966836be0cbb1bebc531f191e843691/install/0000_90_machine-api-operator_04_alertrules.yaml#L23
The machine phase is set by the core machine controller. May be baremetal actuator is using an old version of the controller.
Setting target release to current development version (4.5) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate.
Just in case, OCP 4.4: NAMESPACE NAME PHASE TYPE REGION ZONE AGE openshift-machine-api ocp-edge-cluster-0-master-0 21h openshift-machine-api ocp-edge-cluster-0-master-1 21h openshift-machine-api ocp-edge-cluster-0-master-2 21h openshift-machine-api ocp-edge-cluster-0-worker-0-gm2km 20h openshift-machine-api ocp-edge-cluster-0-worker-0-vxnt8 20h vs OCP 4.5 (4.5.0-0.nightly-2020-05-12-035058) NAMESPACE NAME PHASE TYPE REGION ZONE AGE openshift-machine-api ostest-master-0 Provisioning 86m openshift-machine-api ostest-master-1 Provisioning 86m openshift-machine-api ostest-master-2 Provisioning 86m
Phase is now populated on 4.5, it shows "provisioned as node." This was fixed by making the node/machine link in BZ1801238. *** This bug has been marked as a duplicate of bug 1801238 ***