Description of problem:
If the PHASE is empty for a machine, the MachineWithNoRunningPhase alert reads "machine ocp-edge-cluster-master-0 is in phase". This makes people think the message is incomplete.

$ oc -n openshift-machine-api get machine
NAME                              PHASE   TYPE   REGION   ZONE   AGE
ocp-edge-cluster-master-0                                        10h
ocp-edge-cluster-master-1                                        10h
ocp-edge-cluster-master-2                                        10h
ocp-edge-cluster-worker-0-b2k2j                                  70m
ocp-edge-cluster-worker-0-btsql                                  10m
ocp-edge-cluster-worker-0-n6wfc                                  10h

alert: MachineWithNoRunningPhase
expr: (mapi_machine_created_timestamp_seconds{phase!="Running"}) > 0
for: 10m
labels:
  severity: critical
annotations:
  message: machine {{ $labels.name }} is in {{ $labels.phase }} phase

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2020-03-04-165955

How reproducible:
Only when the PHASE is empty

Steps to Reproduce:
1. See the description
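To make the report concrete, here is a minimal Python sketch of why the alert fires and how its message renders for a machine with an empty phase. It is an illustration only: `matches_not_running` and `render_message` are hypothetical helpers that mimic the `phase!="Running"` matcher and the message template from the rule quoted above, and `example-worker` is a made-up machine name. None of this is Prometheus or machine-api-operator code.

```python
# Hypothetical stand-ins for the alert rule quoted in the description.

def matches_not_running(phase: str) -> bool:
    """Mimic the label matcher phase!="Running"; an empty phase also matches."""
    return phase != "Running"


def render_message(name: str, phase: str) -> str:
    """Mimic: machine {{ $labels.name }} is in {{ $labels.phase }} phase"""
    return f"machine {name} is in {phase} phase"


if __name__ == "__main__":
    machines = [
        ("ocp-edge-cluster-master-0", ""),   # empty phase, as in the report
        ("example-worker", "Provisioning"),  # made-up machine for contrast
    ]
    for name, phase in machines:
        if matches_not_running(phase):
            print(render_message(name, phase))
```

With an empty phase the template substitutes nothing between "is in" and "phase", which is why the message looks truncated.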
This seems like a rare edge case where the machine controller was never run. If anything, we can try to rephrase the message to make that more obvious.
https://github.com/openshift/machine-api-operator/pull/549
Validated on:
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-05-214011   True        False         16m     Cluster version is 4.5.0-0.nightly-2020-04-05-214011

One machine was set to have no phase:
[miyadav@miyadav ManualRun]$ oc get machines
NAME                                         PHASE     TYPE        REGION      ZONE         AGE
miyadav-0604-hbfwf-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   89m
miyadav-0604-hbfwf-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   89m
miyadav-0604-hbfwf-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   89m
miyadav-0604-hbfwf-worker-us-east-2a-56p54   Running   m4.large    us-east-2   us-east-2a   76m
miyadav-0604-hbfwf-worker-us-east-2b-new               m4.large    us-east-2   us-east-2b   40m
miyadav-0604-hbfwf-worker-us-east-2c-5m7cz   Running   m4.large    us-east-2   us-east-2c   76m

Seeing the alert message:
machine miyadav-0604-hbfwf-worker-us-east-2b-new is in phase:

This matches the change in the pull request:
https://github.com/openshift/machine-api-operator/pull/549

Will consult with the reporter.
(In reply to Milind Yadav from comment #4)
> Seeing the alert message :
> machine miyadav-0604-hbfwf-worker-us-east-2b-new is in phase:
>
> Which seems as per the change in pull request :
> https://github.com/openshift/machine-api-operator/pull/549
>
> Will consult with reporter .

"machine miyadav-0604-hbfwf-worker-us-east-2b-new is in phase:"

I am afraid users will be confused by this message when the PHASE is empty. It is not user-friendly.
I have the same issue on a fresh cluster running on VMware 6.7. Curious how to safely remove the checks since there's no machineset controller for VMware?

Client Version: 4.4.3
Server Version: 4.4.3
Kubernetes Version: v1.17.1

oc get machinesets
NAME                DESIRED   CURRENT   READY   AVAILABLE   AGE
ocp4-ctmtp-worker   0         0                             28h

oc get machines -n openshift-machine-api
NAME                  PHASE   TYPE   REGION   ZONE   AGE
ocp4-ctmtp-master-0                                  28h
ocp4-ctmtp-master-1                                  28h
ocp4-ctmtp-master-2                                  28h
The same issue is confirmed on a fresh installation on VMware 6.5 as well.

(In reply to Saul Alanis from comment #6)
> I have the same issue on a fresh cluster running on VMware 6.7. Curious how
> to safely remove the checks since there's no machineset controller for
> VMware?
>
> Client Version: 4.4.3
> Server Version: 4.4.3
> Kubernetes Version: v1.17.1
The vSphere scenario is covered here: https://bugzilla.redhat.com/show_bug.cgi?id=1834966

This ticket is to track a more user-friendly way to communicate when the phase happens to be empty.

Tagging with upcomingSprint.
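As a rough illustration of what "more user-friendly" could mean, the Python sketch below renders a different sentence when the phase is empty. This is an assumption for discussion, not the wording or mechanism of the merged fix: `describe_phase` is a hypothetical helper, and the "has no phase set" text is invented here. The non-empty branch reuses the "is in phase:" wording seen in the validation comment above.

```python
def describe_phase(name: str, phase: str) -> str:
    """Hypothetical wording: state explicitly when no phase has been set."""
    if not phase:
        # Invented text for illustration; the thread does not quote the final message.
        return f"machine {name} has no phase set"
    return f"machine {name} is in phase: {phase}"


if __name__ == "__main__":
    print(describe_phase("ocp4-ctmtp-master-0", ""))
    print(describe_phase("ocp4-ctmtp-master-0", "Provisioned"))
```

The point of the sketch is only that the empty case gets its own sentence instead of a blank substitution.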
All PRs are merged; this should be in MODIFIED now.
VERIFIED ON: 4.6.0-0.nightly-2020-09-08-123737

Steps:
Created a machine with an empty phase (scaled the machineset while the machine controller was kept down using the CVO and the machine-controller deployment).

[miyadav@miyadav vsphere]$ oc get machines -o wide --config vsp
NAME                                 PHASE         TYPE   REGION   ZONE   AGE     NODE                             PROVIDERID                                       STATE
vs-miyadav-0909-rpkms-master-0       Running                              3h48m   vs-miyadav-0909-rpkms-master-0   vsphere://422b4df8-b303-505b-99e2-592c3ae20465   poweredOn
vs-miyadav-0909-rpkms-master-1       Running                              3h48m   vs-miyadav-0909-rpkms-master-1   vsphere://422b787d-c6a3-ff14-0f2d-c5ebb7f113db   poweredOn
vs-miyadav-0909-rpkms-master-2       Running                              3h48m   vs-miyadav-0909-rpkms-master-2   vsphere://422b4f4d-7d45-94be-3bc0-0e86d431fd01   poweredOn
vs-miyadav-0909-rpkms-worker-hmjgn                                        21s
vs-miyadav-0909-rpkms-worker-ptjkn   Provisioned                          3h36m                                    vsphere://422bde74-788b-7af8-9383-44408377bd62   poweredOn
vs-miyadav-0909-rpkms-worker-rrq7s   Running                              3h36m   vs-miyadav-0909-rpkms-worker-rrq7s   vsphere://422be000-30eb-31f9-7f41-65b1d3545ede   poweredOn

Expected & Actual:
No alert fired after 10m for MachineWithNoRunningPhase.

Additional Info:
Moved to VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196