Description of problem:
When the VM backing a node is deleted from our infrastructure but the Machine object is still present, the machine controller should move the Machine into the Failed phase. See [1] for more information.

[1] https://github.com/openshift/enhancements/blob/master/enhancements/machine-api/machine-instance-lifecycle.md#failed

Steps to Reproduce:
1. From the oVirt Engine, manually delete an oVirt VM that corresponds to an OCP worker.
2. Watch the Machine object.

Actual results:
The Machine remains in the Running phase.

Expected results:
The Machine moves to the Failed phase.
From the Machine controller side, your actuator's Exists function should return `false, nil` in this scenario, which then causes the Machine controller to mark the Machine as Failed. I would recommend checking what your Exists function returns in this scenario to understand why this isn't working as expected.
This issue was resolved by https://bugzilla.redhat.com/show_bug.cgi?id=1897138 but still needs to be tested
Verified on: 4.7.0-0.nightly-2020-12-20-003733

Steps:
1) On the command line, run 'oc get nodes' and verify that all VMs are listed.
2) Open the RHV UI.
3) In the 'Virtual Machine' screen, choose any worker virtual machine and select 'Power Off'.
4) Remove the virtual machine.
5) Return to the command line and run 'oc get nodes' again; verify that the node was deleted.
6) Run 'oc get machines'; verify that one machine moved to the 'Failed' phase and, after a while, was deleted as well.
7) Run 'oc get machineset'; verify that the 'available' count was updated to match the available VMs.

Result: the VM deleted from RHV was reflected in the node and machine lists.
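Step 6 can also be checked by parsing JSON output instead of reading the table by eye. The sketch below assumes the output of something like `oc get machines -o json` and decodes only each item's `metadata.name` and `status.phase`; the helper name `failedMachines` and the sample machine names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// machineList models only the fields this sketch needs from a machine
// list in JSON form: each item's name and status.phase.
type machineList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			Phase string `json:"phase"`
		} `json:"status"`
	} `json:"items"`
}

// failedMachines returns the names of machines whose status.phase is "Failed".
func failedMachines(data []byte) ([]string, error) {
	var list machineList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, fmt.Errorf("decoding machine list: %w", err)
	}
	var failed []string
	for _, item := range list.Items {
		if item.Status.Phase == "Failed" {
			failed = append(failed, item.Metadata.Name)
		}
	}
	return failed, nil
}

func main() {
	// Trimmed-down sample input; the machine names are hypothetical.
	sample := []byte(`{"items":[
		{"metadata":{"name":"worker-0"},"status":{"phase":"Running"}},
		{"metadata":{"name":"worker-1"},"status":{"phase":"Failed"}}]}`)
	failed, err := failedMachines(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(failed) // [worker-1]
}
```

Since the Failed machine is deleted shortly afterwards, the check has to run within that window to see it.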
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633