Bug 1898655

Summary: [oVirt] Node deleted in oVirt should cause the Machine to go into a Failed phase
Product: OpenShift Container Platform Reporter: Gal Zaidman <gzaidman>
Component: Cloud ComputeAssignee: Gal Zaidman <gzaidman>
Cloud Compute sub component: oVirt Provider QA Contact: michal <mgold>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: lleistne, mgold
Version: 4.7   
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:34:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1897138    
Bug Blocks: 1898487    

Description Gal Zaidman 2020-11-17 18:13:15 UTC
Description of problem:

When a node is deleted from our infrastructure but the Machine object is still present, the machine controller should move the Machine into the Failed phase.
See [1] for more information.

[1] https://github.com/openshift/enhancements/blob/master/enhancements/machine-api/machine-instance-lifecycle.md#failed


Steps to Reproduce:
1. Manually delete an oVirt VM that correlates to an OCP worker from the oVirt Engine.
2. Watch the machine object

Actual results:
Machine remains in a Running phase

Expected results:
Machine should move to the Failed phase

Comment 1 Joel Speed 2020-11-18 10:08:23 UTC
From the Machine controller side, your actuator Exists function should be returning `false, nil` in this scenario, which then causes the Machine controller to mark the Machine as failed. I would recommend checking what your Exists returns in this scenario to understand why this isn't working as expected.

Comment 2 Gal Zaidman 2020-11-18 16:58:49 UTC
This issue was resolved by https://bugzilla.redhat.com/show_bug.cgi?id=1897138 but still needs to be tested.

Comment 6 michal 2020-12-23 11:41:27 UTC
Verified on:
4.7.0-0.nightly-2020-12-20-003733

Steps:
1) On the command line, run 'oc get nodes' and verify all VMs are listed
2) Open the RHV UI
3) In the 'Virtual Machines' screen, choose any worker virtual machine and 'Power Off' it
4) Remove the virtual machine
5) Return to the command line and run 'oc get nodes' again - verify the node was deleted
6) Run 'oc get machines' - verify that one Machine moved to the 'Failed' phase and is deleted after a while
7) Run 'oc get machineset' - verify that the 'Available' count was updated to match the remaining VMs


Result:
The VM deleted from RHV was reflected in both the node and machine lists.

Comment 8 errata-xmlrpc 2021-02-24 15:34:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633