Bug 1898655

Summary: [oVirt] Node deleted in oVirt should cause the Machine to go into a Failed phase
Product: OpenShift Container Platform Reporter: Gal Zaidman <gzaidman>
Component: Cloud ComputeAssignee: Gal Zaidman <gzaidman>
Cloud Compute sub component: oVirt Provider QA Contact: michal <mgold>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: lleistne, mgold
Version: 4.7   
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:34:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1897138    
Bug Blocks: 1898487    

Description Gal Zaidman 2020-11-17 18:13:15 UTC
Description of problem:

When a node is deleted from our infrastructure but the Machine object is still present, the machine controller should move the Machine into the Failed phase.
See [1] for more information.

[1] https://github.com/openshift/enhancements/blob/master/enhancements/machine-api/machine-instance-lifecycle.md#failed


Steps to Reproduce:
1. Manually delete an oVirt VM that correlates to an OCP worker from the oVirt Engine.
2. Watch the machine object

Actual results:
Machine remains in a Running phase

Expected results:
Machine should move to the Failed phase

Comment 1 Joel Speed 2020-11-18 10:08:23 UTC
From the Machine controller side, your actuator Exists function should be returning `false, nil` in this scenario, which then causes the Machine controller to mark the Machine as failed. I would recommend checking what your Exists returns in this scenario to understand why this isn't working as expected.

Comment 2 Gal Zaidman 2020-11-18 16:58:49 UTC
This issue was resolved by https://bugzilla.redhat.com/show_bug.cgi?id=1897138 but still needs to be tested.

Comment 6 michal 2020-12-23 11:41:27 UTC
Verified on:
4.7.0-0.nightly-2020-12-20-003733

Steps:
1) On the command line, run 'oc get nodes' and verify all VMs are listed
2) Open the RHV UI
3) In the 'Virtual Machines' screen, choose any worker virtual machine and 'Power Off' it
4) Remove the virtual machine
5) Return to the command line and run 'oc get nodes' again - verify the node was deleted
6) Run 'oc get machines' - verify that one Machine moved to the 'Failed' phase and is deleted after a while
7) Run 'oc get machineset' - verify that the 'Available' count was updated to match the remaining VMs


Result:
The VM deleted from RHV was reflected in both the node and machine lists.

Comment 8 errata-xmlrpc 2021-02-24 15:34:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633