1898655 – [oVirt] Node deleted in oVirt should cause the Machine to go into a Failed phase

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Compute
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	4.7.0
Assignee:	Gal Zaidman
QA Contact:	michal
Docs Contact:
URL:
Whiteboard:
Depends On:	1897138
Blocks:	1898487

Reported:	2020-11-17 18:13 UTC by Gal Zaidman
Modified:	2021-02-24 15:34 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-02-24 15:34:16 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2020:5633	0	None	None	None	2021-02-24 15:34:53 UTC

Description Gal Zaidman 2020-11-17 18:13:15 UTC

Description of problem:

When a node is deleted from our infrastructure but the Machine object is still present, the machine controller should move the Machine into the Failed phase.
See [1] for more information.

[1] https://github.com/openshift/enhancements/blob/master/enhancements/machine-api/machine-instance-lifecycle.md#failed


Steps to Reproduce:
1. Manually delete an oVirt VM that corresponds to an OCP worker from the oVirt Engine.
2. Watch the machine object

Actual results:
Machine remains in a Running phase

Expected results:
Machine should move to the Failed phase

Comment 1 Joel Speed 2020-11-18 10:08:23 UTC

From the Machine controller side, your actuator Exists function should be returning `false, nil` in this scenario, which then causes the Machine controller to mark the Machine as failed. I would recommend checking what your Exists returns in this scenario to understand why this isn't working as expected.

Comment 2 Gal Zaidman 2020-11-18 16:58:49 UTC

This issue was resolved by https://bugzilla.redhat.com/show_bug.cgi?id=1897138 but still needs to be tested.

Comment 6 michal 2020-12-23 11:41:27 UTC

Verified on:
4.7.0-0.nightly-2020-12-20-003733

Steps:
1) On the command line, run 'oc get nodes' and verify that all VMs are listed
2) Open the RHV UI
3) In the 'Virtual Machine' screen, choose any worker virtual machine and 'Power Off'
4) Remove the virtual machine
5) Return to the command line and run 'oc get nodes' again; verify that the node was deleted
6) Run 'oc get machines'; verify that one machine moved to 'Failed' and, after a while, is deleted as well
7) Run 'oc get machineset'; verify that 'available' is updated to match the available VMs


Result:
The VM deleted from RHV was reflected in the node and machine lists

Comment 8 errata-xmlrpc 2021-02-24 15:34:16 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
