Bug 1898655 - [oVirt] Node deleted in oVirt should cause the Machine to go into a Failed phase
Summary: [oVirt] Node deleted in oVirt should cause the Machine to go into a Failed phase
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Gal Zaidman
QA Contact: michal
URL:
Whiteboard:
Depends On: 1897138
Blocks: 1898487
 
Reported: 2020-11-17 18:13 UTC by Gal Zaidman
Modified: 2021-02-24 15:34 UTC
CC List: 2 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:34:16 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System: Red Hat Product Errata    ID: RHSA-2020:5633    Private: 0    Priority: None    Status: None    Summary: None    Last Updated: 2021-02-24 15:34:53 UTC

Description Gal Zaidman 2020-11-17 18:13:15 UTC
Description of problem:

When the underlying VM (node) is deleted from the infrastructure but the Machine object is still present, the machine controller should move the Machine into the Failed phase.
See [1] for more information.

[1] https://github.com/openshift/enhancements/blob/master/enhancements/machine-api/machine-instance-lifecycle.md#failed


Steps to Reproduce:
1. Manually delete an oVirt VM that corresponds to an OCP worker node from the oVirt Engine.
2. Watch the corresponding Machine object.

Actual results:
The Machine remains in the Running phase.

Expected results:
The Machine should move to the Failed phase.
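
For illustration, here is a minimal, self-contained Go sketch of the behaviour the linked enhancement [1] describes: once a Machine has been provisioned and linked to a Node, an instance that no longer exists in the infrastructure should push the Machine into the Failed phase. The types and names below (Machine, Actuator, HasNodeRef, reconcilePhase) are simplified stand-ins for illustration only, not the actual machine-api-operator code.

package main

import (
	"context"
	"fmt"
)

// Machine and Actuator are simplified stand-ins for the Machine API types;
// the real controller works with machine.openshift.io/v1beta1 objects.
type Machine struct {
	Name       string
	Phase      string
	HasNodeRef bool // true once the Machine has been linked to a Node
}

type Actuator interface {
	// Exists reports whether the backing instance (oVirt VM) still exists.
	Exists(ctx context.Context, m *Machine) (bool, error)
}

// reconcilePhase applies the rule from the machine-instance-lifecycle
// enhancement: a provisioned Machine whose instance has disappeared can
// never recover, so it is moved to the Failed phase.
func reconcilePhase(ctx context.Context, a Actuator, m *Machine) error {
	exists, err := a.Exists(ctx, m)
	if err != nil {
		return err // transient lookup error: retry, do not change the phase
	}
	if !exists && m.HasNodeRef {
		m.Phase = "Failed"
	}
	return nil
}

// fakeActuator simulates a VM that was deleted out of band in the oVirt Engine.
type fakeActuator struct{}

func (fakeActuator) Exists(ctx context.Context, m *Machine) (bool, error) {
	return false, nil // VM no longer present in the infrastructure
}

func main() {
	m := &Machine{Name: "worker-0", Phase: "Running", HasNodeRef: true}
	if err := reconcilePhase(context.Background(), fakeActuator{}, m); err != nil {
		panic(err)
	}
	fmt.Println(m.Name, "phase:", m.Phase) // worker-0 phase: Failed
}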

Comment 1 Joel Speed 2020-11-18 10:08:23 UTC
From the Machine controller side, your actuator Exists function should be returning `false, nil` in this scenario, which then causes the Machine controller to mark the Machine as failed. I would recommend checking what your Exists returns in this scenario to understand why this isn't working as expected.
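
A minimal sketch of what such an Exists implementation could look like, assuming a hypothetical oVirt client interface (ovirtClient, GetVMByName and ErrVMNotFound are illustrative names, not the actual cluster-api-provider-ovirt API). The key point is that a missing VM must be reported as (false, nil), while any other lookup error is returned as-is so the reconcile is simply retried without failing the Machine.

package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrVMNotFound stands in for whatever "not found" condition the real
// oVirt client surfaces; the actual provider checks its SDK-specific error.
var ErrVMNotFound = errors.New("vm not found")

// ovirtClient is a hypothetical, minimal view of the oVirt API used here.
type ovirtClient interface {
	GetVMByName(ctx context.Context, name string) (id string, err error)
}

// Exists reports whether the VM backing the Machine still exists in oVirt.
// A missing VM returns (false, nil) so the Machine controller can move the
// Machine to Failed; any other error is propagated and only requeues.
func Exists(ctx context.Context, c ovirtClient, machineName string) (bool, error) {
	_, err := c.GetVMByName(ctx, machineName)
	if errors.Is(err, ErrVMNotFound) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

// deletedVMClient simulates the scenario from this bug: the VM was removed
// from the oVirt Engine behind the controller's back.
type deletedVMClient struct{}

func (deletedVMClient) GetVMByName(ctx context.Context, name string) (string, error) {
	return "", ErrVMNotFound
}

func main() {
	ok, err := Exists(context.Background(), deletedVMClient{}, "worker-0")
	fmt.Println(ok, err) // false <nil> -> controller marks the Machine Failed
}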

Comment 2 Gal Zaidman 2020-11-18 16:58:49 UTC
This issue was resolved by https://bugzilla.redhat.com/show_bug.cgi?id=1897138, but it still needs to be tested.

Comment 6 michal 2020-12-23 11:41:27 UTC
Verified on:
4.7.0-0.nightly-2020-12-20-003733

Steps:
1) In the command line, run 'oc get nodes' and verify that all VMs are present as nodes.
2) Open the RHV UI.
3) In the 'Virtual Machines' screen, choose any worker virtual machine and power it off.
4) Remove the virtual machine.
5) Return to the command line and run 'oc get nodes' again; verify that the node was deleted.
6) Run 'oc get machines' and verify that the corresponding Machine moved to the 'Failed' phase and, after a while, was deleted as well.
7) Run 'oc get machineset' and verify that the 'Available' count was updated to reflect the available VMs.


Result:
The VM deleted from RHV was reflected in the node and Machine lists as expected.

Comment 8 errata-xmlrpc 2021-02-24 15:34:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

