1912567 – [OCP on RHV] Node becomes to 'NotReady' status when shutdown vm from RHV UI only on the second deletion

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Compute
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.7.0
Assignee:	Gal Zaidman
QA Contact:	michal
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-01-04 19:11 UTC by michal
Modified:	2021-02-24 15:50 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-02-24 15:49:41 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-api-provider-ovirt pull 82	0	None	closed	Bug 1912567: handle node removal from oVirt	2021-01-28 16:48:40 UTC
Red Hat Product Errata	RHSA-2020:5633	0	None	None	None	2021-02-24 15:50:01 UTC

Description michal 2021-01-04 19:11:18 UTC

Description of problem:
A node goes to 'NotReady' status, instead of being deleted, when its VM is shut down and removed from the RHV UI, but only on the second deletion.

Version-Release number of selected component (if applicable):
OCP: 4.6.0-0.nightly-2021-01-03-162024
RHV: 4.4.4.3-0.5

How reproducible:
always


Steps to Reproduce:
1) On the command line, run 'oc get nodes' and verify that all the VMs are listed.
2) Open the RHV UI.
3) On the 'Virtual Machine' screen, choose any worker virtual machine and click 'Shutdown'.
4) Remove the virtual machine.
5) Return to the command line and run 'oc get nodes' again; verify that the node was deleted.
6) Run 'oc get machines'; verify that one machine moved to 'Failed' and that, after a while, it is deleted as well.
7) Repeat steps 1-6.
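
The check in step 5 can be automated rather than repeated by hand. The following is a minimal sketch, assuming `oc` is on the PATH and already logged in to the cluster. The function names `node_names` and `wait_for_node_removal` are hypothetical, and the timeout and interval values are arbitrary choices, not values taken from this report.

```python
import json
import subprocess
import time


def node_names(run=subprocess.run):
    """Return the set of node names reported by `oc get nodes -o json`."""
    result = run(
        ["oc", "get", "nodes", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    items = json.loads(result.stdout)["items"]
    return {item["metadata"]["name"] for item in items}


def wait_for_node_removal(name, list_nodes=node_names, timeout=600,
                          interval=15, sleep=time.sleep):
    """Poll until `name` is no longer listed.

    Returns True if the node disappeared within `timeout` seconds,
    False otherwise. `list_nodes` and `sleep` are injectable so the
    loop can be exercised without a cluster.
    """
    waited = 0
    while True:
        if name not in list_nodes():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

For example, after removing a worker VM in step 4, calling `wait_for_node_removal("<node name>")` would return False in the failing case described in this report, because the node stays listed as 'NotReady' instead of being deleted.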


Actual results:
On the second pass, the node goes to 'NotReady' status and the machine status does not change:
[root@mgold-ocp-engine primary]# oc get machines
NAME                           PHASE     TYPE   REGION   ZONE   AGE
ovirt10-7c7kw-master-0         Running                          4h1m
ovirt10-7c7kw-master-1         Running                          4h1m
ovirt10-7c7kw-master-2         Running                          4h1m
ovirt10-7c7kw-worker-0-9t49p   Failed                           14m
ovirt10-7c7kw-worker-0-svn7p   Running                          104m
[root@mgold-ocp-engine primary]# oc get nodes
NAME                           STATUS     ROLES    AGE     VERSION
ovirt10-7c7kw-master-0         Ready      master   3h57m   v1.19.0+9c69bdc
ovirt10-7c7kw-master-1         Ready      master   3h57m   v1.19.0+9c69bdc
ovirt10-7c7kw-master-2         Ready      master   3h57m   v1.19.0+9c69bdc
ovirt10-7c7kw-worker-0-svn7p   NotReady   worker   96m     v1.19.0+9c69bdc
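
The mismatch in the listing above (a 'NotReady' node whose machine still reports 'Running') can be spotted mechanically. This is a rough sketch that reads only the first two columns of the plain `oc get` tables. It assumes the node name equals the machine name, as it does in this output. The helper names `first_two_columns` and `stale_nodes` are hypothetical.

```python
MACHINES = """\
NAME                           PHASE     TYPE   REGION   ZONE   AGE
ovirt10-7c7kw-master-0         Running                          4h1m
ovirt10-7c7kw-worker-0-9t49p   Failed                           14m
ovirt10-7c7kw-worker-0-svn7p   Running                          104m
"""

NODES = """\
NAME                           STATUS     ROLES    AGE     VERSION
ovirt10-7c7kw-master-0         Ready      master   3h57m   v1.19.0+9c69bdc
ovirt10-7c7kw-worker-0-svn7p   NotReady   worker   96m     v1.19.0+9c69bdc
"""


def first_two_columns(text):
    """Map the first column (NAME) to the second (PHASE or STATUS) per row."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    return {row[0]: row[1] for row in rows[1:]}  # rows[0] is the header


def stale_nodes(machines_text, nodes_text):
    """Return NotReady nodes whose same-named machine still reports Running."""
    machines = first_two_columns(machines_text)
    nodes = first_two_columns(nodes_text)
    return sorted(
        name for name, status in nodes.items()
        if status == "NotReady" and machines.get(name) == "Running"
    )


print(stale_nodes(MACHINES, NODES))
```

With the trimmed sample above, the only node flagged is ovirt10-7c7kw-worker-0-svn7p, which is the node affected in this report.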


Expected results:
The node is deleted and the corresponding machine moves to 'Failed'.

Additional info:

Comment 2 michal 2021-01-21 19:54:55 UTC

Verified on:
OCP: ./openshift-install 4.7.0-0.nightly-2021-01-21-012810
RHV: 4.4.4.7

Steps:
1) On the command line, run 'oc get nodes' and verify that all the VMs are listed.
2) Open the RHV UI.
3) On the 'Virtual Machine' screen, choose any worker virtual machine and click 'Shutdown'.
4) Remove the virtual machine.
5) Return to the command line and run 'oc get nodes' again; verify that the node was deleted.
6) Run 'oc get machines'; verify that one machine moved to 'Failed' and that, after a while, it is deleted as well.
7) Repeat steps 1-6.
8) Verify that the machine moved to 'Failed' and the node was deleted.

Result:
The machine moved to 'Failed' and the node was deleted.

[root@ocp-ge-2 primary]# oc get nodes
NAME                           STATUS   ROLES    AGE     VERSION
ovirt10-vk8dz-master-0         Ready    master   3h49m   v1.20.0+d9c52cc
ovirt10-vk8dz-master-1         Ready    master   3h47m   v1.20.0+d9c52cc
ovirt10-vk8dz-master-2         Ready    master   3h47m   v1.20.0+d9c52cc
ovirt10-vk8dz-worker-0-ljl24   Ready    worker   9m17s   v1.20.0+d9c52cc
[root@ocp-ge-2 primary]# oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
ovirt10-vk8dz-master-0         Running                          3h53m
ovirt10-vk8dz-master-1         Running                          3h53m
ovirt10-vk8dz-master-2         Running                          3h53m
ovirt10-vk8dz-worker-0-ljl24   Running                          17m
ovirt10-vk8dz-worker-0-mtsj4   Failed                           3h42m

Comment 5 errata-xmlrpc 2021-02-24 15:49:41 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633