Bug 1939360 - [oVirt] split ovirt providerIDReconciler logic into NodeController and ProviderIDController
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.7.z
Assignee: Gal Zaidman
QA Contact: Guilherme Santos
URL:
Whiteboard:
Depends On: 1937694
Blocks:
Reported: 2021-03-16 08:41 UTC by OpenShift BugZilla Robot
Modified: 2021-04-20 18:53 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-20 18:52:40 UTC
Target Upstream Version:
Embargoed:




Links
GitHub openshift/cluster-api-provider-ovirt pull 99 (open): [release-4.7] Bug 1939360: providerIDController ignore nodes that have no machine (last updated 2021-03-25 19:06:57 UTC)
Red Hat Product Errata RHBA-2021:1149 (last updated 2021-04-20 18:53:02 UTC)

Description OpenShift BugZilla Robot 2021-03-16 08:41:54 UTC
+++ This bug was initially created as a clone of Bug #1937694 +++

Description of problem:

We need to split the logic of providerIDReconciler into NodeController and ProviderIDController.
This will allow us to ignore nodes that don't have an oVirt providerID in the node-deletion flow:

ProviderIDController:
If the Node already has a providerID, ignore it.
If the Node doesn't have a providerID, attempt to find it from oVirt (the Machine with an oVirt spec).
If the Node doesn't have a providerID and isn't an oVirt node, return an error and look again later.

NodeController:

If the Node has no providerID, or its providerID is not prefixed with ovirt, ignore it.
If the Node has an oVirt providerID and its VM no longer exists on the provider, remove the Node.
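The NodeController rules reduce to a prefix check followed by an existence check against the provider. The sketch below is illustrative only: the `ovirt://` prefix, the `Node` struct, and the `vmExistsFunc` hook are assumptions, not the provider's actual API. It shows why Nodes without an oVirt providerID (such as bare-metal UPI workers) are never deleted by this controller.

```go
package main

import (
	"fmt"
	"strings"
)

// ovirtPrefix is an ASSUMED providerID prefix for oVirt-backed Nodes.
const ovirtPrefix = "ovirt://"

// Node is a hypothetical, trimmed-down stand-in for a Kubernetes Node.
type Node struct {
	Name       string
	ProviderID string
}

// vmExistsFunc is a hypothetical hook asking oVirt whether a VM still exists.
type vmExistsFunc func(vmID string) (bool, error)

// shouldDeleteNode applies the two NodeController rules.
func shouldDeleteNode(n Node, vmExists vmExistsFunc) (bool, error) {
	// Rule 1: no providerID, or one without the oVirt prefix: ignore the Node.
	// An empty providerID also fails the prefix check.
	if !strings.HasPrefix(n.ProviderID, ovirtPrefix) {
		return false, nil
	}
	vmID := strings.TrimPrefix(n.ProviderID, ovirtPrefix)
	exists, err := vmExists(vmID)
	if err != nil {
		// Provider unreachable: keep the Node and let the caller retry.
		return false, err
	}
	// Rule 2: oVirt providerID whose VM is gone: remove the Node.
	return !exists, nil
}

func main() {
	// Hypothetical oVirt inventory: only VM "1234" still exists.
	existing := map[string]bool{"1234": true}
	vmExists := func(id string) (bool, error) { return existing[id], nil }
	for _, n := range []Node{
		{Name: "bm-worker-0"},                          // no providerID, e.g. a bare-metal UPI worker
		{Name: "worker-0", ProviderID: "ovirt://1234"}, // VM still present
		{Name: "worker-1", ProviderID: "ovirt://5678"}, // VM removed in oVirt
	} {
		del, err := shouldDeleteNode(n, vmExists)
		fmt.Printf("%s delete=%v err=%v\n", n.Name, del, err)
	}
}
```

Treating a provider error as "do not delete" is a deliberately conservative choice in this sketch: a transient oVirt API failure should not cause healthy Nodes to be removed.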

How to test:
Exercise the machine logic: scale up and down, remove VMs from oVirt, and so on.

--- Additional comment from aos-team-art-private on 2021-03-15 13:05:50 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.

Comment 3 Dan Kenigsberg 2021-04-01 12:38:54 UTC
This regression incapacitated our internal cnv.engineering.redhat.com. It has the same effect for any user running RHV IPI masters with UPI workers. Virtual masters with bare-metal UPI workers are favored by many customers, most notably those who try out bare-metal features such as OpenShift Virtualization or device pass-through (e.g. GPU).

Please accept this fix sooner rather than later.

Comment 6 Guilherme Santos 2021-04-13 16:14:06 UTC
Verified on:
4.7.0-0.nightly-2021-04-10-082109

Steps:
1. Scaled up the cluster.
2. On oVirt, manually removed some worker VMs (some while they were being deployed).
3. Scaled down and then up a few times, repeating the deletion in between.

Results:
Deletion and addition of nodes/VMs work as expected.
Missing VMs are properly reported as Failed (or remain stuck in Provisioning/Provisioned if they were deleted while being created; these are the first ones to be removed on scale-down, though).

Comment 8 errata-xmlrpc 2021-04-20 18:52:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.7 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1149

