Bug 1937694

Summary: [oVirt] split ovirt providerIDReconciler logic into NodeController and ProviderIDController
Product: OpenShift Container Platform Reporter: Gal Zaidman <gzaidman>
Component: Cloud Compute Assignee: Gal Zaidman <gzaidman>
Cloud Compute sub component: oVirt Provider QA Contact: Guilherme Santos <gdeolive>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent    
Version: 4.8   
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-27 22:52:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1929702, 1939360    

Description Gal Zaidman 2021-03-11 11:08:15 UTC
Description of problem:

The logic of providerIDReconciler needs to be split into two controllers, NodeController and ProviderIDController.
This will allow the node deletion flow to ignore nodes that don't have an oVirt providerID:

ProviderIDController:
If the Node already has a providerID, ignore it.
If the Node doesn't have a providerID, attempt to find it from oVirt (or a Machine with an oVirt spec).
If the Node doesn't have a providerID and isn't an oVirt node, return an error and look again later.
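
The ProviderIDController rules above could be sketched roughly as follows. This is only an illustration, not the provider's actual code: the Node struct, the VMLookup callback, the reconcileProviderID function and the "ovirt://" prefix value are all assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a simplified stand-in for a Kubernetes Node (hypothetical type).
type Node struct {
	Name       string
	ProviderID string
}

// VMLookup resolves a node name to an oVirt VM ID. found is false when no
// oVirt VM (or Machine with an oVirt spec) matches the node. Hypothetical.
type VMLookup func(nodeName string) (vmID string, found bool)

// ovirtPrefix is the assumed providerID scheme for oVirt nodes.
const ovirtPrefix = "ovirt://"

// ErrNotOvirtNode signals that the caller should requeue and look again later.
var ErrNotOvirtNode = errors.New("node has no providerID and no matching oVirt VM")

// reconcileProviderID applies the three rules from the description:
//  1. providerID already set: ignore.
//  2. providerID missing and a VM is found: set the providerID.
//  3. providerID missing and no VM is found: return an error so the
//     node is examined again later.
func reconcileProviderID(node *Node, lookup VMLookup) (changed bool, err error) {
	if node.ProviderID != "" {
		return false, nil
	}
	vmID, found := lookup(node.Name)
	if !found {
		return false, fmt.Errorf("%w: %s", ErrNotOvirtNode, node.Name)
	}
	node.ProviderID = ovirtPrefix + vmID
	return true, nil
}
```

In a real controller the returned error would typically cause the reconcile request to be requeued, which corresponds to "look again later".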

NodeController:
If the Node has no providerID, or the providerID is not prefixed with ovirt, ignore it.
If the Node has an oVirt providerID and its VM no longer exists on the provider, remove the Node.
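
The NodeController rules above could be sketched in a similar way. Again this is a hedged illustration: the Node struct, the Action values, the VMExists callback, the decideNodeAction function and the "ovirt://" prefix are assumptions for this example and are not taken from the provider's code.

```go
package main

import "strings"

// Node is a simplified stand-in for a Kubernetes Node (hypothetical type).
type Node struct {
	Name       string
	ProviderID string
}

// Action is the decision taken for a node (hypothetical type).
type Action int

const (
	Ignore Action = iota // not an oVirt node, leave it alone
	Keep                 // oVirt node whose VM still exists (or is unknown)
	Remove               // oVirt node whose VM is gone from the provider
)

// VMExists reports whether a VM with the given ID exists on oVirt. Hypothetical.
type VMExists func(vmID string) (bool, error)

// ovirtPrefix is the assumed providerID scheme for oVirt nodes.
const ovirtPrefix = "ovirt://"

// decideNodeAction applies the two rules from the description:
//  1. no providerID, or providerID without the oVirt prefix: ignore.
//  2. oVirt providerID whose VM no longer exists: remove the node.
//
// If the existence check itself fails, the node is kept and the error is
// returned, so a transient API failure never deletes a node.
func decideNodeAction(node Node, vmExists VMExists) (Action, error) {
	if !strings.HasPrefix(node.ProviderID, ovirtPrefix) {
		return Ignore, nil
	}
	vmID := strings.TrimPrefix(node.ProviderID, ovirtPrefix)
	exists, err := vmExists(vmID)
	if err != nil {
		return Keep, err
	}
	if exists {
		return Keep, nil
	}
	return Remove, nil
}
```

Because an empty providerID never carries the prefix, a single prefix check covers both "no providerID" and "providerID of another provider".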

How to test:
Try to challenge the machine logic: scale up and down, remove VMs from oVirt, and so on.

Comment 2 Guilherme Santos 2021-03-23 17:27:57 UTC
Verified on:
4.8.0-0.nightly-2021-03-19-075500

Steps:
1. Scaled up the cluster.
2. On oVirt, manually removed some worker VMs (some while they were still being deployed).
3. Scaled down and then up a few times, repeating the deletion in between.

Results:
Deletion and addition of nodes/VMs work as expected.
Missing VMs are properly reported as failed (or stuck in provisioning/provisioned if they were deleted while being created; those are the first ones to be deleted on scale down, though).

Comment 5 errata-xmlrpc 2021-07-27 22:52:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438