Bug 2122236

Summary: Failing to delete HCO with SSP sticking around
Product: Container Native Virtualization (CNV)
Reporter: Alex Kalenyuk <akalenyu>
Component: SSP
Assignee: Andrej Krejcir <akrejcir>
Status: CLOSED ERRATA
QA Contact: zhe peng <zpeng>
Severity: high
Docs Contact:
Priority: high
Version: 4.12.0
CC: akrejcir, gkapoor, kbidarka, stirabos
Target Milestone: ---
Target Release: 4.12.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: kubevirt-ssp-operator-container-v4.12.0-45
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-01-24 13:39:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Alex Kalenyuk 2022-08-29 15:02:07 UTC
Description of problem:
Unable to delete HCO; the logs suggest that SSP is still present and is failing to delete.

Version-Release number of selected component (if applicable):
CNV bundle v4.12.0-425


How reproducible:
Not 100%, but we have seen it in release pipelines twice in the past few days.


Steps to Reproduce:
1. Delete HCO

Actual results:
HCO fails to be removed.

Expected results:
HCO is removed successfully.

Additional info:
Attached relevant logs.
Is it possible that SSP depends on CDI being present in order to delete successfully?
It looks like the reconcile loop exits very early and never reaches cleanup.

Jenkins job logs that show we timed out after 2h:
12:32:35  + oc delete hyperconvergeds --all-namespaces --all --ignore-not-found
12:32:37  hyperconverged.hco.kubevirt.io "kubevirt-hyperconverged" deleted
14:31:25  Cancelling nested steps due to timeout
14:31:25  Sending interrupt signal to process
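
As a quick check of the "2h" figure, the elapsed time between the "deleted" line and the cancellation can be computed from the two timestamps in the log above. This is only a minimal sketch: it assumes both timestamps fall on the same day, and the helper name `elapsed_between` is illustrative.

```python
from datetime import datetime, timedelta

# Jenkins prefixes each log line with a wall-clock time in this format.
FMT = "%H:%M:%S"


def elapsed_between(start: str, end: str) -> timedelta:
    """Return end - start, assuming both times are on the same day."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Timestamps copied from the job log: HCO reported as deleted, then the cancel.
elapsed = elapsed_between("12:32:37", "14:31:25")
print(elapsed)
```

Under that same-day assumption, the delete call had been outstanding for just under two hours when the job was cancelled.
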
The HCO operator log is filled with errors like this one:
{"level":"error","ts":1661782888.8078818,"logger":"controller_hyperconverged","msg":"Failed to manually delete objects","Request.Namespace":"openshift-cnv","Request.Name":"kubevirt-hyperconverged","error":"timed out waiting for &{map[\"apiVersion\":\"ssp.kubevirt.io/v1beta1\" \"kind\":\"SSP\"

Comment 5 Andrej Krejcir 2022-09-01 10:46:19 UTC
SSP needs CDI to work correctly. It waits until CDI CRDs are installed, but there is no logic to react to deletion of these CRDs while SSP is running.

Simone, could HCO be modified to first delete SSP, wait until it is removed, and then remove CDI?
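
The ordering proposed here can be sketched as follows. This is an illustration only and not HCO code: `delete` and `exists` are hypothetical callbacks standing in for whatever client calls an operator would actually make, and the timeout and polling interval are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_until_gone(exists: Callable[[], bool], timeout_s: float,
                    interval_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll exists() until it returns False; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while exists():
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
    return True


def teardown_in_order(delete: Callable[[str], None],
                      exists: Callable[[str], bool],
                      timeout_s: float = 300.0,
                      interval_s: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Delete SSP first and wait for it to disappear, then delete CDI.

    `delete` and `exists` are hypothetical hooks, not a real operator API.
    """
    for kind in ("SSP", "CDI"):
        delete(kind)
        gone = wait_until_gone(lambda: exists(kind), timeout_s, interval_s, sleep)
        if not gone:
            raise TimeoutError(f"timed out waiting for {kind} to be removed")
```

The point of the sketch is that CDI is not touched until SSP is confirmed gone, so SSP cleanup never runs against missing CDI CRDs.
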

Comment 6 Simone Tiraboschi 2022-09-01 15:41:17 UTC
(In reply to Andrej Krejcir from comment #5)
> SSP needs CDI to work correctly. It waits until CDI CRDs are installed, but
> there is no logic to react to deletion of these CRDs while SSP is running.
> 
> Simone, could HCO be modified to first delete SSP, wait until it is
> removed, and then remove CDI?

We are trying, as much as possible, to avoid this kind of interdependency on creation/deletion order, so that we do not have to track and synchronize different asynchronous operations.

Comment 8 zhe peng 2022-11-21 09:42:23 UTC
Verified with build:
OCP-4.12.0-rc.0
CNV-v4.12.0-693
Steps:
1. Set the 'uninstallStrategy' of the HCO CR to 'RemoveWorkloads':
$ oc edit hco kubevirt-hyperconverged -n openshift-cnv

2. Remove the HCO CR:
$ oc delete hyperconvergeds --all-namespaces --all --ignore-not-found
hyperconverged.hco.kubevirt.io "kubevirt-hyperconverged" deleted

Check that HCO is removed:
$ oc edit hco kubevirt-hyperconverged -n openshift-cnv
Error from server (NotFound): hyperconvergeds.hco.kubevirt.io "kubevirt-hyperconverged" not found

Check that SSP is removed:
$ oc get ssp -A
No resources found

Repeated 5 times with no issues found; HCO and SSP were removed each time.
Moving the bug to VERIFIED.
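
For repeated runs, the two checks above could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure that was actually used: it assumes `oc` is on the PATH and logged in to the cluster, that the SSP CRD is still installed (so `oc get ssp -A` succeeds with empty output), and the function names are hypothetical. The `run` parameter exists only so the checks can be exercised without a cluster.

```python
import subprocess
from typing import List


def run_oc(args: List[str]) -> subprocess.CompletedProcess:
    """Run `oc` with the given arguments and capture its output as text."""
    return subprocess.run(["oc", *args], capture_output=True, text=True)


def hco_is_gone(run=run_oc, name: str = "kubevirt-hyperconverged",
                namespace: str = "openshift-cnv") -> bool:
    """True if fetching the HCO CR fails with a NotFound error."""
    result = run(["get", "hyperconvergeds", name, "-n", namespace])
    return result.returncode != 0 and "NotFound" in result.stderr


def ssp_is_gone(run=run_oc) -> bool:
    """True if no SSP resource is listed in any namespace."""
    result = run(["get", "ssp", "-A", "--no-headers"])
    return result.returncode == 0 and result.stdout.strip() == ""


def removal_verified(run=run_oc) -> bool:
    """Both the HCO CR and every SSP resource must be gone."""
    return hco_is_gone(run) and ssp_is_gone(run)
```

Calling `removal_verified()` after each delete would give a single pass/fail result per iteration.
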

Comment 12 errata-xmlrpc 2023-01-24 13:39:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.12.0 Images security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0408