Bug 2122236 - Failing to delete HCO with SSP sticking around
Summary: Failing to delete HCO with SSP sticking around
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: SSP
Version: 4.12.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.12.0
Assignee: Andrej Krejcir
QA Contact: zhe peng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-08-29 15:02 UTC by Alex Kalenyuk
Modified: 2023-01-24 13:41 UTC (History)

Fixed In Version: kubevirt-ssp-operator-container-v4.12.0-45
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-24 13:39:56 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt ssp-operator pull 417 | 0 | None | open | Watch required CRDs, and restart operator if they are removed | 2022-10-03 14:06:14 UTC
Github | kubevirt ssp-operator pull 420 | 0 | None | Merged | [release-v0.16] Watch required CRDs, and restart operator if they are removed | 2022-10-12 08:05:00 UTC
Red Hat Issue Tracker | CNV-20910 | 0 | None | None | None | 2022-11-16 11:42:43 UTC
Red Hat Product Errata | RHSA-2023:0408 | 0 | None | None | None | 2023-01-24 13:41:11 UTC

Internal Links: 2087608

Description Alex Kalenyuk 2022-08-29 15:02:07 UTC
Description of problem:
Unable to delete HCO. The logs suggest that the SSP resource is still present and fails to delete.

Version-Release number of selected component (if applicable):
CNV bundle v4.12.0-425


How reproducible:
Not 100%, but I have seen it twice in release pipelines in the past few days


Steps to Reproduce:
1. Delete HCO

Actual results:
Failing to remove HCO

Expected results:
Successfully removed

Additional info:
Attached relevant logs.
Is it possible SSP depends on CDI being around to delete successfully?
Looks like the reconcile loop backs out very quickly, not reaching cleanup

Jenkins job logs that show we timed out after 2h:
12:32:35  + oc delete hyperconvergeds --all-namespaces --all --ignore-not-found
12:32:37  hyperconverged.hco.kubevirt.io "kubevirt-hyperconverged" deleted
14:31:25  Cancelling nested steps due to timeout
14:31:25  Sending interrupt signal to process
The HCO operator log is filled with errors like this:
{"level":"error","ts":1661782888.8078818,"logger":"controller_hyperconverged","msg":"Failed to manually delete objects","Request.Namespace":"openshift-cnv","Request.Name":"kubevirt-hyperconverged","error":"timed out waiting for &{map[\"apiVersion\":\"ssp.kubevirt.io/v1beta1\" \"kind\":\"SSP\"
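
To correlate this error with the Jenkins timeline above, the "ts" field can be converted to a wall-clock time. This is a minimal sketch that assumes "ts" is Unix epoch seconds (the report does not say so explicitly); the helper name log_ts_to_utc is made up for illustration.

```python
from datetime import datetime, timezone


def log_ts_to_utc(ts: float) -> str:
    """Format an epoch-seconds value (assumed meaning of "ts") as a UTC time."""
    moment = datetime.fromtimestamp(ts, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


# Value copied from the HCO operator log line above.
print(log_ts_to_utc(1661782888.8078818))  # prints "2022-08-29 14:21:28 UTC"
```

The time zone of the Jenkins timestamps is not stated in the report, so any comparison with them is only approximate.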

Comment 5 Andrej Krejcir 2022-09-01 10:46:19 UTC
SSP needs CDI to work correctly. It waits until CDI CRDs are installed, but there is no logic to react to deletion of these CRDs while SSP is running.

Simone, could HCO be modified to first delete SSP, wait until it is removed, and only then remove CDI?
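
The missing reaction described above can be sketched roughly as follows. This is not the ssp-operator code: it is a simplified Python illustration of the idea named in the linked pull requests (watch the required CRDs and restart when one is removed). The CRD names in REQUIRED_CRDS and the function should_restart are assumptions chosen for the example; the event types follow the Kubernetes watch convention (ADDED, MODIFIED, DELETED).

```python
from typing import Iterable, Set, Tuple

# Assumed set of CRDs the operator depends on; illustrative only.
REQUIRED_CRDS: Set[str] = {
    "cdis.cdi.kubevirt.io",
    "datavolumes.cdi.kubevirt.io",
}


def should_restart(
    events: Iterable[Tuple[str, str]],
    required: Set[str] = REQUIRED_CRDS,
) -> bool:
    """Return True as soon as a required CRD is reported as deleted.

    Each event is a (event_type, crd_name) pair taken from a watch stream.
    """
    for event_type, crd_name in events:
        if event_type == "DELETED" and crd_name in required:
            return True
    return False


# Example: a CDI CRD disappears while the operator is running.
stream = [
    ("ADDED", "cdis.cdi.kubevirt.io"),
    ("DELETED", "datavolumes.cdi.kubevirt.io"),
]
print(should_restart(stream))
```

In this sketch a True result would mean the process exits and, after restarting, goes back to waiting for the CRDs, instead of continuing to run with watches on types that no longer exist.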

Comment 6 Simone Tiraboschi 2022-09-01 15:41:17 UTC
(In reply to Andrej Krejcir from comment #5)
> SSP needs CDI to work correctly. It waits until CDI CRDs are installed, but
> there is no logic to react to deletion of these CRDs while SSP is running.
> 
> Simone, could HCO be modified to first delete SSP, wait until it is
> removed, and only then remove CDI?

We are trying, as much as possible, to avoid this kind of interdependency on creation/deletion order, so that we do not have to track and synchronize different asynchronous operations.

Comment 8 zhe peng 2022-11-21 09:42:23 UTC
Verified with build:
OCP-4.12.0-rc.0
CNV-v4.12.0-693
Steps:
1. Set the 'uninstallStrategy' of the HCO CR to 'RemoveWorkloads'
$ oc edit hco kubevirt-hyperconverged -n openshift-cnv

2. Remove the HCO CR
$ oc delete hyperconvergeds --all-namespaces --all --ignore-not-found
hyperconverged.hco.kubevirt.io "kubevirt-hyperconverged" deleted

Check that HCO is removed:
$ oc edit hco kubevirt-hyperconverged -n openshift-cnv
Error from server (NotFound): hyperconvergeds.hco.kubevirt.io "kubevirt-hyperconverged" not found

Check that SSP is removed:
$ oc get ssp -A
No resources found

Repeated 5 times with no issues found; HCO and SSP were removed every time.
Moving the bug to VERIFIED.
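
The two checks above could be automated along these lines. This is a rough sketch, not the script used for verification: the helper names run_oc and hco_and_ssp_removed are made up, it uses `oc get` instead of the interactive `oc edit` shown above, and it assumes the same output strings ("NotFound", "No resources found") that appear in this comment.

```python
import subprocess
from typing import Callable, List, Tuple

# A runner takes oc arguments and returns (exit code, combined output).
Runner = Callable[[List[str]], Tuple[int, str]]


def run_oc(args: List[str]) -> Tuple[int, str]:
    """Run `oc` with the given arguments and collect stdout and stderr."""
    proc = subprocess.run(["oc", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def hco_and_ssp_removed(run: Runner = run_oc) -> bool:
    """Return True when the HCO CR is NotFound and no SSP objects remain."""
    code, out = run(["get", "hco", "kubevirt-hyperconverged", "-n", "openshift-cnv"])
    hco_gone = code != 0 and "NotFound" in out

    code, out = run(["get", "ssp", "-A"])
    ssp_gone = code == 0 and "No resources found" in out

    return hco_gone and ssp_gone
```

Passing the runner as a parameter keeps the check testable without a cluster; against a real cluster, calling hco_and_ssp_removed() with no arguments after each delete would stand in for the manual checks.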

Comment 12 errata-xmlrpc 2023-01-24 13:39:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.12.0 Images security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0408

