Bug 2031919 - [SNO] we cannot cleanly remove the product on SNO due to kubevirt apiservices leftovers
Summary: [SNO] we cannot cleanly remove the product on SNO due to kubevirt apiservices...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Jed Lejosne
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On: 2026336
Blocks:
 
Reported: 2021-12-13 17:07 UTC by Simone Tiraboschi
Modified: 2022-03-16 15:57 UTC
CC List: 4 users

Fixed In Version: virt-operator-container-v4.10.0-185 hco-bundle-registry-container-v4.10.0-576
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-16 15:57:31 UTC
Target Upstream Version:
Embargoed:


Attachments
kubevirt operator logs 1/2 (537.36 KB, text/plain)
2021-12-13 17:10 UTC, Simone Tiraboschi
kubevirt operator logs 2/2 (39.07 KB, text/plain)
2021-12-13 17:11 UTC, Simone Tiraboschi


Links
Github kubevirt/kubevirt pull 6949 (Merged): KubeVirt CR: fix update of finalizers and infra replica count (last updated 2021-12-17 22:53:51 UTC)

Description Simone Tiraboschi 2021-12-13 17:07:36 UTC
Description of problem:
Now, after the fix for https://bugzilla.redhat.com/show_bug.cgi?id=2026336, on an SNO cluster we have only one replica of virt-api and virt-controller (although we still have 2 virt-operator replicas due to https://github.com/operator-framework/operator-lifecycle-manager/issues/2453).
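
A quick way to confirm the replica counts (a minimal sketch; the kubevirt-hyperconverged namespace is taken from the apiservice output below, so adjust it if CNV is installed in openshift-cnv):

 $ oc get deployments virt-operator virt-api virt-controller -n kubevirt-hyperconverged
 $ oc get deployment virt-api -n kubevirt-hyperconverged -o jsonpath='{.spec.replicas}{"\n"}'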


When the user deletes the HCO CR, HCO tries to delete the KubeVirt CR and succeeds, but on SNO the kubevirt apiservices are left behind:

 $ oc get apiservices | grep virt-api
v1.subresources.kubevirt.io                          kubevirt-hyperconverged/virt-api                             False (MissingEndpoints)   42m
v1alpha3.subresources.kubevirt.io                    kubevirt-hyperconverged/virt-api                             False (MissingEndpoints)   42m

This prevents the namespace from being cleaned up successfully;
when the user tries to delete the namespace, it gets stuck with:

    message: 'Discovery failed for some groups, 2 failing: unable to retrieve the
      complete list of server APIs: subresources.kubevirt.io/v1: the server is currently
      unable to handle the request, subresources.kubevirt.io/v1alpha3: the server
      is currently unable to handle the request'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure
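
The same failure can be read directly from the stuck namespace while it hangs in Terminating (a sketch, assuming the kubevirt-hyperconverged namespace shown in the apiservice output above):

 $ oc get namespace kubevirt-hyperconverged -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'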


Version-Release number of selected component (if applicable):
4.10

How reproducible:
pretty often but not 100% systematic

Steps to Reproduce:
1. deploy CNV on SNO
2. remove the HCO CR
3. check for kubevirt apiservices leftovers (see the command sketch below)
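
One way to exercise these steps from the command line (a minimal sketch; the HCO CR deletion command is the one used in the verification comments below, and the namespace name is the one from the description, so adjust both to your install):

 $ oc delete hyperconvergeds --all-namespaces --all --ignore-not-found
 $ oc get apiservices | grep subresources.kubevirt.io
 $ oc delete namespace kubevirt-hyperconverged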

Actual results:
The v1.subresources.kubevirt.io and v1alpha3.subresources.kubevirt.io apiservices are left behind.
When the user tries to remove the namespace, it becomes stuck with:
    message: 'Discovery failed for some groups, 2 failing: unable to retrieve the
      complete list of server APIs: subresources.kubevirt.io/v1: the server is currently
      unable to handle the request, subresources.kubevirt.io/v1alpha3: the server
      is currently unable to handle the request'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure


Expected results:
kubevirt-operator successfully removes all its managed resources before removing the finalizer from its CR.
The product can be cleanly removed.

Additional info:
This happens only on SNO with a single instance of virt-api and virt-controller; with 2 instances of each, everything works fine.
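
As a manual workaround on a cluster that is already stuck (a sketch only; the proper fix belongs in virt-operator, see the linked kubevirt pull 6949), the leftover apiservices can be deleted by name, after which the namespace deletion can complete:

 $ oc delete apiservice v1.subresources.kubevirt.io v1alpha3.subresources.kubevirt.io
 $ oc get namespace kubevirt-hyperconverged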

Comment 1 Simone Tiraboschi 2021-12-13 17:10:16 UTC
Created attachment 1846111 [details]
kubevirt operator logs 1/2

Comment 2 Simone Tiraboschi 2021-12-13 17:11:02 UTC
Created attachment 1846112 [details]
kubevirt operator logs 2/2

Comment 3 Kedar Bidarkar 2021-12-14 19:25:45 UTC
Just for the record, with an SNO setup we currently face this trade-off:

1) If we decide to have 1 replica of virt-api and virt-controller, we cannot cleanly remove the product (this bug).
2) If we decide to have 2 replicas of virt-api and virt-controller, we cannot install the SR-IOV Operator successfully:
https://bugzilla.redhat.com/show_bug.cgi?id=2027420 ([SNO] SR-IOV operator fails to install after CNV is installed)

Comment 4 Kedar Bidarkar 2021-12-27 15:59:48 UTC
We are now able to successfully clean up CNV on SNO without any leftover virt-api apiservices.

 + oc delete hyperconvergeds --all-namespaces --all --ignore-not-found
hyperconverged.hco.kubevirt.io "kubevirt-hyperconverged" deleted


[kbidarka@localhost cnv]$ oc get apiservices -n openshift-cnv | grep virt-
[kbidarka@localhost cnv]$

The openshift-cnv namespace was deleted successfully:

+ oc delete namespace openshift-cnv --ignore-not-found
namespace "openshift-cnv" deleted

Comment 5 Kedar Bidarkar 2021-12-27 16:16:43 UTC
Tested with container-native-virtualization/virt-operator/images/v4.10.0-164

Comment 6 Kedar Bidarkar 2022-01-20 17:05:52 UTC
 oc delete namespace openshift-cnv --ignore-not-found
namespace "openshift-cnv" deleted
[kbidarka@localhost cnv]$ oc get apiservices -A | grep virt-api 
[kbidarka@localhost cnv]$ oc projects | grep openshift-cnv
[kbidarka@localhost cnv]$ 

Tested with:
HCO Version: v4.10.0-605
Virt-operator: v4.10.0-197

We are now able to successfully clean up CNV on SNO without any leftover virt-api apiservices.
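
For reference, one way to read these versions off a live cluster before removal (a sketch, assuming CNV is installed through OLM in the openshift-cnv namespace used above):

 $ oc get csv -n openshift-cnv
 $ oc get deployment virt-operator -n openshift-cnv -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'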

Comment 11 errata-xmlrpc 2022-03-16 15:57:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947

