Bug 1939753 - Deleting the HCO gets stuck if there is still a VM in the cluster
Summary: Deleting the HCO gets stuck if there is still a VM in the cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Management Console
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: ralpert
QA Contact: Siva Reddy
URL:
Whiteboard: Scrubbed
Depends On:
Blocks: 1971667
Reported: 2021-03-17 00:16 UTC by Guohua Ouyang
Modified: 2021-07-27 22:54 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Error message wasn't being passed correctly to the component. Consequence: Error message wasn't displayed. Fix: Amended error handling. Result: Bug is fixed.
Clone Of:
Clones: 1971667
Environment:
Last Closed: 2021-07-27 22:53:48 UTC
Target Upstream Version:
Embargoed:


Attachments
OCP 4.6 (165.85 KB, image/png)
2021-03-17 00:16 UTC, Guohua Ouyang
Delete Error (181.99 KB, image/png)
2021-05-04 03:11 UTC, Siva Reddy


Links
System ID Private Priority Status Summary Last Updated
Github openshift console pull 8439 0 None open Bug 1939753: Track and show error messages in modals 2021-03-22 17:51:59 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:54:20 UTC

Description Guohua Ouyang 2021-03-17 00:16:05 UTC
Created attachment 1763858 [details]
OCP 4.6

Description of problem:
If there is still a VM in the cluster, deleting the HCO gets stuck: clicking the Delete button does nothing, and the delete dialog stays open.

This is a regression from OCP 4.6. On OCP 4.6, a proper error pops up, like the one below:

An error occurred
admission webhook "validate-hco.kubevirt.io" denied the request: admission webhook "kubevirt-validator.kubevirt.io" denied the request: Rejecting the uninstall request, since there are still Virtual Machines present. Either delete all KubeVirt related workloads or change the uninstall strategy before uninstalling KubeVirt.

Version-Release number of selected component (if applicable):
OCP 4.7 and OCP 4.8

How reproducible:
100%

Steps to Reproduce:
1. Have a VM in the cluster.
2. Go to Operators -> Installed Operators.
3. Select 'OpenShift Virtualization' in the openshift-cnv namespace.
4. Select 'OpenShift Virtualization Deployment'.
5. Delete 'kubevirt-hyperconverged'.

Actual results:
The deletion gets stuck.

Expected results:
A proper error message is shown.

Additional info:

Comment 1 Yaacov Zamir 2021-03-17 07:26:37 UTC
Hi,

This bug was originally filed under kubevirt-console-plugin, but the missing error message happens in the OLM-console-plugin. Maybe it's the actual operator, HCO, that does not respond with the correct error message?

If OLM can't help here, please refer this bug to the correct component.

Comment 2 Oren Cohen 2021-03-17 11:19:54 UTC
This is a UI bug, not HCO or OLM.
The expected error message is shown when performing the CR deletion from the CLI (oc delete hco kubevirt-hyperconverged -n openshift-cnv), and the thrown error message, originating from the validating webhook, is:

Error from server (admission webhook "kubevirt-validator.kubevirt.io" denied the request: Rejecting the uninstall request, since there are still Virtual Machines present. Either delete all KubeVirt related workloads or change the uninstall strategy before uninstalling KubeVirt.): admission webhook "validate-hco.kubevirt.io" denied the request: admission webhook "kubevirt-validator.kubevirt.io" denied the request: Rejecting the uninstall request, since there are still Virtual Machines present. Either delete all KubeVirt related workloads or change the uninstall strategy before uninstalling KubeVirt.

The same error message should be displayed when performing the same action in the UI. Instead, the red "Delete" button is stuck in the "pressed" state.
Note: the error message is displayed in the browser console in developer mode (F12) when clicking the Delete button.

In OCP 4.6, that message appeared on the UI, as expected. See the screenshot Guohua attached.
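
The symptom above (a button stuck in its pressed state while the error only reaches the browser console) is what one would expect if a modal sets an in-progress flag on submit and never clears it when the delete request is rejected. The following is a minimal TypeScript sketch of handling that case. It is not the console's actual code: `ModalState`, `ModalEvent`, `reduce`, and `submitDelete` are hypothetical names introduced for illustration.

```typescript
// Hypothetical modal state; not taken from the OpenShift console source.
type ModalState = { inProgress: boolean; errorMessage: string };

type ModalEvent =
  | { type: 'submit' }
  | { type: 'success' }
  | { type: 'failure'; error: unknown };

const initialState: ModalState = { inProgress: false, errorMessage: '' };

// Pure state transition. The 'failure' branch is the important one: it clears
// inProgress (releasing the button) and records a message for the UI to render.
function reduce(state: ModalState, event: ModalEvent): ModalState {
  switch (event.type) {
    case 'submit':
      return { ...state, inProgress: true, errorMessage: '' };
    case 'success':
      return { ...state, inProgress: false, errorMessage: '' };
    case 'failure': {
      const message =
        event.error instanceof Error ? event.error.message : String(event.error);
      return { ...state, inProgress: false, errorMessage: message || 'An error occurred' };
    }
  }
}

// Wraps any delete call so that a rejection always reaches the modal state
// instead of surfacing only as an unhandled error in the browser console.
async function submitDelete(
  deleteFn: () => Promise<void>,
  setState: (s: ModalState) => void,
): Promise<void> {
  let state = reduce(initialState, { type: 'submit' });
  setState(state);
  try {
    await deleteFn();
    state = reduce(state, { type: 'success' });
  } catch (error) {
    state = reduce(state, { type: 'failure', error });
  }
  setState(state);
}
```

Keeping the transition in a pure function makes the failure path easy to unit test without rendering a modal.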

Comment 3 Yaacov Zamir 2021-03-17 11:31:36 UTC
@Oren thanks, do you know what the correct component for this bug is?

The kubevirt-plugin UI does not cover this part of the code, so this UI may come from an HCO UI? Or is it a generic UI that OLM generates from the HCO definitions?

Comment 4 Yaacov Zamir 2021-03-17 11:39:42 UTC
Hi, moving to Management Console per comment#2 and comment#3.

Not sure if this is the correct component, if you can't help, please move to correct component.

Comment 5 Oren Cohen 2021-03-17 12:15:17 UTC
I think the issue is indeed with the general management console, since it occurs both in "Installed Operators" page --> the CR tab, and in Administration --> CustomResourceDefinitions --> find the relevant CRD --> Instances.

I didn't check, but I assume the issue affects not only CNV but every resource that is protected by a validating webhook which denies the deletion request.
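
When an admission webhook denies a request, the Kubernetes API server answers with a `Status` object whose `message` field carries text like the one quoted in the description. As a hedged illustration of why this would apply to any webhook-protected resource, here is a small TypeScript sketch that pulls a displayable message out of such a response body. `K8sStatus` models only a subset of the real object, and `isFailureStatus` and `statusMessage` are hypothetical helpers, not console APIs.

```typescript
// Subset of the Kubernetes Status object that matters for display purposes.
interface K8sStatus {
  kind: 'Status';
  status: 'Success' | 'Failure';
  message?: string;
}

// Type guard: true only for an object that looks like a failed Status.
function isFailureStatus(body: unknown): body is K8sStatus {
  if (typeof body !== 'object' || body === null) return false;
  const b = body as Record<string, unknown>;
  return b.kind === 'Status' && b.status === 'Failure';
}

// Returns the server-provided message when present, otherwise a generic
// fallback, so the UI always has something to show.
function statusMessage(body: unknown, fallback = 'An error occurred'): string {
  if (isFailureStatus(body) && typeof body.message === 'string' && body.message !== '') {
    return body.message;
  }
  return fallback;
}

// Illustrative body only; the message is abridged from the one in this report.
const sampleBody = {
  kind: 'Status',
  status: 'Failure',
  message: 'admission webhook "validate-hco.kubevirt.io" denied the request',
};
console.log(statusMessage(sampleBody));
```

Nothing in the helper is specific to HyperConverged, which matches the assumption that the missing message is a generic console problem.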

Comment 6 Jakub Hadvig 2021-03-17 19:28:12 UTC
Rebecca, could you please check whether the issue is somehow related to your change in https://github.com/openshift/console/pull/6887
At first look I doubt it, but let's check it first. If not, please assign it back to me.

Comment 7 ralpert 2021-03-19 18:34:47 UTC
That's a weird one. I'll look into it!

Comment 8 ralpert 2021-03-19 20:22:44 UTC
Hi @gouyang - I've been trying to recreate this. Are there other steps required that weren't outlined in the ticket? The delete button is working just fine for me. I'm trying to figure out if there's a step I'm missing! Thanks.

Comment 9 Oren Cohen 2021-03-21 14:01:08 UTC
Hi @ralpert, the reproducer for this bug is exactly what was outlined by @gouyang:
1. Install CNV 2.6.0 from production on OCP 4.7
2. Create a Virtual Machine
3. Delete the HyperConverged Custom Resource named "kubevirt-hyperconverged" in namespace "openshift-cnv" via the UI.
Result: the red delete button is stuck. The error message is shown in the browser's developer mode.

Please see attached screen recording:
https://drive.google.com/file/d/1iC-RIytxZO9M0Cqfpo0OKzycC1aYYrH6/view?usp=sharing

Comment 10 Guohua Ouyang 2021-03-22 00:12:00 UTC
Oren, thanks for the video.

@ralpert, maybe there are some differences between your cluster and CNV QE's cluster that prevent you from reproducing the issue.

Comment 11 Oren Cohen 2021-03-22 08:17:06 UTC
I reproduced the issue on a QuickLab cluster, which is a different infrastructure than CNV QE (PSI).

Comment 12 ralpert 2021-03-22 13:32:04 UTC
Thanks @gouyang and @ocohen!

I'm going to talk to my team and see if anyone has any ideas what may be different between the clusters.

Comment 13 ralpert 2021-03-22 14:20:30 UTC
Hi @gouyang and @ocohen - I spoke to folks on my team and it sounds like we may need a cluster with the problem to debug this issue. Would you be able to provide us with one so we can knock this out? Thanks so much!

Comment 14 Oren Cohen 2021-03-22 15:20:26 UTC
Hi ralpert,
I'll provide you the details for accessing the cluster shortly, over a private channel.

Comment 16 Siva Reddy 2021-05-04 03:10:29 UTC
Version:
 4.8.0-0.nightly-2021-04-30-201824

Steps to verify:
 1. Log in to the console, go to OperatorHub, and install the "OpenShift Virtualization" operator.
 2. Once installed, go to the operator details, OpenShift Virtualization Deployment, and create a HyperConverged.
 3. After the HyperConverged is created, go to Workloads > Virtualization and create a VM.
 4. While the VM is running, go to Installed Operators and try to delete the HyperConverged.

    The error is now displayed; a screenshot is attached.

Comment 17 Siva Reddy 2021-05-04 03:11:41 UTC
Created attachment 1779178 [details]
Delete Error

Comment 20 errata-xmlrpc 2021-07-27 22:53:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

