Bug 1780473 - Delete VM is hanging if the corresponding template does not exist anymore
Summary: Delete VM is hanging if the corresponding template does not exist anymore
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: SSP
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
high
medium
Target Milestone: ---
: 2.4.0
Assignee: Karel Šimon
QA Contact: Guohua Ouyang
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-12-06 06:11 UTC by Guohua Ouyang
Modified: 2020-07-28 19:09 UTC (History)
7 users (show)

Fixed In Version: kubevirt-template-validator-container-v2.4.0-5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-28 19:09:38 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
vm_delete_hang (31.85 KB, image/png)
2019-12-06 06:11 UTC, Guohua Ouyang
no flags Details
delete vm is hanging (2.63 MB, video/mp4)
2020-03-09 04:52 UTC, Guohua Ouyang
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2020:3194 0 None None None 2020-07-28 19:09:53 UTC

Description Guohua Ouyang 2019-12-06 06:11:25 UTC
Created attachment 1642517 [details]
vm_delete_hang

Description of problem:
I meet the issue during verify bug 1739447, after the VM is created by following its steps, try to delete the VM, it hangs there forever. Cancel the operations the VM is not deleted.

Version-Release number of selected component (if applicable):
v2.2.0-10

How reproducible:
Only for some VMs in special state.

Steps to Reproduce:
1. See stesp in bug 1739447
2.
3.

Actual results:
'oc delete vm xxxx' is hanging.

Expected results:
'oc delete vm xxxx' delete the VM.

Additional info:
log snippet in virt-handler:

{"component":"virt-handler","kind":"Domain","level":"warning","msg":"detected expired watchdog for domain","name":"fedora","namespace":"default","pos":"cache.go:114","timestamp":"2019-12-06T02:12:59.030384Z","uid":""}
{"component":"virt-handler","level":"info","msg":"Processing event openshift-cnv/testurl","pos":"vm.go:788","timestamp":"2019-12-06T02:13:04.848351Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Running\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:790","timestamp":"2019-12-06T02:13:04.848440Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Running, reason: Unknown\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:795","timestamp":"2019-12-06T02:13:04.848493Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"","level":"info","msg":"Grace period expired, killing deleted VirtualMachineInstance testurl","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:1137","timestamp":"2019-12-06T02:13:04.849414Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","level":"info","msg":"Processing event openshift-cnv/testurl","pos":"vm.go:788","timestamp":"2019-12-06T02:13:05.080429Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Running\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:790","timestamp":"2019-12-06T02:13:05.080517Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Running, reason: Unknown\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:795","timestamp":"2019-12-06T02:13:05.080829Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"","level":"info","msg":"Grace period expired, killing deleted VirtualMachineInstance testurl","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:1137","timestamp":"2019-12-06T02:13:05.081591Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Received Domain Event of type MODIFIED","name":"testurl","namespace":"openshift-cnv","pos":"server.go:77","timestamp":"2019-12-06T02:13:05.084297Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain is in state Shutoff reason Destroyed","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:1582","timestamp":"2019-12-06T02:13:05.084552Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","level":"info","msg":"Processing event openshift-cnv/testurl","pos":"vm.go:788","timestamp":"2019-12-06T02:13:05.084713Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Running\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:790","timestamp":"2019-12-06T02:13:05.084748Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Shutoff, reason: Destroyed\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:795","timestamp":"2019-12-06T02:13:05.084768Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"","level":"info","msg":"Signaled deletion for testurl","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:1166","timestamp":"2019-12-06T02:13:05.085256Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Received Domain Event of type DELETED","name":"testurl","namespace":"openshift-cnv","pos":"server.go:77","timestamp":"2019-12-06T02:13:05.088057Z","uid":""}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain deleted","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:1572","timestamp":"2019-12-06T02:13:05.088168Z","uid":""}
{"component":"virt-handler","level":"info","msg":"Processing event openshift-cnv/testurl","pos":"vm.go:788","timestamp":"2019-12-06T02:13:05.175049Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Failed\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:790","timestamp":"2019-12-06T02:13:05.175161Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","level":"info","msg":"Domain does not exist","pos":"vm.go:797","timestamp":"2019-12-06T02:13:05.175184Z"}
{"component":"virt-handler","kind":"Domain","level":"warning","msg":"watchdog file removed for domain","name":"testurl","namespace":"openshift-cnv","pos":"inotify.go:136","timestamp":"2019-12-06T02:13:05.175662Z","uid":""}
{"component":"virt-handler","level":"info","msg":"Processing event openshift-cnv/testurl","pos":"vm.go:788","timestamp":"2019-12-06T02:13:05.175811Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Failed\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:790","timestamp":"2019-12-06T02:13:05.175871Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","level":"info","msg":"Domain does not exist","pos":"vm.go:797","timestamp":"2019-12-06T02:13:05.175888Z"}
{"component":"virt-handler","level":"info","msg":"Processing event openshift-cnv/testurl","pos":"vm.go:788","timestamp":"2019-12-06T02:13:08.492948Z"}
{"component":"virt-handler","level":"info","msg":"VMI does not exist","pos":"vm.go:792","timestamp":"2019-12-06T02:13:08.493058Z"}
{"component":"virt-handler","level":"info","msg":"Domain does not exist","pos":"vm.go:797","timestamp":"2019-12-06T02:13:08.493136Z"}
{"component":"virt-handler","kind":"Domain","level":"warning","msg":"detected expired watchdog for domain","name":"fedora","namespace":"default","pos":"cache.go:114","timestamp":"2019-12-06T02:13:14.030635Z","uid":""}
{"component":"virt-handler","kind":"Domain","level":"warning","msg":"detected expired watchdog for domain","name":"fedora","namespace":"default","pos":"cache.go:114","timestamp":"2019-12-06T02:13:29.030059Z","uid":""}
{"component":"virt-handler","kind":"Domain","level":"warning","msg":"detected expired watchdog for 

{"component":"virt-handler","kind":"Domain","level":"warnin

Comment 2 sgott 2019-12-06 14:57:24 UTC
Thanks Guohua. Could you please clarify the steps you followed to reach this state? Understood that there's a reference to another bug, but that bug is about a workflow to create a VM using the UI. It appears here that you were using a delete command from the CLI.

Comment 3 Guohua Ouyang 2019-12-09 02:51:42 UTC
I suspect this bug might be caused by bug 177954. I see this issue in two situation:
1. Set hostpath-provisioner as default in the cluster and create on UI, delete the vm on UI or cli, the VM is not deleted on UI and the action "oc delte vm xxx" is hanging on cli.
2. Due to bug 177954, it's able to create a VM by following its workaround, and then see this issue as well.

Maybe  the bug is gone after bug 177954 is fixed, but the VM cannot be deleted in certain situation is the issue I have seen.

Comment 4 Guohua Ouyang 2019-12-10 08:36:33 UTC
Didn't see this issue now.

Comment 5 sgott 2019-12-10 22:08:20 UTC
Guohua, to clarify your statement in Comment #4, are you saying you can't reproduce this bug now that https://bugzilla.redhat.com/show_bug.cgi?id=1739447 is fixed?

Comment 6 Guohua Ouyang 2019-12-11 05:31:55 UTC
I cannot reproduce it now, not sure it's related to bug 1739447 or bug 1779504.
When the issue exists, the VM cannot be deleted on UI or Cli.

Feel free to close this.

Comment 7 sgott 2019-12-11 13:12:25 UTC
Closing this based on Comment #6. Please re-open if it turns out this bug is reproducible.

Comment 8 Guohua Ouyang 2020-03-09 04:52:51 UTC
Created attachment 1668563 [details]
delete vm is hanging

Comment 9 Guohua Ouyang 2020-03-09 04:57:20 UTC
Reopen with a video attached.

Comment 10 sgott 2020-03-10 13:31:50 UTC
David, can you investigate this?

Comment 11 David Vossel 2020-03-10 16:44:46 UTC
Hey Guohua,

I looked through the video, it's not enough to determine what's going on though to prevent deletion.

Can you provide both virt-handler and virt-controller logs during the timeframe that this issue occurs please?

Thanks,
- David

Comment 12 Guohua Ouyang 2020-03-11 02:23:27 UTC
It seems the delete command is not posting to the API, there are no new logs in virt-api, virt-controller and virt-handler during the delete hanging.

@David, please ping me when you're online today, you can take a look at the environment.

Comment 13 David Vossel 2020-03-12 13:28:28 UTC
The issue is caused by the kubemacpool controller not removing the finalizer off of the VM.


{"level":"error","ts":1584018401.2259636,"logger":"VirtualMachine Controller","msg":"failed to remove the finalizer","virtualMachineName":"dsa","virtualMachineNamespace":"default","error":"admission webhook \"virt-template-admission.kubevirt.io\" denied the request: missing parent template (key=default/vm-template-example) for dsa","stacktrace":"github.com/k8snetworkplumbingwg/kubemacpool/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/k8snetworkplumbingwg/kubemacpool/pkg/controller/virtualmachine.(*ReconcilePolicy).removeFinalizerAndReleaseMac\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/pkg/controller/virtualmachine/virtualmachine_controller.go:159\ngithub.com/k8snetworkplumbingwg/kubemacpool/pkg/controller/virtualmachine.(*ReconcilePolicy).Reconcile\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/pkg/controller/virtualmachine/virtualmachine_controller.go:108\ngithub.com/k8snetworkplumbingwg/kubemacpool/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/k8snetworkplumbingwg/kubemacpool/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/k8snetworkplumbingwg/kubemacpool/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/k8snetworkplumbingwg/kubemacpool/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/k8snetworkplumbingwg/kubemacpool/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}


It's the virt-template-admission webhook that's causing the issue.

"admission webhook \"virt-template-admission.kubevirt.io\" denied the request: missing parent template (key=default/vm-template-example) for dsa"

That webhook is preventing the VM from being updated even though kubemacpool is just trying to remove a finalizer.

Comment 14 David Vossel 2020-03-12 13:37:06 UTC
Here's exactly where the issue is occurring in code.

https://github.com/kubevirt/kubevirt-template-validator/blob/master/pkg/webhooks/validating/uplink.go#L101

Part of the fix is that the kubevirt-template-validator should not be validating when the DeletionTimestamp != nil. It's unclear why the template validator should be doing any validation outside of the creation of the VM to begin with though.

Comment 15 David Vossel 2020-03-12 13:39:54 UTC
@stu Can you please re-assign this to someone working on the cnv templates. I can help review the patch upstream when it lands

Comment 18 Francesco Romani 2020-03-13 15:05:53 UTC
(In reply to David Vossel from comment #14)
> Here's exactly where the issue is occurring in code.
> 
> https://github.com/kubevirt/kubevirt-template-validator/blob/master/pkg/
> webhooks/validating/uplink.go#L101
> 
> Part of the fix is that the kubevirt-template-validator should not be
> validating when the DeletionTimestamp != nil. It's unclear why the template
> validator should be doing any validation outside of the creation of the VM
> to begin with though.

We started only with creation. Then we decided to validate also during updates to prevent (accidental) tampering of templates. See https://github.com/MarSik/kubevirt-ssp-operator/blob/master/roles/KubevirtTemplateValidator/templates/webhook.yaml.j2

Comment 19 Karel Šimon 2020-03-13 15:15:17 UTC
This PR should fix it (https://github.com/kubevirt/kubevirt-template-validator/pull/40). But as Francesco said in comment in that PR, we should decide how to fix it.

Comment 20 Ruth Netser 2020-06-29 10:22:17 UTC
Verified on CNV-2.4.0, OCP: 4.5.0-rc.2, SSP build 58:
1. From UI console create a VM template
2. Create a VM which will use the above template:
  kind: VirtualMachine
  metadata:
    labels:
      vm.kubevirt.io/template: <template name>
      vm.kubevirt.io/template.namespace: <template namespace>
3. Delete the template
4. Delete the VM

VM is successfully deleted, nothing is logged in template validator log.

Comment 23 errata-xmlrpc 2020-07-28 19:09:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3194


Note You need to log in before you can comment on or make changes to this bug.