Created attachment 1642517 [details]
vm_delete_hang

Description of problem:
I hit this issue while verifying bug 1739447. After a VM is created by following that bug's steps, trying to delete the VM hangs forever. Cancelling the operation leaves the VM undeleted.

Version-Release number of selected component (if applicable):
v2.2.0-10

How reproducible:
Only for some VMs in a special state.

Steps to Reproduce:
1. See steps in bug 1739447

Actual results:
'oc delete vm xxxx' hangs.

Expected results:
'oc delete vm xxxx' deletes the VM.

Additional info:
Log snippet in virt-handler (one JSON entry per line; the tail of the snippet is truncated):

{"component":"virt-handler","kind":"Domain","level":"warning","msg":"detected expired watchdog for domain","name":"fedora","namespace":"default","pos":"cache.go:114","timestamp":"2019-12-06T02:12:59.030384Z","uid":""}
{"component":"virt-handler","level":"info","msg":"Processing event openshift-cnv/testurl","pos":"vm.go:788","timestamp":"2019-12-06T02:13:04.848351Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Running\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:790","timestamp":"2019-12-06T02:13:04.848440Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Running, reason: Unknown\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:795","timestamp":"2019-12-06T02:13:04.848493Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"","level":"info","msg":"Grace period expired, killing deleted VirtualMachineInstance testurl","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:1137","timestamp":"2019-12-06T02:13:04.849414Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","level":"info","msg":"Processing event openshift-cnv/testurl","pos":"vm.go:788","timestamp":"2019-12-06T02:13:05.080429Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Running\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:790","timestamp":"2019-12-06T02:13:05.080517Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Running, reason: Unknown\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:795","timestamp":"2019-12-06T02:13:05.080829Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"","level":"info","msg":"Grace period expired, killing deleted VirtualMachineInstance testurl","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:1137","timestamp":"2019-12-06T02:13:05.081591Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Received Domain Event of type MODIFIED","name":"testurl","namespace":"openshift-cnv","pos":"server.go:77","timestamp":"2019-12-06T02:13:05.084297Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain is in state Shutoff reason Destroyed","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:1582","timestamp":"2019-12-06T02:13:05.084552Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","level":"info","msg":"Processing event openshift-cnv/testurl","pos":"vm.go:788","timestamp":"2019-12-06T02:13:05.084713Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Running\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:790","timestamp":"2019-12-06T02:13:05.084748Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Shutoff, reason: Destroyed\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:795","timestamp":"2019-12-06T02:13:05.084768Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"","level":"info","msg":"Signaled deletion for testurl","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:1166","timestamp":"2019-12-06T02:13:05.085256Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Received Domain Event of type DELETED","name":"testurl","namespace":"openshift-cnv","pos":"server.go:77","timestamp":"2019-12-06T02:13:05.088057Z","uid":""}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain deleted","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:1572","timestamp":"2019-12-06T02:13:05.088168Z","uid":""}
{"component":"virt-handler","level":"info","msg":"Processing event openshift-cnv/testurl","pos":"vm.go:788","timestamp":"2019-12-06T02:13:05.175049Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Failed\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:790","timestamp":"2019-12-06T02:13:05.175161Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","level":"info","msg":"Domain does not exist","pos":"vm.go:797","timestamp":"2019-12-06T02:13:05.175184Z"}
{"component":"virt-handler","kind":"Domain","level":"warning","msg":"watchdog file removed for domain","name":"testurl","namespace":"openshift-cnv","pos":"inotify.go:136","timestamp":"2019-12-06T02:13:05.175662Z","uid":""}
{"component":"virt-handler","level":"info","msg":"Processing event openshift-cnv/testurl","pos":"vm.go:788","timestamp":"2019-12-06T02:13:05.175811Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Failed\n","name":"testurl","namespace":"openshift-cnv","pos":"vm.go:790","timestamp":"2019-12-06T02:13:05.175871Z","uid":"ae23bf8c-058f-461c-81ed-5797e6926ac9"}
{"component":"virt-handler","level":"info","msg":"Domain does not exist","pos":"vm.go:797","timestamp":"2019-12-06T02:13:05.175888Z"}
{"component":"virt-handler","level":"info","msg":"Processing event openshift-cnv/testurl","pos":"vm.go:788","timestamp":"2019-12-06T02:13:08.492948Z"}
{"component":"virt-handler","level":"info","msg":"VMI does not exist","pos":"vm.go:792","timestamp":"2019-12-06T02:13:08.493058Z"}
{"component":"virt-handler","level":"info","msg":"Domain does not exist","pos":"vm.go:797","timestamp":"2019-12-06T02:13:08.493136Z"}
{"component":"virt-handler","kind":"Domain","level":"warning","msg":"detected expired watchdog for domain","name":"fedora","namespace":"default","pos":"cache.go:114","timestamp":"2019-12-06T02:13:14.030635Z","uid":""}
{"component":"virt-handler","kind":"Domain","level":"warning","msg":"detected expired watchdog for domain","name":"fedora","namespace":"default","pos":"cache.go:114","timestamp":"2019-12-06T02:13:29.030059Z","uid":""}
[remaining entries truncated]
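For future triage of a hang like this, one quick CLI check is whether the delete request ever reached the API server. This is a generic Kubernetes metadata query, not specific to this bug; the VM name and namespace below are the ones from this report and are placeholders for whatever VM is stuck:

```shell
#!/bin/sh
# Sketch: figure out why 'oc delete vm' appears to hang.
# Placeholders taken from the log snippet above; adjust to the stuck VM.
VM=testurl
NS=openshift-cnv

# If deletionTimestamp is set, the API server accepted the delete and the
# object is only being kept alive until its finalizers are removed.
oc -n "$NS" get vm "$VM" -o jsonpath='{.metadata.deletionTimestamp}{"\n"}'

# List the finalizers that controllers still have to remove.
oc -n "$NS" get vm "$VM" -o jsonpath='{.metadata.finalizers}{"\n"}'
```

If deletionTimestamp is empty, the request likely never reached the API server; if it is set and finalizers remain, some controller is failing to clean up.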
Thanks, Guohua. Could you please clarify the steps you followed to reach this state? I understand there's a reference to another bug, but that bug is about a workflow for creating a VM through the UI, whereas here you appear to be deleting the VM with a CLI command.
I suspect this bug might be caused by bug 1779504. I see this issue in two situations:
1. Set hostpath-provisioner as the default in the cluster and create a VM in the UI, then delete the VM from the UI or CLI: the VM is not deleted in the UI, and 'oc delete vm xxx' hangs on the CLI.
2. Due to bug 1779504, it's possible to create a VM by following its workaround, and then this issue appears as well.
Maybe this bug will be gone once bug 1779504 is fixed, but what I have seen is that the VM cannot be deleted in certain situations.
I don't see this issue now.
Guohua, to clarify your statement in Comment #4, are you saying you can't reproduce this bug now that https://bugzilla.redhat.com/show_bug.cgi?id=1739447 is fixed?
I cannot reproduce it now; I'm not sure whether it was related to bug 1739447 or bug 1779504. When the issue existed, the VM could not be deleted from the UI or the CLI. Feel free to close this.
Closing this based on Comment #6. Please re-open if it turns out this bug is reproducible.
Created attachment 1668563 [details] delete vm is hanging
Reopen with a video attached.
David, can you investigate this?
Hey Guohua, I looked through the video, but it's not enough to determine what is preventing deletion. Can you please provide both the virt-handler and virt-controller logs from the timeframe in which this issue occurs? Thanks, - David
It seems the delete command is not reaching the API: there are no new logs in virt-api, virt-controller, or virt-handler while the delete hangs. @David, please ping me when you're online today and you can take a look at the environment.
The issue is caused by the kubemacpool controller failing to remove its finalizer from the VM:

{"level":"error","ts":1584018401.2259636,"logger":"VirtualMachine Controller","msg":"failed to remove the finalizer","virtualMachineName":"dsa","virtualMachineNamespace":"default","error":"admission webhook \"virt-template-admission.kubevirt.io\" denied the request: missing parent template (key=default/vm-template-example) for dsa","stacktrace":"github.com/k8snetworkplumbingwg/kubemacpool/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/k8snetworkplumbingwg/kubemacpool/pkg/controller/virtualmachine.(*ReconcilePolicy).removeFinalizerAndReleaseMac\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/pkg/controller/virtualmachine/virtualmachine_controller.go:159\ngithub.com/k8snetworkplumbingwg/kubemacpool/pkg/controller/virtualmachine.(*ReconcilePolicy).Reconcile\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/pkg/controller/virtualmachine/virtualmachine_controller.go:108\ngithub.com/k8snetworkplumbingwg/kubemacpool/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/k8snetworkplumbingwg/kubemacpool/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/k8snetworkplumbingwg/kubemacpool/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/k8snetworkplumbingwg/kubemacpool/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/k8snetworkplumbingwg/kubemacpool/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/k8snetworkplumbingwg/kubemacpool/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}

It's the virt-template-admission webhook that's causing the issue:

"admission webhook \"virt-template-admission.kubevirt.io\" denied the request: missing parent template (key=default/vm-template-example) for dsa"

That webhook is preventing the VM from being updated even though kubemacpool is only trying to remove a finalizer.
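As an emergency workaround for an environment already in this state (not a fix), the stuck finalizer can be cleared by hand once it's clear the controller will never remove it. This is a sketch using generic `oc` metadata operations; note that force-clearing finalizers bypasses the controller's cleanup (here, kubemacpool releasing the allocated MAC), so it should only be used on an already-broken object:

```shell
#!/bin/sh
# Hypothetical manual unblock for a VM whose finalizer removal is being
# rejected by the admission webhook. Names taken from the log above.
VM=dsa
NS=default

# Show which finalizers are still pending on the VM.
oc -n "$NS" get vm "$VM" -o jsonpath='{.metadata.finalizers}{"\n"}'

# Force-clear them with a JSON merge patch. WARNING: this skips the
# controller's cleanup logic (e.g. releasing the MAC back to the pool).
oc -n "$NS" patch vm "$VM" --type=merge -p '{"metadata":{"finalizers":[]}}'
```

Note this may still be rejected by the same webhook, since the patch is also an UPDATE; in that case the webhook itself has to be relaxed first.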
Here's exactly where the issue occurs in code: https://github.com/kubevirt/kubevirt-template-validator/blob/master/pkg/webhooks/validating/uplink.go#L101 Part of the fix is that kubevirt-template-validator should not validate objects whose DeletionTimestamp is set (!= nil). It's also unclear why the template validator should be doing any validation outside of VM creation in the first place.
@stu Can you please re-assign this to someone working on the cnv templates. I can help review the patch upstream when it lands
(In reply to David Vossel from comment #14) > Here's exactly where the issue is occurring in code. > > https://github.com/kubevirt/kubevirt-template-validator/blob/master/pkg/ > webhooks/validating/uplink.go#L101 > > Part of the fix is that the kubevirt-template-validator should not be > validating when the DeletionTimestamp != nil. It's unclear why the template > validator should be doing any validation outside of the creation of the VM > to begin with though. We started only with creation. Then we decided to validate also during updates to prevent (accidental) tampering of templates. See https://github.com/MarSik/kubevirt-ssp-operator/blob/master/roles/KubevirtTemplateValidator/templates/webhook.yaml.j2
This PR should fix it: https://github.com/kubevirt/kubevirt-template-validator/pull/40. But as Francesco said in a comment on that PR, we still need to decide how to fix it properly.
Verified on CNV-2.4.0, OCP 4.5.0-rc.2, SSP build 58:

1. From the UI console, create a VM template.
2. Create a VM which uses the above template:

kind: VirtualMachine
metadata:
  labels:
    vm.kubevirt.io/template: <template name>
    vm.kubevirt.io/template.namespace: <template namespace>

3. Delete the template.
4. Delete the VM.

The VM is successfully deleted, and nothing is logged in the template validator log.
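The verification flow above can be sketched from the CLI as well. Resource names below are placeholders, and the `oc` commands are generic; only the label keys are taken from this report:

```shell
#!/bin/sh
# Sketch of the verification: delete a VM whose parent template is gone.
NS=default
TEMPLATE=vm-template-example   # placeholder template name
VM=testvm                      # placeholder VM name

# The VM references its parent template via these labels:
#   vm.kubevirt.io/template: <template name>
#   vm.kubevirt.io/template.namespace: <template namespace>

# Delete the parent template first, orphaning the VM's template reference.
oc -n "$NS" delete template "$TEMPLATE"

# Then delete the VM. With the fix, this must complete even though the
# parent template no longer exists (previously the webhook denied the
# finalizer-removal update and the delete hung).
oc -n "$NS" delete vm "$VM" --wait=true
```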
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:3194